# Data Importing

The data set `best-selling-video-games.csv` was downloaded from [Kaggle](https://www.kaggle.com/datasets/tayyarhussain/best-selling-video-games-of-all-time). This Jupyter notebook demonstrates the following:
- Import a data frame from CSV.
- Convert a data frame into a series.
- Customize data importing.

## Import Packages

In [1]:
import numpy as np
import pandas as pd

print("Numpy version: {}.".format(np.__version__))
print("Pandas version: {}".format(pd.__version__))

Numpy version: 1.23.5.
Pandas version: 1.5.3


## Import Data Frame from CSV

Pandas provides a variety of ways to import data from different sources, including plain text, CSV files, and databases. CSV files are among the most common data sources, so importing data from CSV files is introduced here.

Pandas provides the `read_csv()` function to import data from CSV files. Its basic usage is introduced below. More details are given in the [pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).

The entire CSV can be read in as a data frame as follows. Because no index column is specified, an auto-incrementing integer index is added on the left of the data frame.

In [9]:
best_selling_games_df = pd.read_csv("best-selling-video-games.csv")
best_selling_games_df

,Rank,Title,Sales,Series,Platform(s),Initial release date,Developer(s),Publisher(s)
0,1,Minecraft,238000000,Minecraft,Multi-platform,"November 18, 2011",Mojang Studios,Xbox Game Studios
1,2,Grand Theft Auto V,175000000,Grand Theft Auto,Multi-platform,"September 17, 2013",Rockstar North,Rockstar Games
2,3,Tetris (EA),100000000,Tetris,Multi-platform,"September 12, 2006",EA Mobile,Electronic Arts
3,4,Wii Sports,82900000,Wii,Wii,"November 19, 2006",Nintendo EAD,Nintendo
4,5,PUBG: Battlegrounds,75000000,PUBG Universe,Multi-platform,"December 20, 2017",PUBG Corporation,PUBG Corporation
5,6,Mario Kart 8 / Deluxe,60460000,Mario Kart,Wii U / Switch,"May 29, 2014",Nintendo EAD,Nintendo
6,7,Super Mario Bros.,58000000,Super Mario,Multi-platform,"September 13, 1985",Nintendo R&D4,Nintendo
7,8,Red Dead Redemption 2,50000000,Red Dead,Multi-platform,"October 26, 2018",Rockstar Studios,Rockstar Games
8,9,Pokémon Red / Green / Blue / Yellow,47520000,Pokémon,Multi-platform,"February 27, 1996",Game Freak,Nintendo
9,10,Terraria,44500000,,Multi-platform,"May 16, 2011",Re-Logic,Re-Logic / 505 Games
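
The default index can be checked without the downloaded file. The sketch below is a minimal illustration, assuming a three-row excerpt of the data held in a string (`csv_text` and `sample_df` are names made up for this example). It reads the text through `io.StringIO` and inspects the index that `read_csv` creates when no index column is given.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for the real CSV file.
csv_text = """Rank,Title,Sales
1,Minecraft,238000000
2,Grand Theft Auto V,175000000
3,Tetris (EA),100000000
"""

# read_csv accepts a file-like object as well as a file path.
sample_df = pd.read_csv(io.StringIO(csv_text))

# With no index_col given, pandas builds a RangeIndex (0, 1, 2, ...).
print(type(sample_df.index).__name__)
print(list(sample_df.index))
print(sample_df.shape)
```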


To import only selected columns of the CSV file, pass their names to `usecols`:

In [3]:
best_selling_games_df = pd.read_csv("best-selling-video-games.csv", usecols=["Title", "Sales", "Publisher(s)"])
best_selling_games_df

,Title,Sales,Publisher(s)
0,Minecraft,238000000,Xbox Game Studios
1,Grand Theft Auto V,175000000,Rockstar Games
2,Tetris (EA),100000000,Electronic Arts
3,Wii Sports,82900000,Nintendo
4,PUBG: Battlegrounds,75000000,PUBG Corporation
5,Mario Kart 8 / Deluxe,60460000,Nintendo
6,Super Mario Bros.,58000000,Nintendo
7,Red Dead Redemption 2,50000000,Rockstar Games
8,Pokémon Red / Green / Blue / Yellow,47520000,Nintendo
9,Terraria,44500000,Re-Logic / 505 Games
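
One detail of `usecols` is easy to miss: the order of the names in the list is ignored, and the columns come back in the order they appear in the file. The sketch below illustrates this on a small in-memory excerpt (`csv_text`, `wanted`, `file_order_df`, and `list_order_df` are hypothetical names used only here), and shows one way to get the requested order by selecting the columns again afterwards.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt standing in for the real CSV file.
csv_text = """Rank,Title,Sales,Publisher(s)
1,Minecraft,238000000,Xbox Game Studios
2,Grand Theft Auto V,175000000,Rockstar Games
"""

# The list is deliberately in a different order than the file.
wanted = ["Publisher(s)", "Title"]

# usecols only filters; the resulting columns follow the file order.
file_order_df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(list(file_order_df.columns))

# Selecting with the same list afterwards applies the requested order.
list_order_df = file_order_df[wanted]
print(list(list_order_df.columns))
```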


The auto-increment index is added to the table because no index column is specified. To use one of the CSV columns as the index, pass its name to `index_col`:

In [4]:
best_selling_games_df = pd.read_csv("best-selling-video-games.csv", index_col="Title", usecols=["Title", "Sales", "Publisher(s)"])
best_selling_games_df

Title,Sales,Publisher(s)
Minecraft,238000000,Xbox Game Studios
Grand Theft Auto V,175000000,Rockstar Games
Tetris (EA),100000000,Electronic Arts
Wii Sports,82900000,Nintendo
PUBG: Battlegrounds,75000000,PUBG Corporation
Mario Kart 8 / Deluxe,60460000,Nintendo
Super Mario Bros.,58000000,Nintendo
Red Dead Redemption 2,50000000,Rockstar Games
Pokémon Red / Green / Blue / Yellow,47520000,Nintendo
Terraria,44500000,Re-Logic / 505 Games
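
A practical benefit of a meaningful index is label-based lookup. The sketch below is a minimal illustration on an in-memory excerpt (`csv_text`, `games_by_title`, `minecraft_sales`, and `wii_row` are hypothetical names for this example). It uses `.loc` to fetch a single value and a whole row by title.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real CSV file.
csv_text = """Title,Sales,Publisher(s)
Minecraft,238000000,Xbox Game Studios
Grand Theft Auto V,175000000,Rockstar Games
Wii Sports,82900000,Nintendo
"""

games_by_title = pd.read_csv(io.StringIO(csv_text), index_col="Title")

# The index column is no longer an ordinary column of the data frame.
print(list(games_by_title.columns))

# .loc looks rows up by index label, here the game title.
minecraft_sales = games_by_title.loc["Minecraft", "Sales"]
wii_row = games_by_title.loc["Wii Sports"]
print(minecraft_sales)
print(wii_row["Publisher(s)"])
```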


## Convert Data Frame into Series

The series is a very important data structure in Pandas. It is the building block of data frames: each column in a data frame is a series.

Select a column from a data frame with single brackets to get it as a series, as follows. Notice that the index of the data frame is carried over to the series. The index itself is not a column, so it cannot be selected this way.

In [5]:
best_selling_games_df = pd.read_csv("best-selling-video-games.csv")
best_selling_games_titles = best_selling_games_df["Title"]
best_selling_games_titles

0                                            Minecraft
1                                   Grand Theft Auto V
2                                          Tetris (EA)
3                                           Wii Sports
4                                  PUBG: Battlegrounds
5                                Mario Kart 8 / Deluxe
6                                    Super Mario Bros.
7                                Red Dead Redemption 2
8                  Pokémon Red / Green / Blue / Yellow
9                                             Terraria
10                                      Wii Fit / Plus
11                                       Tetris (1989)
12                                             Pac-Man
13                       Animal Crossing: New Horizons
14                                    Human: Fall Flat
15    The Witcher 3 / Hearts of Stone / Blood and Wine
16                                      Mario Kart Wii
17                                   Wii Sports Resort
18        
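
Although bracket selection does not reach the index, the index can still be turned into a series by other means. The sketch below is a minimal illustration on an in-memory excerpt (all variable names are hypothetical). It shows that selecting the index by name raises `KeyError`, and then uses `Index.to_series()` and `reset_index()` as two possible ways to get the titles as a series.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real CSV file.
csv_text = """Title,Sales
Minecraft,238000000
Grand Theft Auto V,175000000
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Title")

# "Title" is now the index, so column selection cannot find it.
try:
    df["Title"]
    title_lookup_failed = False
except KeyError:
    title_lookup_failed = True
print(title_lookup_failed)

# Option 1: convert the index object itself into a series.
titles_from_index = df.index.to_series()

# Option 2: move the index back into a regular column, then select it.
titles_from_column = df.reset_index()["Title"]

print(list(titles_from_index))
print(list(titles_from_column))
```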

Don't confuse the above with selecting a sub data frame from a data frame, for which examples are given below. Notice that when `[[]]` is used, the result is a data frame, regardless of the number of columns selected.

In [6]:
best_selling_games_df = pd.read_csv("best-selling-video-games.csv")
sub_games_df = best_selling_games_df[["Title", "Sales", "Publisher(s)"]]
sub_games_df

,Title,Sales,Publisher(s)
0,Minecraft,238000000,Xbox Game Studios
1,Grand Theft Auto V,175000000,Rockstar Games
2,Tetris (EA),100000000,Electronic Arts
3,Wii Sports,82900000,Nintendo
4,PUBG: Battlegrounds,75000000,PUBG Corporation
5,Mario Kart 8 / Deluxe,60460000,Nintendo
6,Super Mario Bros.,58000000,Nintendo
7,Red Dead Redemption 2,50000000,Rockstar Games
8,Pokémon Red / Green / Blue / Yellow,47520000,Nintendo
9,Terraria,44500000,Re-Logic / 505 Games


In [7]:
sub_games_df = best_selling_games_df[["Title"]]
sub_games_df

,Title
0,Minecraft
1,Grand Theft Auto V
2,Tetris (EA)
3,Wii Sports
4,PUBG: Battlegrounds
5,Mario Kart 8 / Deluxe
6,Super Mario Bros.
7,Red Dead Redemption 2
8,Pokémon Red / Green / Blue / Yellow
9,Terraria
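
The difference between the two selection styles is easiest to see in the types and shapes of the results. The sketch below is a minimal check on a small hand-built data frame (`df`, `as_series`, and `as_frame` are hypothetical names for this example).

```python
import pandas as pd

# Small hand-built frame standing in for the imported data.
df = pd.DataFrame(
    {
        "Title": ["Minecraft", "Grand Theft Auto V"],
        "Sales": [238000000, 175000000],
    }
)

as_series = df["Title"]   # single brackets: one-dimensional Series
as_frame = df[["Title"]]  # double brackets: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```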


To convert a single-column data frame into a series, consider using `squeeze`:

In [8]:
best_selling_games_titles = sub_games_df.squeeze('columns')
best_selling_games_titles

0                                            Minecraft
1                                   Grand Theft Auto V
2                                          Tetris (EA)
3                                           Wii Sports
4                                  PUBG: Battlegrounds
5                                Mario Kart 8 / Deluxe
6                                    Super Mario Bros.
7                                Red Dead Redemption 2
8                  Pokémon Red / Green / Blue / Yellow
9                                             Terraria
10                                      Wii Fit / Plus
11                                       Tetris (1989)
12                                             Pac-Man
13                       Animal Crossing: New Horizons
14                                    Human: Fall Flat
15    The Witcher 3 / Hearts of Stone / Blood and Wine
16                                      Mario Kart Wii
17                                   Wii Sports Resort
18        
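
Passing the axis to `squeeze` matters when the data frame might have only one row. The sketch below is a minimal illustration on a hand-built one-row, one-column frame (`one_cell_df`, `as_series`, and `as_scalar` are hypothetical names): with `"columns"` the result stays a series, while a bare `squeeze()` collapses both axes to a scalar.

```python
import pandas as pd

# A frame with exactly one row and one column.
one_cell_df = pd.DataFrame({"Title": ["Minecraft"]})

# Squeezing only the column axis keeps a Series of length 1.
as_series = one_cell_df.squeeze("columns")

# Squeezing every length-1 axis reduces the frame to a plain value.
as_scalar = one_cell_df.squeeze()

print(type(as_series).__name__)
print(as_scalar)
```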