## How to Create pandas DataFrames

*[Coding along with the Udemy course [Cryptocurrency Algorithmic Trading with Python and Binance](https://www.udemy.com/course/cryptocurrency-algorithmic-trading-with-python-and-binance/) by Alexander Hagman; [How to Create Pandas DataFrames: A Hands-On Guide](https://blog.udemy.com/how-to-create-pandas-dataframes-a-hands-on-guide/)]*

### Creating and loading data into Pandas DataFrames

The first steps in a data-driven project are:

- defining the data that is needed for the project
- finding and identifying the right data source(s)
- loading the data from the data source into Pandas DataFrames
- cleaning, processing and manipulating the data

```python
import pandas as pd
```

#### 1. Having the columns in lists ("dictionary scenario")

```python
names = ["Lionel Messi", "Cristiano Ronaldo", "Neymar Junior", "Kylian Mbappe", "Manuel Neuer"]
country = ["Argentina", "Portugal", "Brazil", "France", "Germany"]
club = ["FC Barcelona", "Juventus FC", "Paris SG", "Paris SG", "FC Bayern"]
wc = [False, False, False, True, True]
height = [1.70, 1.87, 1.75, 1.78, 1.93]
goals = [51, 28, 23, 39, 0]
```

```python
# creating a dictionary with all columns
data = {"Country": country,
        "Club_2019": club,
        "WC": wc,
        "Height_m": height,
        "Goals_2019": goals}
data
```

```
{'Country': ['Argentina', 'Portugal', 'Brazil', 'France', 'Germany'],
 'Club_2019': ['FC Barcelona',
  'Juventus FC',
  'Paris SG',
  'Paris SG',
  'FC Bayern'],
 'WC': [False, False, False, True, True],
 'Height_m': [1.7, 1.87, 1.75, 1.78, 1.93],
 'Goals_2019': [51, 28, 23, 39, 0]}
```

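```python
# creating a DataFrame object; the player names become the row index
df = pd.DataFrame(data=data, index=names)
df
```

|                   | Country   | Club_2019    | WC    | Height_m | Goals_2019 |
|-------------------|-----------|--------------|-------|----------|------------|
| Lionel Messi      | Argentina | FC Barcelona | False | 1.7      | 51         |
| Cristiano Ronaldo | Portugal  | Juventus FC  | False | 1.87     | 28         |
| Neymar Junior     | Brazil    | Paris SG     | False | 1.75     | 23         |
| Kylian Mbappe     | France    | Paris SG     | True  | 1.78     | 39         |
| Manuel Neuer      | Germany   | FC Bayern    | True  | 1.93     | 0          |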
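In the dictionary scenario every list becomes one column, so all lists must have the same length as the index. The sketch below shows what happens otherwise; the shortened list `goals_short` is a made-up example, not part of the original data.

```python
import pandas as pd

names = ["Lionel Messi", "Cristiano Ronaldo", "Neymar Junior"]
country = ["Argentina", "Portugal", "Brazil"]
goals_short = [51, 28]  # hypothetical: one value is missing

raised = False
try:
    pd.DataFrame({"Country": country, "Goals_2019": goals_short}, index=names)
except ValueError as err:
    # pandas refuses to build the DataFrame: each column list must match the index length
    raised = True
    print("ValueError:", err)
```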


```python
# assigning a name for the index
df.index.name = "Name"
df
```

| Name              | Country   | Club_2019    | WC    | Height_m | Goals_2019 |
|-------------------|-----------|--------------|-------|----------|------------|
| Lionel Messi      | Argentina | FC Barcelona | False | 1.7      | 51         |
| Cristiano Ronaldo | Portugal  | Juventus FC  | False | 1.87     | 28         |
| Neymar Junior     | Brazil    | Paris SG     | False | 1.75     | 23         |
| Kylian Mbappe     | France    | Paris SG     | True  | 1.78     | 39         |
| Manuel Neuer      | Germany   | FC Bayern    | True  | 1.93     | 0          |
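As an alternative to setting `df.index.name` afterwards, the index name can be attached while the DataFrame is built by wrapping the names in a `pd.Index`. A minimal sketch, using a shortened version of the data above:

```python
import pandas as pd

names = ["Lionel Messi", "Cristiano Ronaldo", "Neymar Junior", "Kylian Mbappe", "Manuel Neuer"]
data = {
    "Country": ["Argentina", "Portugal", "Brazil", "France", "Germany"],
    "Goals_2019": [51, 28, 23, 39, 0],
}

# pd.Index carries its own name, so the DataFrame's index is named "Name" from the start
df = pd.DataFrame(data, index=pd.Index(names, name="Name"))
print(df)
```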


#### 2. Having the rows in lists ("nested lists scenario")

```python
messi = ["Lionel Messi", "Argentina", "FC Barcelona", False, 1.7, 51]
ronaldo = ["Cristiano Ronaldo", "Portugal", "Juventus FC", False, 1.87, 28]
neymar = ["Neymar Junior", "Brazil", "Paris SG", False, 1.75, 23]
mbappe = ["Kylian Mbappe", "France", "Paris SG", True, 1.78, 39]
neuer = ["Manuel Neuer", "Germany", "FC Bayern", True, 1.93, 0]
```

```python
# the desired column headers, in the same order as the values in each row
headers = ["Name", "Country", "Club_2019", "WC", "Height_m", "Goals_2019"]
```

```python
# creating a list of lists
data = [messi, ronaldo, neymar, mbappe, neuer]
data
```

```
[['Lionel Messi', 'Argentina', 'FC Barcelona', False, 1.7, 51],
 ['Cristiano Ronaldo', 'Portugal', 'Juventus FC', False, 1.87, 28],
 ['Neymar Junior', 'Brazil', 'Paris SG', False, 1.75, 23],
 ['Kylian Mbappe', 'France', 'Paris SG', True, 1.78, 39],
 ['Manuel Neuer', 'Germany', 'FC Bayern', True, 1.93, 0]]
```

```python
# creating the DataFrame object players with pd.DataFrame():
# passing the nested list "data" to the parameter data and
# using the list "headers" as the column headers with columns=headers
players = pd.DataFrame(data=data, columns=headers)
players
```

|   | Name              | Country   | Club_2019    | WC    | Height_m | Goals_2019 |
|---|-------------------|-----------|--------------|-------|----------|------------|
| 0 | Lionel Messi      | Argentina | FC Barcelona | False | 1.7      | 51         |
| 1 | Cristiano Ronaldo | Portugal  | Juventus FC  | False | 1.87     | 28         |
| 2 | Neymar Junior     | Brazil    | Paris SG     | False | 1.75     | 23         |
| 3 | Kylian Mbappe     | France    | Paris SG     | True  | 1.78     | 39         |
| 4 | Manuel Neuer      | Germany   | FC Bayern    | True  | 1.93     | 0          |


```python
players.set_index("Name")
```

| Name              | Country   | Club_2019    | WC    | Height_m | Goals_2019 |
|-------------------|-----------|--------------|-------|----------|------------|
| Lionel Messi      | Argentina | FC Barcelona | False | 1.7      | 51         |
| Cristiano Ronaldo | Portugal  | Juventus FC  | False | 1.87     | 28         |
| Neymar Junior     | Brazil    | Paris SG     | False | 1.75     | 23         |
| Kylian Mbappe     | France    | Paris SG     | True  | 1.78     | 39         |
| Manuel Neuer      | Germany   | FC Bayern    | True  | 1.93     | 0          |
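Note that `set_index()` returns a new DataFrame and leaves `players` unchanged; to keep the result it has to be assigned. The sketch below shows this, plus `pd.DataFrame.from_records()` as an alternative that picks the index column while building the DataFrame. The variable names `players_by_name` and `players2` are illustrative only.

```python
import pandas as pd

headers = ["Name", "Country", "Club_2019", "WC", "Height_m", "Goals_2019"]
data = [
    ["Lionel Messi", "Argentina", "FC Barcelona", False, 1.7, 51],
    ["Manuel Neuer", "Germany", "FC Bayern", True, 1.93, 0],
]

players = pd.DataFrame(data=data, columns=headers)

# set_index() returns a new DataFrame; assign the result to keep it
players_by_name = players.set_index("Name")
print(players.index.tolist())          # [0, 1]  (original is unchanged)
print(players_by_name.index.tolist())  # ['Lionel Messi', 'Manuel Neuer']

# from_records(): index="Name" turns that field into the row index in one step
players2 = pd.DataFrame.from_records(data, columns=headers, index="Name")
print(players2)
```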


#### 3. Starting with dictionary data that has the wrong organization

```python
# messy data
data = {"Lionel Messi": ["Argentina", "FC Barcelona", False, 1.7, 51],
        "Cristiano Ronaldo": ["Portugal", "Juventus FC", False, 1.87, 28],
        "Neymar Junior": ["Brazil", "Paris SG", False, 1.75, 23],
        "Kylian Mbappe": ["France", "Paris SG", True, 1.78, 39],
        "Manuel Neuer": ["Germany", "FC Bayern", True, 1.93, 0]}
```

```python
# reorganize the dictionary data and create a nested list (one list per player);
# [key] + value builds a new list, so the lists inside data are not modified
nested_list = []
for key, value in data.items():
    nested_list.append([key] + value)
nested_list
```

```
[['Lionel Messi', 'Argentina', 'FC Barcelona', False, 1.7, 51],
 ['Cristiano Ronaldo', 'Portugal', 'Juventus FC', False, 1.87, 28],
 ['Neymar Junior', 'Brazil', 'Paris SG', False, 1.75, 23],
 ['Kylian Mbappe', 'France', 'Paris SG', True, 1.78, 39],
 ['Manuel Neuer', 'Germany', 'FC Bayern', True, 1.93, 0]]
```

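```python
# continue with scenario 2
df = pd.DataFrame(data=nested_list, columns=headers)
# set_index() returns a new DataFrame, so assign the result back to df
df = df.set_index("Name")
df
```

| Name              | Country   | Club_2019    | WC    | Height_m | Goals_2019 |
|-------------------|-----------|--------------|-------|----------|------------|
| Lionel Messi      | Argentina | FC Barcelona | False | 1.7      | 51         |
| Cristiano Ronaldo | Portugal  | Juventus FC  | False | 1.87     | 28         |
| Neymar Junior     | Brazil    | Paris SG     | False | 1.75     | 23         |
| Kylian Mbappe     | France    | Paris SG     | True  | 1.78     | 39         |
| Manuel Neuer      | Germany   | FC Bayern    | True  | 1.93     | 0          |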
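Instead of rebuilding a nested list by hand, this dictionary layout (one key per row) can be handed to `pd.DataFrame.from_dict()` with `orient="index"`, which uses the keys as row labels. A minimal sketch with a subset of the players; `columns=` is accepted only together with `orient="index"`.

```python
import pandas as pd

# the "messy" layout: each key is a player, each value holds that player's row
data = {
    "Lionel Messi": ["Argentina", "FC Barcelona", False, 1.7, 51],
    "Cristiano Ronaldo": ["Portugal", "Juventus FC", False, 1.87, 28],
    "Manuel Neuer": ["Germany", "FC Bayern", True, 1.93, 0],
}
headers = ["Name", "Country", "Club_2019", "WC", "Height_m", "Goals_2019"]

# orient="index": dictionary keys become the row index, values become the rows
df = pd.DataFrame.from_dict(data, orient="index", columns=headers[1:])
df.index.name = "Name"
print(df)
```

The lists in `data` are only read, not modified, so the dictionary can be reused afterwards.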


### How to load datasets from local files into Pandas DataFrames 

Datasets can be loaded from local files on your computer into Pandas with functions from the `pd.read_X()` family:

- Reading CSV files with `pd.read_csv()`
- Reading Excel files with `pd.read_excel()`
- Reading JSON files with `pd.read_json()`
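A minimal sketch of `pd.read_csv()`: to keep it self-contained, the CSV text is held in an `io.StringIO` object instead of a real file. With a file on disk you would pass its path instead, e.g. `pd.read_csv("players.csv", index_col="Name")`, where `players.csv` is a hypothetical file name. `index_col` selects the column that becomes the row index, similar to `set_index("Name")` above.

```python
import io

import pandas as pd

# stand-in for the contents of a local CSV file
csv_text = """Name,Country,Club_2019,WC,Height_m,Goals_2019
Lionel Messi,Argentina,FC Barcelona,False,1.7,51
Cristiano Ronaldo,Portugal,Juventus FC,False,1.87,28
Manuel Neuer,Germany,FC Bayern,True,1.93,0
"""

# read_csv accepts a path or any file-like object; index_col picks the row labels
players = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(players)
print(players.dtypes)
```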