# Dictionary to DataFrame

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow, you will work with vehicle data from different countries. Each observation (row) corresponds to a country, and the columns give information such as the number of vehicles per capita and whether people drive on the left or the right.

Three lists are defined in the script:

- `names`, containing the country names for which data is available.
- `dr`, a list of booleans that tells whether people drive on the right (`True`) or on the left (`False`) in the corresponding country.
- `cpc`, the number of motor vehicles per 1000 people in the corresponding country.

These lists are combined into a dictionary in which each key is a column label and each value is a list containing the column elements.
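
Because each list becomes one column, all lists in the dictionary must have the same length. A minimal sketch, using small hypothetical lists rather than the exercise data, showing that `pd.DataFrame` raises a `ValueError` when the lengths differ:

```python
import pandas as pd

# Hypothetical example data: both lists have two elements
ok = pd.DataFrame({"country": ["Japan", "India"], "cars_per_cap": [588, 18]})
print(ok.shape)  # (2, 2): two rows, two columns

# Mismatched lengths: pandas refuses to build the frame
raised = False
try:
    pd.DataFrame({"country": ["Japan", "India"], "cars_per_cap": [588]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```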

In [2]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

In [1]:
# Import pandas as pd
import pandas as pd

In [3]:
# Create dictionary my_dict with three key:value pairs: my_dict

my_dict = {
    'country': names,
    'drives_right': dr,
    'cars_per_cap': cpc
}

In [4]:
# Build a DataFrame cars from my_dict: cars

cars = pd.DataFrame(my_dict)

In [6]:
# Print cars

print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45


In [10]:
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

In [11]:
print(cars)

           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
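
Instead of assigning `cars.index` after construction, the row labels can also be passed to the `pd.DataFrame` constructor through its `index` parameter. A minimal sketch that builds the same labelled frame in one step (the lists are repeated so the block runs on its own):

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Pass the row labels at construction time instead of setting .index afterwards
cars = pd.DataFrame(
    {'country': names, 'drives_right': dr, 'cars_per_cap': cpc},
    index=row_labels,
)
print(cars)
```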


**_Importing CSV File_**

In [16]:
cars2 = pd.read_csv('D:/documents/xlsx/cars_pandas.csv', index_col=0)

In [17]:

print(cars2)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
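
The path above points to a local file, so that cell only runs on the author's machine. The sketch below reproduces the idea with an in-memory CSV built with `io.StringIO`; the CSV text is an assumption modelled on the printed output, not the real file. With `index_col=0`, the first (unnamed) column becomes the row labels; without it, those labels are read as an ordinary column named `Unnamed: 0`.

```python
import io
import pandas as pd

# Hypothetical CSV text mirroring the layout of cars_pandas.csv:
# the first column has an empty header and holds the row labels
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""

# index_col=0: use the first column as the row index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index)

# Without index_col, the labels end up in a regular column called 'Unnamed: 0'
without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())
```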


In [20]:
# Select only the first column of the DataFrame
# (double square brackets keep the result a DataFrame)
cars2[['cars_per_cap']]

| | cars_per_cap |
|---|---|
| US | 809 |
| AUS | 731 |
| JAP | 588 |
| IN | 18 |
| RU | 200 |
| MOR | 70 |
| EG | 45 |
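
Single and double brackets return different types: a single column name gives a pandas Series, while a list of column names gives a DataFrame. A minimal sketch on a small stand-in frame (a subset of the exercise data, repeated so the block is self-contained):

```python
import pandas as pd

cars2 = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588], 'country': ['United States', 'Australia', 'Japan']},
    index=['US', 'AUS', 'JAP'],
)

as_series = cars2['cars_per_cap']   # single brackets -> Series
as_frame = cars2[['cars_per_cap']]  # double brackets -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```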


In [21]:
# Select two columns of the cars DataFrame built earlier

cars[['country', 'drives_right']]

| | country | drives_right |
|---|---|---|
| US | United States | True |
| AUS | Australia | False |
| JPN | Japan | False |
| IN | India | False |
| RU | Russia | True |
| MOR | Morocco | True |
| EG | Egypt | True |


In [22]:
type(cars)

pandas.core.frame.DataFrame

In [27]:
# Select the first three rows of the entire DataFrame (slicing by position)
cars[0:3]

| | country | drives_right | cars_per_cap |
|---|---|---|---|
| US | United States | True | 809 |
| AUS | Australia | False | 731 |
| JPN | Japan | False | 588 |


**_Row Access with loc_**

In [30]:
# Accessing individual rows
cars.loc[["JPN", "IN", "EG"]]

| | country | drives_right | cars_per_cap |
|---|---|---|---|
| JPN | Japan | False | 588 |
| IN | India | False | 18 |
| EG | Egypt | True | 45 |


In [31]:
# Access selected rows and selected columns
cars.loc[["JPN", "IN", "EG"], ["country", "drives_right"]]

| | country | drives_right |
|---|---|---|
| JPN | Japan | False |
| IN | India | False |
| EG | Egypt | True |


In [33]:
# Access all rows of two selected columns using .loc
cars.loc[:, ["country", "drives_right"]]

| | country | drives_right |
|---|---|---|
| US | United States | True |
| AUS | Australia | False |
| JPN | Japan | False |
| IN | India | False |
| RU | Russia | True |
| MOR | Morocco | True |
| EG | Egypt | True |
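
Besides lists of labels, `.loc` also accepts label slices. Unlike ordinary Python slicing, a label slice includes its end point, and a single label (not wrapped in a list) returns that row as a Series. A short sketch that rebuilds a subset of `cars` so it runs on its own:

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India'],
     'drives_right': [True, False, False, False],
     'cars_per_cap': [809, 731, 588, 18]},
    index=['US', 'AUS', 'JPN', 'IN'],
)

# Label slice: the end label 'JPN' is included
print(cars.loc['US':'JPN'])

# Single label without a list -> one row as a Series
print(cars.loc['JPN'])

# One row label and one column label -> a single value
print(cars.loc['JPN', 'cars_per_cap'])  # 588
```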


**_Row Access with iloc_**

In [34]:
# Select rows by integer position
cars.iloc[[0, 1, 2]]

| | country | drives_right | cars_per_cap |
|---|---|---|---|
| US | United States | True | 809 |
| AUS | Australia | False | 731 |
| JPN | Japan | False | 588 |


In [36]:
# Rows are returned in the order the positions are listed
cars.iloc[[1, 4, 3]]

| | country | drives_right | cars_per_cap |
|---|---|---|---|
| AUS | Australia | False | 731 |
| RU | Russia | True | 200 |
| IN | India | False | 18 |


In [37]:
# The first list selects rows and the second selects columns, both by integer position
cars.iloc[[0, 1, 2], [0, 2]]

| | country | cars_per_cap |
|---|---|---|
| US | United States | 809 |
| AUS | Australia | 731 |
| JPN | Japan | 588 |


In [44]:
# Select the fourth, fifth and sixth rows (positions 3, 4 and 5; the slice end is excluded)
cars[3:6]

| | country | drives_right | cars_per_cap |
|---|---|---|---|
| IN | India | False | 18 |
| RU | Russia | True | 200 |
| MOR | Morocco | True | 70 |


In [45]:
# All rows, first and third columns (positions 0 and 2)
cars.iloc[:, [0, 2]]

| | country | cars_per_cap |
|---|---|---|
| US | United States | 809 |
| AUS | Australia | 731 |
| JPN | Japan | 588 |
| IN | India | 18 |
| RU | Russia | 200 |
| MOR | Morocco | 70 |
| EG | Egypt | 45 |
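
`.iloc` mirrors `.loc` but works with integer positions, and its slices follow the usual Python rule of excluding the end position. A minimal sketch on a small stand-in frame (a subset of `cars`, repeated so the block runs on its own):

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India'],
     'drives_right': [True, False, False, False],
     'cars_per_cap': [809, 731, 588, 18]},
    index=['US', 'AUS', 'JPN', 'IN'],
)

# Positional slice: rows at positions 0 and 1; position 2 is excluded
print(cars.iloc[0:2])

# Single position without a list -> one row as a Series
print(cars.iloc[2])

# Row position 2, column position 2 (cars_per_cap) -> a single value
print(cars.iloc[2, 2])  # 588
```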


### Loop Over a DataFrame (1)

One way to iterate over a Pandas DataFrame is the **iterrows()** method. In a for loop, it visits every observation in turn and, on each iteration, yields the row label together with the row contents:

In [None]:
for lab, row in cars.iterrows():
    print(lab)
    print(row)
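
The sketch below checks what `iterrows()` hands back on each iteration, using a hypothetical two-row stand-in for `cars`. Each row arrives as a pandas Series whose `.name` is the row label and whose index holds the column labels; because one Series must hold both text and numbers here, its dtype becomes `object`.

```python
import pandas as pd

# Hypothetical two-row stand-in for cars
cars = pd.DataFrame(
    {'country': ['Japan', 'India'], 'cars_per_cap': [588, 18]},
    index=['JPN', 'IN'],
)

labels = []
for lab, row in cars.iterrows():
    labels.append(lab)
    # row is a Series: .name is the row label, its index holds the column labels,
    # and mixing str and int values in one Series gives it the object dtype
    print(lab, type(row).__name__, list(row.index), row.dtype)
```

This loss of per-column types is one reason `iterrows()` is usually kept for small frames or quick inspection.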

In [12]:
# Print each row label together with its cars_per_cap value
for lab, row in cars.iterrows():
    print(lab + ": " + str(row['cars_per_cap']))

US: 809
AUS: 731
JPN: 588
IN: 18
RU: 200
MOR: 70
EG: 45
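
`itertuples()` is an alternative to `iterrows()` when only a few fields are needed: it yields one named tuple per row, with the row label in the `Index` field and each column available as an attribute. A sketch of the same printing loop on a small stand-in frame, assuming the column names are valid Python identifiers (they are here):

```python
import pandas as pd

# Hypothetical three-row stand-in for cars
cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan'],
     'cars_per_cap': [809, 731, 588]},
    index=['US', 'AUS', 'JPN'],
)

lines = []
for row in cars.itertuples():
    # row.Index is the row label; each column is an attribute of the named tuple
    lines.append(f"{row.Index}: {row.cars_per_cap}")

print("\n".join(lines))
```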


In [15]:
# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY"] = row["country"].upper()

# Print cars
print(cars)

           country  drives_right  cars_per_cap        COUNTRY
US   United States          True           809  UNITED STATES
AUS      Australia         False           731      AUSTRALIA
JPN          Japan         False           588          JAPAN
IN           India         False            18          INDIA
RU          Russia          True           200         RUSSIA
MOR        Morocco          True            70        MOROCCO
EG           Egypt          True            45          EGYPT


In [17]:
# Use .apply(str.upper)

cars["COUNTRY"] = cars["country"].apply(str.upper)
print(cars)

           country  drives_right  cars_per_cap        COUNTRY
US   United States          True           809  UNITED STATES
AUS      Australia         False           731      AUSTRALIA
JPN          Japan         False           588          JAPAN
IN           India         False            18          INDIA
RU          Russia          True           200         RUSSIA
MOR        Morocco          True            70        MOROCCO
EG           Egypt          True            45          EGYPT
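
Pandas also offers vectorized string methods through the `.str` accessor, so the upper-cased column can be built without `apply` or an explicit loop. A minimal sketch on a small stand-in frame:

```python
import pandas as pd

# Hypothetical three-row stand-in for cars
cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan']},
    index=['US', 'AUS', 'JPN'],
)

# .str.upper() upper-cases every string in the column at once
cars['COUNTRY'] = cars['country'].str.upper()
print(cars)
```

A practical difference is that `.str.upper()` leaves missing values (NaN) as NaN, whereas `apply(str.upper)` fails on them because NaN is a float, not a string.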
