# Dictionary to DataFrame (1)

Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.

Three lists are defined in the script:

* names, containing the country names for which data is available.
* dr, a list of booleans that tells whether people drive on the right (True) or on the left (False) in the corresponding country.
* cpc, the number of motor vehicles per 1000 people in the corresponding country.

Each dictionary key is a column label and each value is a list which contains the column elements.


* Import pandas as pd.
* Use the pre-defined lists to create a dictionary called my_dict. There should be three key-value pairs:
  * key 'country' and value names.
  * key 'drives_right' and value dr.
  * key 'cars_per_cap' and value cpc.
* Use pd.DataFrame() to turn your dict into a DataFrame called cars.
* Print out cars and see how beautiful it is.

In [1]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right': dr, 'cars_per_cap':cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45


Notice that the columns of cars can be of different types. This was not possible with 2D NumPy arrays!
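To check the column types directly, here is a minimal sketch that rebuilds the same DataFrame and prints its `dtypes` attribute, which reports one type per column. The exact name shown for the text column depends on the pandas version, so no expected output is listed.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

cars = pd.DataFrame({'country': names, 'drives_right': dr, 'cars_per_cap': cpc})

# One dtype per column: text, boolean and integer columns live side by side
print(cars.dtypes)
```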

**Dictionary to DataFrame (2)**

The Python code that solves the previous exercise is included in the script. Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6?

To solve this, a list row_labels has been created. You can use it to specify the row labels of the cars DataFrame. You do this by setting the index attribute of cars, which you can access as cars.index.


* Hit Run Code to see that, indeed, the row labels are not correctly set.
* Specify the row labels by setting cars.index equal to row_labels.
* Print out cars again and check if the row labels are correct this time.

In [11]:
import pandas as pd

# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars1 = pd.DataFrame(cars_dict)
print(cars1)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars1.index = row_labels

# Print cars again
print(cars1)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
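
As an alternative to assigning the index afterwards, the labels can be supplied when the DataFrame is built. The sketch below assumes the same lists as the exercise; the name `cars_labeled` is only illustrative. `pd.DataFrame()` accepts an `index` argument, which needs one label per row.

```python
import pandas as pd

cars_dict = {
    'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    'drives_right': [True, False, False, False, True, True, True],
    'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
}
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Pass the labels at construction time instead of setting cars.index later
cars_labeled = pd.DataFrame(cars_dict, index=row_labels)
print(cars_labeled.index.tolist())
```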


**CSV to DataFrame (1)**

Putting data in a dictionary and then building a DataFrame works, but it's not very efficient. What if you're dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for "comma-separated values".

To import CSV data into Python as a Pandas DataFrame you can use read_csv().

Let's explore this function with the same cars data from the previous exercises. This time, however, the data is available in a CSV file, named cars.csv. It is available in your current working directory, so the path to the file is simply 'cars.csv'.


* To import CSV files you still need the pandas package: import it as pd.
* Use pd.read_csv() to import cars.csv data as a DataFrame. Store this DataFrame as cars.
* Print out cars. Does everything look OK?



In [6]:
# Import pandas as pd
import pandas as pd

# Import the CSV data (here the Kaggle used_cars_data.csv file): cars
cars = pd.read_csv("/kaggle/input/cars-dataset/used_cars_data.csv")

# Print out cars
print(cars)

      S.No.                                               Name    Location  \
0         0                             Maruti Wagon R LXI CNG      Mumbai   
1         1                   Hyundai Creta 1.6 CRDi SX Option        Pune   
2         2                                       Honda Jazz V     Chennai   
3         3                                  Maruti Ertiga VDI     Chennai   
4         4                    Audi A4 New 2.0 TDI Multitronic  Coimbatore   
...     ...                                                ...         ...   
7248   7248                  Volkswagen Vento Diesel Trendline   Hyderabad   
7249   7249                             Volkswagen Polo GT TSI      Mumbai   
7250   7250                             Nissan Micra Diesel XV     Kolkata   
7251   7251                             Volkswagen Polo GT TSI        Pune   
7252   7252  Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan...       Kochi   

      Year  Kilometers_Driven Fuel_Type Transmission Owner_Type

**CSV to DataFrame (2)**

Your read_csv() call to import the CSV data didn't generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name.

Remember index_col, an argument of read_csv(), that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!

Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import?


* Run the code with Run Code and confirm that the first column should actually be used as row labels.
* Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels.
* Has the printout of cars improved now?
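
The effect of index_col can be tried without any file on disk. The sketch below is an assumption-laden stand-in: `csv_text` is a made-up three-row excerpt, not the real cars.csv, and it is read through `io.StringIO` so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for cars.csv; the first header field is empty
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# Without index_col, the label column is read as an ordinary data column
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# With index_col=0, the first column becomes the row labels
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(labeled.index.tolist())
```

In the first call pandas has to invent a name for the unnamed column; in the second call that column no longer appears among the data columns at all.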

In [8]:
# Import pandas as pd
import pandas as pd

# Fix import by including index_col
cars = pd.read_csv('/kaggle/input/cars-dataset/used_cars_data.csv', index_col = 0)

# Print out cars
print(cars)

                                                    Name    Location  Year  \
S.No.                                                                        
0                                 Maruti Wagon R LXI CNG      Mumbai  2010   
1                       Hyundai Creta 1.6 CRDi SX Option        Pune  2015   
2                                           Honda Jazz V     Chennai  2011   
3                                      Maruti Ertiga VDI     Chennai  2012   
4                        Audi A4 New 2.0 TDI Multitronic  Coimbatore  2013   
...                                                  ...         ...   ...   
7248                   Volkswagen Vento Diesel Trendline   Hyderabad  2011   
7249                              Volkswagen Polo GT TSI      Mumbai  2015   
7250                              Nissan Micra Diesel XV     Kolkata  2012   
7251                              Volkswagen Polo GT TSI        Pune  2013   
7252   Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan...       K

**Square Brackets (1)**

In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful way, is to use square brackets.

In the sample code, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

cars['cars_per_cap']
cars[['cars_per_cap']]

The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
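
The difference is easy to verify by checking the type and shape of each result. This is a small sketch on a three-row DataFrame built inline for illustration; the names `as_series` and `as_frame` are not part of the exercise.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan'],
     'cars_per_cap': [809, 731, 588]},
    index=['US', 'AUS', 'JPN'],
)

as_series = cars['cars_per_cap']    # single brackets: one-dimensional Series
as_frame = cars[['cars_per_cap']]   # double brackets: DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```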


* Use single square brackets to print out the country column of cars as a Pandas Series.
* Use double square brackets to print out the country column of cars as a Pandas DataFrame.
* Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.

In [13]:
# # Import cars data
# import pandas as pd
# cars = pd.read_csv('cars.csv', index_col = 0)

# Print out the Location column as a Pandas Series
print(cars['Location'])

# Print out the Location column as a Pandas DataFrame
print(cars[['Location']])

# Print out a DataFrame with the Location and Mileage columns
print(cars[['Location', 'Mileage']])

S.No.
0           Mumbai
1             Pune
2          Chennai
3          Chennai
4       Coimbatore
           ...    
7248     Hyderabad
7249        Mumbai
7250       Kolkata
7251          Pune
7252         Kochi
Name: Location, Length: 7253, dtype: object
         Location
S.No.            
0          Mumbai
1            Pune
2         Chennai
3         Chennai
4      Coimbatore
...           ...
7248    Hyderabad
7249       Mumbai
7250      Kolkata
7251         Pune
7252        Kochi

[7253 rows x 1 columns]
         Location     Mileage
S.No.                        
0          Mumbai  26.6 km/kg
1            Pune  19.67 kmpl
2         Chennai   18.2 kmpl
3         Chennai  20.77 kmpl
4      Coimbatore   15.2 kmpl
...           ...         ...
7248    Hyderabad  20.54 kmpl
7249       Mumbai  17.21 kmpl
7250      Kolkata  23.08 kmpl
7251         Pune   17.2 kmpl
7252        Kochi   10.0 kmpl

[7253 rows x 2 columns]


**Square Brackets (2)**

Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:

cars[0:5]

The result is another DataFrame containing only the rows you specified.

Pay attention: You can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer indexes of the rows here, not the row labels!
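
Both points can be seen on a small labelled DataFrame. The sketch below uses a five-row excerpt built inline for illustration: the slice counts positions rather than labels, and a single integer inside square brackets is looked up as a column label, which fails here.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia'],
     'cars_per_cap': [809, 731, 588, 18, 200]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU'],
)

# The slice uses integer positions: start included, stop excluded
subset = cars[1:3]
print(subset.index.tolist())

# A single integer is treated as a column label, not as a row position
try:
    cars[1]
except KeyError:
    print('cars[1] looks for a column named 1 and raises KeyError')
```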


* Select the first 3 observations from cars and print them out.
* Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.

In [15]:
# Print out first 3 observations
print(cars[0:3])

# Print out fourth, fifth and sixth observation
print(cars[3:6])

                                   Name Location  Year  Kilometers_Driven  \
S.No.                                                                       
0                Maruti Wagon R LXI CNG   Mumbai  2010              72000   
1      Hyundai Creta 1.6 CRDi SX Option     Pune  2015              41000   
2                          Honda Jazz V  Chennai  2011              46000   

      Fuel_Type Transmission Owner_Type     Mileage   Engine      Power  \
S.No.                                                                     
0           CNG       Manual      First  26.6 km/kg   998 CC  58.16 bhp   
1        Diesel       Manual      First  19.67 kmpl  1582 CC  126.2 bhp   
2        Petrol       Manual      First   18.2 kmpl  1199 CC   88.7 bhp   

       Seats  New_Price  Price  
S.No.                           
0        5.0        NaN   1.75  
1        5.0        NaN  12.50  
2        5.0  8.61 Lakh   4.50  
                                  Name    Location  Year  Kilometers_Driv

We can get interesting information, but using square brackets to do indexing is rather limited. Experiment with more advanced techniques in the following exercises.

**loc and iloc (1)**

With loc and iloc you can do practically any data selection operation on DataFrames you can think of. loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise.

Try out the following commands in the IPython Shell to experiment with loc and iloc to select observations. Each pair of commands here gives the same result.

cars.loc['RU']
cars.iloc[4]

cars.loc[['RU']]
cars.iloc[[4]]

cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]

As before, code is included that imports the cars data as a Pandas DataFrame.
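
The pairing between labels and integer positions can be checked programmatically. This is a sketch on an inline copy of the labelled cars data, reduced to two columns for brevity; `Index.get_loc()` returns the position of a label and `equals()` compares two pandas objects.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
     'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

# Position of the row labelled 'RU'
ru_pos = cars.index.get_loc('RU')
print(ru_pos)

# A single label or position gives a Series
print(cars.loc['RU'].equals(cars.iloc[ru_pos]))

# A list of labels or positions gives a DataFrame, in the order requested
print(cars.loc[['RU', 'AUS']].equals(cars.iloc[[4, 1]]))
```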


* Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JPN, the index is 2. Make sure to print the resulting Series.
* Use loc or iloc to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting cars in the IPython Shell. Make sure to print the resulting DataFrame.

In [16]:
print(cars1)

# Print out observation for Japan
print(cars1.loc['JPN'])
# cars.iloc[2]

# Print out observations for Australia and Egypt
# print(cars)
# cars.loc['AUS']
print(cars1.loc[['AUS', 'EG']])

           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
country         Japan
drives_right    False
cars_per_cap      588
Name: JPN, dtype: object
       country  drives_right  cars_per_cap
AUS  Australia         False           731
EG       Egypt          True            45


In [20]:
print(cars.iloc[[2, 10]])

                   Name Location  Year  Kilometers_Driven Fuel_Type  \
S.No.                                                                 
2          Honda Jazz V  Chennai  2011              46000    Petrol   
10     Maruti Ciaz Zeta    Kochi  2018              25692    Petrol   

      Transmission Owner_Type     Mileage   Engine       Power  Seats  \
S.No.                                                                   
2           Manual      First   18.2 kmpl  1199 CC    88.7 bhp    5.0   
10          Manual      First  21.56 kmpl  1462 CC  103.25 bhp    5.0   

        New_Price  Price  
S.No.                     
2       8.61 Lakh   4.50  
10     10.65 Lakh   9.95  


**loc and iloc (2)**

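loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell. Again, paired commands produce the same result, provided that cars_per_cap is the first column and country the second; with a different column order, as in cars1 below, the integer positions passed to iloc have to change.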

cars.loc['IN', 'cars_per_cap']
cars.iloc[3, 0]

cars.loc[['IN', 'RU'], 'cars_per_cap']
cars.iloc[[3, 4], 0]

cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]

* Print out the drives_right value of the row corresponding to Morocco (its row label is MOR).
* Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right.
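
Because iloc depends on the order of the columns, it can help to look the positions up rather than hard-code them. The sketch below assumes the column order produced by the dictionary used earlier (country, drives_right, cars_per_cap); `row_pos` and `col_pos` are illustrative names.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
     'drives_right': [True, False, False, False, True, True, True],
     'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG'],
)

# Translate labels into integer positions
row_pos = cars.index.get_loc('IN')
col_pos = cars.columns.get_loc('cars_per_cap')
print(row_pos, col_pos)

# The same cell, selected by label and by position
by_label = cars.loc['IN', 'cars_per_cap']
by_position = cars.iloc[row_pos, col_pos]
print(by_label, by_position)

# The same sub-DataFrame, selected by label and by position
sub_label = cars.loc[['RU', 'MOR'], ['country', 'drives_right']]
sub_position = cars.iloc[[4, 5], [0, 1]]
print(sub_label.equals(sub_position))
```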

In [22]:
print(cars1.loc['IN', 'cars_per_cap'])
print(cars1.iloc[3, 0])

18
India


In [29]:


# Print out drives_right value of Morocco
print(cars1.loc['MOR', 'drives_right'])
print(cars1.iloc[5, 1])

# Print sub-DataFrame
print(cars1.loc[['RU', 'MOR'], ['country', 'drives_right']])

True
True
     country  drives_right
RU    Russia          True
MOR  Morocco          True


In [31]:

# Print the same sub-DataFrame by integer position
print(cars1.iloc[[4, 5], [0, 1]])

# Label-based equivalent:
# print(cars1.loc[['RU', 'MOR'], ['country', 'drives_right']])

     country  drives_right
RU    Russia          True
MOR  Morocco          True


Great work! .loc[] and .iloc[] are excellent tools for selecting DataFrame values by label and index. In the next exercise, you'll select entire columns using them!

In [33]:
# Print out drives_right column as Series
print(cars1.loc[:, 'drives_right'])  # This returns a Series

# Print out cars_per_cap (integer position 2) as DataFrame
print(cars1.iloc[:, [2]])  # This returns a DataFrame

# Print out drives_right (integer position 1) as DataFrame
print(cars1.iloc[:, [1]])

# Print out cars_per_cap and drives_right as DataFrame
print(cars1.loc[:, ['cars_per_cap', 'drives_right']])  # This returns a DataFrame

US      True
AUS    False
JPN    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
     cars_per_cap
US            809
AUS           731
JPN           588
IN             18
RU            200
MOR            70
EG             45
     drives_right
US           True
AUS         False
JPN         False
IN          False
RU           True
MOR          True
EG           True
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JPN           588         False
IN             18         False
RU            200          True
MOR            70          True
EG             45          True


In [35]:
print(cars)

                                                    Name    Location  Year  \
S.No.                                                                        
0                                 Maruti Wagon R LXI CNG      Mumbai  2010   
1                       Hyundai Creta 1.6 CRDi SX Option        Pune  2015   
2                                           Honda Jazz V     Chennai  2011   
3                                      Maruti Ertiga VDI     Chennai  2012   
4                        Audi A4 New 2.0 TDI Multitronic  Coimbatore  2013   
...                                                  ...         ...   ...   
7248                   Volkswagen Vento Diesel Trendline   Hyderabad  2011   
7249                              Volkswagen Polo GT TSI      Mumbai  2015   
7250                              Nissan Micra Diesel XV     Kolkata  2012   
7251                              Volkswagen Polo GT TSI        Pune  2013   
7252   Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan...       K

In [39]:
print(cars.iloc[:, 0])  # this returns a series
print(cars.iloc[:, [0]]) # this returns a df

S.No.
0                                  Maruti Wagon R LXI CNG
1                        Hyundai Creta 1.6 CRDi SX Option
2                                            Honda Jazz V
3                                       Maruti Ertiga VDI
4                         Audi A4 New 2.0 TDI Multitronic
                              ...                        
7248                    Volkswagen Vento Diesel Trendline
7249                               Volkswagen Polo GT TSI
7250                               Nissan Micra Diesel XV
7251                               Volkswagen Polo GT TSI
7252    Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan...
Name: Name, Length: 7253, dtype: object
                                                    Name
S.No.                                                   
0                                 Maruti Wagon R LXI CNG
1                       Hyundai Creta 1.6 CRDi SX Option
2                                           Honda Jazz V
3                              

In [44]:
print(cars.iloc[10,:])
print(cars.iloc[[10],:])

Name                 Maruti Ciaz Zeta
Location                        Kochi
Year                             2018
Kilometers_Driven               25692
Fuel_Type                      Petrol
Transmission                   Manual
Owner_Type                      First
Mileage                    21.56 kmpl
Engine                        1462 CC
Power                      103.25 bhp
Seats                             5.0
New_Price                  10.65 Lakh
Price                            9.95
Name: 10, dtype: object
                   Name Location  Year  Kilometers_Driven Fuel_Type  \
S.No.                                                                 
10     Maruti Ciaz Zeta    Kochi  2018              25692    Petrol   

      Transmission Owner_Type     Mileage   Engine       Power  Seats  \
S.No.                                                                   
10          Manual      First  21.56 kmpl  1462 CC  103.25 bhp    5.0   

        New_Price  Price  
S.No.                

In [45]:
print(type(cars.iloc[10,:]))
print(type(cars.iloc[[10],:]))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
