# Using Pandas

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 200)
## Display the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

**Load the data from the `vehicles.csv` file into a pandas DataFrame.**

In [2]:
cars_df = pd.read_csv("vehicles.csv")
cars_df

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


First exploration of the dataset:

- How many observations does it have?
- Look at all the columns: do you understand what they mean?
- Look at the raw data: do you see anything weird?
- Look at the data types: are they the expected ones for the information the column contains?
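Each of the questions above maps onto a standard pandas call. Here is a minimal sketch on a small hypothetical DataFrame. Its column names mirror a few columns of `vehicles.csv`, and its values are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mirrors a few vehicles.csv columns
sample = pd.DataFrame({
    "Make": ["AM General", "smart", "Toyota"],
    "Cylinders": [4.0, 3.0, np.nan],
    "City MPG": [18, 34, 51],
})

n_rows, n_cols = sample.shape        # observations and columns
column_names = list(sample.columns)  # column labels
dtypes = sample.dtypes               # data type of each column
missing = sample.isna().sum()        # missing values per column

print(n_rows, n_cols)
print(dtypes)
print(missing)
```

`shape[0]` counts every row, whereas `Series.count()` skips missing values, so the two differ when the counted column has gaps. `df.info()` prints the dtypes and non-null counts in one go.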

In [35]:
# How many observations does it have?
## Count all rows (shape[0] also includes rows with missing values, unlike .count())
count_row = cars_df.shape[0]
print(f"It has {count_row} observations.")

It has 35952 observations.


In [42]:
# Look at all the columns: do you understand what they mean?
## Use the .columns attribute to show the column names
## NOTE: the data type of cars_df.columns is Index
cars_df.columns
print("The column names are straightforward to understand, except for some specific technical terms related to motor vehicle parameters.")

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')

The column names are straightforward to understand, except for some specific technical terms related to motor vehicle parameters.


In [5]:
# Look at the raw data: do you see anything weird?
print("No, I don't see anything weird in the data itself. However, it would be better if column names were standardized to lowercase and spaces were avoided.")

No, I don't see anything weird in the data itself. However, it would be better if column names were standardized to lowercase and spaces were avoided.
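One way to apply that naming convention is to transform all column labels at once with the `.str` accessor on the column Index. In this sketch, `standardize_columns` is a hypothetical helper. The exact scheme, which uses lowercase, turns spaces and slashes into underscores, and removes nothing else, is an assumption:

```python
import pandas as pd

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with lowercase, underscore-separated column names."""
    new_names = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[\s/]+", "_", regex=True)  # runs of spaces/slashes -> "_"
    )
    return df.set_axis(new_names, axis=1)

sample = pd.DataFrame(columns=["Engine Displacement", "Fuel Cost/Year", "CO2 Emission Grams/Mile"])
renamed = standardize_columns(sample)
print(list(renamed.columns))
```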


In [6]:
# Look at the data types: are they the expected ones for the information the column contains?
print("Not all of them. Cylinders is stored as float64, but it should be int64 since this column only contains whole numbers.")

Not all of them. Cylinders is stored as float64, but it should be int64 since this column only contains whole numbers.
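A likely reason for the float64 dtype is that the column has missing values. This is an assumption, and you can check it on the real data with `cars_df["Cylinders"].isna().sum()`. NumPy's int64 cannot represent NaN, so pandas stores such columns as float. The nullable `Int64` extension dtype keeps whole numbers and missing values together. Here is a sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Illustrative values: a float column with one missing entry
cylinders = pd.Series([4.0, 6.0, np.nan, 8.0], name="Cylinders")

n_missing = cylinders.isna().sum()
# astype("int64") would raise here because of the NaN;
# pandas' nullable "Int64" dtype stores it as <NA> instead
cylinders_int = cylinders.astype("Int64")

print(n_missing)
print(cylinders_int.dtype)
```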


### Cleaning and wrangling data

- Some car brand names refer to the same brand. Replace all brand names that contain the word "Dutton" with simply "Dutton". If you find similar examples, clean their names too. Use `loc` with boolean indexing.

- Convert CO2 Emissions from Grams/Mile to Grams/Km

- Create a binary column that solely indicates if the transmission of a car is automatic or manual. Use `pandas.Series.str.startswith`.

- Convert MPG columns to km_per_liter

Note:

Converting Grams/Mile to Grams/Km

1 Mile = 1.60934 Km

Converting Gallons to Liters

1 Gallon = 3.78541 Liters
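Written out as code, the two conversions work differently. The per-distance quantity (grams per mile) is divided by km per mile. Fuel economy (miles per gallon) is multiplied by km per mile and divided by liters per gallon. Here is a minimal sketch with hypothetical helper names. The input 522.764706 g/mile is the first CO2 value in the first table of this notebook:

```python
MILE_KM = 1.60934   # 1 mile in km (from the note above)
GALLON_L = 3.78541  # 1 US gallon in liters (from the note above)

def grams_per_mile_to_grams_per_km(g_per_mile):
    # g/mile -> g/km: one mile covers 1.60934 km, so divide
    return g_per_mile / MILE_KM

def mpg_to_km_per_liter(mpg):
    # mile/gallon -> km/L: multiply by km per mile, divide by liters per gallon
    return mpg * MILE_KM / GALLON_L

print(grams_per_mile_to_grams_per_km(522.764706))
print(mpg_to_km_per_liter(18))
```

These functions use plain arithmetic, so they also work element-wise on pandas Series, e.g. `mpg_to_km_per_liter(cars_df["City MPG"])`.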



In [7]:
# Replace all brand names that contain the word "Dutton" with simply "Dutton".
## .str.find() returns -1 when the substring is not found,
## so cars_df['Make'].str.find("Dutton") != -1 is a boolean mask that selects the Make names containing "Dutton"
cars_df.loc[cars_df['Make'].str.find("Dutton") != -1, "Make"] = "Dutton"
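The same `loc` + boolean-mask pattern generalizes to any brand keyword. This sketch uses a hypothetical helper `unify_make`, which is not part of pandas. It calls `str.contains` with `regex=False` so that the keyword is matched literally. The brand variants in the sample are invented for illustration:

```python
import pandas as pd

def unify_make(df: pd.DataFrame, keyword: str, column: str = "Make") -> pd.DataFrame:
    """Hypothetical helper: replace every value of `column` that contains `keyword` with `keyword`."""
    out = df.copy()
    # regex=False matches the keyword literally; na=False treats missing values as non-matches
    mask = out[column].str.contains(keyword, regex=False, na=False)
    out.loc[mask, column] = keyword
    return out

# Invented brand variants, for illustration only
sample = pd.DataFrame({"Make": ["Dutton Cars", "Dutton", "Ford", "Dutton Sports"]})
cleaned = unify_make(sample, "Dutton")
print(cleaned["Make"].tolist())
```

Listing `sorted(cars_df["Make"].unique())` is a quick way to spot other brands that appear under several spellings. Passing `case=False` to `str.contains` also catches differences in capitalization.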


In [8]:
# Convert CO2 Emissions from Grams/Mile to Grams/Km
cars_df_km = cars_df.rename(columns = {"CO2 Emission Grams/Mile": "CO2 Emission Grams/Km"})
cars_df_km["CO2 Emission Grams/Km"] = cars_df_km["CO2 Emission Grams/Km"] / 1.60934
cars_df_km

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Km,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,324.831736,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,345.133719,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,345.133719,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,150.993575,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,152.857693,1100


In [9]:
# Create a binary column that solely indicates if the transmission of a car is automatic or manual.
# Using .str.find()
cars_df_km["Is Auto"] = cars_df_km["Transmission"].str.find("Auto") != -1
cars_df_km

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Km,Fuel Cost/Year,Is Auto
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,324.831736,1950,True
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,345.133719,2100,True
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,345.133719,2550,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,150.993575,1100,True
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,152.857693,1100,True


In [10]:
# Create a binary column that solely indicates if the transmission of a car is automatic or manual.
# Using .str.startswith
cars_df_km["Is Auto 2"] = cars_df_km["Transmission"].str.startswith('Auto')
cars_df_km

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Km,Fuel Cost/Year,Is Auto,Is Auto 2
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,324.831736,1950,True,True
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True,True
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,345.133719,2100,True,True
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True,True
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,345.133719,2550,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True,True
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,150.993575,1100,True,True
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True,True
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,152.857693,1100,True,True
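`str.find("Auto") != -1` tests whether "Auto" appears anywhere in the string, while `str.startswith("Auto")` only checks the beginning. The two columns can therefore disagree. The sketch below shows how to count disagreements and build a readable label with `numpy.where`. The first three strings appear in this dataset. The last one is invented to show a disagreement:

```python
import numpy as np
import pandas as pd

# The first three strings appear in this dataset; the last one is invented
transmission = pd.Series(["Automatic 4-spd", "Manual 5-spd", "Auto(AM5)", "Manual (Auto)"])

is_auto_find = transmission.str.find("Auto") != -1   # "Auto" anywhere in the string
is_auto_start = transmission.str.startswith("Auto")  # "Auto" only at the start

n_disagree = (is_auto_find != is_auto_start).sum()
label = np.where(is_auto_start, "Automatic", "Manual")  # human-readable version
print(n_disagree, label.tolist())
```

On the real data, `(cars_df_km["Is Auto"] != cars_df_km["Is Auto 2"]).sum()` shows whether the two columns differ anywhere.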


In [11]:
# Convert MPG columns to km_per_liter. Step 1: rename the columns (the values stay in MPG until they are converted in a later cell).
cars_df_KPL = cars_df_km.rename(columns = {"City MPG": "City KPL", "Highway MPG": "Highway KPL", "Combined MPG": "Combined KPL"})
cars_df_KPL

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City KPL,Highway KPL,Combined KPL,CO2 Emission Grams/Km,Fuel Cost/Year,Is Auto,Is Auto 2
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,324.831736,1950,True,True
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True,True
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,345.133719,2100,True,True
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,424.779962,2550,True,True
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,345.133719,2550,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True,True
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,150.993575,1100,True,True
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,151.614948,1100,True,True
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,152.857693,1100,True,True


In [12]:
# Derivation of the MPG -> km/L conversion factor
# 1 mile = 1.60934 km and 1 gallon = 3.78541 L, so
# 1 mile/gallon = 1.60934 km / 3.78541 L ≈ 0.4251 km/L
# => km/L = MPG * (1.60934 / 3.78541)
# (Same reasoning as for speed: 1 m/s = (1/1000 km) / (1/3600 h) = 3.6 km/h)

In [13]:
# Multiply the renamed MPG columns by the conversion factor to get km per liter
cars_df_KPL[["City KPL", "Highway KPL", "Combined KPL"]] *= 1.60934 / 3.78541
cars_df_KPL

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City KPL,Highway KPL,Combined KPL,CO2 Emission Grams/Km,Fuel Cost/Year,Is Auto,Is Auto 2
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,7.652571,7.227428,7.227428,324.831736,1950,True,True
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,5.526857,5.526857,5.526857,424.779962,2550,True,True
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,6.802286,7.227428,6.802286,345.133719,2100,True,True
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,5.526857,5.526857,5.526857,424.779962,2550,True,True
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,5.952000,8.928000,6.802286,345.133719,2550,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,14.454857,16.155428,15.305143,151.614948,1100,True,True
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,14.454857,16.155428,15.305143,150.993575,1100,True,True
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,14.454857,16.155428,15.305143,151.614948,1100,True,True
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,14.454857,16.580571,15.305143,152.857693,1100,True,True


### Gathering insights:

- How many car makers are there? How many models? Which car maker has the most cars in the dataset?

- When were these cars made? How big is the engine of these cars?

- What's the frequency of different transmissions, drivetrains and fuel types?

- What's the car that consumes the least/most fuel?

In [14]:
# How many car makers and models are there?
count_maker = len(cars_df_KPL["Make"].unique())
count_model = len(cars_df_KPL["Model"].unique())
print(f"There are {count_maker} makers and {count_model} models.")

There are 125 makers and 3608 models.
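`nunique()` is a shorter equivalent of `len(unique())`, except that it ignores NaN by default. The same model name can appear under different makers, so counting unique (Make, Model) pairs may give a different number of models. This sketch uses invented rows:

```python
import pandas as pd

# Invented rows: the same model name under two makers
sample = pd.DataFrame({
    "Make": ["Chevrolet", "GMC", "Ford", "Chevrolet"],
    "Model": ["Pickup", "Pickup", "F150", "Camaro"],
})

n_makers = sample["Make"].nunique()
n_model_names = sample["Model"].nunique()
n_make_model = len(sample.drop_duplicates(subset=["Make", "Model"]))
print(n_makers, n_model_names, n_make_model)
```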


In [15]:
#Which car maker has the most cars in the dataset?

top_5_makers = cars_df_KPL['Model'].groupby(cars_df_KPL['Make']).count().sort_values(ascending=False).head(5)
top_maker = top_5_makers.idxmax()

print(f"{top_maker} has the most cars in the dataset. Please refer to the following table for details.")
top_5_makers

Chevrolet has the most cars in the dataset. Please refer to the following table for details.


Make
Chevrolet    3643
Ford         2946
Dodge        2360
GMC          2347
Toyota       1836
Name: Model, dtype: int64
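`value_counts()` gives the same kind of ranking in one call, because it counts rows per value and sorts them in descending order. Unlike `groupby(...)['Model'].count()`, it counts rows rather than non-null `Model` entries. The two agree when `Model` has no missing values. Here is a sketch on made-up data:

```python
import pandas as pd

makes = pd.Series(["Chevrolet", "Ford", "Chevrolet", "Dodge", "Chevrolet", "Ford"], name="Make")

make_counts = makes.value_counts()   # rows per maker, sorted in descending order
most_common_make = make_counts.index[0]
top_two = make_counts.head(2)
print(top_two)
```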

In [16]:
# When were these cars made?
top_maker_filter = cars_df_KPL["Make"] == top_maker
top_maker_years = cars_df_KPL.loc[top_maker_filter, ["Make","Model","Year"]].sort_values(by='Year')
top_maker_years_unique = top_maker_years["Year"].unique()
top_maker_years_max = np.max(top_maker_years_unique)
top_maker_years_min = np.min(top_maker_years_unique)
print(f"{top_maker} cars in the dataset were made between {top_maker_years_min} and {top_maker_years_max}. Please refer to the table below for details.")
top_maker_years

Chevrolet cars in the dataset were made between 1984 and 2017. Please refer to the table below for details.


Unnamed: 0,Make,Model,Year
4745,Chevrolet,C20 Pickup 2WD,1984
7426,Chevrolet,Suburban K10 4WD,1984
6319,Chevrolet,K20 Pickup 4WD,1984
6318,Chevrolet,K20 Pickup 4WD,1984
6317,Chevrolet,K20 Pickup 4WD,1984
...,...,...,...
4970,Chevrolet,Camaro,2017
6955,Chevrolet,SS,2017
6954,Chevrolet,SS,2017
5725,Chevrolet,Equinox FWD,2017
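The cell above answers the question for the top maker only. To get the answer for all cars in the dataset, or for every maker at once, you can pass a list of function names to `agg`. This sketch uses a few make/year pairs copied from the tables above:

```python
import pandas as pd

# Make/year pairs taken from the tables above
sample = pd.DataFrame({
    "Make": ["AM General", "AM General", "Chevrolet", "smart"],
    "Year": [1984, 1985, 2017, 2016],
})

year_range = sample["Year"].agg(["min", "max"])                     # whole dataset
years_by_make = sample.groupby("Make")["Year"].agg(["min", "max"])  # per maker
print(year_range)
print(years_by_make)
```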


In [17]:
# How big is the engine of these cars?
top_maker_engine_size = cars_df_KPL.loc[top_maker_filter, ["Make","Model","Engine Displacement"]].sort_values(by='Engine Displacement', ascending = False)
top_maker_engine_size_unique = top_maker_engine_size["Engine Displacement"].unique()
top_maker_engine_size_max = np.max(top_maker_engine_size_unique)
top_maker_engine_size_min = np.min(top_maker_engine_size_unique)
print(f"The engine displacement of {top_maker} cars ranges from {top_maker_engine_size_min} L to {top_maker_engine_size_max} L. Please refer to the table below for details.")
top_maker_engine_size

The engine displacement of Chevrolet cars ranges from 1.0 L to 7.4 L. Please refer to the table below for details.


Unnamed: 0,Make,Model,Engine Displacement
4699,Chevrolet,C1500 Pickup 2WD,7.4
4663,Chevrolet,C1500 Pickup 2WD,7.4
4677,Chevrolet,C1500 Pickup 2WD,7.4
4689,Chevrolet,C1500 Pickup 2WD,7.4
5551,Chevrolet,Corvette,7.0
...,...,...,...
7336,Chevrolet,Sprint Plus,1.0
7337,Chevrolet,Sprint Plus,1.0
6513,Chevrolet,Metro,1.0
6510,Chevrolet,Metro,1.0
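`describe()` summarizes a numeric column in one call, and `pd.cut` can group displacements into size bands. This sketch uses displacement values copied from the tables above. The band edges and labels are illustrative assumptions, not a standard classification:

```python
import pandas as pd

# Displacement values (in liters) copied from the tables above
displacement = pd.Series([2.5, 4.2, 3.8, 1.0, 0.9, 7.4], name="Engine Displacement")

summary = displacement.describe()  # count, mean, std, min, quartiles, max

# Hypothetical size bands (right-inclusive): (0, 2], (2, 4], (4, 10]
size_class = pd.cut(displacement, bins=[0, 2, 4, 10], labels=["small", "medium", "large"])
print(summary)
print(size_class.value_counts())
```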


In [18]:
# What's the frequency of different transmissions, drivetrains and fuel types?

transmissions_counts = cars_df_KPL["Transmission"].value_counts()
drivetrain_counts = cars_df_KPL["Drivetrain"].value_counts()
fuel_type_counts = cars_df_KPL["Fuel Type"].value_counts()

print("The three tables below show the frequency of different transmissions, drivetrains and fuel types respectively.")
print("\nTable 1: Frequency of Different Transmissions")
print(transmissions_counts)
print("\nTable 2: Frequency of Different Drivetrains")
print(drivetrain_counts)
print("\nTable 3: Frequency of Different Fuel Types")
print(fuel_type_counts)

The three tables below show the frequency of different transmissions, drivetrains and fuel types respectively.

Table 1: Frequency of Different Transmissions
Automatic 4-spd                     10585
Manual 5-spd                         7787
Automatic (S6)                       2631
Automatic 3-spd                      2597
Manual 6-spd                         2423
Automatic 5-spd                      2171
Automatic 6-spd                      1432
Manual 4-spd                         1306
Automatic (S8)                        960
Automatic (S5)                        822
Automatic (variable gear ratios)      675
Automatic 7-spd                       662
Automatic (S7)                        261
Auto(AM-S7)                           256
Automatic 8-spd                       243
Automatic (S4)                        229
Auto(AM7)                             157
Auto(AV-S6)                           145
Auto(AM6)                             110
Auto(AM-S6)                            92
Au
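`value_counts(normalize=True)` turns counts into proportions, which are easier to compare across columns, and `pd.crosstab` counts combinations of two columns. Here is a sketch on made-up values:

```python
import pandas as pd

# Made-up sample values
fuel_type = pd.Series(["Regular", "Regular", "Premium", "Regular"], name="Fuel Type")
drivetrain = pd.Series(
    ["Rear-Wheel Drive", "2-Wheel Drive", "Rear-Wheel Drive", "Rear-Wheel Drive"],
    name="Drivetrain",
)

shares = fuel_type.value_counts(normalize=True)  # proportions instead of counts
combo = pd.crosstab(drivetrain, fuel_type)       # joint frequency of two columns
print(shares)
print(combo)
```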

In [39]:
# What's the car that consumes the least/most fuel?
# The annual fuel cost in USD ("Fuel Cost/Year") is used here as a proxy for fuel consumption
fuel_cost_asc_rank = cars_df_KPL.sort_values(by='Fuel Cost/Year')
fuel_cost_desc_rank = cars_df_KPL.sort_values(by='Fuel Cost/Year', ascending=False)

fuel_cost_max = cars_df_KPL["Fuel Cost/Year"].max()
fuel_cost_min = cars_df_KPL["Fuel Cost/Year"].min()

fuel_cost_max_cars = cars_df_KPL.loc[cars_df_KPL["Fuel Cost/Year"] == fuel_cost_max]
fuel_cost_min_cars = cars_df_KPL.loc[cars_df_KPL["Fuel Cost/Year"] == fuel_cost_min]

fuel_cost_max_make = fuel_cost_max_cars["Make"].unique()[0]
fuel_cost_max_model = fuel_cost_max_cars["Model"].unique()[0]
fuel_cost_min_make = fuel_cost_min_cars["Make"].unique()[0]
fuel_cost_min_model = fuel_cost_min_cars["Model"].unique()[0]

# Alternative: read the first row of each ranking
# fuel_cost_max_make = fuel_cost_desc_rank.iloc[0]["Make"]
# fuel_cost_max_model = fuel_cost_desc_rank.iloc[0]["Model"]
# fuel_cost_min_make = fuel_cost_asc_rank.iloc[0]["Make"]
# fuel_cost_min_model = fuel_cost_asc_rank.iloc[0]["Model"]

print(f"{fuel_cost_max_make} {fuel_cost_max_model} has the highest annual fuel cost at ${fuel_cost_max}, whereas {fuel_cost_min_make} {fuel_cost_min_model} has the lowest annual fuel cost at ${fuel_cost_min}. Please refer to the table below for details.")
fuel_cost_desc_rank[["Make", "Model", "Year", "Fuel Barrels/Year", "Fuel Cost/Year"]]

Lamborghini Countach has the highest annual fuel cost at $5800, whereas Toyota Prius Eco has the lowest annual fuel cost at $600. Please refer to the table below for details.


Unnamed: 0,Make,Model,Year,Fuel Barrels/Year,Fuel Cost/Year
20898,Lamborghini,Countach,1990,47.087143,5800
20896,Lamborghini,Countach,1988,47.087143,5800
20897,Lamborghini,Countach,1989,47.087143,5800
20894,Lamborghini,Countach,1986,47.087143,5800
20895,Lamborghini,Countach,1987,47.087143,5800
...,...,...,...,...,...
33272,Toyota,Prius,2011,6.592200,650
17522,Honda,Insight,2000,6.219057,650
17532,Honda,Insight,2005,6.338654,650
33280,Toyota,Prius Eco,2017,5.885893,600
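`Fuel Cost/Year` also depends on fuel prices (for example premium vs. regular), so it measures consumption only indirectly. `Fuel Barrels/Year` measures the amount of fuel itself. The sketch below picks the extreme rows with `idxmax`/`idxmin` on a few rows copied from the table above. On the full data, the ranking by barrels may differ from the ranking by cost:

```python
import pandas as pd

# Rows copied from the table above (index, make, model, annual fuel use in barrels)
sample = pd.DataFrame(
    {
        "Make": ["Lamborghini", "Toyota", "Honda", "Toyota"],
        "Model": ["Countach", "Prius", "Insight", "Prius Eco"],
        "Fuel Barrels/Year": [47.087143, 6.592200, 6.219057, 5.885893],
    },
    index=[20898, 33272, 17522, 33280],
)

# idxmax/idxmin return the index label of the first row holding the extreme value
most_fuel = sample.loc[sample["Fuel Barrels/Year"].idxmax(), ["Make", "Model"]]
least_fuel = sample.loc[sample["Fuel Barrels/Year"].idxmin(), ["Make", "Model"]]
print(most_fuel.tolist(), least_fuel.tolist())
```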


In [43]:
type(cars_df.columns)

pandas.core.indexes.base.Index

In [37]:
type(fuel_cost_max_cars["Make"].unique())

numpy.ndarray

**(Optional)**

Which brand has the worst CO2 emissions on average?

Hint: use the function `sort_values()`

In [44]:
co2_avg_by_make = cars_df_KPL.groupby(["Make"])["CO2 Emission Grams/Km"].mean().sort_values(ascending=False)
top_co2_make = co2_avg_by_make.idxmax()
print(f"{top_co2_make} has the highest average CO2 emissions. Please refer to the table below for details.")
co2_avg_by_make

Vector has the highest average CO2 emissions. Please refer to the table below for details.


Make
Vector                              651.919248
Bugatti                             542.497235
Laforza Automobile Inc              502.012683
Dutton                              476.419879
Rolls-Royce                         475.397772
Lamborghini                         469.001266
Texas Coach Company                 460.178293
Maybach                             453.327003
Ferrari                             442.812798
Bentley                             426.290692
Ruf Automobile Gmbh                 424.779962
Tecstar, LP                         424.779962
Aston Martin                        417.946348
Pagani                              416.941106
Saleen Performance                  409.609249
Wallace Environmental               408.857065
Vixen Motor Company                 395.348404
Excalibur Autos                     394.438536
PAS, Inc                            394.438536
J.K. Motors                         390.994816
Roush Performance                   386.441107
Maserati

Do cars with automatic transmission consume more fuel than cars with manual transmission on average?

In [45]:
co2_avg_by_transmission = cars_df_KPL.groupby(["Is Auto"])["CO2 Emission Grams/Km"].mean().sort_values(ascending=False)
co2_avg_by_transmission
print("According to the result above, cars with automatic transmission emit more CO2 per km on average than cars with manual transmission (about 303 vs. 280 g/km), which suggests that they also consume more fuel.")

Is Auto
True     302.853002
False    279.718227
Name: CO2 Emission Grams/Km, dtype: float64

According to the result above, cars with automatic transmission emit more CO2 per km on average than cars with manual transmission (about 303 vs. 280 g/km), which suggests that they also consume more fuel.
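CO2 per km is a reasonable stand-in for fuel use among cars burning the same fuel, but the question can also be answered directly from the fuel columns. This sketch uses made-up values. On the real data, the same call would group `cars_df_KPL` by `Is Auto` and average `Combined KPL` and `Fuel Barrels/Year`. A difference in group means does not control for vehicle class, year or engine size:

```python
import pandas as pd

# Made-up values: combined fuel economy and annual fuel use by transmission type
sample = pd.DataFrame({
    "Is Auto": [True, True, False, False],
    "Combined MPG": [17, 16, 36, 16],
    "Fuel Barrels/Year": [19.4, 20.6, 9.2, 20.6],
})

# Mean of each fuel column per transmission group
by_transmission = sample.groupby("Is Auto")[["Combined MPG", "Fuel Barrels/Year"]].mean()
print(by_transmission)
```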
