# Results with Merged Dataset
#### Q5: For all of the models that were produced in 2008 that are still being produced now, how much has the mpg improved and which vehicle improved the most?


Here are the steps for answering this question.

**1. Create a new dataframe, model_mpg, that contains the mean combined mpg values in 2008 and 2018 for each unique model**   
To do this, group by model and find the mean cmb_mpg_2008 and mean cmb_mpg for each.

**2. Create a new column, mpg_change, with the change in mpg**   
Subtract the mean mpg in 2008 from that in 2018 to get the change in mpg.

**3. Find the vehicle that improved the most**   
Find the max mpg change, and then use query or indexing to see what model it is!   
   
Remember to use your new dataset, `combined_dataset.csv`. You should've created this data file in the previous section: *Merging Datasets*.

In [1]:
import pandas as pd

In [2]:
# load dataset
df = pd.read_csv("combined_dataset.csv")
df.head(2)

Unnamed: 0,model_2008,displ_2008,cyl_2008,trans_2008,drive_2008,fuel_2008,veh_class_2008,air_pollut_2008,city_mpg_2008,hwy_mpg_2008,...,trans,drive,fuel,veh_class,air_pollution_score,city_mpg,hwy_mpg,cmb_mpg,greenhouse_gas_score,smartway
0,ACURA RDX,2.3,4,Auto-S5,4WD,Gasoline,SUV,7.0,17.0,22.0,...,SemiAuto-6,2WD,Gasoline,small SUV,3.0,20.0,28.0,23.0,5,No
1,ACURA RDX,2.3,4,Auto-S5,4WD,Gasoline,SUV,7.0,17.0,22.0,...,SemiAuto-6,4WD,Gasoline,small SUV,3.0,19.0,27.0,22.0,4,No


### 1. Create a new dataframe, `model_mpg`, that contains the mean combined mpg values in 2008 and 2018 for each unique model

To do this, group by `model` and find the mean `cmb_mpg_2008` and mean `cmb_mpg` for each.

In [3]:
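# select both columns with a list (double brackets); tuple-style selection fails in recent pandas
model_mpg = df.groupby("model")[["cmb_mpg_2008", "cmb_mpg"]].mean()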
model_mpg.head()

model,cmb_mpg_2008,cmb_mpg
ACURA RDX,19.0,22.5
AUDI A3,23.333333,28.0
AUDI A4,21.0,27.0
AUDI A6,19.666667,25.666667
AUDI A8 L,16.5,22.0
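
To make the grouping step concrete, here is a minimal sketch on a hand-built toy table rather than the real `combined_dataset.csv`. The name `toy` and the individual `cmb_mpg_2008` values are assumptions made for illustration; only the two `cmb_mpg` values for the ACURA RDX (23.0 and 22.0) are taken from the `df.head(2)` output above. It shows how several rows for one model collapse into a single row of means.

```python
import pandas as pd

# Toy stand-in for the merged dataset; values are illustrative only.
toy = pd.DataFrame({
    "model": ["ACURA RDX", "ACURA RDX", "AUDI A4"],
    "cmb_mpg_2008": [19.0, 19.0, 21.0],
    "cmb_mpg": [23.0, 22.0, 27.0],
})

# Pick the two mpg columns with a list, then average them per model.
toy_model_mpg = toy.groupby("model")[["cmb_mpg_2008", "cmb_mpg"]].mean()
print(toy_model_mpg)
```

The two ACURA RDX rows become one row whose `cmb_mpg` is the mean of 23.0 and 22.0, which matches the 22.5 shown in the real output.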


In [4]:
# another way to proceed: average every numeric column, then keep the two of interest
model_mpg = df.groupby("model").mean(numeric_only=True)[["cmb_mpg_2008", "cmb_mpg"]]
model_mpg.head(2)

model,cmb_mpg_2008,cmb_mpg
ACURA RDX,19.0,22.5
AUDI A3,23.333333,28.0


### 2. Create a new column, `mpg_change`, with the change in mpg
Subtract the mean mpg in 2008 from that in 2018 to get the change in mpg.

In [5]:
model_mpg['mpg_change'] = model_mpg["cmb_mpg"] - model_mpg["cmb_mpg_2008"]
model_mpg.head(2)

model,cmb_mpg_2008,cmb_mpg,mpg_change
ACURA RDX,19.0,22.5,3.5
AUDI A3,23.333333,28.0,4.666667


### 3. Find the vehicle that improved the most
Find the max mpg change, and then use query or indexing to see what model it is!

In [6]:
model = model_mpg[ model_mpg["mpg_change"]== model_mpg["mpg_change"].max() ]
model

model,cmb_mpg_2008,cmb_mpg,mpg_change
VOLVO XC 90,15.666667,32.2,16.533333


In [7]:
# another way to proceed: filter model_mpg with query
model = model_mpg.query("mpg_change == {}".format(model_mpg["mpg_change"].max()))
model

model,cmb_mpg_2008,cmb_mpg,mpg_change
VOLVO XC 90,15.666667,32.2,16.533333
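
As a possible variation on the `query` approach, the maximum can be stored in a local variable and referenced inside the query string with `@`, which avoids formatting a float into text. The sketch below uses a small toy table, not the real data: `toy_model_mpg` and `max_change` are names introduced here for illustration, and the three `mpg_change` values are derived from the model means shown earlier.

```python
import pandas as pd

# Toy table indexed by model, mirroring the shape of model_mpg.
toy_model_mpg = pd.DataFrame(
    {"mpg_change": [3.5, 4.666667, 6.0]},
    index=pd.Index(["ACURA RDX", "AUDI A3", "AUDI A4"], name="model"),
)

# Store the maximum once, then reference the local variable with @.
max_change = toy_model_mpg["mpg_change"].max()
best = toy_model_mpg.query("mpg_change == @max_change")
print(best)
```

On this toy table the query keeps only the AUDI A4 row, since it has the largest change of the three.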


In [8]:
# another way to proceed: use pandas .idxmax(), which returns the index label of the row holding a column's maximum value

In [9]:
idx = model_mpg["mpg_change"].idxmax()
idx

'VOLVO XC 90'

In [10]:
model_mpg.loc[idx]

cmb_mpg_2008    15.666667
cmb_mpg         32.200000
mpg_change      16.533333
Name: VOLVO XC 90, dtype: float64
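
One caveat worth keeping in mind: `idxmax()` returns only the first label when several models share the maximum. The sketch below, on made-up data, shows how `nlargest` could be used to list the top improvements or to keep every tied model. The model names and mpg values are hypothetical, chosen so that two models tie for first place.

```python
import pandas as pd

# Hypothetical models; MODEL B and MODEL C tie for the largest change.
toy_model_mpg = pd.DataFrame(
    {
        "cmb_mpg_2008": [20.0, 18.0, 25.0, 22.0],
        "cmb_mpg": [24.0, 23.0, 30.0, 23.0],
    },
    index=pd.Index(["MODEL A", "MODEL B", "MODEL C", "MODEL D"], name="model"),
)
toy_model_mpg["mpg_change"] = toy_model_mpg["cmb_mpg"] - toy_model_mpg["cmb_mpg_2008"]

# idxmax() reports only the first label holding the maximum.
first_best = toy_model_mpg["mpg_change"].idxmax()

# nlargest(n, column) returns the n rows with the largest values.
top_two = toy_model_mpg.nlargest(2, "mpg_change")

# keep="all" retains every row tied for the last requested position.
all_best = toy_model_mpg.nlargest(1, "mpg_change", keep="all")

print(first_best)
print(all_best)
```

In the real dataset the maximum appears to be unique (only VOLVO XC 90 is returned by the boolean filter), so all three approaches agree there.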