## Fuel Economy Case Study

### Dataset
This case study uses a dataset related to fuel economy from the EPA. The Environmental Protection Agency (EPA) is a U.S. government agency that helps protect people from environmental risks by researching and collecting data. Fuel economy is one set of data it collects.

### Context

The fuel economy of an automobile is the relationship between the distance traveled and the amount of fuel the vehicle consumes. It can be expressed either as the volume of fuel needed to travel a given distance or as the distance traveled per unit volume of fuel consumed, for example miles per gallon (MPG), the measure used in this dataset's columns.

### The problem
We have practiced cleaning data with toy datasets; now we'll work on the fuel economy case study. The data comes from two different years, 2008 and 2018. We will use both datasets throughout the rest of the lesson, iterating as we transform them into a final clean dataset.

Although both datasets cover the same topic, fuel economy, they differ in the columns they provide.

### The solution: Drop Extraneous Columns
One way to align the columns of the two datasets is to drop any that are unneeded or irrelevant. pandas provides the `.drop()` method on DataFrames for this.

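As a minimal sketch of how `.drop()` behaves, the example below uses a small toy DataFrame. The column names (`model`, `mpg`, `notes`, `internal_id`) are hypothetical and are not taken from the EPA files. It shows two equivalent ways to name the columns to remove, and that `.drop()` returns a new DataFrame by default rather than modifying the original.

```python
import pandas as pd

# Toy data; these column names are hypothetical, not from the EPA files
df = pd.DataFrame({
    "model": ["A", "B"],
    "mpg": [17, 23],
    "notes": ["x", "y"],
    "internal_id": [101, 102],
})

# drop() returns a new DataFrame by default; df itself is left unchanged
trimmed = df.drop(columns=["notes", "internal_id"])

# Equivalent spelling: pass the labels and say they refer to columns (axis=1)
trimmed_axis = df.drop(["notes", "internal_id"], axis=1)

print(list(trimmed.columns))  # ['model', 'mpg']
print(list(df.columns))       # the original still has all four columns
```

Passing `inplace=True` modifies the DataFrame directly and returns `None`; assigning the result to a variable, as in the sketch, keeps the original available for comparison.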
### Columns to Drop:

    From the 2008 dataset: 'Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'
    From the 2018 dataset: 'Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'


# Dropping Columns

In this notebook, we will learn about the `.drop()` method for pandas DataFrames.

Additional information about each option can be found in the pandas documentation for [drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html).
We'll be using the fuel economy dataset, which is provided in the workspace as `all_alpha_08.csv` (2008 dataset) and `all_alpha_18.csv` (2018 dataset).

In [None]:
# load datasets
import pandas as pd
df_08 = pd.read_csv("all_alpha_08.csv")
df_18 = pd.read_csv("all_alpha_18.csv")

In [None]:
# view 2008 dataset
df_08.head(1)

Unnamed: 0,Model,Displ,Cyl,Trans,Drive,Fuel,Sales Area,Stnd,Underhood ID,Veh Class,Air Pollution Score,FE Calc Appr,City MPG,Hwy MPG,Cmb MPG,Unadj Cmb MPG,Greenhouse Gas Score,SmartWay
0,ACURA MDX,3.7,(6 cyl),Auto-S5,4WD,Gasoline,CA,U2,8HNXT03.7PKR,SUV,7,Drv,15,20,17,22.0527,4,no


In [None]:
# view 2018 dataset
df_18.head(1)

Unnamed: 0,Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
0,ACURA RDX,3.5,6.0,SemiAuto-6,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,JHNXT03.5GV3,small SUV,3,20,28,23,5,No,386

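One way to see how the two datasets differ is to compare their column labels directly. The sketch below is an illustration only: it builds empty stand-in DataFrames (`demo_08` and `demo_18`, names introduced here) from the headers shown in the two outputs, so it runs without the CSV files. With the real data, the same calls should work on `df_08.columns` and `df_18.columns`.

```python
import pandas as pd

# Column headers copied from the two head(1) outputs shown in this notebook
cols_08 = ["Unnamed: 0", "Model", "Displ", "Cyl", "Trans", "Drive", "Fuel",
           "Sales Area", "Stnd", "Underhood ID", "Veh Class",
           "Air Pollution Score", "FE Calc Appr", "City MPG", "Hwy MPG",
           "Cmb MPG", "Unadj Cmb MPG", "Greenhouse Gas Score", "SmartWay"]
cols_18 = ["Unnamed: 0", "Model", "Displ", "Cyl", "Trans", "Drive", "Fuel",
           "Cert Region", "Stnd", "Stnd Description", "Underhood ID",
           "Veh Class", "Air Pollution Score", "City MPG", "Hwy MPG",
           "Cmb MPG", "Greenhouse Gas Score", "SmartWay", "Comb CO2"]

# Empty stand-in frames: only the column labels matter for this comparison
demo_08 = pd.DataFrame(columns=cols_08)
demo_18 = pd.DataFrame(columns=cols_18)

# Index.difference returns the labels present in one Index but not the other
only_08 = demo_08.columns.difference(demo_18.columns)
only_18 = demo_18.columns.difference(demo_08.columns)

print("Only in 2008:", list(only_08))
print("Only in 2018:", list(only_18))
```

This comparison only shows which labels are unique to each year. Columns that appear in both files, such as 'Stnd' and 'Underhood ID', can still be dropped when they are not relevant to the analysis.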

### Drop Extraneous Columns

In [None]:
# drop columns from 2008 dataset: 'Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'
df_08.drop(['Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'], axis=1, inplace=True)

# confirm changes
df_08.head(1)

Unnamed: 0,Model,Displ,Cyl,Trans,Drive,Fuel,Sales Area,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay
0,ACURA MDX,3.7,(6 cyl),Auto-S5,4WD,Gasoline,CA,SUV,7,15,20,17,4,no


In [None]:
# drop columns from 2018 dataset: 'Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'
df_18.drop(['Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'], axis=1, inplace=True)

# confirm changes
df_18.head(1)

Unnamed: 0,Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay
0,ACURA RDX,3.5,6.0,SemiAuto-6,2WD,Gasoline,FA,small SUV,3,20,28,23,5,No


In [None]:
# save new datasets for next section
df_08.to_csv('data_08_v1.csv', index=False)
df_18.to_csv('data_18_v1.csv', index=False)
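
The `index=False` argument keeps the row index out of the saved file. As a small sketch of why that matters, the example below writes a one-row toy DataFrame (`demo`, a name introduced here, with values taken from the 2008 sample row) to in-memory buffers instead of files and reads it back both ways.

```python
import io
import pandas as pd

# One-row stand-in frame; values taken from the 2008 sample row
demo = pd.DataFrame({"Model": ["ACURA MDX"], "Cmb MPG": [17]})

# Write without the index, then read the CSV text back
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Write with the default index=True for comparison
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

print(list(reloaded.columns))             # ['Model', 'Cmb MPG']
print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'Model', 'Cmb MPG']
```

When the index is written, it is stored as a column with an empty header, which `read_csv` labels `Unnamed: 0` on reload.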