## COMPILED DATASETS FOR VEHICLE FUEL EMISSIONS ANALYSIS

This notebook compiles publicly available datasets used for analyzing emissions from passenger electric vehicles (EVs) and internal combustion engine (ICE) vehicles.

**IMPORTS**

In [33]:
import numpy as np
import pandas as pd
from pathlib import Path  # standard-library path handling
import matplotlib.pyplot as plt

In [3]:
# Included to support accessing the variables and dataframes in this notebook from another notebook
import os
from datetime import datetime, timedelta

In [4]:
# Included to suppress any warning messages generated
import warnings
warnings.filterwarnings('ignore')

In [5]:
# Added to show all columns in df display
pd.options.display.max_columns = None 

## Vehicle Data

**DATASET 1: IEA Global EV Outlook 2024** \
_Includes historical and projected data, aligned to the Stated Policies Scenario (STEPS) and Announced Pledges Scenario (APS), \
for electric vehicle sales, stock, charging infrastructure, and oil displacement \
Documentation available at: https://www.iea.org/data-and-statistics/data-product/global-ev-outlook-2024#global-ev-data_

In [44]:
# Reads in the historical and projected EV data
ev_outlook = pd.read_csv("Resources/IEA Global EV Data 2024.csv")
ev_outlook.head(5)

Unnamed: 0,region,category,parameter,mode,powertrain,year,unit,value
0,Australia,Historical,EV stock share,Cars,EV,2011,percent,0.00039
1,Australia,Historical,EV sales share,Cars,EV,2011,percent,0.0065
2,Australia,Historical,EV sales,Cars,BEV,2011,Vehicles,49.0
3,Australia,Historical,EV stock,Cars,BEV,2011,Vehicles,49.0
4,Australia,Historical,EV stock,Cars,BEV,2012,Vehicles,220.0
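
The table is in long format (one row per region, parameter, powertrain, and year), so most analyses start by filtering on `parameter` and pivoting. The sketch below is a minimal illustration that uses only the five rows from the preview above, typed in by hand; `pivot_ev_parameter` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Rows hand-copied from the ev_outlook preview above, for illustration only
sample = pd.DataFrame({
    "region": ["Australia"] * 5,
    "category": ["Historical"] * 5,
    "parameter": ["EV stock share", "EV sales share", "EV sales", "EV stock", "EV stock"],
    "mode": ["Cars"] * 5,
    "powertrain": ["EV", "EV", "BEV", "BEV", "BEV"],
    "year": [2011, 2011, 2011, 2011, 2012],
    "unit": ["percent", "percent", "Vehicles", "Vehicles", "Vehicles"],
    "value": [0.00039, 0.0065, 49.0, 49.0, 220.0],
})


def pivot_ev_parameter(df, parameter, mode="Cars"):
    """Hypothetical helper: one parameter as a (region, year) x powertrain table."""
    subset = df[(df["parameter"] == parameter) & (df["mode"] == mode)]
    return subset.pivot_table(
        index=["region", "year"],
        columns="powertrain",
        values="value",
        aggfunc="sum",
    )


ev_stock = pivot_ev_parameter(sample, "EV stock")
print(ev_stock)
```

The same call on the full `ev_outlook` dataframe should give one column per powertrain, assuming the column names match the preview.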


**DATASET 2: Fuel Efficiency Standards by Year** \
_Corporate Average Fuel Economy (CAFE) standards in miles per gallon (mpg) provided by the National Highway Traffic Safety Administration
and US EPA \
Documentation available at: https://afdc.energy.gov/data/10562_

In [6]:
# Reads the CAFE standards CSV file (the header is in the first row)
vehicle_emission_standards = pd.read_csv("Resources/vehicle_efficiency_CAFE_requirements.csv")

# Replaces missing values with 0 (years with no light-duty truck standard then show 0.0)
vehicle_emission_standards.fillna(0, inplace=True)

# Display the first 5 rows
vehicle_emission_standards.head(5)

Unnamed: 0,Model Year,Passenger Cars,Light-Duty Trucks
0,1978,18.0,0.0
1,1979,19.0,0.0
2,1980,20.0,0.0
3,1981,22.0,0.0
4,1982,24.0,17.5


**DATASET 3: Estimated Real-World Fuel Economy, CO2 Emissions and Vehicle Attributes** \
_2023 EPA Automotive Trends Report data for the US; the file loaded below covers model years 1975 to 2022 \
Documentation available at: https://www.epa.gov/automotive-trends/about-automotive-trends-data_


In [46]:
# Reads the real-world fuel economy CSV file (the header is in the first row)
real_world_emissions = pd.read_csv("Resources/estimated_real_world_fuel_economy (1975 to 2022).csv")

# Replaces missing values with 0 (the "-" placeholders in the Footprint column are strings and are left unchanged)
real_world_emissions.fillna(0, inplace=True)

# Display the first 10 rows
real_world_emissions.head(10)

Unnamed: 0,Model Year,Regulatory Class,Vehicle Type,Production Share,Real-World MPG,Real-World MPG_City,Real-World MPG_Hwy,Real-World CO2 (g/mi),Real-World CO2_City (g/mi),Real-World CO2_Hwy (g/mi),Weight (lbs),Horsepower (HP),Footprint (sq. ft.)
0,1975,All,All,1.0,13.0597,12.01552,14.61167,680.59612,739.738,608.3116,4060.399,137.3346,-
1,1975,Car,All Car,0.806646,13.45483,12.31413,15.17266,660.6374,721.82935,585.84724,4057.494,136.1964,-
2,1975,Car,Sedan/Wagon,0.805645,13.45833,12.31742,15.17643,660.46603,721.63673,585.70185,4057.565,136.2256,-
3,1975,Truck,All Truck,0.193354,11.63431,10.91165,12.659,763.86134,814.4506,702.03002,4072.518,142.0826,-
4,1975,Truck,Pickup,0.131322,11.91476,11.07827,13.12613,745.88139,802.2009,677.04643,4011.977,140.9365,-
5,1975,Truck,Minivan/Van,0.0447,11.10606,10.55642,11.86084,800.19398,841.85725,749.2722,4195.69,143.2245,-
6,1975,Truck,Truck SUV,0.017331,11.02071,10.62298,11.54921,806.39097,836.58258,769.49011,4213.574,147.8221,-
7,1975,Car,Car SUV,0.001001,11.12929,10.13552,12.64456,798.5239,876.81716,702.83214,4000.0,112.7733,-
8,1976,All,All,1.0,14.22136,13.18117,15.73946,625.02238,674.34147,564.74348,4079.198,135.0839,-
9,1976,Car,All Car,0.789164,14.86139,13.69643,16.58558,598.14122,649.00991,535.96838,4058.859,133.5588,-
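
The `-` entries in the Footprint column are strings, so `fillna(0)` does not touch them and the column cannot be averaged as-is. One possible alternative, shown here as a sketch rather than the approach this notebook takes, is to tell `read_csv` to treat `-` as missing. The inline CSV text is a two-row, four-column excerpt of the preview above, typed in by hand.

```python
import io

import pandas as pd

# Two rows and four columns hand-copied from the preview above
csv_text = """Model Year,Regulatory Class,Real-World MPG,Footprint (sq. ft.)
1975,All,13.0597,-
1975,Car,13.45483,-
"""

# Default parsing keeps "-" as text
raw = pd.read_csv(io.StringIO(csv_text))

# na_values="-" converts the placeholder to NaN so the column parses as numeric
clean = pd.read_csv(io.StringIO(csv_text), na_values="-")

print(raw["Footprint (sq. ft.)"].tolist())
print(clean["Footprint (sq. ft.)"].isna().sum())
```

Leaving the gaps as NaN rather than filling them with 0 also keeps them out of later averages, which may or may not be what the analysis needs.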


In [52]:
# Groups by Regulatory Class and averages the numeric columns
grouped_real_world_emissions = real_world_emissions.groupby('Regulatory Class')
mean_real_world_emissions_by_class = grouped_real_world_emissions.mean(numeric_only=True)

# Drops unneeded columns
mean_real_world_emissions_by_class.drop(columns=["Real-World MPG_City", "Real-World MPG_Hwy", "Real-World CO2_City (g/mi)",
                                                 "Real-World CO2_Hwy (g/mi)", "Weight (lbs)", "Horsepower (HP)"],
                                        inplace=True)
mean_real_world_emissions_by_class

Regulatory Class,Real-World MPG,Real-World CO2 (g/mi)
All,21.162975,429.240752
Car,22.974299,407.142007
Truck,17.824065,514.288718
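
Each regulatory class contains both an aggregate row (`All Car`, `All Truck`) and rows for its sub-types (`Sedan/Wagon`, `Car SUV`, `Pickup`, and so on), so a plain group mean counts every sub-type equally alongside the aggregate. The sketch below, built from a hand-copied subset of the 1975 rows in the preview, compares that plain mean with one restricted to the aggregate rows. Whether the aggregate-only figure is the better fit depends on the question being asked; this is an illustration, not a correction to the table above.

```python
import pandas as pd

# Subset of the 1975 rows from the real_world_emissions preview, copied by hand
rows = pd.DataFrame({
    "Model Year": [1975, 1975, 1975, 1975, 1975],
    "Regulatory Class": ["All", "Car", "Car", "Truck", "Car"],
    "Vehicle Type": ["All", "All Car", "Sedan/Wagon", "All Truck", "Car SUV"],
    "Real-World CO2 (g/mi)": [680.59612, 660.6374, 660.46603, 763.86134, 798.5239],
})

co2 = "Real-World CO2 (g/mi)"

# Plain mean over every row in the class (aggregate and sub-type rows mixed)
naive = rows.groupby("Regulatory Class")[co2].mean()

# Mean over the aggregate rows only
aggregate_rows = rows[rows["Vehicle Type"].isin(["All", "All Car", "All Truck"])]
aggregate_only = aggregate_rows.groupby("Regulatory Class")[co2].mean()

print(pd.DataFrame({"naive": naive, "aggregate_only": aggregate_only}))
```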


**DATASET 4: US Electricity Generation By Fuel Type** \
_EIA net electricity generation for all sectors in the US from 2010 to 2023_

In [None]:
# Reads the CSV file, skipping the first 4 rows and using the 5th row as header
net_generation = pd.read_csv("Resources/Net_generation_for_all_sectors.csv",
                             skiprows=4,
                             header=0)

# Replaces missing values with 0
net_generation.fillna(0,inplace=True)

# Drops the unneeded "source key" column
net_generation.drop(columns="source key", inplace=True)
# Strips the "United States : " prefix from each description
net_generation['description'] = net_generation['description'].str.replace('United States : ', '', regex=False)
# Skips the first 2 rows and indexes the remaining rows by description
net_generation = net_generation.iloc[2:]
net_generation = net_generation.reset_index(drop=True)
net_generation = net_generation.set_index('description')

# Transposes dataframe to show records by year to align with other datasets
net_generation = net_generation.T
net_generation = net_generation.reset_index()

# Display the first 10 rows
net_generation.head(10)
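
The reshaping steps above (drop a column, strip a prefix, index by description, transpose) can also be written as one chained expression. The sketch below runs on a toy frame: its column layout is an assumption based on the code above, and the fuel names and numbers are placeholders, not EIA data.

```python
import pandas as pd

# Toy stand-in for the EIA export; layout assumed, values are placeholders
toy = pd.DataFrame({
    "description": ["United States : coal", "United States : natural gas"],
    "source key": ["key-a", "key-b"],
    "2010": [1.0, 2.0],
    "2011": [3.0, 4.0],
})

tidy = (
    toy.drop(columns="source key")
    .assign(description=lambda d: d["description"].str.replace("United States : ", "", regex=False))
    .set_index("description")
    .T                                         # years become rows, fuels become columns
    .rename_axis(index="year", columns=None)   # name the year axis, clear the leftover "description" label
    .reset_index()
)
tidy["year"] = tidy["year"].astype(int)        # year labels arrive as strings from the header

print(tidy)
```

Naming the index before `reset_index` gives a `year` column instead of the default `index`, which may make later merges with the other datasets easier to read.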

**DATASET 4B: US Electricity Consumption By Fuel Type** \
_EIA electricity consumption for all energy sectors in the US from 2010 to 2023_

In [None]:
# Reads the CSV file, skipping the first 4 rows and using the 5th row as header
sector_elec_consumption = pd.read_csv("Resources/Consumption_for_electricity_generation_for_all_sectors.csv",
                                      skiprows=4,
                                      header=0)

# Replaces missing values with 0
sector_elec_consumption.fillna(0,inplace=True)

# Drops the unneeded "source key" column
sector_elec_consumption.drop(columns="source key", inplace=True)
# Strips the "United States : " prefix from each description
sector_elec_consumption['description'] = sector_elec_consumption['description'].str.replace('United States : ', '', regex=False)
# Skips the first 2 rows and indexes the remaining rows by description
sector_elec_consumption = sector_elec_consumption.iloc[2:]
sector_elec_consumption = sector_elec_consumption.reset_index(drop=True)
sector_elec_consumption = sector_elec_consumption.set_index('description')

# Transposes dataframe to show records by year to align with other datasets
sector_elec_consumption = sector_elec_consumption.T
sector_elec_consumption = sector_elec_consumption.reset_index()


# Display the first 10 rows
sector_elec_consumption.head(10)

**DATASET 5: Emissions by Sector** \
_EPA US Greenhouse Gas Inventory by Economic Sector, MMT CO2 eq.\
Documentation available at https://afdc.energy.gov/data/10802_

In [None]:
# Reads the CSV file, skipping the first 2 rows and using the 3rd row as header
emissions_by_econ_sector = pd.read_csv("Resources/GHG_emissions_by_econ_sector.csv",
                                       skiprows=2,
                                       header=0)
# Drop unnamed columns
emissions_by_econ_sector = emissions_by_econ_sector.drop(columns=[col for col in emissions_by_econ_sector.columns if col.startswith('Unnamed')])

# Display the first 5 rows and data types
display(emissions_by_econ_sector.head(5))
display(emissions_by_econ_sector.dtypes)


In [None]:
# Converts data columns to numeric after removing thousands separators
columns_to_convert = ['Transportation', 'Electricity Generation', 'Industry', 'Total']
emissions_by_econ_sector[columns_to_convert] = emissions_by_econ_sector[columns_to_convert].replace(',', '', regex=True).astype(float)

# Sets index to Year
emissions_by_econ_sector = emissions_by_econ_sector.set_index('Year')

# Select the 2022 GHG data
data_2022 = emissions_by_econ_sector.loc['2022']
sectors = ['Transportation', 'Electricity Generation', 'Industry', 'Agriculture', 'Commercial', 'Residential']
values = data_2022[sectors]

# Creates pie chart 
plt.figure(figsize=(10, 8))
plt.pie(values, labels=sectors, autopct='%1.1f%%', startangle=90)
plt.title('GHG Emissions by Economic Sector in 2022')
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle
plt.show()
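
An alternative to stripping commas after loading, sketched here under assumptions, is to let `read_csv` parse thousands separators directly with its `thousands` parameter. The inline CSV is a made-up two-row sample with placeholder numbers, not the EPA inventory.

```python
import io

import pandas as pd

# Placeholder values only; the real file's layout and numbers may differ
csv_text = (
    "Year,Transportation,Total\n"
    '2021,"1,000.5","5,000.0"\n'
    '2022,"1,100.5","5,200.0"\n'
)

# thousands="," makes the quoted "1,000.5" style fields parse as floats
sample_emissions = pd.read_csv(io.StringIO(csv_text), thousands=",", index_col="Year")

# In this sample the Year index is parsed as integers, so the label is 2022, not '2022'
transport_2022 = sample_emissions.loc[2022, "Transportation"]
print(transport_2022)
```

Whether the real file yields integer or string year labels depends on its contents, so it is worth checking `emissions_by_econ_sector.index.dtype` before selecting a year with `.loc`.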

**DATASET 6: Coal vs EV Emissions Differences By Country** \
_dataset source and description_

In [None]:
# Reads the coal power vs EV emissions CSV file (the header is in the first row)
coal_vs_EV_emissions = pd.read_csv("Resources/coal_power_vs_ev_emissions_with_difference.csv")

# Groups by country and displays the summed values for each country
coal_vs_EV_emissions_by_country = coal_vs_EV_emissions.groupby(by="Country")
coal_vs_EV_emissions_by_country.sum()

**EXTRACTING DATAFRAMES FOR USE** \
The `%run` magic command may be added to other .ipynb files to execute this notebook

In [None]:
# EXECUTED FROM NEW NOTEBOOK OPTION
    # Runs all the code in vehicle_fuel_emissions_data.ipynb and makes its dataframes available in the other notebook
    # To use this option, paste the "%run vehicle_fuel_emissions_data.ipynb" command into the notebook that will execute this notebook

# %run vehicle_fuel_emissions_data.ipynb