**COMPILED DATASETS FOR VEHICLE FUEL EMISSIONS ANALYSIS**

This notebook compiles publicly available datasets used for analyzing emissions from passenger electric vehicles (EVs) and internal combustion engine (ICE) vehicles.

**IMPORTS**

In [110]:
import numpy as np
import pandas as pd
from pathlib import Path

In [111]:
# Included to support accessing the variables and dataframes in this notebook from another notebook
import os
from datetime import datetime, timedelta

In [112]:
# Included to suppress any warning messages generated
import warnings
warnings.filterwarnings('ignore')

In [113]:
# Added to show all columns in df display
pd.options.display.max_columns = None 

**DATASET 1: Vehicle Fuel Efficiency Standards (1978 to 2032)** \
_Corporate Average Fuel Economy (CAFE) standards in miles per gallon (mpg), provided by the National Highway Traffic Safety Administration \
(for standards from 1978-2010 and 2026-2032) and the US EPA (for standards from 2012-2025)_

In [114]:
# Reads the CSV file (the first row is used as the header)
vehicle_emission_standards = pd.read_csv("Resources/vehicle_efficiency_CAFE_requirements.csv")

# Replaces missing values with 0 (years with no standard in place show as 0.0)
vehicle_emission_standards.fillna(0, inplace=True)

# Display the first 5 rows
vehicle_emission_standards.head(5)

Unnamed: 0,Model Year,Passenger Cars,Light-Duty Trucks
0,1978,18.0,0.0
1,1979,19.0,0.0
2,1980,20.0,0.0
3,1981,22.0,0.0
4,1982,24.0,17.5


**DATASET 2: IEA Global EV Outlook 2024** \
_Includes historical and projected data aligned to the Stated Policies Scenario (STEPS) and the Announced Pledges Scenario (APS) \
for electric vehicle sales, stock, charging infrastructure, and oil displacement. \
Documentation available at: https://www.iea.org/data-and-statistics/data-product/global-ev-outlook-2024#global-ev-data_

In [115]:
# Reads in the historical and projected EV data from the IEA Global EV Outlook
ev_outlook = pd.read_csv("Resources/IEA Global EV Data 2024.csv")
ev_outlook.head(5)

Unnamed: 0,region,category,parameter,mode,powertrain,year,unit,value
0,Australia,Historical,EV stock share,Cars,EV,2011,percent,0.00039
1,Australia,Historical,EV sales share,Cars,EV,2011,percent,0.0065
2,Australia,Historical,EV sales,Cars,BEV,2011,Vehicles,49.0
3,Australia,Historical,EV stock,Cars,BEV,2011,Vehicles,49.0
4,Australia,Historical,EV stock,Cars,BEV,2012,Vehicles,220.0
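
The table above is in long format: each row is one value for a region, parameter, powertrain, and year. A minimal sketch of reshaping it so that each year becomes a row is shown here. The `sample` frame is a stand-in built from the five rows displayed above, and the names `ev_stock` and `ev_stock_by_year` are illustrative assumptions, not variables defined elsewhere in this notebook.

```python
import pandas as pd

# Stand-in for ev_outlook, copied from the five rows displayed above
sample = pd.DataFrame({
    "region": ["Australia"] * 5,
    "category": ["Historical"] * 5,
    "parameter": ["EV stock share", "EV sales share", "EV sales", "EV stock", "EV stock"],
    "mode": ["Cars"] * 5,
    "powertrain": ["EV", "EV", "BEV", "BEV", "BEV"],
    "year": [2011, 2011, 2011, 2011, 2012],
    "unit": ["percent", "percent", "Vehicles", "Vehicles", "Vehicles"],
    "value": [0.00039, 0.0065, 49.0, 49.0, 220.0],
})

# Keep a single parameter so every value shares the same unit
ev_stock = sample[(sample["parameter"] == "EV stock") & (sample["mode"] == "Cars")]

# One row per year, one column per powertrain
ev_stock_by_year = ev_stock.pivot_table(
    index="year", columns="powertrain", values="value", aggfunc="sum"
)
print(ev_stock_by_year)
```

Filtering on `parameter` before pivoting avoids mixing percentages and vehicle counts in the same column.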


**DATASET 3: Total US Electricity Generation By Fuel Type** \
_EIA net electricity generation for all sectors in the US from 2010 to 2023_

In [116]:
# Reads the CSV file, skipping the first 4 rows and using the 5th row as header
net_generation = pd.read_csv("Resources/Net_generation_for_all_sectors.csv",
                             skiprows=4,
                             header=0)

# Replaces missing values with 0
net_generation.fillna(0,inplace=True)

# Drops unneeded columns and rows, and strips the region prefix from the descriptions
net_generation.drop(columns="source key", inplace=True)
net_generation['description'] = net_generation['description'].str.replace('United States : ', '', regex=False)
net_generation = net_generation.iloc[2:]
net_generation = net_generation.reset_index(drop=True)
net_generation = net_generation.set_index('description')

# Transposes dataframe to show records by year to align with other datasets
net_generation = net_generation.T
net_generation = net_generation.reset_index()

# Display the first 10 rows
net_generation.head(10)

description,index,all fuels (utility-scale),coal,petroleum liquids,petroleum coke,natural gas,other gases,nuclear,conventional hydroelectric,other renewables,wind,all utility-scale solar,geothermal,biomass,wood and wood-derived fuels,other biomass,hydro-electric pumped storage,other,all solar,small-scale solar photovoltaic,all utility-scale solar.1
0,units,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,0.0,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours,thousand megawatthours
1,2010,4125060,1847290,23337,13724,987697,11313,806968,260203,0.0,94652,1212,15219,56089,37172,18917,-5501,12855,--,--,1212
2,2011,4100141,1733430,16086,14096,1013689,11566,790204,319355,0.0,120177,1818,15316,56671,37449,19222,-6421,14154,--,--,1818
3,2012,4047765,1514043,13403,9787,1225894,11898,769331,276240,0.0,140822,4327,15562,57622,37799,19823,-4950,13787,--,--,4327
4,2013,4065964,1581115,13820,13344,1124836,12853,789016,268565,0.0,167840,9036,15775,60858,40028,20830,-4681,13588,--,--,9036
5,2014,4093564.0,1581710.0,18276.0,11955.0,1126635.0,12022.0,797166.0,259367.0,0.0,181655.0,17691.0,15877.0,63989.0,42340.0,21650.0,-6174.0,13393.0,28924.0,11233.0,17691.0
6,2015,4078714.0,1352398.0,17372.0,10877.0,1334668.0,13117.0,797178.0,249080.0,0.0,190719.0,24893.0,15918.0,63632.0,41929.0,21703.0,-5091.0,13955.0,39032.0,14139.0,24893.0
7,2016,4077574.0,1239149.0,13008.0,11197.0,1379271.0,12807.0,805694.0,267812.0,0.0,226993.0,36054.0,15826.0,62760.0,40947.0,21813.0,-6686.0,13689.0,54866.0,18812.0,36054.0
8,2017,4035443.0,1205835.0,12414.0,8976.0,1297703.0,12469.0,804950.0,300333.0,0.0,254303.0,53287.0,15927.0,62733.0,41124.0,21610.0,-6495.0,13008.0,77277.0,23990.0,53287.0
9,2018,4180988.0,1149487.0,16245.0,8981.0,1471843.0,13463.0,807084.0,292524.0,0.0,272667.0,63825.0,15967.0,61832.0,40936.0,20896.0,-5905.0,12973.0,93365.0,29539.0,63825.0
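
The output above still carries a `units` row, and the `--` placeholders mean the value columns hold text rather than numbers. A minimal sketch of one way to separate the units and convert the values to numbers follows. The `sample` frame is a small stand-in using values from the output above, and the names `units` and `clean` are illustrative assumptions.

```python
import pandas as pd

# Stand-in for the transposed net_generation frame (values taken from the output above)
sample = pd.DataFrame({
    "index": ["units", "2010", "2014"],
    "coal": ["thousand megawatthours", "1847290", "1581710.0"],
    "all solar": ["thousand megawatthours", "--", "28924.0"],
})

# Keep the unit labels in their own Series, then drop the units row
units = sample.loc[sample["index"] == "units"].drop(columns="index").iloc[0]
clean = sample.loc[sample["index"] != "units"].rename(columns={"index": "year"})

# Convert the year labels and the values to numbers; "--" becomes NaN
clean["year"] = clean["year"].astype(int)
value_cols = clean.columns.drop("year")
clean[value_cols] = clean[value_cols].apply(pd.to_numeric, errors="coerce")
clean = clean.reset_index(drop=True)
print(clean)
```

Using `errors="coerce"` keeps unavailable values as NaN instead of 0, so they are not mistaken for zero generation in later calculations.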


**DATASET 4: Fuel Consumption for Electricity Generation By Fuel Type** \
_EIA fuel consumption for electricity generation for all sectors in the US from 2010 to 2023_

In [117]:
# Reads the CSV file, skipping the first 4 rows and using the 5th row as header
sector_elec_consumption = pd.read_csv("Resources/Consumption_for_electricity_generation_for_all_sectors.csv",
                                      skiprows=4,
                                      header=0)

# Replaces missing values with 0
sector_elec_consumption.fillna(0,inplace=True)

# Removes unneeded columns and rows, and strips the region prefix from the descriptions
sector_elec_consumption.drop(columns="source key", inplace=True)
sector_elec_consumption['description'] = sector_elec_consumption['description'].str.replace('United States : ', '', regex=False)
sector_elec_consumption = sector_elec_consumption.iloc[2:]
sector_elec_consumption = sector_elec_consumption.reset_index(drop=True)
sector_elec_consumption = sector_elec_consumption.set_index('description')

# Transposes dataframe to show records by year to align with other datasets
sector_elec_consumption = sector_elec_consumption.T
sector_elec_consumption = sector_elec_consumption.reset_index()


# Display the first 10 rows
sector_elec_consumption.head(10)

description,index,coal,petroleum liquids,petroleum coke,natural gas
0,units,thousand tons,thousand barrels,thousand tons,thousand Mcf
1,2010,979684.0,40103.0,4994.0,7680185.0
2,2011,934938.0,27326.0,5012.0,7883865.0
3,2012,825734.0,22604.0,3675.0,9484710.0
4,2013,860729.0,23231.0,4852.0,8596299.0
5,2014,853634.0,31531.0,4412.0,8544387.0
6,2015,739594.0,28925.0,4044.0,10016576.0
7,2016,677371.0,22405.0,4253.0,10170110.0
8,2017,663911.0,21696.0,3490.0,9508062.0
9,2018,636213.0,28614.0,3623.0,10842129.0


**DATASET 5: Emissions by Sector** \
_EPA U.S. Emissions by Inventory Sector, MMT CO2 eq._

In [148]:
# Reads the CSV file; the first column holds the sector names
emissions_by_sector = pd.read_csv("Resources/US_emissions_by_sector Sector.csv")

# Uses the sector-name column as the index, labelled 'Year' so the label reads correctly after transposing
emissions_by_sector['Year'] = emissions_by_sector['U.S. Emissions by Inventory Sector, MMT CO2 eq.']
emissions_by_sector = emissions_by_sector.set_index('Year')
emissions_by_sector = emissions_by_sector.drop(columns='U.S. Emissions by Inventory Sector, MMT CO2 eq.')

# Transposes dataframe to show records by year to align with other datasets
emissions_by_sector = emissions_by_sector.T

# Display the first 5 rows
emissions_by_sector.head(5)

Year,Energy,Agriculture,Industrial processes and product use,Waste,"Land use, land-use change, and forestry",Net total,Gross total
1990,5381.018352,551.144889,368.804483,235.946956,-976.735924,5560.178756,6536.91468
1991,5339.11912,543.176284,351.098576,239.295372,-989.00839,5483.680962,6472.689352
1992,5435.723363,543.242299,357.513542,239.87249,-1007.380418,5568.971275,6576.351694
1993,5526.864571,564.394935,357.110977,238.085054,-991.367833,5695.087704,6686.455537
1994,5606.649249,567.48616,369.768506,238.734867,-1005.222025,5777.416756,6782.638781


**DATASET 6: Coal vs EV Emissions Differences By Country** \
_dataset source and description_

In [119]:
# Reads the CSV file
coal_vs_EV_emissions = pd.read_csv("Resources/coal_power_vs_ev_emissions_with_difference.csv")

# Groups the records by country
coal_vs_EV_emissions_by_country = coal_vs_EV_emissions.groupby(by="Country")

# Sums every numeric column per country; the summed Year and per-EV columns are not meaningful totals
coal_vs_EV_emissions_by_country.sum()

Country,Year,Coal Power Emissions (metric tons CO2),Number of EVs,Emissions Saved per EV (metric tons CO2/year),Total Emissions Saved by EVs (metric tons CO2),Difference (Coal Emissions - EV Savings) (metric tons CO2)
China,6060,23700000000,15500000,13.8,71300000,23628700000
India,6060,7500000000,4500000,13.8,20700000,7479300000
United States,6060,3600000000,4900000,13.8,22540000,3577460000


**EXTRACTING DATAFRAMES FOR USE** \
The %run magic command may be added to another .ipynb file to execute this notebook and load its dataframes there

In [120]:
# EXECUTED FROM NEW NOTEBOOK OPTION
    # Runs all the code in vehicle_fuel_emissions_data.ipynb and makes its dataframes available in the other notebook
    # To use this option, paste the "%run vehicle_fuel_emissions_data.ipynb" command into the notebook that will execute this notebook

# %run vehicle_fuel_emissions_data.ipynb