# COGS 108 - Data Checkpoint

# Names

- Aditya Sriram
- Nicole Liu
- Weston Chester
- Sophia Conti
- Katherine Gao

<a id='research_question'></a>
# Research Question

Is California on track to reach its goal of 100% zero-emission vehicle (ZEV) sales by 2035?

Hypothesis: Based on past sales data, we believe that California is on track to meet its goal of 100% zero-emission vehicle sales by 2035.

# Dataset(s)

- Dataset Name: New_ZEV_Sales_Last_updated_04-21-2023_ada 
- Link to the dataset: https://www.energy.ca.gov/files/zev-and-infrastructure-stats-data
- Number of observations: 18136

This dataset is a CSV file containing the number of ZEVs (zero-emission vehicles) sold per year. Its columns include the year, fuel type, county of sale, make and model of the car, and the total number of vehicles sold per year.

In order to use this dataset alongside the next, we plan on grouping by county and then merging the two sets together.
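The group-then-merge plan could look roughly like the sketch below. It uses toy data, and the `registrations` frame and its `Total Registrations` column are assumptions made for illustration, not the actual second dataset. Grouping by year as well as county is also an assumption about how the two sets line up.

```python
import pandas as pd

# Toy stand-in for the ZEV sales data (column names match the real file).
sales = pd.DataFrame({
    'Data Year': [2021, 2021, 2021, 2021],
    'County': ['Alameda', 'Alameda', 'Yuba', 'Yuba'],
    'Number of Vehicles': [5, 7, 3, 16],
})

# Hypothetical second dataset: total registrations per county and year.
# The column name 'Total Registrations' is assumed for illustration only.
registrations = pd.DataFrame({
    'Data Year': [2021, 2021],
    'County': ['Alameda', 'Yuba'],
    'Total Registrations': [1000, 200],
})

# Sum ZEV sales per county and year, collapsing make/model rows.
sales_by_county = (
    sales.groupby(['Data Year', 'County'], as_index=False)['Number of Vehicles'].sum()
)

# Left merge keeps every sales row even if a county has no registration data.
merged = sales_by_county.merge(registrations, on=['Data Year', 'County'], how='left')
print(merged)
```

A left merge is used here so that counties missing from the registrations data show up as NaN rather than silently disappearing.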

# Setup

In [1]:
import pandas as pd

zev_sales = pd.read_csv('zev_sales.csv')

# Data Cleaning

We start with the dataframe for ZEV sales.

In [2]:
# Build a NaN mask, group it by whether 'Data Year' is missing, and count NaNs per column
zev_sales.isna().groupby('Data Year').sum()

Data Year,County,FUEL_TYPE,MAKE,MODEL,Number of Vehicles
False,0,0,0,0,0


Looks like there's no NaN data in this set: the only group is `False` (no missing `Data Year`), and every other column sums to 0. We should still check for placeholder values ('-' or '-99'), though.
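A more direct way to count missing values is to sum the NaN mask per column, without grouping. The sketch below uses a small toy frame (not the real data) that deliberately contains missing entries so the counts are visible.

```python
import numpy as np
import pandas as pd

# Toy frame with two deliberately missing entries.
toy = pd.DataFrame({
    'Data Year': [2020, 2021, 2021],
    'County': ['Alameda', None, 'Yuba'],
    'Number of Vehicles': [5, np.nan, 3],
})

# isna() marks missing cells as True; sum() counts them per column.
nan_counts = toy.isna().sum()

# Total number of missing cells across the whole frame.
total_missing = int(nan_counts.sum())
print(nan_counts)
print(total_missing)
```

Applied to `zev_sales`, an all-zero result would confirm the same conclusion as the grouped check.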

In [3]:
# Count rows containing either placeholder string in any column
zev_sales[zev_sales.eq('-').any(axis=1)].shape[0] + zev_sales[zev_sales.eq('-99').any(axis=1)].shape[0]

0

Looks like there's no '-' or '-99' either! Good job, DMV.

Next, we plan to merge this dataset with a secondary one about car registrations, which only has data through 2021. We therefore filter this dataset (which covers the number of cars sold) to cars sold in 2021 or earlier.
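One caveat with comparing against the string '-99' is that a numeric column would store the placeholder as the number -99, which a string comparison does not match. A possible variant, shown on toy data, checks both forms at once with `isin`. The placeholder list is an assumption about what such values might look like.

```python
import pandas as pd

# Toy frame with placeholders stored both as strings and as a number.
toy = pd.DataFrame({
    'County': ['Alameda', '-', 'Yuba'],
    'MODEL': ['Roadster', 'Ranger', '-99'],
    'Number of Vehicles': [5, -99, 3],
})

# Assumed placeholder values: string forms plus the numeric form.
placeholders = ['-', '-99', -99]

# True for every row that has a placeholder in at least one column.
mask = toy.isin(placeholders).any(axis=1)

# Each flagged row is counted once, even if it holds several placeholders.
n_flagged = int(mask.sum())
print(n_flagged)
```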

In [4]:
zev_sales = zev_sales[zev_sales['Data Year'] <= 2021]

zev_sales

,Data Year,County,FUEL_TYPE,MAKE,MODEL,Number of Vehicles
0,1998,Los Angeles,Electric,Ford,Ranger,1
1,1998,Orange,Electric,Ford,Ranger,1
2,1998,San Bernardino,Electric,Ford,Ranger,2
3,1998,San Mateo,Electric,Ford,Ranger,1
4,1999,Santa Barbara,Electric,Ford,Ranger,1
...,...,...,...,...,...,...
12465,2021,Yuba,PHEV,Kia,Sorento PHEV,1
12466,2021,Yuba,PHEV,Subaru,Crosstrek,3
12467,2021,Yuba,PHEV,Toyota,Prius Prime,16
12468,2021,Yuba,PHEV,Toyota,RAV4 Prime,4


The average age of cars on the road in 2021 is about [12 years](https://www.caranddriver.com/news/a33457915/average-age-vehicles-on-road-12-years/). EV batteries are predicted to last [12-15 years](https://cars.usnews.com/cars-trucks/advice/how-long-do-ev-batteries-last), so the battery may outlast the body of the car itself. We therefore take the expected average lifetime of an electric vehicle to be 12 years. The oldest car in our dataset dates to 1998; under a 12-year lifespan, we no longer expect that car to be in use, so it should be removed from the dataset.

Total automobile registrations in 2021, the most recent year in the other dataset, should on average include cars up to 12 years old. Since 2021 - 12 = 2009, we should only look at ZEVs sold in or after 2009 in order to remain in the proper time frame.
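The cutoff year can also be derived from the two assumptions rather than hard-coded. This is a minimal sketch on toy data; the constant names `LATEST_YEAR` and `ASSUMED_LIFESPAN` are hypothetical and introduced only for illustration.

```python
import pandas as pd

LATEST_YEAR = 2021       # most recent year in the registrations data
ASSUMED_LIFESPAN = 12    # assumed average vehicle lifetime, in years

# Earliest sale year we still expect to be on the road in LATEST_YEAR.
earliest_year = LATEST_YEAR - ASSUMED_LIFESPAN

# Toy sales data spanning years inside and outside the window.
toy = pd.DataFrame({
    'Data Year': [1998, 2008, 2009, 2015, 2021, 2022],
    'Number of Vehicles': [1, 2, 5, 40, 300, 350],
})

# between() is inclusive on both ends by default.
in_window = toy[toy['Data Year'].between(earliest_year, LATEST_YEAR)]
print(earliest_year)
print(in_window)
```

Keeping the lifespan as a named constant makes it easy to rerun the filter if a different lifetime assumption is chosen later.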

In [5]:
zev_sales = zev_sales[zev_sales['Data Year'] >= 2009]

zev_sales

,Data Year,County,FUEL_TYPE,MAKE,MODEL,Number of Vehicles
19,2009,Alameda,Electric,Tesla,Roadster,5
20,2009,Contra Costa,Electric,Tesla,Roadster,1
21,2009,Humboldt,Electric,Ford,Ranger,1
22,2009,Kern,Electric,Tesla,Roadster,1
23,2009,Los Angeles,Electric,MINI,Cooper,6
...,...,...,...,...,...,...
12465,2021,Yuba,PHEV,Kia,Sorento PHEV,1
12466,2021,Yuba,PHEV,Subaru,Crosstrek,3
12467,2021,Yuba,PHEV,Toyota,Prius Prime,16
12468,2021,Yuba,PHEV,Toyota,RAV4 Prime,4
