# Exploratory Data Analysis (EDA)

**Objective:**  
Explore the cleaned EV population dataset to understand adoption trends,
vehicle characteristics, and geographic distribution. This analysis
identifies patterns and relationships that will guide deeper analysis
and feature engineering.

**Data Source:**  
Processed dataset generated in `02_cleaning.ipynb`.


In [15]:
import pandas as pd
import numpy as np

In [16]:
df = pd.read_csv("../data/cleaned/ev_population_clean.csv")
df.head()

Unnamed: 0,vin_1_10,county,city,state,postal_code,model_year,make,model,electric_vehicle_type,clean_alternative_fuel_vehicle_cafv_eligibility,electric_range,legislative_district,dol_vehicle_id,vehicle_location,electric_utility,2020_census_tract
0,5YJYGDEE8L,Thurston,Tumwater,WA,98501.0,2020,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,291.0,35.0,124633715,POINT (-122.89165 47.03954),PUGET SOUND ENERGY INC,53067010000.0
1,5YJXCAE2XJ,Snohomish,Bothell,WA,98021.0,2018,TESLA,MODEL X,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238.0,1.0,474826075,POINT (-122.18384 47.8031),PUGET SOUND ENERGY INC,53061050000.0
2,5YJ3E1EBXK,King,Kent,WA,98031.0,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220.0,47.0,280307233,POINT (-122.17743 47.41185),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033030000.0
3,7SAYGDEE4T,King,Issaquah,WA,98027.0,2026,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,0.0,41.0,280786565,POINT (-122.03439 47.5301),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033020000.0
4,WAUUPBFF9G,King,Seattle,WA,98103.0,2016,AUDI,A3,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,16.0,43.0,198988891,POINT (-122.35436 47.67596),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033000000.0


In [17]:
df.shape  # (rows, columns); not echoed because it is not the cell's last expression
df.info()
df.describe(include="all")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270262 entries, 0 to 270261
Data columns (total 16 columns):
 #   Column                                           Non-Null Count   Dtype  
---  ------                                           --------------   -----  
 0   vin_1_10                                         270262 non-null  object 
 1   county                                           270252 non-null  object 
 2   city                                             270252 non-null  object 
 3   state                                            270262 non-null  object 
 4   postal_code                                      270252 non-null  float64
 5   model_year                                       270262 non-null  int64  
 6   make                                             270262 non-null  object 
 7   model                                            270262 non-null  object 
 8   electric_vehicle_type                            270262 non-null  object 
 9   clean_alternati

Unnamed: 0,vin_1_10,county,city,state,postal_code,model_year,make,model,electric_vehicle_type,clean_alternative_fuel_vehicle_cafv_eligibility,electric_range,legislative_district,dol_vehicle_id,vehicle_location,electric_utility,2020_census_tract
count,270262,270252,270252,270262,270252.0,270262.0,270262,270262,270262,270262,270257.0,269613.0,270262.0,270174,270252,270252.0
unique,16415,242,864,51,,,47,183,2,3,,,,1080,77,
top,7SAYGDEE7P,King,Seattle,WA,,,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,,,,POINT (-122.13158 47.67858),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),
freq,1171,133903,42125,269613,,,111049,57335,215859,169872,,,,6588,96367,
mean,,,,,98176.713849,2021.964468,,,,,40.386332,28.850107,244119900.0,,,52972610000.0
std,,,,,2569.741818,3.05396,,,,,79.342202,14.895435,64308720.0,,,1625614000.0
min,,,,,1030.0,1999.0,,,,,0.0,1.0,4385.0,,,1001020000.0
25%,,,,,98052.0,2021.0,,,,,0.0,17.0,219441400.0,,,53033010000.0
50%,,,,,98133.0,2023.0,,,,,0.0,32.0,261505100.0,,,53033030000.0
75%,,,,,98382.0,2024.0,,,,,33.0,42.0,277621000.0,,,53053940000.0


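The dataset contains 270,262 records and 16 columns covering temporal,
geographic, and vehicle-level attributes. Missing values are rare:
`legislative_district` has 649 gaps (equal to the number of records
registered outside WA), `vehicle_location` has 88, `county`, `city`,
`postal_code`, `electric_utility`, and `2020_census_tract` have 10 each,
and `electric_range` has 5. These gaps reflect limitations in source
reporting rather than data quality errors.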
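
The per-column gaps described above can be tabulated directly. The following is a minimal sketch, not part of the original notebook: the helper name `missing_summary` and the toy frame `sample` are hypothetical, and in the notebook the function would be called on `df` instead.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages per column, largest first."""
    n_missing = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (n_missing / len(frame) * 100).round(3),
    })
    # Keep only the columns that actually contain gaps
    return summary[summary["n_missing"] > 0].sort_values(
        "n_missing", ascending=False
    )


# Hypothetical toy data standing in for the cleaned EV dataset
sample = pd.DataFrame({
    "state": ["WA", "WA", "OR", "WA"],
    "county": ["King", None, "Multnomah", "Pierce"],
    "legislative_district": [43.0, 1.0, None, None],
})

result = missing_summary(sample)
print(result)
```

Sorting by count puts the most affected fields first, which makes it easy to check whether gaps sit in administrative columns or in analysis variables.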

In [18]:
df["electric_vehicle_type"].value_counts(normalize=True)

electric_vehicle_type
Battery Electric Vehicle (BEV)            0.798703
Plug-in Hybrid Electric Vehicle (PHEV)    0.201297
Name: proportion, dtype: float64

In [19]:
df["make"].value_counts().head(10)

make
TESLA         111049
CHEVROLET      19032
NISSAN         15963
FORD           14819
KIA            13470
TOYOTA         11159
BMW            11036
HYUNDAI         9651
RIVIAN          8475
VOLKSWAGEN      7358
Name: count, dtype: int64

In [20]:
df["model_year"].value_counts().sort_index()

model_year
1999        2
2000        8
2002        1
2003        1
2008       20
2010       23
2011      603
2012     1402
2013     3989
2014     3223
2015     4430
2016     5139
2017     8459
2018    14007
2019    10811
2020    12099
2021    20628
2022    29622
2023    59324
2024    49138
2025    35954
2026    11379
Name: count, dtype: int64

Battery Electric Vehicles (BEVs) dominate the dataset at about 80% of records, with Plug-in Hybrid Electric Vehicles (PHEVs) making up the remaining 20%. A limited number of manufacturers account for a large proportion of EV registrations: Tesla alone has 111,049 of 270,262 records (about 41%).
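
One way to quantify this concentration is a cumulative share over the most frequent values. This is a sketch under assumptions: `top_n_share` is a hypothetical helper and `makes` is made-up data; in the notebook the input would be `df["make"]`.

```python
import pandas as pd


def top_n_share(values: pd.Series, n: int = 10) -> pd.DataFrame:
    """Share and cumulative share of the n most frequent values."""
    # value_counts(normalize=True) returns proportions, sorted descending
    shares = values.value_counts(normalize=True).head(n)
    return pd.DataFrame({
        "share": shares,
        "cumulative_share": shares.cumsum(),
    })


# Hypothetical toy data: 10 vehicles across three makes
makes = pd.Series(["TESLA"] * 5 + ["NISSAN"] * 3 + ["FORD"] * 2)

top = top_n_share(makes, n=2)
print(top)
```

The same helper applies unchanged to `df["county"]` to measure geographic concentration.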

In [21]:
ev_by_year = df.groupby("model_year").size()
ev_by_year

model_year
1999        2
2000        8
2002        1
2003        1
2008       20
2010       23
2011      603
2012     1402
2013     3989
2014     3223
2015     4430
2016     5139
2017     8459
2018    14007
2019    10811
2020    12099
2021    20628
2022    29622
2023    59324
2024    49138
2025    35954
2026    11379
dtype: int64

In [30]:
df.groupby(["model_year", "electric_vehicle_type"]).size().unstack(fill_value=0)

model_year,Battery Electric Vehicle (BEV),Plug-in Hybrid Electric Vehicle (PHEV)
1999,2,0
2000,8,0
2002,1,0
2003,1,0
2008,20,0
2010,21,2
2011,544,59
2012,619,783
2013,2481,1508
2014,1560,1663


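Record counts rise steeply with model year, from 603 for 2011 to a peak
of 59,324 for 2023, indicating accelerating adoption. Counts for 2024
through 2026 are lower, which likely reflects model years that are still
entering the fleet rather than declining demand. BEVs increasingly
dominate newer model years, suggesting a shift away from plug-in
hybrids; in the early years shown above the split was closer, with PHEVs
outnumbering BEVs for 2012 and 2014.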
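
Raw counts make the BEV/PHEV shift hard to read because totals grow so quickly. A row-normalized crosstab expresses each model year as shares instead. This is a minimal sketch: `type_share_by_year` is a hypothetical helper and `sample` is made-up data with the same column names as `df`.

```python
import pandas as pd

BEV = "Battery Electric Vehicle (BEV)"
PHEV = "Plug-in Hybrid Electric Vehicle (PHEV)"


def type_share_by_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of each vehicle type within every model year (rows sum to 1)."""
    return pd.crosstab(
        frame["model_year"],
        frame["electric_vehicle_type"],
        normalize="index",  # normalize across each row, i.e. per model year
    )


# Hypothetical toy data standing in for the cleaned EV dataset
sample = pd.DataFrame({
    "model_year": [2012, 2012, 2012, 2023, 2023, 2023, 2023],
    "electric_vehicle_type": [BEV, PHEV, PHEV, BEV, BEV, BEV, PHEV],
})

share_by_year = type_share_by_year(sample)
print(share_by_year)
```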


In [24]:
df["county"].value_counts().head(10)

county
King         133903
Snohomish     33531
Pierce        22213
Clark         16553
Thurston       9852
Kitsap         9057
Spokane        7593
Whatcom        6620
Benton         3792
Skagit         3166
Name: count, dtype: int64

In [26]:
df.groupby("county").size().sort_values(ascending=False).head(10)

county
King         133903
Snohomish     33531
Pierce        22213
Clark         16553
Thurston       9852
Kitsap         9057
Spokane        7593
Whatcom        6620
Benton         3792
Skagit         3166
dtype: int64

In [27]:
df["electric_range"].describe()

count    270257.000000
mean         40.386332
std          79.342202
min           0.000000
25%           0.000000
50%           0.000000
75%          33.000000
max         337.000000
Name: electric_range, dtype: float64

In [28]:
df.groupby("model_year")["electric_range"].mean()

model_year
1999     74.000000
2000     58.000000
2002     95.000000
2003     95.000000
2008    209.000000
2010    232.391304
2011     71.434494
2012     59.727532
2013     78.468288
2014     78.598200
2015     95.922348
2016    101.997860
2017    117.755054
2018    156.884058
2019    176.246323
2020    237.886850
2021     12.569517
2022      4.733711
2023      3.842239
2024      7.460764
2025      7.484089
2026      1.920724
Name: electric_range, dtype: float64

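The average electric range across all EV types drops abruptly after
model year 2020, from 237.9 to 12.6 for 2021, and stays low afterwards.
This appears to be a reporting artifact rather than a real decline: the
median range across the dataset is 0, and a range of 0 appears alongside
the eligibility value "Eligibility unknown as battery range has not
b..." (truncated in the output above), so zeros largely seem to mark
unreported ranges. The inclusion of Plug-in Hybrid Electric Vehicles
also lowers the combined average.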
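
To check how much of the drop comes from zero-valued ranges, the share of zeros can be computed per model year. This is a sketch, not the notebook's method: `zero_range_share` is a hypothetical helper, `sample` is made-up data, and treating 0 as "not reported" is an assumption based on the eligibility text in the `df.head()` output.

```python
import pandas as pd


def zero_range_share(frame: pd.DataFrame) -> pd.Series:
    """Fraction of records per model year whose electric_range equals 0."""
    # NaN compares as False here, so truly missing values are not counted
    is_zero = frame["electric_range"].eq(0)
    return is_zero.groupby(frame["model_year"]).mean()


# Hypothetical toy data standing in for the cleaned EV dataset
sample = pd.DataFrame({
    "model_year": [2020, 2020, 2022, 2022, 2022, 2022],
    "electric_range": [291.0, 258.0, 0.0, 0.0, 0.0, 21.0],
})

zero_share = zero_range_share(sample)
print(zero_share)
```

A share close to 1 for recent model years would support reading the low averages as a reporting gap.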

In [32]:
df.groupby(["model_year", "electric_vehicle_type"])["electric_range"].mean().unstack(fill_value=0)

model_year,Battery Electric Vehicle (BEV),Plug-in Hybrid Electric Vehicle (PHEV)
1999,74.0,0.0
2000,58.0,0.0
2002,95.0,0.0
2003,95.0,0.0
2008,209.0,0.0
2010,245.0,100.0
2011,75.386029,35.0
2012,108.048465,21.527458
2013,110.485691,25.79244
2014,127.021795,33.173782


When electric range is analyzed by vehicle type, a clear divergence
emerges. Battery Electric Vehicles exhibit a strong increase in average
range through 2020, reflecting advances in battery technology. Plug-in
Hybrid Electric Vehicles maintain a relatively stable and low
electric-only range across years, consistent with their design intent.
Apparent declines in BEV range in recent model years are attributable to
incomplete reporting rather than true performance regression. The 0.0
entries in the PHEV column before 2010 come from `fill_value=0` and mark
years with no PHEV records, not a measured range of zero.
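
A variant of the grouped mean that ignores zero ranges and leaves empty groups as `NaN` keeps "no data" distinct from "zero range". This is a minimal sketch under the assumption that a range of 0 means the value was not reported; `mean_known_range` and `sample` are hypothetical names, and the short type labels are used only for brevity.

```python
import pandas as pd


def mean_known_range(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean electric range by model year and type, ignoring zero ranges."""
    # Assumption: 0 marks an unreported range, so mask it to NaN first
    known = frame.assign(
        electric_range=frame["electric_range"].where(frame["electric_range"] > 0)
    )
    return (
        known.groupby(["model_year", "electric_vehicle_type"])["electric_range"]
        .mean()      # NaN values are skipped by default
        .unstack()   # no fill_value: absent or all-missing groups stay NaN
    )


# Hypothetical toy data standing in for the cleaned EV dataset
sample = pd.DataFrame({
    "model_year": [2020, 2020, 2020, 2022, 2022, 2022],
    "electric_vehicle_type": ["BEV", "BEV", "PHEV", "BEV", "BEV", "PHEV"],
    "electric_range": [300.0, 200.0, 30.0, 0.0, 0.0, 20.0],
})

range_by_type = mean_known_range(sample)
print(range_by_type)
```

In the toy data the 2022 BEV cell comes out as `NaN` rather than 0, which signals that no range was reported for that group.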

## Key Findings from EDA

- EV adoption accelerates sharply in recent model years, with record counts peaking at model year 2023.
- Battery Electric Vehicles increasingly dominate the market relative to Plug-in Hybrid Electric Vehicles.
- A small number of manufacturers and counties account for a disproportionate share of EV registrations.
- Battery Electric Vehicle range increases substantially through model year 2020, reflecting advances in battery technology, while Plug-in Hybrid Electric Vehicle range remains relatively stable. Apparent declines in recent years are driven by incomplete reporting rather than true performance regression.
- Missing values are concentrated in administrative fields and do not materially affect high-level trend analysis.

These findings inform the feature engineering and deeper analytical steps that follow.