# EV Population Data - Audit

**Purpose:**
This notebook inspects the raw EV dataset to understand its structure, data quality issues, and limitations. No data cleaning or analysis is performed here.

In [9]:
import pandas as pd

In [10]:
df = pd.read_csv("../data/raw/Electric_Vehicle_Population_Data.csv")
df.head()

Unnamed: 0,VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
0,5YJYGDEE8L,Thurston,Tumwater,WA,98501.0,2020,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,291.0,35.0,124633715,POINT (-122.89165 47.03954),PUGET SOUND ENERGY INC,53067010000.0
1,5YJXCAE2XJ,Snohomish,Bothell,WA,98021.0,2018,TESLA,MODEL X,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238.0,1.0,474826075,POINT (-122.18384 47.8031),PUGET SOUND ENERGY INC,53061050000.0
2,5YJ3E1EBXK,King,Kent,WA,98031.0,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220.0,47.0,280307233,POINT (-122.17743 47.41185),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033030000.0
3,7SAYGDEE4T,King,Issaquah,WA,98027.0,2026,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,0.0,41.0,280786565,POINT (-122.03439 47.5301),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033020000.0
4,WAUUPBFF9G,King,Seattle,WA,98103.0,2016,AUDI,A3,Plug-in Hybrid Electric Vehicle (PHEV),Not eligible due to low battery range,16.0,43.0,198988891,POINT (-122.35436 47.67596),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033000000.0


In [11]:
print(df.shape)  # a bare expression is echoed only when it is the last line of a cell
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 270262 entries, 0 to 270261
Data columns (total 16 columns):
 #   Column                                             Non-Null Count   Dtype  
---  ------                                             --------------   -----  
 0   VIN (1-10)                                         270262 non-null  object 
 1   County                                             270252 non-null  object 
 2   City                                               270252 non-null  object 
 3   State                                              270262 non-null  object 
 4   Postal Code                                        270252 non-null  float64
 5   Model Year                                         270262 non-null  int64  
 6   Make                                               270262 non-null  object 
 7   Model                                              270262 non-null  object 
 8   Electric Vehicle Type                              270262 non-null  object

In [12]:
df.describe(include="all")

Unnamed: 0,VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
count,270262,270252,270252,270262,270252.0,270262.0,270262,270262,270262,270262,270257.0,269613.0,270262.0,270174,270252,270252.0
unique,16415,242,864,51,,,47,183,2,3,,,,1080,77,
top,7SAYGDEE7P,King,Seattle,WA,,,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not b...,,,,POINT (-122.13158 47.67858),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),
freq,1171,133903,42125,269613,,,111049,57335,215859,169872,,,,6588,96367,
mean,,,,,98176.713849,2021.964468,,,,,40.386332,28.850107,244119900.0,,,52972610000.0
std,,,,,2569.741818,3.05396,,,,,79.342202,14.895435,64308720.0,,,1625614000.0
min,,,,,1030.0,1999.0,,,,,0.0,1.0,4385.0,,,1001020000.0
25%,,,,,98052.0,2021.0,,,,,0.0,17.0,219441400.0,,,53033010000.0
50%,,,,,98133.0,2023.0,,,,,0.0,32.0,261505100.0,,,53033030000.0
75%,,,,,98382.0,2024.0,,,,,33.0,42.0,277621000.0,,,53053940000.0


In [13]:
df.isna().sum().sort_values(ascending=False)

Legislative District                                 649
Vehicle Location                                      88
County                                                10
City                                                  10
2020 Census Tract                                     10
Electric Utility                                      10
Postal Code                                           10
Electric Range                                         5
Model                                                  0
Make                                                   0
Model Year                                             0
State                                                  0
VIN (1-10)                                             0
Clean Alternative Fuel Vehicle (CAFV) Eligibility      0
Electric Vehicle Type                                  0
DOL Vehicle ID                                         0
dtype: int64

In [14]:
df.duplicated().sum()

np.int64(0)

## Initial Observations

- Dataset contains ~270k EV registrations with 16 columns
- Each row represents a single registered electric vehicle
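- Several location-related fields (Postal Code, 2020 Census Tract) are stored as float64 because pandas upcasts integer columns that contain NaN; for Postal Code the minimum of 1030.0 also suggests that leading zeros have been dropped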
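- Electric Range has only 5 missing values, but its median is 0, so at least half of the rows record a range of 0; since the most common CAFV value is "Eligibility unknown as battery range has not b...", 0 likely acts as a placeholder rather than a true range, possibly varying by vehicle type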
- Model Year is available, but registration date is not, which limits true adoption trend analysis
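
The float-typed Postal Code column and the share of rows with an Electric Range of 0 (the median in the `describe` output) can be probed with a sketch like the one below. It runs on a small hypothetical frame named `toy` that only mimics the dataset's column names; the values are made up for illustration. It is one possible way to inspect these fields, not a cleaning step.

```python
import pandas as pd

# Toy rows that mimic the audit's columns; values are illustrative, not real data.
toy = pd.DataFrame(
    {
        "Postal Code": [98501.0, 1030.0, None],
        "Electric Vehicle Type": [
            "Battery Electric Vehicle (BEV)",
            "Battery Electric Vehicle (BEV)",
            "Plug-in Hybrid Electric Vehicle (PHEV)",
        ],
        "Electric Range": [291.0, 0.0, 16.0],
    }
)

# float64 here is a side effect of NaN; the nullable Int64 dtype keeps the
# integers and represents the gap as <NA>.
postal_int = toy["Postal Code"].astype("Int64")

# Zero-padded strings restore leading zeros that numeric storage drops.
postal_str = postal_int.astype("string").str.zfill(5)

# Share of rows whose recorded range is exactly 0, split by vehicle type.
zero_share = (
    toy["Electric Range"].eq(0).groupby(toy["Electric Vehicle Type"]).mean()
)

print(postal_int.dtype)
print(postal_str.tolist())
print(zero_share)
```

On the real data, the same expressions would apply to `df` in place of `toy`. A large gap in the zero share between BEV and PHEV rows would be a hint that 0 encodes "unknown" rather than a measured range.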

## Missing Values & Duplicates

- Overall, missing data is minimal (<0.3% of rows)
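- Legislative District has the highest missing count (649), which equals the number of rows whose State is not WA (270,262 − 269,613 = 649), so the gaps likely correspond to vehicles registered to out-of-state addresses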
- Location-related fields have small, consistent missing counts (10–88 rows)
- Electric Range is missing for only 5 vehicles
- No fully identical rows were detected; `df.duplicated()` compares all columns, so this does not by itself confirm that DOL Vehicle ID is unique
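
The checks behind these points can be sketched as follows. The frame `toy` and its values are hypothetical and only reuse the dataset's column names; the labels `state_group` and `district` are introduced here for illustration. The sketch shows one way to compare missing districts against State, to measure the share of rows with any missing value, and to look for repeated identifiers.

```python
import pandas as pd

# Toy rows with the audit's column names; values are illustrative, not real data.
toy = pd.DataFrame(
    {
        "State": ["WA", "WA", "CA", "TX"],
        "Legislative District": [35.0, 1.0, None, None],
        "DOL Vehicle ID": [101, 102, 103, 103],
        "Model Year": [2020, 2018, 2019, 2021],
    }
)

# Label each row as WA or non-WA, and each district as missing or present.
state_group = toy["State"].where(toy["State"].eq("WA"), "non-WA").rename("state_group")
district = (
    toy["Legislative District"]
    .isna()
    .map({True: "missing", False: "present"})
    .rename("district")
)

# Cross-tabulate to see whether missing districts line up with non-WA rows.
missing_by_state = pd.crosstab(state_group, district)

# Share of rows with at least one missing value in any column.
rows_with_gap = toy.isna().any(axis=1).mean()

# duplicated() flags only rows identical in every column; restricting it to
# the identifier column catches repeated IDs whose other attributes differ.
full_row_dupes = int(toy.duplicated().sum())
id_dupes = int(toy.duplicated(subset=["DOL Vehicle ID"]).sum())

print(missing_by_state)
print(rows_with_gap, full_row_dupes, id_dupes)
```

In the toy frame the two rows sharing an ID differ in State and Model Year, so the full-row check reports nothing while the ID-only check flags one repeat. That is the case a full-row duplicate count cannot rule out.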
