# Predicting the price of Electric Cars

This dataset contains information on the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered with the Washington State Department of Licensing (DOL).

* VIN (1-10) - The first 10 characters of each vehicle's Vehicle Identification Number (VIN).
* County - The county in which the registered owner resides.
* City - The city in which the registered owner resides.
* State - The state in which the registered owner resides.
* ZIP Code - The 5-digit zip code in which the registered owner resides.
* Model Year - The model year of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
* Make - The manufacturer of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
* Model - The model of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
* Electric Vehicle Type - This distinguishes the vehicle as all-electric or a plug-in hybrid.
* Clean Alternative Fuel Vehicle (CAFV) Eligibility - This categorizes vehicles as Clean Alternative Fuel Vehicles (CAFVs) based on the fuel requirement and electric-only range requirement.
* Electric Range - Describes how far a vehicle can travel purely on its electric charge.
* Base MSRP - This is the lowest Manufacturer's Suggested Retail Price (MSRP) for any trim level of the model in question.
* Legislative District - The specific section of Washington State that the vehicle's owner resides in, as represented in the state legislature.
* DOL Vehicle ID - Unique number assigned to each vehicle by the Department of Licensing for identification purposes.
* Vehicle Location - The center of the ZIP Code for the registered vehicle.
* Electric Utility - This is the electric power retail service territory serving the address of the registered vehicle.
* Expected Price - The expected price of the vehicle, in thousands of US dollars (the column is labelled `Expected Price ($1k)`). This is the value to predict.

### Step 1: Import Libraries

In [21]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ydata_profiling import ProfileReport



### Step 2: Load and Explore Data

In [4]:
cars_df = pd.read_csv('datasets/Electric_cars_dataset.csv')
cars_df.head()

Unnamed: 0,ID,VIN (1-10),County,City,State,ZIP Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,Expected Price ($1k)
0,EV33174,5YJ3E1EC6L,Snohomish,LYNNWOOD,WA,98037.0,2020.0,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,308,0,32.0,109821694,POINT (-122.287614 47.83874),PUGET SOUND ENERGY INC,50.0
1,EV40247,JN1AZ0CP8B,Skagit,BELLINGHAM,WA,98229.0,2011.0,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,73,0,40.0,137375528,POINT (-122.414936 48.709388),PUGET SOUND ENERGY INC,15.0
2,EV12248,WBY1Z2C56F,Pierce,TACOMA,WA,98422.0,2015.0,BMW,I3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,81,0,27.0,150627382,POINT (-122.396286 47.293138),BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM...,18.0
3,EV55713,1G1RD6E44D,King,REDMOND,WA,98053.0,2013.0,CHEVROLET,VOLT,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,38,0,45.0,258766301,POINT (-122.024951 47.670286),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),33.9
4,EV28799,1G1FY6S05K,Pierce,PUYALLUP,WA,98375.0,2019.0,CHEVROLET,BOLT EV,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,238,0,25.0,296998138,POINT (-122.321062 47.103797),BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM...,41.78
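
The `Vehicle Location` values in the preview above are stored as text of the form `POINT (longitude latitude)`. If numeric coordinates are wanted later, one possible way to split them is sketched below. The two sample strings are copied from the preview; the names `points` and `coords` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two location strings from the preview plus a missing value
points = pd.Series([
    'POINT (-122.287614 47.83874)',
    'POINT (-122.414936 48.709388)',
    None,
])

# Named groups become the column names of the extracted frame;
# rows that do not match (including missing values) come back as NaN
pattern = r'POINT \((?P<longitude>-?\d+(?:\.\d+)?) (?P<latitude>-?\d+(?:\.\d+)?)\)'
coords = points.str.extract(pattern).astype(float)

print(coords)
```

On the real frame the same call could be applied to the location column, with the result joined back by index.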


In [7]:
df = cars_df.copy()
df.columns = df.columns.str.lower().str.replace(' ', '_')

In [8]:
df.head(1)

Unnamed: 0,id,vin_(1-10),county,city,state,zip_code,model_year,make,model,electric_vehicle_type,clean_alternative_fuel_vehicle_(cafv)_eligibility,electric_range,base_msrp,legislative_district,dol_vehicle_id,vehicle_location,electric_utility,expected_price_($1k)
0,EV33174,5YJ3E1EC6L,Snohomish,LYNNWOOD,WA,98037.0,2020.0,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,308,0,32.0,109821694,POINT (-122.287614 47.83874),PUGET SOUND ENERGY INC,50


In [12]:
df.rename(columns={
    'vin_(1-10)': 'vin',
    'clean_alternative_fuel_vehicle_(cafv)_eligibility': 'clean_alternative_fuel_vehicle_eligibility',
    'expected_price_($1k)':'expected_price'
},inplace=True)
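
The `str.lower().str.replace(' ', '_')` chain only changes case and spaces, so parentheses, hyphens, and `$` stay in the labels and have to be renamed by hand, as done above. A minimal sketch of a one-pass alternative follows. `snake_case` is a hypothetical helper, not part of this notebook, and it produces `vin_1_10` and `expected_price_1k` rather than the shorter names chosen above.

```python
import re
import pandas as pd

def snake_case(name: str) -> str:
    """Lowercase a label and collapse each run of non-alphanumerics to '_'."""
    # Hypothetical helper for illustration only
    return re.sub(r'[^0-9a-z]+', '_', name.lower()).strip('_')

# Empty frame carrying three of the original column labels
demo = pd.DataFrame(columns=['VIN (1-10)', 'ZIP Code', 'Expected Price ($1k)'])
demo.columns = [snake_case(c) for c in demo.columns]

print(list(demo.columns))
# ['vin_1_10', 'zip_code', 'expected_price_1k']
```

The manual `rename` remains the better choice when specific short names are wanted, since it makes each mapping explicit.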

In [13]:
df.head(1)

Unnamed: 0,id,vin,county,city,state,zip_code,model_year,make,model,electric_vehicle_type,clean_alternative_fuel_vehicle_eligibility,electric_range,base_msrp,legislative_district,dol_vehicle_id,vehicle_location,electric_utility,expected_price
0,EV33174,5YJ3E1EC6L,Snohomish,LYNNWOOD,WA,98037.0,2020.0,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,308,0,32.0,109821694,POINT (-122.287614 47.83874),PUGET SOUND ENERGY INC,50


In [15]:
df.shape

(64353, 18)

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64353 entries, 0 to 64352
Data columns (total 18 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   id                                          64353 non-null  object 
 1   vin                                         64353 non-null  object 
 2   county                                      64349 non-null  object 
 3   city                                        64344 non-null  object 
 4   state                                       64342 non-null  object 
 5   zip_code                                    64347 non-null  float64
 6   model_year                                  64346 non-null  float64
 7   make                                        64349 non-null  object 
 8   model                                       64340 non-null  object 
 9   electric_vehicle_type                       64353 non-null  object 
 10  clean_alte

In [17]:
df.describe(include='all')

Unnamed: 0,id,vin,county,city,state,zip_code,model_year,make,model,electric_vehicle_type,clean_alternative_fuel_vehicle_eligibility,electric_range,base_msrp,legislative_district,dol_vehicle_id,vehicle_location,electric_utility,expected_price
count,64353,64353,64349,64344,64342,64347.0,64346.0,64349,64340,64353,64353,64353.0,64353.0,64184.0,64353.0,63843,63631,64353.0
unique,64353,5644,139,544,38,,,34,107,2,3,,,,,668,68,210.0
top,EV33174,5YJYGDEE9M,King,SEATTLE,WA,,,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,,,,,POINT (-122.122018 47.678465),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),69.0
freq,1,340,33552,11887,64168,,,27903,13138,47869,39948,,,,,1712,22555,4816.0
mean,,,,,,98143.452888,2018.186212,,,,,106.948985,2524.990754,29.951904,197290500.0,,,
std,,,,,,2856.064329,2.726742,,,,,104.093919,12402.895104,14.661124,106946600.0,,,
min,,,,,,745.0,1993.0,,,,,0.0,0.0,0.0,4385.0,,,
25%,,,,,,98052.0,2017.0,,,,,14.0,0.0,19.0,137286500.0,,,
50%,,,,,,98121.0,2018.0,,,,,73.0,0.0,34.0,175377600.0,,,
75%,,,,,,98370.0,2021.0,,,,,215.0,0.0,43.0,229903900.0,,,


In [18]:
df.duplicated().sum()

np.int64(0)
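
Every `id` value is unique (64353 unique values in 64353 rows), so no two complete rows can match and this check returns 0 by construction. Below is a sketch of looking for repeats while ignoring the identifier column. It runs on a toy frame with made-up values, so it says nothing about whether the real data contains such repeats.

```python
import pandas as pd

# Toy frame: rows 0 and 1 differ only in their id
toy = pd.DataFrame({
    'id': ['EV1', 'EV2', 'EV3'],
    'vin': ['AAA', 'AAA', 'BBB'],
    'model': ['LEAF', 'LEAF', 'VOLT'],
    'electric_range': [73, 73, 38],
})

# Comparing whole rows finds nothing because id is unique
full_row_dupes = int(toy.duplicated().sum())

# Comparing every column except id exposes the repeated record
feature_cols = toy.columns.drop('id').tolist()
dupes_ignoring_id = int(toy.duplicated(subset=feature_cols).sum())

print(full_row_dupes, dupes_ignoring_id)
# 0 1
```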

In [19]:
df.isnull().sum()

id                                              0
vin                                             0
county                                          4
city                                            9
state                                          11
zip_code                                        6
model_year                                      7
make                                            4
model                                          13
electric_vehicle_type                           0
clean_alternative_fuel_vehicle_eligibility      0
electric_range                                  0
base_msrp                                       0
legislative_district                          169
dol_vehicle_id                                  0
vehicle_location                              510
electric_utility                              722
expected_price                                  0
dtype: int64

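* The dataset has 64353 rows and 18 columns.
* 12 columns are stored as `object` (categorical or text) and 6 are numeric. `expected_price` is among the `object` columns: `describe` reports `unique`, `top`, and `freq` for it instead of a mean, so it will likely need converting to a numeric type before modelling.
* There are missing values in the `county`, `city`, `state`, `zip_code`, `model_year`, `make`, `model`, `legislative_district`, `vehicle_location`, and `electric_utility` columns.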
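
These observations can also be derived programmatically instead of being read off the printed output. The sketch below uses a small toy frame with illustrative values. The `'N/A'` entry is an assumption about what a non-numeric price might look like, not something observed in the dataset.

```python
import pandas as pd

# Toy stand-in for df; values are illustrative only
toy = pd.DataFrame({
    'county': ['King', None, 'Pierce'],
    'model_year': [2020.0, 2011.0, None],
    'electric_range': [308, 73, 81],
    'expected_price': ['50', '15', 'N/A'],  # assumed non-numeric marker
})

# Split the columns by dtype
numeric_cols = toy.select_dtypes(include='number').columns.tolist()
other_cols = toy.select_dtypes(exclude='number').columns.tolist()

# Columns that contain at least one missing value
missing = toy.isnull().sum()
cols_with_missing = missing[missing > 0].index.tolist()

# Coerce the price to numbers; entries that cannot be parsed become NaN
price = pd.to_numeric(toy['expected_price'], errors='coerce')
bad_prices = toy.loc[price.isna(), 'expected_price'].tolist()

print(numeric_cols, other_cols, cols_with_missing, bad_prices)
```

Listing the entries that fail to parse before overwriting the column shows what is being dropped or imputed.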

In [22]:
# Create a ydata-profiling report
profile = ProfileReport(df, title='Electric Cars in Washington State Department of Licensing Report')

In [23]:
#profile.to_file(output_file='profile-report.html')

Summarize dataset: 100%|█████████████| 64/64 [00:03<00:00, 18.96it/s, Completed]
Generate report structure: 100%|██████████████████| 1/1 [00:03<00:00,  3.39s/it]
Render HTML: 100%|████████████████████████████████| 1/1 [00:00<00:00,  1.79it/s]
Export report to file: 100%|█████████████████████| 1/1 [00:00<00:00, 217.73it/s]


### Step 3: Data Cleaning