### Extract, Transform, Load (ETL)

The goal of the project is to collect data that compares the number of electric vehicles in Washington state with the number of available charging stations.

### Electric Vehicle Title and Registration Activity Data

The "Electric Vehicle Title and Registration Activity" dataset for Washington state was extracted from Data.WA.gov (https://data.wa.gov/) through its API, using the sodapy library and an app token. The returned list of dictionaries was converted to a dataframe and saved as "Electric_vehicle_df". For the analysis, the rows were filtered to Battery Electric Vehicle (BEV) in the electric_vehicle_type column and to "Original Registration" in the transaction_type column. Once the data was cleaned, it was stored in "new_EV_df".

### Alternative Fuel Station Data

The "Alternative Fuel Stations" data for Washington state was extracted from the NREL developer site (https://developer.nrel.gov/).
Using the API, the following fields were collected: station_name, zip, city, and state. The final dataset was stored in the dataframe "charging_station_df".


The dataframes "new_EV_df" and "charging_station_df" were merged on ZIP code into one dataframe, "evehicle_cs_data".

The data was loaded into MongoDB.

In [1]:
#import dependencies
import os
import requests
import json
import pandas as pd
import numpy as np
from api_keys import SODAPY_APPTOKEN, api_key
from sodapy import Socrata
import pymongo


### Store the Electric Vehicle Title and Registration Activity API data in a DataFrame

In [2]:
# Domain and dataset identifier for the Socrata API
# Example query: https://data.wa.gov/resource/rpr4-cgyd.json?electric_vehicle_type=Battery Electric Vehicle (BEV)
# API docs and app tokens: https://dev.socrata.com/foundry/data.wa.gov/rpr4-cgyd

socrata_domain = 'data.wa.gov'
socrata_dataset_identifier = 'rpr4-cgyd'


# Read the app token from the SODAPY_APPTOKEN environment variable
# (export it in the shell before starting the notebook)
socrata_token = os.environ.get("SODAPY_APPTOKEN")

In [3]:
# Create the Socrata client and print its connection details
client = Socrata(socrata_domain, socrata_token)
print("Domain: {domain:}\nSession: {session:}\nURI Prefix: {uri_prefix:}".format(**client.__dict__))




Domain: data.wa.gov
Session: <requests.sessions.Session object at 0x000001E950487F10>
URI Prefix: https://


In [4]:
# Fetch the records and store them in a DataFrame.
# Note: without an explicit limit, the Socrata API returns at most 1,000 rows.
results = client.get(socrata_dataset_identifier)
Electric_vehicle_df = pd.DataFrame.from_dict(results)
Electric_vehicle_df.head()


Unnamed: 0,electric_vehicle_type,vin_1_10,model_year,make,model,new_or_used_vehicle,sale_price,transaction_date,transaction_type,transaction_year,...,hb_2042_clean_alternative_fuel_vehicle_cafv_eligibility,meets_2019_hb_2042_electric_range_requirement,meets_2019_hb_2042_sale_date_requirement,meets_2019_hb_2042_sale_price_value_requirement,odometer_reading,odometer_code,transportation_electrification_fee_paid,hybrid_vehicle_electrification_fee_paid,census_tract_2020,date_of_vehicle_sale
0,Battery Electric Vehicle (BEV),WMWXP3C0XM,2021,MINI,Hardtop,Used,0,2022-08-04T00:00:00.000,Registration Renewal,2022,...,"TRANSACTION NOT ELIGIBLE: Non-sale, registrati...",True,False,False,0,Odometer reading is not collected at time of r...,Yes,No,53033021906,
1,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,2018,KIA,Niro,Used,0,2021-11-19T00:00:00.000,Registration Renewal,2021,...,VEHICLE MODEL NOT ELIGIBLE: Low battery range;...,False,False,False,0,Odometer reading is not collected at time of r...,No,Yes,53033002900,
2,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,2018,KIA,Niro,New,26999,2018-12-31T00:00:00.000,Original Title,2018,...,VEHICLE MODEL NOT ELIGIBLE: Low battery range;...,False,False,True,17,Actual Mileage,Not Applicable,Not Applicable,53033002900,2018-12-23T00:00:00.000
3,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,2018,KIA,Niro,New,0,2018-12-31T00:00:00.000,Original Registration,2018,...,VEHICLE MODEL NOT ELIGIBLE: Low battery range;...,False,False,False,0,Odometer reading is not collected at time of r...,No,No,53033002900,
4,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,2018,KIA,Niro,Used,0,2019-12-23T00:00:00.000,Registration Renewal,2019,...,VEHICLE MODEL NOT ELIGIBLE: Low battery range;...,False,False,False,0,Odometer reading is not collected at time of r...,No,Yes,53033002900,
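
The request above relies on the API's default page size, so the dataframe holds only the first page of the dataset. A minimal sketch of offset-based paging follows, assuming the data source accepts `limit` and `offset` keyword arguments (sodapy's `client.get` does). The helper name `fetch_all` and the `fetch_page` callable are hypothetical and introduced only for illustration.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect rows from a paged source until a short or empty page arrives.

    fetch_page is any callable accepting limit= and offset= keyword
    arguments and returning a list of rows (hypothetical interface).
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        # A page shorter than page_size means the source is exhausted.
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Possible use with the notebook's client (not executed here):
# results = fetch_all(
#     lambda limit, offset: client.get(
#         socrata_dataset_identifier, limit=limit, offset=offset, order=":id"
#     )
# )
```

Ordering by `:id` keeps the pages stable between requests. Pulling the full dataset may take noticeably longer than the single-page request used in this notebook.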


In [5]:
Electric_vehicle_df.columns.values

array(['electric_vehicle_type', 'vin_1_10', 'model_year', 'make', 'model',
       'new_or_used_vehicle', 'sale_price', 'transaction_date',
       'transaction_type', 'transaction_year',
       'electric_vehicle_fee_paid', 'county', 'city', 'zip',
       'electric_range', 'base_msrp', 'non_clean_alternative_fuel',
       'vehicle_primary_use', 'state_of_residence', 'dol_vehicle_id',
       'legislative_district',
       'hb_2042_clean_alternative_fuel_vehicle_cafv_eligibility',
       'meets_2019_hb_2042_electric_range_requirement',
       'meets_2019_hb_2042_sale_date_requirement',
       'meets_2019_hb_2042_sale_price_value_requirement',
       'odometer_reading', 'odometer_code',
       'transportation_electrification_fee_paid',
       'hybrid_vehicle_electrification_fee_paid', 'census_tract_2020',
       'date_of_vehicle_sale'], dtype=object)

In [6]:
# Keep only the columns needed for the analysis
EV_df = Electric_vehicle_df[[
    "electric_vehicle_type", "vin_1_10", "make", "model_year", "vehicle_primary_use",
    "county", "city", "zip", "state_of_residence", "transaction_type",
]].copy()
EV_df.head()

Unnamed: 0,electric_vehicle_type,vin_1_10,make,model_year,vehicle_primary_use,county,city,zip,state_of_residence,transaction_type
0,Battery Electric Vehicle (BEV),WMWXP3C0XM,MINI,2021,Passenger,King,WOODINVILLE,98072,WA,Registration Renewal
1,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,KIA,2018,Passenger,King,SEATTLE,98103,WA,Registration Renewal
2,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,KIA,2018,Passenger,King,SEATTLE,98103,WA,Original Title
3,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,KIA,2018,Passenger,King,SEATTLE,98103,WA,Original Registration
4,Plug-in Hybrid Electric Vehicle (PHEV),KNDCM3LD1J,KIA,2018,Passenger,King,SEATTLE,98103,WA,Registration Renewal


In [9]:
# Keep only battery electric vehicles (BEV) with an "Original Registration" transaction
orignal_df = EV_df[(EV_df["electric_vehicle_type"] == "Battery Electric Vehicle (BEV)") & (EV_df["transaction_type"] == "Original Registration")]
orignal_df

Unnamed: 0,electric_vehicle_type,vin_1_10,make,model_year,vehicle_primary_use,county,city,zip,state_of_residence,transaction_type
8,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,Original Registration
14,Battery Electric Vehicle (BEV),5YJ3E1EC9N,TESLA,2022,Passenger,King,BELLEVUE,98006,WA,Original Registration
17,Battery Electric Vehicle (BEV),1G1FZ6S00K,CHEVROLET,2019,Passenger,King,BELLEVUE,98006,WA,Original Registration
24,Battery Electric Vehicle (BEV),1N4AZ0CP4D,NISSAN,2013,Passenger,Snohomish,LYNNWOOD,98087,WA,Original Registration
29,Battery Electric Vehicle (BEV),JN1AZ0CP8B,NISSAN,2011,Passenger,Snohomish,LYNNWOOD,98087,WA,Original Registration
...,...,...,...,...,...,...,...,...,...,...
964,Battery Electric Vehicle (BEV),5YJYGDEE7M,TESLA,2021,Passenger,King,SEATTLE,98103,WA,Original Registration
968,Battery Electric Vehicle (BEV),5YJXCBE24K,TESLA,2019,Passenger,King,RENTON,98059,WA,Original Registration
975,Battery Electric Vehicle (BEV),1N4BZ0CP6G,NISSAN,2016,Passenger,King,KIRKLAND,98033,WA,Original Registration
980,Battery Electric Vehicle (BEV),5YJ3E1EB1K,TESLA,2019,Passenger,King,KIRKLAND,98033,WA,Original Registration


In [10]:
# transaction_type is now constant, so drop it
new_EV_df = orignal_df.drop(['transaction_type'], axis=1)
new_EV_df

Unnamed: 0,electric_vehicle_type,vin_1_10,make,model_year,vehicle_primary_use,county,city,zip,state_of_residence
8,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA
14,Battery Electric Vehicle (BEV),5YJ3E1EC9N,TESLA,2022,Passenger,King,BELLEVUE,98006,WA
17,Battery Electric Vehicle (BEV),1G1FZ6S00K,CHEVROLET,2019,Passenger,King,BELLEVUE,98006,WA
24,Battery Electric Vehicle (BEV),1N4AZ0CP4D,NISSAN,2013,Passenger,Snohomish,LYNNWOOD,98087,WA
29,Battery Electric Vehicle (BEV),JN1AZ0CP8B,NISSAN,2011,Passenger,Snohomish,LYNNWOOD,98087,WA
...,...,...,...,...,...,...,...,...,...
964,Battery Electric Vehicle (BEV),5YJYGDEE7M,TESLA,2021,Passenger,King,SEATTLE,98103,WA
968,Battery Electric Vehicle (BEV),5YJXCBE24K,TESLA,2019,Passenger,King,RENTON,98059,WA
975,Battery Electric Vehicle (BEV),1N4BZ0CP6G,NISSAN,2016,Passenger,King,KIRKLAND,98033,WA
980,Battery Electric Vehicle (BEV),5YJ3E1EB1K,TESLA,2019,Passenger,King,KIRKLAND,98033,WA


In [11]:
new_EV_df.count()

electric_vehicle_type    178
vin_1_10                 178
make                     178
model_year               178
vehicle_primary_use      178
county                   178
city                     178
zip                      178
state_of_residence       178
dtype: int64

In [12]:
new_EV_df['vehicle_primary_use'].value_counts()  


Passenger    176
Truck          2
Name: vehicle_primary_use, dtype: int64

In [13]:
new_EV_df['electric_vehicle_type'].value_counts() 

Battery Electric Vehicle (BEV)    178
Name: electric_vehicle_type, dtype: int64

In [14]:
new_EV_df['make'].value_counts()

TESLA         102
NISSAN         36
CHEVROLET       8
FORD            5
KIA             5
BMW             4
VOLKSWAGEN      3
VOLVO           3
HYUNDAI         3
POLESTAR        2
MINI            2
PORSCHE         1
SMART           1
FIAT            1
RIVIAN          1
AUDI            1
Name: make, dtype: int64

In [15]:
new_EV_df['county'].value_counts()

King         140
Snohomish     17
Clark         10
Thurston       5
Kitsap         4
Cowlitz        1
Pierce         1
Name: county, dtype: int64

### Store Alternative Fuel Station data from the NREL API in a DataFrame

In [16]:
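# Build the request URL for https://developer.nrel.gov using the imported api_key.
# Note: the bare "ELEC" token is not a key=value filter, so the response below also
# includes non-electric stations (e.g. propane). Restricting to chargers would need
# a parameter such as fuel_type=ELEC.
target_url = f"https://developer.nrel.gov/api/alt-fuel-stations/v1.json?ELEC&state=WA&api_key={api_key}"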
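
Query strings assembled by hand are easy to get subtly wrong, for example a filter without a `key=value` pair or a key that is never interpolated. A minimal sketch of building the same URL from a parameter dictionary follows. The helper name `build_station_url` is hypothetical, and the `fuel_type` parameter is an assumption based on the NREL Alternative Fuel Stations API documentation; check it against the current docs before relying on it.

```python
from urllib.parse import urlencode

# Endpoint used in this notebook.
BASE_URL = "https://developer.nrel.gov/api/alt-fuel-stations/v1.json"


def build_station_url(api_key, state="WA", fuel_type="ELEC"):
    """Return the request URL with every filter as an explicit key=value pair.

    fuel_type is assumed to be the parameter that restricts results
    to one fuel (ELEC for electric charging stations).
    """
    params = {"fuel_type": fuel_type, "state": state, "api_key": api_key}
    # urlencode escapes the values and joins the pairs with "&".
    return f"{BASE_URL}?{urlencode(params)}"


# "DEMO" is a placeholder, not a real key.
print(build_station_url("DEMO"))
```

With a fuel filter applied, the station count and the rows returned would differ from the outputs shown in this notebook, which include every fuel type.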

In [17]:
# Send the request and parse the JSON response
Electric_charging_data = requests.get(target_url).json()

# Print the number of stations returned
print(len(Electric_charging_data["fuel_stations"]))

2053


In [18]:
# Collect the fields of interest for each station
CS_data = []
for station in Electric_charging_data["fuel_stations"]:
    # Append this station's details to the CS_data list
    CS_data.append({"zip_code": station["zip"],
                    "city": station["city"],
                    "state": station["state"],
                    "station_name": station["station_name"]})


In [19]:
# Convert the list of dictionaries to a pandas DataFrame.
charging_station_df = pd.DataFrame(CS_data)
charging_station_df.head(50)

Unnamed: 0,zip_code,city,state,station_name
0,99163,Pullman,WA,Avista Pullman Service Center
1,99202,Spokane,WA,Avista Central Operating Facility
2,99403,Clarkston,WA,Avista Clarkston Service Center
3,98233,Burlington,WA,Burlington Country Store
4,98277,Oak Harbor,WA,Skagit Farmers Country Store
5,98856,Twisp,WA,Hank's Mini Market
6,98902,Yakima,WA,AmeriGas
7,98901,Yakima,WA,All American Propane Inc
8,98930,Grandview,WA,Bleyhl Co-op
9,98271,Marysville,WA,Suburban Propane
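
The field-extraction loop indexes each station record directly, which raises a `KeyError` if a record lacks one of the fields. A minimal sketch of a more tolerant version follows, assuming missing fields should become `None`. The function name `extract_station_rows` is hypothetical and the sample payload is made up for illustration.

```python
def extract_station_rows(payload):
    """Return one dict per station with the four fields used in this notebook.

    Missing keys become None instead of raising KeyError.
    """
    rows = []
    for station in payload.get("fuel_stations", []):
        rows.append({
            "zip_code": station.get("zip"),
            "city": station.get("city"),
            "state": station.get("state"),
            "station_name": station.get("station_name"),
        })
    return rows


# Made-up payload in the same shape as the API response.
sample = {
    "fuel_stations": [
        {"zip": "98004", "city": "Bellevue", "state": "WA", "station_name": "Example Garage"},
        {"zip": "98121", "state": "WA", "station_name": "Example Lot"},  # no city
    ]
}
print(extract_station_rows(sample))
```

The returned list can be passed to `pd.DataFrame` in the same way as `CS_data`.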


In [20]:
charging_station_df['city'].value_counts()

Seattle                     484
Bellevue                    198
Tacoma                       97
Vancouver                    59
Spokane                      57
                           ... 
Westport                      1
Union                         1
Stevenson                     1
Seaview                       1
Joint Base Lewis-McChord      1
Name: city, Length: 218, dtype: int64

In [21]:
charging_station_df['zip_code'].value_counts()

98004    117
98121     99
98109     78
98005     50
99354     49
        ... 
98595      1
98648      1
98644      1
98564      1
98592      1
Name: zip_code, Length: 318, dtype: int64

In [22]:
charging_station_df.count()

zip_code        2053
city            2053
state           2053
station_name    2053
dtype: int64

### Merge the data from new_EV_df and charging_station_df into one DataFrame

In [23]:
# Inner join on ZIP code: one row per (vehicle, station) pair that shares a ZIP
evehicle_cs_data = pd.merge(new_EV_df, charging_station_df, left_on='zip', right_on='zip_code', how='inner')
evehicle_cs_data

Unnamed: 0,electric_vehicle_type,vin_1_10,make,model_year,vehicle_primary_use,county,city_x,zip,state_of_residence,zip_code,city_y,state,station_name
0,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,98034,Kirkland,WA,Alliance AutoGas - Kirkland 76
1,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,98034,Kirkland,WA,MAIN HOSPITAL SILVER PARKING
2,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,98034,Kirkland,WA,MAIN HOSPITAL DEYOUNG1
3,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,98034,Kirkland,WA,MAIN HOSPITAL BLUE PARKING
4,Battery Electric Vehicle (BEV),5YJ3E1EA3J,TESLA,2018,Passenger,King,KIRKLAND,98034,WA,98034,Kirkland,WA,MAIN HOSPITAL SILVER #2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1878,Battery Electric Vehicle (BEV),1N4AZ0CP7F,NISSAN,2015,Passenger,King,REDMOND,98053,WA,98053,Redmond,WA,Albertsons 3925236Th Ave NE Redmond WA
1879,Battery Electric Vehicle (BEV),1N4AZ0CPXD,NISSAN,2013,Passenger,King,KENMORE,98028,WA,98028,Kenmore,WA,U-Haul
1880,Battery Electric Vehicle (BEV),1N4AZ0CPXD,NISSAN,2013,Passenger,King,KENMORE,98028,WA,98028,Kenmore,WA,Safeway #3500
1881,Battery Electric Vehicle (BEV),1N4AZ0CPXD,NISSAN,2013,Passenger,King,KENMORE,98028,WA,98028,Kenmore,WA,Lodge at St Edward Park
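
Because the inner merge pairs every vehicle with every station in the same ZIP code, the merged row count reflects vehicle-station pairs rather than vehicles (rows 0 to 4 above repeat one Tesla across five stations). To compare the two quantities directly, the counts can be taken per ZIP code before merging. The following is a minimal standard-library sketch; the function name `vehicles_per_station` and the sample ZIP lists are hypothetical.

```python
from collections import Counter


def vehicles_per_station(vehicle_zips, station_zips):
    """Count vehicles and stations per ZIP code and compute their ratio.

    The ratio is None where a ZIP code has vehicles but no stations.
    """
    vehicles = Counter(vehicle_zips)
    stations = Counter(station_zips)
    summary = {}
    for zip_code in sorted(set(vehicles) | set(stations)):
        v = vehicles[zip_code]  # Counter returns 0 for unseen keys
        s = stations[zip_code]
        summary[zip_code] = {
            "vehicles": v,
            "stations": s,
            "vehicles_per_station": v / s if s else None,
        }
    return summary


# Made-up ZIP lists for illustration.
print(vehicles_per_station(["98034", "98034", "98006"], ["98034", "98004"]))

# Possible use with the notebook's dataframes (not executed here):
# vehicles_per_station(new_EV_df["zip"], charging_station_df["zip_code"])
```

Unlike the inner merge, this keeps ZIP codes that appear in only one of the two datasets, which matters when looking for areas with vehicles but no stations.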


### Load the data into the database

In [24]:
# Convert the dataframe to a list of dictionaries (one per row)
dict_data = evehicle_cs_data.to_dict("records")
dict_data

[{'electric_vehicle_type': 'Battery Electric Vehicle (BEV)',
  'vin_1_10': '5YJ3E1EA3J',
  'make': 'TESLA',
  'model_year': '2018',
  'vehicle_primary_use': 'Passenger',
  'county': 'King',
  'city_x': 'KIRKLAND',
  'zip': '98034',
  'state_of_residence': 'WA',
  'zip_code': '98034',
  'city_y': 'Kirkland',
  'state': 'WA',
  'station_name': 'Alliance AutoGas - Kirkland 76'},
 {'electric_vehicle_type': 'Battery Electric Vehicle (BEV)',
  'vin_1_10': '5YJ3E1EA3J',
  'make': 'TESLA',
  'model_year': '2018',
  'vehicle_primary_use': 'Passenger',
  'county': 'King',
  'city_x': 'KIRKLAND',
  'zip': '98034',
  'state_of_residence': 'WA',
  'zip_code': '98034',
  'city_y': 'Kirkland',
  'state': 'WA',
  'station_name': 'MAIN HOSPITAL SILVER PARKING'},
 {'electric_vehicle_type': 'Battery Electric Vehicle (BEV)',
  'vin_1_10': '5YJ3E1EA3J',
  'make': 'TESLA',
  'model_year': '2018',
  'vehicle_primary_use': 'Passenger',
  'county': 'King',
  'city_x': 'KIRKLAND',
  'zip': '98034',
  'state_of_

In [25]:
# Connect to the local MongoDB server
# (note: this rebinds `client`, which previously held the Socrata client)
conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Select the 'ev_cs_DB' database (created on first write)
db = client.ev_cs_DB

# Drop the 'table' collection if it already exists
db.table.drop()

# Get a handle to the 'table' collection
collection = db['table']

# Insert the merged records
collection.insert_many(dict_data)

<pymongo.results.InsertManyResult at 0x1e95331b190>
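
The load step can be wrapped in a small helper so that re-running the notebook replaces the stored documents instead of duplicating them. The sketch below assumes only that the collection object offers `delete_many` and `insert_many`, as pymongo collections do. The helper name `reload_collection` is hypothetical. It also guards against an empty record list, which pymongo's `insert_many` rejects.

```python
def reload_collection(collection, records):
    """Replace the contents of a collection and return the number of rows inserted.

    collection is any object with pymongo-style delete_many and
    insert_many methods; records is a list of dictionaries.
    """
    # Remove existing documents but keep the collection itself.
    collection.delete_many({})
    if not records:
        # insert_many requires a non-empty list, so skip the call.
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)


# Possible use with the notebook's objects (not executed here):
# inserted = reload_collection(db['table'], dict_data)
```

Compared with dropping the collection, `delete_many({})` keeps any indexes defined on it. For a collection without indexes, as in this notebook, the two approaches behave the same.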