- - -
<!--NAVIGATION-->
Tesla Superchargers in the United States |
[Retrieving the data](./tesla_webscraping.ipynb) | **[Cleaning the data](./tesla_clean.ipynb)**
- - -

# Cleaning the data

In [100]:
import pandas as pd
import numpy as np

In [101]:
tesla_df = pd.read_csv('tesla_raw.csv')
NA = 'not available'

In [102]:
tesla_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2070 entries, 0 to 2069
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   coming_soon       1834 non-null   object
 1   street_address_1  1389 non-null   object
 2   street_address_2  1 non-null      object
 3   city              1729 non-null   object
 4   state             1834 non-null   object
 5   station_name      1494 non-null   object
 6   zip               1494 non-null   object
 7   longitude         1493 non-null   object
 8   latitude          1494 non-null   object
 9   num_chargers      1494 non-null   object
 10  availability      1494 non-null   object
 11  power_kW          1494 non-null   object
 12  amenities         1494 non-null   object
 13  link              2070 non-null   object
dtypes: object(14)
memory usage: 226.5+ KB


In [103]:
tesla_df

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
0,False,FAIRFIELD INN21282 Athens-Limestone Blvd.,,Athens,AL,"Athens, AL Supercharger",35613,-86.942864,34.785416,8,available 24/7,150,"restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...
1,False,Tiger Crossing Shopping Center1617 South Coll...,,Auburn,AL,"Auburn, AL - South College Street Supercharger",36832,-85.498239,32.576339,12,available 24/7,250,"restaurants,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
2,False,Auburn Mall1627 Opelika Road,,Auburn,AL,"Auburn, AL Supercharger",36830-2871,-85.445105,32.627837,6,available 24/7,150,"restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...
3,False,Uptown Entertainment District2221 Richard Arri...,,Birmingham,AL,"Birmingham, AL Supercharger",35203-1103,-86.807072,33.525826,8,available 24/7,150,"restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...
4,False,Holiday Inn Express & Suites Tuscaloosa 6350 ...,,Cottondale,AL,"Cottondale, AL Supercharger",35453,-87.450076,33.17466,8,available 24/7,250,"wifi,lodging,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2065,,,,,,,,,,,,,,http://tesla.com/about/legal
2066,,,,,,,,,,,,,,http://tesla.com/contact
2067,,,,,,,,,,,,,,http://tesla.com/careers
2068,,,,,,,,,,,,,,http://tesla.com/updates


##### Remove rows where all values except `link` are empty (these are links that were parsed but are not Superchargers). This is equivalent to dropping rows with a missing value for `coming_soon`:

In [104]:
not_superchargers = tesla_df['coming_soon'].isna()
tesla_df.loc[not_superchargers]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
23,,,,,,,,,,,,,,http://tesla.com/findus/location/supercharger/...
30,,,,,,,,,,,,,,http://tesla.com/findus/location/supercharger/...
39,,,,,,,,,,,,,,http://tesla.com/findus/location/supercharger/...
43,,,,,,,,,,,,,,http://tesla.com/findus/location/supercharger/...
44,,,,,,,,,,,,,,http://tesla.com/findus/location/supercharger/...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2065,,,,,,,,,,,,,,http://tesla.com/about/legal
2066,,,,,,,,,,,,,,http://tesla.com/contact
2067,,,,,,,,,,,,,,http://tesla.com/careers
2068,,,,,,,,,,,,,,http://tesla.com/updates


In [105]:
# For example, this link belongs to a service/gallery store, not a Supercharger
tesla_df.loc[23, 'link']

'http://tesla.com/findus/location/supercharger/glendale9245'

##### Remove those rows

In [106]:
tesla_df = tesla_df.loc[tesla_df['coming_soon'].notna()]

In [107]:
tesla_df.loc[not_superchargers]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link


In [108]:
tesla_df.reset_index(inplace=True, drop=True)
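
The filtering and re-indexing steps above can also be written with `dropna`. The following is a minimal sketch on a made-up toy frame (`toy_df` and its values are illustrative assumptions, not the scraped data); it only checks that the boolean-mask filter and `dropna(subset=...)` select the same rows.

```python
import pandas as pd

# Toy frame standing in for the scraped data (values are made up for illustration)
toy_df = pd.DataFrame({
    'coming_soon': [False, None, True, None],
    'city': ['Athens', None, 'New Orleans', None],
    'link': ['a', 'b', 'c', 'd'],
})

# Boolean-mask filter, in the style used in this notebook
by_mask = toy_df.loc[toy_df['coming_soon'].notna()]

# Equivalent: drop rows whose 'coming_soon' value is missing
by_dropna = toy_df.dropna(subset=['coming_soon'])

same_rows = by_mask.equals(by_dropna)

# Renumber the surviving rows 0..n-1
cleaned = by_dropna.reset_index(drop=True)
```

Either form keeps the original row labels until `reset_index(drop=True)` is called, which is why the re-indexing step is still needed.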

##### Check for empty values for city where there is a value for station_name

In [109]:
tesla_df[((tesla_df['city'].isna()) | (tesla_df['city'] == NA)) & (tesla_df['station_name'] != NA)]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
9,False,,,,AL,"Leeds, AL Supercharger",not available,-86.587744,33.542762,not available,not available,not available,"restaurants,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
34,False,,,,AZ,"Phoenix, AZ - East Camelback Road Supercharger",not available,-112.026803,33.511837,not available,not available,not available,"restaurants,wifi,shopping,lodging,restrooms",http://tesla.com/findus/location/supercharger/...
35,False,,,,AZ,"Phoenix, AZ - East Mayo Boulevard Supercharger",not available,-111.927509,33.655304,not available,not available,not available,"restaurants,shopping,beverage",http://tesla.com/findus/location/supercharger/...
36,False,,,,AZ,"Quartzsite, AZ Supercharger",not available,-114.241801,33.660784,not available,not available,not available,"restaurants,restrooms",http://tesla.com/findus/location/supercharger/...
72,False,,,,CA,"Baker, CA Supercharger",not available,-116.08074,35.262655,not available,not available,not available,"restaurants,restrooms",http://tesla.com/findus/location/supercharger/...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1716,False,,,,VA,"Richmond, VA - South Providence Road Supercharger",not available,-77.54725,37.49607,not available,not available,not available,"restaurants,wifi,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
1740,False,,,,WA,"Bellevue, WA Supercharger",not available,-122.202181,47.611048,not available,not available,not available,"restaurants,shopping",http://tesla.com/findus/location/supercharger/...
1745,False,,,,WA,"Cle Elum, WA Supercharger",not available,-120.902714,47.189262,not available,not available,not available,"restaurants,shopping,restrooms",http://tesla.com/findus/location/supercharger/...
1793,False,,,,WV,"Sutton, WV",not available,not available,not available,not available,not available,not available,"restaurants,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...


##### Derive city from station_name

In [110]:
tesla_df.loc[((tesla_df['city'].isna()) | (tesla_df['city'] == NA)) & (tesla_df['station_name'] != NA), 'city'] = \
     tesla_df.loc[((tesla_df['city'].isna()) | (tesla_df['city'] == NA)) & (tesla_df['station_name'] != NA), 'station_name'].apply(lambda value: value.split(',')[0])

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df.loc[((tesla_df['city'].isna()) | (tesla_df['city'] == NA)) & (tesla_df['station_name'] != NA), 'city'] = \
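
This warning usually appears because `tesla_df` was itself produced by filtering another DataFrame, so pandas cannot tell whether the assignment targets the original data or a temporary copy. A minimal sketch of one common way to avoid it, assuming toy data and the hypothetical names `raw` and `stations`: take an explicit `.copy()` right after filtering.

```python
import pandas as pd

# Toy data; the column names mirror the notebook, the values are illustrative
raw = pd.DataFrame({
    'coming_soon': [False, None, False],
    'city': [None, None, 'Auburn'],
    'station_name': ['Leeds, AL Supercharger', None, 'Auburn, AL Supercharger'],
})

# .copy() makes the filtered frame an independent object, so later
# assignments are unambiguous and do not touch `raw`
stations = raw.loc[raw['coming_soon'].notna()].copy()

missing_city = stations['city'].isna()
stations.loc[missing_city, 'city'] = 'unknown'
```

The assignment itself is unchanged; only the way the filtered frame is created differs.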


In [111]:
tesla_df[((tesla_df['city'].isna()) | (tesla_df['city'] == NA)) & (tesla_df['station_name'] != NA)]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
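
The city derivation can also be expressed with pandas string methods, building the mask once and reusing it. This is a sketch on toy data under the same assumption the notebook makes, namely that the text before the first comma of `station_name` is the city; `toy_df` and `needs_city` are illustrative names.

```python
import pandas as pd

NA = 'not available'
toy_df = pd.DataFrame({
    'city': [None, NA, 'Auburn'],
    'station_name': [
        'Leeds, AL Supercharger',
        'Phoenix, AZ - East Mayo Boulevard Supercharger',
        'Auburn, AL Supercharger',
    ],
})

# Build the mask once and reuse it on both sides of the assignment
needs_city = (toy_df['city'].isna() | (toy_df['city'] == NA)) & (toy_df['station_name'] != NA)

# Text before the first comma of station_name is taken as the city
toy_df.loc[needs_city, 'city'] = (
    toy_df.loc[needs_city, 'station_name'].str.split(',').str[0].str.strip()
)
```

Station names without a comma would come back unchanged under this rule, so such rows may need a manual check.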


##### Check for empty values for state

In [112]:
tesla_df[tesla_df['state'] == NA]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
650,False,Wawa3637 34th Street South,,Saint Petersburg,not available,"Saint Petersburg, FL Supercharger",not available,-82.680354,27.734678,8.0,available 24/7,250.0,"restaurants,wifi,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
865,True,,,New Orleans,not available,,,,,,,,,http://tesla.com/findus/location/supercharger/...
1557,False,not available,not available,Austin,not available,"Austin, TX - Century Oaks Terrace Supercharger",not available,-97.724276,30.403355,16.0,available 24/7,72.0,"restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...


In [114]:
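# Set each cell with a single .loc call; chained indexing such as
# tesla_df.iloc[650]['state'] = ... may write to a temporary copy instead.
# After reset_index, row labels equal row positions.
tesla_df.loc[650, 'state'] = 'FL'
tesla_df.loc[865, 'state'] = 'LA'
tesla_df.loc[1557, 'state'] = 'TX'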

In [115]:
tesla_df[tesla_df['state'] == NA]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
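
A minimal sketch of filling individual cells by label with `.loc[row, column]`, which addresses the cell in a single indexing step. The frame and the `state_fixes` mapping are hypothetical stand-ins for the three rows found above.

```python
import pandas as pd

NA = 'not available'
toy_df = pd.DataFrame({
    'city': ['Saint Petersburg', 'New Orleans', 'Austin'],
    'state': [NA, NA, NA],
})

# Hypothetical mapping of row label -> state, filled in by hand
state_fixes = {0: 'FL', 1: 'LA', 2: 'TX'}

for row_label, state in state_fixes.items():
    # A single .loc call addresses the cell directly in toy_df
    toy_df.loc[row_label, 'state'] = state

# Number of rows still lacking a state
remaining = int((toy_df['state'] == NA).sum())
```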


##### Check that stations with multiple capacities exist (i.e. the power_kW, availability, and num_chargers columns should contain some comma-separated values)

In [121]:
multi_chargers = (tesla_df['num_chargers'].notna()) & (tesla_df['num_chargers'].str.contains(','))
tesla_df[multi_chargers]

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link
17,False,Carl's Jr. - Sundance Towne Center416 S Watso...,,Buckeye,AZ,"Buckeye, AZ Supercharger",85326-3419,-112.556876,33.443011,"4,8","available 24/7,available 24/7","250,150","restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...
22,False,Carl's Jr. - Gila Bend826 826 W Pima St.,,Gila Bend,AZ,"Gila Bend, AZ Supercharger",85337-3033,-112.734081,32.943675,"8,8","available 24/7,available 24/7","250,150","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...
25,False,Burger King Holbrook2096 Navajo Blvd,,Holbrook,AZ,"Holbrook, AZ Supercharger",86025-2100,-110.145558,34.922962,"8,4","available 24/7,available 24/7","250,150","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...
27,False,Carl's Jr. Kingman789 W Beale St,,Kingman,AZ,"Kingman, AZ Supercharger",86401-5942,-114.065592,35.191331,"6,4","available 24/7,available 24/7","150,72","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...
51,False,Hilton Garden Inn/ Pivot Point Conference Cent...,,Yuma,AZ,"Yuma, AZ Supercharger",85364-1417,-114.619093,32.726686,"4,8","available 24/7,available 24/7","250,150","restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...
114,False,Rabobank Corning950 Hwy 99 W,,Corning,CA,"Corning, CA Supercharger",96021-2706,-122.1984,39.92646,62,"available 24/7,available 24/7",15072,"restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...
136,False,Imperial Valley Mall3551 S Dogwood Rd,,El Centro,CA,El Centro Supercharger,92243-9679,-115.532486,32.760837,84,"available 24/7,available 24/7",15072,"restaurants,wifi,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...
150,False,Palladio at Broadstone220 Palladio Parkway,,Folsom,CA,"Folsom, CA - Palladio Parkway Supercharger",95630-8784,-121.118344,38.647199,2010,"available 24/7,available 24/7",250150,"restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...
155,False,Tesla Fremont Delivery45500 Fremont Blvd,,Fremont,CA,Fremont,94538-6326,-121.944725,37.492439,84,"available 24/7,available 24/7",250150,"wifi,restrooms",http://tesla.com/findus/location/supercharger/...
169,False,Granada Village10823 Zelzah Avenue,,Granada Hills,CA,"Granada Hills, CA Supercharger",91344,-118.526496,34.266405,164,"available 24/7,available 24/7",25072,"restaurants,wifi,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...


In [123]:
tesla_df['power_kW'].value_counts()

250              651
150              566
72               120
not available    117
250,150           24
150,72            11
250,72             4
250,150,72         1
Name: power_kW, dtype: int64

##### Note that the chargers come in three types: 250 kW, 150 kW, and 72 kW. Replace the num_chargers and power_kW columns with three new columns: chargers_250kW, chargers_150kW, chargers_72kW

In [125]:
tesla_df['chargers_250kW'] = None
tesla_df['chargers_150kW'] = None
tesla_df['chargers_72kW'] = None

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df['chargers_250kW'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df['chargers_150kW'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df['chargers_72kW'] = None


In [174]:
((tesla_df['num_chargers'].isna())&(tesla_df['coming_soon'] == False)).sum()

0

##### Update charging info for stations that are not coming soon

In [175]:
# Stations that are already open and have a num_chargers entry to parse
not_coming_soon = (tesla_df['num_chargers'].notna()) & (tesla_df['coming_soon'] == False)

In [176]:
NUM_ROWS = tesla_df[not_coming_soon].shape[0]
# One entry per station; 0 means the station has no chargers of that type
chargers_250kW = np.zeros(NUM_ROWS)
chargers_150kW = np.zeros(NUM_ROWS)
chargers_72kW = np.zeros(NUM_ROWS)
for j, row in enumerate(tesla_df[not_coming_soon].itertuples(index=False)):
    if row.num_chargers == NA:
        # Unknown counts: mark only this row as missing (np.nan, since the arrays are float)
        chargers_250kW[j] = np.nan
        chargers_150kW[j] = np.nan
        chargers_72kW[j] = np.nan
    else:
        # num_chargers and power_kW are parallel comma-separated lists, e.g. '4,8' and '250,150'
        chargers = row.num_chargers.split(',')
        powers = row.power_kW.split(',')
        for charger, power in zip(chargers, powers):
            if power == '250':
                chargers_250kW[j] = float(charger)
            elif power == '150':
                chargers_150kW[j] = float(charger)
            elif power == '72':
                chargers_72kW[j] = float(charger)

In [178]:
tesla_df.loc[not_coming_soon, 'chargers_250kW'] = chargers_250kW
tesla_df.loc[not_coming_soon, 'chargers_150kW'] = chargers_150kW
tesla_df.loc[not_coming_soon, 'chargers_72kW'] = chargers_72kW

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df.loc[not_coming_soon, 'chargers_250kW'] = chargers_250kW
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df.loc[not_coming_soon, 'chargers_150kW'] = chargers_150kW
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df.loc[not_coming_soon, 'chargers_72kW'] = chargers_72kW


In [181]:
tesla_df.head()

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link,chargers_250kW,chargers_150kW,chargers_72kW
0,False,FAIRFIELD INN21282 Athens-Limestone Blvd.,,Athens,AL,"Athens, AL Supercharger",35613,-86.942864,34.785416,8,available 24/7,150,"restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...,0.0,8.0,0.0
1,False,Tiger Crossing Shopping Center1617 South Coll...,,Auburn,AL,"Auburn, AL - South College Street Supercharger",36832,-85.498239,32.576339,12,available 24/7,250,"restaurants,shopping,beverage,restrooms",http://tesla.com/findus/location/supercharger/...,12.0,0.0,0.0
2,False,Auburn Mall1627 Opelika Road,,Auburn,AL,"Auburn, AL Supercharger",36830-2871,-85.445105,32.627837,6,available 24/7,150,"restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...,0.0,6.0,0.0
3,False,Uptown Entertainment District2221 Richard Arri...,,Birmingham,AL,"Birmingham, AL Supercharger",35203-1103,-86.807072,33.525826,8,available 24/7,150,"restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...,0.0,8.0,0.0
4,False,Holiday Inn Express & Suites Tuscaloosa 6350 ...,,Cottondale,AL,"Cottondale, AL Supercharger",35453,-87.450076,33.17466,8,available 24/7,250,"wifi,lodging,beverage,restrooms",http://tesla.com/findus/location/supercharger/...,8.0,0.0,0.0


In [182]:
tesla_df[multi_chargers].head()

Unnamed: 0,coming_soon,street_address_1,street_address_2,city,state,station_name,zip,longitude,latitude,num_chargers,availability,power_kW,amenities,link,chargers_250kW,chargers_150kW,chargers_72kW
17,False,Carl's Jr. - Sundance Towne Center416 S Watso...,,Buckeye,AZ,"Buckeye, AZ Supercharger",85326-3419,-112.556876,33.443011,"4,8","available 24/7,available 24/7","250,150","restaurants,wifi,shopping,restrooms",http://tesla.com/findus/location/supercharger/...,4.0,8.0,0.0
22,False,Carl's Jr. - Gila Bend826 826 W Pima St.,,Gila Bend,AZ,"Gila Bend, AZ Supercharger",85337-3033,-112.734081,32.943675,"8,8","available 24/7,available 24/7","250,150","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...,8.0,8.0,0.0
25,False,Burger King Holbrook2096 Navajo Blvd,,Holbrook,AZ,"Holbrook, AZ Supercharger",86025-2100,-110.145558,34.922962,"8,4","available 24/7,available 24/7","250,150","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...,8.0,4.0,0.0
27,False,Carl's Jr. Kingman789 W Beale St,,Kingman,AZ,"Kingman, AZ Supercharger",86401-5942,-114.065592,35.191331,"6,4","available 24/7,available 24/7","150,72","restaurants,wifi,restrooms",http://tesla.com/findus/location/supercharger/...,0.0,6.0,4.0
51,False,Hilton Garden Inn/ Pivot Point Conference Cent...,,Yuma,AZ,"Yuma, AZ Supercharger",85364-1417,-114.619093,32.726686,"4,8","available 24/7,available 24/7","250,150","restaurants,wifi,lodging,restrooms",http://tesla.com/findus/location/supercharger/...,4.0,8.0,0.0
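
The per-type split can also be sketched as a small helper that maps one station's `num_chargers` and `power_kW` strings to a count per power level. This is an illustrative alternative on toy data, not the notebook's own code; `split_chargers`, `POWERS`, and `toy_df` are hypothetical names, and unavailable entries are represented as `NaN` here.

```python
import numpy as np
import pandas as pd

NA = 'not available'
POWERS = ('250', '150', '72')

def split_chargers(num_chargers, power_kW):
    """Return {power: count} for one station; NaN counts if data is unavailable."""
    if num_chargers == NA or power_kW == NA:
        return {p: np.nan for p in POWERS}
    counts = {p: 0.0 for p in POWERS}
    # The two strings are parallel comma-separated lists, e.g. '4,8' and '250,150'
    for count, power in zip(num_chargers.split(','), power_kW.split(',')):
        if power in counts:
            counts[power] = float(count)
    return counts

toy_df = pd.DataFrame({
    'num_chargers': ['8', '4,8', '6,4', NA],
    'power_kW': ['150', '250,150', '150,72', NA],
})

rows = [split_chargers(n, p) for n, p in zip(toy_df['num_chargers'], toy_df['power_kW'])]
per_type = pd.DataFrame(rows, index=toy_df.index)
per_type.columns = [f'chargers_{p}kW' for p in per_type.columns]
result = toy_df.join(per_type)
```

Keeping the parsing in one function makes it straightforward to test a few known stations before applying it to the full frame.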


In [186]:
tesla_df['availability'].value_counts(dropna=False)

available 24/7                                  1337
NaN                                              340
not available                                    117
available 24/7,available 24/7                     39
available 24/7,available 24/7,available 24/7       1
Name: availability, dtype: int64

In [194]:
not_avail = (tesla_df['availability'].isna()) | (tesla_df['availability'] == NA)
has_charger = (tesla_df['num_chargers'] != NA) & (tesla_df['num_chargers'].notna())
(not_avail & has_charger).sum()

0

##### Delete redundant/constant columns:
* availability: if a charger exists, it is available 24/7
* num_chargers
* power_kW
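
Before dropping a column as constant, it can help to confirm that it really holds a single value. A minimal sketch on a toy Series (the values mirror the `value_counts` output above, but the Series itself is an illustrative assumption):

```python
import pandas as pd

NA = 'not available'
availability = pd.Series([
    'available 24/7',
    'available 24/7,available 24/7',
    NA,
    None,
])

# Keep only real entries, split multi-valued cells, and flatten to one value per row
known = availability[availability.notna() & (availability != NA)]
distinct_values = set(known.str.split(',').explode().str.strip())

# True when every known entry is the same single value
is_constant = distinct_values == {'available 24/7'}
```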

In [196]:
tesla_df.drop(['availability', 'num_chargers', 'power_kW'], axis='columns', inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tesla_df.drop(['availability', 'num_chargers', 'power_kW'], axis='columns', inplace=True)


In [197]:
tesla_df.to_csv('tesla_clean.csv', index=False)
