I scrape tesla.com to get the locations of Tesla's U.S. superchargers. Once I have the addresses, I geocode them with Nominatim to get their coordinates. I then check whether each supercharger falls within any city's radius (in miles), as defined in 'cities_data.csv'.

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
from sqlalchemy import create_engine
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import geopy.distance
import math

Webscraping all Tesla superchargers in the U.S.

In [2]:
url = 'https://www.tesla.com/findus/list/superchargers/United+States'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

In [3]:
request = requests.get(url, headers=headers)

In [4]:
request

<Response [200]>

In [2]:
#request.text

In [6]:
soup = BeautifulSoup(request.text, 'html.parser')

In [3]:
#print(soup.prettify())

In [8]:
# find_all is the current name; findAll is a legacy alias
superchargers = soup.find_all('address', attrs={'class':'vcard'})

In [4]:
#print(superchargers)

In [12]:
supercharger_details = {
    'name':[],
    'street_address':[],
    'locality':[]
}

for sc in superchargers:
    name = sc.find('a', attrs={'class':'fn org url'}).text
    supercharger_details['name'].append(name)
    
    street_address = sc.find('span', attrs={'class':'street-address'}).text
    supercharger_details['street_address'].append(street_address)
    
    locality = sc.find('span', attrs={'class':'locality'}).text
    supercharger_details['locality'].append(locality)
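
The loop above assumes every listing contains all three elements. BeautifulSoup's find() returns None when nothing matches, so a single incomplete listing would raise AttributeError on `.text`. A minimal sketch of a guard, assuming a hypothetical helper named `text_or_none` (not part of the original notebook) and using stand-in objects in place of parsed tags; note that it also strips surrounding whitespace, which the original loop does not:

```python
from types import SimpleNamespace
from typing import Optional


def text_or_none(tag) -> Optional[str]:
    """Return the stripped text of a parsed element, or None if it is missing."""
    # find() returns None when nothing matches, so guard before touching .text
    return tag.text.strip() if tag is not None else None


# Stand-ins for parsed elements (hypothetical; real code would pass sc.find(...))
found = SimpleNamespace(text='  Auburn Alabama Supercharger ')
missing = None

name = text_or_none(found)
street_address = text_or_none(missing)
```

In the scraping loop this would be called as `text_or_none(sc.find('span', attrs={'class':'street-address'}))`, leaving None in the list for any field that is absent.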

In [1]:
#supercharger_details

Putting the superchargers into a DataFrame

In [14]:
df = pd.DataFrame(supercharger_details)

In [15]:
df['full_address'] = df['street_address'] + ', ' +  df['locality']
# Pass the columns by keyword; a positional axis argument is rejected by pandas 2.x
df_address = df.drop(columns=['street_address', 'locality'])

In [2]:
df_address

In [19]:
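# Drop listings that are not open yet; .copy() avoids SettingWithCopyWarning when columns are added later
df_address = df_address[~df_address['name'].str.contains('coming soon')].copy()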

Getting the coordinates of all the superchargers

In [20]:
geolocator = Nominatim(user_agent='jmarfice@lion.lmu.edu')
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

In [21]:
df_address['coordinates'] = df_address['full_address'].apply(geocode)
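
Because the rate limiter waits at least one second per request, geocoding takes a while, and re-running the cell repeats every request. A minimal sketch of caching lookups so each distinct address is requested only once, and of handling the None that geocode returns when nothing matches. The names `make_cached_lookup` and `fake_geocode` are hypothetical, and `fake_geocode` is only a stand-in for the rate-limited `geocode` callable so the example runs without network access:

```python
from types import SimpleNamespace
from typing import Callable, Dict, Optional, Tuple

Coordinates = Optional[Tuple[float, float]]


def make_cached_lookup(geocode: Callable[[str], object]) -> Callable[[str], Coordinates]:
    """Wrap a geocode callable so each distinct address is looked up once."""
    cache: Dict[str, Coordinates] = {}

    def lookup(address: str) -> Coordinates:
        if address not in cache:
            location = geocode(address)
            # geocode returns None when no match is found
            cache[address] = (
                (location.latitude, location.longitude)
                if location is not None else None
            )
        return cache[address]

    return lookup


calls = []  # records which addresses actually reached the geocoder


def fake_geocode(address: str):
    """Stand-in for the real geocoder (hypothetical coordinates)."""
    calls.append(address)
    if address == 'unknown place':
        return None
    return SimpleNamespace(latitude=1.0, longitude=2.0)


lookup = make_cached_lookup(fake_geocode)
first = lookup('some address')
second = lookup('some address')   # served from the cache
unmatched = lookup('unknown place')
```

With the real geocoder, `make_cached_lookup(geocode)` could be passed to `df_address['full_address'].apply(...)`; the result would then be a (latitude, longitude) tuple or None rather than a Location object.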



In [3]:
# geocode returns None for addresses it cannot match, so guard before reading the coordinates
df_address['latitude'] = df_address['coordinates'].apply(lambda x: x.latitude if x is not None else None)
df_address['longitude'] = df_address['coordinates'].apply(lambda x: x.longitude if x is not None else None)


Checking if superchargers are within cities

In [26]:
cities_df = pd.read_csv('cities_data.csv', index_col=0)
cities_df.head()

ID,City,Latitude,Longitude,Radius
1,New York,40.73061,-73.93524,25
2,Los Angeles,34.05224,-118.24368,30
3,Chicago,41.88183,-87.62318,20
4,Houston,29.74991,-95.35842,25
5,Phoenix,33.44838,-112.07404,20


In [27]:
cities_df.loc[cities_df.index == 1]['Radius']

ID
1    25
Name: Radius, dtype: int64

In [28]:
df_address.loc[df_address.index == 1]['name']

1    Auburn Alabama Supercharger
Name: name, dtype: object

For each city, check if each supercharger is within the city's radius.

In [29]:
for index_city, row_city in cities_df.iterrows():
    # The city's coordinates are the same for every supercharger, so build them once per city
    coords_city = (row_city['Latitude'], row_city['Longitude'])
    for index_sc, row_sc in df_address.iterrows():
        coords_sc = (row_sc['latitude'], row_sc['longitude'])
        # Skip superchargers that could not be geocoded (None or NaN)
        if pd.isna(coords_sc[0]):
            continue
        distance = geopy.distance.geodesic(coords_city, coords_sc).miles
        # If several cities match, the last one in cities_df wins
        if distance <= row_city['Radius']:
            df_address.loc[index_sc, 'city_id'] = index_city
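
The radius check can be sketched without pandas or geopy. This version swaps in the haversine great-circle formula for geopy's geodesic distance, so results can differ slightly from the loop above, and it assigns the nearest matching city instead of the last one. The city values come from the cities_df preview; the function names and the two test points are assumptions for illustration:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearest_city_within_radius(point, cities):
    """Return the id of the closest city whose radius contains point, else None."""
    best_id, best_distance = None, math.inf
    for city_id, (lat, lon, radius) in cities.items():
        distance = haversine_miles((lat, lon), point)
        if distance <= radius and distance < best_distance:
            best_id, best_distance = city_id, distance
    return best_id


# id -> (latitude, longitude, radius in miles), from the cities_df preview
cities = {
    1: (40.73061, -73.93524, 25),
    2: (34.05224, -118.24368, 30),
    3: (41.88183, -87.62318, 20),
    4: (29.74991, -95.35842, 25),
    5: (33.44838, -112.07404, 20),
}

near_la = nearest_city_within_radius((34.0, -118.2), cities)        # close to Los Angeles
eastern_alabama = nearest_city_within_radius((32.6, -85.5), cities)  # far from all five
```

Choosing the nearest city makes the result independent of the row order in cities_df, which matters only if two city radii overlap.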



In [31]:
# The Location objects cannot be stored in SQL, so drop them before writing the table
df_address.drop(columns=['coordinates'], inplace=True)



In [32]:
df_address

In [33]:
engine = create_engine('mysql+mysqldb://USERNAME:PASSWORD@HOST/DATABASE?charset=UTF8')

In [34]:
df_address.to_sql('superchargers', engine, if_exists='append', index=False)
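
With if_exists='append', running the cell a second time inserts every row again. One possible way to make the load repeatable is a unique constraint combined with an insert that skips existing rows. A minimal sketch using an in-memory SQLite database as a stand-in for the MySQL server; the choice of (name, full_address) as the unique key and the sample rows are assumptions, and MySQL spells the statement `INSERT IGNORE` rather than SQLite's `INSERT OR IGNORE`:

```python
import sqlite3

# Hypothetical rows in the column order name, full_address, latitude, longitude, city_id
rows = [
    ('Example A Supercharger', '1 Example St, Exampleville', 1.0, 2.0, None),
    ('Example B Supercharger', '2 Example St, Exampleville', 3.0, 4.0, 1),
]

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE superchargers ('
    'name TEXT, full_address TEXT, latitude REAL, longitude REAL, city_id INTEGER, '
    'UNIQUE (name, full_address))'
)


def insert_rows(connection, records):
    """Insert records, silently skipping any that violate the unique key."""
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            'INSERT OR IGNORE INTO superchargers '
            '(name, full_address, latitude, longitude, city_id) '
            'VALUES (?, ?, ?, ?, ?)',
            records,
        )


insert_rows(conn, rows)
insert_rows(conn, rows)  # second run adds nothing

row_count = conn.execute('SELECT COUNT(*) FROM superchargers').fetchone()[0]
```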

