Scraping tesla.com to get the locations of its U.S. superchargers. Once I have the addresses, I pass them to Nominatim to get their coordinates. I then check whether each supercharger falls within any city's radius, as defined in 'cities_data.csv'.

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
from sqlalchemy import create_engine
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import geopy.distance
import math

Webscraping all Tesla superchargers in the U.S.

In [2]:
url = 'https://www.tesla.com/findus/list/superchargers/United+States'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

In [3]:
request = requests.get(url, headers=headers)

In [4]:
request

<Response [200]>

In [5]:
request.text

'\n<!DOCTYPE html>\n<html lang=en class="no-js ">\n    <head>\n        <!--\n            Copyright (C) 2011-2016 Hoefler & Co.\n            This software is the property of Hoefler & Co. (H&Co). Your right to\n            access and use this software is subject to the applicable License\n            Agreement, or Terms of Service, that exists between you and H&Co. If no\n            such agreement exists, you may not access or use this software for any\n            purpose. This software may only be hosted at the locations specified in\n            the applicable License Agreement or Terms of Service, and only for the\n            purposes expressly set forth therein. You may not copy, modify, convert,\n            create derivative works from or distribute this software in any way, or\n            make it accessible to any third party, without first obtaining the\n            written permission of H&Co. For more information, please visit us at:\n            http://typography.com. 2836

In [6]:
soup = BeautifulSoup(request.text, 'html.parser')

In [7]:
print(soup.prettify())

<!DOCTYPE html>
<html class="no-js" lang="en">
 <head>
  <!--
            Copyright (C) 2011-2016 Hoefler & Co.
            This software is the property of Hoefler & Co. (H&Co). Your right to
            access and use this software is subject to the applicable License
            Agreement, or Terms of Service, that exists between you and H&Co. If no
            such agreement exists, you may not access or use this software for any
            purpose. This software may only be hosted at the locations specified in
            the applicable License Agreement or Terms of Service, and only for the
            purposes expressly set forth therein. You may not copy, modify, convert,
            create derivative works from or distribute this software in any way, or
            make it accessible to any third party, without first obtaining the
            written permission of H&Co. For more information, please visit us at:
            http://typography.com. 283682-104959-20160422
       

In [8]:
superchargers = soup.find_all('address', attrs={'class':'vcard'})

In [9]:
print(superchargers)

[<address class="vcard">
<a class="fn org url" href="/findus/location/supercharger/athensalsupercharger">Athens, AL Supercharger</a> <span class="adr">
<span class="street-address">21282 Athens-Limestone Blvd</span>
<span class="extended-address"></span>
<span class="locality">Athens, AL </span>
</span>
<span class="tel">
<span class="type">Roadside Assistance</span>: <span class="value">(877) 798-3752</span><br/>
</span>
<span class="underline visible"></span>
</address>, <address class="vcard">
<a class="fn org url" href="/findus/location/supercharger/auburnalsupercharger">Auburn Alabama Supercharger</a> <span class="adr">
<span class="street-address">1627 Opelika Road</span>
<span class="extended-address"></span>
<span class="locality">Auburn, AL 36830</span>
</span>
<span class="tel">
<span class="type">Roadside Assistance</span>: <span class="value">(877) 798-3752</span><br/>
</span>
<span class="underline visible"></span>
</address>, <address class="vcard">
<a class="fn org url" 

In [12]:
supercharger_details = {
    'name':[],
    'street_address':[],
    'locality':[]
}

for sc in superchargers:
    name = sc.find('a', attrs={'class':'fn org url'}).text
    supercharger_details['name'].append(name)
    
    street_address = sc.find('span', attrs={'class':'street-address'}).text
    supercharger_details['street_address'].append(street_address)
    
    locality = sc.find('span', attrs={'class':'locality'}).text
    supercharger_details['locality'].append(locality)

In [13]:
supercharger_details

{'name': ['Athens, AL Supercharger',
  'Auburn Alabama Supercharger',
  'Birmingham, AL Supercharger',
  'Dothan, AL (coming soon)',
  'Greenville Supercharger',
  'Mobile Supercharger',
  'Montgomery, AL (coming soon)',
  'Oxford, AL Supercharger',
  'Steele Supercharger',
  'Tuscaloosa, AL (coming soon)',
  'Anchorage, AK (coming soon)',
  'Buckeye, AZ Supercharger',
  'Casa Grande, AZ Supercharger',
  'Cordes Lakes, AZ Supercharger',
  'Ehrenberg, AZ Supercharger',
  'Flagstaff, AZ Supercharger',
  'Gila Bend, AZ Supercharger',
  'Globe, AZ (coming soon)',
  'Holbrook, AZ Supercharger',
  'Kayenta, AZ (coming soon)',
  'Kingman, AZ Supercharger',
  'New River, AZ Supercharger',
  'Page Supercharger',
  'Payson, AZ Supercharger',
  'Phoenix, AZ (coming soon)',
  'Phoenix, AZ - Agua Fria Freeway Supercharger',
  'Phoenix, AZ - East Camelback Road Supercharger',
  'Quartzsite, AZ Supercharger',
  'Scottsdale, AZ - N. Scottsdale Road Supercharger',
  'Scottsdale, AZ - North Kierland Blv

Putting superchargers into DataFrame

In [14]:
df = pd.DataFrame(supercharger_details)

In [15]:
df['full_address'] = df['street_address'] + ', ' +  df['locality']
df_address = df.drop(['street_address', 'locality'], axis=1)

In [16]:
df_address

Unnamed: 0,name,full_address
0,"Athens, AL Supercharger","21282 Athens-Limestone Blvd, Athens, AL"
1,Auburn Alabama Supercharger,"1627 Opelika Road, Auburn, AL 36830"
2,"Birmingham, AL Supercharger","2221 Richard Arrington Junior Blvd, Birmingham..."
3,"Dothan, AL (coming soon)",", Dothan, AL"
4,Greenville Supercharger,"219 Interstate Drive, Greenville, AL 36037"
...,...,...
1155,"Lusk, WY Supercharger","730 S Main St, Lusk, WY 82225"
1156,Rawlins Supercharger,"2370 E Cedar St., Rawlins, WY 82301-6026"
1157,Rock Springs Supercharger,"2441 Foothill Blvd, Rock Springs, 82901-5659"
1158,Sheridan Supercharger,"612 North Main Street, Sheridan, WY 82801"


In [19]:
df_address = df_address[~df_address['name'].str.contains('coming soon')]

Getting the coordinates of all the superchargers

In [20]:
geolocator = Nominatim(user_agent='jmarfice@lion.lmu.edu')
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

In [21]:
df_address['coordinates'] = df_address['full_address'].apply(geocode)

RateLimiter caught an error, retrying (0/2 tries). Called with (*('350 West Hillcrest Drive, Thousand Oaks, CA 91360',), **{}).
Traceback (most recent call last):
  File "C:\Users\jmarf\anaconda3\lib\site-packages\geopy\geocoders\base.py", line 355, in _call_geocoder
    page = requester(req, timeout=timeout, **kwargs)
  File "C:\Users\jmarf\anaconda3\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\jmarf\anaconda3\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\Users\jmarf\anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\jmarf\anaconda3\lib\urllib\request.py", line 1362, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\jmarf\anaconda3\lib\urllib\request.py", line 1322, in do_open
    r = h.getresponse()
  File "C:\Users\jmarf\anaconda3\lib\http\client.py", line 1344, in getresponse
    response.begin()
  File "C:\Users
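The "retrying (0/2 tries)" message above comes from geopy's RateLimiter, which spaces out calls and retries a failed request before giving up. A minimal sketch of that behaviour, assuming a hypothetical `rate_limited` wrapper and a `fake_geocode` stand-in (neither is geopy's actual implementation, and the pause geopy adds between retries is left out):

```python
import time

def rate_limited(func, min_delay_seconds=1.0, max_retries=2, sleep=time.sleep):
    """Hypothetical stand-in for a rate limiter: space out calls, retry on errors."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        for attempt in range(max_retries + 1):
            # Wait until at least min_delay_seconds have passed since the last call
            if last_call is not None:
                wait = min_delay_seconds - (time.monotonic() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = time.monotonic()
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_retries:
                    # Give up: the row ends up with no coordinates
                    return None

    return wrapper

attempts = []

def fake_geocode(address):
    """Simulated geocoder that times out twice before answering."""
    attempts.append(address)
    if len(attempts) < 3:
        raise TimeoutError('simulated timeout')
    return (0.0, 0.0)  # placeholder, not a real geocoding result

geocode_sketch = rate_limited(fake_geocode, min_delay_seconds=0.0)
result = geocode_sketch('350 West Hillcrest Drive, Thousand Oaks, CA 91360')
```

Returning None after the last retry is what leaves some rows without coordinates, so the later steps have to skip them.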

In [23]:
df_address['latitude'] = df_address['coordinates'].apply(lambda x: x.latitude if x is not None else None)
df_address['longitude'] = df_address['coordinates'].apply(lambda x: x.longitude if x is not None else None)
df_address

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,name,full_address,coordinates,latitude,longitude
0,"Athens, AL Supercharger","21282 Athens-Limestone Blvd, Athens, AL","(Athens-Limestone Boulevard, Athens Limestone ...",34.786114,-86.941484
1,Auburn Alabama Supercharger,"1627 Opelika Road, Auburn, AL 36830","(Auburn Mall, 1627, Opelika Road, Town and Cou...",32.626972,-85.448010
2,"Birmingham, AL Supercharger","2221 Richard Arrington Junior Blvd, Birmingham...",,,
4,Greenville Supercharger,"219 Interstate Drive, Greenville, AL 36037","(Hampton Inn Greenville, 219, Interstate Drive...",31.856096,-86.635229
5,Mobile Supercharger,"3201 Airport Blvd, Mobile, AL 36606","(REEDS Jewelers - Bel Air Mall, 3201, Airport ...",30.675205,-88.119520
...,...,...,...,...,...
1155,"Lusk, WY Supercharger","730 S Main St, Lusk, WY 82225","(730, South Cedar Street, Lusk, Niobrara Count...",42.757015,-104.452049
1156,Rawlins Supercharger,"2370 E Cedar St., Rawlins, WY 82301-6026","(2370, East Cedar Street, Rawlins, Carbon Coun...",41.792283,-107.210176
1157,Rock Springs Supercharger,"2441 Foothill Blvd, Rock Springs, 82901-5659","(White Mountain Mall, 2441, Foothill Boulevard...",41.581218,-109.264146
1158,Sheridan Supercharger,"612 North Main Street, Sheridan, WY 82801","(612, North Main Street, City of Sheridan, She...",44.804021,-106.955827
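The SettingWithCopyWarning above most likely appears because df_address was produced by filtering another DataFrame, so pandas cannot tell whether the new columns land on a view or a copy. A minimal sketch of one common way around it, using a tiny two-row frame built from names shown earlier (the `open_sites` and `has_coords` names are made up for illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Athens, AL Supercharger', 'Dothan, AL (coming soon)'],
    'latitude': [34.786114, None],
})

# .copy() makes the filtered result an independent DataFrame
open_sites = df_demo[~df_demo['name'].str.contains('coming soon', regex=False)].copy()

# Adding a column to the explicit copy leaves the original frame untouched
open_sites['has_coords'] = open_sites['latitude'].notna()
```

The same idea would apply to the 'coming soon' filter earlier in the notebook: calling .copy() on the filtered frame before adding columns to it.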


In [25]:
df_address.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 810 entries, 0 to 1159
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   name          810 non-null    object 
 1   full_address  810 non-null    object 
 2   coordinates   627 non-null    object 
 3   latitude      627 non-null    float64
 4   longitude     627 non-null    float64
dtypes: float64(2), object(3)
memory usage: 38.0+ KB


Checking if superchargers are within cities

In [26]:
cities_df = pd.read_csv('cities_data.csv', index_col=0)
cities_df.head()

ID,City,Latitude,Longitude,Radius
1,New York,40.73061,-73.93524,25
2,Los Angeles,34.05224,-118.24368,30
3,Chicago,41.88183,-87.62318,20
4,Houston,29.74991,-95.35842,25
5,Phoenix,33.44838,-112.07404,20


In [27]:
cities_df.loc[[1], 'Radius']

ID
1    25
Name: Radius, dtype: int64

In [28]:
df_address.loc[[1], 'name']

1    Auburn Alabama Supercharger
Name: name, dtype: object

In [29]:
for index_city, row_city in cities_df.iterrows():
    # City coordinates do not change in the inner loop, so build them once
    coords_city = (row_city['Latitude'], row_city['Longitude'])
    for index_sc, row_sc in df_address.iterrows():
        coords_sc = (row_sc['latitude'], row_sc['longitude'])
        # Skip superchargers that Nominatim could not geocode
        if math.isnan(coords_sc[0]):
            continue
        distance = geopy.distance.geodesic(coords_city, coords_sc).miles
        if distance <= row_city['Radius']:
            # A later matching city overwrites an earlier match
            df_address.loc[index_sc, 'city_id'] = index_city

df_address
        


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


Unnamed: 0,name,full_address,coordinates,latitude,longitude,city_id
0,"Athens, AL Supercharger","21282 Athens-Limestone Blvd, Athens, AL","(Athens-Limestone Boulevard, Athens Limestone ...",34.786114,-86.941484,
1,Auburn Alabama Supercharger,"1627 Opelika Road, Auburn, AL 36830","(Auburn Mall, 1627, Opelika Road, Town and Cou...",32.626972,-85.448010,
2,"Birmingham, AL Supercharger","2221 Richard Arrington Junior Blvd, Birmingham...",,,,
4,Greenville Supercharger,"219 Interstate Drive, Greenville, AL 36037","(Hampton Inn Greenville, 219, Interstate Drive...",31.856096,-86.635229,
5,Mobile Supercharger,"3201 Airport Blvd, Mobile, AL 36606","(REEDS Jewelers - Bel Air Mall, 3201, Airport ...",30.675205,-88.119520,
...,...,...,...,...,...,...
1155,"Lusk, WY Supercharger","730 S Main St, Lusk, WY 82225","(730, South Cedar Street, Lusk, Niobrara Count...",42.757015,-104.452049,
1156,Rawlins Supercharger,"2370 E Cedar St., Rawlins, WY 82301-6026","(2370, East Cedar Street, Rawlins, Carbon Coun...",41.792283,-107.210176,
1157,Rock Springs Supercharger,"2441 Foothill Blvd, Rock Springs, 82901-5659","(White Mountain Mall, 2441, Foothill Boulevard...",41.581218,-109.264146,
1158,Sheridan Supercharger,"612 North Main Street, Sheridan, WY 82801","(612, North Main Street, City of Sheridan, She...",44.804021,-106.955827,
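The radius check in the loop above can also be sketched without geopy. This version swaps in the haversine great-circle formula for geopy's geodesic distance, so results may differ slightly near a radius boundary. It also keeps the nearest matching city instead of the last one. The function names and the mean Earth radius constant are my own assumptions:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def nearest_city_within_radius(point, cities):
    """Return the id of the closest city whose radius contains point, else None.

    cities maps a city id to (latitude, longitude, radius_miles).
    """
    best_id, best_distance = None, None
    for city_id, (lat, lon, radius) in cities.items():
        distance = haversine_miles((lat, lon), point)
        if distance <= radius and (best_distance is None or distance < best_distance):
            best_id, best_distance = city_id, distance
    return best_id

# Rows copied from the head of cities_data.csv shown above
cities = {
    1: (40.73061, -73.93524, 25),   # New York
    2: (34.05224, -118.24368, 30),  # Los Angeles
    5: (33.44838, -112.07404, 20),  # Phoenix
}

athens_al = (34.786114, -86.941484)  # Athens, AL Supercharger from the table above
print(nearest_city_within_radius(athens_al, cities))
```

Picking the nearest city is a design choice. It only matters where two city radii overlap.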


In [30]:
df_address['city_id'].tolist()

[nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 5.0,
 5.0,
 nan,
 5.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 2.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 10.0,
 nan,
 2.0,
 2.0,
 nan,
 10.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 10.0,
 10.0,
 nan,
 nan,
 2.0,
 nan,
 nan,
 nan,
 2.0,
 2.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 10.0,
 2.0,
 nan,
 2.0,
 nan,
 nan,
 nan,
 10.0,
 10.0,
 nan,
 nan,
 nan,
 nan,
 10.0,
 10.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 10.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 8.0,
 8.0,
 nan,
 8.0,
 8.0,
 nan,
 10.0,
 10.0,
 10.0,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 2.0,
 nan,
 2.0,
 nan,
 nan,
 nan,
 10.0,
 n

In [31]:
df_address.drop(['coordinates'], axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


In [32]:
df_address

Unnamed: 0,name,full_address,latitude,longitude,city_id
0,"Athens, AL Supercharger","21282 Athens-Limestone Blvd, Athens, AL",34.786114,-86.941484,
1,Auburn Alabama Supercharger,"1627 Opelika Road, Auburn, AL 36830",32.626972,-85.448010,
2,"Birmingham, AL Supercharger","2221 Richard Arrington Junior Blvd, Birmingham...",,,
4,Greenville Supercharger,"219 Interstate Drive, Greenville, AL 36037",31.856096,-86.635229,
5,Mobile Supercharger,"3201 Airport Blvd, Mobile, AL 36606",30.675205,-88.119520,
...,...,...,...,...,...
1155,"Lusk, WY Supercharger","730 S Main St, Lusk, WY 82225",42.757015,-104.452049,
1156,Rawlins Supercharger,"2370 E Cedar St., Rawlins, WY 82301-6026",41.792283,-107.210176,
1157,Rock Springs Supercharger,"2441 Foothill Blvd, Rock Springs, 82901-5659",41.581218,-109.264146,
1158,Sheridan Supercharger,"612 North Main Street, Sheridan, WY 82801",44.804021,-106.955827,


In [33]:
engine = create_engine('mysql+mysqldb://admin:sql_2020@lmu-sql.clqgvydstxhb.us-east-1.rds.amazonaws.com/sql_project?charset=UTF8')

In [34]:
df_address.to_sql('superchargers', engine, if_exists='append', index=False)

  cursor.execute("SET NAMES %s" % charset_name)
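The connection string above has the database password written directly in the notebook. A minimal sketch of building the same kind of URL from environment variables instead; the variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_NAME), the helper name, and the example values are all hypothetical:

```python
import os
from urllib.parse import quote_plus

def build_mysql_url(env=os.environ):
    """Assemble a SQLAlchemy-style MySQL URL from environment variables.

    The variable names used here are assumptions, not part of the original notebook.
    """
    user = quote_plus(env['DB_USER'])
    # Escape characters such as '@' or '/' that would break the URL
    password = quote_plus(env['DB_PASSWORD'])
    host = env['DB_HOST']
    name = env['DB_NAME']
    return f'mysql+mysqldb://{user}:{password}@{host}/{name}?charset=UTF8'

# Example with made-up values passed as a plain dict
example_url = build_mysql_url({
    'DB_USER': 'admin',
    'DB_PASSWORD': 'p@ss/word',
    'DB_HOST': 'db.example.com',
    'DB_NAME': 'sql_project',
})
print(example_url)
```

The resulting string could then be passed to create_engine in place of the literal URL.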
