# STA 141B Data & Web Technologies for Data Analysis

### Lecture 12, 02/17/26, Scraping

### Today's topics
 - Web Scraping: 
     - Foodwise
     - Tornado Watch

### Resources
 - [Foodwise](https://foodwise.org/)
 - [Tornado Watch](https://www.tornadohq.com/)

### Writing Scrapers

Let's scrape the Wikipedia table ourselves. Note that we are using `requests`, so pay attention to the file that is actually returned: it can differ from what the browser renders. In devtools, inspect the `<thead>` element of the table and compare it with the response shown in the Network tab.

In [None]:
import requests
import lxml.html as lx
import pandas as pd

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
}
result = requests.get(url = 'https://en.wikipedia.org/wiki/List_of_United_States_cities_by_area', headers = headers)
result.raise_for_status()
html = lx.fromstring(result.text)

In [None]:
result.text[:100]

In [None]:
tables = html.xpath('//table')
table = tables[0]

In [None]:
table

In [None]:
table.text_content()

In [None]:
html.xpath('//*[@id="mw-content-text"]/div[1]/table[2]/thead')

In [None]:
html.xpath('//table[2]/tbody/tr[4]//text()')

In [None]:
from re import sub 
def remove(string):
    '''
    Removes bracketed footnote markers such as [a] (including whitespace before
    them and *'s after them), as well as newlines, commas and stray *'s.
    '''
    if isinstance(string, str):
        string = sub(r'\s*\[.*\]\**|\n|,|\*', '', string)
        # \s*\[.*\]\** : any whitespace (incl. space and newline), then any text between square brackets, then any trailing *'s
        # |\n|,|\*     : OR just a newline, OR just a comma, OR just a *
        # * means zero or more occurrences, . means any character (except a newline)
        # this aims to remove footnote markers like [a]* and the \n in the columns
    return string
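
To see what the pattern in `remove` does, here is a small self-contained sketch that applies the same regular expression to a few made-up cell values (the sample strings are assumptions for illustration, not taken from the actual table). It also shows one caveat: `.*` is greedy, so a cell with two bracketed markers loses everything between them. The non-greedy variant at the end is one possible alternative, not part of the original function.

```python
import re

# Same pattern as in remove() above
PATTERN = r'\s*\[.*\]\**|\n|,|\*'

def clean(cell):
    """Apply the cleaning pattern to a single string."""
    return re.sub(PATTERN, '', cell)

# Made-up sample cells for illustration
print(clean('Anchorage [b]\n'))   # footnote marker and newline are removed
print(clean('2,870.1'))           # thousands separator is removed
print(clean('Sitka*'))            # stray asterisk is removed

# Greedy caveat: everything from the first '[' to the last ']' is removed
print(clean('a [1] b [2]'))

# A non-greedy variant (.*?) stops at the first closing bracket instead
NON_GREEDY = r'\s*\[.*?\]\**|\n|,|\*'
print(re.sub(NON_GREEDY, '', 'a [1] b [2]'))
```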

In [None]:
def retrieve_rows(html): 
    rows = html.xpath('//table[2]/tbody/tr')
    cells = []
    for row in rows: 
        # ./td|th is the union of ./td and th: both are evaluated relative to the current row
        # (not searching the whole doc again) and select its td OR th children
        cells.append([remove(cell.text_content()) for cell in row.xpath('./td|th')]) # text_content() instead of text, as some cells wrap their text in <b>
    return cells

In [None]:
retrieve_rows(html)

In [None]:
df = pd.DataFrame(retrieve_rows(html))
df.head(10)

In [None]:
df.columns = df.iloc[0]

In [None]:
df = df.iloc[2:]

In [None]:
df

### Example: Foodwise

Foodwise, formerly CUESA (Center for Urban Education about Sustainable Agriculture), provides [a chart](https://foodwise.org/eat-seasonally/seasonality-chart-vegetables/) showing when certain vegetables are in season. We want to create this chart for ourselves. All the info we need is on `foodwise`, so let's scrape!

First, observe that the search mask (Food type, Month) invokes an API. However, the parameters are complicated to assemble, and the returned object is HTML anyway, so we have to scrape the HTML. Before writing any code, check with devtools that the desired information is actually contained in the response (under `doc`).

In [3]:
import requests
import lxml.html as lx
import requests_cache
import time
requests_cache.install_cache("../output/lecture9")

In [4]:
url = "https://foodwise.org/eat-seasonally/seasonality-charts/?_food_type=vegetable"

In [5]:
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

Here, the server needs the `user-agent` key in the header. 

#### First approach

In [6]:
response = requests.get(url, headers=headers)
response.raise_for_status()

In [7]:
response.text[:100]

'<!doctype html>\n<html lang="en-US">\n<head>\n\t<meta charset="UTF-8">\n\t<meta name="viewport" content="w'

#### How to get the product in the first place? 

Visit https://foodwise.org/eat-seasonally/seasonality-charts/?_food_type=vegetable
and use Inspect.

In [8]:
url = 'https://foodwise.org/eat-seasonally/seasonality-charts/?_food_type=vegetable'
response = requests.get(url, headers = headers)
response.raise_for_status()

In [9]:
html = lx.fromstring(response.text) # Parse the HTML
html

<Element html at 0x112970370>

In [10]:
produce = html.xpath('//div[@class="card-image-title__text-content"]/h3/text()')
produce   

['Artichokes',
 'Arugula',
 'Asparagus',
 'Beets',
 'Bitter melon',
 'Bok choy',
 'Broccoli',
 'Broccoli rabe',
 'Brussels sprouts',
 'Burdock',
 'Cabbage',
 'Cactus pads',
 'Cardoons',
 'Carrots',
 'Cauliflower',
 'Celeriac',
 'Celery',
 'Celtuce',
 'Chard',
 'Chickweed']

In [11]:
# [i.text for i in produce]
# N

These are only the very first entries. Click on next page.

In [12]:
def get_produce(page):
    url = 'https://foodwise.org/eat-seasonally/seasonality-charts/'
    response = requests.get(url, headers = headers, params = {
        '_food_type': 'vegetable',
        '_paged': page
    })
    response.raise_for_status()
    html = lx.fromstring(response.text) # Parse the HTML
    products = html.xpath('//div[@class="card-image-title__text-content"]/h3/text()')
    return products

In [13]:
get_produce(2)

['Chicory',
 'Collard greens',
 'Corn',
 'Cress',
 'Cresta di Gallo',
 'Cucumbers',
 'Dandelion greens',
 'Eggplant',
 'Endive',
 'Fava beans',
 'Fava greens',
 'Fennel',
 'Garlic',
 'Ginger root',
 'Green beans',
 'Herbs',
 'Horseradish',
 'Jicama',
 'Kale',
 'Kohlrabi']

There are four pages in total.
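
As a rough sketch of what happens with the `params` argument, the snippet below builds the four page URLs by hand using only the standard library. The helper name `page_url` is made up for illustration; `requests` assembles an equivalent query string internally, so this is only meant to make the paging scheme visible.

```python
from urllib.parse import urlencode

BASE_URL = 'https://foodwise.org/eat-seasonally/seasonality-charts/'

def page_url(page, food_type='vegetable'):
    """Build the URL for one result page (hypothetical helper)."""
    query = urlencode({'_food_type': food_type, '_paged': page})
    return BASE_URL + '?' + query

# range(1, 5) yields the pages 1, 2, 3 and 4
urls = [page_url(p) for p in range(1, 5)]
for u in urls:
    print(u)
```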

In [14]:
url = 'https://foodwise.org/eat-seasonally/seasonality-charts'
response = requests.get(url, headers = headers, params = {
    '_food_type': 'vegetable'
})
response.raise_for_status()
html = lx.fromstring(response.text) # Parse the HTML
pages = html.xpath('//div[@class="facetwp-facet facetwp-facet-query_pager facetwp-type-pager"]')
pages[0].text_content()

''

In [15]:
lst = [get_produce(i) for i in range(1,5)]

In [16]:
lst

[['Artichokes',
  'Arugula',
  'Asparagus',
  'Beets',
  'Bitter melon',
  'Bok choy',
  'Broccoli',
  'Broccoli rabe',
  'Brussels sprouts',
  'Burdock',
  'Cabbage',
  'Cactus pads',
  'Cardoons',
  'Carrots',
  'Cauliflower',
  'Celeriac',
  'Celery',
  'Celtuce',
  'Chard',
  'Chickweed'],
 ['Chicory',
  'Collard greens',
  'Corn',
  'Cress',
  'Cresta di Gallo',
  'Cucumbers',
  'Dandelion greens',
  'Eggplant',
  'Endive',
  'Fava beans',
  'Fava greens',
  'Fennel',
  'Garlic',
  'Ginger root',
  'Green beans',
  'Herbs',
  'Horseradish',
  'Jicama',
  'Kale',
  'Kohlrabi'],
 ['Komatsuna',
  'Lambsquarters',
  'Leeks',
  'Lettuce',
  'Mushrooms',
  'Mustard greens',
  'Nettles',
  'Okra',
  'Onions',
  'Orach',
  'Parsnips',
  'Pea shoots',
  'Peas',
  'Peppers, chile',
  'Peppers, sweet',
  'Potatoes',
  'Purslane',
  'Radishes',
  'Romanesco',
  'Rutabagas'],
 ['Salsify',
  'Scallions',
  'Shallots',
  'Shelling beans',
  'Spinach',
  'Sprouts',
  'Squash, summer',
  'Squash, 

In [17]:
produce = [item for pages in [get_produce(i) for i in range(1,5)] for item in pages]
produce

['Artichokes',
 'Arugula',
 'Asparagus',
 'Beets',
 'Bitter melon',
 'Bok choy',
 'Broccoli',
 'Broccoli rabe',
 'Brussels sprouts',
 'Burdock',
 'Cabbage',
 'Cactus pads',
 'Cardoons',
 'Carrots',
 'Cauliflower',
 'Celeriac',
 'Celery',
 'Celtuce',
 'Chard',
 'Chickweed',
 'Chicory',
 'Collard greens',
 'Corn',
 'Cress',
 'Cresta di Gallo',
 'Cucumbers',
 'Dandelion greens',
 'Eggplant',
 'Endive',
 'Fava beans',
 'Fava greens',
 'Fennel',
 'Garlic',
 'Ginger root',
 'Green beans',
 'Herbs',
 'Horseradish',
 'Jicama',
 'Kale',
 'Kohlrabi',
 'Komatsuna',
 'Lambsquarters',
 'Leeks',
 'Lettuce',
 'Mushrooms',
 'Mustard greens',
 'Nettles',
 'Okra',
 'Onions',
 'Orach',
 'Parsnips',
 'Pea shoots',
 'Peas',
 'Peppers, chile',
 'Peppers, sweet',
 'Potatoes',
 'Purslane',
 'Radishes',
 'Romanesco',
 'Rutabagas',
 'Salsify',
 'Scallions',
 'Shallots',
 'Shelling beans',
 'Spinach',
 'Sprouts',
 'Squash, summer',
 'Squash, winter',
 'Sunchokes',
 'Sweet potatoes',
 'Taro root',
 'Tatsoi',
 'To
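
The nested list comprehension used to flatten the pages can be hard to read. The following sketch uses a short made-up stand-in for the per-page results and compares the comprehension with `itertools.chain.from_iterable`, which gives the same flat list.

```python
from itertools import chain

# Made-up stand-in for the lists returned by get_produce(1), get_produce(2), ...
pages = [['Artichokes', 'Arugula'], ['Chicory', 'Corn'], ['Komatsuna']]

# Nested comprehension: the outer loop runs over pages, the inner loop over items
flat_comprehension = [item for page in pages for item in page]

# Equivalent version using the standard library
flat_chain = list(chain.from_iterable(pages))

print(flat_comprehension)
print(flat_chain == flat_comprehension)
```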

#### How to get the products?

In [18]:
url = "https://foodwise.org/foods/corn/"
response = requests.get(url, headers=headers)

We have to provide the correct header! 

In [19]:
response.raise_for_status()

In [20]:
response.text # works after executed chunk below, as we use cache

'<!doctype html>\n<html lang="en-US">\n<head>\n\t<meta charset="UTF-8">\n\t<meta name="viewport" content="width=device-width, initial-scale=1">\n\t<link rel="profile" href="https://gmpg.org/xfn/11">\n\n\t<!-- refine, and properly optimize and include (using enqueue_script) the following files before production launch -->\n\t<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css"\n\t      integrity="sha512-1ycn6IcaQQ40/MKBW2W4Rhis/DbILU74C1vSrLJxCq57o941Ym01SwNsOMqvEBFlcgUa6xLiPY/NS5R+E6ztJQ=="\n\t      crossorigin="anonymous" referrerpolicy="no-referrer"/>\n\n\t<link rel="preconnect" href="https://fonts.googleapis.com">\n\t<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>\n\t<link\n\t\thref="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&family=Waterfall&display=swap"\n\t\trel="stylesheet">\n\n\n\t<meta name=\'robots\' content=\

In [21]:
response = requests.get(url, headers = headers)
response.raise_for_status() # note the parentheses: without them the method is only referenced, not called

In [22]:
response.text[:100]

'<!doctype html>\n<html lang="en-US">\n<head>\n\t<meta charset="UTF-8">\n\t<meta name="viewport" content="w'

In [23]:
response.url

'https://foodwise.org/foods/corn/'

Find the table 'In Season' from the HTML. (Use Inspect!)

In [24]:
html = lx.fromstring(response.text) # Parse the HTML
html

<Element html at 0x112ab1630>

In [25]:
html.xpath('//section[@class="sidebar__section"][h2[contains(text(), "In Season")]]/text()')[1]

'\n                    June • July • August • September • October            '

In [26]:
string = html.xpath('//section[@class="sidebar__section"][h2[contains(text(), "In Season")]]/text()')[1]
string

'\n                    June • July • August • September • October            '
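
Why the index `[1]`? The sketch below uses a simplified, made-up stand-in for the sidebar markup (an assumption about the page structure, not a copy of it) to show that `/text()` returns one entry per text node directly inside the section: the whitespace before the `<h2>` comes first, and the text after the closing `</h2>` comes second.

```python
import lxml.html as lx

# Simplified, made-up stand-in for the sidebar markup (not the real page)
snippet = '''<section class="sidebar__section">
    <h2>In Season</h2>
    June • July • August
</section>'''

section = lx.fromstring(snippet)

# Text nodes that are direct children of the section
texts = section.xpath('./text()')

# texts[0] is the whitespace between <section> and <h2>,
# texts[1] is the text following </h2>, which holds the months
for i, t in enumerate(texts):
    print(i, repr(t))
```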

In [27]:
from re import sub
st = sub(r'\W', ' ', string)
st

'                     June   July   August   September   October            '

In [28]:
sub(r'\W', ' ', st).split() # recall regex: \W is any non-word character, i.e. anything but letters, digits and underscore. The bullets become spaces, and split() then drops all whitespace.

['June', 'July', 'August', 'September', 'October']
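
For comparison, here is a small sketch of the same cleaning step next to two alternatives that lead to the same list. It runs on a copy of the string shown above; which variant reads best is a matter of taste.

```python
import re

# Sample string in the shape returned by the XPath query above
raw = '\n                    June • July • August • September • October            '

# Approach from the text: replace non-word characters by spaces, then split
months_sub = re.sub(r'\W', ' ', raw).split()

# Alternative 1: collect the runs of word characters directly
months_findall = re.findall(r'\w+', raw)

# Alternative 2: split on the bullet and strip the surrounding whitespace
months_split = [part.strip() for part in raw.split('•')]

print(months_sub)
print(months_sub == months_findall == months_split)
```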

In [29]:
def get_months(product): 
    time.sleep(0.1)
    url = "https://foodwise.org/foods/" + product + "/"
    response = requests.get(url, headers = headers)
    response.raise_for_status()
    html = lx.fromstring(response.text)
    try:
        string = html.xpath('//section[@class="sidebar__section"][h2[contains(text(), "In Season")]]/text()')[1]
        month = sub(r'(In Season)|\W', ' ', string).split() # remove (In Season) or any non-alphanumeric content
    except IndexError: # the page has no "In Season" section
        month = []
    
    return month

In [30]:
month = get_months('corn')
month 

['June', 'July', 'August', 'September', 'October']

#### Iterate over produce items

In [31]:
seasonality_info = [get_months(p) for p in produce]

HTTPError: 404 Client Error: Not Found for url: https://foodwise.org/foods/Peppers,%20chile/

Have a closer look:

In [32]:
url = 'https://foodwise.org/eat-seasonally/seasonality-charts/'
response = requests.get(url, headers = headers, params = {
    '_food_type': 'vegetable',
    '_paged': 3
})
response.raise_for_status()
html = lx.fromstring(response.text) # Parse the HTML


In [33]:
products = html.xpath('//a[@class="card-image-title__outer-link"]/@href')

In [34]:
products

['https://foodwise.org/foods/komatsuna/',
 'https://foodwise.org/foods/lambsquarters/',
 'https://foodwise.org/foods/leeks/',
 'https://foodwise.org/foods/lettuce/',
 'https://foodwise.org/foods/mushrooms/',
 'https://foodwise.org/foods/mustard-greens/',
 'https://foodwise.org/foods/nettles/',
 'https://foodwise.org/foods/okra/',
 'https://foodwise.org/foods/onions/',
 'https://foodwise.org/foods/orach/',
 'https://foodwise.org/foods/parsnips/',
 'https://foodwise.org/foods/pea-shoots/',
 'https://foodwise.org/foods/peas/',
 'https://foodwise.org/foods/peppers-chile/',
 'https://foodwise.org/foods/peppers-sweet/',
 'https://foodwise.org/foods/potatoes/',
 'https://foodwise.org/foods/purslane/',
 'https://foodwise.org/foods/radishes/',
 'https://foodwise.org/foods/romanesco/',
 'https://foodwise.org/foods/rutabagas/']

The product names do not map directly onto the URLs (e.g. `Peppers, chile` is found under `/foods/peppers-chile/`), so we retrieve the `href` attribute from the anchor instead. Again: Use __Inspect__.
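
To illustrate why the display names fail as URLs, the sketch below guesses the link from the name with a hypothetical `slugify` helper. The rule (lower-case, non-alphanumeric runs replaced by a hyphen) is an assumption that happens to reproduce the links listed above; it is not a documented rule of the site, which is why scraping the `href` remains the more robust choice.

```python
import re

def slugify(name):
    """Hypothetical helper: lower-case the name and join alphanumeric runs with hyphens."""
    return re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')

def guess_url(name):
    """Guess the product URL from its display name (an assumption, not a site rule)."""
    return 'https://foodwise.org/foods/' + slugify(name) + '/'

print(guess_url('Peppers, chile'))
print(guess_url('Bok choy'))
```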

In [36]:
def get_products(page):
    url = 'https://foodwise.org/eat-seasonally/seasonality-charts/'
    response = requests.get(url, headers = headers, params = {
        '_food_type': 'vegetable',
        '_paged': page
    })
    response.raise_for_status()
    html = lx.fromstring(response.text) # Parse the HTML
    return(html.xpath('//a[@class="card-image-title__outer-link"]/@href'))

In [37]:
get_products(1)

['https://foodwise.org/foods/artichokes/',
 'https://foodwise.org/foods/arugula/',
 'https://foodwise.org/foods/asparagus/',
 'https://foodwise.org/foods/beets/',
 'https://foodwise.org/foods/bitter-melon/',
 'https://foodwise.org/foods/bok-choy/',
 'https://foodwise.org/foods/broccoli/',
 'https://foodwise.org/foods/broccoli-rabe/',
 'https://foodwise.org/foods/brussels-sprouts/',
 'https://foodwise.org/foods/burdock/',
 'https://foodwise.org/foods/cabbage/',
 'https://foodwise.org/foods/cactus-pads/',
 'https://foodwise.org/foods/cardoons/',
 'https://foodwise.org/foods/carrots/',
 'https://foodwise.org/foods/cauliflower/',
 'https://foodwise.org/foods/celeriac/',
 'https://foodwise.org/foods/celery/',
 'https://foodwise.org/foods/celtuce/',
 'https://foodwise.org/foods/chard/',
 'https://foodwise.org/foods/chickweed/']

In [38]:
lst = [el for p in range(1,5) for el in get_products(p)]

In [52]:
lst

['https://foodwise.org/foods/artichokes/',
 'https://foodwise.org/foods/arugula/',
 'https://foodwise.org/foods/asparagus/',
 'https://foodwise.org/foods/beets/',
 'https://foodwise.org/foods/bitter-melon/',
 'https://foodwise.org/foods/bok-choy/',
 'https://foodwise.org/foods/broccoli/',
 'https://foodwise.org/foods/broccoli-rabe/',
 'https://foodwise.org/foods/brussels-sprouts/',
 'https://foodwise.org/foods/burdock/',
 'https://foodwise.org/foods/cabbage/',
 'https://foodwise.org/foods/cactus-pads/',
 'https://foodwise.org/foods/cardoons/',
 'https://foodwise.org/foods/carrots/',
 'https://foodwise.org/foods/cauliflower/',
 'https://foodwise.org/foods/celeriac/',
 'https://foodwise.org/foods/celery/',
 'https://foodwise.org/foods/celtuce/',
 'https://foodwise.org/foods/chard/',
 'https://foodwise.org/foods/chickweed/',
 'https://foodwise.org/foods/chicory/',
 'https://foodwise.org/foods/collard-greens/',
 'https://foodwise.org/foods/corn/',
 'https://foodwise.org/foods/cress/',
 'ht

In [53]:
def get_months(produce_link): 
    time.sleep(0.1)
    response = requests.get(produce_link, headers = headers)
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return [None, []] # the page could not be retrieved
    else:
        html = lx.fromstring(response.text)
        try: 
            string = html.xpath('//section[@class="sidebar__section"][h2[contains(text(), "In Season")]]/text()')[1]
        except IndexError: # the page has no "In Season" section
            return [None, []] 
        else:
            month = sub(r'(In Season)|\W', ' ', string).split() 
            name = html.xpath("//h1/text()")[0]
            return [name, month]

In [54]:
seasonality_info = [get_months(url) for url in lst]

In [55]:
seasonality_info

[['Artichokes',
  ['March',
   'April',
   'May',
   'June',
   'September',
   'October',
   'November',
   'December']],
 ['Arugula',
  ['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December']],
 ['Asparagus', ['February', 'March', 'April', 'May', 'June']],
 ['Beets',
  ['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December']],
 ['Bitter melon',
  ['June', 'July', 'August', 'September', 'October', 'November']],
 ['Bok choy',
  ['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December']],
 ['Broccoli',
  ['January',
   'February',
   'March',
   'April',
   'May',
   'June',
   'July',
   'August',
   'September',
   'October',
   'November',
   'December']],
 ['Broccoli rabe',
  ['January',
   '

#### How to combine everything?

In [59]:
year = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 
        'October', 'November', 'December']

In [60]:
month

['June', 'July', 'August', 'September', 'October']

In [61]:
[item in month for item in year]

[False, False, False, False, False, True, True, True, True, True, False, False]

In [62]:
def assemble_row(produce_link): 
    name, months = get_months(produce_link)
    months = [item in months for item in year]
    months.insert(0, name)
    return months
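
The core of `assemble_row` can be tried without any network access. The sketch below repeats the membership test on the corn months from above; the helper name `month_flags` is made up for illustration, and using a `set` for the lookup is an optional refinement.

```python
YEAR = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
        'August', 'September', 'October', 'November', 'December']

def month_flags(months_in_season):
    """Return one boolean per calendar month (hypothetical helper)."""
    in_season = set(months_in_season)
    return [m in in_season for m in YEAR]

# Months found for corn earlier in this section
corn = ['June', 'July', 'August', 'September', 'October']

# Name first, then the twelve indicators, as in assemble_row
row = ['Corn'] + month_flags(corn)
print(row)
```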

In [64]:
df = [assemble_row(i) for i in lst] 
df

[['Artichokes',
  False,
  False,
  True,
  True,
  True,
  True,
  False,
  False,
  True,
  True,
  True,
  True],
 ['Arugula',
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True],
 ['Asparagus',
  False,
  True,
  True,
  True,
  True,
  True,
  False,
  False,
  False,
  False,
  False,
  False],
 ['Beets',
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True],
 ['Bitter melon',
  False,
  False,
  False,
  False,
  False,
  True,
  True,
  True,
  True,
  True,
  True,
  False],
 ['Bok choy',
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True],
 ['Broccoli',
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True,
  True],
 ['Broccoli rabe',
  True,
  True,
  True,
  True,
  True,
  True,
  False,
  False,
  True,
  True,
  True,
  True],
 ['Brussels sprouts',
  True,
  True,
  True,
  True,
  True,
  False,
  False,
  Fal

In [65]:
import pandas as pd
tbl = pd.DataFrame(df)
tbl.shape

(76, 13)

In [66]:
tbl.head()

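|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Artichokes | False | False | True | True | True | True | False | False | True | True | True | True |
| 1 | Arugula | True | True | True | True | True | True | True | True | True | True | True | True |
| 2 | Asparagus | False | True | True | True | True | True | False | False | False | False | False | False |
| 3 | Beets | True | True | True | True | True | True | True | True | True | True | True | True |
| 4 | Bitter melon | False | False | False | False | False | True | True | True | True | True | True | False |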


In [67]:
columnames = year.copy()
columnames.insert(0, 'Produce')
tbl.columns = columnames

In [68]:
tbl

|   | Produce | January | February | March | April | May | June | July | August | September | October | November | December |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Artichokes | False | False | True | True | True | True | False | False | True | True | True | True |
| 1 | Arugula | True | True | True | True | True | True | True | True | True | True | True | True |
| 2 | Asparagus | False | True | True | True | True | True | False | False | False | False | False | False |
| 3 | Beets | True | True | True | True | True | True | True | True | True | True | True | True |
| 4 | Bitter melon | False | False | False | False | False | True | True | True | True | True | True | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71 | Tatsoi | True | True | True | True | False | False | False | False | False | True | True | True |
| 72 | Tomatillos | False | False | False | False | False | True | True | True | True | True | True | False |
| 73 | Tomatoes | False | False | False | False | False | True | True | True | True | True | False | False |
| 74 | Turnips | True | True | True | True | True | True | True | True | True | True | True | True |


### Tornado Watch 

We are interested in scraping and plotting the locations of all tornado warnings in the last 48 hours. 

See the link [here](https://www.tornadohq.com/).

In [None]:
import requests
import lxml.html as lx
import time
import pandas as pd

In [None]:
result = requests.get('https://www.tornadohq.com/')
result.raise_for_status() # parentheses are needed to actually run the status check

In [None]:
html = lx.fromstring(result.text) # Parse the HTML

In [None]:
warnings = html.xpath('//pre')
warnings

In [None]:
warning = warnings[0].text
warning

In [None]:
for w in warnings:
    print(w.text)
    print("\n\n-----THIS IS A NEW WARNING-----\n\n")

Let's match the first latitude-longitude pair after `LAT...LON`. 

In [None]:
from re import findall

In [None]:
findall(r'(?<=LAT\.{3}LON\s)(\d+\s\d+)', warning)

In [None]:
findall(r'(?<=LAT\.{3}LON\s)(\d+\s\d+)', warning)[0].split()
# (?<=...) is a positive lookbehind: the match must be preceded by LAT...LON (\.{3} is three literal dots) and one whitespace character, which are not part of the match.
# (\d+\s\d+) is the group we keep: \d any digit, at least one occurrence, one whitespace character, \d any digit, at least one occurrence
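
The lookbehind can be tested offline on a short string. The excerpt below is made up in the style of a warning text (it is not real data), and the second pattern shows that a plain capture group after the literal prefix returns the same result, since `findall` only returns the group.

```python
import re

# Made-up excerpt in the style of a warning text (not real data)
sample = 'TORNADO WARNING\n\nLAT...LON 3512 9012 3520 9001 3498 8995\nTIME...MOT...LOC 2215Z'

# Pattern from the text: lookbehind for the prefix, then the first pair
lookbehind = re.findall(r'(?<=LAT\.{3}LON\s)(\d+\s\d+)', sample)

# Equivalent without lookbehind: match the prefix, capture only the pair
capture = re.findall(r'LAT\.{3}LON\s(\d+\s\d+)', sample)

print(lookbehind)
print(lookbehind[0].split())
print(lookbehind == capture)
```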

Convert the coordinates into a readable format. 

In [None]:
coord_list = [findall(r'(?<=LAT\.{3}LON\s)(\d+\s\d+)', warning.text)[0].split() for warning in warnings]

In [None]:
coord = pd.DataFrame(coord_list)
coord.columns = ['N', 'W']
coord = coord.map(lambda x: float(x) / 100) # the values are given in hundredths of a degree, convert to decimal degrees
coord['W'] = -coord['W'] # longitude to west is negative
coord.head()
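
The conversion itself is simple arithmetic and can be checked on a single pair. This sketch assumes, as the code above does, that the raw values are hundredths of a degree and that all longitudes lie west of the prime meridian; the helper name `to_degrees` and the input pair are made up for illustration.

```python
def to_degrees(lat_raw, lon_raw):
    """Convert raw strings in hundredths of a degree to signed decimal degrees."""
    lat = int(lat_raw) / 100
    lon = -int(lon_raw) / 100  # western longitudes are negative
    return lat, lon

# Made-up example pair
print(to_degrees('3512', '9012'))
```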

Plot the results! (A [mapbox token](https://studio.mapbox.com/) is only needed for Mapbox's own map styles; the `open-street-map` style used below works without one.)

In [None]:
coord

In [None]:
import plotly.express as px

# px.set_mapbox_access_token(open("./../keys/mapbox.txt").read())
fig = px.scatter_mapbox(coord,
                        lat='N',
                        lon='W',
                        zoom=2,
                        mapbox_style="open-street-map")
fig.show()


### Summary 

- Scraping does not necessarily return the desired content, so make use of error handling
- Use devtools to see how the website is structured and what the server actually returns