## Scraping Events on visitseattle.org

https://visitseattle.org/

- Where is the data we want?
  - List page
  - Detail page
- How do we page through the list?
  - URL parameters
  - Pagination
- How do we get data from a detail page?
  - HTML structure
  - CSS selectors
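
The pagination step can be sketched without hard-coding the number of list pages. This is a minimal sketch, not the notebook's method: `collect_links` and its `extract_links` argument are hypothetical names, and it assumes that a page past the end of the listing yields no matching links, which has not been checked against the site. The stand-in dictionary replaces real HTTP requests so the example runs offline.

```python
from itertools import count, islice
from typing import Callable, List


def collect_links(base: str,
                  extract_links: Callable[[str], List[str]],
                  max_pages: int = 100) -> List[str]:
    """Walk base/1, base/2, ... until a page yields no links.

    extract_links is a hypothetical callable: it takes a list-page URL
    and returns the detail-page links found there (for example with
    requests plus a CSS selector). max_pages is a safety cap.
    """
    links: List[str] = []
    page_urls = (f"{base}/{n}" for n in count(1))
    for url in islice(page_urls, max_pages):
        found = extract_links(url)
        if not found:
            break  # an empty page is taken as the end of the listing
        links.extend(found)
    return links


# Offline stand-in for the site: two pages with links, then nothing.
fake_site = {
    "https://example.org/events/page/1": ["/events/a/", "/events/b/"],
    "https://example.org/events/page/2": ["/events/c/"],
}
links = collect_links("https://example.org/events/page",
                      lambda url: fake_site.get(url, []))
print(links)
```

Passing the extraction step in as a function keeps the paging logic separate from the selector, so either can change without touching the other.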

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json

In [3]:
urls = []
# The page count is hard-coded: range(1, 42) covers list pages 1 through 41.
for page in range(1, 42):
    res = requests.get(f"https://visitseattle.org/events/page/{page}")
    soup = BeautifulSoup(res.text, "html.parser")
    # Each search result links to its detail page from the <h3> title.
    links = soup.select("#searchform div.search-result-preview > div > h3 > a")
    urls += [link["href"] for link in links]
print(len(urls))
print(urls)

361
['https://visitseattle.org/events/glen-teriyaki/', 'https://visitseattle.org/events/greta-matassa-sextet/', 'https://visitseattle.org/events/holding-absence/', 'https://visitseattle.org/events/nellie-mckay/', 'https://visitseattle.org/events/amber-liu/', 'https://visitseattle.org/events/disability-justice/', 'https://visitseattle.org/events/hughes-bros-presents/', 'https://visitseattle.org/events/sarya-wu/', 'https://visitseattle.org/events/the-sweet-lillies/', 'https://visitseattle.org/events/dinosaur-jr/', 'https://visitseattle.org/events/black-dogs/', 'https://visitseattle.org/events/blue-elephant-and-the-seven-snakes/', 'https://visitseattle.org/events/brock-lanzetti-ogawa/', 'https://visitseattle.org/events/fact-and-fiction-the-lord-of-the-rings/', 'https://visitseattle.org/events/groundation/', 'https://visitseattle.org/events/kayla-min-andrews/', 'https://visitseattle.org/events/ol-doris/', 'https://visitseattle.org/events/rosetan/', 'https://visitseattle.org/events/untold-s

In [5]:
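# Save the last list page fetched so its HTML structure can be inspected offline.
with open("visitseattle.html", "w", encoding="utf-8") as f:
    f.write(res.text)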

In [17]:
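# Get detail page
events = []
for url in urls:
    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")
    title = soup.select_one("div.medium-6.columns.event-top > h1").text
    # The <h4> holds the date and the location, separated by ' | '.
    detail = soup.select("div.medium-6.columns.event-top > h4")[0].text.split(' | ')
    date, location = detail[0], detail[1]
    # The first two links under the header are the event type and the region.
    type_region = soup.select("div.medium-6.columns.event-top > a")
    e_type, e_region = type_region[0].text, type_region[1].text
    print(title, date, location)
    print(e_type, e_region)
    # Strip the stray whitespace left around the separator before storing.
    events.append({'title': title, 'date': date.strip(), 'location': location.strip(), 'type': e_type, 'region': e_region})

events = pd.DataFrame(events)
# index=False keeps the row index out of the CSV, so reading it back adds no extra column.
events.to_csv('./events.csv', index=False)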

Glen Teriyaki 1/16/2024  Sea Monster Lounge
Music Wallingford / Greenlake
Greta Matassa Sextet 1/16/2024  Dimitriou's Jazz Alley
Music Downtown
Holding Absence 1/16/2024  Neumos
Music Capitol Hill / Central District
Nellie McKay 1/16/2024  The Triple Door
Music Downtown
Amber Liu 1/17/2024  The Crocodile
Music Downtown
Disability Justice 1/17/2024  Town Hall Seattle
Readings & Lectures Downtown
Hughes Bros Presents 1/17/2024  Sea Monster Lounge
Music Wallingford / Greenlake
sarya wu 1/17/2024  Chop Suey
Music Capitol Hill / Central District
The Sweet Lillies 1/17/2024  High Dive
Music Fremont / Ballard
Dinosaur Jr. Now through 1/18/2024  The Neptune Theatre
Music University District
Black Dogs 1/18/2024  Sea Monster Lounge
Music Wallingford / Greenlake
Blue Elephant and The Seven Snakes 1/18/2024  Funhouse
Music Capitol Hill / Central District
Brock, Lanzetti, Ogawa 1/18/2024  The Royal Room
Music South Seattle
Fact and Fiction: The Lord of the Rings 1/18/2024  National Nordic Museum
F
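
The printed rows above show two spaces between the date and the venue, which suggests the header text carries padding around its separator. Below is a minimal sketch of a more forgiving split. `parse_date_location` is a hypothetical helper, and the sample strings are assumptions about what the raw `<h4>` text looks like, since the notebook does not show it.

```python
from typing import Tuple


def parse_date_location(header: str) -> Tuple[str, str]:
    """Split a 'date | location' header into two stripped strings.

    If there is no '|' separator, the whole text is treated as the
    date and the location comes back as an empty string.
    """
    date, _, location = header.partition("|")
    return date.strip(), location.strip()


# Sample inputs are made up to mimic the header format.
print(parse_date_location("1/16/2024 | Sea Monster Lounge"))
print(parse_date_location("Now through 1/18/2024  |  The Neptune Theatre"))
print(parse_date_location("1/20/2024"))
```

Using `partition` instead of `split` followed by indexing means a header without a separator does not raise an `IndexError`.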

## Web API

### Weather.gov
https://www.weather.gov/documentation/services-web-api
https://api.weather.gov/points/{latitude},{longitude}

### Geo location

https://nominatim.openstreetmap.org/search.php?q=seattle&format=jsonv2
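
Both endpoints above are plain URLs with values filled in, so the request URLs can be built in a couple of lines. The sketch below uses only the standard library. The helper names `nominatim_url` and `weather_points_url` are hypothetical, and only the two parameters shown in the URLs above (`q` and `format`) are used.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search.php"
WEATHER_POINTS = "https://api.weather.gov/points"


def nominatim_url(query: str) -> str:
    """Build a geocoding search URL; urlencode escapes spaces and '/'."""
    return f"{NOMINATIM_SEARCH}?{urlencode({'q': query, 'format': 'jsonv2'})}"


def weather_points_url(lat, lon) -> str:
    """Build the weather.gov points URL for one latitude/longitude pair."""
    return f"{WEATHER_POINTS}/{lat},{lon}"


print(nominatim_url("Capitol Hill / Central District seattle"))
print(weather_points_url("47.6038", "-122.3301"))
```

With `requests`, passing the same dictionary as `params=` to `requests.get` should give an equivalent encoded URL without building the string by hand.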


In [13]:
soup.select("div.medium-6.columns.event-top > a:nth-child(3)")

[<a class="button big medium black category" href="/?s=&amp;frm=events&amp;event_type=music">Music</a>]

## Practice

Please finish the scraper for this page.

In [13]:
events = pd.read_csv('./events.csv')
locations = events['location']
regions = events['region']
weathers = []
drop_list = []  # row indices of events with no usable forecast
for i in range(len(locations)):
    location_name = locations[i]
    # Geocode the venue name first, then fall back to the region.
    for query in (location_name, f"{regions[i]} seattle", regions[i]):
        # params= lets requests URL-encode spaces and "/" in the query.
        res = requests.get("https://nominatim.openstreetmap.org/search.php",
                           params={'q': query, 'format': 'jsonv2'})
        location = res.json()
        if len(location) > 0:
            break
    if len(location) == 0:
        # No query matched, so there are no coordinates to look up.
        print(location_name)
        print(i)
        drop_list.append(i)
        continue
    lat, lon = location[0]['lat'], location[0]['lon']
    res = requests.get(f"https://api.weather.gov/points/{lat},{lon}")
    weather_point = res.json()
    try:
        forecast_url = weather_point['properties']['forecast']
    except KeyError:
        # The points lookup returned no forecast URL for these coordinates.
        print(res.status_code)
        print(location_name)
        print(i)
        drop_list.append(i)
        continue
    res = requests.get(forecast_url)
    # Keep only the first forecast period returned.
    weather = res.json()['properties']['periods'][0].copy()
    del weather['number']
    del weather['name']
    weathers.append(weather)

weathers = pd.DataFrame(weathers)
events.drop(drop_list, inplace=True)
# The two frames share no key column, so join them side by side by position.
events_weathers = pd.concat([events.reset_index(drop=True), weathers], axis=1)
events_weathers.to_csv('events_with_weathers.csv')
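
The cell above keeps events and forecasts in two separate lists and lines them up afterwards, which only works if every skipped event is also dropped. One alternative, shown here as a sketch rather than the notebook's approach, is to merge each event with its forecast at the moment the forecast is found. `attach_weather` and `lookup` are hypothetical names, and `lookup` stands in for the geocoding and forecast requests so the example runs offline.

```python
from typing import Callable, Dict, List, Optional


def attach_weather(events: List[Dict],
                   lookup: Callable[[Dict], Optional[Dict]]) -> List[Dict]:
    """Return one merged record per event that has a forecast.

    lookup is a hypothetical callable that returns a weather dict for
    an event, or None when geocoding or the forecast request fails.
    """
    combined = []
    for event in events:
        weather = lookup(event)
        if weather is None:
            continue  # events without a forecast are left out
        combined.append({**event, **weather})
    return combined


# Made-up sample data: the second event has no forecast available.
sample_events = [
    {"title": "Event A", "location": "Venue A"},
    {"title": "Event B", "location": "Venue B"},
]
fake_forecasts = {"Venue A": {"temperature": 45, "shortForecast": "Rain"}}
rows = attach_weather(sample_events,
                      lambda event: fake_forecasts.get(event["location"]))
print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame`, and no index bookkeeping is needed to keep rows aligned.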

In [12]:
res

<Response [200]>

In [17]:
res = requests.get(f"https://api.weather.gov/points/{lat},{lon}")
weather_point = res.json()
weather_point

{'@context': ['https://geojson.org/geojson-ld/geojson-context.jsonld',
  {'@version': '1.1',
   'wx': 'https://api.weather.gov/ontology#',
   's': 'https://schema.org/',
   'geo': 'http://www.opengis.net/ont/geosparql#',
   'unit': 'http://codes.wmo.int/common/unit/',
   '@vocab': 'https://api.weather.gov/ontology#',
   'geometry': {'@id': 's:GeoCoordinates', '@type': 'geo:wktLiteral'},
   'city': 's:addressLocality',
   'state': 's:addressRegion',
   'distance': {'@id': 's:Distance', '@type': 's:QuantitativeValue'},
   'bearing': {'@type': 's:QuantitativeValue'},
   'value': {'@id': 's:value'},
   'unitCode': {'@id': 's:unitCode', '@type': '@id'},
   'forecastOffice': {'@type': '@id'},
   'forecastGridData': {'@type': '@id'},
   'publicZone': {'@type': '@id'},
   'county': {'@type': '@id'}}],
 'id': 'https://api.weather.gov/points/47.6038,-122.3301',
 'type': 'Feature',
 'geometry': {'type': 'Point', 'coordinates': [-122.3301, 47.6038]},
 'properties': {'@id': 'https://api.weather.gov

In [18]:
forecast_url = weather_point['properties']['forecast']
forecast_url

'https://api.weather.gov/gridpoints/SEW/125,68/forecast'
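
Indexing `weather_point['properties']['forecast']` directly raises a `KeyError` when the points response has no `properties` entry. A small accessor makes that case explicit. This is a sketch: `get_forecast_url` is a hypothetical helper, and the error dictionary below is a made-up stand-in for a failed response, not a captured one.

```python
from typing import Optional


def get_forecast_url(weather_point: dict) -> Optional[str]:
    """Return the forecast URL from a points response, or None if absent."""
    properties = weather_point.get("properties")
    if not isinstance(properties, dict):
        return None
    return properties.get("forecast")


ok_response = {
    "properties": {
        "forecast": "https://api.weather.gov/gridpoints/SEW/125,68/forecast"
    }
}
error_response = {"status": 404}  # made-up stand-in for a failed lookup

print(get_forecast_url(ok_response))
print(get_forecast_url(error_response))
```

Returning `None` lets the caller decide whether to skip the event or retry, instead of wrapping the lookup in a broad `except`.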

In [19]:
res = requests.get(forecast_url)
res.json()

{'@context': ['https://geojson.org/geojson-ld/geojson-context.jsonld',
  {'@version': '1.1',
   'wx': 'https://api.weather.gov/ontology#',
   'geo': 'http://www.opengis.net/ont/geosparql#',
   'unit': 'http://codes.wmo.int/common/unit/',
   '@vocab': 'https://api.weather.gov/ontology#'}],
 'type': 'Feature',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-122.338331, 47.6159569],
    [-122.33210799999999, 47.5954304],
    [-122.30157759999999, 47.5996357],
    [-122.30779399999999, 47.6201625],
    [-122.338331, 47.6159569]]]},
 'properties': {'updated': '2024-01-16T20:33:27+00:00',
  'units': 'us',
  'forecastGenerator': 'BaselineForecastGenerator',
  'generatedAt': '2024-01-16T21:41:20+00:00',
  'updateTime': '2024-01-16T20:33:27+00:00',
  'validTimes': '2024-01-16T14:00:00+00:00/P7DT11H',
  'elevation': {'unitCode': 'wmoUnit:m', 'value': 73.152},
  'periods': [{'number': 1,
    'name': 'This Afternoon',
    'startTime': '2024-01-16T13:00:00-08:00',
    'endTime': '2024-01-16T1

In [20]:
weather = res.json()['properties']['periods']
weather = pd.DataFrame(weather)
weather.to_csv(f'{location_name}_{lat}_{lon}.csv')
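
The file name above is built directly from the venue name, so a name containing `/` or other special characters could produce an invalid or unexpected path. A minimal sketch of a sanitizing step follows. `safe_filename` is a hypothetical helper, and the choice to replace every run of other characters with `_` is an assumption about what is acceptable here.

```python
import re


def safe_filename(name: str, lat, lon) -> str:
    """Build '<name>_<lat>_<lon>.csv' using only letters, digits, '.', '_' and '-'."""
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return f"{slug}_{lat}_{lon}.csv"


print(safe_filename("Dimitriou's Jazz Alley", "47.6", "-122.3"))
print(safe_filename("Venue A / Venue B", "47.6", "-122.3"))
```

The result can be passed to `to_csv` in place of the f-string in the cell above.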