## Scraping St. Louis Weather Data

### Problem

Data comes in many forms and from many places, and sometimes it has to be pulled out of raw HTML. Here we simulate an IoT device by grabbing readings from the weather station at the St. Louis Science Center. At the moment, scraping the station's web page is the only way we have to get at this data.

### Solution

* fetch the HTML from the station page: [http://agebb.missouri.edu/weather/realtime/st_louis_science_center.php](http://agebb.missouri.edu/weather/realtime/st_louis_science_center.php)
* parse the HTML for temperature and humidity
* store the data on the IoT blockchain

### Code

In [1]:
# uncomment and run these lines to install any libraries that are missing
# pip install requests
# pip install html5lib
# pip install bs4

In [2]:
# import the libraries used below

import datetime
import requests
from bs4 import BeautifulSoup

In [3]:
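# Fetch the HTML content of the web page
url = "http://agebb.missouri.edu/weather/realtime/st_louis_science_center.php"

results = requests.get(url, timeout=10)  # fail instead of hanging if the site is unreachable
results.raise_for_status()               # stop early on an HTTP error status
rdata = results.content
soup = BeautifulSoup(rdata, 'html5lib')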
timestamp = datetime.datetime.now().isoformat()
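
The timestamp captured above is naive local time. If a UTC timestamp in the form `YYYY-MM-DDTHH:MM:SSZ` is preferred, a minimal sketch using only the standard library might look like the following. `utc_timestamp` is a hypothetical helper name introduced for illustration, not part of the notebook.

```python
from datetime import datetime, timezone

def utc_timestamp(now=None):
    """Return an ISO 8601 UTC timestamp such as 2021-11-08T18:31:16Z."""
    # `now` is injectable so the function can be checked with a fixed time;
    # it must be timezone-aware.
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC and drop the fractional seconds.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

timestamp = utc_timestamp()
```

Using an explicit UTC timestamp keeps readings comparable regardless of the time zone of the machine that runs the notebook.
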

In [4]:
# Access the content inside the tags
# The desired feature group is in a collapsible div with 'id': 'current-collapsible'
# Feature labels use the class "ui-body"; their values use the class "ui-bar"

table = soup.find('div', {'id':'current-collapsible'})  # accessing the collapsible section

data_file = []
# loop through the first 3 features (temperature, dew point and humidity) in the table
for r in table.find_all('div', attrs = {'class':"ui-grid-a"})[:3]:
    stldata = {}
    stldata['feature'] = r.find('div', {'class':"ui-body"}).string
    stldata['value'] = r.find('div', {'class':"ui-bar"}).string
    data_file.append(stldata)

#
# payload will be of the form (values are strings; temp is in °F)
# {
#     "temp": "72.3",
#     "rh": "28",
#     "timestamp": "YYYY-MM-DDTHH:MM:SS.ffffff"
# }
#

payload = {}
for data in data_file:
    # strip the unit suffix so that only the numeric text remains
    if data['feature'].startswith('Temp'):
        payload['temp'] = data['value'].replace("°F", "").strip()
    if data['feature'].startswith('Humi'):
        payload['rh'] = data['value'].replace("%", "").strip()
    payload['timestamp'] = timestamp

Show the payload

In [5]:
payload

{'temp': '72.3', 'timestamp': '2021-11-08T12:31:16.945480', 'rh': '28'}
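
The payload values are all strings. If a consumer needs numbers, one possible sketch of a conversion step follows. `normalize_payload` and the extra `temp_c` field are hypothetical names, and the code assumes the temperature is in Fahrenheit because the parsing step strips a "°F" suffix.

```python
def normalize_payload(payload):
    """Return a copy of the payload with numeric temp and rh values."""
    clean = dict(payload)  # leave the original payload untouched
    # Assumption: 'temp' holds a Fahrenheit reading as text, e.g. '72.3'.
    temp_f = float(payload["temp"].strip())
    clean["temp"] = temp_f
    # Hypothetical extra field: the same reading converted to Celsius.
    clean["temp_c"] = round((temp_f - 32) * 5 / 9, 1)
    # Assumption: 'rh' holds a percentage as text, e.g. '28'.
    clean["rh"] = float(payload["rh"].strip())
    return clean

sample = {'temp': '72.3', 'timestamp': '2021-11-08T12:31:16.945480', 'rh': '28'}
normalized = normalize_payload(sample)
```

Whether to send strings or numbers depends on what the blockchain server expects, which the notebook does not specify.
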

Put the payload on the blockchain

In [6]:
block_chain_server = "34.69.195.189:5000"
r = requests.post(f"http://{block_chain_server}/add", data=payload)
if r.status_code == 200:
    resp = r.json()
    print(resp)
else:
    print(f"[info] unable to communicate with server or add data block : {r.status_code}")

{'status': 'ok', 'message': 'block added', 'hash': '59eaf01a2cd67649d3fffd4755c2301632230670f6b3dd95e57e12285f5ab83a'}


Verify the data block by looking it up by its hash:

In [7]:
# reuse the server address defined above; a lookup by hash needs no request body
r = requests.get(f"http://{block_chain_server}/{resp['hash']}")
if r.status_code == 200:
    print(r.content)

b'Block<hash: 59eaf01a2cd67649d3fffd4755c2301632230670f6b3dd95e57e12285f5ab83a, prev_hash: fc4297b8986a03a21d88a0de8a0ddd984e9fdd863a3810a5078b3c35ecd49b7f, messages: 1, time: 1636399878.0322878>'
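
The server returns the block as a byte string rather than JSON. To check programmatically that the returned block carries the hash we were given, a rough sketch is to parse that string into a dictionary. `parse_block` is a hypothetical helper, and it assumes the `Block<key: value, ...>` layout seen in the single response above, which the server may not guarantee.

```python
import re

def parse_block(raw):
    """Parse b'Block<k: v, ...>' into a dict of strings, or return None if it does not match."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    match = re.fullmatch(r"Block<(.*)>", text.strip())
    if match is None:
        return None
    fields = {}
    # Fields are assumed to be separated by ", " and written as "key: value".
    for part in match.group(1).split(", "):
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields

raw = (b'Block<hash: 59eaf01a2cd67649d3fffd4755c2301632230670f6b3dd95e57e12285f5ab83a, '
       b'prev_hash: fc4297b8986a03a21d88a0de8a0ddd984e9fdd863a3810a5078b3c35ecd49b7f, '
       b'messages: 1, time: 1636399878.0322878>')
expected_hash = '59eaf01a2cd67649d3fffd4755c2301632230670f6b3dd95e57e12285f5ab83a'

block = parse_block(raw)
hash_matches = block is not None and block["hash"] == expected_hash
```

In the notebook, `raw` would be `r.content` and `expected_hash` would be `resp['hash']`.
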
