<img src="../images/action-bicycle-bike.jpg" width="900">

# <span style="color:#37535e">Historical Weather Data</span>
<span style="color:#3b748a">Weather clearly affects bike usage rates, so I collect historical weather data alongside the bike usage data. I use <a href="https://www.crummy.com/software/BeautifulSoup/"><span style="color:#37535e">BeautifulSoup</span></a> to scrape it from <a href="https://english.wunderground.com/"><span style="color:#37535e">Weather Underground</span></a>.</span>

<span style="color:#3b748a">For a given range of years, the weather data for each city is scraped into its own DataFrame and written to a CSV file.</span>

<hr>

## <span style='color:#3b748a'>Links</span>
* <a href="https://www.crummy.com/software/BeautifulSoup/"><span style="color:#37535e">BeautifulSoup</span></a> 
* <a href="https://english.wunderground.com/"><span style="color:#37535e">Weather Underground</span></a>
* <a href="main.ipynb">Main notebook</a>

## <span style='color:#3b748a'>Weather links</span>
* <a href="https://english.wunderground.com/history/airport/KATL/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Atlanta weather: KATL</span></a> 
* <a href="https://english.wunderground.com/history/airport/KBOS/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Boston weather: KBOS</span></a> 
* <a href="https://english.wunderground.com/history/airport/KORD/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Chicago weather: KORD</span></a> 
* <a href="https://english.wunderground.com/history/airport/KCQT/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Los Angeles weather: KCQT</span></a> 
* <a href="https://english.wunderground.com/history/airport/KPHL/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Philadelphia weather: KPHL</span></a> 
* <a href="https://english.wunderground.com/history/airport/KSFO/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">San Francisco weather: KSFO</span></a> 
* <a href="https://english.wunderground.com/history/airport/KDCA/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">Washington, DC weather: KDCA</span></a> 
* <a href="https://english.wunderground.com/history/airport/KNYC/2018/7/1/MonthlyHistory.html?req_city=Atlanta&req_state=GA&req_statename=&reqdb.zip=30301&reqdb.magic=1&reqdb.wmo=99999"><span style="color:#37535e">NYC weather: KNYC</span></a> 

<hr>

In [5]:
# Let's get the administrative stuff done first:
# import the libraries used for scraping and for building the DataFrames

import requests
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime


## <span style='color:#3b748a'>Scrape monthly weather data</span>
<ul>
    <li><span style='color:#4095b5'>For each month, visit the corresponding MonthlyHistory page on Weather Underground.</span></li>
    <li><span style='color:#4095b5'>The URLs use the "english" subdomain instead of "www"; I am not sure why this is required.</span></li>
    <li><span style='color:#4095b5'>Scrape using BeautifulSoup.</span></li>
    <li><span style='color:#4095b5'>Wind data is mislabeled on the webpages.</span></li>
    <li><span style='color:#4095b5'>Precipitation and Events might be null or blank.</span></li>
    
</ul>
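
<span style='color:#4095b5'>The per-month page visits listed above can be sketched without any network access. The helper `monthly_urls` below is hypothetical and not part of the original notebook; it only reproduces the URL pattern used in the scraping cell. The `today` parameter is also an assumption, added to skip months that have not started yet, which matters when the last year is the current year.</span>

```python
from datetime import date

# URL pieces copied from the scraping cell in this notebook
BASE_URL = "http://english.wunderground.com/history/airport/"
QUERY = "?&reqdb.zip=&reqdb.magic=&reqdb.wmo="


def monthly_urls(wx_code, start_year, end_year, today=None):
    """Return (year, month, url) tuples for each month in the year range.

    Hypothetical helper: months that start after `today` are skipped.
    """
    today = today or date.today()
    urls = []
    for yr in range(start_year, end_year + 1):
        for mn in range(1, 13):
            if date(yr, mn, 1) > today:
                break  # later months of this year have not started either
            url = f"{BASE_URL}{wx_code}/{yr}/{mn}/1/MonthlyHistory.html{QUERY}"
            urls.append((yr, mn, url))
    return urls


# Example: the pages that would be visited for Atlanta in early 2018
for yr, mn, url in monthly_urls("KATL", 2018, 2018, today=date(2018, 3, 15)):
    print(yr, mn, url)
```

<span style='color:#4095b5'>Building the list first keeps URL construction separate from fetching and parsing, so the pattern can be checked before any request is sent.</span>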

In [1]:
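def read_write_wx(start_year, end_year, wx_code):
    """Scrape monthly weather history for one station and return it as a DataFrame.

    Despite the name, this function does not write anything; the caller writes the CSV file.
    """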
    weather_cols = ["date", 'temp_max', 'temp_avg', 'temp_min', 'dew_max', 'dew_avg', 'dew_min',
                    'hum_max', 'hum_avg', 'hum_min', 'sea_max', 'sea_avg', 'sea_min',
                    'vis_max', 'vis_avg', 'vis_min', 'wind_max', 'wind_avg', 'wind_unk',
                    'prec', 'events' ]

    list_wx = []
    for yr in range(start_year, end_year+1):
        for mn in range(1, 13):

            # Open wunderground.com url
            url = "http://english.wunderground.com/history/airport/" + wx_code + "/" + str(yr)+ "/" + str(mn) + "/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo="
            res = requests.get(url)
            # Skip this month if the page could not be fetched
            if res.status_code != 200:
                continue
            soup = BeautifulSoup(res.content, 'lxml')

            hist_table = soup.find_all('table', { 'id' : 'obsTable'})
            for h in hist_table:
                body = h.find_all('tbody')
                for b in body[1:]:
                    row_list = b.find_all('tr')
                    for row in row_list:
                        col_list = row.find_all('td')
                        wx = dict()
                        day = col_list[0].find('a').text
                        wx["date"] = datetime(yr, mn, int(day))
                        # Columns 1-19 hold the readings; a cell without a <span> is stored as 0
                        for i in range(1,20):
                            val = 0
                            elem = col_list[i].find('span')
                            if elem:
                                val = elem.text
                            wx[weather_cols[i]] = val
                        # Column 20 holds the events text; blank cells are stored as "None"
                        events = "None"
                        elem = col_list[20]
                        if elem:
                            events = elem.text.strip()
                            events = events.replace('\t','')
                            events = events.replace('\n','')
                            if len(events) == 0:
                                events = "None"
                        wx['events'] = events
                        list_wx.append(wx)

    df = pd.DataFrame(list_wx)
    return df
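
<span style='color:#4095b5'>Because precipitation and events might be null or blank, the scraped text usually needs a cleaning pass before analysis. The sketch below uses two hypothetical helpers, `to_number` and `clean_events`, that are not part of the original notebook. It assumes that any blank or non-numeric cell should become `None`, and it mirrors the events handling in the function above.</span>

```python
def to_number(text):
    """Convert scraped cell text to a float.

    Assumption: blank or non-numeric text is treated as missing (None).
    """
    if text is None:
        return None
    text = text.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_events(text):
    """Strip the cell, drop tabs and newlines, and return "None" when nothing is left."""
    cleaned = text.strip().replace("\t", "").replace("\n", "")
    return cleaned if cleaned else "None"


# Example values shaped like the scraped cells
print(to_number(" 0.25 "))
print(to_number(""))
print(clean_events("\n\tRain\n,\tSnow\n"))
print(clean_events("  \n"))
```

<span style='color:#4095b5'>Returning `None` rather than 0 for missing readings keeps "no data" distinct from a genuine zero, which is one possible design choice among several.</span>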

## <span style='color:#3b748a'>Cities and dates</span>

In [2]:
city_codes = {
    'atl': "KATL",
    'bos': "KBOS",
    'chi': "KORD",
    'la': "KCQT",
    'phl': "KPHL",
    'sf': "KSFO",
    'dc': "KDCA",
    'nyc': "KNYC",
}

start_year = 2017
end_year = 2018

## <span style='color:#3b748a'>Loop through cities, scrape data, write data to csv file.</span>

In [8]:
for city, wx_code in city_codes.items():
    df = read_write_wx(start_year, end_year, wx_code)
    df.to_csv('../data/' + city + '/weather.csv', index=False)
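
<span style='color:#4095b5'>The loop above assumes that a directory for each city already exists under `../data/`. As a minimal sketch, the hypothetical helper `weather_csv_path` below (not part of the original notebook) creates the city directory when it is missing and returns the path of the CSV file.</span>

```python
from pathlib import Path


def weather_csv_path(city, data_root="../data"):
    """Return the weather CSV path for a city, creating its directory if needed.

    Hypothetical helper; `data_root` defaults to the directory used in this notebook.
    """
    city_dir = Path(data_root) / city
    city_dir.mkdir(parents=True, exist_ok=True)
    return city_dir / "weather.csv"
```

<span style='color:#4095b5'>With this helper, the write step in the loop could become `df.to_csv(weather_csv_path(city), index=False)`.</span>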