<img src="../images/sunset-banner.jpg" width="1000">

# <span style="color:#37535e">Daylight Data</span>
<span style="color:#3b748a">Alongside day-of-week patterns and weather, hours of daylight are also likely to affect bike usage rates, so they are worth including when making predictions. To collect historical daylight data, I use <a href="https://www.crummy.com/software/BeautifulSoup/"><span style="color:#37535e">BeautifulSoup</span></a> to scrape <a href="https://www.timeanddate.com/"><span style="color:#37535e">timeanddate.com</span></a>.</span>

<span style="color:#3b748a">For a given range of years, the daylight data for each city is scraped into its own DataFrame and written to a CSV file.</span>

<hr>

## <span style='color:#3b748a'>Links</span>
* <a href="https://www.crummy.com/software/BeautifulSoup/"><span style="color:#37535e">BeautifulSoup</span></a> 
* <a href="https://www.timeanddate.com/"><span style="color:#37535e">timeanddate.com</span></a>
* <a href="main.ipynb">Main notebook</a>

## <span style='color:#3b748a'>Daylight links</span>
* <a href="https://www.timeanddate.com/sun/usa/atlanta"><span style="color:#37535e">Atlanta</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/boston"><span style="color:#37535e">Boston</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/chicago"><span style="color:#37535e">Chicago</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/los-angeles"><span style="color:#37535e">Los Angeles</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/philadelphia"><span style="color:#37535e">Philadelphia</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/san-francisco"><span style="color:#37535e">San Francisco</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/washington-dc"><span style="color:#37535e">Washington, DC</span></a> 
* <a href="https://www.timeanddate.com/sun/usa/new-york"><span style="color:#37535e">NYC</span></a> 

<hr>

In [1]:
# Let's get the administrative stuff done first:
# import the libraries used for scraping and data handling

import requests
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime


## <span style='color:#3b748a'>Scrape monthly daylight data</span>
<ul>
    <li><span style='color:#4095b5'>For each month of each year, request the matching city page on timeanddate.com.</span></li>
    <li><span style='color:#4095b5'>Parse the page with BeautifulSoup and pull out each day's daylight duration.</span></li>
</ul>
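
<span style="color:#3b748a">The first step above amounts to building one URL per city, year, and month. Here is a minimal sketch of that step, assuming the `?month=…&year=…` query format used by the Daylight links. The names `BASE_URL` and `build_month_url` are hypothetical helpers introduced for illustration; they are not part of this notebook.</span>

```python
from urllib.parse import urlencode

# Assumed base path, taken from the "Daylight links" list above.
BASE_URL = "https://www.timeanddate.com/sun/usa/"


def build_month_url(city_slug, year, month):
    """Return the monthly sun-table URL for one city, year, and month.

    Hypothetical helper: city_slug is the last path segment of a city
    link, e.g. "boston" or "los-angeles".
    """
    query = urlencode({"month": month, "year": year})
    return f"{BASE_URL}{city_slug}?{query}"


print(build_month_url("boston", 2018, 6))
# https://www.timeanddate.com/sun/usa/boston?month=6&year=2018
```

<span style="color:#3b748a">Letting `urlencode` assemble the query string avoids hand-concatenation slips such as a stray suffix on the year.</span>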

In [2]:
def read_write_wx(start_year, end_year, wx_code):
    """Scrape daily daylight durations for one city into a DataFrame.

    wx_code is the city's timeanddate.com URL slug, e.g. "boston".
    """
    list_wx = []
    for yr in range(start_year, end_year + 1):
        for mn in range(1, 13):

            # Open the timeanddate.com url for this city, month, and year
            url = "https://www.timeanddate.com/sun/usa/" + wx_code + "?month=" + str(mn) + "&year=" + str(yr)
            res = requests.get(url)
            res.raise_for_status()  # stop early if the request failed
            soup = BeautifulSoup(res.content, 'lxml')

            # Each expandable row of the monthly sun table is one day
            h = soup.find('table', {'id': 'as-monthsun'})
            b = h.find('tbody')
            row_list = b.find_all('tr', {'title': "Click to expand for more details"})
            for row in row_list:
                # The row header holds the day of the month
                day = row.find('th').text
                date = datetime(yr, mn, int(day))
                daylight = row.find('td', {'class': 'c tr sep-l'}).text
                list_wx.append({"Date": date, "Daylight": daylight})

    # Build the DataFrame once, after every month has been scraped
    df = pd.DataFrame(list_wx)
    return df
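
<span style="color:#3b748a">The `Daylight` column is stored as the raw text of the table cell, so it needs converting before it can be used as a numeric feature. Here is a minimal sketch of that conversion, assuming the cell text has the form `H:MM:SS`. That format is an assumption about the site, and the sample values are made up. `daylight_to_hours` is a hypothetical helper, not part of this notebook.</span>

```python
def daylight_to_hours(text):
    """Convert a daylight string such as '9:30:00' to hours as a float.

    Hypothetical helper: assumes the 'H:MM:SS' format and raises
    ValueError for anything else.
    """
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours + minutes / 60 + seconds / 3600


# Made-up sample values, not scraped data.
print(daylight_to_hours("9:30:00"))   # 9.5
print(daylight_to_hours("15:15:00"))  # 15.25
```

<span style="color:#3b748a">On a scraped DataFrame this could be applied as `df["Daylight"].apply(daylight_to_hours)`.</span>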

## <span style='color:#3b748a'>Cities and dates</span>

In [3]:
# Keys are the city folder names under ../data/;
# values are the timeanddate.com URL slugs.
city_codes = {
    'atl': "atlanta",
    'bos': "boston",
    'chi': "chicago",
    'la': "los-angeles",
    'phl': "philadelphia",
    'sf': "san-francisco",
    'dc': "washington-dc",
    'nyc': "new-york",
}

start_year = 2017
end_year = 2018

## <span style='color:#3b748a'>Loop through cities, scrape the data, and write each DataFrame to a CSV file</span>

In [4]:
for city, wx_code in city_codes.items():
    df = read_write_wx(start_year, end_year, wx_code)
    df.to_csv('../data/' + city + '/daylight.csv', index=False)
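
<span style="color:#3b748a">The loop above writes to `../data/<city>/daylight.csv`, which fails if a city folder does not exist yet. Here is a minimal sketch of one way to guard against that. `daylight_csv_path` is a hypothetical helper, not part of this notebook.</span>

```python
from pathlib import Path


def daylight_csv_path(base_dir, city):
    """Return <base_dir>/<city>/daylight.csv, creating the city folder if needed.

    Hypothetical helper: base_dir would be '../data' in this notebook.
    """
    city_dir = Path(base_dir) / city
    city_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return city_dir / "daylight.csv"
```

<span style="color:#3b748a">Inside the loop, the write would then become `df.to_csv(daylight_csv_path('../data', city), index=False)`.</span>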