# Downloading Weather Data

Weather conditions can affect bike usage. For example, people may be less inclined to ride in heavy rain than on a sunny day. Access to weather data such as temperature, rainfall, and visibility could therefore prove useful.

In this notebook, we download weather data from https://climate.weather.gc.ca/. Two forms of data are available, distinguished by time interval: daily weather data and hourly weather data. We will download, examine, and clean them separately.

In [14]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

Daily data contains potentially useful information, including the max/min temperature and total rain amount for each day.

Compared to daily data, an advantage of hourly data is that our bike usage data is also recorded on an hourly basis, so hourly data might reveal relationships between weather conditions and bike usage. However, one drawback of hourly data is a significantly larger number of missing values. Another problem is that "Total Rain (mm)" is not available in hourly data.
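
One way to quantify the missing-value drawback is to count empty cells per column once a file is on disk. The sketch below is only illustrative: the rows and the column names `Temp` and `Visibility` are made up, and `count_missing` is a hypothetical helper, not part of this notebook.

```python
import csv
import io

# Made-up rows for illustration only; these are not real observations.
sample = io.StringIO(
    "Date/Time,Temp,Visibility\n"
    "2023-07-01 00:00,15.2,24.1\n"
    "2023-07-01 01:00,,24.1\n"
    "2023-07-01 02:00,14.8,\n"
    "2023-07-01 03:00,,\n"
)

def count_missing(handle):
    """Count empty cells per column in a CSV file-like object."""
    reader = csv.DictReader(handle)
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts

missing = count_missing(sample)
print(missing)
```

With pandas, a similar per-column count is available through `df.isna().sum()` after loading the file with `pd.read_csv`.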

## Script

The daily data can be downloaded from the following URL, where `${year}` and `${month}` are placeholders for the year and month:
https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=1706&Year=${year}&Month=${month}&Day=14&timeframe=2&submit= Download+Data
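
As a minimal sketch, the template can be filled in with a small helper. The name `build_bulk_url` is hypothetical, and the reading of `timeframe` (1 for hourly, 2 for daily) is inferred from the two URLs used in this notebook's download function rather than from official documentation.

```python
BASE_URL = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"

def build_bulk_url(station_id, year, month=None, timeframe=2):
    """Fill in the bulk-download URL template (hypothetical helper)."""
    parts = [
        "format=csv",
        f"stationID={station_id}",
        f"Year={year}",
    ]
    # The month is only included when one is given (hourly requests).
    if month is not None:
        parts.append(f"Month={month}")
    parts += ["Day=14", f"timeframe={timeframe}", "submit= Download+Data"]
    return BASE_URL + "?" + "&".join(parts)

daily_url = build_bulk_url("51442", "2023")
hourly_url = build_bulk_url("51442", "2023", month="07", timeframe=1)
print(daily_url)
print(hourly_url)
```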

In [27]:
#Define a function that downloads weather data for the specified years and months.
#If months is empty, daily data is downloaded per year; otherwise hourly data is downloaded per month.
#years must be a list of strings of years, e.g. ['2023','2024']
#months must be a list of strings of months, e.g. ['01','02','03']
#station_id is the Station ID of the weather observer. Vancouver Intl A has station ID 51442.

def download_weather(years,months=None,station_id='51442'):

    for year in years:
        #no months given (None or empty list): download daily data for the whole year
        if not months:
            filename = '../data/weather_data/' + year + '_Daily.csv'
            if os.path.exists(filename):
                print(filename+' already exists. No download initiated.')
            else:
                url = f"https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={station_id}&Year={year}&Day=14&timeframe=2&submit= Download+Data"
                response = requests.get(url)
                if response.status_code != 200:
                    print("unable to download daily data for "+year+f". Error code: {response.status_code}")
                else:
                    with open(filename,'wb') as f:
                        for chunk in response.iter_content(chunk_size=1024):
                            if chunk:
                                f.write(chunk)
                    print(filename + ' downloaded.')
                    response.close()
        else:
            for month in months:
                filename = '../data/weather_data/' + year + '-' + month +'_Hourly.csv'
                if os.path.exists(filename):
                    print(filename + ' already exists. No download initiated.')
                else:
                    url = f"https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={station_id}&Year={year}&Month={month}&Day=14&timeframe=1&submit= Download+Data"
                    response = requests.get(url)
                    if response.status_code != 200:
                        print("unable to download hourly data for " + "-".join([year,month]) + f". Error code: {response.status_code}")
                    else:
                        with open(filename,'wb') as f:
                            for chunk in response.iter_content(chunk_size=1024):
                                if chunk:
                                    f.write(chunk)
                        print(filename + ' downloaded.')
                        response.close()
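
Before running the download, it can help to preview which files a call would create. The sketch below mirrors the naming scheme used in `download_weather`; the helpers `planned_files` and `missing_files` are hypothetical additions and do not touch the network.

```python
import os

DATA_DIR = '../data/weather_data/'  # same directory download_weather writes to

def planned_files(years, months=None):
    """List the files download_weather would write (hypothetical helper)."""
    if not months:
        # Daily data: one file per year.
        return [DATA_DIR + year + '_Daily.csv' for year in years]
    # Hourly data: one file per year and month.
    return [DATA_DIR + year + '-' + month + '_Hourly.csv'
            for year in years for month in months]

def missing_files(years, months=None):
    """Return the planned files that are not on disk yet."""
    return [f for f in planned_files(years, months) if not os.path.exists(f)]

print(planned_files(['2023']))
print(planned_files(['2023'], ['01', '02']))
```

Calling `missing_files(years, months)` then shows which downloads are still outstanding, since existing files are skipped by the function.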

## Download Data 

In [23]:
#Set the range of dates
years = ['2017','2018','2019','2020','2021','2022','2023','2024']
months = ['01','02','03','04','05','06','07','08','09','10','11','12']
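
The same lists could also be generated rather than typed out, which makes it easier to extend the range later. This is just an alternative sketch that produces identical values:

```python
# Years 2017 through 2024 as strings (range excludes the upper bound).
years = [str(y) for y in range(2017, 2025)]
# Months as zero-padded two-digit strings, '01' through '12'.
months = [f"{m:02d}" for m in range(1, 13)]
print(years)
print(months)
```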

In [24]:
#Download the daily weather data 
download_weather(years)

Start downloading daily weather data.
2017_Daily.csv downloaded.
Start downloading daily weather data.
2018_Daily.csv downloaded.
Start downloading daily weather data.
2019_Daily.csv downloaded.
Start downloading daily weather data.
2020_Daily.csv downloaded.
Start downloading daily weather data.
2021_Daily.csv downloaded.
Start downloading daily weather data.
2022_Daily.csv downloaded.
Start downloading daily weather data.
2023_Daily.csv downloaded.
Start downloading daily weather data.
2024_Daily.csv downloaded.


In [28]:
download_weather(years,months)

2017-01_Hourly.csv downloaded.
2017-02_Hourly.csv downloaded.
2017-03_Hourly.csv downloaded.
2017-04_Hourly.csv downloaded.
2017-05_Hourly.csv downloaded.
2017-06_Hourly.csv downloaded.
2017-07_Hourly.csv downloaded.
2017-08_Hourly.csv downloaded.
2017-09_Hourly.csv downloaded.
2017-10_Hourly.csv downloaded.
2017-11_Hourly.csv downloaded.
2017-12_Hourly.csv downloaded.
2018-01_Hourly.csv downloaded.
2018-02_Hourly.csv downloaded.
2018-03_Hourly.csv downloaded.
2018-04_Hourly.csv downloaded.
2018-05_Hourly.csv downloaded.
2018-06_Hourly.csv downloaded.
2018-07_Hourly.csv downloaded.
2018-08_Hourly.csv downloaded.
2018-09_Hourly.csv downloaded.
2018-10_Hourly.csv downloaded.
2018-11_Hourly.csv downloaded.
2018-12_Hourly.csv downloaded.
2019-01_Hourly.csv downloaded.
2019-02_Hourly.csv downloaded.
2019-03_Hourly.csv downloaded.
2019-04_Hourly.csv downloaded.
2019-05_Hourly.csv downloaded.
2019-06_Hourly.csv downloaded.
2019-07_Hourly.csv downloaded.
2019-08_Hourly.csv downloaded.
2019-09_