# BOM Weather Data Scraper

### *Extracting Weather Data from the Australian Bureau of Meteorology*

In [1]:
%load_ext autoreload
%autoreload 2

# Standard Python
import os
import pandas as pd
import matplotlib.pyplot as plt

# BOM-scraper modules
from bomdata.scraper import BOM_scraper

# Configuration
import config
config.use('config.json')

Value for NEM_PATH has been set!
Value for BOM_PATH has been set!


## Selecting weather stations

When a `BOM_scraper` instance is created, it looks for a `station_list.csv` file in `data_dir`:
* This file populates the `.station_df` attribute, a DataFrame listing all Australian weather stations with free-of-charge historical data
* If `station_list.csv` does not already exist in `data_dir`, it is created automatically by scraping the BOM website with the `_scrape_station_list()` function
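
The load-or-scrape behaviour described above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: `load_or_create_station_list` and its `scrape_fn` argument are hypothetical names, with `scrape_fn` standing in for `_scrape_station_list()`.

```python
from pathlib import Path

import pandas as pd


def load_or_create_station_list(data_dir, scrape_fn):
    """Return the station table, scraping only when no cached CSV exists.

    `scrape_fn` is a hypothetical stand-in for the scraper's
    `_scrape_station_list()`; it must return a DataFrame.
    """
    csv_path = Path(data_dir) / "station_list.csv"
    if csv_path.exists():
        # Cached copy found: reuse it instead of hitting the website again.
        return pd.read_csv(csv_path)

    station_df = scrape_fn()
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the row index out of the file, so reloading it
    # does not produce an extra "Unnamed: 0" column.
    station_df.to_csv(csv_path, index=False)
    return station_df
```

Writing the file with `index=False` is a design choice of this sketch; a CSV saved with its index and read back without `index_col` is the usual source of an `Unnamed: 0` column like the one in the preview of `station_df`.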

In [2]:
data_dir = os.getenv('BOM_PATH')
bom_ws = BOM_scraper(data_dir)
bom_ws.station_df.head()

Unnamed: 0,name,station_id,url,state
0,Canberra,IDCJDW2801,http://www.bom.gov.au/climate/dwo/IDCJDW2801.l...,Australian Capital Territory
1,Tuggeranong,IDCJDW2802,http://www.bom.gov.au/climate/dwo/IDCJDW2802.l...,Australian Capital Territory
2,Sydney,IDCJDW2124,http://www.bom.gov.au/climate/dwo/IDCJDW2124.l...,New South Wales
3,Penrith,IDCJDW2111,http://www.bom.gov.au/climate/dwo/IDCJDW2111.l...,New South Wales
4,Newcastle,IDCJDW2098,http://www.bom.gov.au/climate/dwo/IDCJDW2098.l...,New South Wales


Next, call the `.select_station()` method to choose the weather station to scrape. It prompts first for a state or territory, then for a station within it.

In [3]:
bom_ws.select_station()

[0] Australian Capital Territory
[1] New South Wales
[2] Victoria
[3] Queensland
[4] South Australia
[5] Western Australia
[6] Tasmania
[7] Northern Territory
[8] Australia's Antarctic Bases
[9] Australia's Offshore Islands

Selected: Queensland
[0] Brisbane
[1] Cairns
[2] Townsville
[3] Gold Coast

Selected: Brisbane
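
When the prompts are inconvenient (for example in a script), the station ID can also be looked up by filtering a table shaped like `station_df`. The sketch below assumes only the `name`, `station_id` and `state` columns seen in the preview above; `find_station_id` is a hypothetical helper, not part of `BOM_scraper`.

```python
import pandas as pd

# A few rows copied from the station_df preview (URL column omitted).
station_df = pd.DataFrame({
    "name": ["Canberra", "Tuggeranong", "Sydney", "Penrith", "Newcastle"],
    "station_id": ["IDCJDW2801", "IDCJDW2802", "IDCJDW2124",
                   "IDCJDW2111", "IDCJDW2098"],
    "state": ["Australian Capital Territory"] * 2 + ["New South Wales"] * 3,
})


def find_station_id(df, state, name):
    """Return the single station_id matching `state` and `name`."""
    match = df[(df["state"] == state) & (df["name"] == name)]
    if len(match) != 1:
        raise LookupError(
            f"expected one match for {name!r} in {state!r}, found {len(match)}"
        )
    return match["station_id"].iloc[0]


station_id = find_station_id(station_df, "New South Wales", "Sydney")
print(station_id)  # IDCJDW2124
```

Raising on zero or multiple matches makes a misspelled station name fail loudly rather than silently select the wrong station.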


## Scraping monthly historical weather data

The `BOM_scraper` class provides the following functionality:

* The `.set_station_id()` method selects a weather station by its ID, which can be looked up in the `.station_df` attribute.
* The `.scrape_station()` method collects the links to all monthly historical CSV data files for the selected station.
* The `.download_files()` method downloads every file from these links unless a specific list of months is given. The files are saved in the station's own directory, which is created as a subfolder of `data_dir`.
* Finally, the `.load_station_data()` method loads the data as a pandas DataFrame. Note that BOM data files have inconsistent headers.
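
The final loading step can be approximated with plain pandas. This is a sketch under stated assumptions rather than the actual `.load_station_data()` code: `load_monthly_csvs` is a hypothetical function, `skip` is assumed to mean the number of preamble lines before the header row, and the `latin-1` encoding is a guess chosen because the column names contain a degree sign.

```python
from pathlib import Path

import pandas as pd


def load_monthly_csvs(station_dir, skip=8, encoding="latin-1"):
    """Concatenate every CSV file in `station_dir` into one DataFrame.

    `skip` is the assumed number of preamble lines before the header row;
    `encoding` is an assumption, not something confirmed by the source.
    """
    frames = []
    for csv_path in sorted(Path(station_dir).glob("*.csv")):
        frames.append(pd.read_csv(csv_path, skiprows=skip, encoding=encoding))
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {station_dir}")
    # concat aligns columns by name; a column missing from one file
    # is filled with NaN for that file's rows.
    return pd.concat(frames, ignore_index=True)
```

Because `pd.concat` aligns on column names, files whose headers differ still load, but the differently named columns end up side by side with missing values, so it is worth checking `.info()` on the result.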

In [5]:
bom_ws.scrape_station()
bom_ws.download_files()  # optionally pass dates=['Apr-20'] to limit the download to specific months
bom_df = bom_ws.load_station_data(skip=8)
bom_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 426 entries, 0 to 425
Data columns (total 21 columns):
Date                                 426 non-null object
Minimum temperature (°C)             426 non-null float64
Maximum temperature (°C)             424 non-null float64
Rainfall (mm)                        413 non-null float64
Evaporation (mm)                     192 non-null float64
Sunshine (hours)                     425 non-null float64
Direction of maximum wind gust       424 non-null object
Speed of maximum wind gust (km/h)    424 non-null float64
Time of maximum wind gust            424 non-null object
9am Temperature (°C)                 426 non-null float64
9am relative humidity (%)            426 non-null int64
9am cloud amount (oktas)             426 non-null int64
9am wind direction                   426 non-null object
9am wind speed (km/h)                426 non-null object
9am MSL pressure (hPa)               426 non-null float64
3pm Temperature (°C)              