# NEM Open Data Extraction Tool

**The Objective**: to create a tool that can easily extract open data from nemweb.com.au and assemble it into an analysis-ready format.

* For the initial testing phase, extracted data will be stored as CSV files
* Once the data extraction pipeline is set up, we can experiment with SQL databases

In [1]:
%load_ext autoreload
%autoreload 2

# Standard Python
import os
from io import BytesIO
from zipfile import ZipFile
from datetime import datetime as dt
import pandas as pd

# My modules
from nem_tracker import NEM_tracker
from nem_extract import NEM_extractor
import config
config.use('config.json')

Value for DATA_PATH has been set!
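
The imports above include `BytesIO` and `ZipFile`, which suggests the downloaded reports are zip archives unpacked in memory. The sketch below shows one way to do that. It assumes each CSV inside follows AEMO's report layout: `C` rows are comments or footers, an `I` row names the columns, and the `D` rows that follow hold the data, with the first four fields of `I`/`D` rows being row type, report name, sub-type and version. `read_aemo_csv` and `read_zipped_report` are hypothetical helpers, not functions from `nem_extract`.

```python
import csv
import io
from zipfile import ZipFile

import pandas as pd


def read_aemo_csv(text):
    """Split an AEMO-style report into one DataFrame per (report, sub-type).

    Assumed layout: 'C' rows are comments/footers, an 'I' row lists column
    names, and the 'D' rows after it hold data. The first four fields of
    'I' and 'D' rows (row type, report, sub-type, version) are dropped.
    """
    headers, rows = {}, {}
    for record in csv.reader(io.StringIO(text)):
        if not record or record[0] not in ("I", "D"):
            continue  # skip blank lines and 'C' rows
        key = f"{record[1]}_{record[2]}"
        if record[0] == "I":
            headers[key] = record[4:]
            rows.setdefault(key, [])
        else:
            rows.setdefault(key, []).append(record[4:])
    return {key: pd.DataFrame(rows[key], columns=cols) for key, cols in headers.items()}


def read_zipped_report(raw_bytes):
    """Return {member name: {table key: DataFrame}} for each CSV in a zip."""
    reports = {}
    with ZipFile(io.BytesIO(raw_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                text = archive.read(name).decode("utf-8")
                reports[name] = read_aemo_csv(text)
    return reports
```

All values come back as strings; converting columns with `pd.to_numeric` or `pd.to_datetime` is left to the caller.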


### NEM Tracker

* The `.bulk_update()` method updates all resources that are already tracked
* To add a new resource, use `.update_resource(resource, new=True)`
* Tracking CSVs in `.tracker_dir` record each resource's file URLs, along with their upload date and download status
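
A minimal sketch of how a tracking table could be extended when new files appear. It assumes the five tracker columns reported by the extractor (`TIMESTAMP`, `UPLOAD_DATE`, `DOWNLOADED`, `DOWNLOAD_DATE`, `URL`) and the `1900-01-01` placeholder that the tracker output uses for files not yet downloaded. `add_new_urls` is a hypothetical helper, not the code behind `.update_resource()`.

```python
import pandas as pd

TRACKER_COLUMNS = ["TIMESTAMP", "UPLOAD_DATE", "DOWNLOADED", "DOWNLOAD_DATE", "URL"]
NOT_DOWNLOADED = pd.Timestamp("1900-01-01")  # placeholder seen in the tracker output


def add_new_urls(tracker_df, found_df):
    """Append rows for URLs that are not tracked yet.

    found_df is assumed to have TIMESTAMP, UPLOAD_DATE and URL columns.
    """
    new = found_df.loc[~found_df["URL"].isin(tracker_df["URL"])].copy()
    new["DOWNLOADED"] = False
    new["DOWNLOAD_DATE"] = NOT_DOWNLOADED
    combined = pd.concat([tracker_df, new[TRACKER_COLUMNS]], ignore_index=True)
    # Keep the table in chronological order, as in the tracker output
    return combined.sort_values("TIMESTAMP", kind="stable").reset_index(drop=True)
```

Matching on `URL` means re-running the update is idempotent: files already in the tracker keep their download status.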

In [4]:
data_dir = os.getenv('DATA_PATH')
#nem_trk = NEM_tracker(data_dir)
#nem_trk.bulk_update()

### NEM Extractor

In [5]:
nem_get = NEM_extractor(data_dir)
nem_get.select_resource()
nem_get.selected_resource

'/Reports/Current/Operational_Demand/ACTUAL_DAILY/'
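
`.selected_resource` is a path relative to the NEMweb site. A rough sketch of turning such a path into a list of archive URLs, assuming the directory page is a plain HTML listing whose links point at `.zip` files. `list_zip_urls` and `_LinkCollector` are illustrative names; the real tracker may collect URLs differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://nemweb.com.au"


class _LinkCollector(HTMLParser):
    """Collect every href from <a> tags (HTMLParser lowercases tag/attr names)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_zip_urls(listing_html, resource_path, base_url=BASE_URL):
    """Return absolute URLs of the .zip links in a directory listing page."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    directory = urljoin(base_url, resource_path)
    # urljoin handles both site-absolute and relative hrefs
    return [urljoin(directory, href) for href in parser.links
            if href.lower().endswith(".zip")]
```

Fetching the listing page itself (for example with `urllib.request.urlopen`) is kept separate so the parsing can be tested offline.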

In [6]:
nem_get.load_tracker_df()
nem_get.current_tracker_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 5 columns):
TIMESTAMP        60 non-null datetime64[ns]
UPLOAD_DATE      60 non-null datetime64[ns]
DOWNLOADED       60 non-null bool
DOWNLOAD_DATE    60 non-null datetime64[ns]
URL              60 non-null object
dtypes: bool(1), datetime64[ns](3), object(1)
memory usage: 2.1+ KB


In [7]:
nem_get.adjust_time_range(date_min='2020-04-03')
nem_get.time_range

[Timestamp('2020-04-03 00:00:00'), Timestamp('2020-04-05 00:00:00')]
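
Judging by the output, `date_min` sets the start of the range and the end defaults to the newest `TIMESTAMP` in the tracker. A sketch of that behaviour, assuming both bounds are also clamped to the dates the tracker actually covers; `clamp_time_range` is a hypothetical function, not the method's real code.

```python
import pandas as pd


def clamp_time_range(tracker_df, date_min=None, date_max=None):
    """Return [start, end], limited to the TIMESTAMPs present in the tracker."""
    first = tracker_df["TIMESTAMP"].min()
    last = tracker_df["TIMESTAMP"].max()
    start = first if date_min is None else max(pd.Timestamp(date_min), first)
    end = last if date_max is None else min(pd.Timestamp(date_max), last)
    return [start, end]
```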

In [8]:
nem_get.set_download_df()
nem_get.download_df

| | TIMESTAMP | UPLOAD_DATE | DOWNLOADED | DOWNLOAD_DATE | URL |
|---|---|---|---|---|---|
| 57 | 2020-04-03 | 2020-04-04 04:40:00 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Operation... |
| 58 | 2020-04-04 | 2020-04-05 04:40:00 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Operation... |
| 59 | 2020-04-05 | 2020-04-06 04:40:01 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Operation... |
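
`download_df` holds the tracker rows whose `TIMESTAMP` falls inside `time_range`. The sketch below reproduces that selection and additionally assumes rows already marked `DOWNLOADED` should be skipped, which the output shown cannot confirm. `select_downloads` is a hypothetical name.

```python
import pandas as pd


def select_downloads(tracker_df, time_range):
    """Rows with TIMESTAMP inside time_range (both ends inclusive) not yet downloaded."""
    start, end = time_range
    in_range = tracker_df["TIMESTAMP"].between(start, end)
    pending = ~tracker_df["DOWNLOADED"].astype(bool)
    # Keep the original index so results can be written back to the tracker
    return tracker_df.loc[in_range & pending]
```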


In [9]:
nem_get.download_files()
nem_get.current_tracker_df.tail()

| | TIMESTAMP | UPLOAD_DATE | DOWNLOADED | DOWNLOAD_DATE | URL |
|---|---|---|---|---|---|
| 55 | 2020-04-01 | 2020-04-02 04:40:00 | False | 1900-01-01 00:00:00 | http://nemweb.com.au/Reports/Current/Operation... |
| 56 | 2020-04-02 | 2020-04-03 04:40:01 | False | 1900-01-01 00:00:00 | http://nemweb.com.au/Reports/Current/Operation... |
| 57 | 2020-04-03 | 2020-04-04 04:40:00 | True | 2020-04-06 21:12:31 | http://nemweb.com.au/Reports/Current/Operation... |
| 58 | 2020-04-04 | 2020-04-05 04:40:00 | True | 2020-04-06 21:12:31 | http://nemweb.com.au/Reports/Current/Operation... |
| 59 | 2020-04-05 | 2020-04-06 04:40:01 | True | 2020-04-06 21:12:32 | http://nemweb.com.au/Reports/Current/Operation... |
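
After `.download_files()`, the selected rows are flagged `True` and stamped with a download time, while earlier rows keep the placeholder date. One way to express that bookkeeping is sketched below. `download_pending`, `fetch_url` and the `save` callback are hypothetical, and `fetch` is injectable so the logic can be exercised without network access. It also assumes `download_df` shares its index with the tracker, as in the outputs above.

```python
import pandas as pd
from urllib.request import urlopen


def fetch_url(url, timeout=30):
    """Return the raw bytes at url (network access required)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_pending(tracker_df, download_df, save, fetch=fetch_url):
    """Fetch each URL in download_df, pass the bytes to save(url, data),
    and return a copy of tracker_df with those rows marked as downloaded."""
    updated = tracker_df.copy()
    for idx, url in download_df["URL"].items():
        save(url, fetch(url))
        # Record success only after save() returns without raising
        updated.loc[idx, "DOWNLOADED"] = True
        updated.loc[idx, "DOWNLOAD_DATE"] = pd.Timestamp.now().floor("s")
    return updated
```

Passing the storage step as `save` lets the caller decide whether to write the raw zip to `DATA_PATH` or parse it straight into a DataFrame.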
