# NEM-TEL: National Energy Market - Track, Extract, Load
### *A tool for tracking, extracting and loading NEM open data resources on the go*

***

<br>

## Setup

* Copy `config_template.json` to create your own `config.json` file. The only value you need to set is the path to the directory where you want to store the NEM data.
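The project's `config` module is not reproduced here. Purely as an illustration, a minimal stand-in that reads a JSON file and exports every key as an environment variable could look like this. The function name `load_config`, and the idea that `config.use()` works through environment variables, are both assumptions.

```python
import json
import os


def load_config(path):
    """Hypothetical stand-in for config.use(): export each JSON key as an env var."""
    with open(path, encoding="utf-8") as f:
        settings = json.load(f)
    for key, value in settings.items():
        # Environment variables only hold strings, so coerce every value.
        os.environ[key] = str(value)
        print(f"Value for {key} has been set!")
    return settings
```

After `load_config('config.json')`, any key in the file can be read back with `os.getenv(key)`.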

```python
%load_ext autoreload
%autoreload 2

# Standard library
import os

# Third-party
import pandas as pd

# NEM-TEL modules
from nemtel.tracker import NEM_tracker
from nemtel.extractor import NEM_extractor
from nemtel.loader import NEM_loader

# Configuration
import config
config.use('config.json')
```

```
Value for DATA_PATH has been set!
```


## NEM Tracker

* create an instance of the `NEM_tracker` object, passing `data_dir`, the path to the main data directory (this can be specified manually, but I set it in the `config.json` file)
* run the `resources_report()` method to check which NEM data resources you're tracking; here, a resource is the relative URL path to where the data files are located on `'http://nemweb.com.au'`

* to add a new resource, use `.update_resource(resource, new=True)`
* the tracking CSVs stored in `.tracker_dir` keep a record of each resource's URLs
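Resources are tracked as paths relative to `'http://nemweb.com.au'`. A minimal sketch of turning such a path into a full URL, plus one possible naming scheme for its tracking CSV, is shown here. The helper names and the underscore-based file naming are assumptions, not NEM-TEL's actual conventions.

```python
from pathlib import Path
from urllib.parse import urljoin

NEMWEB_BASE = "http://nemweb.com.au"


def resource_url(resource):
    """Join a relative NEMweb resource path onto the base URL."""
    return urljoin(NEMWEB_BASE, resource)


def tracker_csv_path(tracker_dir, resource):
    """Hypothetical naming scheme: one CSV per resource, slashes replaced by underscores."""
    name = resource.strip("/").replace("/", "_") + ".csv"
    return Path(tracker_dir) / name
```

For example, `resource_url('/Reports/Current/Dispatch_SCADA/')` gives `'http://nemweb.com.au/Reports/Current/Dispatch_SCADA/'`.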

```python
data_dir = os.getenv('NEM_PATH')
nem_trk = NEM_tracker(data_dir)
nem_trk.resources_report()
```

```
/Reports/Current/Operational_Demand/ACTUAL_DAILY/
Last update: 2020-04-13-10:44:42

/Reports/Current/Dispatch_SCADA/
Last update: 2020-04-13-10:44:42

/Reports/Current/Next_Day_Intermittent_DS/
Last update: 2020-04-13-10:44:42

/Reports/Archive/Operational_Demand/Actual_Daily/
Last update: 2020-04-13-10:44:42
```


### Adding and updating NEM data resources

* the `.bulk_update()` method updates all existing resources. Some NEM data is published every 5 minutes, so it is important to run this update regularly to obtain the list of the latest available data files.

* the `add_resources()` method lets users add new relative URL paths from `'http://nemweb.com.au'`

* every time a resource is added or updated, the change is reflected in the resource's tracker CSV file, stored in the directory given by the `.tracker_dir` attribute.
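Updating a resource amounts to re-reading its NEMweb directory listing and collecting the file links. The sketch below assumes the listing is a plain HTML page of `<a href=...>` links and keeps only `.zip` files. The helper names are hypothetical, and the HTTP request itself (for example with `urllib.request.urlopen`) is left out so the parsing can be shown on its own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect href values from every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_zip_links(listing_html, page_url):
    """Return absolute URLs of all .zip files linked from a directory listing page."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    return [urljoin(page_url, href) for href in parser.hrefs
            if href.lower().endswith(".zip")]
```

The resulting URLs can then be compared against those already in the tracker CSV to find files that have not been seen before.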

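```python
# Refresh the file lists of all tracked resources
nem_trk.bulk_update()

# Relative URL paths of any new resources to track (none added here)
resources_new = []
nem_trk.add_resources(resources_new)
```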

## NEM Extractor

* The `NEM_extractor` class inherits from the `NEM_tracker` class. If the same `data_dir` is provided as an argument, it will identify all the tracked resources.

* The `select_resource()` method allows the user to choose one of the currently tracked resources to work with.

```python
nem_get = NEM_extractor(data_dir)
nem_get.select_resource()
```

```
[0] /Reports/Current/Operational_Demand/ACTUAL_DAILY/
[1] /Reports/Current/Dispatch_SCADA/
[2] /Reports/Current/Next_Day_Intermittent_DS/
[3] /Reports/Archive/Operational_Demand/Actual_Daily/

Selected: /Reports/Current/Dispatch_SCADA/
```


### The tracker file

* for the selected resource, we can use the `load_tracker_df()` method, which sets the `current_tracker_df` attribute.
* `current_tracker_df` contains several useful fields:
  * the `TIMESTAMP` and `VERSION` (if applicable) of each file
  * whether or not it has been `DOWNLOADED`, and when (`DOWNLOAD_DATE`; `1900-01-01` is the default for files not yet downloaded, which avoids Pandas data type issues with `NaN`)
  * the `URL` from which we need to request the data
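The sentinel-date approach can be illustrated with a small, made-up tracker DataFrame. This sketch assumes the column names listed above and checks that, after a CSV round trip (the tracker is stored as a CSV file), `DOWNLOAD_DATE` is still a datetime column and `DOWNLOADED` is still boolean.

```python
import io

import pandas as pd

# Placeholder date meaning "not downloaded yet", used instead of NaN/NaT.
NOT_DOWNLOADED = pd.Timestamp("1900-01-01")

tracker = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime(["2020-04-13 10:35:00", "2020-04-13 10:40:00"]),
    "DOWNLOADED": [False, True],
    "DOWNLOAD_DATE": [NOT_DOWNLOADED, pd.Timestamp("2020-04-13 10:59:58")],
})

# Round-trip through CSV, as a tracker file on disk would be.
buffer = io.StringIO()
tracker.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=["TIMESTAMP", "DOWNLOAD_DATE"])

# Rows still to fetch can be selected with the boolean column alone.
pending = reloaded[~reloaded["DOWNLOADED"]]
print(reloaded.dtypes)
print(pending)
```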

```python
nem_get.load_tracker_df()
nem_get.current_tracker_df.tail(3)
```

| | TIMESTAMP | VERSION | DOWNLOADED | DOWNLOAD_DATE | URL |
|---|---|---|---|---|---|
| 574 | 2020-04-13 10:35:00 | V0000000321591470 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Dispatch_... |
| 575 | 2020-04-13 10:40:00 | V0000000321591665 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Dispatch_... |
| 576 | 2020-04-13 10:45:00 | V0000000321591804 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Dispatch_... |


### Selecting files to download

* this involves subsetting `current_tracker_df` with the `set_download_df()` method, which offers several ways to narrow the selection: choosing the *n* `latest` rows, specifying a time range, or setting the DataFrame manually
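The *n*-latest and time-range options boil down to ordinary pandas filtering. As a hedged sketch (the helper `subset_tracker` and its parameters are hypothetical, not the signature of `set_download_df()`), they could be combined like this:

```python
import pandas as pd


def subset_tracker(df, latest=None, date_min=None, date_max=None):
    """Hypothetical helper: narrow a tracker DataFrame by a time window and/or the n latest rows."""
    out = df.sort_values("TIMESTAMP")
    if date_min is not None:
        out = out[out["TIMESTAMP"] >= pd.Timestamp(date_min)]
    if date_max is not None:
        out = out[out["TIMESTAMP"] <= pd.Timestamp(date_max)]
    if latest is not None:
        # Applied last, so "latest" is counted within the time window.
        out = out.tail(latest)
    return out
```

Applying the time window before `tail()` means `latest=1` returns the newest file *inside* the window, not the newest file overall.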

```python
#nem_get.adjust_time_range(date_min='2020-03-01')
nem_get.set_download_df(latest=1, by_time_range=False, manual_df=None)
nem_get.download_df
```

| | TIMESTAMP | VERSION | DOWNLOADED | DOWNLOAD_DATE | URL |
|---|---|---|---|---|---|
| 576 | 2020-04-13 10:45:00 | V0000000321591804 | False | 1900-01-01 | http://nemweb.com.au/Reports/Current/Dispatch_... |


### Downloading the files

* Once this is set, just run `download_files()` and the files will be extracted, unzipped and stored locally within the folder specified by `data_dir`

* Note that `current_tracker_df` is updated accordingly once the download has run
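A download step like this typically fetches each ZIP archive, extracts it and flags the matching tracker row. The sketch below assumes the archive bytes are already in memory (for example from `urllib.request.urlopen(url).read()`) and that the tracker uses the `URL`, `DOWNLOADED` and `DOWNLOAD_DATE` columns shown above. Both helper names are hypothetical.

```python
import io
import zipfile

import pandas as pd


def unzip_to_dir(zip_bytes, dest_dir):
    """Extract every member of an in-memory ZIP archive into dest_dir; return member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def mark_downloaded(tracker_df, url):
    """Flag the row for url as downloaded and stamp it with the current time."""
    mask = tracker_df["URL"] == url
    tracker_df.loc[mask, "DOWNLOADED"] = True
    tracker_df.loc[mask, "DOWNLOAD_DATE"] = pd.Timestamp.now().floor("s")
    return tracker_df
```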

```python
nem_get.download_files()
nem_get.current_tracker_df.tail(3)
```

| | TIMESTAMP | VERSION | DOWNLOADED | DOWNLOAD_DATE | URL |
|---|---|---|---|---|---|
| 574 | 2020-04-13 10:35:00 | V0000000321591470 | False | 1900-01-01 00:00:00 | http://nemweb.com.au/Reports/Current/Dispatch_... |
| 575 | 2020-04-13 10:40:00 | V0000000321591665 | False | 1900-01-01 00:00:00 | http://nemweb.com.au/Reports/Current/Dispatch_... |
| 576 | 2020-04-13 10:45:00 | V0000000321591804 | True | 2020-04-13 10:59:58 | http://nemweb.com.au/Reports/Current/Dispatch_... |


## NEM Loader

* The `NEM_loader` class inherits from the `NEM_extractor` class and is used to load downloaded data and reshape it into an analysis-ready format

* Its functionality is still quite preliminary. Currently, for a selected resource, it can identify the list of downloaded data files using the `get_available_files()` method
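Identifying downloaded files can be as simple as scanning a local folder. This sketch assumes each resource's files sit in their own directory and that data files end in `.csv` in any letter case (NEMweb files use upper-case `.CSV`). The helper name is hypothetical.

```python
from pathlib import Path


def available_files(resource_dir):
    """List CSV data files (any extension case) in a resource's local folder, sorted by name."""
    folder = Path(resource_dir)
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )
```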

```python
nem_load = NEM_loader(data_dir)
nem_load.get_available_files()
```

```
[0] /Reports/Current/Operational_Demand/ACTUAL_DAILY/
[1] /Reports/Current/Dispatch_SCADA/
[2] /Reports/Current/Next_Day_Intermittent_DS/
[3] /Reports/Archive/Operational_Demand/Actual_Daily/

Selected: /Reports/Current/Operational_Demand/ACTUAL_DAILY/
```


### Selecting files to read

* By default, the `set_read_list()` method selects the most recent file, which is useful if you just want to peek at the file format
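Picking the most recent file can rely on the timestamp embedded in the filename. Assuming, as in `PUBLIC_ACTUAL_OPERATIONAL_DEMAND_DAILY_20200412_20200413044000.CSV`, that the last underscore-separated token is a sortable timestamp, a hypothetical helper could be:

```python
from pathlib import Path


def file_timestamp(filename):
    """Assumed convention: the last underscore-separated token of the stem is a sortable timestamp."""
    return Path(filename).stem.rsplit("_", 1)[-1]


def latest_files(filenames, n=1):
    """Return the n (>= 1) most recent filenames, oldest first."""
    return sorted(filenames, key=file_timestamp)[-n:]
```

Sorting on the extracted token, rather than on the whole name, keeps the choice correct even when several report types share a folder.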

```python
nem_load.set_read_list()
nem_load.files_to_read
```

```
['PUBLIC_ACTUAL_OPERATIONAL_DEMAND_DAILY_20200412_20200413044000.CSV']
```

### Loading files

* the `process_read_list()` method loads the files as DataFrames nested in a dictionary, from which the user can start analysing the data further with `pandas` functions and methods
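AEMO CSV files can hold several tables in one file. In the MMS data model format, rows starting with `C` are comments, rows starting with `I` carry column names and rows starting with `D` carry data, with the report name, sub-report name and version in the next three fields. A minimal parser that groups rows into one DataFrame per table, assuming that layout, is sketched below. It is not NEM-TEL's implementation, and all values are left as strings.

```python
import csv
import io

import pandas as pd


def parse_aemo_csv(text):
    """Split an AEMO-style CSV into {(report, sub_report, version): DataFrame}.

    Assumes C = comment rows, I = column-header rows, D = data rows.
    """
    headers, rows = {}, {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        record_type, key = row[0], tuple(row[1:4])
        if record_type == "I":
            headers[key] = row[4:]       # column names for this table
            rows.setdefault(key, [])
        elif record_type == "D" and key in headers:
            rows[key].append(row[4:])    # data values, aligned with the header
    return {key: pd.DataFrame(rows[key], columns=headers[key]) for key in headers}
```

Numeric and date columns can then be converted with `pd.to_numeric` and `pd.to_datetime`.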

```python
data = nem_load.process_read_list()
k = list(data.keys())[0]
data[k][0].head()
```

| | OPERATIONAL_DEMAND | ACTUAL | 1 | REGIONID | INTERVAL_DATETIME | OPERATIONAL_DEMAND.1 | LASTCHANGED |
|---|---|---|---|---|---|---|---|
| 0 | OPERATIONAL_DEMAND | ACTUAL | 1 | NSW1 | 2020/04/12 04:30:00 | 5622 | 2020/04/12 04:30:09 |
| 1 | OPERATIONAL_DEMAND | ACTUAL | 1 | NSW1 | 2020/04/12 05:00:00 | 5608 | 2020/04/12 05:01:06 |
| 2 | OPERATIONAL_DEMAND | ACTUAL | 1 | NSW1 | 2020/04/12 05:30:00 | 5668 | 2020/04/12 05:30:45 |
| 3 | OPERATIONAL_DEMAND | ACTUAL | 1 | NSW1 | 2020/04/12 06:00:00 | 5754 | 2020/04/12 06:00:18 |
| 4 | OPERATIONAL_DEMAND | ACTUAL | 1 | NSW1 | 2020/04/12 06:30:00 | 5854 | 2020/04/12 06:31:21 |
