<table style="width:100%; background-color: #EBF5FB">
  <tr>
    <td style="border: 1px solid #CFCFCF">
      <b>Household data: Processing Notebook</b>
      <ul>
        <li><a href="main.ipynb">Main Notebook</a></li>
        <li>Processing Notebook</li>
      </ul>
      <br>This Notebook is part of the <a href="http://data.open-power-system-data.org/household_data">Household Data Package</a> of <a href="http://open-power-system-data.org">Open Power System Data</a>.
    </td>
  </tr>
</table>

# Table of Contents
* [1. Introductory Notes](#1.-Introductory-Notes)
* [2. Settings](#2.-Settings)
	* [2.1 Import Python libraries](#2.1-Import-Python-libraries)
	* [2.2 Set version number and recent changes](#2.2-Set-version-number-and-recent-changes)
	* [2.3 Select timerange](#2.3-Select-timerange)
	* [2.4 Select download source](#2.4-Select-download-source)
* [3. Download](#3.-Download)
* [4. Read](#4.-Read)
	* [4.1 Preparations](#4.1-Preparations)
	* [4.2 Select household subset](#4.2-Select-household-subset)
	* [4.3 Reading loop](#4.3-Reading-loop)


# 1. Introductory Notes

This Notebook handles missing data, performs calculations and aggregations, and creates the output files.

# 2. Settings

## 2.1 Import Python libraries

This section loads the required libraries and sets up logging.

In [150]:
# Python modules
from datetime import datetime, date, timedelta, time
import pandas as pd
import numpy as np
import logging
import json
import sqlite3
import yaml
import itertools
import os
import pytz
import hashlib
from shutil import copyfile

# Scripts from household repository package
from household.download import download
from household.read import read

# Reload modules before executing any code, to avoid having to restart
# the kernel after editing the scripts in the household package
%load_ext autoreload
%autoreload 2

households_yaml_path = 'conf/households.yml'

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('log')
# For more detailed logging messages, replace 'INFO' with 'DEBUG'
# (May slow down computation).
#logger.setLevel('INFO')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## 2.2 Set version number and recent changes

Executing this notebook to the end creates a new version of the data package. <br>
The version number specifies the local directory for the data. <br>
The `changes` note records what has changed in this version.

In [11]:
version = '2017-06-30'
data_path = os.path.join(version, 'original_data')
changes = '''Initial upload'''

## 2.3 Select timerange

Select the time range to read and process data. <br>
*Default: all data.*

Type `None` to process all available data.

In [15]:
start_from_user = None  # e.g. date(2016, 1, 1)
end_from_user = None  # e.g. date(2016, 1, 31)
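
How `read` applies these limits is not shown in this notebook. As a rough sketch, assuming the data carries a timezone-aware `DatetimeIndex` in UTC, a hypothetical helper `clip_to_range` could treat `None` as an open bound:

```python
from datetime import date

import pandas as pd


def clip_to_range(df, start=None, end=None, tz='UTC'):
    """Hypothetical helper: keep rows from the start day to the end day inclusive.

    None means the corresponding bound is open.
    """
    if start is not None:
        start_ts = pd.Timestamp(start).tz_localize(tz)
        df = df[df.index >= start_ts]
    if end is not None:
        # Include the whole end day by cutting before the following midnight
        end_ts = pd.Timestamp(end).tz_localize(tz) + pd.Timedelta(days=1)
        df = df[df.index < end_ts]
    return df


# Toy data: six hourly readings around the turn of the year
index = pd.date_range('2015-12-31 22:00', periods=6, freq='60min', tz='UTC')
demo = pd.DataFrame({'grid_import': range(6)}, index=index)

print(len(clip_to_range(demo)))                          # no limits: all rows kept
print(len(clip_to_range(demo, start=date(2016, 1, 1))))  # only rows from 2016-01-01 on
```

With both arguments left at `None`, the frame is returned unchanged, which matches the default of processing all available data.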

## 2.4 Select download source

The raw data can be downloaded as a zip file from the OPSD server. To do so, specify the archive version to use; it must be a version that has been cached on the OPSD server.

In [12]:
archive_version = '2017-06-30'  # e.g. '2017-06-30'

# 3. Download

This section downloads the raw data to be processed.

If the original data does not exist locally, it is downloaded from the OPSD server and extracted into a local directory.

In [14]:
download(version=archive_version)

INFO:log:original_data.zip already exists. Delete it if you want to download again
INFO:log:Extracted data to /original_data.
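
The `download` function lives in the `household` package and is not reproduced here. The skip-if-present behaviour visible in the log above can be sketched as follows; `ensure_extracted` and its `fetch` callback are hypothetical names, and the real function may work differently:

```python
import os
import zipfile


def ensure_extracted(zip_path, target_dir, fetch):
    """Hypothetical sketch: fetch the archive only if it is missing, then extract it.

    fetch is any callable that writes the archive to zip_path,
    for example a thin wrapper around urllib.request.urlretrieve.
    """
    if os.path.exists(zip_path):
        print('{} already exists. Delete it to download again.'.format(zip_path))
    else:
        fetch(zip_path)
    # Extraction runs in both cases, so the local directory is always populated
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir
```

Passing the download step in as a callable keeps the caching logic independent of the server URL, which this notebook does not state.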


# 4. Read

This section reads each downloaded file into a pandas DataFrame and merges data from different sources that share the same time resolution.

## 4.1 Preparations

Set the names of the header rows at the top of the data, which are used internally to store metadata. The order of this list determines the order of the levels in the resulting output.

In [85]:
headers = ['project', 'region', 'household', 'type', 'feed']
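
To illustrate what these levels mean, the sketch below builds a column `MultiIndex` by hand. The metadata values (`'demo_project'`, `'demo_region'`, `'residential'`) are placeholders invented for the example, not values taken from `conf/households.yml`:

```python
import pandas as pd

headers = ['project', 'region', 'household', 'type', 'feed']

# One tuple per column, with one entry per header level (placeholder values)
columns = pd.MultiIndex.from_tuples(
    [('demo_project', 'demo_region', 'Residential 4', 'residential', 'grid_import'),
     ('demo_project', 'demo_region', 'Residential 4', 'residential', 'pv')],
    names=headers)

index = pd.date_range('2015-10-11 19:33', periods=3, freq='min', tz='UTC')
demo = pd.DataFrame(0.0, index=index, columns=columns)

# Select a single feed through its level name
pv = demo.xs('pv', level='feed', axis=1)
print(demo.columns.names)
```

Because the levels are named, a column can be selected by `feed` or `household` without relying on its position in the tuple.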

Create a dictionary of empty DataFrames, one per time resolution, to be populated by the read function.

In [53]:
data_sets = {'1min': pd.DataFrame(), '15min': pd.DataFrame()}
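
One way such a dictionary might be filled is sketched below: each newly read frame is merged into the entry for its resolution with `combine_first`. The helper `add_to_data_sets` is hypothetical; how `read` actually hands over its results is not shown in this notebook.

```python
import pandas as pd


def add_to_data_sets(data_sets, resolution, df):
    """Hypothetical helper: merge df into the frame stored for its resolution."""
    if data_sets[resolution].empty:
        data_sets[resolution] = df
    else:
        # Aligns on index and columns; existing values win, gaps are filled from df
        data_sets[resolution] = data_sets[resolution].combine_first(df)
    return data_sets


# Toy data: two feeds with the same 1-minute index
demo_sets = {'1min': pd.DataFrame(), '15min': pd.DataFrame()}
index = pd.date_range('2015-10-11 19:33', periods=3, freq='min', tz='UTC')
add_to_data_sets(demo_sets, '1min',
                 pd.DataFrame({'grid_import': [0.1, 0.2, 0.3]}, index=index))
add_to_data_sets(demo_sets, '1min',
                 pd.DataFrame({'pv': [0.0, 0.0, 0.1]}, index=index))
print(list(demo_sets['1min'].columns))
```

Starting from empty DataFrames means the first frame of each resolution is stored as-is, and later frames are aligned against it.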

## 4.2 Select household subset

Optionally, specify a subset of households to process. <br>
The next cell prints the available households and their feeds.

In [54]:
with open(households_yaml_path, 'r') as f:
    households = yaml.safe_load(f)  # safe_load needs no explicit Loader argument
for k, v in households.items():
    print(yaml.dump({k: list(v['feeds'].keys())}, default_flow_style=False))

Residential 4:
- grid_import
- grid_export
- pv
- ev
- heat_pump
- dishwasher
- washing_machine
- refrigerator
- freezer



Copy the relevant entries from the output above and paste them into the following cell to get the right format.

Type `subset = None` to include all data.

In [55]:
subset = yaml.safe_load('''
    insert_household_here:
    - insert_feed1_here
    - insert_feed2_here
    more_households_here:
    - more_feeds_here
    ''')

subset = None

Now eliminate households and feeds not in subset.

In [56]:
if subset:  # eliminate households and feeds not in subset
    # Keep each household's metadata ('dir', 'project', ...) and filter only its feeds
    households = {household_name: {**households[household_name],
                                   'feeds': {k: v
                                             for k, v in households[household_name]['feeds'].items()
                                             if k in feed_list}}
                  for household_name, feed_list in subset.items()}
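
A minimal sketch of the intended filtering on a toy dictionary, assuming each household entry keeps its metadata keys (such as `dir`) next to a `feeds` mapping, as the reading loop in section 4.3 expects. The helper name `filter_households` and all values in `demo` are made up for illustration:

```python
def filter_households(households, subset):
    """Hypothetical helper: keep only the households and feeds named in subset."""
    if not subset:
        return households
    return {
        name: {**households[name],
               'feeds': {feed: conf
                         for feed, conf in households[name]['feeds'].items()
                         if feed in feed_list}}
        for name, feed_list in subset.items()
    }


# Toy configuration with invented directories and feed settings
demo = {
    'Residential 4': {'dir': 'residential4', 'feeds': {'pv': 1, 'ev': 2, 'freezer': 3}},
    'Residential 5': {'dir': 'residential5', 'feeds': {'pv': 4}},
}
result = filter_households(demo, {'Residential 4': ['pv', 'ev']})
print(result)
```

Households missing from `subset` are dropped entirely, while the remaining ones keep every key except the feeds that were not requested.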

## 4.3 Reading loop

Loop through the households and their feeds to read the data.

In [149]:
# For each source in the household dictionary
for household_name, household_dict in households.items():
    df = read(household_name, household_dict['dir'], household_dict['feeds'], 
              household_dict['project'], household_dict['region'], household_dict['type'], headers, 
              out_path=data_path,
              start_from_user=start_from_user,
              end_from_user=end_from_user)
    

INFO:log:Reading Residential 4 - feeds


Progress: [--------------------------------------------------] 0/9 feeds 

INFO:log:                           grid_import
2015-10-11 19:33:00+00:00     0.002860
2015-10-11 19:34:00+00:00     0.005720
2015-10-11 19:35:00+00:00     0.008580
2015-10-11 19:36:00+00:00     0.011440
2015-10-11 19:37:00+00:00     0.014300
2015-10-11 19:38:00+00:00     0.017160
2015-10-11 19:39:00+00:00     0.021449
2015-10-11 19:40:00+00:00     0.022879
2015-10-11 19:41:00+00:00     0.024309
2015-10-11 19:42:00+00:00     0.025739
2015-10-11 19:43:00+00:00     0.027169
2015-10-11 19:44:00+00:00     0.028599
2015-10-11 19:45:00+00:00     0.032889
2015-10-11 19:46:00+00:00     0.035749
2015-10-11 19:47:00+00:00     0.038609
2015-10-11 19:48:00+00:00     0.041469
2015-10-11 19:49:00+00:00     0.044329
2015-10-11 19:50:00+00:00     0.047189
2015-10-11 19:51:00+00:00     0.052551
2015-10-11 19:52:00+00:00     0.055054
2015-10-11 19:53:00+00:00     0.057556
2015-10-11 19:54:00+00:00     0.062561
2015-10-11 19:55:00+00:00     0.065063
2015-10-11 19:56:00+00:00     0.067566
2015-10-11 19:57

Progress: [######--------------------------------------------] 1/9 feeds None
