> Under Construction: this notebook uses the [waylay-py-internal](https://github.com/waylayio/waylay-py-internal) extension for APIs that are not yet public. It requires current versions of both waylay-py and waylay-py-internal:

```
pip install git+https://github.com/waylayio/waylay-py
pip install git+https://github.com/waylayio/waylay-py-internal
```

# HVAC occupancy detection

This notebook illustrates how to interact with the Waylay Platform APIs for an HVAC data science use case.

## References
* The [kaggle](https://www.kaggle.com) notebook [HVAC Occupancy Detection with ML and DL Methods](https://www.kaggle.com/turksoyomer/hvac-occupancy-detection-with-ml-and-dl-methods/notebook), and related [dataset](https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+), on which this example is based.
* The [Waylay api documentation](https://docs.waylay.io/api/)
* The [Waylay python SDK](https://docs.waylay.io/api/sdk/python/)
* [Setup instructions](https://github.com/waylayio/demo-general/tree/master/python-sdk) for a python notebook using the Waylay Python SDK.


## Parameters
Please review and adapt the following parameters for this demo.

In [None]:
class HVACDemo:
    """parametrization for this demo"""
    
    # original location of the data set
    data_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00357/occupancy_data.zip'
    
    # the profile name under which waylay credentials are stored
    waylay_client_profile='staging'  # 'rules'
    
    # the id of the resource under which this demo is run
    resource_id = 'demo_energy_hvac_occupancy'
    
    
    
    

## Setup

In [None]:
import pandas as pd
import waylay
from datetime import datetime

In [None]:
# NEEDED FOR NOW

import waylay_internal
dict(
    waylay=waylay.__version__,
    waylay_internal=waylay_internal.__version__,
)

In [None]:
# if the profile does not exist, this interactively requests credentials and optionally lets you store them.
waylay_client = waylay.WaylayClient.from_profile(HVACDemo.waylay_client_profile)

## Data retrieval

### download the data set
We download the dataset (a zipped set of csv files), inspect its content, and read out the csv files into a pandas data structure.

In [None]:
import os
import os.path
import zipfile
from urllib.request import urlretrieve

os.makedirs('input', exist_ok=True)
os.makedirs('output', exist_ok=True)

# download the data set from its original location (HVACDemo.data_url), unless already cached locally
if not os.path.isfile('input/occupancy.zip'):
    urlretrieve(HVACDemo.data_url, 'input/occupancy.zip')
    
with zipfile.ZipFile('input/occupancy.zip') as occ_zip:
    for file_name in occ_zip.namelist():
        print(file_name)

In [None]:
with zipfile.ZipFile('input/occupancy.zip') as occ_zip:
    datatest = pd.read_csv(occ_zip.open('datatest.txt'))
    datatest2 = pd.read_csv(occ_zip.open('datatest2.txt'))
    datatraining = pd.read_csv(occ_zip.open('datatraining.txt'))
    


In [None]:
datatraining.describe()

In [None]:
datatraining.head()

### convert to etl format
To upload bulk data into Waylay, the data should be converted into an optimized format.
The `timeseries.etl_tool.prepare_import` tool helps you create these _import files_.

In this case, we provide the tool with additional information:
 * `timestamp_timezone='UTC'` as timestamps do not contain a timezone component
 * `resource=HVACDemo.resource_id` as the resource id is not provided in the input
 * `timestamp_key='date'`, as timestamps are in the `date` column. In this case this is not required as `date` will be recognised as a timestamp column if not specified otherwise.
 * `directory='output'` because we want the resulting import file to reside in that directory

The first two instructions are required for this dataset. Try omitting them to see which errors are raised.
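
To make the role of `timestamp_timezone='UTC'` more concrete, here is a minimal standard-library sketch of what interpreting a timezone-less timestamp as UTC amounts to. The helper name `as_utc`, the timestamp format and the sample value are assumptions for illustration only; this is not the implementation used by the Waylay tool.

```python
from datetime import datetime, timezone


def as_utc(text, fmt='%Y-%m-%d %H:%M:%S'):
    """Parse a timestamp string that has no timezone and label it as UTC."""
    naive = datetime.strptime(text, fmt)      # no tzinfo attached yet
    return naive.replace(tzinfo=timezone.utc)  # same wall-clock time, now explicitly UTC


stamp = as_utc('2015-02-04 17:51:00')  # illustrative value
print(stamp.isoformat())  # → 2015-02-04T17:51:00+00:00
```

Without such an explicit choice, a naive timestamp is ambiguous, which is presumably why the tool asks for one.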

In [None]:
etl_import = waylay_client.timeseries.etl_tool.prepare_import(
    datatraining, 
    timestamp_timezone='UTC',
    resource=HVACDemo.resource_id,
    timestamp_key='date',
    directory='output'
)
etl_import

Because it is easier to work with recent data, we instruct the tool to shift timestamps
(with `timestamp_offset`, `timestamp_first` or `timestamp_last`).
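
The idea behind shifting can be sketched with the standard library. This sketch assumes that the shift is a single constant offset applied to every timestamp, so that spacing between samples is preserved. The helper `shift_to_last` and the sample dates are hypothetical and only mimic what an option like `timestamp_last` is expected to do.

```python
from datetime import datetime, timedelta, timezone


def shift_to_last(timestamps, new_last):
    """Shift all timestamps by one constant offset so the latest equals new_last."""
    offset = new_last - max(timestamps)
    return [ts + offset for ts in timestamps]


# three illustrative samples, one minute apart
start = datetime(2015, 2, 4, 17, 51, tzinfo=timezone.utc)
original = [start + timedelta(minutes=i) for i in range(3)]

shifted = shift_to_last(original, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(shifted[-1].isoformat())   # → 2024-01-01T00:00:00+00:00
print(shifted[1] - shifted[0])   # → 0:01:00
```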

In [None]:
etl_import = waylay_client.timeseries.etl_tool.prepare_import(
    datatraining, 
    timestamp_timezone='UTC',
    resource=HVACDemo.resource_id,
    timestamp_key='date',
    timestamp_last=datetime.utcnow(), # shift all timestamps so that last one is now
    directory='output'
)
etl_import

The resulting file is a `gzip`-compressed csv file in the fully normalized _Waylay timeseries ETL_ format.
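
As a rough illustration of what "fully normalized" means, the sketch below turns wide rows (one column per metric) into one record per measurement. The output field names `resource`, `metric`, `timestamp` and `value`, the helper `normalize` and the sample values are assumptions made for this example. The actual column layout is whatever the import file contains.

```python
def normalize(rows, resource, timestamp_key='date'):
    """Turn wide rows (one column per metric) into one record per measurement."""
    records = []
    for row in rows:
        for metric, value in row.items():
            if metric == timestamp_key:
                continue  # the timestamp is carried on each record, not treated as a metric
            records.append({
                'resource': resource,
                'metric': metric,
                'timestamp': row[timestamp_key],
                'value': value,
            })
    return records


# made-up sample rows, only to show the reshaping
wide = [
    {'date': '2015-02-04 17:51:00', 'Temperature': 23.18, 'CO2': 721.25},
    {'date': '2015-02-04 17:52:00', 'Temperature': 23.15, 'CO2': 714.0},
]
long_records = normalize(wide, 'demo_energy_hvac_occupancy')
for record in long_records:
    print(record)
```

Two input rows with two metrics each yield four records, each self-describing with its resource, metric and timestamp.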

In [None]:
import gzip
with gzip.open(etl_import.path, 'rt') as csv_file:
    etl_series_df = pd.read_csv(csv_file)

etl_series_df.head()

### create or update waylay resource
Timeseries in Waylay are best associated with a Waylay resource, which documents the entity that the timeseries data represents.

In [None]:
etl_import.spec.metrics


In [None]:

hvac_resource_repr = {
    "id": HVACDemo.resource_id,
    "name": HVACDemo.resource_id,
    "description": """
Experimental data used for binary classification (room occupancy) 
from Temperature,Humidity,Light and CO2. 
Ground-truth occupancy was obtained from time stamped pictures that were taken every minute.
See https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+#
    """,
    "metrics": [{"name": name} for name in etl_import.spec.metrics],
}

In [None]:
# use `update` (PATCH method) to upsert the resource
hvac_resource_resp = waylay_client.api.resource.update(HVACDemo.resource_id, body=hvac_resource_repr)

# validate it is stored correctly
waylay_client.api.resource.get(HVACDemo.resource_id)

In [None]:
# maybe add some more metadata
metrics_metadata = [
    { "name": "Temperature", "valueType": "float", "metricType": "gauge", "unit": "°C" }, 
    { "name": "Humidity", "valueType": "float", "metricType": "gauge", "unit": "%", "description": "Relative Humidity" }, 
    { "name": "Light", "valueType": "float", "metricType": "gauge", "unit": "Lux" }, 
    { "name": "CO2", "valueType": "float", "metricType": "gauge", "unit": "ppm" }, 
    { "name": "HumidityRatio", "valueType": "float", "metricType": "gauge", "unit": "kgwater-vapor/kg-air", "description": "Derived quantity from temperature and relative humidity."},
    { "name": "Occupancy", "valueType": "integer", "metricType": "gauge", "unit": "boolean", "description": "0 for not occupied, 1 for occupied status" } 
]
hvac_resource_resp = waylay_client.api.resource.update(HVACDemo.resource_id, body=dict(metrics=metrics_metadata))
waylay_client.api.resource.get(HVACDemo.resource_id)
      

### upload the etl-import data


In [None]:
upload_bucket, upload_prefix = waylay_client.timeseries.etl_tool.initiate_import(etl_import)

The etl file is uploaded to the `etl-import/upload` storage folder.
Any upload in this folder will initiate an etl process.

This can be monitored as follows:
* the file is moved from `etl-import/upload` to a timestamped folder in `etl-import/busy`
* the etl process is kicked off
* on completion, the file (and a result statement) is copied to a folder in `etl-import/done`
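
Rather than re-running the listing cells by hand, the steps above can be followed with a small polling loop. The sketch below is generic: `wait_until_listed` is a hypothetical helper, and `list_names` is any callable that returns object names. Here it is exercised with a stand-in listing and made-up folder names instead of a real storage call.

```python
import time


def wait_until_listed(list_names, file_name, timeout=60.0, interval=2.0):
    """Poll list_names() until an object name contains file_name.

    Returns the matching names, or an empty list when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = [name for name in list_names() if file_name in name]
        if matches or time.monotonic() >= deadline:
            return matches
        time.sleep(interval)


# Stand-in for a storage listing: empty twice, then the file shows up.
listings = iter([[], [], ['done/2024/import.csv.gz', 'done/2024/result.json']])
found = wait_until_listed(lambda: next(listings), 'import.csv.gz', timeout=5, interval=0.01)
print(found)  # → ['done/2024/import.csv.gz']
```

In this notebook, `list_names` could wrap the `waylay_client.storage.object.list(upload_bucket, 'done/', params=dict(recursive=True))` call and return the `name` of each listed object.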


In [None]:
# listing of /bucket/etl-import/upload/ 
#  ( should be empty when etl process has started)
waylay_client.storage.object.list(upload_bucket, 'upload/')

In [None]:
# listing of /bucket/etl-import/busy/ 
#  ( should contain a folder with the file, as long as etl process is busy )
list(
    obj['name']
    for obj in waylay_client.storage.object.list(upload_bucket, 'busy/', params=dict(recursive=True))
    if etl_import.path.name in obj['name']
)

In [None]:
# listing of /bucket/etl-import/done/ 
#  ( should contain a folder with the file, when the etl process has concluded )
done_list = list(
    obj['name']
    for obj in waylay_client.storage.object.list(upload_bucket, 'done/', params=dict(recursive=True))
    if etl_import.path.name in obj['name']
)
done_folder = '/'.join(done_list[0].split('/')[:-1]) + '/' # name of parent folder
done_listing = list(obj['name'] for obj in waylay_client.storage.object.list(upload_bucket, done_folder))
done_listing

In [None]:
# inspect the result file
waylay_client.storage.content.get(upload_bucket, done_listing[1]).json()

In [None]:
query = dict(
    resource=HVACDemo.resource_id,
    data=[
        dict(metric=metric) for metric in etl_import.spec.metrics
    ]
)
# test query
waylay_client.analytics.query.execute(body=query)

In [None]:
# save query
query_name = f'example_{HVACDemo.resource_id}'
waylay_client.analytics.query.create(body=dict(name=query_name, query=query))


In [None]:
waylay_client.analytics.query.data(query_name)

In [None]:
from waylay import RestResponseError
def cleanup(filter='demo_energy_hvac_occupancy', query_name_prefix='example_'):
    resource_ids = [ r['id'] for r in waylay_client.api.resource.search(params=dict(filter=filter)) ]
    if not resource_ids:
        print('No resources to clean.')
        return
    print('removing data and resources with ids:' + ''.join(f"\n  - {resource_id}" for resource_id in resource_ids))
    answer = input('OK? [Y/N] ')
        
    if not answer or answer[0].upper() != 'Y':
        print('Cleanup cancelled.')
        return
    
    # delete data
    for resource_id in resource_ids:
        try:
            print(waylay_client.data.series.remove(resource_id)  or f'removed series   {resource_id}')
            print(waylay_client.api.resource.remove(resource_id) or f'removed resource {resource_id}')
            query_name = f'{query_name_prefix}{resource_id}'
            print(waylay_client.analytics.query.remove(query_name) or f'removed query {query_name}')
        except RestResponseError as exc:
            print(f'stopped processing resource {resource_id} because of:')
            print(exc)

In [None]:
cleanup()