## Workflow example

This notebook demonstrates a basic workflow for loading data, making time-series plots, and saving the data to CSV files, in two different ways:

- Individual `device`
- `Test` containing various devices

A `test` stores metadata alongside a collection of devices, each with its own options. Examples of that metadata:

- author
- project
- notes
- dates
- report
- ...

## Device example

Devices can be loaded from various sources:
- Local CSV files
- The Smart Citizen API
- The MUV API
- Open data APIs, such as the one from the Barcelona City Council
- The NILU iflink API (Norwegian Institute for Air Research)

This notebook showcases the Smart Citizen API. Visit [this notebook](./02_access_the_power_of_data.ipynb) for more information on how to access other sources.

In [None]:
from scdata._config import config

config._out_level = 'DEBUG'

In [None]:
from scdata.device import Device

# The device ID is the number after kits/ in the kit URL. For instance,
# for this kit: http://smartcitizen.me/kits/13625, the device ID would be 13625
device = Device(blueprint = 'sck_21', descriptor = {'id': '13625', 
                                                    # The source is always api when it comes from any API, 
                                                    # in this case as it's an sck_21, we'll use the SmartCitizen one
                                                    'source': 'api', 
                                                    # The frequency at which we want to load the data. By default, we don't clean NaNs
                                                    'frequency': '1Min'})

In [None]:
# Get the device information
print ('---SENSORS---')
print (device.sensors)
# The device contains another sub-device from the API in question that shows other methods
print ('\n---ADDED AT---')
print (device.api_device.get_device_added_at())
print ('\n---LAST READING---')
print (device.api_device.get_device_last_reading())
print ('\n---TIMEZONE---')
print (device.api_device.get_device_timezone())
print ('\n---API SENSORS---')
print (device.api_device.get_device_sensors())
print ('\n---API KIT ID---')
print (device.api_device.get_kit_ID())


In this case, we assumed the device followed the SCK 2.1 blueprint, but the platform actually returns kit_id 33 (see [https://api.smartcitizen.me/v0/kits?per_page=200](https://api.smartcitizen.me/v0/kits?per_page=200)). The sensors listed for that kit are the ones that will be used.

In [None]:
# Now get the device data
device.load();

In [None]:
# Take a look at the first rows
device.readings.head(4)

In [None]:
# The readings object is a pandas.DataFrame, with the usual methods to plot, filter, get data, etc.
# More information on the pandas.DataFrame() object here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
device.readings[['TEMP']].plot(figsize = (15,10), 
                               grid = True, 
                               ylim=(15,20))

In [None]:
# Get some basic metrics
print (device.readings[['TEMP']].mean())
print (device.readings[['TEMP']].max())
print (device.readings[['TEMP']].min())
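
Because `device.readings` is a regular `pandas.DataFrame`, standard pandas operations apply to it. The sketch below uses a small synthetic frame (the values are made up for illustration and are not real device data) to show date slicing, resampling and summary statistics. It assumes, as the cells above suggest, that the readings are indexed by timestamp.

```python
import pandas as pd

# Synthetic stand-in for device.readings: one value per minute (illustrative data only)
index = pd.date_range('2021-01-20 00:00', periods=60, freq='1min')
readings = pd.DataFrame({'TEMP': [15.0 + 0.1 * i for i in range(60)]}, index=index)

# Select a time window by slicing the DatetimeIndex (both ends are included)
window = readings.loc['2021-01-20 00:10':'2021-01-20 00:29']

# Downsample to 10-minute means
resampled = readings.resample('10min').mean()

# Summary statistics in one call (count, mean, std, min, quartiles, max)
summary = readings['TEMP'].describe()

print(len(window), len(resampled))
print(summary[['mean', 'min', 'max']])
```

The same calls should work on `device.readings` directly, as long as the column names match the channels reported for the device.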

## Test example

Tests are more complex structures that group several devices in the same abstract representation. They make different deployments traceable by storing metadata alongside the devices.

In [None]:
from scdata import Test
# The second time you load it, you don't need to input the whole name, just some words. Then, in the input box, put the number for the test
test = Test('MINKE_WORFKLOW')

In [None]:
### WARNING: Run this cell only the first time, when you create the test

# Add the devices you want to it
devices = ['13625', '13604', '13605']

# Use device_id as the loop variable so the `device` object created earlier is not overwritten
for device_id in devices:
    # Tests can have devices from many sources, and they can be compared in a common framework (from csv data, API(s), etc.)
    test.add_device(Device(blueprint = 'sck_21', descriptor = {'source': 'api',
                                                               'id': device_id,
                                                               'frequency': '1Min',
                                                               'timezone': 'Europe/Madrid'}))

In [None]:
### WARNING: Run this cell only the first time, when you create the test

# Create it
test.create()

This creates the necessary folder structure and data in the following path:

In [None]:
test.path

In [None]:
# Finally, load it
test.load()

# Alternatively, you can load from different dates - if you have cached the files, you might need to delete them first
# Options for min_date, max_date, frequency, or what to do with the NaNs

# options = {'min_date': '2021-01-20'}
# test.load(options = options)

All CSV data is stored in the `cached` subfolder of the folder above. The next time the test is loaded, the API loading process accounts for what is already in that folder and does not download the same data again. The margin to reload data can be adjusted, in hours, with the `cached_data_margin` parameter in the `config.yaml` file.
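
To make the idea of a reload margin concrete, here is a minimal sketch of one plausible interpretation: data is requested again starting a fixed number of hours before the last cached reading. The helper `reload_start` and this exact rule are assumptions for illustration only; the library's actual logic may differ.

```python
from datetime import datetime, timedelta

def reload_start(last_cached: datetime, margin_hours: float) -> datetime:
    """Hypothetical helper: earliest timestamp to request again from the API.

    Assumption: the margin means "re-request this many hours before the
    last cached reading". This is not taken from the library's source.
    """
    return last_cached - timedelta(hours=margin_hours)

# Last reading found in the cached csv file (made-up value)
last_cached = datetime(2021, 1, 22, 12, 0)

start = reload_start(last_cached, margin_hours=6)
print(start)  # 2021-01-22 06:00:00
```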

In [None]:
# Explore a bit
test.devices

In [None]:
# Each device's data inside the test is also a pandas.DataFrame
test.devices['13625'].readings.head(4)

In [None]:
# Make a plot (basic one)
traces = {1: {'devices': 'all', 'channel': 'TEMP', 'subplot': 1}}

test.ts_plot(traces = traces);

In [None]:
# Make some adjustments
traces = {1: {'devices': 'all', 'channel': 'TEMP', 'subplot': 1}}

formatting = {'width': 12, 'height': 8, 'ylabel': {1: 'TEMP'}, 'title': 'Temperature comparison'}

# Options for min_date, max_date, frequency, or what to do with the NaNs
options = {'min_date': '2021-01-19 12:00:00', 'max_date': '2021-01-22', 'frequency': '10Min', 'clean_na': None}

test.ts_plot(traces = traces, options = options, formatting = formatting);
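
The `options` keys above can be understood in plain pandas terms: a date window, a resampling frequency, and a policy for missing values. The sketch below is an assumption-laden illustration, not the library's implementation. `apply_options` is a hypothetical helper, the data is synthetic, and the `'drop'` and `'fill'` values for `clean_na` are assumed here for the sake of the example.

```python
import pandas as pd

def apply_options(df, options):
    """Hypothetical helper mirroring the option names used above."""
    # Keep only rows between min_date and max_date (None means open-ended)
    out = df.loc[options.get('min_date'):options.get('max_date')]
    # Resample to the requested frequency using the mean of each bin
    frequency = options.get('frequency')
    if frequency is not None:
        out = out.resample(frequency).mean()
    # Assumed NaN policies: None keeps them, 'drop' removes rows, 'fill' propagates neighbours
    clean_na = options.get('clean_na')
    if clean_na == 'drop':
        out = out.dropna()
    elif clean_na == 'fill':
        out = out.ffill().bfill()
    return out

# Synthetic readings every 5 minutes, with a 10-minute gap of missing values
index = pd.date_range('2021-01-19 11:00', periods=24, freq='5min')
values = [float(i) for i in range(24)]
values[14] = values[15] = float('nan')
df = pd.DataFrame({'TEMP': values}, index=index)

kept = apply_options(df, {'min_date': '2021-01-19 12:00:00', 'frequency': '10min', 'clean_na': None})
result = apply_options(df, {'min_date': '2021-01-19 12:00:00', 'frequency': '10min', 'clean_na': 'drop'})
print(len(kept), len(result))
```

With `clean_na` set to `None`, the empty 10-minute bin stays in the output as `NaN`, which shows up as a gap in a plot; dropping it removes the row instead.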

In [None]:
# Make some adjustments and some subplots
# If you put 'all' in the devices for the traces, it will plot all of them
# Otherwise, you can just put a list of the devices you want
traces = {1: {'devices': 'all', 'channel': 'TEMP', 'subplot': 1},
          2: {'devices': ['13625', '13604'], 'channel': 'HUM', 'subplot': 2}}

formatting = {'width': 12, 
              'height': 10, 
              'ylabel': {1: 'TEMP (degC)', 2: 'HUM (%rh)'}, 
              'title': 'Temperature and humidity comparison'}

options = {'min_date': '2021-01-19 12:00:00','max_date': '2021-01-22', 'frequency': '10Min', 'clean_na': None}
fig = test.ts_plot(traces = traces, options = options, formatting = formatting);

# Uncomment below to save the figure somewhere
# fig.savefig('~/Desktop/plot.png', dpi = 300, transparent=False, bbox_inches='tight')

# Visit the 03_plotting_in_no_time example to explore more options regarding plots
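
The `traces` dictionary maps each trace number to a set of devices, a channel and a subplot. As a rough sketch of how the `'devices'` entry could be interpreted (either `'all'` or an explicit list), consider the hypothetical helper `resolve_devices` below. It is not part of the library and only illustrates the convention described in the comments above.

```python
def resolve_devices(trace, available):
    """Hypothetical helper: expand a trace's 'devices' entry into a list of IDs."""
    requested = trace['devices']
    # 'all' selects every device in the test
    if requested == 'all':
        return list(available)
    # Accept a single ID as well as a list of IDs
    if isinstance(requested, str):
        requested = [requested]
    # Fail early on IDs that are not part of the test
    missing = [d for d in requested if d not in available]
    if missing:
        raise KeyError(f'Unknown devices: {missing}')
    return list(requested)

available = ['13625', '13604', '13605']
traces = {1: {'devices': 'all', 'channel': 'TEMP', 'subplot': 1},
          2: {'devices': ['13625', '13604'], 'channel': 'HUM', 'subplot': 2}}

for key, trace in traces.items():
    print(key, trace['channel'], resolve_devices(trace, available))
```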

In [None]:
# Make some interactive plots (if you have plotly installed)
# If you put 'all' in the devices for the traces, it will plot all of them
# Otherwise, you can just put a list of the devices you want
traces = {1: {'devices': 'all', 'channel': 'TEMP', 'subplot': 1},
          2: {'devices': ['13625', '13604'], 'channel': 'HUM', 'subplot': 2}}

formatting = {'width': 800, 
              'height': 600, 
              'ylabel': {1: 'TEMP (degC)', 2: 'HUM (%rh)'}, 
              'title': 'Temperature and humidity comparison'}

options = {'min_date': '2021-01-19 12:00:00', 'max_date': '2021-01-22', 'frequency': '10Min', 'clean_na': None}
test.ts_uplot(traces = traces, options = options, formatting = formatting)

# Note: this cell does not assign `fig`, so the line below would act on the figure returned by ts_plot in the previous cell
# fig.savefig('~/Desktop/plot.png', dpi = 300, transparent=False, bbox_inches='tight')

In [None]:
# Export data to the desktop in csv
test.devices['13625'].export(path = '~/Desktop')
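
Paths that start with `~` only become usable once they are expanded to the user's home directory. This notebook does not state whether every function that takes a path expands `~` itself, so expanding it explicitly is a safe habit. A minimal sketch using only the standard library:

```python
from pathlib import Path

# '~/Desktop' is the path used in the cells above
raw = '~/Desktop'

# Replace the leading '~' with the current user's home directory
expanded = Path(raw).expanduser()

print(expanded)
print(expanded.is_absolute())
```

The expanded path can then be passed on as `str(expanded)` wherever a plain string path is expected.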

In [None]:
# Or the whole thing
test.to_csv()

In [None]:
# You can also make a descriptor front page in HTML
test.to_html();