# Data processing

This example walks through the basics of processing data and adding metrics.

## Concepts

Devices in the framework hold their _raw readings_ in the device.readings pandas dataframe. The sensors that produce those raw readings are listed in device.sensors.

Devices can also contain processed values, called metrics. A metric is added by describing the function that computes it; adding it does not calculate anything, and the values are only computed when the device is processed.
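
The pattern of registering a metric first and computing it later can be sketched in plain Python. Everything in this sketch is hypothetical and only mimics the idea: ToyDevice, PROCESSES and difference are illustrative names, not part of scdata, and plain lists stand in for the columns of the readings dataframe.

```python
# Toy illustration only: ToyDevice, PROCESSES and difference are hypothetical
# names, not scdata APIs. Lists stand in for dataframe columns.

def difference(readings, **kwargs):
    """Return the element-wise difference of two channels."""
    first, second = kwargs['channels']
    return [a - b for a, b in zip(readings[first], readings[second])]


# Lookup table from process name to callable
PROCESSES = {'difference': difference}


class ToyDevice:
    def __init__(self, readings):
        self.readings = readings   # raw channels, keyed by name
        self.metrics = {}          # registered, but not yet computed

    def add_metric(self, metric):
        # Registering a metric stores its description; nothing is calculated
        self.metrics.update(metric)

    def process(self):
        # Each metric becomes a new channel next to the raw readings
        for name, spec in self.metrics.items():
            func = PROCESSES[spec['process']]
            self.readings[name] = func(self.readings, **spec.get('kwargs', {}))
        return True


device = ToyDevice({'TEMP': [21.0, 22.5], 'EXT_TEMP': [19.0, 20.0]})
device.add_metric({'TEMP_DIFF': {'process': 'difference',
                                 'kwargs': {'channels': ['TEMP', 'EXT_TEMP']}}})
registered_only = 'TEMP_DIFF' not in device.readings   # True: not computed yet
device.process()
print(device.readings['TEMP_DIFF'])
```

The metric only shows up among the readings after process() has run, which is the same two-step flow used with add_metric and process in the rest of this example.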

In [None]:
%load_ext autoreload
%autoreload 2

In [3]:
from scdata.test import Test
from scdata.device import Device
from scdata._config import config

config.out_level='DEBUG'
config.framework='jupyterlab'

test = Test('EXAMPLE')

[INFO]: 63 --- 2020_05_EXAMPLE
[INFO]: 66 --- 2020_05_EXAMPLE2


Similar tests found, please select one or input other name [New]:  63


[INFO]: Test full name, 2020_05_EXAMPLE


In [None]:
test.load()

## Process basics

In [None]:
## The readings for each device are accessible via
test.devices['10751'].readings

In [None]:
## The sensors for each device are accessible via
test.devices['10751'].sensors

In [None]:
## The metrics for each device are accessible via
test.devices['10751'].metrics

In [2]:
help(Test.process)

Help on function process in module scdata.test:

process(self, only_new=False)
    Calculates all the metrics in each of the devices
    Returns True if done OK



In [None]:
## Process all the metrics of every device in the test
test.process()

## Add metrics

In [4]:
help(Device.add_metric)

Help on function add_metric in module scdata.device:

add_metric(self, metric={})
    Add a metric to the device to be processed by a callable function
    Parameters
    ----------
        metric: dict
        Empty dict
        Description of the metric to be added. It only adds it to
        Device.metrics, but does not calculate anything yet. The metric dict needs 
        to follow the format:
            metric = {
                        'metric_name': {'process': <function_name>
                                        'args': <iterable>
                                        'kwargs': <**kwargs for @function_name>
                                        'from_list': <module to load function from>
                        }
            }
        
        The 'from_list' parameter is optional, and onle needed if the process is not 
        already available in scdata.device.process.
    
        For a list of available processes call help(scdata.device.process)
    
        Examp

In [5]:
help(Device.process)

Help on function process in module scdata.device:

process(self, only_new=False, metrics=None)
    Processes devices metrics, either added by the blueprint definition
    or the addition using Device.add_metric(). See help(Device.add_metric) for
    more information about the definition of the metrics to be added
    
    Parameters
    ----------
    only_new: boolean
        False
        To process or not the existing channels in the Device.readings that are
        defined in Device.metrics
    metrics: list
        None
        List of metrics to process. If none, processes all
    Returns
    ----------
        boolean
        True if processed ok, False otherwise



In [10]:
import scdata
help(scdata.device.process.timeseries)
# help(scdata.device.process.alphasense)
# help(scdata.device.process.regression)

Help on module scdata.device.process.timeseries in scdata.device.process:

NAME
    scdata.device.process.timeseries

FUNCTIONS
    clean_ts(dataframe, **kwargs)
        Cleans the time series measurements sensors, by filling the out of band values with NaN
        Parameters
        ----------
            name: string
                column to clean to apply.
            limits: list, optional 
                (0, 99999)
                Sensor limits. The function will fill with NaN in the values that exceed the band
            window_size: int, optional 
                3
                If not None, will smooth the time series by applying a rolling window of that size
            window_type: str, optional
                None
                Accepts arguments in the list of windows for scipy.signal windows:
                https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions
                Default to None implies normal rolling average
        Returns
        -

In [11]:
help(scdata.device.process.timeseries.poly_ts)

Help on function poly_ts in module scdata.device.process.timeseries:

poly_ts(dataframe, **kwargs)
    Calculates the a polinomy based on channels.
    Parameters
    ----------
        channels: list of strings
            list containing channels
        coefficients: list or np array
            list containing coefficients
        exponents: list or np array
            list containing exponents
        extra_term: float
            0
            Independent term
    Returns
    -------
        result = sum(coefficients[i]*channels[i]^exponents[i] + extra_term)



In [None]:
metric = {'TEMP_POLY': {'process': 'poly_ts',
                        'kwargs': {'channels': ['TEMP', 'EXT_TEMP'],
                                   'coefficients': [1, -1]}
                       }}

test.devices['10751'].add_metric(metric)
# Device.process documents `metrics` as a list, so pass the metric names
test.devices['10751'].process(metrics=list(metric))
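
As a rough check of what TEMP_POLY should contain, the formula from the poly_ts help can be evaluated by hand. The sketch below is a plain-Python stand-in, not the library code. poly_sketch is a hypothetical name, the sample values are made up, and two assumptions are not stated in the help text: omitted exponents default to 1, and extra_term is added once per sample.

```python
# Hypothetical re-implementation for illustration; not scdata's poly_ts.
def poly_sketch(readings, channels, coefficients, exponents=None, extra_term=0):
    # Assumption: exponents default to 1 when not given
    if exponents is None:
        exponents = [1] * len(channels)
    n_samples = len(readings[channels[0]])
    result = []
    for row in range(n_samples):
        total = sum(coef * readings[ch][row] ** exp
                    for ch, coef, exp in zip(channels, coefficients, exponents))
        # Assumption: the independent term is added once per sample
        result.append(total + extra_term)
    return result


# Made-up sample values, only to exercise the formula
readings = {'TEMP': [21.0, 22.5, 20.0], 'EXT_TEMP': [19.0, 20.0, 20.5]}
temp_poly = poly_sketch(readings, ['TEMP', 'EXT_TEMP'], [1, -1])
print(temp_poly)
```

Under these assumptions, coefficients of [1, -1] reduce the metric to TEMP - EXT_TEMP, the difference between the two temperature channels.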

In [None]:
test.devices['10751'].readings

In [None]:
traces = {1: {'devices': '10751',
              'channel': 'TEMP_POLY',
              'subplot': 1},
          2: {'devices': '10751',
              'channel': 'TEMP',
              'subplot': 1},          
         }

options = {
            'frequency': '1H'
}
test.ts_iplot(traces = traces, options = options);

## Reprocessing

After adding a new metric, you can process just that metric, as above, or the whole test with test.process().

If processing the whole test takes too long, test.reprocess() calculates only the metrics that are new.
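
The difference between processing everything and processing only what is new can be pictured as a filter on the metric names. This is a guess at the behaviour based on the help texts in this example, not the library's implementation: pending_metrics is a hypothetical helper, and it assumes a metric counts as already processed when a channel with its name exists in the readings.

```python
# Hypothetical helper; assumes a metric is "done" once a channel with the
# same name exists in the readings.
def pending_metrics(metrics, existing_channels, only_new=False):
    if not only_new:
        return list(metrics)           # process everything
    return [name for name in metrics if name not in existing_channels]


metrics = {'TEMP_POLY': {}, 'PM_1_CLEAN': {}}
# TEMP_POLY has already been computed and is present as a channel
channels = ['TEMP', 'EXT_TEMP', 'PM_1', 'TEMP_POLY']

print(pending_metrics(metrics, channels))                  # both metrics
print(pending_metrics(metrics, channels, only_new=True))   # only PM_1_CLEAN
```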

In [13]:
help(Test.reprocess)

Help on function reprocess in module scdata.test:

reprocess(self)
    Calculates only the new metrics in each of the devices
    Returns True if done OK



In [12]:
help(scdata.device.process.timeseries.clean_ts)

Help on function clean_ts in module scdata.device.process.timeseries:

clean_ts(dataframe, **kwargs)
    Cleans the time series measurements sensors, by filling the out of band values with NaN
    Parameters
    ----------
        name: string
            column to clean to apply.
        limits: list, optional 
            (0, 99999)
            Sensor limits. The function will fill with NaN in the values that exceed the band
        window_size: int, optional 
            3
            If not None, will smooth the time series by applying a rolling window of that size
        window_type: str, optional
            None
            Accepts arguments in the list of windows for scipy.signal windows:
            https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions
            Default to None implies normal rolling average
    Returns
    -------
        pandas series containing the clean



In [None]:
metric = {'PM_1_CLEAN': {'process': 'clean_ts',
                         'kwargs': {'name': 'PM_1', 'limits': [0, 1000], 'window_size': 3}
                        }}

test.devices['10751'].add_metric(metric)
test.reprocess()
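
To get a feel for what the limits and window_size arguments do, the two steps described in the clean_ts help (mask out-of-band values, then smooth) can be sketched on a plain list. This is not the library code: clean_sketch is a hypothetical name and the sample values are made up. It assumes the limits are inclusive, that the window is trailing, and that any window containing a NaN, or too few samples, yields NaN, which is how a pandas rolling mean behaves by default.

```python
import math


# Hypothetical stand-in for illustration; not scdata's clean_ts.
def clean_sketch(values, limits=(0, 99999), window_size=3):
    low, high = limits
    # Step 1: replace out-of-band values with NaN (limits assumed inclusive)
    masked = [v if low <= v <= high else math.nan for v in values]
    if window_size is None:
        return masked
    # Step 2: trailing rolling mean; incomplete windows and windows that
    # contain a NaN produce NaN
    smoothed = []
    for i in range(len(masked)):
        if i + 1 < window_size:
            smoothed.append(math.nan)
        else:
            window = masked[i + 1 - window_size:i + 1]
            smoothed.append(sum(window) / window_size)
    return smoothed


# Made-up PM_1 samples with one spike above the upper limit
pm_1 = [1.0, 2.0, 3.0, 5000.0, 4.0, 5.0, 6.0]
pm_1_clean = clean_sketch(pm_1, limits=[0, 1000], window_size=3)
print(pm_1_clean)
```

In this sketch the spike at 5000 is dropped, and the smoothing then leaves a short gap of NaN values around it, so a cleaned channel may have fewer valid points than the raw one.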

In [None]:
traces = {1: {'devices': '10751',
              'channel': 'PM_1',
              'subplot': 1},
          2: {'devices': '10751',
              'channel': 'PM_1_CLEAN',
              'subplot': 1},          
         }

options = {
            'frequency': '1H'
}
test.ts_iplot(traces = traces, options = options);