## **Aggregate time-series dataframe**

Performs a rolling aggregation on `df_artifact` over `window`, grouped by the selected `keys`, applying `metric_aggs` to `metrics` and `label_aggs` to `labels`, and adding `suffix` to the feature names.
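In plain pandas terms, the operation resembles calling `Series.rolling(...).agg(...)` on each selected column and joining the results back onto the frame. The sketch below is a hypothetical illustration, not the hub function's source. `rolling_aggregate` is an invented name, grouping by `keys` is left out, and the `<column>_<aggregation><suffix>` naming is an assumption modeled on output columns such as `Temperature_mean` that appear later in this notebook.

```python
import pandas as pd


def rolling_aggregate(df, metrics, labels, metric_aggs, label_aggs,
                      window, center=False, suffix=""):
    """Hypothetical sketch of a rolling aggregation (not the hub function's code).

    Each new column is named '<column>_<aggregation><suffix>'.
    Grouping by `keys` is omitted for brevity.
    """
    out = df.copy()
    for columns, aggs in ((metrics, metric_aggs), (labels, label_aggs)):
        for col in columns:
            # Rolling.agg with a list returns one column per aggregation, named
            # after the string ('mean') or the callable's __name__.
            rolled = df[col].rolling(window, center=center).agg(aggs)
            rolled.columns = [f"{col}_{name}{suffix}" for name in rolled.columns]
            out = out.join(rolled)
    # Drop rows containing NaN, i.e. the rows whose window is incomplete.
    return out.dropna()


# Usage on a toy frame: centered 3-row window, mean of 't', sum of 'occ'.
toy = pd.DataFrame({"t": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    "occ": [0, 1, 1, 0, 1, 1]})
print(rolling_aggregate(toy, ["t"], ["occ"], ["mean"], ["sum"],
                        window=3, center=True))
```

With `center=True`, the first and last rows of the toy frame have incomplete windows and are dropped, which mirrors how the aggregated output below starts at row 2 for a 5-row window.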
    
    

### **Steps**


1. [Data exploration](#Data-exploration)
2. [Importing the function](#Importing-the-function)
3. [Running the function locally](#Running-the-function-locally)
4. [Running the function remotely](#Running-the-function-remotely)

### **Data exploration**

This example uses the [Occupancy Detection Data Set](http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+) from the UCI Machine Learning Repository, as used in the article [How to Predict Room Occupancy Based on Environmental Factors](https://machinelearningmastery.com/how-to-predict-room-occupancy-based-on-environmental-factors/).


> **Attribute information:**
> - `date`: timestamp, formatted year-month-day hour:minute:second
> - `Temperature`: temperature, in Celsius
> - `Humidity`: relative humidity, in %
> - `Light`: light level, in lux
> - `CO2`: CO2 concentration, in ppm
> - `HumidityRatio`: derived from temperature and relative humidity, in kg water vapor/kg air
> - `Occupancy`: 0 for not occupied, 1 for occupied

```python
import mlrun

# Load the environment variables used by this example from an env file
env_path = "../examples_ci.env"
env_dict = mlrun.set_env_from_file(env_path, return_dict=True)
```

```python
import pandas as pd

data_path = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/aggregate/train_room_occupancy.csv'
# Index by timestamp while keeping `date` as a regular column too (drop=False)
df = pd.read_csv(data_path).set_index('date', drop=False)
df.head()
```

| date (index) | date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
|---|---|---|---|---|---|---|---|
| 2015-02-04 17:51:00 | 2015-02-04 17:51:00 | 23.18 | 27.272 | 426.0 | 721.25 | 0.004793 | 1 |
| 2015-02-04 17:51:59 | 2015-02-04 17:51:59 | 23.15 | 27.2675 | 429.5 | 714.0 | 0.004783 | 1 |
| 2015-02-04 17:53:00 | 2015-02-04 17:53:00 | 23.15 | 27.245 | 426.0 | 713.5 | 0.004779 | 1 |
| 2015-02-04 17:54:00 | 2015-02-04 17:54:00 | 23.15 | 27.2 | 426.0 | 708.25 | 0.004772 | 1 |
| 2015-02-04 17:55:00 | 2015-02-04 17:55:00 | 23.1 | 27.2 | 426.0 | 704.5 | 0.004757 | 1 |
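The readings are roughly, but not exactly, one minute apart. Assuming the integer `window` used in the runs below is passed through to pandas `rolling`, it counts rows rather than minutes, so a 5-row window spans about five minutes. The plain-pandas sketch below (independent of the hub function) uses the five rows shown above to compute the gaps between readings and a time-based one-minute window, which requires converting `date` to a `DatetimeIndex`.

```python
import pandas as pd

# The first five rows of the dataset, copied from the table above.
sample = pd.DataFrame({
    "date": ["2015-02-04 17:51:00", "2015-02-04 17:51:59", "2015-02-04 17:53:00",
             "2015-02-04 17:54:00", "2015-02-04 17:55:00"],
    "Temperature": [23.18, 23.15, 23.15, 23.15, 23.10],
})
# Offset-based windows need a DatetimeIndex, not string dates.
sample["date"] = pd.to_datetime(sample["date"])
sample = sample.set_index("date")

# Seconds elapsed between consecutive readings (first value is NaN).
gaps = sample.index.to_series().diff().dt.total_seconds()

# Time-based window: each row counts the readings in (t - 1 minute, t].
by_time = sample["Temperature"].rolling("1min").count()
print(gaps.tolist())
print(by_time.tolist())
```

The gaps come out as 59, 61, 60 and 60 seconds, and the one-minute window at 17:51:59 also picks up the 17:51:00 reading, which a fixed row count would treat the same as any other neighbor.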


### **Importing the function**

```python
import mlrun
mlrun.set_environment(project='function-marketplace')

fn = mlrun.import_function("hub://aggregate")
fn.apply(mlrun.auto_mount())
```

> 2021-10-18 07:14:27,886 [info] loaded project function-marketplace from MLRun DB

`<mlrun.runtimes.kubejob.KubejobRuntime at 0x7fbab0c083d0>`

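```python
import numpy as np

# Declaring a custom aggregation function.
# It returns the absolute distance between the fourth value in the window
# (index 3) and the window mean, so it needs a window of at least 4 rows.
# With window=5 and center=True, the centered row is index 2, not 3.
def dist_from_mean(values):
    mean = np.mean(values)
    return abs(list(values)[3] - mean)
```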
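To see what `dist_from_mean` computes, the sketch below applies it with plain pandas `rolling(...).apply` to the first six `Temperature` values shown in this notebook. It assumes the hub function applies custom aggregations to each window in a comparable way. The results at positions 2 and 3 (0.004 and 0.030) match the `Temperature_dist_from_mean` values for rows 2 and 3 in the local run's output.

```python
import numpy as np
import pandas as pd


def dist_from_mean(values):
    # Same logic as above: distance of the fourth window value from the mean.
    mean = np.mean(values)
    return abs(list(values)[3] - mean)


# Temperature at rows 0-5, copied from the tables in this notebook.
temperature = pd.Series([23.18, 23.15, 23.15, 23.15, 23.10, 23.10])

# raw=False passes each window to the function as a pandas Series;
# positions without a full centered 5-row window come back as NaN.
dist = temperature.rolling(5, center=True).apply(dist_from_mean, raw=False)
print(dist)
```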

### **Running the function locally**

```python
aggregate_run = fn.run(name='aggregate',
                       params = {'metrics': ['Temperature','Humidity'],
                                 'labels': ['Occupancy'],
                                 'metric_aggs': ['mean','std',dist_from_mean],
                                 'label_aggs': ['sum'],
                                 'window': 5,
                                 'center': True},
                       inputs={'df_artifact': data_path},
                       local=True)
```

> 2021-10-18 07:14:28,145 [info] starting run aggregate uid=fc35b3eea88a40af915e18c09d93cb66 DB=http://mlrun-api:8080
> 2021-10-18 07:14:28,266 [info] Aggregating https://s3.wasabisys.com/iguazio/data/function-marketplace-data/aggregate/train_room_occupancy.csv
> 2021-10-18 07:14:33,760 [info] Logging artifact


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| function-marketplace | ...9d93cb66 | 0 | Oct 18 07:14:28 | completed | aggregate | v3io_user=dani<br>kind=<br>owner=dani<br>host=jupyter-dani-6bfbd76d96-zxx6f | df_artifact | metrics=['Temperature', 'Humidity']<br>labels=['Occupancy']<br>metric_aggs=['mean', 'std', ]<br>label_aggs=['sum']<br>window=5<br>center=True | | aggregate |

> 2021-10-18 07:14:34,007 [info] run executed, status=completed


```python
aggregate_run.artifact('aggregate').as_df()
```

|  | date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy | Temperature_mean | Temperature_std | Temperature_dist_from_mean | Humidity_mean | Humidity_std | Humidity_dist_from_mean | Occupancy_sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2015-02-04 17:53:00 | 23.15 | 27.2450 | 426.0 | 713.500000 | 0.004779 | 1 | 23.146 | 2.880972e-02 | 0.004 | 27.2369 | 0.035204 | 0.0369 | 5.0 |
| 3 | 2015-02-04 17:54:00 | 23.15 | 27.2000 | 426.0 | 708.250000 | 0.004772 | 1 | 23.130 | 2.738613e-02 | 0.030 | 27.2225 | 0.031820 | 0.0225 | 5.0 |
| 4 | 2015-02-04 17:55:00 | 23.10 | 27.2000 | 426.0 | 704.500000 | 0.004757 | 1 | 23.120 | 2.738613e-02 | 0.020 | 27.2090 | 0.020125 | 0.0090 | 5.0 |
| 5 | 2015-02-04 17:55:59 | 23.10 | 27.2000 | 419.0 | 701.000000 | 0.004757 | 1 | 23.110 | 2.236068e-02 | 0.010 | 27.2000 | 0.000000 | 0.0000 | 5.0 |
| 6 | 2015-02-04 17:57:00 | 23.10 | 27.2000 | 419.0 | 701.666667 | 0.004757 | 1 | 23.100 | 1.870776e-08 | 0.000 | 27.2000 | 0.000000 | 0.0000 | 5.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8136 | 2015-02-10 09:27:00 | 21.00 | 35.8600 | 433.0 | 771.333333 | 0.005525 | 1 | 21.025 | 2.500000e-02 | 0.025 | 35.9315 | 0.158623 | 0.1185 | 5.0 |
| 8137 | 2015-02-10 09:28:00 | 21.05 | 36.0500 | 433.0 | 780.250000 | 0.005571 | 1 | 21.035 | 2.236068e-02 | 0.015 | 35.9905 | 0.091761 | 0.1070 | 5.0 |
| 8138 | 2015-02-10 09:29:00 | 21.05 | 36.0975 | 433.0 | 787.250000 | 0.005579 | 1 | 21.050 | 3.535534e-02 | 0.000 | 36.0195 | 0.098431 | 0.0245 | 5.0 |
| 8139 | 2015-02-10 09:29:59 | 21.05 | 35.9950 | 433.0 | 789.500000 | 0.005563 | 1 | 21.070 | 2.738613e-02 | 0.030 | 36.0995 | 0.098938 | 0.0045 | 5.0 |
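As a spot check of the `_mean` and `_std` columns, the NumPy sketch below recomputes them for row 2, whose centered 5-row window covers rows 0 to 4 (Temperature 23.18, 23.15, 23.15, 23.15, 23.10 in the first table). The reported `Temperature_std` of 2.880972e-02 matches the sample standard deviation (`ddof=1`, which is also the pandas default), not the population standard deviation.

```python
import numpy as np

# Temperature at rows 0-4: the centered 5-row window for row 2.
window = np.array([23.18, 23.15, 23.15, 23.15, 23.10])

mean = window.mean()                 # compare with Temperature_mean at row 2
sample_std = window.std(ddof=1)      # divides by n - 1, like pandas .std()
population_std = window.std(ddof=0)  # divides by n
print(mean, sample_std, population_std)
```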


### **Running the function remotely**

```python
aggregate_run = fn.run(name='aggregate',
                       params = {'metrics': ['Temperature','Humidity'],
                                 'labels': ['Occupancy'],
                                 'metric_aggs': ['mean','std'],
                                 'label_aggs': ['sum'],
                                 'window': 5,
                                 'center': True},
                       inputs={'df_artifact': data_path},
                       local=False)
```
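Unlike the local run, the remote run passes only string aggregations. A likely reason, stated here as an assumption rather than documented MLRun behavior, is that parameters for a run on a remote pod must be serialized (for example to JSON), and a Python function object such as `dist_from_mean` cannot be. The standard `json` module shows the difference; `is_json_serializable` is a hypothetical helper written for this illustration.

```python
import json

import numpy as np


def dist_from_mean(values):
    mean = np.mean(values)
    return abs(list(values)[3] - mean)


def is_json_serializable(obj):
    """Hypothetical helper: True if json.dumps accepts obj."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False


# Aggregation lists as passed in the local and remote runs.
local_params = {"metric_aggs": ["mean", "std", dist_from_mean]}
remote_params = {"metric_aggs": ["mean", "std"]}
print(is_json_serializable(local_params), is_json_serializable(remote_params))
```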


> 2021-10-18 07:14:34,103 [info] starting run aggregate uid=ec73626f9acd4a75af535e1ec9127e36 DB=http://mlrun-api:8080
> 2021-10-18 07:14:34,239 [info] Job is running in the background, pod: aggregate-vzd78
> 2021-10-18 07:14:39,540 [info] Aggregating https://s3.wasabisys.com/iguazio/data/function-marketplace-data/aggregate/train_room_occupancy.csv
> 2021-10-18 07:14:41,662 [info] Logging artifact
> 2021-10-18 07:14:41,845 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| function-marketplace | ...c9127e36 | 0 | Oct 18 07:14:39 | completed | aggregate | v3io_user=dani<br>kind=job<br>owner=dani<br>host=aggregate-vzd78 | df_artifact | metrics=['Temperature', 'Humidity']<br>labels=['Occupancy']<br>metric_aggs=['mean', 'std']<br>label_aggs=['sum']<br>window=5<br>center=True | | aggregate |

> 2021-10-18 07:14:43,446 [info] run executed, status=completed


[Back to the top](#Aggregate-time-series-dataframe)