## Running a Gordo workflow locally

This notebook demonstrates the basic Gordo workflow, run locally.

---

### Import and initialize a Gordo dataset
In this case we use the `DataLakeProvider`; an `InfluxDataProvider` is also available.

In [None]:
import dateutil.parser
import yaml

from datetime import datetime

from gordo_dataset.datasets import TimeSeriesDataset
from gordo_dataset.data_provider.providers import DataLakeProvider
from gordo import serializer

# Read tag data from the Data Lake (an Azure login is required, see below)
data_provider = DataLakeProvider(storename="dataplatformdlsprod", interactive=True)

# Training window and the tags to load
dataset = TimeSeriesDataset(
    train_start_date=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    train_end_date=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM',
    ],
    data_provider=data_provider,
)
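
The tag names above contain `%2F`, which looks like the percent-encoded form of `/`. The sketch below assumes that is the case and shows how to move between the encoded and the readable form using only the standard library. Whether the data provider expects one form or the other is not stated here, so treat this purely as a reading aid.

```python
from urllib.parse import quote, unquote

# Tag names exactly as they appear in the cell above
tag_list = [
    "asgb.19ZT3950%2FY%2FPRIM",
    "asgb.19PST3925%2FDispMeasOut%2FPRIM",
]

# Assumption: %2F is a percent-encoded "/" separator
decoded = [unquote(tag) for tag in tag_list]

# safe="" forces "/" to be encoded again, restoring the original strings
reencoded = [quote(tag, safe="") for tag in decoded]

for encoded_tag, readable_tag in zip(tag_list, decoded):
    print(f"{encoded_tag} -> {readable_tag}")
```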

### We'll need to log in to Azure to authenticate before data can be loaded from the Data Lake

In [None]:
X, y = dataset.get_data()

In [3]:
X.head()

| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2016-07-01 07:40:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 07:50:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:00:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:10:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:20:00+00:00 | 99.989201 | 46.329 |


### Define a pipeline for model building

In [4]:
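# yaml.load without a Loader is deprecated (and an error in PyYAML 6);
# safe_load is sufficient for this plain mapping
config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.MinMaxScaler
          - gordo.machine.model.models.KerasAutoEncoder:
              kind: feedforward_hourglass
    """
)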
pipe = serializer.from_definition(config)
pipe

Pipeline(memory=None,
     steps=[('step_0', MinMaxScaler(copy=True, feature_range=(0, 1))), ('step_1', <gordo.machine.model.models.KerasAutoEncoder object at 0x7f64bd6de7f0>)])
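
To make the idea behind a definition like the one above more concrete, here is a minimal sketch of how a list of steps given as dotted paths (optionally with keyword arguments) could be turned into objects. This is not Gordo's `serializer.from_definition`; the `resolve` helper is hypothetical, and standard-library classes stand in for the scikit-learn and Gordo classes so the snippet runs without extra dependencies.

```python
import importlib


def resolve(step):
    """Hypothetical helper: build an object from a dotted path.

    `step` is either "package.module.Class" or a one-item mapping
    {"package.module.Class": {keyword arguments}}.
    """
    if isinstance(step, str):
        path, kwargs = step, {}
    else:
        ((path, kwargs),) = step.items()
        kwargs = kwargs or {}
    module_name, _, attr = path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(**kwargs)


# Same shape as the `steps` list in the YAML, with stand-in classes
definition = {
    "steps": [
        "collections.OrderedDict",
        {"fractions.Fraction": {"numerator": 3, "denominator": 4}},
    ]
}

steps = [resolve(step) for step in definition["steps"]]
print(steps)
```

The point of the sketch is only the shape of the mapping: a bare string means "construct with defaults", while a one-item mapping carries the constructor arguments, as with `kind: feedforward_hourglass` above.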

### AutoEncoders were agreed to meet the specifications of a `Transformer`. Therefore, they do not implement a `predict` method.

We can then call `fit_transform`, or `fit` followed by `transform` if the datasets should be treated separately.

In [5]:
pipe.fit(X, y=X.copy())  # Our target is just X
xhat = pipe.predict(X)

Epoch 1/1


---
### `xhat` is now the auto-encoded result

The first half of each resulting sample is the _input_ to the model and the second half is the _output_.

In [6]:
xhat

array([[0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       ...,
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32450209, 0.26762422, 0.32392147, 0.27555713]])
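
As a rough illustration of the layout described above, the sketch below splits each sample into its input half and its output half and computes the absolute difference per feature. It uses plain lists and two rows copied from the output shown, and it assumes the first-half/second-half layout holds. With a NumPy array the equivalent slices would be `xhat[:, :n_features]` and `xhat[:, n_features:]`.

```python
# Two rows copied from the output above (4 values per row for 2 tags)
xhat_rows = [
    [0.32019111, 0.29786357, 0.32511118, 0.30381536],
    [0.32450209, 0.26762422, 0.32392147, 0.27555713],
]

# Assumption: each row is [inputs..., outputs...] of equal length
n_features = len(xhat_rows[0]) // 2


def split_row(row, n):
    """Return (input half, output half) of one sample."""
    return row[:n], row[n:]


abs_errors = []
for row in xhat_rows:
    model_input, model_output = split_row(row, n_features)
    # Per-feature absolute difference between input and reconstruction
    abs_errors.append([abs(o - i) for i, o in zip(model_input, model_output)])

for errors in abs_errors:
    print([round(e, 8) for e in errors])
```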

## Using custom or multiple aggregation methods
`TimeSeriesDataset` lets you choose the aggregation method applied to each resampled bucket, and it can apply several aggregation methods at once.

### Custom aggregation method

In [6]:
# Run the first cell beforehand so the required imports are available
dataset = TimeSeriesDataset(
    train_start_date=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    train_end_date=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM',
    ],
    aggregation_methods="max",  # aggregate each bucket by its maximum
    data_provider=data_provider,
)
X, y = dataset.get_data()
X.head()

| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2016-07-01 07:40:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 46.330772 |


### Multiple aggregation methods
When multiple aggregation methods are used, the returned dataframe has multi-level columns, with the tag name as the top level and the aggregation method as the second level.

In [7]:
# Run the first cell beforehand so the required imports are available
dataset = TimeSeriesDataset(
    train_start_date=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    train_end_date=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM',
    ],
    aggregation_methods=["max", "min", "mean"],  # one column per tag and method
    data_provider=data_provider,
)
X, y = dataset.get_data()
X.head()

| tag | asgb.19ZT3950%2FY%2FPRIM | asgb.19ZT3950%2FY%2FPRIM | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|---|---|---|---|
| aggregation_method | max | min | mean | max | min | mean |
| 2016-07-01 07:40:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
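
To illustrate what aggregating resampled buckets means, the following standard-library sketch groups raw samples into fixed-width time buckets and applies one or several aggregation functions to each bucket. It is not how `TimeSeriesDataset` is implemented. The `bucket_start` and `aggregate` helpers and the sample values are hypothetical, and the 10-minute bucket width is an assumption inferred from the spacing of the timestamps in the outputs above.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return ts - (ts - EPOCH) % width


def aggregate(samples, width, methods):
    """Group (timestamp, value) pairs into buckets and aggregate each one.

    `methods` maps a method name to a function over a list of values.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_start(ts, width)].append(value)
    return {
        start: {name: func(values) for name, func in methods.items()}
        for start, values in sorted(buckets.items())
    }


# Made-up raw samples for a single tag
start = datetime(2016, 7, 1, 7, 40, tzinfo=timezone.utc)
samples = [
    (start + timedelta(minutes=1), 99.9),
    (start + timedelta(minutes=4), 100.1),
    (start + timedelta(minutes=12), 100.0),
]
width = timedelta(minutes=10)  # assumed bucket width

single = aggregate(samples, width, {"max": max})
multi = aggregate(samples, width, {"max": max, "min": min, "mean": mean})

for bucket, values in multi.items():
    print(bucket.isoformat(), values)
```

With a single method each bucket yields one value per tag, which corresponds to the flat columns in the `"max"` example. With several methods each bucket yields one value per method, which is why the dataframe above needs a second column level.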
