## Running a Gordo workflow locally

This notebook demonstrates the basic Gordo workflow, run locally.

---

### Import and initialize a Gordo dataset
This example uses the `DataLakeProvider`; an `InfluxDataProvider` is also available.

In [None]:
import dateutil.parser
import yaml

from datetime import datetime

from gordo_components.dataset.datasets import TimeSeriesDataset
from gordo_components.data_provider.providers import DataLakeProvider
from gordo_components import serializer

data_provider = DataLakeProvider(storename="dataplatformdlsprod", interactive=True)
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    data_provider=data_provider)

### Loading the data requires logging in to Azure to authenticate against the Data Lake

In [None]:
X, y = dataset.get_data()

In [3]:
X.head()

| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2016-07-01 07:40:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 07:50:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:00:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:10:00+00:00 | 99.989201 | 46.329 |
| 2016-07-01 08:20:00+00:00 | 99.989201 | 46.329 |


### Define a pipeline for model building

In [4]:
config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.data.MinMaxScaler
          - gordo_components.model.models.KerasAutoEncoder:
              kind: feedforward_hourglass
    """
)
pipe = serializer.pipeline_from_definition(config)
pipe

Pipeline(memory=None,
         steps=[('step_0', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('step_1',
                 <gordo_components.model.models.KerasAutoEncoder object at 0x7f6ef5a09be0>)],
         verbose=False)
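
To see what the serializer receives, the definition can be parsed on its own. The sketch below uses only PyYAML; the variable names `definition` and `pipeline_spec` are illustrative, and the reading of a mapping's values as keyword arguments for the step is an assumption based on the pipeline printed above, not a description of gordo's internals.

```python
import yaml

# Same definition as above, kept as a plain string.
definition = """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.data.MinMaxScaler
          - gordo_components.model.models.KerasAutoEncoder:
              kind: feedforward_hourglass
    """

config = yaml.safe_load(definition)

# The top-level key is the import path of the pipeline class.
pipeline_spec = config["sklearn.pipeline.Pipeline"]

for step in pipeline_spec["steps"]:
    if isinstance(step, str):
        # A bare string names a class and carries no parameters.
        print(step, "-> no parameters")
    else:
        # A single-key mapping names a class and carries its parameters
        # (assumed here to be keyword arguments for that class).
        (path, params), = step.items()
        print(path, "->", params)
```

Each step is therefore either an import path or an import path with a parameter mapping, which matches the two steps in the printed `Pipeline`.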

### Fit the pipeline and run the data through the auto-encoder

`fit` trains every step of the pipeline, and `predict` then returns the auto-encoder's result for `X`. The two calls are kept separate so that the pipeline can be fitted on one dataset and applied to another.

In [5]:
pipe.fit(X)
xhat = pipe.predict(X)


Epoch 1/1


---
### `xhat` is now the auto-encoded result

The first half of each resulting sample is the _input_ to the model and the second half is the _output_.

In [6]:
xhat

array([[0.30955735, 0.2324789 ],
       [0.30955735, 0.2324789 ],
       [0.30955735, 0.2324789 ],
       ...,
       [0.05328919, 0.8471484 ],
       [0.05328919, 0.8471484 ],
       [0.05328919, 0.8471484 ]], dtype=float32)
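
Fitting on one dataset and applying the fitted pipeline to another can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not gordo's implementation: `LinearAutoEncoder` is a hypothetical stand-in for `KerasAutoEncoder` (it compresses with PCA and reconstructs), and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class LinearAutoEncoder(BaseEstimator):
    """Hypothetical stand-in for KerasAutoEncoder: encode with PCA, then decode."""

    def __init__(self, n_components=1):
        self.n_components = n_components

    def fit(self, X, y=None):
        self.pca_ = PCA(n_components=self.n_components).fit(X)
        return self

    def predict(self, X):
        # Project onto the low-dimensional space and map back to the input space.
        return self.pca_.inverse_transform(self.pca_.transform(X))


# Synthetic two-sensor data: the second column is a linear function of the first.
t = np.arange(10, dtype=float)
X_train = np.column_stack([t, 2 * t + 1])
X_new = X_train[2:6]

pipe = Pipeline([("scale", MinMaxScaler()), ("model", LinearAutoEncoder())])
pipe.fit(X_train)           # scaling and encoder are learned from X_train only
xhat = pipe.predict(X_new)  # X_new is scaled with the training min/max, then reconstructed

print(xhat.shape)
```

Because the scaler is fitted only once, new data is scaled with the ranges learned during training rather than with its own minimum and maximum.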

### Using different aggregation methods
By default, `TimeSeriesDataset` resamples the data using the `mean`. This can be customized with the `aggregation_methods` argument, which accepts either a single method or a list of methods. The following example uses `max`:

In [7]:
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    aggregation_methods="max",
    data_provider=data_provider)


In [8]:
X, y = dataset.get_data()

In [9]:
X.head()


| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2016-07-01 07:40:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 46.330772 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 46.330772 |


We can also resample with several aggregation methods at once. Each tag then yields one column per method, suffixed with the method name:

In [10]:
dataset = TimeSeriesDataset(from_ts=dateutil.parser.isoparse('2016-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2017-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    aggregation_methods=["max","min","mean"],
    data_provider=data_provider)


In [11]:
X, y = dataset.get_data()

In [12]:
X.head()


| | asgb.19ZT3950%2FY%2FPRIM_max | asgb.19ZT3950%2FY%2FPRIM_min | asgb.19ZT3950%2FY%2FPRIM_mean | asgb.19PST3925%2FDispMeasOut%2FPRIM_max | asgb.19PST3925%2FDispMeasOut%2FPRIM_min | asgb.19PST3925%2FDispMeasOut%2FPRIM_mean |
|---|---|---|---|---|---|---|
| 2016-07-01 07:40:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 07:50:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:00:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:10:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
| 2016-07-01 08:20:00+00:00 | 100.032417 | 99.945984 | 99.989201 | 46.330772 | 46.327229 | 46.329 |
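
The effect of the aggregation methods can be approximated with plain pandas. The sketch below is an illustration only: it assumes the dataset's resampling behaves like `DataFrame.resample(...).agg(...)`, uses a made-up tag `tag_a` with synthetic one-minute readings, and flattens the column names by hand to mimic the `<tag>_<method>` names shown above.

```python
import pandas as pd

# Hypothetical raw signal: one reading per minute for 20 minutes.
index = pd.date_range("2016-07-01 00:00", periods=20, freq="min", tz="UTC")
raw = pd.DataFrame({"tag_a": range(20)}, index=index)

# Single aggregation method: one column per tag.
resampled_max = raw.resample("10min").max()

# Several aggregation methods: pandas returns (tag, method) column pairs,
# which are joined here into "<tag>_<method>" names.
resampled_multi = raw.resample("10min").agg(["max", "min", "mean"])
resampled_multi.columns = ["_".join(col) for col in resampled_multi.columns]

print(resampled_max)
print(resampled_multi)
```

With a single method the frame keeps the original tag names, while a list of methods widens it to one column per tag and method.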
