## Running a Gordo workflow locally

This notebook demonstrates the basic Gordo workflow, run locally.

---

### Import and initialize a Gordo dataset
In this case we use a `TimeSeriesDataset` that reads from a `DataLakeProvider`; an Influx-backed alternative is also available.

In [None]:
import dateutil.parser
import yaml

from datetime import datetime

from gordo_components.dataset.datasets import TimeSeriesDataset
from gordo_components.data_provider.providers import DataLakeProvider
from gordo_components import serializer


# Parameters used by both DataLakeProvider and TimeSeriesDataset
kwargs = dict(
    from_ts=dateutil.parser.isoparse('2014-07-01T00:10:00+00:00'),
    to_ts=dateutil.parser.isoparse('2015-01-01T00:00:00+00:00'),
    tag_list=[
        'asgb.19ZT3950%2FY%2FPRIM',
        'asgb.19PST3925%2FDispMeasOut%2FPRIM'
    ],
    data_provider=DataLakeProvider(storename="dataplatformdlsprod", interactive=True),
)
dataset = TimeSeriesDataset(**kwargs)
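
The tag names above contain `%2F`, which is the percent-encoding of `/`. As a quick sanity check before requesting data, the sketch below decodes the tag names and measures the requested time window using only the standard library. It uses `datetime.fromisoformat` in place of `dateutil.parser.isoparse` and assumes the tags are percent-encoded; the names `decoded_tags` and `window` are illustrative and not part of the Gordo API.

```python
from datetime import datetime
from urllib.parse import unquote

# Same window and tags as in the cell above.
from_ts = datetime.fromisoformat("2014-07-01T00:10:00+00:00")
to_ts = datetime.fromisoformat("2015-01-01T00:00:00+00:00")
tag_list = [
    "asgb.19ZT3950%2FY%2FPRIM",
    "asgb.19PST3925%2FDispMeasOut%2FPRIM",
]

# Assumption: "%2F" is a percent-encoded "/" in the tag names.
decoded_tags = [unquote(tag) for tag in tag_list]

# Length of the requested window as a timedelta.
window = to_ts - from_ts

print(decoded_tags)
print(window)
```

Both timestamps carry an explicit UTC offset, so the subtraction is well defined; mixing offset-aware and naive datetimes would raise a `TypeError`.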

### We'll need to log in to Azure to authenticate before loading data from the Data Lake

In [None]:
X, y = dataset.get_data()

In [3]:
X.head()

| | asgb.19ZT3950%2FY%2FPRIM | asgb.19PST3925%2FDispMeasOut%2FPRIM |
|---|---|---|
| 2014-07-01 07:50:00+00:00 | 98.516167 | 85.29229 |
| 2014-07-01 08:00:00+00:00 | 98.516167 | 85.29229 |
| 2014-07-01 08:10:00+00:00 | 98.516167 | 85.29229 |
| 2014-07-01 08:20:00+00:00 | 98.516167 | 85.29229 |
| 2014-07-01 08:30:00+00:00 | 98.516167 | 85.29229 |


### Define a pipeline for model building

In [4]:
config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.data.MinMaxScaler
          - gordo_components.model.models.KerasAutoEncoder:
              kind: feedforward_hourglass
    """
)
pipe = serializer.pipeline_from_definition(config)
pipe

Pipeline(memory=None,
     steps=[('step_0', MinMaxScaler(copy=True, feature_range=(0, 1))), ('step_1', <gordo_components.model.models.KerasAutoEncoder object at 0x7f64bd6de7f0>)])
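
To make the idea of a definition-driven pipeline more concrete, here is a minimal sketch of how dotted class paths and their keyword arguments could be turned into named steps. It is not Gordo's implementation of `pipeline_from_definition`: the helpers `locate`, `build`, and `build_steps` are hypothetical, nested definitions (such as a `Pipeline` containing `steps`) are not handled, and standard-library classes stand in for the scaler and the model.

```python
import importlib


def locate(dotted_path):
    """Import and return the object named by a dotted path such as 'fractions.Fraction'."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def build(definition):
    """Instantiate a step given as 'path' or as {'path': {keyword arguments}}."""
    if isinstance(definition, str):
        return locate(definition)()
    # A mapping is expected to hold exactly one entry: path -> keyword arguments.
    (path, params), = definition.items()
    return locate(path)(**(params or {}))


def build_steps(definitions):
    """Name steps by position, mirroring the 'step_0', 'step_1' names in the output above."""
    return [(f"step_{i}", build(d)) for i, d in enumerate(definitions)]


# Standard-library classes stand in for the scaler and the model (assumption for illustration).
steps = build_steps([
    "collections.OrderedDict",
    {"fractions.Fraction": {"numerator": 3, "denominator": 4}},
])
print(steps)
```

The two accepted shapes match the YAML above: a bare string for a step with default arguments, and a single-key mapping for a step that takes parameters such as `kind`.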

### Fit the pipeline and generate the auto-encoded output

The cell below calls `fit` and then `predict`. Keeping the two calls separate makes it possible to fit on one dataset and predict on another.

In [5]:
pipe.fit(X)
xhat = pipe.predict(X)

Epoch 1/1
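
Fitting and applying are separate steps, which is what allows parameters learned on one dataset to be reused on another. The sketch below illustrates this with a hand-written min-max scaler in NumPy. The class `MinMax` and the sample values are made up for illustration; this is not scikit-learn's `MinMaxScaler` and not part of Gordo.

```python
import numpy as np


class MinMax:
    """Toy scaler: learns per-column min and range in fit, applies them in transform."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        # Avoid division by zero for constant columns.
        self.span_ = np.where(span == 0, 1.0, span)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.min_) / self.span_


# Illustrative values only, not taken from the Data Lake.
train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
new = np.array([[2.5, 40.0]])

scaler = MinMax().fit(train)          # parameters come from the training data only
scaled_train = scaler.transform(train)
scaled_new = scaler.transform(new)    # new data reuses the training parameters

print(scaled_train)
print(scaled_new)
```

Values in `new` that fall outside the training range map outside `[0, 1]`, which is expected when the scaler is fitted on a different dataset than the one it is applied to.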


---
### `xhat` is now the auto-encoded result

The first half of each resulting sample is the _input_ to the model and the second half is the _output_.

In [6]:
xhat

array([[0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       [0.32019111, 0.29786357, 0.32511118, 0.30381536],
       ...,
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32429197, 0.26762422, 0.32392147, 0.27555713],
       [0.32450209, 0.26762422, 0.32392147, 0.27555713]])
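
Given the layout described above (input in the first half of each row, output in the second), the two halves can be separated and compared. The following is a minimal sketch using three rows copied from the output above; the names `model_input`, `model_output`, and `error_per_sample` are illustrative, and mean absolute error is just one possible choice of reconstruction error.

```python
import numpy as np

# Three rows copied from the `xhat` output above.
xhat = np.array([
    [0.32019111, 0.29786357, 0.32511118, 0.30381536],
    [0.32429197, 0.26762422, 0.32392147, 0.27555713],
    [0.32450209, 0.26762422, 0.32392147, 0.27555713],
])

# Each row holds n_features inputs followed by n_features outputs.
n_features = xhat.shape[1] // 2
model_input = xhat[:, :n_features]
model_output = xhat[:, n_features:]

# Mean absolute difference between output and input, one value per sample.
error_per_sample = np.abs(model_output - model_input).mean(axis=1)

print(error_per_sample)
```

A larger per-sample error means the model reproduced that row less faithfully, which is the usual signal of interest when an autoencoder is used to flag unusual samples.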