## Run Gordo from only a config file

This is a higher-level example of how Gordo works: it trains a model starting from nothing but a config file.

---
**Note:** This differs slightly from how Gordo _actually_ works. Normally we parse the config with some resources from `gordo-infrastructure`, and we don't have access to those here (yet).

---

```python
import tempfile
import yaml
from pprint import pprint
from dateutil.parser import isoparse

import sys
# Add the parent directory to the import path so the local `gordo_components` package can be found
sys.path.append("..")

from gordo_components import serializer
from gordo_components.builder import build_model
from gordo_components.data_provider.providers import DataLakeProvider
```

Using TensorFlow backend.


## Define a config file

```python
config = """
machines:

  - name: statfjord-lstm-010613-010614
    dataset:
      tags:  # list of tags
        - DQ-PT-T-B30L%2FMeas1%2FPRIM
        - DQ-TT-T-B30L%2FMeas1%2FPRIM
        - FT-24684%2FMeas%2FPRIM
        - PT-13005%2FMeasA%2FPRIM
        - PT-13009%2FMeasA%2FPRIM
        - TT-13092%2FMeas1%2FPRIM
      train_start_date: 2013-06-01T00:10:00+00:00
      train_end_date: 2014-06-01T00:00:00+00:00
    metadata:
      metadata_name: statfjord_test

model:
  sklearn.pipeline.Pipeline:
    steps:
      - sklearn.preprocessing.data.StandardScaler
      - gordo_components.model.models.KerasLSTMAutoEncoder:
          kind: lstm_autoencoder
          lookback_window: 144
          encoding_dim: [4,3]
          decoding_dim: [3,4]
          out_func: linear
"""
```
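The `model` section describes the pipeline through the dotted import paths of its classes, with each class's keyword arguments nested underneath. Gordo turns this definition into objects itself; the sketch below is only a hypothetical, stdlib-only illustration of the idea. It assumes that a bare string is a class to instantiate with default arguments, that a single-key mapping is a class plus keyword arguments, and that a `steps` argument holds a list of nested definitions. Since `sklearn.pipeline.Pipeline` expects `(name, estimator)` pairs, the sketch pairs each step with a generated name such as `step_0`; that naming scheme is an assumption, not Gordo's.

```python
import importlib
from typing import Any


def locate(dotted_path: str) -> Any:
    """Import and return the attribute named by a dotted path, e.g. 'fractions.Fraction'."""
    module_name, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def from_definition(definition: Any) -> Any:
    """Hypothetical resolver: build an object from a Gordo-style model definition."""
    # A bare string names a class that is instantiated with its default arguments
    if isinstance(definition, str):
        return locate(definition)()

    # A single-key mapping is {dotted.path.Class: {keyword: value, ...}}
    if isinstance(definition, dict) and len(definition) == 1:
        ((path, params),) = definition.items()
        kwargs = dict(params or {})
        # Assumption: `steps` holds nested definitions, paired with generated names
        if "steps" in kwargs:
            kwargs["steps"] = [
                (f"step_{i}", from_definition(step))
                for i, step in enumerate(kwargs["steps"])
            ]
        return locate(path)(**kwargs)

    raise ValueError(f"Unsupported definition: {definition!r}")


# Usage with stdlib classes standing in for scikit-learn / Keras ones
fraction = from_definition({"fractions.Fraction": {"numerator": 3, "denominator": 4}})  # Fraction(3, 4)
```

Note that `yaml.BaseLoader` leaves every scalar as a string (for example, `lookback_window` arrives as `"144"`), so a real resolver would also need to convert argument types; the sketch leaves that out.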

## Simulate how Gordo extracts the required information from a config file

##### Note:
This is _not_ exactly how it's done in practice: we normally use some resources from `gordo-infrastructure` that are not available from `gordo-components`.

```python
# Load into a plain dict. BaseLoader keeps every scalar as a string,
# which is why the dates are parsed explicitly with isoparse below
config = yaml.load(config, Loader=yaml.BaseLoader)

# Model configuration
model_config = config['model']

# In this case, we only build a model for a single machine
machine_config = config['machines'][0]

# TODO: This is the ugliest part, as we normally use resources (`Machine`) found in `gordo-infrastructure`
data_config = {
    "type": "TimeSeriesDataset",  # Name of the dataset class used for data acquisition
    "from_ts": isoparse(machine_config['dataset']['train_start_date']),
    "to_ts": isoparse(machine_config['dataset']['train_end_date']),
    "tag_list": machine_config['dataset']['tags'],
    # interactive=True: authenticate interactively against the Data Lake
    "data_provider": DataLakeProvider(storename="dataplatformdlsprod", interactive=True),
}
```
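Leaving the Data Lake provider aside, the extraction step above boils down to pulling a few fields out of nested dictionaries and parsing the training window. The stdlib-only sketch below assumes the dict shape that `yaml.BaseLoader` produces for this config (all scalars as strings). `extract_data_window` is a hypothetical helper, and `datetime.fromisoformat` stands in for `dateutil.parser.isoparse`; it accepts timestamps of the form used here on Python 3.7+.

```python
from datetime import datetime
from typing import Any, Dict


def extract_data_window(config: Dict[str, Any], machine_index: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: collect the dataset settings for one machine."""
    machine = config["machines"][machine_index]
    dataset = machine["dataset"]

    # BaseLoader leaves timestamps as strings, so parse them explicitly
    start = datetime.fromisoformat(dataset["train_start_date"])
    end = datetime.fromisoformat(dataset["train_end_date"])
    if start >= end:
        raise ValueError(f"train_start_date {start} must be before train_end_date {end}")

    return {
        "from_ts": start,
        "to_ts": end,
        "tag_list": list(dataset["tags"]),
        "metadata": dict(machine.get("metadata", {})),
    }


# Usage: a trimmed-down version of the parsed config above
parsed = {
    "machines": [
        {
            "name": "statfjord-lstm-010613-010614",
            "dataset": {
                "tags": ["FT-24684%2FMeas%2FPRIM"],
                "train_start_date": "2013-06-01T00:10:00+00:00",
                "train_end_date": "2014-06-01T00:00:00+00:00",
            },
            "metadata": {"metadata_name": "statfjord_test"},
        }
    ]
}
window = extract_data_window(parsed)
```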

### Build model from data and model configs

`build_model` also optionally takes a metadata `dict`, and returns it updated with information about various model-building events.
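The metadata hand-off can be pictured as a function that accepts an optional dict and returns an updated copy alongside the trained model. This is a hypothetical sketch of that calling convention only: `build_with_metadata`, the `fit` callable, and the `events` and `build_duration_sec` keys are invented for illustration and are not Gordo's actual metadata schema.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def build_with_metadata(
    fit: Callable[[], Any],
    metadata: Optional[Dict[str, Any]] = None,
) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical: run `fit` and return the model plus an updated copy of `metadata`."""
    # Copy so the caller's dict is left untouched (a design choice of this sketch)
    updated = dict(metadata or {})
    events = list(updated.get("events", []))

    events.append("build started")
    started = time.perf_counter()
    model = fit()
    events.append("build finished")

    updated["events"] = events
    updated["build_duration_sec"] = time.perf_counter() - started
    return model, updated


# Usage: a stand-in "fit" that just returns a placeholder model
model, meta = build_with_metadata(lambda: "trained-model", {"metadata_name": "statfjord_test"})
```

Returning a copy rather than mutating the argument keeps the machine's original config metadata reusable across builds.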

```python
pipe, metadata = build_model(
    model_config=model_config,
    data_config=data_config,
    metadata=machine_config['metadata'],
)
```

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code DLWHFAPBF to authenticate.


ValueError: Found no data providers able to download the tag DQ-PT-T-B30L%2FMeas1%2FPRIM
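This `ValueError` means that none of the configured data providers claimed the tag, so the build stopped before any data was loaded. Below is a hypothetical sketch of that kind of dispatch: each provider exposes a predicate saying which tags it can serve, and the first match wins. `TagProvider`, `assign_providers`, and the prefix rule in the usage example are assumptions for illustration, not Gordo's provider API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TagProvider:
    """Hypothetical provider: a name plus a predicate deciding which tags it serves."""
    name: str
    can_handle_tag: Callable[[str], bool]


def assign_providers(tags: List[str], providers: List[TagProvider]) -> Dict[str, str]:
    """Map each tag to the first provider that can handle it, or fail loudly."""
    assignment = {}
    for tag in tags:
        provider = next((p for p in providers if p.can_handle_tag(tag)), None)
        if provider is None:
            raise ValueError(f"Found no data providers able to download the tag {tag}")
        assignment[tag] = provider.name
    return assignment


# Usage: a single provider that (hypothetically) only serves "PT-" tags
providers = [TagProvider("pressure-store", lambda tag: tag.startswith("PT-"))]
assign_providers(["PT-13005%2FMeasA%2FPRIM"], providers)  # {'PT-13005%2FMeasA%2FPRIM': 'pressure-store'}
```

Failing on the first unclaimed tag, rather than silently dropping it, surfaces configuration problems before a long training run starts.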

---

### The trained model/pipeline

```python
pipe
```

### Metadata from the model and build process

```python
print(yaml.dump(metadata))
```