# Gordo workflow

This is a high-level example of how gordo works.


Establish temporary warning filters:

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # Tensorflow INFO and WARNING messages are not printed

# Temporarily suppress warnings
import warnings
warnings.filterwarnings("ignore", module='sklearn.base', category=UserWarning) # sklearn/base.py:446: UserWarning: X does not have valid feature names, but MinMaxScaler was fitted with feature names
warnings.filterwarnings("ignore", module='gordo.machine.model.anomaly.diff', category=FutureWarning) # gordo/machine/model/anomaly/diff.py:236: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead
from gordo_core.time_series import NotEnoughDataWarning
warnings.filterwarnings("ignore", category=NotEnoughDataWarning)
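
If the filters should apply only to one block of code rather than to the whole session, the standard library's `warnings.catch_warnings` context manager restores the previous filter state on exit. The following is a minimal sketch of that pattern; `noisy_step` is a hypothetical stand-in for any call that emits a warning and is not part of gordo.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a UserWarning.
    warnings.warn("X does not have valid feature names", UserWarning)
    return 42


# Snapshot of the active filters, used to show that they are restored afterwards.
filters_before = list(warnings.filters)

with warnings.catch_warnings():
    # This filter is active only inside the with-block.
    warnings.filterwarnings("ignore", category=UserWarning)
    result = noisy_step()

# On exit the filter list is back to what it was before the block.
filters_restored = list(warnings.filters) == filters_before
```

Scoping the filters this way keeps unrelated warnings visible in the rest of the notebook.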

Train a model from a config file

In [2]:
import tempfile
import json

from pprint import pprint
from dateutil.parser import isoparse

from gordo import serializer
from gordo.builder import local_build

Define a config as a YAML string:

In [3]:
config = \
"""
machines:
  - dataset: |
      tags:
        - SOME-TAG1
        - SOME-TAG2
      target_tag_list:
        - SOME-TAG3
        - SOME-TAG4
      train_end_date: '2019-03-01T00:00:00+00:00'
      train_start_date: '2019-01-01T00:00:00+00:00'
      data_provider:
        type: RandomDataProvider
    metadata: |
      information: Some sweet information about the model
    model: |
      gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector:
        base_estimator:
          sklearn.pipeline.Pipeline:
            steps:
            - sklearn.decomposition.PCA
            - sklearn.multioutput.MultiOutputRegressor:
                estimator: sklearn.linear_model.LinearRegression
    name: crazy-sweet-name
"""

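A quick sanity check on the training window can catch swapped or timezone-naive dates before a build is started. The sketch below uses only the standard library and copies the two timestamps from the config above; it is an illustration and not something gordo requires.

```python
from datetime import datetime

# Values copied from the dataset section of the config above.
train_start_date = datetime.fromisoformat("2019-01-01T00:00:00+00:00")
train_end_date = datetime.fromisoformat("2019-03-01T00:00:00+00:00")

# Both timestamps carry an explicit UTC offset, so they are timezone-aware.
is_tz_aware = train_start_date.tzinfo is not None and train_end_date.tzinfo is not None

# The start must come before the end for the window to make sense.
is_ordered = train_start_date < train_end_date

# Length of the training window in whole days.
training_days = (train_end_date - train_start_date).days
```
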
Build model from data and model configs.

`gordo.builder.local_build` is a generator that builds each machine in your string config, much the same way an actual deployment on k8s works.

Here we simply call `next` on it to get the first element, which is a tuple of your model and its metadata.

In [4]:
pipe, metadata = next(local_build(config))
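
Because `local_build` is described as a generator, `next` builds only the first machine, whereas `list` would build every machine in the config. The sketch below shows the difference using a hypothetical `fake_build` generator in place of `local_build`, so nothing is actually trained; the names `fake_build`, `built`, and the machine names are all invented for illustration.

```python
# Records which machines have actually been "built" so far.
built = []


def fake_build(names):
    """Hypothetical stand-in for local_build: lazily yields (model, metadata) pairs."""
    for name in names:
        built.append(name)
        yield f"model-for-{name}", {"name": name}


gen = fake_build(["machine-a", "machine-b"])

# next() runs the generator only up to its first yield.
model, meta = next(gen)
built_after_next = list(built)

# Consuming the rest of the generator builds the remaining machines.
remaining = list(gen)
```

For a config with a single machine, as in this notebook, both approaches do the same amount of work.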

The trained model/pipeline:

In [5]:
pipe

Metadata from the model and build process:

In [10]:
metadata.to_yaml()

RepresenterError: ('cannot represent an object', Timestamp('2019-03-01 00:00:00+0000', tz='UTC'))
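
The error above says that the YAML dumper has no representation for a `Timestamp` value. One possible workaround, sketched here under assumptions, is to convert date-like values to ISO 8601 strings while serializing. The example uses `json` from the standard library and a hypothetical `metadata_dict`; how to obtain a plain dictionary from the real `metadata` object is not shown in this notebook and is left as an assumption. A standard `datetime` stands in for the pandas `Timestamp`, which is a `datetime` subclass.

```python
import json
from datetime import datetime, timezone

# Hypothetical metadata fragment; the datetime stands in for the pandas Timestamp.
metadata_dict = {
    "name": "crazy-sweet-name",
    "dataset": {"train_end_date": datetime(2019, 3, 1, tzinfo=timezone.utc)},
}


def to_serializable(value):
    """Fallback used by json.dumps for values it cannot encode itself."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


dumped = json.dumps(metadata_dict, default=to_serializable, indent=2)
```

The same idea, converting timestamps to strings before dumping, should carry over to YAML output, though that is not verified here.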