In [1]:
# Logging setup
import logging

logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(level=logging.ERROR)
logging.getLogger('orion').setLevel(level=logging.INFO)

import warnings
warnings.simplefilter("ignore")

# Orion Tutorial

In the following steps we will learn how to set Orion up, run pipelines to detect anomalies
on our timeseries and then explore the results.

Overall, the steps that we will perform are:

1. Add _Datasets_, _Signals_, _Templates_, _Pipelines_ and _Experiments_ to our Database.
2. Create and start _Dataruns_, which create _Signalruns_ and _Events_.
3. Explore the _Signalrun_ results and the detected _Events_.
4. Add _Annotations_ to the existing _Events_ as well as new manual _Events_.

## Creating an instance of the OrionDBExplorer

In order to connect to the database, all you need to do is import and create an instance of the
`OrionDBExplorer` class.

Note that, because of the dynamic, schema-less nature of MongoDB, no database initialization
or table creation is needed. All you need to do to start using a new database is create the
`OrionDBExplorer` instance with the right connection details and start using it!

In order to create the `OrionDBExplorer` instance you will need to pass:

* `user`: An identifier of the user that is running Orion.
* `database`: The name of the MongoDB database to use. This is optional and defaults to `orion`.

In [2]:
from orion.db import OrionDBExplorer

orex = OrionDBExplorer(user='my_username', database='orion-usage-example')

This will directly create a connection to the database named `'orion-usage-example'` at the
default MongoDB host, `localhost`, and port, `27017`.

If you want to connect to a different database, host or port, or if user authentication
is enabled in your MongoDB instance, you can pass a dictionary or a path to a JSON file containing
any required additional arguments:

* `host`: Hostname or IP address of the MongoDB Instance. Defaults to `'localhost'`.
* `port`: Port to which MongoDB is listening. Defaults to `27017`.
* `username`: username to authenticate with.
* `password`: password to authenticate with.
* `authentication_source`: database to authenticate against.
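
As a rough sketch of the JSON option, the connection arguments listed above could be written to
a file as shown here. The file name `mongodb_config.json` and all the values are placeholders,
and the `mongodb_config` keyword mentioned in the last comment is an assumption about the
`OrionDBExplorer` signature that should be checked against your installed version.

```python
import json
import os
import tempfile

# Placeholder connection details; replace them with your own.
mongodb_config = {
    'host': 'localhost',
    'port': 27017,
    'username': 'my_mongo_user',
    'password': 'my_mongo_password',
    'authentication_source': 'admin',
}

# Write the arguments to a JSON file (hypothetical file name).
config_path = os.path.join(tempfile.mkdtemp(), 'mongodb_config.json')
with open(config_path, 'w') as config_file:
    json.dump(mongodb_config, config_file, indent=4)

# Read the file back to check that it contains what we expect.
with open(config_path) as config_file:
    loaded_config = json.load(config_file)

print(loaded_config == mongodb_config)

# Assumption: the dict or the path would then be passed along these lines:
# OrionDBExplorer(user='my_username', mongodb_config=config_path)
```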

Once we have created the `OrionDBExplorer` instance, and to be sure that we are ready to follow
the tutorial, let's do the following two set-up steps:

1. Drop the `orion-usage-example` database

**WARNING**: This will remove all the data that exists in this database!

In [3]:
orex.drop_database()

2. Make sure to have downloaded some demo data using the `orion.data.download_demo()` function

In [4]:
from orion.data import download_demo

download_demo()

INFO:orion.data:Downloading Orion Demo Data to folder orion-data


This will create a folder called `orion-data` in your current directory with the 3 CSV files
that we will use later on.

## Setting up the Orion Environment

The first thing that you will need to do to start using **Orion** with a Database will be
to add information about your data and your pipelines.

This can be done by using the methods of the `OrionDBExplorer` class that are documented below,
which allow creating the corresponding objects in the Database.

### Add a Dataset

In order to add a dataset you can use the `add_dataset` method, which has the following arguments:

* `name (str)`: Name of the dataset
* `entity (str)`: Name or Id of the entity which this dataset is associated to

Let's create the `Demo Dataset` that we will use for our demo.

In [5]:
dataset = orex.add_dataset(
    name='Demo Dataset',
    entity='Orion',
)

This call will try to create a new _Dataset_ object in the database and return it.

We can now see the _Dataset_ that we just created using the `get_datasets` method:

In [6]:
orex.get_datasets()

Unnamed: 0,dataset_id,created_by,entity,insert_time,name
0,602c91be219a13a319ca725c,my_username,Orion,2021-02-17 03:47:10.234,Demo Dataset


### Add a Signal

The next step is to add Signals. This can be done with the `add_signal` method, which expects:

* `name (str)`: Name of the signal
* `dataset (Dataset or ObjectID)`: Dataset Object or Dataset Id.
* `start_time (int)`: (Optional) minimum timestamp to be used for this signal. If not given, it
  defaults to the minimum timestamp found in the data.
* `stop_time (int)`: (Optional) maximum timestamp to be used for this signal. If not given, it
  defaults to the maximum timestamp found in the data.
* `data_location (str)`: URI of the dataset
* `timestamp_column (int)`: (Optional) index of the timestamp column. Defaults to 0.
* `value_column (int)`: (Optional) index of the value column. Defaults to 1.

For example, adding the `S-1` signal to the Demo Dataset that we just created could be done like
this:

In [7]:
orex.add_signal(
    name='S-1',
    dataset=dataset,
    data_location='orion-data/S-1.csv'
)

<Signal: Signal object>

Additionally, we can also add all the signals that exist inside a folder by using the `add_signals`
method, passing a `signals_path`:

In [8]:
orex.add_signals(
    dataset=dataset,
    signals_path='orion-data'
)

After this is done, we can see that one signal has been created for each one of the CSV
files that we downloaded before.

In [9]:
orex.get_signals(dataset=dataset)

Unnamed: 0,signal_id,created_by,data_location,dataset,insert_time,name,start_time,stop_time
0,602c91be219a13a319ca725d,my_username,orion-data/S-1.csv,602c91be219a13a319ca725c,2021-02-17 03:47:10.627,S-1,1222819200,1442016000
1,602c91be219a13a319ca725f,my_username,orion-data/P-1.csv,602c91be219a13a319ca725c,2021-02-17 03:47:10.920,P-1,1222819200,1468540800
2,602c91be219a13a319ca7260,my_username,orion-data/E-1.csv,602c91be219a13a319ca725c,2021-02-17 03:47:10.931,E-1,1222819200,1468951200
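
The `start_time` and `stop_time` columns above are described as defaulting to the minimum and
maximum timestamps found in the data. A minimal sketch of that idea follows, assuming a CSV
with a header row, the timestamp in column 0 and the value in column 1. The helper
`get_time_range` and the sample rows are made up for illustration and are not part of Orion.

```python
import csv
import io

# Tiny made-up CSV in the expected layout: timestamp first, value second.
SAMPLE_CSV = """timestamp,value
1222819200,-0.366
1222840800,-0.394
1222862400,0.403
"""


def get_time_range(csv_file, timestamp_column=0):
    """Return the (min, max) timestamps found in the given CSV file object."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    timestamps = [int(row[timestamp_column]) for row in reader if row]
    return min(timestamps), max(timestamps)


start_time, stop_time = get_time_range(io.StringIO(SAMPLE_CSV))
print(start_time, stop_time)
```

Passing explicit `start_time` and `stop_time` values to `add_signal` would instead restrict the
signal to a sub-range of the file.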


### Add a Template

The next thing we need to add is a _Template_ to the Database using the `add_template` method.

This method expects:

* `name (str)`: Name of the template.
* `template (dict or str)`: Optional. Specification of the template to use, which can be one of:
    * An MLPipeline instance
    * The name of a registered template
    * A dict containing the MLPipeline details
    * The path to a pipeline JSON file.
    
**Orion** comes with a few templates ready to be used, so let's have a look at the ones that exist
using the `orion.analysis.get_available_templates` function.

In [10]:
from orion.analysis import get_available_templates

get_available_templates()

['azure', 'arima', 'tadgan', 'lstm_dynamic_threshold', 'dummy']

And now let's create a _Template_ using the `lstm_dynamic_threshold` template.

In [11]:
template = orex.add_template(
    name='lstmdt',
    template='lstm_dynamic_threshold',
)

Using TensorFlow backend.


We can now see the _Template_ that we just created:

In [12]:
orex.get_templates()

Unnamed: 0,template_id,created_by,insert_time,name
0,6026e925b402c68ae42b1105,my_username,2021-02-12 20:46:29.187,lstmdt


Also, during this step, apart from a _Template_ object, a _Pipeline_ object has also been
registered with the same name as the _Template_ and using the default hyperparameter values.

In [13]:
orex.get_pipelines()

Unnamed: 0,pipeline_id,created_by,insert_time,name,template
0,6026e925b402c68ae42b1106,my_username,2021-02-12 20:46:29.416,lstmdt,6026e925b402c68ae42b1105


However, if we want to use a configuration different from the default, we might want to
create another _Pipeline_ with custom hyperparameter values.

In order to do this we will need to call the `add_pipeline` method passing:

* `name (str)`: Name given to this pipeline
* `template (Template or ObjectID)`: Template or the corresponding id.
* `hyperparameters (dict or str)`: dict containing the hyperparameter details or path to the
  corresponding JSON file. Optional.

For example, if we want to specify a different number of epochs for the LSTM primitive of the
pipeline that we just created we will run:

In [14]:
new_hyperparameters = {
   'keras.Sequential.LSTMTimeSeriesRegressor#1': {
       'epochs': 1,
       'verbose': True
   }
}
pipeline = orex.add_pipeline(
   name='lstmdt_1_epoch',
   template=template,
   hyperparameters=new_hyperparameters,
)

And we can see how a new _Pipeline_ was created in the Database.

In [15]:
orex.get_pipelines()

Unnamed: 0,pipeline_id,created_by,insert_time,name,template
0,6026e925b402c68ae42b1106,my_username,2021-02-12 20:46:29.416,lstmdt,6026e925b402c68ae42b1105
1,6026e925b402c68ae42b1107,my_username,2021-02-12 20:46:29.713,lstmdt_1_epoch,6026e925b402c68ae42b1105
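
Since `hyperparameters` can also be given as a path to a JSON file, the same values could be
kept on disk instead of in the notebook. The sketch below only shows the round trip through
JSON. The file name `lstmdt_1_epoch.json` is a placeholder chosen for this example.

```python
import json
import os
import tempfile

new_hyperparameters = {
    'keras.Sequential.LSTMTimeSeriesRegressor#1': {
        'epochs': 1,
        'verbose': True
    }
}

# Hypothetical file name; any writable path would do.
hyperparameters_path = os.path.join(tempfile.mkdtemp(), 'lstmdt_1_epoch.json')
with open(hyperparameters_path, 'w') as json_file:
    json.dump(new_hyperparameters, json_file, indent=4)

with open(hyperparameters_path) as json_file:
    restored = json.load(json_file)

# The integer and boolean values survive the round trip unchanged.
print(restored == new_hyperparameters)
```

The resulting path could then be passed as `hyperparameters=hyperparameters_path` when calling
`add_pipeline`.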


### Add an Experiment

Once we have a _Dataset_ with _Signals_ and a _Template_, we are ready to add an
_Experiment_.

In order to add an _Experiment_ we will need to:

1. Get the _Dataset_ and the list of _Signals_ that we want to run the _Experiment_ on.
2. Get the _Template_ which we want to use for the _Experiment_
3. Call the `add_experiment` method passing all these together with an experiment name and a
   project name.

For example, if we want to create an experiment using the _Dataset_, the _Signals_ and the
_Template_ that we just created, we will use:

In [16]:
experiment = orex.add_experiment(
    name='My Experiment',
    project='My Project',
    template=template,
    dataset=dataset,
)

This will create an _Experiment_ object in the database using the indicated _Template_
and all the _Signals_ from the given _Dataset_.

In [17]:
orex.get_experiments()

Unnamed: 0,experiment_id,created_by,dataset,insert_time,name,project,signals,template
0,6026e925b402c68ae42b1108,my_username,6026e920b402c68ae42b1100,2021-02-12 20:46:29.747,My Experiment,My Project,"[6026e921b402c68ae42b1101, 6026e921b402c68ae42...",6026e925b402c68ae42b1105


## Starting a Datarun

Once we have created our _Experiment_ object we are ready to start executing _Pipelines_ on our
_Signals_.

For this we will need to use the `orion.runner.start_datarun` function, which expects:

* `orex (OrionDBExplorer)`: The `OrionDBExplorer` instance.
* `experiment (Experiment or ObjectID)`: Experiment object or the corresponding ID.
* `pipeline (Pipeline or ObjectID)`: Pipeline object or the corresponding ID.

This will create a _Datarun_ object for this _Experiment_ and _Pipeline_ in the database,
and then it will start creating and executing _Signalruns_, one for each _Signal_ in the _Experiment_.

Let's trigger a _Datarun_ using the `lstmdt_1_epoch` _Pipeline_ that we created.

In [18]:
from orion.runner import start_datarun

start_datarun(orex, experiment, pipeline)

INFO:orion.runner:Datarun 6026e926b402c68ae42b1109 started
INFO:orion.runner:Signalrun 6026e926b402c68ae42b110a started
INFO:orion.runner:Running pipeline lstmdt_1_epoch on signal S-1


Train on 7919 samples, validate on 1980 samples
Epoch 1/1


INFO:orion.runner:Processing pipeline lstmdt_1_epoch predictions on signal S-1
INFO:orion.runner:Signalrun 6026e9a4b402c68ae42b110d started
INFO:orion.runner:Running pipeline lstmdt_1_epoch on signal P-1


Train on 8901 samples, validate on 2226 samples
Epoch 1/1


INFO:orion.runner:Processing pipeline lstmdt_1_epoch predictions on signal P-1
INFO:orion.runner:Signalrun 6026ea39b402c68ae42b110e started
INFO:orion.runner:Running pipeline lstmdt_1_epoch on signal E-1


Train on 8916 samples, validate on 2230 samples
Epoch 1/1


INFO:orion.runner:Processing pipeline lstmdt_1_epoch predictions on signal E-1


## Explore the results

Once a _Datarun_ has finished, we can see its status by using the `orex.get_dataruns` method.

In [19]:
orex.get_dataruns()

Unnamed: 0,datarun_id,end_time,experiment,insert_time,num_events,pipeline,start_time,status
0,6026e926b402c68ae42b1109,2021-02-12 20:54:15.818,6026e925b402c68ae42b1108,2021-02-12 20:46:29.978,3,6026e925b402c68ae42b1107,2021-02-12 20:46:30.082,SUCCESS


We can also see the _Signalruns_ and _Events_ that were created.

In [20]:
datarun = orex.get_datarun(experiment=experiment)
signalruns = orex.get_signalruns(datarun=datarun)
signalruns

Unnamed: 0,signalrun_id,datarun,end_time,insert_time,num_events,signal,start_time,status
0,6026e926b402c68ae42b110a,6026e926b402c68ae42b1109,2021-02-12 20:48:36.182,2021-02-12 20:46:30.091,2,6026e921b402c68ae42b1101,2021-02-12 20:46:30.187,SUCCESS
1,6026e9a4b402c68ae42b110d,6026e926b402c68ae42b1109,2021-02-12 20:51:05.253,2021-02-12 20:48:36.183,0,6026e921b402c68ae42b1103,2021-02-12 20:48:36.184,SUCCESS
2,6026ea39b402c68ae42b110e,6026e926b402c68ae42b1109,2021-02-12 20:54:15.817,2021-02-12 20:51:05.256,1,6026e921b402c68ae42b1104,2021-02-12 20:51:05.258,SUCCESS
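
In these outputs, the `num_events` value of the Datarun matches the sum of the `num_events`
values of its Signalruns (2 + 0 + 1 = 3). The short check below uses values copied by hand from
the outputs above. It illustrates the relationship seen in this run and is not a statement
about how Orion computes the field.

```python
# num_events per Signalrun, copied from the `get_signalruns` output.
signalrun_events = {
    '6026e926b402c68ae42b110a': 2,
    '6026e9a4b402c68ae42b110d': 0,
    '6026ea39b402c68ae42b110e': 1,
}

# num_events reported for the Datarun in the `get_dataruns` output.
datarun_events = 3

total_events = sum(signalrun_events.values())
print(total_events == datarun_events)

# Signalruns that are worth querying for events.
with_events = [
    signalrun_id
    for signalrun_id, num_events in signalrun_events.items()
    if num_events > 0
]
print(with_events)
```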


In [21]:
signalrun_id = signalruns['signalrun_id'].iloc[-1]
orex.get_events(signalrun=signalrun_id)

Unnamed: 0,event_id,insert_time,num_annotations,severity,signal,signalrun,source,start_time,stop_time
0,6026eaf7b402c68ae42b110f,2021-02-12 20:54:15.815,0,0.080653,6026e921b402c68ae42b1104,6026ea39b402c68ae42b110e,ORION,1392746400,1395057600
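
The `start_time` and `stop_time` of an event are plain integers. Assuming they are Unix
timestamps in seconds, which is an assumption based on the magnitude of the values, they can be
turned into readable dates with the standard library:

```python
from datetime import datetime, timezone

# Event boundaries copied from the output above.
start_time = 1392746400
stop_time = 1395057600

# Assumption: the values are Unix timestamps expressed in seconds.
start = datetime.fromtimestamp(start_time, tz=timezone.utc)
stop = datetime.fromtimestamp(stop_time, tz=timezone.utc)

print(start.isoformat())  # 2014-02-18T18:00:00+00:00
print(stop.isoformat())   # 2014-03-17T12:00:00+00:00
print(stop - start)       # 26 days, 18:00:00
```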


## Add Manual Events and Annotations

If we want to add new events manually, we can do so by calling the `add_event` method and
passing:

* `start_time (int)`: The timestamp at which the event starts
* `stop_time (int)`: The timestamp at which the event ends
* `source (str)`: If manual, the string `MANUALLY_CREATED`.
* `signal (Signal or ObjectID or str)`: The Signal, or its id, to which the Event is associated.

In [22]:
signal = orex.get_signal(name='P-1')
event = orex.add_event(
    start_time=1393758300,
    stop_time=1408270800,
    source='MANUALLY_CREATED',
    signal=signal
)
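
Nothing in this tutorial says whether `add_event` validates the interval it receives, so it may
be worth checking a manual event before storing it. The helper below is hypothetical and not
part of Orion. It only verifies that the event is well ordered and falls inside the time range
of its signal, using the `P-1` range from the `get_signals` output.

```python
def check_event_interval(start_time, stop_time, signal_start, signal_stop):
    """Hypothetical helper: validate an event interval against its signal."""
    if start_time >= stop_time:
        raise ValueError('start_time must be earlier than stop_time')
    if start_time < signal_start or stop_time > signal_stop:
        raise ValueError('event falls outside of the signal time range')
    return True


# Event taken from the cell above; signal range taken from the P-1 row.
is_valid = check_event_interval(
    start_time=1393758300,
    stop_time=1408270800,
    signal_start=1222819200,
    signal_stop=1468540800,
)
print(is_valid)
```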

We can also, optionally, add annotations to any of the events.

In [23]:
orex.add_annotation(
    event=event,
    tag='maneuver',
    comment='satellite was maneuvering during this period'
)

<Annotation: Annotation object>

In [24]:
unknown_event = orex.get_events().event_id.iloc[0]
orex.add_annotation(
    event=unknown_event,
    tag='unknown',
    comment='this needs to be investigated'
)

<Annotation: Annotation object>

We can then see the annotations that we just created:

In [25]:
orex.get_annotations()

Unnamed: 0,annotation_id,comment,event,insert_time,tag
0,6026eaf8b402c68ae42b1111,satellite was maneuvering during this period,6026eaf7b402c68ae42b1110,2021-02-12 20:54:15.970,maneuver
1,6026eaf8b402c68ae42b1112,this needs to be investigated,6026e9a4b402c68ae42b110b,2021-02-12 20:54:16.131,unknown
