# Using Orion on Multivariate Input

In this notebook, we demonstrate how to use multivariate time series in Orion. We walk through the process using NASA's dataset; the original data is available in the [Telemanom](https://github.com/khundman/telemanom) GitHub repository or directly from their [S3 bucket](https://s3-us-west-2.amazonaws.com/telemanom/data.zip).

## 1. Load the data

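In the first step, we set up the environment and load the CSV that we want to process.

Orion provides the `orion.data.load_signal` function, which is called with the path to a CSV file.
The cell below instead reads the file directly with `pandas.read_csv` and builds a `timestamp` column from the row index, one second per row, counted from a reference time.

In this case, we load a local `TimeSeries.csv` file rather than the `S-1.csv` file from inside the `data/multivariate` folder.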

In [21]:
import pandas as pd
reference_time = pd.Timestamp('2024-01-01 00:00:00')
data = pd.read_csv("/Users/coloner/IdeaProjects/ml/Orion/data/TimeSeries.csv")

data['timestamp'] = pd.to_datetime(data.index, unit='s', origin=reference_time)


print(data.head())
print(type(data))

     v1    v2     v3   v4   v5   v6     v7     v8     v9    v10    v11  \
0 -2.00  1.51  10.14  0.0  0.0  0.0 -15.78 -22.31 -11.70 -13.57  92.95   
1 -2.00  1.51  10.13  0.0  0.0  0.0 -16.86 -23.38 -10.31 -13.57  92.95   
2 -2.00  1.51  10.13  0.0  0.0  0.0 -16.86 -23.38 -10.31 -13.57  92.95   
3 -1.99  1.51  10.17  0.0  0.0  0.0 -16.86 -23.38 -10.31 -13.57  92.95   
4 -1.99  1.51  10.17  0.0  0.0  0.0 -16.86 -23.38 -10.31 -13.57  92.95   

            timestamp  
0 2024-01-01 00:00:00  
1 2024-01-01 00:00:01  
2 2024-01-01 00:00:02  
3 2024-01-01 00:00:03  
4 2024-01-01 00:00:04  
<class 'pandas.core.frame.DataFrame'>

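The index-to-timestamp step above can be tried on a small frame without the CSV. This is a minimal sketch: the `toy` frame and its two channels are hypothetical stand-ins, and only the `pd.to_datetime(..., unit='s', origin=...)` call mirrors the cell above.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three rows, two channels.
toy = pd.DataFrame({"v1": [-2.00, -2.00, -1.99], "v2": [1.51, 1.51, 1.51]})

reference_time = pd.Timestamp("2024-01-01 00:00:00")

# Row i is stamped with reference_time + i seconds.
toy["timestamp"] = pd.to_datetime(toy.index, unit="s", origin=reference_time)

print(toy)
```

Because the timestamps are derived from the row index, this assumes the rows are already in time order and evenly spaced at one second.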

In [23]:
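# Convert datetimes to Unix epoch seconds.
# astype('int64') yields nanoseconds for a datetime64[ns] column, so divide by 10**9.
data['timestamp'] = data['timestamp'].astype('int64') // 10**9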

In [27]:
data['timestamp'] = data['timestamp'].astype('int64')
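As a cross-check on the conversion to integer seconds, the same result can be obtained without relying on the column's internal nanosecond representation. The sketch below is one possible alternative, not the notebook's method; the `stamps` series is a hypothetical example.

```python
import pandas as pd

# Hypothetical example values: the reference time and four seconds later.
stamps = pd.Series(pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:00:04"]))

# Subtract the Unix epoch and floor-divide by one second.
# This yields whole seconds whatever resolution the datetime column uses.
epoch_seconds = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

print(epoch_seconds.tolist())  # [1704067200, 1704067204]
```

The first value, 1704067200, is the reference time `2024-01-01 00:00:00` expressed in epoch seconds, which is why the anomaly and label timestamps in this notebook all start with `1704`.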

In [33]:
data.shape

(509632, 12)

## 2. Detect anomalies using Orion

Once we have the data, let us use the AER pipeline to analyze it and search for anomalies.

To do so, we import the `Orion` class from `orion`, create an instance with the name of
the pipeline that we want to use, and fit it on the loaded data.

In this case, we use the `aer` pipeline.

In addition, we set up the hyperparameters to correctly identify the signal we are trying to predict. In this case, dimension `0` is the signal value, so we set `target_column` to `0`. Note that `0` refers to the position of the channel rather than its name.
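Since `target_column` is a position, it can help to look the position up from a channel name instead of hard-coding it. The sketch below assumes that channels are counted among the non-timestamp columns in their original order; that convention is an assumption here, not something this notebook confirms, and the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame with the same layout as `data`: channels first, timestamp last.
toy = pd.DataFrame(
    {"v1": [0.1], "v2": [0.2], "v3": [0.3], "timestamp": [1704067200]}
)

# Assumption: channel positions are counted over the value columns only.
value_columns = [column for column in toy.columns if column != "timestamp"]

# Position of the channel we want to predict.
target_column = value_columns.index("v1")

print(target_column)  # 0
```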

In [29]:
from orion import Orion

hyperparameters = {
    "mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1": {
        'interval': 1
    },
    "mlstars.custom.timeseries_preprocessing.rolling_window_sequences#1": {
        'target_column': 0
    },
    'orion.primitives.aer.AER#1': {
        'epochs': 5,
        'verbose': True
    }
}

orion = Orion(
    pipeline='aer',
    hyperparameters=hyperparameters
)

orion.fit(data)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


The output of `orion.detect` is a `pandas.DataFrame` containing a table with the detected anomalies.

In [30]:
orion.detect(data)



        start         end  severity
0  1704459904  1704460350  0.342368
1  1704460750  1704461240  0.787530
2  1704473451  1704474093  0.722064
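The `start` and `end` values are epoch seconds, which are hard to read at a glance. A minimal sketch of turning them back into datetimes and durations follows; the `anomalies` frame is rebuilt by hand from the table above, and the added column names are illustrative.

```python
import pandas as pd

# The detected anomalies from the table above, typed in by hand.
anomalies = pd.DataFrame(
    {
        "start": [1704459904, 1704460750, 1704473451],
        "end": [1704460350, 1704461240, 1704474093],
        "severity": [0.342368, 0.78753, 0.722064],
    }
)

# Interpret the integers as seconds since the Unix epoch.
anomalies["start_time"] = pd.to_datetime(anomalies["start"], unit="s")
anomalies["end_time"] = pd.to_datetime(anomalies["end"], unit="s")

# Length of each anomalous interval in seconds.
anomalies["duration_s"] = anomalies["end"] - anomalies["start"]

print(anomalies[["start_time", "end_time", "duration_s", "severity"]])
```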


In [32]:
orion.save("/Users/coloner/IdeaProjects/ml/Orion/data/model.pickle")

In [35]:
labels = pd.read_csv("/Users/coloner/IdeaProjects/ml/Orion/data/labelsTimeSeries.csv")

labels['timestamp'] = pd.to_datetime(labels.index, unit='s', origin=reference_time)

          start         end
0    1704077902  1704077902
1    1704078624  1704078624
2    1704079085  1704079085
3    1704079520  1704079520
4    1704079903  1704079903
..          ...         ...
438  1704575911  1704575911
439  1704576012  1704576012
440  1704576250  1704576250
441  1704576580  1704576580
442  1704576695  1704576695

[443 rows x 2 columns]


In [68]:
%%time
current_start = None
result = []
for i in range(len(labels)):
    row = labels.iloc[i]
    if current_start is not None and row['label'] == 0:
        # A run of 1-labels ended on the previous row.
        result.append({"start": current_start, "end": labels.iloc[i - 1]["timestamp"]})
        current_start = None
    elif current_start is None and row['label'] == 1:
        # A new run of 1-labels starts on this row.
        current_start = row['timestamp']
# Close a run that extends to the last row.
if current_start is not None:
    result.append({"start": current_start, "end": labels.iloc[-1]["timestamp"]})

df_range = pd.DataFrame(result)
print(df_range)

          start         end
0    1704077902  1704077902
1    1704078624  1704078624
2    1704079085  1704079085
3    1704079520  1704079520
4    1704079903  1704079903
..          ...         ...
438  1704575911  1704575911
439  1704576012  1704576012
440  1704576250  1704576250
441  1704576580  1704576580
442  1704576695  1704576695

[443 rows x 2 columns]
CPU times: user 6.61 s, sys: 18.3 ms, total: 6.62 s
Wall time: 6.63 s


In [64]:
%%time
# Filter rows where label is 1; copy so the new column is not written to a view of `labels`
df_filtered = labels[labels['label'] == 1].copy()

# Identify consecutive groups: a new group starts wherever the gap to the previous timestamp is not 1 second
df_filtered['group'] = (df_filtered['timestamp'].diff() != 1).cumsum()

# Aggregate start and end of each group
df_range2 = df_filtered.groupby('group').agg(start=('timestamp', 'first'), end=('timestamp', 'last')).reset_index(drop=True)

print(df_range2)

          start         end
0    1704077902  1704077902
1    1704078624  1704078624
2    1704079085  1704079085
3    1704079520  1704079520
4    1704079903  1704079903
..          ...         ...
438  1704575911  1704575911
439  1704576012  1704576012
440  1704576250  1704576250
441  1704576580  1704576580
442  1704576695  1704576695

[443 rows x 2 columns]
CPU times: user 8.99 ms, sys: 3.12 ms, total: 12.1 ms
Wall time: 9.54 ms
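In the labels used here every anomalous point stands alone, so each range has `start == end` and the grouping is hard to see. The sketch below applies the same `diff`/`cumsum` idea to a small hypothetical frame that does contain multi-row runs.

```python
import pandas as pd

# Hypothetical labels: one row per second, with runs of 1s of different lengths.
toy_labels = pd.DataFrame(
    {
        "timestamp": range(100, 110),
        "label": [0, 1, 1, 1, 0, 0, 1, 0, 1, 1],
    }
)

# Keep only the anomalous rows.
anomalous = toy_labels[toy_labels["label"] == 1].copy()

# A gap other than 1 second to the previous anomalous row starts a new group.
# The first row has a NaN diff, which also compares unequal to 1.
anomalous["group"] = (anomalous["timestamp"].diff() != 1).cumsum()

# Collapse each group to its first and last timestamp.
ranges = (
    anomalous.groupby("group")
    .agg(start=("timestamp", "first"), end=("timestamp", "last"))
    .reset_index(drop=True)
)

print(ranges)
```

The three runs collapse to the ranges 101 to 103, 106 to 106, and 108 to 109. This only works when timestamps are integers spaced exactly one unit apart, which holds in this notebook after the conversion to epoch seconds.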


In [69]:
df_filtered['timestamp'].diff() != 1 

10702     True
11424     True
11885     True
12320     True
12703     True
          ... 
508711    True
508812    True
509050    True
509380    True
509495    True
Name: timestamp, Length: 443, dtype: bool

In [36]:
# Convert datetimes to Unix epoch seconds.
# astype('int64') yields nanoseconds for a datetime64[ns] column, so divide by 10**9.
labels['timestamp'] = labels['timestamp'].astype('int64') // 10**9