# Using Orion for Custom Data

This notebook is a quick tutorial on using Orion with custom data.

Before you start, switch to a GPU runtime for faster computation: from the top menu, select `Runtime -> Change runtime type -> T4 GPU`.
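To confirm that the runtime actually exposes a GPU, a quick check along these lines can help. This is a minimal sketch that only looks for the `nvidia-smi` tool, assuming the NVIDIA driver utilities are present as they typically are on Colab GPU runtimes; `gpu_available` is a hypothetical helper name and not part of Orion.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and reports at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        # The NVIDIA driver tools are missing, so this is most likely a CPU runtime
        return False
    result = subprocess.run(
        ["nvidia-smi", "-L"],  # -L lists the GPUs visible to the driver
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU runtime detected" if gpu_available() else "No GPU detected, training will be slower")
```

If no GPU is reported, the rest of the notebook should still run, only more slowly.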

## Step 0: Install Orion on Colab
Orion is available on PyPI (https://pypi.org/project/orion-ml) and can be installed directly with `pip`:

In [1]:
! pip install orion-ml
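After the install finishes, it can be useful to confirm which version ended up in the runtime. The sketch below uses only the standard library; `installed_version` is a hypothetical helper introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("orion-ml")
print(f"orion-ml version: {found}" if found else "orion-ml is not installed in this runtime")
```

Recording the version alongside your results makes it easier to reproduce a run later.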



In [2]:
import os
import orion
import mlstars
from mlblocks import add_pipelines_path, add_primitives_path

# Get the installation directory for the mlstars and orion packages
mlstars_path = os.path.dirname(mlstars.__file__)
orion_path = os.path.dirname(orion.__file__)

# Register the primitives and pipelines directories found above with mlblocks
add_primitives_path(os.path.join(mlstars_path, 'primitives'))
add_primitives_path(os.path.join(orion_path, 'primitives'))
add_pipelines_path(os.path.join(orion_path, 'pipelines'))

print("Paths added successfully! ✅")

Paths added successfully! ✅


In [3]:
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

## Step 1: Load your data from Google Drive

This step assumes that the data is already uploaded to Google Drive. If not, please pause here and upload your data to any desired folder in your Drive.

Next, mount your Drive so that Google Colab can access the files there.

In [4]:
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


After mounting your Drive, you should see a folder named `gdrive` in the "Files" tab on the left-hand side.

Specify the location of the file (`file_path`) and load it using `pandas` into a dataframe.

Note: for the sake of running this tutorial, the cell below loads an example CSV file from Drive; point `file_path` at your own file.
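If you are unsure of the exact path, listing the CSV files under the mounted folder can help. This is a small sketch assuming the Drive was mounted at `/content/gdrive` as above; `list_csv_files` is a hypothetical helper, and a recursive search over a large Drive may take a while.

```python
from pathlib import Path


def list_csv_files(root: str) -> list:
    """Return the paths of all .csv files found under `root`, sorted by name."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Drive is not mounted, or the folder name is wrong
        return []
    return sorted(str(p) for p in root_path.rglob("*.csv"))


# "My Drive" appears under this folder once the Drive is mounted at /content/gdrive
for path in list_csv_files("/content/gdrive/MyDrive"):
    print(path)
```

Narrowing `root` to the folder that holds your data keeps the search fast.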

In [5]:
import pandas as pd
import numpy as np

# Specify the path to your data file in Google Drive
file_path = '/content/gdrive/MyDrive/Colab Notebooks/Traffic analysis/Datasets/cs_inbound_AT01.csv'

try:
    data = pd.read_csv(file_path)
    print("Data loaded successfully:")
    print(data.head())
except FileNotFoundError:
    print(f"Error: The file was not found at {file_path}")
    print("Please make sure the file path is correct and the file exists in your Google Drive.")


Data loaded successfully:
             timestamp      value
0  2024-11-01 00:00:00  22.580645
1  2024-11-01 00:15:00  74.855305
2  2024-11-01 00:30:00   0.000000
3  2024-11-01 00:45:00   0.000000
4  2024-11-01 01:00:00  19.696970


If your data does not follow the Orion standard, format it so that it contains two columns:
- **timestamp**: an integer representation of time.
- **value**: the observed value of the time series at that specific time.
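A quick way to see whether a dataframe already meets these two requirements is sketched below. The checks are an assumption about what "following the standard" means here (column names plus dtypes), and `check_orion_format` is a hypothetical helper, not an Orion function.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype


def check_orion_format(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame looks fine."""
    problems = []
    for column in ("timestamp", "value"):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    if "timestamp" in df.columns and not is_integer_dtype(df["timestamp"]):
        problems.append("timestamp is not an integer column")
    if "value" in df.columns and not is_numeric_dtype(df["value"]):
        problems.append("value is not a numeric column")
    return problems


# A frame with date strings, like the raw file loaded above
example = pd.DataFrame({"timestamp": ["2024-11-01 00:00:00"], "value": [22.580645]})
print(check_orion_format(example))
```

An empty list suggests that no reformatting is needed.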

Format the data if necessary:

In [6]:
import pandas as pd
import numpy as np

# convert the timestamp column into integer Unix timestamps (seconds)

timestamps = pd.to_datetime(data['timestamp'])
data['timestamp'] = timestamps.values.astype(np.int64) // 10 ** 9

# rename the columns to `timestamp` and `value` (already the case for this file)

data = data.rename(columns={"timestamp": "timestamp", "value": "value"})
data.head()

    timestamp      value
0  1730419200  22.580645
1  1730420100  74.855305
2  1730421000   0.000000
3  1730421900   0.000000
4  1730422800  19.696970
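As a sanity check on the conversion, the integers can be turned back into dates. The sketch below uses only the standard library and treats the values as seconds since the Unix epoch in UTC; `to_utc_string` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc_string(epoch_seconds: int) -> str:
    """Format a Unix timestamp (in seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(to_utc_string(1730419200))  # first row of the converted frame -> 2024-11-01 00:00:00
print(1730420100 - 1730419200)    # spacing between the first two rows -> 900 seconds
```

The 900-second spacing matches the 15-minute sampling visible in the raw file.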


## Step 2: Run Orion

Use Orion to find anomalies in your time series signal.

Orion provides a collection of anomaly detection pipelines to choose from. You can view the pipelines and their ranking in our [leaderboard](https://github.com/sintel-dev/Orion?tab=readme-ov-file#leaderboard).

In this tutorial, we will use the `AER` model.

In [7]:
import matplotlib.pyplot as plt
import pandas as pd
import time
from orion import Orion

# Define all available pipelines with their configurations
PIPELINES_CONFIG = {
    'aer': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 3600
            },
            'orion.primitives.aer.AER#1': {
                'epochs': 5,
                'verbose': True
            }
        }
    },
    'tadgan': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 3600
            },
            'orion.primitives.tadgan.TadGAN#1': {
                'epochs': 5,
                'verbose': False
            }
        }
    },
    'lstm_dynamic_threshold': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 3600
            },
            'keras.Sequential.LSTMTimeSeriesRegressor#1': {
                'epochs': 5,
                'verbose': True
            }
        }
    },
    'lstm_ae': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 3600
            },
            'keras.Sequential.LSTMAutoEncoder#1': {
                'epochs': 5,
                'verbose': True
            }
        }
    },
    'dense_ae': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 3600
            },
            'keras.Sequential.DenseAutoEncoder#1': {
                'epochs': 5,
                'verbose': True
            }
        }
    },
    'vae': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 21600
            },
            'orion.primitives.vae.VAE#1': {
                'epochs': 5,
                'verbose': True
            }
        }
    },
    'arima': {
        'hyperparameters': {
            'mlstars.custom.timeseries_preprocessing.time_segments_aggregate#1': {
                'interval': 21600
            }
        }
    }
}

# Results storage
pipeline_results = {}

def test_pipeline(pipeline_name, config, data):
    """Test a single pipeline and return results"""
    print(f"\n{'='*60}")
    print(f"TESTING PIPELINE: {pipeline_name.upper()}")
    print(f"{'='*60}")

    try:
        start_time = time.time()

        # Create Orion instance
        orion = Orion(
            pipeline=pipeline_name,
            hyperparameters=config['hyperparameters']
        )

        print(f"✅ Pipeline '{pipeline_name}' initialized successfully")

        # Fit the pipeline
        print("🔄 Fitting pipeline...")
        orion.fit(data)
        fit_time = time.time() - start_time
        print(f"✅ Pipeline fitted in {fit_time:.2f} seconds")

        # Detect anomalies
        print("🔍 Detecting anomalies...")
        detect_start = time.time()
        anomalies = orion.detect(data)
        detect_time = time.time() - detect_start

        total_time = time.time() - start_time

        print(f"✅ Anomaly detection completed in {detect_time:.2f} seconds")
        print(f"⏱️  Total execution time: {total_time:.2f} seconds")
        print(f"📊 Number of anomalies detected: {len(anomalies)}")

        if len(anomalies) > 0:
            print("\nDetected Anomalies:")
            print(anomalies)
        else:
            print("No anomalies detected.")

        return {
            'status': 'success',
            'anomalies': anomalies,
            'fit_time': fit_time,
            'detect_time': detect_time,
            'total_time': total_time,
            'num_anomalies': len(anomalies),
            'orion_instance': orion
        }

    except Exception as e:
        error_time = time.time() - start_time
        print(f"❌ Pipeline '{pipeline_name}' failed after {error_time:.2f} seconds")
        print(f"Error: {str(e)}")
        return {
            'status': 'failed',
            'error': str(e),
            'error_time': error_time,
            'anomalies': pd.DataFrame(),
            'num_anomalies': 0
        }

print("Pipeline testing framework ready! 🚀")


Pipeline testing framework ready! 🚀
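The `interval` values in `PIPELINES_CONFIG` are expressed in seconds. Since this signal is sampled every 900 seconds, an interval of 3600 groups four readings into one hourly point. The sketch below imitates that step with plain pandas, assuming the primitive averages the values in each window; `aggregate_mean` is a hypothetical stand-in and not the mlstars implementation.

```python
import pandas as pd

# First five rows of the converted data (15-minute spacing, i.e. 900 seconds)
frame = pd.DataFrame({
    "timestamp": [1730419200, 1730420100, 1730421000, 1730421900, 1730422800],
    "value": [22.580645, 74.855305, 0.0, 0.0, 19.696970],
})


def aggregate_mean(df: pd.DataFrame, interval: int) -> pd.DataFrame:
    """Average `value` over consecutive windows of `interval` seconds."""
    start = df["timestamp"].min()
    # Assign each row to the window it falls into, counted from the first timestamp
    window_start = start + ((df["timestamp"] - start) // interval) * interval
    return (
        df.groupby(window_start)["value"]
        .mean()
        .rename_axis("timestamp")
        .reset_index()
    )


print(aggregate_mean(frame, 3600))
```

A larger interval, such as the 21600 used for `vae` and `arima`, yields a shorter and smoother series, at the cost of hiding short-lived anomalies.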


In [8]:
# Test AER pipeline
pipeline_results['aer'] = test_pipeline('aer', PIPELINES_CONFIG['aer'], data)



TESTING PIPELINE: AER
✅ Pipeline 'aer' initialized successfully
🔄 Fitting pipeline...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
✅ Pipeline fitted in 47.32 seconds
🔍 Detecting anomalies...
✅ Anomaly detection completed in 6.38 seconds
⏱️  Total execution time: 53.71 seconds
📊 Number of anomalies detected: 1

Detected Anomalies:
        start         end  severity
0  1730700000  1732363200  0.135435
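The `start` and `end` columns are Unix timestamps, which are hard to read at a glance. A minimal sketch for converting them back to dates, using the values from the output above and assuming they are seconds in UTC:

```python
import pandas as pd

# Anomaly table copied from the AER output above
anomalies = pd.DataFrame({
    "start": [1730700000],
    "end": [1732363200],
    "severity": [0.135435],
})

readable = anomalies.copy()
for column in ("start", "end"):
    # Interpret the integers as seconds since the Unix epoch (UTC)
    readable[column] = pd.to_datetime(readable[column], unit="s")
readable["duration"] = readable["end"] - readable["start"]

print(readable)
```

Here the single detected interval runs from 2024-11-04 06:00 to 2024-11-23 12:00, a span of a little over 19 days.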


In [9]:
# Test TadGAN pipeline
pipeline_results['tadgan'] = test_pipeline('tadgan', PIPELINES_CONFIG['tadgan'], data)



TESTING PIPELINE: TADGAN
✅ Pipeline 'tadgan' initialized successfully
🔄 Fitting pipeline...


  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
ERROR:mlblocks.mlpipeline:Exception caught fitting MLBlock orion.primitives.tadgan.TadGAN#1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/mlblocks/mlpipeline.py", line 644, in _fit_block
    block.fit(**fit_args)
  File "/usr/local/lib/python3.11/dist-packages/mlblocks/mlblock.py", line 311, in fit
    getattr(self.instance, self.fit_method)(**fit_kwargs)
  File "/usr/local/lib/python3.11/dist-packages/orion/primitives/tadgan.py", line 360, in fit
    self._fit((X, y))
  File "/usr/local/lib/python3.11/dist-packages/orion/primitives/tadgan.py", line 335, in _fit
    losses = self._format_losses([epoch_cx_loss, epoch_cz_loss, epoch_eg_loss])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/orion/primitives/tadgan.py", line 224, in _format_losses
    output[LOSS_NAMES[i][0]] = losses[i][

❌ Pipeline 'tadgan' failed after 12.50 seconds
Error: invalid index to scalar variable.
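Because `test_pipeline` returns a dictionary for both successful and failed runs, the entries in `pipeline_results` can be flattened into a small comparison table. The sketch below uses plain dictionaries filled with the numbers from the two runs above; `summarize` is a hypothetical helper.

```python
def summarize(results: dict) -> list:
    """Turn the per-pipeline result dicts into flat rows for a quick comparison."""
    rows = []
    for name, result in results.items():
        rows.append({
            "pipeline": name,
            "status": result["status"],
            "num_anomalies": result["num_anomalies"],
            # Failed runs store 'error_time' instead of 'total_time'
            "seconds": result.get("total_time", result.get("error_time")),
            "error": result.get("error", ""),
        })
    return rows


# Values taken from the two runs above
example_results = {
    "aer": {"status": "success", "num_anomalies": 1, "total_time": 53.71},
    "tadgan": {"status": "failed", "num_anomalies": 0, "error_time": 12.50,
               "error": "invalid index to scalar variable."},
}
for row in summarize(example_results):
    print(row)
```

Passing the real `pipeline_results` dictionary works the same way, and `pd.DataFrame(summarize(pipeline_results))` gives a table view.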
