# Multivariate Pipeline Tutorial

This notebook demonstrates how to use the multivariate detector pipeline with different formatting methods for multidimensional time series data.


In [1]:
import numpy as np
import pandas as pd
from sigllm.primitives.formatting import (
    JSONFormat,
    UnivariateControl,
    PersistenceControl,
    ValueConcatenation,
    ValueInterleave,
    DigitInterleave
)


## Create Sample Multivariate Data

First, let's create some sample multivariate time series data with 3 dimensions.


In [2]:
# Create sample data with 3 dimensions
N = 25
data = pd.DataFrame({
    'timestamp': np.linspace(0, 3600*(N-1), N),
    'x1': np.linspace(10, 9+N, N) / 100,
    'x2': np.array([i % 2 for i in range(N)]),
    'x3': np.linspace(N+40, 41, N) / 100,
})

print("Sample data shape:", data.shape)
data.head()


Sample data shape: (25, 4)


|   | timestamp | x1 | x2 | x3 |
|---|---|---|---|---|
| 0 | 0.0 | 0.1 | 0 | 0.65 |
| 1 | 3600.0 | 0.11 | 1 | 0.64 |
| 2 | 7200.0 | 0.12 | 0 | 0.63 |
| 3 | 10800.0 | 0.13 | 1 | 0.62 |
| 4 | 14400.0 | 0.14 | 0 | 0.61 |


## Available Formatting Methods

The multivariate pipeline supports several formatting methods that convert multidimensional data into string representations for LLM processing:

1. **JSONFormat**: Labels each value with its dimension, as d0:val,d1:val,... per timestamp
2. **ValueConcatenation**: Flattens all dimensions per timestamp into one comma-separated sequence
3. **ValueInterleave**: Zero-pads values to equal width and joins them per timestamp
4. **DigitInterleave**: Interleaves the individual digits of the zero-padded values
5. **UnivariateControl**: Uses only the first dimension (univariate baseline)
6. **PersistenceControl**: Returns the last observed value (naive baseline)


For example, given timesteps $t_0$ = [50, 30, 100] and $t_1$ = [55, 28, 104]:
* Value Concatenation - Simply flatten the values across time: 50,30,100,55,28,104
* Value Interleave - Pad values to equal digit length and concatenate timestep by timestep: 050030100,055028104
* Digit Interleave - Interleave digits positionally across dimensions: 001530000,001520584
* JSON Format - Encode as dimension-labeled key:value pairs: d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
* Univariate Control - Keep only the first dimension (baseline for comparison): 50,55
* Persistence Control - Bypass the formatting and return the last known value: N/A
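
The encodings above can be reproduced with a few lines of plain Python. The sketch below is an illustration only, not the `sigllm` implementation: the function names are hypothetical, and it assumes non-negative integer inputs arranged as one row per timestep.

```python
import numpy as np


def _pad_width(window):
    # Number of digits in the largest value (assumes non-negative integers).
    return len(str(int(np.max(window))))


def value_concatenation(window):
    """Flatten all values, timestep by timestep."""
    return ",".join(str(int(v)) for v in np.asarray(window).flatten())


def value_interleave(window):
    """Zero-pad each value to equal width, then join the values of each timestep."""
    width = _pad_width(window)
    return ",".join(
        "".join(str(int(v)).zfill(width) for v in row) for row in window
    )


def digit_interleave(window):
    """Within each timestep, take the i-th digit of every dimension in turn."""
    width = _pad_width(window)
    tokens = []
    for row in window:
        padded = [str(int(v)).zfill(width) for v in row]
        tokens.append("".join(p[i] for i in range(width) for p in padded))
    return ",".join(tokens)


def json_format(window):
    """Prefix every value with its dimension label d0, d1, ..."""
    return ",".join(
        f"d{j}:{int(v)}" for row in window for j, v in enumerate(row)
    )


def univariate_control(window, dim=0):
    """Keep a single dimension (the first by default)."""
    return ",".join(str(int(v)) for v in np.asarray(window)[:, dim])


example = np.array([[50, 30, 100], [55, 28, 104]])  # rows are t0 and t1
for fn in (value_concatenation, value_interleave, digit_interleave,
           json_format, univariate_control):
    print(f"{fn.__name__}: {fn(example)}")
```

Running it on the two timesteps from the example prints the same strings as the list above, which makes it a convenient way to check one's understanding of each encoding before inspecting the library output.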


In [3]:
# Initialize JSONFormat method
json_method = JSONFormat()

# Create windowed test data (simulating pipeline output)
raw_data = np.array(data)[:, 1:]  # Remove timestamp column
windowed_data = np.array([raw_data[i:i+10,:] for i in range(0, len(raw_data)-10, 1)])
int_data = (1000 * windowed_data).astype(int)

print(f"Windowed data shape: {int_data.shape}")
print("\nFirst window (first 3 timestamps):")
print(int_data[0][:3])


Testing multivariate formatting method validity
(10, 15, 3)
d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:620,d0:140,d1:0,d2:610,d0:150,d1:1000,d2:600,d0:160,d1:0,d2:590,d0:170,d1:1000,d2:580,d0:180,d1:0,d2:570,d0:190,d1:1000,d2:560,d0:200,d1:0,d2:550,d0:210,d1:1000,d2:540,d0:220,d1:0,d2:530,d0:230,d1:1000,d2:520,d0:240,d1:0,d2:510
[['d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:620,d0:140,d1:0,d2:610,d0:150,d1:1000,d2:600,d0:160,d1:0,d2:590,d0:170,d1:1000,d2:580,d0:180,d1:0,d2:570,d0:190,d1:1000,d2:560,d0:200,d1:0,d2:550,d0:210,d1:1000,d2:540,d0:220,d1:0,d2:530,d0:230,d1:1000,d2:520,d0:240,d1:0,d2:510']
 ['d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:620,d0:140,d1:0,d2:610,d0:150,d1:1000,d2:600,d0:160,d1:0,d2:590,d0:170,d1:1000,d2:580,d0:180,d1:0,d2:570,d0:190,d1:1000,d2:560,d0:200,d1:0,d2:550,d0:210,d1:1000,d2:540,d0:220,d1:0,d2:530,d0:230,d1:1000,d2:520,d0:240,d1:0,d2:510,d0:250,d1:1000,d2:500']
 ['d0:120
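
The windowing and integer scaling in the cell above can also be written without an explicit Python loop. The following is a sketch under stated assumptions, not part of the pipeline: `make_windows` is a hypothetical helper, it relies on NumPy's `sliding_window_view` (NumPy 1.20 or later), and it rounds before casting because `astype(int)` truncates toward zero, so a product such as 569.9999 would otherwise become 569.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(values, window_size, decimals=3):
    """Return integer-scaled rolling windows of shape (n_windows, window_size, n_dims)."""
    values = np.asarray(values, dtype=float)
    # sliding_window_view appends the window axis last: (n, n_dims, window_size).
    windows = sliding_window_view(values, window_size, axis=0)
    # Reorder to (n, window_size, n_dims) and drop the final window so the
    # count matches range(0, len(values) - window_size) used in the notebook.
    windows = windows.transpose(0, 2, 1)[:-1]
    # Round to the nearest integer before casting to avoid truncation errors.
    return np.rint(windows * 10 ** decimals).astype(int)


# Same three signals as the sample data, without the timestamp column.
N = 25
values = np.column_stack([
    np.linspace(10, 9 + N, N) / 100,
    np.arange(N) % 2,
    np.linspace(N + 40, 41, N) / 100,
])
windows = make_windows(values, window_size=10)
print(windows.shape)
print(windows[0][:3])
```

For 25 rows and a window size of 10 this produces 15 windows, the same count as the loop in the notebook cell; removing the `[:-1]` slice would also keep the final window that ends on the last row.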

In [None]:
# Compare string representations from different methods
methods = {
    'JSONFormat': JSONFormat(),
    'ValueConcatenation': ValueConcatenation(),
    'ValueInterleave': ValueInterleave(),
    'DigitInterleave': DigitInterleave(),
    'UnivariateControl': UnivariateControl(),
    'PersistenceControl': PersistenceControl(),
}


In [9]:
print("Comparison of formatting methods on the same data:\n")
for name, method in methods.items():
    try:
        print(f"{name}:")
        output = method.format_as_string(int_data)
        print(f"  {output[0][:80]}...\n")
    except Exception as e:
        print(f"{name}: Error - {e}\n")

Comparison of formatting methods on the same data:

JSONFormat:
d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:620,d0:140,d1:0,d2:610,d0:150,d1:1000,d2:600,d0:160,d1:0,d2:590,d0:170,d1:1000,d2:580,d0:180,d1:0,d2:570,d0:190,d1:1000,d2:560
  d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:62...

ValueConcatenation:
(15, 10, 3)
['100,0,650,110,1000,640,120,0,630,130,1000,620,140,0,610,150,1000,600,160,0,590,170,1000,580,180,0,570,190,1000,560', '110,1000,640,120,0,630,130,1000,620,140,0,610,150,1000,600,160,0,590,170,1000,580,180,0,570,190,1000,560,200,0,550', '120,0,630,130,1000,620,140,0,610,150,1000,600,160,0,590,170,1000,580,180,0,570,190,1000,560,200,0,550,210,1000,540', '130,1000,620,140,0,610,150,1000,600,160,0,590,170,1000,580,180,0,570,190,1000,560,200,0,550,210,1000,540,220,0,530', '140,0,610,150,1000,600,160,0,590,170,1000,580,180,0,570,190,1000,560,200,0,550,210,1000,540,220,0,530,230,1000,520', '150,1000,600,160,0,5