# Multivariate Pipeline Tutorial

This notebook demonstrates how to use the multivariate detector pipeline with different formatting methods for multidimensional time series data.


In [1]:
import numpy as np
import pandas as pd
from sigllm.primitives.formatting import (
    JSONFormat,
    UnivariateControl,
    PersistenceControl,
    ValueConcatenation,
    ValueInterleave,
    DigitInterleave,
    utils
)


## Create Sample Multivariate Data

First, let's create some sample multivariate time series data with 3 dimensions.


In [2]:
# Create sample data with 3 dimensions
N = 25
raw_data = utils.create_test_data()
print(raw_data.head())

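# Drop the timestamp column and keep only the value columns (x1, x2, x3)
raw_data = raw_data.to_numpy()[:, 1:]
# Slice the series into overlapping windows of 15 timesteps with a stride of 1
window_size = 15
windowed_data = np.array([raw_data[i:i + window_size, :] for i in range(0, len(raw_data) - window_size, 1)])
# Scale by 1000 and cast to int so every value is a whole number
data = (1000 * windowed_data).astype(int)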

print("Sample data shape:", data.shape)


   timestamp    x1  x2    x3
0        0.0  0.10   0  0.65
1     3600.0  0.11   1  0.64
2     7200.0  0.12   0  0.63
3    10800.0  0.13   1  0.62
4    14400.0  0.14   0  0.61
Sample data shape: (10, 15, 3)
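
The shape `(10, 15, 3)` follows from the window bounds: 25 rows with a window of 15 and a stride of 1 give 11 full windows, and the loop's upper bound of `len(raw_data) - 15` stops one short of the last one. The sketch below checks this on a stand-in array (an assumption; it does not use `utils.create_test_data()`) and compares the loop with NumPy's `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for the (25, 3) value matrix; the real values do not matter here
rows, dims, window = 25, 3, 15
raw = np.arange(rows * dims).reshape(rows, dims)

# Same bounds as the list comprehension in the cell above
looped = np.array([raw[i:i + window, :] for i in range(0, len(raw) - window, 1)])

# sliding_window_view appends the window axis last, so swap the last two axes
vectorized = sliding_window_view(raw, window, axis=0).transpose(0, 2, 1)

print(looped.shape)      # (10, 15, 3)
print(vectorized.shape)  # (11, 15, 3)
```

The first 10 vectorized windows are identical to the looped ones. The eleventh is the final full window, which the loop above leaves out.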


## Available Formatting Methods

The multivariate pipeline supports several formatting methods that convert multidimensional data into string representations for LLM processing:

1. **JSONFormat**: Formats as d0:val,d1:val,... per timestamp
2. **ValueConcatenation**: Flattens all dimensions per timestamp
3. **ValueInterleave**: Interleaves values with zero-padding
4. **DigitInterleave**: Interleaves individual digits
5. **UnivariateControl**: Uses only first dimension (baseline)
6. **PersistenceControl**: Returns last value (naive baseline)


For example, given timesteps $t_0$ = [50, 30, 100] and $t_1$ = [55, 28, 104]:
* Value Concatenation - Flatten the values timestep by timestep: `50,30,100,55,28,104`
* Value Interleave - Pad values to equal digit length and concatenate timestep by timestep: `050030100,055028104`
* Digit Interleave - Pad values to equal digit length, then interleave digits positionally across dimensions: `001530000,001520584`
* JSON Format - Encode as dimension-labeled key:value pairs: `d0:50,d1:30,d2:100,d0:55,d1:28,d2:104`
* Univariate Control - Keep only one dimension (baseline for comparison): `50,55`
* Persistence Control - Bypass the formatting and return the last known value: N/A
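
The encodings listed above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the `sigllm` implementation: every function name here is hypothetical, and it assumes non-negative integer values.

```python
def value_concatenation(steps):
    # Flatten all values, timestep by timestep
    return ",".join(str(v) for step in steps for v in step)

def value_interleave(steps):
    # Zero-pad every value to the widest value, then join each timestep
    width = max(len(str(v)) for step in steps for v in step)
    return ",".join("".join(str(v).zfill(width) for v in step) for step in steps)

def digit_interleave(steps):
    # Zero-pad, then emit the k-th digit of every dimension before digit k+1
    width = max(len(str(v)) for step in steps for v in step)
    encoded = []
    for step in steps:
        padded = [str(v).zfill(width) for v in step]
        encoded.append("".join(p[k] for k in range(width) for p in padded))
    return ",".join(encoded)

def json_format(steps):
    # Label each value with its dimension index
    return ",".join(f"d{i}:{v}" for step in steps for i, v in enumerate(step))

def univariate_control(steps, dim=0):
    # Keep a single dimension only
    return ",".join(str(step[dim]) for step in steps)

steps = [[50, 30, 100], [55, 28, 104]]
print(value_concatenation(steps))  # 50,30,100,55,28,104
print(value_interleave(steps))     # 050030100,055028104
print(digit_interleave(steps))     # 001530000,001520584
print(json_format(steps))          # d0:50,d1:30,d2:100,d0:55,d1:28,d2:104
print(univariate_control(steps))   # 50,55
```

The padding width is taken from the widest value in the input, which is why the sample data further down (with values up to 1000) is padded to four digits per value.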


In [3]:
# Compare string representations from different methods
methods = {
    'JSONFormat': JSONFormat(),
    'ValueConcatenation': ValueConcatenation(),
    'ValueInterleave': ValueInterleave(),
    'DigitInterleave': DigitInterleave(),
    'UnivariateControl': UnivariateControl(),
    'PersistenceControl': PersistenceControl(),
}


Validation suite passed
Validation suite passed
Validation suite passed
Validation suite passed
Validation suite passed


In [4]:
print("Comparison of formatting methods on the same data:\n")

for name, method in methods.items():
    print(f"{name}:")
    try:
        output = method.format_as_string(data)
        # Show the first 80 characters of the first window's string
        print(f"\t{output[0][:80]}...\n")
    except Exception as e:
        print(f"\tError - {e}\n")

Comparison of formatting methods on the same data:

JSONFormat:
	d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640,d0:120,d1:0,d2:630,d0:130,d1:1000,d2:62...

ValueConcatenation:
	100,0,650,110,1000,640,120,0,630,130,1000,620,140,0,610,150,1000,600,160,0,590,1...

ValueInterleave:
	010000000650,011010000640,012000000630,013010000620,014000000610,015010000600,01...

DigitInterleave:
	000106005000,010106104000,000106203000,010106302000,000106401000,010106500000,00...

UnivariateControl:
	100,110,120,130,140,150,160,170,180,190,200,210,220,230,240...

PersistenceControl:
	100,110,120,130,140,150,160,170,180,190,200,210,220,230,240...
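
To check that a formatted string still carries the original values, it can be decoded back into an array. The sketch below does this for the JSONFormat output shown above. The `parse_json_format` helper is hypothetical, not part of `sigllm`, and it assumes values appear in dimension order and that a trailing incomplete timestep can be dropped.

```python
import re
import numpy as np

def parse_json_format(text, n_dims):
    # Pull out every "d<index>:<value>" pair; the index is ignored and
    # values are assumed to be listed in dimension order
    pairs = re.findall(r"d(\d+):(-?\d+)", text)
    values = [int(value) for _, value in pairs]
    # Drop a trailing partial timestep, e.g. from a truncated string
    usable = len(values) - len(values) % n_dims
    return np.array(values[:usable], dtype=int).reshape(-1, n_dims)

text = "d0:100,d1:0,d2:650,d0:110,d1:1000,d2:640"
parsed = parse_json_format(text, n_dims=3)
print(parsed.tolist())  # [[100, 0, 650], [110, 1000, 640]]
```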



## Deep Dive into JSONFormat

In this section, we show an end-to-end use of the multivariate detector pipeline with the `JSONFormat` method.

In [5]:
from sigllm.primitives.formatting.utils import test_multivariate_formatting_validity, run_pipeline

method = JSONFormat()
test_multivariate_formatting_validity(method)
errors, y_hat, y = run_pipeline(method, multivariate_allowed_symbols=["d", ":", ","], verbose=False)
print(f"Mean Residual: {np.mean(errors)}")

Validation suite passed
Validation suite passed



Mean Residual: 0.0



