# How to use time-series embeddings

This short guide shows how to use **time-series embedding networks**:
- `CausalCNNEmbedding`
- `TransformerEmbedding`

We’ll use a simple `torch.randn` simulator to demonstrate the workflow.

> In real use, replace `torch.randn` with a dynamical simulator (e.g., SIR, Lotka–Volterra).


In [None]:
import torch
from torch import nn

from sbi.inference import NPE
from sbi.neural_nets import embedding_nets, posterior_nn
from sbi.utils import BoxUniform

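Let's define a simple simulator that returns random time-series data to mimic sequential observations. Its output ignores `theta`, so the resulting posterior carries no information; the goal is only to demonstrate the tensor shapes and the workflow.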

In [None]:
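torch.manual_seed(0)


def simulator(theta):
    # Return one length-100 noise series per parameter set (theta's value is ignored)
    return torch.randn(theta.shape[0], 100)


prior = BoxUniform(torch.tensor([0.0]), torch.tensor([1.0]))
thetas = prior.sample((50,))            # shape (50, 1)
xs = simulator(thetas)                  # shape (50, 100), one row per theta
x_o = simulator(torch.tensor([[0.5]]))  # shape (1, 100)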
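
To make the note about real simulators concrete, here is a minimal sketch of a simulator whose output actually depends on `theta`. The function `decay_simulator`, its arguments, and the noisy exponential-decay model are hypothetical choices for illustration, not part of sbi or of this guide's workflow. It assumes `theta` has shape `(batch, 1)`, as returned by `prior.sample`.

```python
import torch


def decay_simulator(theta, num_steps=100, noise_std=0.1):
    """Hypothetical toy simulator: noisy exponential decay with rate theta."""
    # theta: (batch, 1) and t: (num_steps,) broadcast to (batch, num_steps)
    t = torch.linspace(0.0, 5.0, num_steps)
    clean = torch.exp(-theta * t)
    return clean + noise_std * torch.randn_like(clean)


torch.manual_seed(0)
thetas_demo = torch.rand(50, 1)  # stand-in for prior.sample((50,))
xs_demo = decay_simulator(thetas_demo)
print(xs_demo.shape)  # torch.Size([50, 100])
```

Any function that maps a `(batch, dim_theta)` tensor to a `(batch, sequence_length)` tensor could be dropped in the same way.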

## Using CausalCNNEmbedding

We use `CausalCNNEmbedding` to extract **local temporal patterns** from the sequence, then train a Neural Posterior Estimator (NPE) on the embedded data.

In [None]:
# Define a causal CNN embedding for 1D time-series data
embedding_cnn = embedding_nets.CausalCNNEmbedding(
    input_shape=(100,),     # 1D time-series length
    num_conv_layers=3,      # Number of CNN layers
    pool_kernel_size=10,    # Pooling window for temporal downsampling
    output_dim=16,          # Embedding size for NPE
)


# Build density estimator with embedding
density_estimator_cnn = posterior_nn(
    model="maf",
    embedding_net=embedding_cnn,
    z_score_x="none",
    z_score_y="none",
)

# Create and train NPE using the prior and simulated data
inference_cnn = NPE(prior=prior, density_estimator=density_estimator_cnn)
inference_cnn.append_simulations(thetas, xs).train()

# train() returns the density estimator, so build the posterior explicitly
posterior_cnn = inference_cnn.build_posterior()

# Draw posterior samples given an observed time series
samples_cnn = posterior_cnn.sample((10,), x=x_o)
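
To illustrate what "causal" and "local" mean here, the following is a minimal pure-PyTorch sketch. It is not sbi's implementation of `CausalCNNEmbedding`. The classes `CausalConv1d` and `TinyCausalEmbedding`, the layer sizes, and the dilation schedule are all assumptions made for the example. The idea is that left-only padding keeps each output step from seeing future inputs, and pooling plus a linear head reduces the sequence to a fixed-size embedding.

```python
import torch
import torch.nn.functional as F
from torch import nn


class CausalConv1d(nn.Module):
    """Illustrative causal 1D convolution (not sbi's implementation)."""

    def __init__(self, in_ch, out_ch, kernel_size=2, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        # Pad only on the left so output[t] never sees input[t + 1:]
        return self.conv(F.pad(x, (self.left_pad, 0)))


class TinyCausalEmbedding(nn.Module):
    """Hypothetical stand-in: causal convs -> average pooling -> linear head."""

    def __init__(self, seq_len=100, channels=4, pool=10, output_dim=16):
        super().__init__()
        self.convs = nn.Sequential(
            CausalConv1d(1, channels, dilation=1), nn.LeakyReLU(),
            CausalConv1d(channels, channels, dilation=2), nn.LeakyReLU(),
            CausalConv1d(channels, channels, dilation=4), nn.LeakyReLU(),
        )
        self.pool = nn.AvgPool1d(pool)
        self.head = nn.Linear(channels * (seq_len // pool), output_dim)

    def forward(self, x):
        h = self.convs(x.unsqueeze(1))  # (batch, channels, time)
        h = self.pool(h)                # (batch, channels, time // pool)
        return self.head(h.flatten(1))  # (batch, output_dim)


torch.manual_seed(0)
layer = CausalConv1d(1, 4, dilation=2)
x = torch.randn(1, 1, 100)
x_future = x.clone()
x_future[..., 50:] += 1.0  # perturb only time steps t >= 50

y, y_future = layer(x), layer(x_future)
print(torch.allclose(y[..., :50], y_future[..., :50]))  # True: the past is unaffected

emb = TinyCausalEmbedding()(torch.randn(8, 100))
print(emb.shape)  # torch.Size([8, 16])
```

Running the same kind of shape check on a real embedding network before training is a cheap way to catch mismatches between the sequence length and `input_shape`.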

## Using TransformerEmbedding

Next, we define a `TransformerEmbedding` that models **global dependencies** via self-attention.

In [None]:
# Transformer configuration for sequence embedding
cfg = dict(
    vit=False,             # Use standard transformer, not ViT-style
    feature_space_dim=192, # Internal embedding dimension (num_heads * head_dim)
    sequence_length=100,   # Number of time points
    output_dim=16,         # Output feature dimension for NPE
    num_layers=3,          # Transformer depth
    num_heads=12,          # Number of attention heads
    head_dim=16,           # Size per attention head
    d_model=192,           # Same as feature_space_dim
)

# Initialize the transformer embedding network
base_trans = embedding_nets.TransformerEmbedding(cfg)

# Project 1D inputs to match transformer feature dimension
class ProjectedTransformer(nn.Module):
    def __init__(self, transformer):
        super().__init__()
        # Map each scalar time step to the transformer's 192-dim feature space
        self.proj = nn.Linear(1, 192)
        self.transformer = transformer

    def forward(self, x):
        if x.ndim == 2:
            x = x.unsqueeze(-1)
        x = self.proj(x)
        return self.transformer(x)

embedding_trans = ProjectedTransformer(base_trans)

# Build and train NPE with transformer embedding
density_estimator_trans = posterior_nn(
    model="maf",
    embedding_net=embedding_trans,
    z_score_x="none",
    z_score_y="none",
)

inference_trans = NPE(prior=prior, density_estimator=density_estimator_trans)
inference_trans.append_simulations(thetas, xs).train()

# train() returns the density estimator, so build the posterior explicitly
posterior_trans = inference_trans.build_posterior()

# Sample from the learned posterior
samples_trans = posterior_trans.sample((10,), x=x_o)
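
The projection step and the "global dependencies" idea can be seen in isolation with PyTorch's built-in `nn.MultiheadAttention`. This is only a sketch under assumptions: it reuses the sizes from the config above (12 heads of 16 dimensions, 192 features) and does not reproduce the internals of `TransformerEmbedding`. The point is that every time step receives an attention weight for every other time step.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same sizes as the config above: 12 heads * 16 dims per head = 192 features
attn = nn.MultiheadAttention(embed_dim=192, num_heads=12, batch_first=True)
proj = nn.Linear(1, 192)

x = torch.randn(4, 100)         # (batch, time)
tokens = proj(x.unsqueeze(-1))  # (batch, time, 192)

# Self-attention: queries, keys, and values all come from the same sequence
out, weights = attn(tokens, tokens, tokens)

print(out.shape)      # torch.Size([4, 100, 192])
print(weights.shape)  # torch.Size([4, 100, 100])
```

The `(100, 100)` weight matrix per sample is what lets the first and last time steps interact in a single layer, in contrast to the bounded receptive field of a convolution.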

### Notes
- **CausalCNNEmbedding** uses causal temporal convolutions to capture local dependencies.
- **TransformerEmbedding** uses self-attention to capture global dependencies.
- The small `ProjectedTransformer` wrapper projects each scalar time step to the transformer’s feature dimension (192 here).
- Both embeddings are passed to `posterior_nn` through its `embedding_net` argument.

> Consider training on a GPU for long sequences or large networks.
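
As a rough sketch of the GPU note, the snippet below selects a device and moves a module and its inputs onto it using plain PyTorch. The `embedding` network is a hypothetical stand-in, not one of the sbi embeddings. As far as I know, sbi's inference classes such as `NPE` accept a `device` argument, but treat that as an assumption and check the documentation for your sbi version, including how the prior should be placed on the same device.

```python
import torch
from torch import nn

# Pick a GPU when one is visible to PyTorch, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in embedding; any nn.Module is moved the same way
embedding = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 16)).to(device)
xs_demo = torch.randn(50, 100, device=device)

# Module and inputs must live on the same device for the forward pass
with torch.no_grad():
    features = embedding(xs_demo)

print(features.shape, features.device.type)
```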
