# Idea: Combine SetFunTS with a convolutional encoder


Inputs: triplets (time, value, indicator), where the indicator identifies the channel.

- Reorganize as {channel: (time, value)}
- Group the channels into slow channels and fast channels.
- Apply a convolutional model to the fast channels, and reduce (pool) its output to match the time resolution of the slow channels.
    - Alternative: variable-width convolution
- Question: Use 1d convolution? Use 1d convolution with shared parameters?
- Question: Use 2d convolution over time+value (or time+value+indicator)?
- Question: Use 2d convolution with shared parameters, or one per channel?
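A minimal NumPy sketch of the first two steps: reorganize the triplets into `{channel: (time, value)}` and split the channels by how often they are observed. The helper names and the absolute observation-count threshold `min_count` are hypothetical choices for illustration; the notebook code further down uses the fraction of observed rows instead.

```python
import numpy as np


def triplets_to_channels(times, values, indicators):
    """Reorganize triplets (time, value, indicator) as {channel: (times, values)}."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    indicators = np.asarray(indicators)
    channels = {}
    for c in np.unique(indicators):
        mask = indicators == c
        order = np.argsort(times[mask], kind="stable")  # sort each channel by time
        channels[c.item()] = (times[mask][order], values[mask][order])
    return channels


def group_channels(channels, min_count):
    """Hypothetical rule: >= min_count observations -> fast, otherwise slow."""
    fast = {c: tv for c, tv in channels.items() if len(tv[0]) >= min_count}
    slow = {c: tv for c, tv in channels.items() if len(tv[0]) < min_count}
    return fast, slow


channels = triplets_to_channels(
    times=[0.0, 0.5, 1.0, 0.2, 1.5, 2.0],
    values=[1.0, 2.0, 3.0, 10.0, 4.0, 5.0],
    indicators=[0, 0, 0, 1, 0, 0],
)
fast, slow = group_channels(channels, min_count=3)
print(sorted(fast), sorted(slow))  # [0] [1]
```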

## Irregular Time Convolution

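Convolution: $(f*g)(t) = \int f(\tau)\, g(t-\tau)\, \mathrm{d}\tau \approx \sum_{\tau_i \in \mathcal{N}(t)} f(\tau_i)\, g(t-\tau_i)\, \omega(\tau_i)$, where $\mathcal{N}(t)$ is the set of observation times near $t$ and $\omega(\tau_i)$ is a quadrature weight for the observation at $\tau_i$.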
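A minimal NumPy sketch of the right-hand side of the approximation, assuming the neighborhood is all observations within a fixed `radius` of $t$ and the weights $\omega(\tau_i)$ are trapezoid-rule weights (half the distance to the neighboring observation on each side). Both choices are assumptions, and `irregular_conv` is a hypothetical helper, not part of any library.

```python
import numpy as np


def irregular_conv(query_times, obs_times, obs_values, kernel, radius):
    """Approximate (f*g)(t) by the sum over N(t) of f(tau_i) g(t - tau_i) omega(tau_i)."""
    order = np.argsort(np.asarray(obs_times, dtype=float))
    tau = np.asarray(obs_times, dtype=float)[order]
    f = np.asarray(obs_values, dtype=float)[order]
    # trapezoid weights: omega_i = (tau_{i+1} - tau_{i-1}) / 2, one-sided at both ends
    gaps = np.diff(tau)
    omega = np.zeros_like(tau)
    omega[:-1] += gaps / 2
    omega[1:] += gaps / 2
    out = []
    for t in np.atleast_1d(np.asarray(query_times, dtype=float)):
        near = np.abs(t - tau) <= radius  # neighborhood N(t)
        out.append(np.sum(f[near] * kernel(t - tau[near]) * omega[near]))
    return np.array(out)


def gauss(s):
    return np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)


rng = np.random.default_rng(0)
tau = np.sort(rng.uniform(0.0, 10.0, size=2000))  # irregular observation times
# f = 1 everywhere, so the result approximates the integral of the kernel (about 1.0)
result = irregular_conv([5.0], tau, np.ones_like(tau), gauss, radius=5.0)
```

Because the weights adapt to the local spacing, densely and sparsely sampled stretches contribute according to the time they cover rather than to the number of points.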

- Pooling: once a convolutional layer is set up, we can evaluate (pool) it at arbitrary intermediate time points!
    - So where do we actually pool?
    - Pool at observation times of slow channels!
    - Pool at automatically determined times
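- Map: $(T \oplus \mathbb{R})^* \longrightarrow (T \to \mathbb{R}^k)$, i.e. a finite sequence of (time, value) observations is mapped to a function from time to $\mathbb{R}^k$.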
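To illustrate the map and pooling at the slow channels' observation times, here is a sketch that turns one fast channel's observations into a function $T \to \mathbb{R}^k$ using $k$ Gaussian kernels of different (hypothetical) widths. It swaps in a normalized convolution (a kernel-weighted mean that divides by $\sum_i g(t-\tau_i)$ instead of using quadrature weights), an assumption made to keep the output on the scale of the values; `make_encoder` is a hypothetical name.

```python
import numpy as np


def make_encoder(obs_times, obs_values, widths):
    """Map observations (tau_i, f_i) to a function t -> R^k, one output per kernel width."""
    tau = np.asarray(obs_times, dtype=float)
    f = np.asarray(obs_values, dtype=float)
    widths = np.asarray(widths, dtype=float)

    def encode(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # (len(t), n_obs, k): Gaussian weight of each observation under each kernel
        d = (t[:, None, None] - tau[None, :, None]) / widths[None, None, :]
        w = np.exp(-0.5 * d**2)
        # normalized convolution: kernel-weighted mean of the values -> (len(t), k)
        return (w * f[None, :, None]).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-12)

    return encode


fast_t = np.linspace(0.0, 10.0, 50)  # fast channel: dense observations
enc = make_encoder(fast_t, np.sin(fast_t), widths=[0.2, 1.0, 3.0])
slow_t = np.array([1.0, 4.5, 9.0])  # pool at the slow channel's observation times
pooled = enc(slow_t)  # shape (3, 3): one R^k vector per slow observation time
```

Since `encode` accepts any query times, switching from slow-channel times to automatically determined times only changes the argument passed to it.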


## Convolution with missing values

Simplest option: ignore NaNs, i.e. only sum over the observed points (this only works with 1d convolutions!)
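A sketch of why this is easy in 1d: for a single channel stored on the shared time grid with NaN for missing entries, one can drop the NaN positions and convolve over the remaining observations. For brevity it uses unit weights ($\omega \equiv 1$) and a hypothetical box kernel; `nan_aware_conv` is a made-up name. With several channels in one row, a row cannot simply be dropped when only some of its channels are missing, which is why the trick does not carry over directly to a joint multi-channel kernel.

```python
import numpy as np


def nan_aware_conv(query_times, times, values, kernel):
    """1d convolution of one channel that skips NaN entries instead of imputing them."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    tau, f = times[observed], values[observed]  # keep only the observed points
    q = np.atleast_1d(np.asarray(query_times, dtype=float))
    K = kernel(q[:, None] - tau[None, :])  # (len(q), n_observed) kernel matrix
    return K @ f


def box(s):
    return (np.abs(s) <= 1.0).astype(float)


t = np.array([0.0, 1.0, 2.0, 3.0])
v = np.array([1.0, np.nan, 2.0, np.nan])
print(nan_aware_conv([1.0], t, v, box))  # [3.]
```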





## Implementation Idea 1

- Option A: a separate 2d convolution over time+value (ignoring the indicator) for each channel
- Option B: a single 2d convolution model over time+value, shared across all channels
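One possible reading of a "2d convolution over time+value" is a continuous kernel $g(\Delta t, v)$ evaluated at $(t - \tau_i, v_i)$ and summed over a channel's observations. The sketch below assumes $g$ is a linear combination of Gaussian radial basis functions on a fixed grid in the $(\Delta t, v)$ plane, with a weight matrix `W` mapping to $\mathbb{R}^k$; the grid, scale, and all names are hypothetical. The only difference between the separate and the shared variant is whether each channel gets its own `W`.

```python
import numpy as np

# Hypothetical RBF grid in the (delta_t, value) plane: 5 x 5 = 25 centers
CENTERS = np.stack(
    np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-2.0, 2.0, 5)), axis=-1
).reshape(-1, 2)


def rbf_features(dt, v, scale=0.5):
    """phi(delta_t, v): Gaussian RBF features, shape (..., 25)."""
    p = np.stack([dt, v], axis=-1)  # (..., 2)
    d2 = ((p[..., None, :] - CENTERS) ** 2).sum(axis=-1)  # (..., 25)
    return np.exp(-d2 / (2 * scale**2))


def conv2d_channel(query_times, times, values, W):
    """sum_i g(t - tau_i, v_i) with g(delta_t, v) = phi(delta_t, v) @ W, W: (25, k)."""
    dt = np.asarray(query_times, float)[:, None] - np.asarray(times, float)[None, :]
    v = np.broadcast_to(np.asarray(values, float)[None, :], dt.shape)
    return (rbf_features(dt, v) @ W).sum(axis=1)  # (len(query_times), k)


rng = np.random.default_rng(0)
k = 4
channels = {
    "a": (np.array([0.0, 0.3, 0.9]), np.array([0.1, -0.5, 1.2])),
    "b": (np.array([0.2, 0.7]), np.array([1.0, 0.4])),
}
W_sep = {c: rng.normal(size=(len(CENTERS), k)) for c in channels}  # separate: one W per channel
W_shared = rng.normal(size=(len(CENTERS), k))  # shared: one W for all channels
q = np.linspace(0.0, 1.0, 6)
out_sep = {c: conv2d_channel(q, t, v, W_sep[c]) for c, (t, v) in channels.items()}
out_shared = {c: conv2d_channel(q, t, v, W_shared) for c, (t, v) in channels.items()}
n_params_sep = sum(W.size for W in W_sep.values())  # 2 channels * 25 * 4 = 200
n_params_shared = W_shared.size  # 25 * 4 = 100
```

The shared variant keeps the parameter count independent of the number of channels, but without the indicator it cannot treat two channels with identical data differently.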


## Implementation Idea 2

- Use one convolution over all fast channels simultaneously, in triplet form.
    - Features: time features and indicator features
    - Issue: might need a large kernel to capture all information.
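A sketch of a single convolution over triplets, assuming a kernel that factorizes into a Gaussian in $\Delta t$ (time features) and a linear map of $v \cdot \mathrm{onehot}(c)$ (indicator features). `triplet_conv`, the Gaussian time kernel and `time_scale` are assumptions. The usage illustrates the large-kernel issue: with a narrow kernel, a sparsely observed channel contributes essentially nothing at query times far from its observations.

```python
import numpy as np


def triplet_conv(query_times, times, values, indicators, n_channels, time_scale, W):
    """One convolution over all channels at once, on triplets (tau_i, v_i, c_i).

    Hypothetical kernel: g(delta_t, v, c) = exp(-delta_t^2 / (2 s^2)) * (v * onehot(c)) @ W.
    """
    dt = np.asarray(query_times, float)[:, None] - np.asarray(times, float)[None, :]
    w_time = np.exp(-0.5 * (dt / time_scale) ** 2)  # (Q, N): time features
    onehot = np.eye(n_channels)[np.asarray(indicators)]  # (N, C): indicator features
    feats = np.asarray(values, float)[:, None] * onehot  # (N, C)
    return w_time @ feats @ W  # (Q, k)


# channel 1 is observed only once, far from the query time t = 5
t = np.array([0.0, 4.8, 5.0, 5.2])
v = np.array([3.0, 1.0, 1.0, 1.0])
c = np.array([1, 0, 0, 0])
per_channel = np.eye(2)  # W = identity: one aggregate per channel
narrow = triplet_conv([5.0], t, v, c, n_channels=2, time_scale=0.5, W=per_channel)
wide = triplet_conv([5.0], t, v, c, n_channels=2, time_scale=5.0, W=per_channel)
# narrow[0, 1] = 3 * exp(-50): channel 1 is effectively invisible
# wide[0, 1] = 3 * exp(-0.5), roughly 1.82
```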

In [None]:
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'  # also display the value of a trailing assignment
%config InlineBackend.figure_format = 'svg'
%load_ext autoreload
%autoreload 2
%matplotlib inline

# import logging
# logging.basicConfig(level=logging.INFO)

In [None]:
import numpy as np
import pandas as pd

np.set_printoptions(precision=4, floatmode="fixed", suppress=True)
rng = np.random.default_rng()

In [None]:
from tsdm.tasks import KIWI_FINAL_PRODUCT

task = KIWI_FINAL_PRODUCT()
ts = task.timeseries.sort_index(axis="index").sort_index(axis="columns")

In [None]:
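# fraction of rows in which each channel is observed
channel_freq = pd.notna(ts).mean().sort_values()
# channels observed in at least 10% of the rows are "fast", the others "slow"
fast_channels = channel_freq[channel_freq >= 0.1].index
slow_channels = channel_freq[channel_freq < 0.1].index
# keep only the rows in which at least one channel of the group is observed
FAST = ts[fast_channels].dropna(how="all")
SLOW = ts[slow_channels].dropna(how="all")
groups = {"fast": fast_channels, "slow": slow_channels}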
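On a small toy frame (hypothetical data), the same split shows what `channel_freq` measures: the fraction of rows of the shared index in which a channel is observed. The 10% threshold is therefore relative to how dense the combined time index is, not an absolute sampling rate.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "fast" is observed in every row, "slow" in 1 of 20 rows
toy = pd.DataFrame(
    {"fast": np.arange(20.0), "slow": np.nan},
    index=pd.RangeIndex(20, name="time"),
)
toy.loc[0, "slow"] = 1.0

channel_freq = pd.notna(toy).mean().sort_values()  # slow: 0.05, fast: 1.0
fast_channels = channel_freq[channel_freq >= 0.1].index
slow_channels = channel_freq[channel_freq < 0.1].index
print(list(fast_channels), list(slow_channels))  # ['fast'] ['slow']
```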

In [None]:
from tsdm.encoders import *

encoder = ChainedEncoder(
    TensorEncoder(names=("time", "value", "index")),
    DataFrameEncoder(
        column_encoders={
            "value": IdentityEncoder(),
            tuple(ts.columns): FloatEncoder("float32"),
        },
        index_encoders=MinMaxScaler() @ DateTimeEncoder(unit="h"),
    ),
    TripletEncoder(sparse=True),
    Standardizer(),
)
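Judging by the names, `MinMaxScaler() @ DateTimeEncoder(unit="h")` presumably converts the datetime index to hours and then rescales it to $[0, 1]$. A plain-pandas sketch of that idea, with hypothetical helper names and without claiming to match tsdm's exact offsets or conventions:

```python
import numpy as np
import pandas as pd


def encode_index(index: pd.DatetimeIndex):
    """Datetimes -> float hours since the first timestamp -> min-max scaled to [0, 1]."""
    start = index.min()
    hours = ((index - start) / pd.Timedelta(hours=1)).to_numpy(dtype=float)
    span = hours.max()  # the minimum is 0 by construction; assumes span > 0
    return hours / span, (start, span)


def decode_index(scaled, params):
    """Invert encode_index: [0, 1] -> hours -> datetimes."""
    start, span = params
    return start + pd.to_timedelta(np.asarray(scaled) * span, unit="h")


idx = pd.DatetimeIndex(["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-02 00:00"])
scaled, params = encode_index(idx)  # scaled == [0.0, 0.25, 1.0]
restored = decode_index(scaled, params)  # equal to idx again
```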

In [None]:
# DataFrameEncoder: DataFrame -> DataFrame

In [None]:
encoder = ChainedEncoder(
    TensorEncoder(names=("time", "value", "index")),
    DataFrameEncoder(
        column_encoders={
            "value": IdentityEncoder(),
            tuple(ts.columns): FloatEncoder("float32"),
        },
        index_encoders=MinMaxScaler() @ DateTimeEncoder(unit="h"),
    ),
    TripletEncoder() | TripletEncoder(),
    DataFrameSplitter(groups),
    Standardizer(),
)
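To make the triplet form concrete, here is a plain-pandas sketch that converts a wide frame (time × channel, NaN for missing) into long triplets (time, channel, value) per channel group, plus a one-hot indicator. It is an assumption about what triplet encoding and group splitting do conceptually, not the implementation of `TripletEncoder` or `DataFrameSplitter`; `to_triplets` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def to_triplets(df: pd.DataFrame) -> pd.DataFrame:
    """Wide frame (time x channel, NaN = missing) -> long triplets (time, channel, value)."""
    return (
        df.rename_axis(index="time", columns="channel")
        .reset_index()
        .melt(id_vars="time", var_name="channel", value_name="value")
        .dropna(subset=["value"])  # only observed entries become triplets
        .sort_values(["time", "channel"], ignore_index=True)
    )


# Hypothetical toy data and channel groups
wide = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan], "c": [7.0, 8.0, 9.0]},
    index=[0.0, 0.5, 1.0],
)
groups = {"fast": ["a", "c"], "slow": ["b"]}
triplets = {name: to_triplets(wide[cols]) for name, cols in groups.items()}
# one-hot indicator columns for the fast group, one column per channel
indicator = pd.get_dummies(triplets["fast"]["channel"], dtype=float)
```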

In [None]:
new = pd.concat((SLOW, FAST), axis="columns")
new = new.sort_index(axis="index").sort_index(axis="columns")

In [None]:
pd.testing.assert_frame_equal(ts, new)
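This assertion only passes if `ts` has no row in which every channel is missing: `dropna(how="all")` removes such a row from both `FAST` and `SLOW`, and the outer join in `pd.concat` cannot bring it back. A toy illustration with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the row at time 2 has no observation in any channel
toy = pd.DataFrame(
    {"fast": [1.0, 2.0, np.nan], "slow": [np.nan, 4.0, np.nan]},
    index=pd.Index([0, 1, 2], name="time"),
)
FAST = toy[["fast"]].dropna(how="all")
SLOW = toy[["slow"]].dropna(how="all")
new = pd.concat((SLOW, FAST), axis="columns")
new = new.sort_index(axis="index").sort_index(axis="columns")
print(len(toy), len(new))  # 3 2: the all-NaN row is gone
# reindexing on the original index restores the missing row
pd.testing.assert_frame_equal(new.reindex(toy.index), toy)
```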

In [None]:
encoder = ChainedEncoder(
    # TensorEncoder(names=("time", "value", "index")),
    DataFrameEncoder(
        column_encoders=IdentityEncoder(),
        index_encoders=IdentityEncoder(),
    ),
    Standardizer(),
)

In [None]:
ts = task.timeseries  # .loc[439, 15325]
encoder.fit(ts)
encoded = encoder.encode(ts)
decoded = encoder.decode(encoded)
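The encode/decode round-trip above is presumably meant to reproduce `ts` up to floating-point error. A hypothetical, NaN-tolerant column-wise standardizer shows the idea behind the `Standardizer()` step in isolation; it is a sketch under that assumption, not tsdm's implementation.

```python
import numpy as np
import pandas as pd


class MiniStandardizer:
    """Hypothetical column-wise standardizer: x -> (x - mean) / std, ignoring NaNs when fitting."""

    def fit(self, df: pd.DataFrame) -> "MiniStandardizer":
        self.mean = df.mean()  # pandas skips NaN by default
        self.std = df.std()
        return self

    def encode(self, df: pd.DataFrame) -> pd.DataFrame:
        return (df - self.mean) / self.std

    def decode(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * self.std + self.mean


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(5.0, 2.0, size=(100, 3)), columns=["a", "b", "c"])
df.iloc[::7, 1] = np.nan  # some missing values
scaler = MiniStandardizer().fit(df)
encoded = scaler.encode(df)
decoded = scaler.decode(encoded)
pd.testing.assert_frame_equal(decoded, df)  # equal up to float tolerance, NaNs preserved
```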