# In this notebook, we compute the transfer entropy (TE) and test its statistical significance

$TE_{B \rightarrow A} = H(A^+|A)-H(A^+|A,B)$

where $H(X|Y) = -E(\log P(X|Y)) = -\sum\limits_{x,y} P(x,y)\log P(x|y)$ is the conditional entropy of $X$ given $Y$; $A$ and $B$ are time series, and $A^+$ is the "future" of $A$.

Here $TE_{XBT \rightarrow ETH}$ represents how much knowing XBT’s past helps predict ETH, beyond what ETH’s own past can tell us.
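
To make the definition concrete, here is a minimal sketch of a plug-in (frequency-count) estimator of $TE_{B \rightarrow A}$ for discrete series with a history of length one. It is an illustration only, not the estimator implemented in PV-TE; the helper names `conditional_entropy` and `transfer_entropy` are hypothetical, and the toy series are synthetic.

```python
from collections import Counter
import math
import random


def conditional_entropy(xs, ys):
    """Plug-in estimate of H(X|Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal = Counter(ys)         # counts of y alone
    # -sum_{x,y} P(x,y) * log P(x|y), with P(x|y) = count(x,y) / count(y)
    return -sum(c / n * math.log(c / marginal[y]) for (x, y), c in joint.items())


def transfer_entropy(a, b):
    """Plug-in TE_{B->A} = H(A+|A) - H(A+|A,B) with history length 1."""
    a_future, a_past, b_past = a[1:], a[:-1], b[:-1]
    h_own = conditional_entropy(a_future, a_past)
    h_both = conditional_entropy(a_future, list(zip(a_past, b_past)))
    return h_own - h_both


# Synthetic example: a copies b with a one-step lag, so b's past fully predicts a
random.seed(0)
b = [random.randint(0, 1) for _ in range(5000)]
a = [0] + b[:-1]

te_b_to_a = transfer_entropy(a, b)  # close to log(2): b's past removes all uncertainty
te_a_to_b = transfer_entropy(b, a)  # close to 0: a's past adds nothing about b
print(te_b_to_a, te_a_to_b)
```

The asymmetry between the two directions is what makes TE a directed measure, unlike correlation or mutual information.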

We first synchronize the asynchronous time series of XBT's features and ETH's prices. Then we compute the transfer entropy and test the statistical significance of each feature with a $\chi^2$ test (cf. [arXiv:2206.10173v1](https://arxiv.org/abs/2206.10173#) by Christian Bongiorno & Damien Challet), using the code from [Christian Bongiorno's github PV-TE](https://github.com/bongiornoc/PV-TE).
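
As a rough sketch of how a $\chi^2$ test on transfer entropy can work for discrete variables: under the null hypothesis of no information transfer, $2N \cdot TE$ (with TE in nats and $N$ samples) is asymptotically $\chi^2$-distributed, which is the classical G-test of conditional independence. The function name `te_p_value` and its parameters are hypothetical, and the degrees of freedom below follow that textbook G-test; whether PV-TE uses exactly this form is an assumption.

```python
from scipy.stats import chi2


def te_p_value(te_nats, n_samples, n_future, n_past, n_source):
    """P-value of an estimated TE (in nats) under the null of no transfer.

    n_future, n_past, n_source are the numbers of distinct symbols of
    A+, A and B respectively (assumed discrete).
    """
    g_stat = 2.0 * n_samples * te_nats
    # Degrees of freedom of the conditional-independence G-test
    dof = n_past * (n_future - 1) * (n_source - 1)
    return chi2.sf(g_stat, dof)


# Binary series, 5000 samples: a large TE is significant, a tiny one is not
p_strong = te_p_value(0.05, 5000, 2, 2, 2)
p_weak = te_p_value(0.0002, 5000, 2, 2, 2)
print(p_strong, p_weak)
```

A small p-value indicates that the estimated TE is unlikely to come from finite-sample noise alone.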

In [None]:
import numpy as np
import pandas as pd
import scipy
import requests
from typing import Tuple


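# Fetch TEpv.py from the PV-TE repository and execute it in this namespace
url_tepv = "https://raw.githubusercontent.com/bongiornoc/PV-TE/refs/heads/main/TEpv.py"
response = requests.get(url_tepv, timeout=30)
if response.status_code == 200:
    code = response.text
    # Execute the downloaded code so that the functions used below are defined here
    exec(code)
else:
    # Stop here: the later cells cannot run without the downloaded functions
    raise RuntimeError(f"Failed to fetch the file: {response.status_code}")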



In [None]:
features = pd.read_parquet("../data/features/DATA_0/XBT_EUR.parquet")

eth = pd.read_parquet("../data/preprocessed/DATA_0/ETH_EUR.parquet")
eth = eth["level-1-bid-price"]

In [None]:
def backward_matching(A: pd.DataFrame, B: pd.DataFrame, timeshift=pd.Timedelta('0s'),
                      fill_method: str = 'linear') -> pd.DataFrame:
    """
    Synchronize the asynchronous time series B onto the timestamps of A (backward matching).
    For each timestamp of A, take the latest observation of B after shifting B by timeshift.

    Args:
        A (pd.DataFrame): target time series with datetime index
        B (pd.DataFrame): base time series with datetime index (the one that will be synced)
        timeshift (pd.Timedelta): delay added to B's timestamps before matching
        fill_method (str): currently unused; matching always forward-fills

    Returns:
        pd.DataFrame: Synchronized time series B_sync (with respect to A)
    """
    # Shift B's timestamps by the specified timeshift
    B_shifted = B.shift(freq=timeshift)

    # Reindex the shifted B onto the index of A, using the latest available values
    B_sync = B_shifted.reindex(A.index, method='ffill')

    return B_sync

In [None]:
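# Synchronize ETH prices onto the feature timestamps so that all series share one index
eth_synced = backward_matching(features, eth, timeshift=pd.Timedelta('1ms'))
# For each feature: feature past, ETH past, ETH future (all of equal length)
TE_test_result = [transfer_entropy_analysis(features.iloc[:-1][feat], eth_synced.iloc[:-1], eth_synced.iloc[1:]) for feat in features.columns if feat != 'timestamp']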