# The TD2C Hypothesis: Information Asymmetry

Our framework is built on a physical intuition: **Causality creates an irreversible flow of information.**

Consider two variables $X$ and $Y$. If $X \to Y$, information flows from the past of $X$ to the present of $Y$. Measured in the reverse direction (from the past of $Y$ to the present of $X$), the statistical signature is different.

## The Three Scenarios
We analyzed this asymmetry by counting "open paths" of information flow in three increasingly complex causal graphs.

### Scenario 1: Simple Chain
$$ X_{t-1} \to X_t \to Y_{t+1} $$
Here, the asymmetry is obvious: the past of $X$ helps predict $Y$, but the past of $Y$ does not help predict $X$.

### Scenario 2: Feedback and Self-Loops
When systems have strong auto-correlation ($Y_{t-1} \to Y_t$) or feedback loops, standard Granger tests often fail because "everything correlates with everything." However, our analysis shows that the **quantity** of information transferred is still higher in the causal direction than the anti-causal direction.

### Scenario 3: Latent Confounders
$$ X \leftarrow L \rightarrow Y $$
This is the hardest case. Both $X$ and $Y$ are correlated due to $L$. However, the *temporal structure* of how $L$ affects $X$ and $Y$ creates a subtle but detectable difference in the distribution of conditional mutual information.

## Operationalizing the Hypothesis
We capture this asymmetry using **Generalized Transfer Entropy**:

$$ \text{TE}_{fwd}^{(k)} = I(X_{t-1}; Y_t \mid Y_{t-k}) $$
$$ \text{TE}_{bwd}^{(k)} = I(Y_{t-1}; X_t \mid X_{t-k}) $$

TD2C computes these values for various lags $k$ and feeds them into a classifier. The classifier learns to distinguish the "fingerprint" of a true causal link ($X \to Y$) from a spurious one ($X \leftarrow L \to Y$).
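As a rough illustration of how the forward and backward terms could be tabulated over lags, here is a minimal sketch. For brevity it swaps in a linear-Gaussian conditional mutual information (computed from a partial correlation) rather than a non-parametric estimator. The names `gaussian_cmi` and `te_features`, and the simulated system, are illustrative assumptions and not the TD2C implementation.

```python
import numpy as np

def gaussian_cmi(a, b, c):
    """I(a; b | c) in nats for scalar series, assuming joint Gaussianity."""
    design = np.column_stack([np.ones_like(c), c])
    # Residualize a and b on c (with intercept), then correlate the residuals.
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    rho = np.corrcoef(res_a, res_b)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)

def te_features(x, y, max_lag):
    """Return {k: (TE_fwd, TE_bwd)} for k = 1..max_lag."""
    t = np.arange(max_lag, len(x))  # time indices with a full history
    feats = {}
    for k in range(1, max_lag + 1):
        fwd = gaussian_cmi(x[t - 1], y[t], y[t - k])  # I(X_{t-1}; Y_t | Y_{t-k})
        bwd = gaussian_cmi(y[t - 1], x[t], x[t - k])  # I(Y_{t-1}; X_t | X_{t-k})
        feats[k] = (fwd, bwd)
    return feats

# Hypothetical test system in which X drives Y with a one-step delay.
rng = np.random.default_rng(0)
n = 5000
x = np.zeros(n)
y = np.zeros(n)
for s in range(1, n):
    x[s] = 0.5 * x[s - 1] + rng.normal()
    y[s] = 0.5 * y[s - 1] + 0.8 * x[s - 1] + rng.normal()

features = te_features(x, y, max_lag=3)
for k, (fwd, bwd) in features.items():
    print(f"lag {k}: TE_fwd={fwd:.3f}  TE_bwd={bwd:.3f}")
```

In this toy system the forward term should dominate the backward term at every lag. The pairs in `features` are the kind of per-lag values that would be passed on to a classifier.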

## Visualizing the Temporal Markov Blanket

In the paper, we assume that a variable $X_t$ is shielded from the rest of the graph by its immediate temporal neighbors.

A true Markov blanket may also contain parents, children, and spouses (other parents of its children) belonging to other variables. Our approximation keeps only the temporal neighbors:
$$ \mathbf{M}_{X_t} \approx \{ X_{t-1}, X_{t+1} \} $$

This greatly simplifies feature extraction. Let's visualize this specific "Time-Slice" view.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt

def plot_temporal_blanket():
    G = nx.DiGraph()
    # Create a simple chain X_t-1 -> X_t -> X_t+1
    nodes = ["X_{t-1}", "X_t", "X_{t+1}"]
    nx.add_path(G, nodes)  # module-level helper; adds X_{t-1} -> X_t -> X_{t+1}
    
    # Add some "other" variables to show what we are ignoring
    G.add_edge("Y_{t-1}", "Y_t")
    G.add_edge("Y_t", "X_t") # A cross-sectional parent (often ignored in our approx)
    
    pos = {
        "X_{t-1}": (0, 1), "X_t": (1, 1), "X_{t+1}": (2, 1),
        "Y_{t-1}": (0, 0), "Y_t": (1, 0)
    }
    
    plt.figure(figsize=(8, 4))
    
    # Draw all nodes/edges grey first
    nx.draw(G, pos, with_labels=True, node_color='lightgrey', node_size=2000, font_size=10)
    
    # Highlight the Target and its assumed Blanket
    nx.draw_networkx_nodes(G, pos, nodelist=["X_t"], node_color='#CC4125', node_size=2000) # Target Red
    nx.draw_networkx_nodes(G, pos, nodelist=["X_{t-1}", "X_{t+1}"], node_color='#FC8D62', node_size=2000) # Blanket Orange
    
    plt.title("The TD2C Temporal Markov Blanket Assumption")
    plt.axis('off')
    plt.show()

plot_temporal_blanket()

# The TD2C Hypothesis

In this section, we adapt the static D2C framework to the temporal domain, introducing the **Temporal Dependency to Causality (TD2C)** approach. We posit that time adds a strict constraint to the causal graph that allows for robust feature engineering.

## 1. Assumptions for the Dynamic Case

To render the problem tractable, we make two specific assumptions for time series data:

1.  **Causal Stationarity:** The causal mechanisms (functions) do not change over the observation window.
2.  **First-Order Markov Self-Causality:** We assume that every variable is influenced by its own immediate past state:
    $$ \mathbf{z}_{i}^{(t-1)} \to \mathbf{z}_{i}^{(t)} $$

## 2. A Temporally-Aware Markov Blanket

In standard causal discovery, identifying the Markov Blanket is a complex estimation task often prone to errors. However, leveraging the self-causality assumption allows us to define a stable, "minimum" Markov Blanket for any variable $\mathbf{z}_i^{(t)}$ consisting of its immediate temporal neighbors:

$$
\mathbf{M}_{\mathbf{z}_i^{(t)}} = \{\mathbf{z}_i^{(t-1)}, \mathbf{z}_i^{(t+1)}\}
$$

This definition provides a canonically ordered set of neighbors—a guaranteed **parent** ($\mathbf{z}_i^{(t-1)}$) and a guaranteed **child** ($\mathbf{z}_i^{(t+1)}$). This allows us to move beyond statistical summaries (quantiles) used in static D2C and engineer highly specific features based on these known relationships.
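A minimal sketch of how this blanket could be materialized from a sampled series, assuming the series is stored as a one-dimensional array indexed by time. The helper name `temporal_blanket` is hypothetical.

```python
import numpy as np

def temporal_blanket(z):
    """Align each interior sample z[t] with its assumed blanket (z[t-1], z[t+1])."""
    z = np.asarray(z)
    parent = z[:-2]    # z^{(t-1)}: the guaranteed parent
    target = z[1:-1]   # z^{(t)}:   the variable itself
    child = z[2:]      # z^{(t+1)}: the guaranteed child
    return parent, target, child

parent, target, child = temporal_blanket(np.arange(6))
print(parent, target, child)  # [0 1 2 3] [1 2 3 4] [2 3 4 5]
```

The first and last samples are dropped because they lack a parent or a child inside the observation window.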

## 3. The Hypothesis: Information Asymmetry

We hypothesize that a causal link creates a fundamental asymmetry in the flow of information through the temporal graph. Even when simple conditional independence tests fail due to confounding or non-linearity, a **quantitative asymmetry** persists in the number of open information paths.

Consider the following three scenarios of increasing complexity:

1.  **Scenario 1:** A simple causal chain.
2.  **Scenario 2:** Added complexity with longer lags.
3.  **Scenario 3:** The presence of a latent confounder $\mathbf{L}$.

![Causal Scenarios](../figures/scenarios.png)
*Figure 2: Three causal scenarios. Even as complexity increases from simple chains (a) to confounded systems (c), the asymmetry in information flow remains detectable.*

### Path Counting Experiment

To validate this, we counted the open paths (in the d-separation sense) for a Transfer Entropy-like test ($I(\mathbf{z}_i^{(t-k)}; \mathbf{z}_j^{(t)} \mid \mathbf{z}_j^{(t-k)})$) and for its reverse direction. As shown in the table below, the difference ($\Delta$) grows with the lag within each scenario and with the complexity of the scenario.

| Test Direction | Lag 1 | Lag 2 | Lag 3 | Lag 4 | Lag 5 | ... | Lag 9 |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Scenario 1: Simple** | | | | | | | |
| Forward ($\rightarrow$) | 1 | 2 | 3 | 4 | 5 | ... | 9 |
| Backward ($\leftarrow$) | 0 | 0 | 0 | 0 | 0 | ... | 0 |
| **Difference ($\Delta$)** | **1** | **2** | **3** | **4** | **5** | ... | **9** |
| | | | | | | | |
| **Scenario 2: Complex** | | | | | | | |
| Forward ($\rightarrow$) | 370 | 228 | 324 | 320 | 374 | ... | 615 |
| Backward ($\leftarrow$) | 312 | 153 | 227 | 183 | 194 | ... | 71 |
| **Difference ($\Delta$)** | **58** | **75** | **97** | **137** | **180** | ... | **544** |
| | | | | | | | |
| **Scenario 3: Confounded** | | | | | | | |
| Forward ($\rightarrow$) | 1186 | 880 | 1138 | 1123 | 1187 | ... | 1271 |
| Backward ($\leftarrow$) | 898 | 502 | 701 | 580 | 596 | ... | 290 |
| **Difference ($\Delta$)** | **288** | **378** | **437** | **543** | **591** | ... | **981** |

*Table 1: Number of open information paths. The quantitative asymmetry ($\Delta$) serves as a learnable signature for the classifier.*
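The counting procedure can be sketched for a small case. The exact graphs behind Table 1 are not specified here, so the code below assumes an unrolled graph with self-causality on both variables plus a single link $X_{t-1} \to Y_t$. It enumerates simple paths in the skeleton and keeps those that are open under the standard d-separation rules. All function names are hypothetical.

```python
def build_edges(T):
    """Directed edges of the assumed unrolled graph over time steps 0..T."""
    edges = set()
    for s in range(T):
        edges.add((("X", s), ("X", s + 1)))  # self-causality of X
        edges.add((("Y", s), ("Y", s + 1)))  # self-causality of Y
        edges.add((("X", s), ("Y", s + 1)))  # assumed causal link X -> Y
    return edges

def descendants(node, children):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(src, dst, neighbours, path=None):
    """Yield every simple path from src to dst in the undirected skeleton."""
    path = path or [src]
    if path[-1] == dst:
        yield list(path)
        return
    for nxt in neighbours[path[-1]]:
        if nxt not in path:
            yield from simple_paths(src, dst, neighbours, path + [nxt])

def count_open_paths(edges, src, dst, cond):
    """Count simple paths between src and dst that are open given `cond`."""
    children, neighbours = {}, {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n_open = 0
    for path in simple_paths(src, dst, neighbours):
        is_open = True
        for a, b, c in zip(path, path[1:], path[2:]):
            if (a, b) in edges and (c, b) in edges:  # b is a collider on this path
                if b not in cond and not (descendants(b, children) & cond):
                    is_open = False
            elif b in cond:  # a conditioned non-collider blocks the path
                is_open = False
        n_open += is_open
    return n_open

edges = build_edges(T=6)
t = 5
lags = (1, 2, 3)
forward = [count_open_paths(edges, ("X", t - k), ("Y", t), {("Y", t - k)}) for k in lags]
backward = [count_open_paths(edges, ("Y", t - k), ("X", t), {("X", t - k)}) for k in lags]
print(forward, backward)  # [1, 2, 3] [0, 0, 0]
```

Under this assumed graph the forward test has $k$ open paths at lag $k$ and the backward test has none, which is consistent with the Scenario 1 rows for the first three lags. Exhaustive path enumeration grows quickly with graph size, so this approach only suits small unrolled windows.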

## 4. Mutual Information Estimation

To capture the non-linear dependencies implied by these scenarios, linear correlation is not sufficient. We employ the **KSG (Kraskov-Stögbauer-Grassberger)** estimator, a non-parametric k-nearest-neighbor method.

For conditional MI, we use the identity:
$$
I(\mathbf{z}_i; \mathbf{z}_j \mid \mathbf{z}_k) = I((\mathbf{z}_i, \mathbf{z}_k); \mathbf{z}_j) - I(\mathbf{z}_k; \mathbf{z}_j)
$$

Because both terms are estimated non-parametrically, the resulting descriptors remain sensitive to complex, non-Gaussian interactions.
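The identity can be exercised with a small brute-force sketch of the first KSG algorithm (max-norm neighborhoods, $O(N^2)$ memory). This is an illustrative assumption about how such an estimator might look, not the implementation used for the descriptors. The names `ksg_mi` and `ksg_cmi` and the toy data are hypothetical.

```python
import numpy as np

def ksg_mi(a, b, k=4):
    """KSG (algorithm 1) estimate of I(a; b) in nats, brute-force O(N^2)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n = len(a)
    # Pairwise Chebyshev (max-norm) distances in each marginal space.
    da = np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)
    db = np.abs(b[:, None, :] - b[None, :, :]).max(axis=2)
    np.fill_diagonal(da, np.inf)  # a point is never its own neighbour
    np.fill_diagonal(db, np.inf)
    dj = np.maximum(da, db)       # max-norm distance in the joint space
    eps = np.partition(dj, k - 1, axis=1)[:, k - 1]  # distance to k-th joint neighbour
    na = (da < eps[:, None]).sum(axis=1)  # marginal neighbours strictly inside eps
    nb = (db < eps[:, None]).sum(axis=1)
    # For integers, digamma(m) = H_{m-1} - gamma; the gamma terms cancel in the
    # KSG formula, so harmonic numbers H[m] are sufficient.
    H = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n + 1))))
    return H[k - 1] + H[n - 1] - np.mean(H[na] + H[nb])

def ksg_cmi(zi, zj, zk, k=4):
    """I(zi; zj | zk) = I((zi, zk); zj) - I(zk; zj)."""
    joint = np.column_stack([zi, zk])
    return ksg_mi(joint, zj, k) - ksg_mi(zk, zj, k)

# Toy data: zi and zj share the common cause zk and are independent given zk.
rng = np.random.default_rng(0)
n = 800
zk = rng.normal(size=n)
zi = zk + 0.5 * rng.normal(size=n)
zj = zk + 0.5 * rng.normal(size=n)

marginal = ksg_mi(zi, zj)
conditional = ksg_cmi(zi, zj, zk)
print(f"I(zi; zj) ~ {marginal:.3f}   I(zi; zj | zk) ~ {conditional:.3f}")
```

In this example the unconditional estimate should be clearly positive, while the conditional estimate should be close to zero. The conditional value is a difference of two noisy estimates, so it can come out slightly negative on finite samples.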