# LLM Probing for Financial Signals

## Context
This is my most distinctive angle for Numerai. My PhD research focuses on mechanistic interpretability: understanding what information is encoded inside LLM representations. My ICLR 2025 paper (NNsight/NDIF) built infrastructure for probing LLM internals at scale.

**Key insight**: Before investing compute in fine-tuning an LLM on financial text, we should first *probe* the model to understand what financial knowledge already exists in its representations, and at which layers.

## Pipeline
Financial text → nnterp tracing → Extract hidden states per layer → Train linear probes → Identify which layers encode financial signals

**This notebook uses nnterp** (a standardized wrapper around nnsight, the library from my ICLR 2025 paper) for clean hidden-state extraction.

## Why This Matters for Numerai
- Different layers encode different information: early layers capture syntax, middle layers capture semantics, late layers capture task-specific features
- By probing, we can select the BEST layer for feature extraction — not just use the last layer
- This is a principled approach to "which representation should I use?" — a question most practitioners answer by guessing

## My Background
- **NNsight & NDIF** (ICLR 2025): Open-source suite for probing and manipulating LLM internals. Ray GCS backend with AWS object storage and vLLM.
- **Token Entanglement** (NeurIPS MechInterp 2025): Investigation of how information flows between tokens in LLMs
- **PhD Advisor**: David Bau (pioneer of network dissection and model editing, e.g. ROME)

In [None]:
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from nnterp import StandardizedTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

## 1. Load a Small Language Model
We use Gemma-3-270M-IT (270M parameters, 18 transformer layers, 640-dim hidden states) for demonstration; the next cell prints the exact values from the model config. In production, you'd probe larger models (Llama-3, Mistral) or finance-specific models.

In [None]:
model_name = "google/gemma-3-270m-it"
model = StandardizedTransformer(model_name, device_map="auto", dispatch=True)

n_layers = model.config.num_hidden_layers
hidden_dim = model.config.hidden_size
print(f"Model: {model_name}")
print(f"Layers: {n_layers}, Hidden dim: {hidden_dim}")
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

## 2. Financial Text Dataset with Labels
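We create headlines labeled with return direction (up = 1, down = 0). Here the labels are hand-assigned from each headline's sentiment, so they are a proxy for returns rather than realized returns. The question: does Gemma-3 encode this return-relevant information, and at which layer?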

In [None]:
# Financial headlines with binary return labels
# In production: real headlines + real returns
data = [
    # Positive returns (label=1)
    ("Apple reports record quarterly revenue exceeding all analyst estimates", 1),
    ("NVIDIA data center revenue surges 400% on AI chip demand", 1),
    ("Microsoft Azure growth re-accelerates beating expectations", 1),
    ("Company announces massive share buyback program boosting stock", 1),
    ("Strong jobs report signals robust economic growth ahead", 1),
    ("Tech giant beats earnings estimates by wide margin", 1),
    ("Breakthrough drug receives FDA approval for rare disease", 1),
    ("Major acquisition deal completed at premium valuation", 1),
    ("Company raises full-year guidance citing strong demand", 1),
    ("Revenue growth accelerates driven by new product launch", 1),
    ("Profit margins expand as cost cutting measures take effect", 1),
    ("Company wins billion dollar government defense contract", 1),
    ("Subscriber growth exceeds expectations in latest quarter", 1),
    ("New partnership with industry leader boosts outlook", 1),
    ("Company achieves profitability ahead of schedule", 1),
    ("Strong holiday sales drive record fourth quarter results", 1),
    ("Analyst upgrades stock citing improved business fundamentals", 1),
    ("Company expands market share in key growth segment", 1),
    ("Cash flow reaches record high enabling increased dividends", 1),
    ("Patent portfolio expansion strengthens competitive position", 1),
    # Negative returns (label=0)
    ("Company misses revenue expectations amid weakening demand", 0),
    ("Tesla recalls 2 million vehicles over safety concerns", 0),
    ("Major data breach exposes millions of customer records", 0),
    ("Company announces significant workforce reduction layoffs", 0),
    ("Regulatory investigation launched into business practices", 0),
    ("Supply chain disruptions force production shutdown", 0),
    ("Product recall due to safety defects affects millions", 0),
    ("Company warns of lower guidance citing headwinds", 0),
    ("Unexpected quarterly loss shocks Wall Street analysts", 0),
    ("Key executive departure raises governance concerns", 0),
    ("Antitrust lawsuit threatens core business model", 0),
    ("Credit rating downgrade increases borrowing costs", 0),
    ("Market share losses accelerate in competitive landscape", 0),
    ("Accounting irregularities discovered during audit review", 0),
    ("Failed clinical trial sends biotech shares plummeting", 0),
    ("Trade restrictions limit access to key international market", 0),
    ("Debt levels raise concerns about financial sustainability", 0),
    ("Customer churn rate increases in latest reporting period", 0),
    ("Class action lawsuit filed against company by shareholders", 0),
    ("Competitor launches superior product at lower price point", 0),
]

texts = [t for t, _ in data]
labels = np.array([l for _, l in data])
print(f"Dataset: {len(texts)} headlines ({labels.sum()} positive, {len(labels) - labels.sum()} negative)")
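The cell above notes that production would use real headlines and real returns. Below is a minimal pandas sketch of that labeling step, assuming hypothetical DataFrames with columns `ticker`, `date`, `headline`, and `fwd_return`. Here `fwd_return` is assumed to be measured strictly *after* the headline was published, to avoid look-ahead; the exact return window, the column names, and the toy values are all illustrative.

```python
import pandas as pd

# Hypothetical inputs: one row per headline, one row per (ticker, date) forward return
headlines_df = pd.DataFrame({
    "ticker": ["AAPL", "TSLA", "NVDA"],
    "date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
    "headline": [
        "Apple reports record quarterly revenue",
        "Tesla recalls vehicles over safety concerns",
        "NVIDIA data center revenue surges",
    ],
})
returns_df = pd.DataFrame({
    "ticker": ["AAPL", "TSLA", "NVDA"],
    "date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03"]),
    # Forward return after the headline (illustrative numbers, not real data)
    "fwd_return": [0.012, -0.034, 0.021],
})

def label_headlines(headlines_df, returns_df):
    """Join each headline to its forward return and label the direction (1 = up, 0 = down)."""
    merged = headlines_df.merge(returns_df, on=["ticker", "date"], how="inner")
    merged["label"] = (merged["fwd_return"] > 0).astype(int)
    return merged

labeled = label_headlines(headlines_df, returns_df)
texts_real = labeled["headline"].tolist()
labels_real = labeled["label"].to_numpy()
print(labeled[["ticker", "headline", "fwd_return", "label"]])
```

An inner merge drops headlines with no matching return instead of silently labeling them, which keeps `texts_real` and `labels_real` aligned.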

## 3. Extract Hidden States at Every Layer

This is the core of probing. We use **nnterp** (a standardized wrapper around nnsight, the library from my ICLR 2025 paper) to extract hidden states from every transformer layer. The key advantage: `model.layers_output[i]` works the same way across the HuggingFace architectures nnterp supports, with no need to know model-specific module names.

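We use the last token's representation. In a causal LM each position attends only to itself and earlier tokens, so the last token is the only position whose hidden state has seen the entire headline.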
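To make the last-token argument concrete, here is a small NumPy sketch, independent of nnterp, that builds a causal attention mask for a 5-token sequence and counts how many positions each token can see. It also contrasts last-token pooling with mean pooling, a common alternative. The random array is a stand-in for one layer's hidden states and is purely illustrative.

```python
import numpy as np

seq_len, toy_dim = 5, 8
rng = np.random.default_rng(0)
# Stand-in for one layer's hidden states for a single headline: (seq_len, toy_dim)
h = rng.normal(size=(seq_len, toy_dim))

# Causal mask: position i may attend to positions 0..i (lower-triangular)
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
visible = causal_mask.sum(axis=1)
print("positions visible to each token:", visible)  # [1 2 3 4 5]: only the last token sees all 5

# Last-token pooling: the only position conditioned on the full sequence
last_token_repr = h[-1]
# Mean pooling: averages positions that each saw only a prefix of the sequence
mean_repr = h.mean(axis=0)
print(last_token_repr.shape, mean_repr.shape)
```

Mean pooling is still a reasonable baseline to compare against, but in a causal model most of the positions it averages never saw the end of the headline.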

In [None]:
def extract_hidden_states(texts, model, n_layers):
    """Extract the last-token hidden state from every transformer layer using nnterp.

    Uses nnterp's standardized layer access (model.layers_output[i]) for clean extraction.
    Returns: dict mapping layer_idx -> np.array of shape (n_texts, hidden_dim)
    """
    all_hidden = {l: [] for l in range(n_layers)}

    for text in texts:
        with model.trace(text):
            # Save each transformer layer's output at the last token (causal LM standard)
            layer_outs = []
            for i in range(n_layers):
                h = model.layers_output[i][:, -1, :].save()
                layer_outs.append(h)

        for i in range(n_layers):
            h = layer_outs[i]
            # Older nnsight versions return a proxy holding the tensor in .value;
            # newer versions return the tensor itself once the trace exits
            h = getattr(h, "value", h)
            all_hidden[i].append(h.detach().cpu().float().numpy().squeeze())

    return {l: np.stack(v) for l, v in all_hidden.items()}

# Extract hidden states from all layers
print(f"Extracting hidden states from {n_layers} transformer layers using nnterp...")
hidden_states = extract_hidden_states(texts, model, n_layers)
print(f"Done! Shape per layer: {hidden_states[0].shape}")

## 4. Train Linear Probes Per Layer

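A linear probe is a simple logistic regression trained on a layer's hidden states to predict the label. If the probe achieves high *held-out* (cross-validated) accuracy, the label is linearly decodable from that layer. Held-out accuracy is essential here: with 40 examples and hundreds of hidden dimensions, almost any labeling is linearly separable on the training set.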

**This is the key experiment**: which layer of Gemma-3 best encodes financial sentiment/return direction?
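Why insist on cross-validated accuracy? The sketch below is a random-label control that is not part of the original pipeline. It uses random Gaussian features with the same shape as this notebook's probing data (40 examples by 640 dims, assuming Gemma-3-270M's hidden size) and *random* labels, so there is no real signal. Training accuracy is typically perfect while cross-validated accuracy stays near chance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
toy_n, toy_dim = 40, 640
X_noise = rng.normal(size=(toy_n, toy_dim))
# Random balanced labels: 20 ones and 20 zeros in shuffled order (no real signal)
y_random = rng.permutation(np.repeat([0, 1], toy_n // 2))

probe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, C=1.0))

# Training accuracy: fit and score on the same 40 examples
train_acc = probe.fit(X_noise, y_random).score(X_noise, y_random)

# Held-out accuracy with a 5-fold stratified protocol
cv_control = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_acc = cross_val_score(probe, X_noise, y_random, cv=cv_control, scoring="accuracy").mean()

print(f"train accuracy: {train_acc:.2f}, 5-fold CV accuracy: {cv_acc:.2f}")
```

Running the same control on the real hidden states (with shuffled labels) gives a per-layer chance baseline that is more honest than the fixed 50% line.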

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold

# Train a linear probe at each layer
probe_results = []
# Shuffled, stratified folds (the dataset is ordered by label); the same splits for every layer
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for layer_idx in range(n_layers):
    X = hidden_states[layer_idx]

    # Standardize inside the pipeline so the scaler is fit only on each training fold
    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, C=1.0, random_state=42),
    )
    scores = cross_val_score(clf, X, labels, cv=cv, scoring='accuracy')

    probe_results.append({
        "layer": layer_idx,
        "mean_accuracy": scores.mean(),
        "std_accuracy": scores.std(),
        "layer_type": f"transformer_{layer_idx}",
    })

probe_df = pd.DataFrame(probe_results)
print("Probing Results by Layer:")
print(probe_df[["layer", "mean_accuracy", "std_accuracy"]].to_string(index=False, float_format="%.3f"))

## 5. The Probing Curve
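This is the signature plot of probing analysis: it shows WHERE in the network the financial signal is linearly decodable. With only 40 headlines (8 per test fold), each fold's accuracy moves in steps of 12.5%, so treat small differences between layers as noise.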
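As a rough guide to how much of the curve is noise, the binomial standard error of an accuracy estimate is $\sqrt{p(1-p)/n}$. The snippet below evaluates it for the 40-headline dataset at an illustrative accuracy of 75% (a hypothetical value, not a result from this notebook). It treats the 40 cross-validated predictions as independent, which is only approximately true.

```python
import math

def accuracy_standard_error(p, n):
    """Binomial standard error of an accuracy p estimated from n independent predictions."""
    return math.sqrt(p * (1 - p) / n)

se = accuracy_standard_error(0.75, 40)
print(f"standard error: {se:.3f}")  # 0.068, i.e. about +/- 7 percentage points
# Approximate 95% interval: p +/- 1.96 * se
print(f"95% interval: {0.75 - 1.96 * se:.2f} to {0.75 + 1.96 * se:.2f}")  # 0.62 to 0.88
```

An interval that wide means two layers a few points apart on the probing curve are statistically indistinguishable at this sample size.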

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

# Plot probing accuracy by layer
ax.plot(probe_df["layer"], probe_df["mean_accuracy"], 'o-', color='steelblue', 
        linewidth=2, markersize=8, label="Probe Accuracy")
ax.fill_between(probe_df["layer"], 
                probe_df["mean_accuracy"] - probe_df["std_accuracy"],
                probe_df["mean_accuracy"] + probe_df["std_accuracy"],
                alpha=0.2, color='steelblue')

# Reference lines
ax.axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label="Random Baseline (50%)")
best_layer = probe_df.loc[probe_df["mean_accuracy"].idxmax()]
ax.axvline(x=best_layer["layer"], color='green', linestyle='--', alpha=0.5,
           label=f"Best Layer: {int(best_layer['layer'])} ({best_layer['mean_accuracy']:.1%})")

ax.set_xlabel("Transformer Layer Index", fontsize=12)
ax.set_ylabel("5-Fold CV Accuracy", fontsize=12)
ax.set_title("Financial Sentiment Probing Curve (Gemma-3-270M)", fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.set_xticks(range(n_layers))
ax.set_ylim(0.4, 1.0)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nBest probing layer: {int(best_layer['layer'])} with accuracy {best_layer['mean_accuracy']:.1%}")
print(f"Interpretation: the label is most linearly decodable at layer {int(best_layer['layer'])} on this toy dataset")

## 6. Compare Feature Quality Across Layers
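Project the hidden states at several depths (plus the best probing layer) onto their top two principal components to see how class separation evolves. PCA is unsupervised, so a layer can probe well even if the classes overlap in these two directions.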

In [None]:
from sklearn.decomposition import PCA

# Compare embeddings from different layers: quartiles, last layer, and the best probing layer
layers_to_compare = [0, n_layers // 4, n_layers // 2, 3 * n_layers // 4, n_layers - 1, int(best_layer["layer"])]
layers_to_compare = sorted(set(layers_to_compare))[:6]

fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()

for idx, layer in enumerate(layers_to_compare):
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(hidden_states[layer])

    ax = axes[idx]
    for label, color, name in [(1, 'green', 'Positive'), (0, 'red', 'Negative')]:
        mask = labels == label
        ax.scatter(X_2d[mask, 0], X_2d[mask, 1], c=color, label=name, alpha=0.6, s=50, edgecolors='black', linewidth=0.3)

    acc = probe_df.loc[probe_df["layer"] == layer, "mean_accuracy"].values[0]
    is_best = layer == int(best_layer["layer"])
    # Build the star prefix outside the f-string: a backslash inside an
    # f-string expression is a SyntaxError before Python 3.12
    star = "\u2605 " if is_best else ""
    ax.set_title(f"{star}Layer {layer}\nProbe acc: {acc:.1%}", fontweight='bold' if is_best else 'normal')
    ax.legend(fontsize=8)

# Hide unused panels (when the best layer coincides with one of the fixed layers)
for ax in axes[len(layers_to_compare):]:
    ax.axis("off")

plt.suptitle("Hidden State Geometry Across Layers", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 7. Using Probing Results for Feature Engineering

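The practical implication: **don't just use the last layer**. Use the layer where financial information is most accessible. One caveat: the best layer is chosen using the same CV scores reported for it, so its accuracy below is optimistically biased; a fair comparison needs nested cross-validation or a held-out set.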
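Choosing the best layer and reporting its accuracy on the same folds inflates that accuracy. A minimal nested cross-validation sketch follows: the inner loop picks the layer using only the outer training fold, and the untouched outer test fold scores that choice. `nested_layer_selection` is a hypothetical helper, and the synthetic `demo_states` dict (where only layer 2 carries signal) is a stand-in for the notebook's real per-layer arrays.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def make_probe():
    # Scaler inside the pipeline so it is fit only on training data
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def nested_layer_selection(hidden_states, labels, n_outer=5, n_inner=4, seed=42):
    """Estimate the accuracy of 'pick the best layer, then probe it' without selection bias.

    hidden_states: dict layer_idx -> array (n_examples, hidden_dim)
    Returns (outer-fold accuracies, layer chosen in each outer fold).
    """
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    layers = sorted(hidden_states)
    idx = np.arange(len(labels))
    outer_scores, chosen = [], []

    for train_idx, test_idx in outer.split(idx, labels):
        # Inner loop: rank layers using only the outer training fold
        inner_means = [
            cross_val_score(make_probe(), hidden_states[l][train_idx], labels[train_idx],
                            cv=inner, scoring="accuracy").mean()
            for l in layers
        ]
        best = layers[int(np.argmax(inner_means))]
        # Outer loop: refit on the whole training fold, score once on the test fold
        probe = make_probe().fit(hidden_states[best][train_idx], labels[train_idx])
        outer_scores.append(probe.score(hidden_states[best][test_idx], labels[test_idx]))
        chosen.append(best)

    return np.array(outer_scores), chosen

# Synthetic stand-in: 4 "layers" of 40 x 16 features; only layer 2 carries the label
rng = np.random.default_rng(0)
demo_labels = np.repeat([0, 1], 20)
demo_states = {l: rng.normal(size=(40, 16)) for l in range(4)}
demo_states[2][:, 0] += 3.0 * (2 * demo_labels - 1)  # shift feature 0 by -3 / +3 by class

nested_scores, chosen = nested_layer_selection(demo_states, demo_labels)
print(f"nested CV accuracy: {nested_scores.mean():.3f} +/- {nested_scores.std():.3f}")
print("layer chosen per outer fold:", chosen)
```

On the notebook's data this would be called as `nested_layer_selection(hidden_states, labels)`. If the chosen layer varies a lot across outer folds, the "best layer" is not well determined at this sample size.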

In [None]:
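from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold

# Practical comparison: features from last layer vs best probing layer
best_l = int(best_layer["layer"])
last_l = n_layers - 1
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for layer, name in [(last_l, "Last Layer"), (best_l, "Best Probing Layer")]:
    # Scaler inside the pipeline so it is fit only on each training fold
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, random_state=42))
    scores = cross_val_score(clf, hidden_states[layer], labels, cv=cv, scoring='accuracy')
    print(f"\n{name} (layer {layer}):")
    print(f"  CV Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Note: best_l was selected on CV scores from this same data, so its estimate is optimistic
if best_l == last_l:
    print("\nConclusion: on this dataset the last layer is already the best probing layer")
else:
    print(f"\nConclusion: for feature extraction, prefer layer {best_l} over layer {last_l}")
print("The same workflow applies to other open-weight LLMs: probe first, then extract features from the best layer.")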

## Discussion & Interview Talking Points

### Why This Is My Strongest Unique Contribution
- **Few practitioners do this.** Standard practice is to use the last layer or the [CLS] token; probing often reveals that this is suboptimal.
- **NNsight/NDIF** (my ICLR paper) makes this trivial at scale — we built infrastructure to probe ANY open-weight model without engineering overhead.
- **nnterp** standardizes layer access across architectures: `model.layers_output[i]` works for Gemma, Llama, Mistral, etc.
- **David Bau's lab** pioneered network dissection and model editing (e.g. ROME). This is my daily research.

### The Research Insight
- Early layers: capture syntax and token identity
- Middle layers: capture semantics, entities, relationships
- Late layers: capture features specialized for the pre-training objective (next-token prediction), not necessarily YOUR task
- **For financial applications**: The best layer is likely in the middle-to-upper range, where semantic understanding peaks but pre-training task specialization hasn't yet dominated.

### How I'd Apply This at Numerai
1. **Probe first**: Before any fine-tuning, probe each layer of the target LLM for financial signal
2. **Layer selection**: Use the best probing layer for feature extraction
3. **Multi-layer features**: Concatenate representations from multiple informative layers
4. **Probing as model selection**: Compare different LLMs (Llama, Mistral, FinBERT) by their probing curves — which model has the most financial signal built-in?
5. **Probing during fine-tuning**: Track how probing accuracy changes during fine-tuning to detect overfitting
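Item 3 (multi-layer features) can be sketched as follows: rank layers by probe accuracy, concatenate the top-k layers' hidden states along the feature axis, and probe the result. `top_k_layer_features` is a hypothetical helper; it assumes the same `layer -> (n_examples, hidden_dim)` dict and a per-layer accuracy table shaped like the notebook's `probe_df`.

```python
import numpy as np
import pandas as pd

def top_k_layer_features(hidden_states, probe_df, k=3):
    """Concatenate hidden states from the k layers with the highest mean probe accuracy.

    hidden_states: dict layer_idx -> array (n_examples, hidden_dim)
    probe_df: DataFrame with columns "layer" and "mean_accuracy"
    Returns (features of shape (n_examples, k * hidden_dim), selected layers in depth order).
    """
    top = probe_df.nlargest(k, "mean_accuracy")["layer"].astype(int).tolist()
    top = sorted(top)  # stable, depth-ordered layout for the feature columns
    features = np.concatenate([hidden_states[l] for l in top], axis=1)
    return features, top

# Toy usage with random arrays: 6 layers, 40 examples, 8 dims each
rng = np.random.default_rng(0)
toy_states = {l: rng.normal(size=(40, 8)) for l in range(6)}
toy_probe_df = pd.DataFrame({"layer": range(6),
                             "mean_accuracy": [0.55, 0.60, 0.80, 0.85, 0.75, 0.65]})
X_multi, layers_used = top_k_layer_features(toy_states, toy_probe_df, k=3)
print(X_multi.shape, layers_used)  # (40, 24) [2, 3, 4]
```

Concatenation multiplies the feature dimension by k, which worsens the few-examples, many-dimensions problem, and the layer ranking should come from data separate from the final evaluation.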

### Connection to Token Entanglement (NeurIPS 2025)
My NeurIPS workshop paper studied how information flows between tokens. Applied to finance:
- When a headline mentions "Apple" and "antitrust", how does the model's representation of Apple change?
- Token entanglement analysis could reveal HOW the model processes financial relationships

### Extensions (TODO)
- [ ] Probe for more nuanced labels: return magnitude, volatility, sector
- [ ] Compare probing curves across models: Gemma-3, Llama-3, FinBERT, SmolLM
- [ ] Multi-layer feature concatenation (best 3 layers)
- [ ] Causal probing: not just where, but HOW financial knowledge is encoded
- [ ] Probing with larger datasets (1000+ headlines with real returns)