# Contrastive Financial Embeddings

## Context
Off-the-shelf text embeddings (NB02) capture general semantics but aren't optimized for financial signal. Contrastive learning fine-tunes the embedding space so that texts about stocks with similar returns sit close together and texts about stocks with diverging returns sit far apart.

## My Experience
At Creyon Bio, I used contrastive learning to predict oligo toxicity from 3D electrostatic maps, pushing AUC from 0.73 to 0.88. The same framework applies here: learn a representation in which the downstream signal (stock returns) is encoded in embedding similarity.

## Pipeline
Financial headlines + stock returns → Contrastive pairs (same return quintile = positive, different = negative) → Fine-tune encoder → Financial-signal-optimized embeddings

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import PCA

## 1. Create Training Data
We need (headline, stock_return) pairs. Training with the contrastive loss should pull together the embeddings of headlines whose stocks had similar returns.
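
As a minimal sketch of what such a dataset could look like, the block below builds synthetic (headline, return) pairs and assigns rank-based quintile labels. The templates, tickers, return distributions, and the helper names `make_synthetic_pairs` and `to_quintiles` are all invented for illustration and carry no real market information.

```python
import numpy as np

# Hypothetical headline templates, invented purely for illustration.
TEMPLATES = {
    "up": ["{t} beats earnings expectations", "{t} raises full-year guidance"],
    "flat": ["{t} reports results in line with estimates", "{t} holds annual shareholder meeting"],
    "down": ["{t} misses revenue estimates", "{t} cuts full-year guidance"],
}
TONE_MEAN = {"up": 0.02, "flat": 0.0, "down": -0.02}  # assumed mean return per tone
TICKERS = ["AAA", "BBB", "CCC", "DDD", "EEE"]  # placeholder tickers


def make_synthetic_pairs(n=200, seed=0):
    """Return (headlines, returns) where headline tone loosely tracks the return."""
    rng = np.random.default_rng(seed)
    headlines, returns = [], []
    for _ in range(n):
        tone = str(rng.choice(list(TEMPLATES)))
        template = str(rng.choice(TEMPLATES[tone]))
        ticker = str(rng.choice(TICKERS))
        headlines.append(template.format(t=ticker))
        returns.append(rng.normal(TONE_MEAN[tone], 0.01))
    return headlines, np.array(returns)


def to_quintiles(returns):
    """Rank-based quintile labels: 0 for the lowest returns, 4 for the highest."""
    ranks = returns.argsort().argsort()  # rank of each return, 0 = smallest
    return ranks * 5 // len(returns)


headlines, returns = make_synthetic_pairs()
quintiles = to_quintiles(returns)
```

Ranking before binning keeps the five classes the same size, which simplifies balanced pair sampling later on.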

In [None]:
# Synthetic dataset: headlines with associated stock returns
# In production: real headlines from news APIs + real returns from market data

# TODO: implement
...

## 2. Baseline: Off-the-Shelf Embeddings
Before fine-tuning, see how well vanilla embeddings cluster by return quintile.
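
One simple way to put a number on "how well embeddings cluster by quintile" is a leave-one-out nearest-neighbour check: for each headline, find its most similar other headline and ask whether the two share a quintile. The function below is a sketch of that idea; `nn_label_agreement` is a hypothetical helper, and the embeddings are assumed to be a 2D array with no all-zero rows.

```python
import numpy as np


def nn_label_agreement(embeddings, labels):
    """Fraction of items whose nearest neighbour (by cosine similarity) shares their label."""
    X = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = X @ X.T                                    # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)                   # exclude self-matches
    nearest = sims.argmax(axis=1)
    return float((labels[nearest] == labels).mean())
```

With five balanced quintiles, a score near 0.2 would mean the embedding space carries little return information, so the baseline number gives a reference point for the fine-tuned embeddings.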

In [None]:
# Generate baseline embeddings

# TODO: implement
...

## 3. Contrastive Learning Setup
We use a simple contrastive loss on cosine similarity: raise the similarity of same-quintile pairs and lower the similarity of different-quintile pairs until it falls below a margin.
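
One common way to write such a loss, given here as an assumption since the body of `ContrastiveLoss` is not shown, uses the cosine similarity $s_{ij}$ between the projected embeddings of a pair, a label $y_{ij} \in \{0, 1\}$ (1 for same quintile), and a margin $m$. It has the same form as PyTorch's `nn.CosineEmbeddingLoss`, which encodes the labels as 1 and -1 instead.

$$
\mathcal{L}_{ij} = y_{ij}\,(1 - s_{ij}) + (1 - y_{ij})\,\max(0,\; s_{ij} - m)
$$

With $m = 0.5$ (the default in the `ContrastiveLoss` signature), a different-quintile pair with $s_{ij} = 0.8$ contributes $0.3$, while one with $s_{ij} = 0.2$ contributes nothing. A same-quintile pair with $s_{ij} = 0.8$ contributes $0.2$.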

In [None]:
class ContrastivePairDataset(Dataset):
    """Generate pairs for contrastive learning.
    Positive pairs: same return quintile
    Negative pairs: different return quintile
    """
    ...

    def __init__(self, embeddings, quintiles, n_pairs=1000):
        ...

    def __len__(self):
        ...

    def __getitem__(self, idx):
        ...

class ProjectionHead(nn.Module):
    """Small projection head to fine-tune embedding space."""
    ...

    def __init__(self, input_dim=384, hidden_dim=128, output_dim=64):
        ...

    def forward(self, x):
        ...

class ContrastiveLoss(nn.Module):
    """Contrastive loss with cosine similarity."""
    ...

    def __init__(self, margin=0.5):
        ...

    def forward(self, emb1, emb2, label):
        ...
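
The pair construction described in the `ContrastivePairDataset` docstring can be sketched without PyTorch. The function below is one possible sampling scheme, not the notebook's implementation: `sample_pairs` is a hypothetical name, and alternating positive and negative pairs to balance the classes is an assumption.

```python
import numpy as np


def sample_pairs(quintiles, n_pairs=1000, seed=0):
    """Return (idx_a, idx_b, label) arrays; label 1 = same quintile, 0 = different."""
    rng = np.random.default_rng(seed)
    quintiles = np.asarray(quintiles)
    counts = np.unique(quintiles, return_counts=True)[1]
    if len(counts) < 2 or counts.min() < 2:
        raise ValueError("need at least two quintiles with at least two items each")

    idx_a = np.empty(n_pairs, dtype=int)
    idx_b = np.empty(n_pairs, dtype=int)
    labels = np.empty(n_pairs, dtype=int)
    for k in range(n_pairs):
        positive = k % 2 == 0  # alternate to keep the two classes balanced
        i = rng.integers(len(quintiles))
        same = quintiles == quintiles[i]
        if positive:
            candidates = np.flatnonzero(same)
            candidates = candidates[candidates != i]  # never pair an item with itself
        else:
            candidates = np.flatnonzero(~same)
        idx_a[k], idx_b[k], labels[k] = i, rng.choice(candidates), int(positive)
    return idx_a, idx_b, labels
```

A `Dataset` wrapper could then index the embedding matrix with `idx_a[k]` and `idx_b[k]` in `__getitem__` and return the pair together with `labels[k]`.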


## 4. Train the Projection Head

In [None]:
# Create dataset and dataloader

# TODO: implement
...
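
The following is a rough, self-contained sketch of what this training step might look like. Several things are assumptions rather than the notebook's actual code: random stand-in vectors replace the sentence embeddings, an `nn.Sequential` with the 384/128/64 sizes from the `ProjectionHead` signature stands in for that class, the loss is a cosine-margin form with the 0.5 margin from the `ContrastiveLoss` signature, and the optimizer, learning rate, batch size, and epoch count are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in data: random vectors plus a quintile-dependent offset (not real embeddings).
n, dim = 200, 384
quintiles = torch.arange(n) % 5
embeddings = torch.randn(n, dim) + 3.0 * F.one_hot(quintiles, dim).float()

# Random pairs; label 1.0 = same quintile. Roughly 20% of pairs are positive here.
idx_a = torch.randint(0, n, (2000,))
idx_b = torch.randint(0, n, (2000,))
keep = idx_a != idx_b  # drop trivial self-pairs
idx_a, idx_b = idx_a[keep], idx_b[keep]
labels = (quintiles[idx_a] == quintiles[idx_b]).float()

head = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 64))


def cosine_contrastive_loss(z1, z2, label, margin=0.5):
    """Pull positives toward similarity 1; penalise negatives only above the margin."""
    sim = F.cosine_similarity(z1, z2)
    return (label * (1 - sim) + (1 - label) * F.relu(sim - margin)).mean()


optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
losses = []
for epoch in range(20):
    perm = torch.randperm(len(labels))
    total = 0.0
    for start in range(0, len(perm), 64):
        batch = perm[start:start + 64]
        z1 = head(embeddings[idx_a[batch]])
        z2 = head(embeddings[idx_b[batch]])
        loss = cosine_contrastive_loss(z1, z2, labels[batch])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * len(batch)
    losses.append(total / len(perm))  # mean loss per pair for this epoch

# Project all embeddings through the trained head for later comparison.
head.eval()
with torch.no_grad():
    projected = F.normalize(head(embeddings), dim=1)
```

Only the small head is trained while the encoder stays frozen, so the base embeddings can be computed once and reused across epochs.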

## 5. Compare: Baseline vs Contrastive Embeddings

In [None]:
# Project embeddings through trained projection head

# TODO: implement
...

## 6. Quantitative Evaluation

In [None]:
def avg_similarity_by_group(embeddings, labels):
    """Compute average cosine similarity for same-group and different-group pairs."""
    ...
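
One possible body for `avg_similarity_by_group`, written with plain NumPy rather than the imported `cosine_similarity`, is sketched below. Returning a `(same_group, different_group)` tuple is an assumption, as is the requirement that both kinds of pair exist in the input.

```python
import numpy as np


def avg_similarity_by_group(embeddings, labels):
    """Return (mean same-group cosine similarity, mean different-group cosine similarity)."""
    X = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = X @ X.T
    rows, cols = np.triu_indices(len(X), k=1)  # each unordered pair once, no self-pairs
    pair_sims = sims[rows, cols]
    same = labels[rows] == labels[cols]
    return float(pair_sims[same].mean()), float(pair_sims[~same].mean())
```

The gap between the two numbers is the quantity to compare across the baseline and the contrastive embeddings: a wider gap after training suggests the projection separates quintiles better.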


## Discussion & Interview Talking Points

### Connection to My Experience
- **Creyon Bio**: Used contrastive learning on 3D electrostatic maps to predict oligo toxicity. Pushed AUC from 0.73 to 0.88.
- **Same framework**: Instead of electrostatic maps → toxicity, we have financial text → returns. The contrastive objective is identical.
- **DPO experience**: My recent DPO fine-tuning work (Qwen 32B) uses a similar preference-based optimization. DPO is conceptually contrastive: it raises the likelihood of a preferred response relative to a rejected one.

### Strengths
- **Domain-adapted**: Embeddings are optimized for the actual downstream task (return prediction)
- **Beyond sentiment**: Captures whatever textual patterns correlate with returns, not just positive/negative
- **Composable**: Can use contrastive embeddings as input to any downstream model

### Weaknesses & Considerations
- **Requires labeled data**: Need (text, return) pairs, which means historical market data
- **Temporal leakage risk**: Must use strict temporal train/test splits
- **Overfitting**: With small datasets, the projection head can memorize rather than generalize
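
The temporal-leakage point can be made concrete with a small sketch. It splits on a cutoff date and, as an extra precaution that is an assumption here, derives the quintile boundaries from the training period only so that test-period returns do not influence the labels. `temporal_split` is a hypothetical helper.

```python
import numpy as np


def temporal_split(dates, returns, cutoff):
    """Split by date and label quintiles using boundaries fit on the training period."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    returns = np.asarray(returns, dtype=float)
    train = dates < np.datetime64(cutoff)  # strictly before the cutoff
    edges = np.quantile(returns[train], [0.2, 0.4, 0.6, 0.8])
    quintiles = np.searchsorted(edges, returns, side="right")  # labels 0..4
    return train, ~train, quintiles
```

A random shuffle split, by contrast, would let headlines from the same news event land in both train and test.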

### For Numerai
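- Contrastive embeddings trained on returns that have been **factor-neutralized** would be better aligned with Numerai's scoring metric than embeddings trained on raw returns, although the quintile-based contrastive objective remains a proxy rather than a direct optimization of that metric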
- Could combine with graph features (NB04): contrastive loss where graph-neighbors should have similar embeddings
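
As an illustration of what "factor-neutralized" could mean in practice, the sketch below takes the residual of returns after a least-squares fit on a matrix of factor exposures. This is one standard reading of the term and an assumption here, not a description of Numerai's own procedure. `neutralize` and the `factor_exposures` input are hypothetical, and in a real pipeline the fit would likely be done per date across stocks.

```python
import numpy as np


def neutralize(returns, factor_exposures):
    """Residual returns after regressing out factor exposures (with an intercept)."""
    y = np.asarray(returns, dtype=float)
    factors = np.asarray(factor_exposures, dtype=float)
    X = np.column_stack([np.ones(len(y)), factors])  # intercept plus factors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Quintile labels computed from these residuals, rather than from raw returns, would then define the positive and negative pairs.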

### Extensions (TODO)
- [ ] Use triplet loss instead of pairwise contrastive
- [ ] Fine-tune the encoder itself (not just a projection head) with LoRA
- [ ] Train on factor-neutralized returns for Numerai-specific optimization
- [ ] Add hard negative mining (most confusing cross-quintile pairs)
- [ ] Compare InfoNCE, NT-Xent, and supervised contrastive losses
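
For the hard negative mining item, a minimal sketch of the selection step might look like the following: for each headline, pick the most similar headline from a different quintile. `hardest_negatives` is a hypothetical helper, and the sketch assumes at least two distinct quintiles are present.

```python
import numpy as np


def hardest_negatives(embeddings, quintiles):
    """For each item, the index of its most similar item from a different quintile."""
    X = np.asarray(embeddings, dtype=float)
    q = np.asarray(quintiles)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = X @ X.T
    sims[q[:, None] == q[None, :]] = -np.inf  # mask same-quintile pairs, including self
    return sims.argmax(axis=1)
```

Recomputing these indices periodically during training would keep the negatives hard as the projection changes.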