# 📓 Cross-DEX Statistical Arbitrage Analysis
**Date:** 2025-04-02

## Overview
This notebook explores intra-day statistical arbitrage opportunities for tokens listed on multiple Solana-based decentralized exchanges (DEXes). The objective is to identify and quantify short-term price inefficiencies between DEXes for the same token, and evaluate the statistical validity of arbitrage signals.


## 🧪 Experiment Scope

- **Chains**: Solana  
- **Exchanges**: Raydium, Orca, Meteora  
- **Token universe**: Tokens listed on ≥2 DEXes  
- **Data granularity**: Minute-level time series (1-day partitions per token per exchange)  
- **Source schema**: `SOL_EXCHANGE_TOKEN_FAST`


## 🎯 Objectives

- Align and compare intra-day token price series across DEXes
- Detect price spread violations and convergence patterns
- Apply statistical tests to validate arbitrage opportunities
- Generate per-token/per-day summary metrics


## 🧭 Notebook Plan

### 1. Token Filtering
- Load token mapping from `SOL_EXCHANGE_TOKEN_FAST`
- Filter tokens with multiple DEX listings

### 2. Time-Aligned Price Series
- Load and align price series from each exchange
- Normalize and clean data
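
The alignment step could be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's final loader: it assumes each DEX's prices are already in memory as a pandas Series with a `DatetimeIndex`, and the function name `align_prices`, the DEX labels, and the `max_gap` parameter are hypothetical.

```python
import pandas as pd


def align_prices(series_by_dex: dict[str, pd.Series], max_gap: int = 5) -> pd.DataFrame:
    """Align per-DEX price series on a shared 1-minute grid (hypothetical helper)."""
    # Outer-join all series on timestamp; dict keys become column names.
    frame = pd.concat(series_by_dex, axis=1).sort_index()
    # Snap to a regular 1-minute grid, keeping the last price seen in each minute.
    frame = frame.resample("1min").last()
    # Forward-fill short gaps only, so stale prices are not carried indefinitely.
    frame = frame.ffill(limit=max_gap)
    # Keep only the minutes where every DEX has a price.
    return frame.dropna(how="any")
```

Capping the forward fill matters for thinly traded tokens: an unbounded fill would compare a fresh price on one DEX with an arbitrarily old one on another and report a spread that no trade could capture.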

### 3. Statistical Diagnostics
- Calculate price spreads and visualize
- Run cointegration and stationarity tests
- Analyze rolling correlation and spread behavior
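
As a rough sketch of the spread and rolling-correlation diagnostics, the snippet below works on an aligned price frame with one column per DEX. The function name `spread_diagnostics`, its column names, and the 60-minute default window are assumptions made for illustration. Cointegration and stationarity tests are not shown; `coint` and `adfuller` from `statsmodels.tsa.stattools` are a common choice for those.

```python
import numpy as np
import pandas as pd


def spread_diagnostics(
    aligned: pd.DataFrame, dex_a: str, dex_b: str, window: int = 60
) -> pd.DataFrame:
    """Log-price spread and rolling return correlation for one DEX pair (sketch)."""
    log_a = np.log(aligned[dex_a])
    log_b = np.log(aligned[dex_b])
    out = pd.DataFrame(index=aligned.index)
    # The log spread is approximately the relative price difference between venues.
    out["spread"] = log_a - log_b
    out["spread_bps"] = out["spread"] * 1e4
    # Correlate one-minute log returns rather than price levels, which trend together.
    out["rolling_corr"] = log_a.diff().rolling(window).corr(log_b.diff())
    return out
```

Working in log prices makes the spread comparable across tokens with very different price levels, which is useful when the same thresholds are applied to the whole token universe.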

### 4. Arbitrage Signal Detection
- Detect statistical deviations from spread equilibrium
- Estimate reversion metrics
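
One possible way to implement these two steps is a rolling z-score for the deviation signal and an AR(1) fit for the reversion speed. Both are assumptions here, since the notebook has not yet fixed a method; the names `zscore_signals` and `half_life`, the 2.0 entry threshold, and the sign convention are illustrative only.

```python
import numpy as np
import pandas as pd


def zscore_signals(spread: pd.Series, window: int = 60, entry_z: float = 2.0) -> pd.DataFrame:
    """Flag minutes where the spread sits far from its rolling mean (sketch)."""
    # The rolling window includes the current observation.
    mean = spread.rolling(window).mean()
    std = spread.rolling(window).std()
    zscore = (spread - mean) / std
    out = pd.DataFrame({"zscore": zscore})
    out["signal"] = 0
    out.loc[zscore > entry_z, "signal"] = -1  # spread unusually high: expect it to fall
    out.loc[zscore < -entry_z, "signal"] = 1  # spread unusually low: expect it to rise
    return out


def half_life(spread: pd.Series) -> float:
    """Half-life of mean reversion in observations, from an AR(1) fit (sketch)."""
    s = spread.dropna()
    lagged = s.shift(1).iloc[1:].to_numpy()
    delta = s.diff().iloc[1:].to_numpy()
    # Regress the change in spread on its lagged level: delta = slope * lagged + intercept.
    slope, _intercept = np.polyfit(lagged, delta, 1)
    phi = 1.0 + slope  # implied AR(1) coefficient
    if not 0.0 < phi < 1.0:
        return float("nan")  # no monotone mean reversion detected
    return float(-np.log(2.0) / np.log(phi))
```

The half-life is expressed in observations, so on minute-level data it reads directly as minutes. A `nan` result means the fitted coefficient does not describe a decaying spread, which is itself a reason to skip the token.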

### 5. Summary Reporting
- Output summary table of opportunities
- Generate token-level charts and diagnostics
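
A per-token, per-day summary table might look like the sketch below. It assumes each token's spread is available as a Series in basis points with a `DatetimeIndex`; the function name `summarize_spreads`, the output column names, and the 50 bps threshold are hypothetical choices, not values taken from the notebook.

```python
import pandas as pd

SUMMARY_COLUMNS = [
    "token", "date", "minutes", "mean_bps", "std_bps",
    "max_abs_bps", "pct_above_threshold",
]


def summarize_spreads(
    spreads: dict[str, pd.Series], threshold_bps: float = 50.0
) -> pd.DataFrame:
    """Build one summary row per token per calendar day (sketch)."""
    rows = []
    for token, spread_bps in spreads.items():
        # Group the minute-level observations by calendar date.
        for day, s in spread_bps.groupby(spread_bps.index.date):
            rows.append({
                "token": token,
                "date": day,
                "minutes": int(s.count()),
                "mean_bps": s.mean(),
                "std_bps": s.std(),
                "max_abs_bps": s.abs().max(),
                # Share of minutes in which the absolute spread exceeded the threshold.
                "pct_above_threshold": float((s.abs() > threshold_bps).mean()),
            })
    table = pd.DataFrame(rows, columns=SUMMARY_COLUMNS)
    # Largest observed dislocations first.
    return table.sort_values("max_abs_bps", ascending=False).reset_index(drop=True)
```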


In [15]:
import psycopg2
import os

In [16]:
# 📥 Load Data
# Load token-exchange mapping from the database

filters = {
    "any raydium and any meteora": "(raydium_clmm = TRUE OR raydium_cpmm = TRUE OR raydium_lp = TRUE) AND (meteora_dlmm = TRUE OR meteora_lp = TRUE)",
    "any raydium and any orca": "(raydium_clmm = TRUE OR raydium_cpmm = TRUE OR raydium_lp = TRUE) AND (orca = TRUE)",
}

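# Parenthesize each filter so the OR-joined clauses read unambiguously.
# Note: both filters require a Raydium listing; Orca + Meteora pairs are not covered.
where_clause = " OR ".join(f"({clause})" for clause in filters.values())

query = f"""
SELECT contract_address
FROM SOL_EXCHANGE_TOKEN_FAST
WHERE {where_clause}
"""

# Credentials come from the environment; the connection stays open for later cells.
conn = psycopg2.connect(
    host=os.getenv("DB_HOST"),
    database="crypto",
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)

# The cursor is closed automatically when the `with` block exits.
with conn.cursor() as cur:
    cur.execute(query)
    tokens = [row[0] for row in cur.fetchall()]
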


In [19]:
print(tokens[:10])
print(len(tokens))

['FzAwkijFzSga76USEpo3126GyTRmxcMDbA9TYD2PPBt5', '7LsX88bhz8KiFcEFQYcdP6xMRsiMYPUXgYMPZMUzpump', '8estAzppA3vm4NYVoz3U4vqWpwiveDPC4wf6Bu75pump', '39cAtBLnSxhRWTV2wUR6Q3qJZHrPUg2sU5ebjRwDsqqt', 'DvmWZk2CtXStQnadYoPEiPtrtGfJLqxCqnc5Kss4pump', 'BW6MVG1Bw7KWQe7MNpLn1eVjmwSg3Bixh1YTSAhuQFrw', '3vQrMByMk3iuidz7LKsGXxb5CZtVbVqq6gRC1qvbpump', 'CRQdQmb9TDmG9FFTPEL9gqDvfyF6HxGaHwiq5eybpump', '579t4FvQQ6JsWtoGrAVHWSK6AgRcw3XBJGaDefVL92e1', 'EkpJg7odiCvYgY2Xcm7PuXwjVJyi9Jkvgi4ytuzLQCPw']
22848


In [None]:
# 🔄 Align Price Series
# For each token, extract and align price data across all DEXes

# TODO: Add price loading and alignment logic

In [None]:
# 📊 Statistical Analysis
# Run cointegration, ADF, spread stats, correlation

# TODO: Add statistical tests and visualizations

In [None]:
# 🚨 Signal Detection
# Detect spread breaks and reversions

# TODO: Add threshold logic and opportunity tracking

In [None]:
# 📈 Reporting
# Generate summary tables and plots

# TODO: Create dashboards and export results