# Bayesian Inference for Ethereum Price Movements

### Objective
This notebook analyzes the conditional probabilities of large price movements in Ethereum (ETH-USD) across different timescales. The goal is to determine if an extreme price event in a shorter time frame (e.g., a 2-sigma day) increases the probability of a subsequent extreme event in a longer time frame (e.g., a 2-sigma week or month).

The insights derived can be used to inform options trading strategies by identifying statistically significant momentum patterns.

### Methodology
1.  **Data Fetching**: Load historical ETH-USD data from Yahoo Finance.
2.  **Event Definition**: Define "large price movements" as returns at least 1, 2, or 3 standard deviations above the period's mean return (upside moves only), measured over 1-day, 1-week, 2-week, 1-month, and 2-month windows.
3.  **Conditional Probability**: Calculate `P(Future Event | Initial Event)`. For example: *Given a 2-sigma move today, what is the probability of a 2-sigma move over the next month?*
4.  **Lift Analysis**: Compare the conditional probability to the baseline (unconditional) probability to quantify the predictive power of an event.
5.  **Significance Testing**: Use a chi-square test of independence to check whether the observed relationships are unlikely to have arisen by chance.
6.  **Trading Insights**: Synthesize the results into actionable strategies for options trading.
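
Steps 3 and 4 reduce to counting. As a minimal sketch on made-up flags (the lists and the helper name `conditional_probability_and_lift` are illustrative, not part of the notebook), the conditional probability is the share of initial-event dates on which the future event also occurred, and lift is that share divided by the unconditional rate:

```python
# Hypothetical per-date flags: did the initial / future event occur?
initial = [True, False, True, False, False, True, False, False, True, False]
future = [True, False, True, False, True, False, False, False, True, False]


def conditional_probability_and_lift(initial, future):
    """Return (P(future | initial), P(future), lift) from paired boolean flags."""
    n_initial = sum(initial)
    n_joint = sum(1 for a, b in zip(initial, future) if a and b)
    baseline = sum(future) / len(future)
    conditional = n_joint / n_initial if n_initial else float("nan")
    lift = conditional / baseline if baseline > 0 else float("nan")
    return conditional, baseline, lift


cond, base, lift = conditional_probability_and_lift(initial, future)
print(f"P(future | initial) = {cond:.2f}, baseline = {base:.2f}, lift = {lift:.2f}x")
```

A lift of 1.0x means the initial event carries no information about the future event; values above 1.0x mean the future event was more frequent than usual after the initial event.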

## 1. Setup and Configuration

In [16]:
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from scipy import stats
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# --- Plotting and Display Configuration ---
plt.style.use('dark_background')
sns.set_palette("viridis")
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format', '{:.2f}'.format)

print("✅ Libraries imported and environment configured.")

✅ Libraries imported and environment configured.


## 2. Data Fetching and Preparation

In [17]:
# --- Analysis Parameters ---
TICKER = 'ETH-USD'
START_DATE = '2017-01-01'
END_DATE = datetime.now().strftime('%Y-%m-%d')

# Time periods for analysis (in days)
TIME_PERIODS = {
    '1 Day': 1,
    '1 Week': 7,
    '2 Weeks': 14,
    '1 Month': 30,
    '2 Months': 60
}

# Sigma levels to test
SIGMA_LEVELS = [1, 2, 3]

# --- Data Fetching ---
print(f"📊 Fetching Ethereum data from {START_DATE} to {END_DATE}...")
try:
    eth_data = yf.download(TICKER, start=START_DATE, end=END_DATE)
    # 'Close' can come back as a one-column DataFrame; squeeze() reduces it to a Series
    eth_prices = eth_data['Close'].squeeze().dropna()

    if eth_prices.empty:
        raise ValueError("No data fetched. Check ticker or date range.")

    print(f"✅ Data fetched successfully.")
    print(f"   - Data range: {eth_prices.index.min().strftime('%Y-%m-%d')} to {eth_prices.index.max().strftime('%Y-%m-%d')}")
    print(f"   - Total days: {len(eth_prices):,}")

    # Convert to a standard Python float before formatting to avoid TypeError
    current_price = float(eth_prices.iloc[-1])
    print(f"   - Current ETH price: ${current_price:,.2f}")

except Exception as e:
    print(f"❌ Error fetching data: {e}")
    raise  # later cells need eth_prices, so do not continue past a failed download

[*********************100%***********************]  1 of 1 completed

📊 Fetching Ethereum data from 2017-01-01 to 2025-07-17...
✅ Data fetched successfully.
   - Data range: 2017-11-09 to 2025-07-16
   - Total days: 2,807
   - Current ETH price: $3,371.51





## 3. Calculate Rolling Returns and Sigma Events

Here, we define what constitutes a 1, 2, or 3-sigma event for each time period by calculating rolling returns and their statistical properties.
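
To make the definition concrete, here is a small sketch in plain Python on hypothetical prices (the helper names `k_day_returns` and `sigma_thresholds` are illustrative). It assumes, as the cell below does, that an event is an upside move: a return at or above the mean plus `s` sample standard deviations.

```python
from statistics import mean, stdev


def k_day_returns(prices, k):
    """Simple return over k days: p[t] / p[t-k] - 1, the quantity pct_change(k) computes."""
    return [prices[t] / prices[t - k] - 1 for t in range(k, len(prices))]


def sigma_thresholds(returns, sigma_levels=(1, 2, 3)):
    """Upside thresholds mean + s * std, using the sample standard deviation."""
    mu, sd = mean(returns), stdev(returns)
    return {f"{s}σ": mu + s * sd for s in sigma_levels}


# Hypothetical prices, for illustration only
prices = [100, 102, 101, 105, 104, 110, 108, 115, 113, 120]
returns = k_day_returns(prices, k=1)
thresholds = sigma_thresholds(returns)

# Flag every return that meets or exceeds each threshold
events = {key: [r >= thr for r in returns] for key, thr in thresholds.items()}

for key, thr in thresholds.items():
    print(f"{key}: threshold {thr:.2%}, events {sum(events[key])} of {len(returns)}")
```

Because each threshold is higher than the last, every 3σ event is also a 2σ and a 1σ event, so the event counts can only shrink as the sigma level rises.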

In [24]:
def calculate_sigma_events(prices, time_periods, sigma_levels):
    """
    Calculates rolling returns, thresholds, and identifies sigma events.
    """
    analysis_results = {}
    summary_data = []

    print("🔍 Calculating rolling returns and sigma thresholds...")
    
    for period_name, days in time_periods.items():
        returns = prices.pct_change(days).dropna()
        
        mean_return = returns.mean()
        std_return = returns.std()
        
        sigma_thresholds = {f'{s}σ': mean_return + s * std_return for s in sigma_levels}
        # sigma_thresholds keys already carry the σ suffix ('1σ', '2σ', ...), so reuse them unchanged
        sigma_events = {key: returns >= threshold for key, threshold in sigma_thresholds.items()}
        
        analysis_results[period_name] = {
            'returns': returns,
            'mean': mean_return,
            'std': std_return,
            'sigma_thresholds': sigma_thresholds,
            'sigma_events': sigma_events
        }
        
        # For summary table
        row = {'Period': period_name, 'Mean Return': mean_return, 'Std Dev': std_return}
        for s, t in sigma_thresholds.items():
            row[f'{s} Threshold'] = t
        summary_data.append(row)

    print("✅ Calculations complete.")
    
    # Create and display summary table
    summary_df = pd.DataFrame(summary_data).set_index('Period')
    print("\n📋 Sigma Threshold Summary (Required Return for Event):")
    print(summary_df.to_string(float_format='{:.2%}'.format))
    
    return analysis_results, summary_df

eth_analysis, summary_df = calculate_sigma_events(eth_prices, TIME_PERIODS, SIGMA_LEVELS)

🔍 Calculating rolling returns and sigma thresholds...
✅ Calculations complete.

For the 1-day horizon, this run covers 2,806 daily returns (2017-11-10 to 2025-07-16) with a mean of 0.00 and a standard deviation of 0.05 (rounded to two decimals), which puts the 1σ, 2σ, and 3σ thresholds at about 0.05, 0.09, and 0.14.

## 4. Conditional Probability Analysis

This is the core of the analysis. We build a function to calculate the conditional probability of a future event given an initial event has occurred. We also calculate the "lift" to see how much more likely the event becomes.
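
One subtlety is what "future" means. A k-day return computed over a trailing window is stamped with the date the window ends, so a longer trailing window that ends shortly after a big day still contains that day's move. To condition on today's move and look ahead, the longer-horizon return has to be measured from today forward. A minimal sketch in plain Python, with made-up prices and illustrative helper names:

```python
def trailing_return(prices, t, k):
    """Return over the k days ending at index t (None if the window is incomplete)."""
    return prices[t] / prices[t - k] - 1 if t - k >= 0 else None


def forward_return(prices, t, k):
    """Return over the k days starting at index t (None if it runs past the data)."""
    return prices[t + k] / prices[t] - 1 if t + k < len(prices) else None


prices = [100, 110, 112, 111, 115, 118]  # hypothetical prices
t = 1  # day of the initial 1-day move (100 -> 110)

initial_move = trailing_return(prices, t, 1)     # the move that ends today
overlapping = trailing_return(prices, t + 2, 3)  # window 100 -> 111 still contains that move
subsequent = forward_return(prices, t, 3)        # window 110 -> 115 starts after it

print(f"initial: {initial_move:.2%}, overlapping: {overlapping:.2%}, subsequent: {subsequent:.2%}")
```

In pandas, a forward-looking series can be obtained from a trailing k-row return series with `shift(-k)`, which leaves missing values on the last k dates where the future window is not yet known.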

In [19]:
def analyze_conditional_probabilities(analysis_data, time_periods, sigma_levels):
    """
    Calculates the conditional probability and lift for all event combinations.
    """
    results = []
    period_names = list(time_periods.keys())

    print("🧮 Analyzing conditional probabilities...")

    # Iterate through each combination of periods and sigma levels
    for i, initial_period in enumerate(period_names):
        for future_period in period_names[i+1:]: # Only look at longer future periods
            for initial_sigma in sigma_levels:
                for future_sigma in sigma_levels:
                    
                    initial_event_key = f'{initial_sigma}σ'
                    future_event_key = f'{future_sigma}σ'
                    
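                    # Boolean flags for when each event occurred
                    initial_events = analysis_data[initial_period]['sigma_events'][initial_event_key]
                    future_events = analysis_data[future_period]['sigma_events'][future_event_key]
                    
                    # pct_change(days) is a trailing return, so shift the longer-period flags back by
                    # that period's length: each date is then paired with the window that starts on it
                    future_events = future_events.shift(-time_periods[future_period])
                    
                    # Align by date; the shift leaves NaNs at the end, so drop them and restore bool dtype
                    df = pd.DataFrame({'initial': initial_events, 'future': future_events}).dropna().astype(bool)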
                    
                    # Filter for dates where the initial event happened
                    initial_event_dates = df[df['initial']]
                    
                    if not initial_event_dates.empty:
                        # Count how many of those were followed by a future event
                        joint_events = initial_event_dates['future'].sum()
                        total_initial_events = len(initial_event_dates)
                        
                        # Calculate probabilities
                        conditional_prob = joint_events / total_initial_events
                        baseline_prob = df['future'].mean()
                        lift = conditional_prob / baseline_prob if baseline_prob > 0 else 0
                        
                        # Chi-Square Test for significance
                        contingency_table = pd.crosstab(df['initial'], df['future'])
                        if contingency_table.shape == (2, 2):
                            chi2, p_value, _, _ = stats.chi2_contingency(contingency_table)
                        else:
                            p_value = 1.0 # Not enough data for a valid test

                        results.append({
                            'Initial Period': initial_period,
                            'Initial Sigma': initial_sigma,
                            'Future Period': future_period,
                            'Future Sigma': future_sigma,
                            'Cond. Prob.': conditional_prob,
                            'Baseline Prob.': baseline_prob,
                            'Lift': lift,
                            'P-Value': p_value,
                            'Initial Events': total_initial_events
                        })

    print("✅ Analysis complete.")
    return pd.DataFrame(results)

conditional_results_df = analyze_conditional_probabilities(eth_analysis, TIME_PERIODS, SIGMA_LEVELS)

🧮 Analyzing conditional probabilities...



## 5. Results and Interpretation

Now we can analyze the results. We will focus on scenarios with:
1.  **High Lift**: Where the conditional probability is much higher than the baseline.
2.  **Statistical Significance**: Where the P-Value is low (e.g., < 0.05), indicating the relationship is likely not random.
3.  **Sufficient Sample Size**: Where there are enough initial events to make the result credible.
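
For the second criterion, it can help to see what the chi-square test computes. The sketch below works through a 2x2 table by hand in plain Python; the counts and the function name `chi_square_2x2` are hypothetical. It applies Yates' continuity correction, which `scipy.stats.chi2_contingency` also does by default for 2x2 tables, and uses the fact that with one degree of freedom the p-value equals `erfc(sqrt(statistic / 2))`.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = initial event (no, yes), columns = future event (no, yes)
table = [[900, 60], [25, 15]]
stat, p_value = chi_square_2x2(table)
print(f"chi2 = {stat:.2f}, p-value = {p_value:.2e}")
```

The test assumes independent observations. Rolling returns on consecutive days share most of their window, so p-values from overlapping windows tend to look stronger than the evidence warrants; treating them as a screening aid rather than a proof is the safer reading.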

In [None]:
print("📈 Top 15 Most Significant Momentum Patterns (Sorted by Lift)")

significant_patterns = conditional_results_df[
    (conditional_results_df['P-Value'] < 0.05) & 
    (conditional_results_df['Initial Events'] >= 10) # Ensure a reasonable sample size
].sort_values('Lift', ascending=False)

if significant_patterns.empty:
    print("\n❌ No statistically significant patterns found with at least 10 initial events.")
    print("Consider relaxing the P-Value or sample size constraints.")
else:
    # Formatting for display
    display_format = {
        'Cond. Prob.': '{:.2%}',
        'Baseline Prob.': '{:.2%}',
        'Lift': '{:.2f}x',
        'P-Value': '{:.3f}'
    }
    display(significant_patterns.head(15).style.format(display_format).background_gradient(cmap='viridis', subset=['Lift', 'Cond. Prob.']))

### Building the Probability Chain

Let's examine the specific question: **"Given a 2-sigma event in a shorter period, what happens next?"**

In [None]:
def display_probability_chain(df, initial_period, initial_sigma):
    """
    Filters and displays the chain of conditional probabilities for a given starting event.
    """
    print(f"\n🔗 Probability Chain: Given a {initial_sigma}σ move in {initial_period}")
    print("-" * (40 + len(initial_period) + len(str(initial_sigma))))
    
    chain = df[
        (df['Initial Period'] == initial_period) & 
        (df['Initial Sigma'] == initial_sigma) & 
        (df['P-Value'] < 0.1) # Use a slightly more relaxed p-value for exploration
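    ].sort_values(
        ['Future Period', 'Future Sigma'],
        # Order periods by length in days rather than alphabetically
        key=lambda col: col.map(TIME_PERIODS) if col.name == 'Future Period' else col
    )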
    
    if chain.empty:
        print("No significant follow-on events found for this initial condition.")
        return
        
    # Formatting for display
    display_format = {
        'Cond. Prob.': '{:.2%}',
        'Baseline Prob.': '{:.2%}',
        'Lift': '{:.2f}x',
        'P-Value': '{:.3f}'
    }
    
    display(chain[['Future Period', 'Future Sigma', 'Cond. Prob.', 'Baseline Prob.', 'Lift', 'P-Value']]
            .style.format(display_format).background_gradient(cmap='magma', subset=['Lift']))

# --- Display Chains for Key Scenarios ---
display_probability_chain(conditional_results_df, '1 Day', 2)
display_probability_chain(conditional_results_df, '1 Week', 2)
display_probability_chain(conditional_results_df, '1 Week', 3)

## 6. Visualization of Key Relationships

A heatmap is an effective way to visualize the "Lift" across all combinations. We are looking for hot spots, which indicate a strong predictive relationship.

In [None]:
def plot_lift_heatmap(df, initial_sigma, future_sigma):
    """
    Creates a heatmap of the Lift factor for a specific sigma-to-sigma relationship.
    """
    plt.figure(figsize=(10, 7))
    
    # Filter for the specific relationship, e.g., 2-sigma moves leading to 2-sigma moves
    filtered_df = df[
        (df['Initial Sigma'] == initial_sigma) & 
        (df['Future Sigma'] == future_sigma)
    ]
    
    if filtered_df.empty:
        print(f"No data for {initial_sigma}σ -> {future_sigma}σ relationship.")
        return
        
    # Pivot the data to create a matrix for the heatmap
    pivot_table = filtered_df.pivot(index='Initial Period', columns='Future Period', values='Lift')
    
    # Order the axes logically
    period_order = list(TIME_PERIODS.keys())
    pivot_table = pivot_table.reindex(index=period_order, columns=period_order)
    
    sns.heatmap(pivot_table, annot=True, fmt=".2f", cmap="viridis", linewidths=.5, cbar_kws={'label': 'Lift Factor'})
    plt.title(f'Lift Factor for {initial_sigma}σ Initial Moves to {future_sigma}σ Future Moves')
    plt.ylabel('Initial Time Period')
    plt.xlabel('Future Time Period')
    plt.show()

print("🔥 Visualizing Lift: How much more likely is a future event given an initial one?")
# Heatmap for 2-sigma events leading to 2-sigma events
plot_lift_heatmap(conditional_results_df, initial_sigma=2, future_sigma=2)

# Heatmap for 2-sigma events leading to 1-sigma events
plot_lift_heatmap(conditional_results_df, initial_sigma=2, future_sigma=1)

## 7. Summary and Trading Implications

Based on the analysis, we can draw several conclusions for options trading.

### Key Findings:
1.  **Momentum is Real**: Where the lift is well above 1.0x and the p-value is below 0.05, large upward price movements in Ethereum have historically been followed by further upward movements more often than the baseline rate.
2.  **Time-Dependent Patterns**: The strength of this momentum varies by timescale. Short-term events (1-Day, 1-Week) are the main candidates as predictors of medium-term outcomes (2-Weeks, 1-Month), so compare lift across the `Future Period` column.
3.  **Magnitude Matters**: 3-sigma events can show a very high lift factor, but they are rare, so check the `Initial Events` count before treating a high lift as a strong signal.

### How to Use This for Options Trading:
*   **Trade Entry Signal**: An initial event that belongs to a statistically significant pattern (e.g., a 2-sigma move over the past week, where that pattern's p-value is below 0.05) can serve as an entry signal for a directional trade.
*   **Strategy: Buying Calls or Bull Call Spreads**: When you observe a strong initial event, the data suggests a higher probability of continued upward movement. This is a favorable environment for buying calls or setting up bull call spreads.
*   **Choosing Expiration**: The `Future Period` in the analysis provides a data-driven guide for selecting an option's expiration date. If a '1 Week' event has the highest lift for a '1 Month' future period, consider options with 30-45 days to expiration.
*   **Position Sizing**: The `Lift` factor can inform position sizing. A signal with a 4.0x lift is a much higher conviction setup than one with a 1.5x lift and may justify a larger allocation of capital.

**Example Trade Idea:**
If the analysis shows that a `1 Week` `2σ` event leads to a `1 Month` `2σ` event with a high lift and low p-value:
1.  **Monitor**: Watch for a 7-day return that exceeds the calculated 2-sigma threshold.
2.  **Action**: When the event is triggered, buy a call option with approximately 30 days to expiration.
3.  **Strike Selection**: The `1 Month` 2-sigma return threshold gives you a potential price target, helping you select a strike price.
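
The strike-selection step is simple arithmetic: a return threshold converts to a price level by multiplying the spot price by one plus the threshold. A short sketch, where the spot price, the threshold value, and the function name `strike_reference` are all hypothetical placeholders:

```python
def strike_reference(spot, threshold_return):
    """Price level implied by a return threshold: spot * (1 + threshold)."""
    return spot * (1 + threshold_return)


spot = 3000.0               # hypothetical ETH price when the signal fires
threshold_1m_2sigma = 0.60  # hypothetical 1-month 2σ return threshold

target = strike_reference(spot, threshold_1m_2sigma)
print(f"Reference price for strike selection: ${target:,.2f}")
```

In the notebook, the placeholder threshold would come from `eth_analysis['1 Month']['sigma_thresholds']['2σ']`, and the resulting level is a reference point for comparing strikes, not a forecast.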

**Disclaimer**: This analysis is based on historical data and does not guarantee future results. It should be used as one tool among many for making trading decisions. Always use proper risk management.