# Scaled Position Update Strategy

In this notebook, we implement a mean reversion strategy that:
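1. Scales position size linearly with the distance from a fair price of 2000: from 0 at 2000 up to +1 (fully long) at 1800, and down to -1 (fully short) at 2200, capped at ±1 beyond those levels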
2. Updates positions every X timestamps, where X is a parameter
3. Evaluates performance with different update frequencies

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

## 1. Load Real Data

First, let's load the SQUID_INK price data from the Prosperity 3 round 1 CSV files.

In [None]:
# Define function to load price data from CSV files
def load_price_data(round_num, day_num):
    import os
    
    # Path to data directory
    data_path = '../../../Prosperity 3 Data'
    
    # Construct file path
    file_path = os.path.join(data_path, f'Round {round_num}/prices_round_{round_num}_day_{day_num}.csv')
    
    # Check if file exists
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        return pd.DataFrame()
    
    # Load data
    try:
        data = pd.read_csv(file_path, sep=';')
        print(f"Successfully loaded {len(data)} rows from {file_path}")
    except Exception as e:
        print(f"Error loading file {file_path}: {e}")
        return pd.DataFrame()
    
    return data

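# Load data for all days in round 1
print("Loading price data...")
day_frames = []

for day in range(-2, 1):
    day_data = load_price_data(1, day)
    if len(day_data) > 0:
        # Offset timestamps by 10**6 per day so consecutive days line up end to end
        day_data['timestamp'] += 10**6 * (day + 2)
        day_frames.append(day_data)

# Concatenate once; an empty DataFrame keeps the check below working if nothing loaded
all_data = pd.concat(day_frames, ignore_index=True) if day_frames else pd.DataFrame()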

# Check if data was loaded successfully
if len(all_data) == 0:
    raise ValueError("No data was loaded. Please check the data directory path.")

In [None]:
# Check the columns in the loaded data
print(f"Columns in loaded data: {all_data.columns.tolist()}")

# Filter for SQUID_INK
if 'product' in all_data.columns:
    prices = all_data[all_data['product'] == 'SQUID_INK'].copy()
    print(f"Filtered for SQUID_INK using 'product' column: {len(prices)} rows")
elif 'symbol' in all_data.columns:
    prices = all_data[all_data['symbol'] == 'SQUID_INK'].copy()
    print(f"Filtered for SQUID_INK using 'symbol' column: {len(prices)} rows")
else:
    raise ValueError("Could not find 'product' or 'symbol' column in the data.")

# Check if we have any data after filtering
if len(prices) == 0:
    raise ValueError("No SQUID_INK data found after filtering.")

print(f"Loaded {len(prices)} SQUID_INK price data points")

# Sort by timestamp to ensure chronological order
prices = prices.sort_values('timestamp').reset_index(drop=True)
print("Sorted data by timestamp")

# Limit to the first 20,000 rows (in-sample data)
in_sample_prices = prices.iloc[:20000]
print(f"Limited to {len(in_sample_prices)} in-sample data points")

In [None]:
# Extract price data from the real data
print(f"Available columns in price data: {in_sample_prices.columns.tolist()}")

# Use mid_price as our primary price source
if 'mid_price' in in_sample_prices.columns:
    squid_price = in_sample_prices['mid_price']
    print("Using mid_price column for price data")
# If mid_price is not available, calculate it from bid and ask
elif 'bid_price_1' in in_sample_prices.columns and 'ask_price_1' in in_sample_prices.columns:
    squid_price = (in_sample_prices['bid_price_1'] + in_sample_prices['ask_price_1']) / 2
    print("Calculated mid price from bid_price_1 and ask_price_1")
# Fall back to other price columns if available
elif 'vwap' in in_sample_prices.columns:
    squid_price = in_sample_prices['vwap']
    print("Using vwap column for price data")
elif 'price' in in_sample_prices.columns:
    squid_price = in_sample_prices['price']
    print("Using price column for price data")
else:
    raise ValueError("Could not find appropriate price columns in the data")

print(f"Price range: {squid_price.min()} to {squid_price.max()}")

# Calculate returns
returns = squid_price.pct_change().dropna()
print(f"Calculated {len(returns)} return data points")

# Define the fair price and price range for scaling
FAIR_PRICE = 2000
PRICE_RANGE = 200  # Position scales from 0 to 1 (or -1) as price moves from fair price to fair price ± 200

# Create a DataFrame with price and timestamp for easier manipulation
price_df = pd.DataFrame({
    'timestamp': in_sample_prices['timestamp'],
    'price': squid_price,
    'returns': returns
}).dropna().reset_index(drop=True)

### 1.1 Visualize Price Data

Let's visualize the price data and the scaling bands.

In [None]:
# Plot price data with scaling bands
plt.figure(figsize=(12, 6))
plt.plot(price_df.index, price_df['price'], label='Squid_Ink Price')

# Add fair price and scaling bands
plt.axhline(y=FAIR_PRICE, color='r', linestyle='--', label=f'Fair Price ({FAIR_PRICE})')
plt.axhline(y=FAIR_PRICE + PRICE_RANGE, color='g', linestyle='--', label=f'Upper Band ({FAIR_PRICE + PRICE_RANGE})')
plt.axhline(y=FAIR_PRICE - PRICE_RANGE, color='g', linestyle='--', label=f'Lower Band ({FAIR_PRICE - PRICE_RANGE})')

plt.title('Squid_Ink Price with Scaling Bands')
plt.xlabel('Timestamp Index')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.show()

## 2. Implement Scaled Position Update Strategy

Now, let's implement a strategy that:
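1. Scales position size linearly with the distance from a fair price of 2000: from 0 at 2000 up to +1 (fully long) at 1800, and down to -1 (fully short) at 2200, capped at ±1 beyond those levels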
2. Updates positions every X timestamps, where X is a parameter
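
As a quick sanity check of the scaling rule, the sketch below evaluates it at a few prices. It is a stand-alone illustration rather than one of the notebook's cells: `scaled_position` is a hypothetical helper name, and it uses `np.clip` to express the same capped linear rule in a single line.

```python
import numpy as np

def scaled_position(price, fair_price=2000, price_range=200):
    """Hypothetical helper: capped linear mean-reversion position in [-1, 1]."""
    # Below fair price -> long (positive); above fair price -> short (negative)
    return float(np.clip((fair_price - price) / price_range, -1.0, 1.0))

for p in (1700, 1800, 1900, 2000, 2100, 2200, 2300):
    print(p, scaled_position(p))
```

A price of 1900 is halfway to the lower band, so the position is 0.5; prices beyond either band stay capped at ±1.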

In [None]:
def calculate_scaled_position(price, fair_price=2000, price_range=200):
    """
    Calculate position size that scales linearly from 0 to 1 (or -1) as price moves from fair_price to fair_price ± price_range.
    
    Parameters:
        price (float): Current price
        fair_price (float): Fair price (position is 0 at this price)
        price_range (float): Price range for scaling (position is 1 or -1 at fair_price ± price_range)
        
    Returns:
        float: Position size between -1 and 1
    """
    # Calculate deviation from fair price
    deviation = price - fair_price
    
    # Calculate position size (negative deviation = positive position)
    if deviation > 0:
        # Price is above fair price -> short position
        position = -min(deviation / price_range, 1.0)
    else:
        # Price is below fair price -> long position
        position = min(-deviation / price_range, 1.0)
    
    return position

def scaled_position_update_strategy(price_df, update_frequency, fair_price=2000, price_range=200):
    """
    Implement a strategy that scales position size linearly and updates positions at specified frequency.
    
    Parameters:
        price_df (pd.DataFrame): DataFrame with price data
        update_frequency (int): Number of timestamps between position updates
        fair_price (float): Fair price (position is 0 at this price)
        price_range (float): Price range for scaling (position is 1 or -1 at fair_price ± price_range)
        
    Returns:
        pd.Series: Portfolio positions
    """
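    # Guard against a zero or negative interval (i % 0 would raise ZeroDivisionError)
    if update_frequency < 1:
        raise ValueError("update_frequency must be a positive integer")
    
    # Initialize positions
    positions = pd.Series(0.0, index=price_df.index)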
    
    # Calculate initial position
    current_position = calculate_scaled_position(price_df['price'].iloc[0], fair_price, price_range)
    positions.iloc[0] = current_position
    
    # Update positions at specified frequency
    for i in range(1, len(price_df)):
        if i % update_frequency == 0:
            # Update position based on current price
            current_position = calculate_scaled_position(price_df['price'].iloc[i], fair_price, price_range)
        
        positions.iloc[i] = current_position
    
    return positions
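
The row-by-row loop above can also be written without an explicit loop. The following is a minimal sketch under the assumption that positions are refreshed on rows 0, N, 2N, ... and held in between, which is what the loop does; `held_positions_vectorized` is a hypothetical name, not part of the notebook.

```python
import numpy as np
import pandas as pd

def held_positions_vectorized(prices, update_frequency, fair_price=2000, price_range=200):
    """Hypothetical vectorized variant: sample the target every N rows and hold it in between."""
    if update_frequency < 1:
        raise ValueError("update_frequency must be a positive integer")
    prices = pd.Series(prices).reset_index(drop=True)
    # Target position at every row: capped linear rule, long below fair price
    target = ((fair_price - prices) / price_range).clip(-1.0, 1.0)
    # Keep the target only on rows 0, N, 2N, ... and carry it forward elsewhere
    is_update_row = (np.arange(len(prices)) % update_frequency) == 0
    return target.where(is_update_row).ffill()

toy_prices = [1900, 1950, 2000, 2100, 2300, 2050]
print(held_positions_vectorized(toy_prices, update_frequency=2).tolist())
# [0.5, 0.5, 0.0, 0.0, -1.0, -1.0]
```

On 20,000 rows the difference in speed is unlikely to matter much, but the vectorized form makes the "sample, then hold" structure of the strategy explicit.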

### 2.1 Test Different Update Frequencies

Let's test the strategy with different update frequencies.

In [None]:
# Test different update frequencies
update_frequencies = [1, 5, 10, 20, 50, 100]  # Update frequencies to test

# Collect one dict of metrics per update frequency
results = []

# Store positions for visualization
all_positions = {}

# Test different update frequencies
for update_freq in update_frequencies:
    # Run the strategy
    positions = scaled_position_update_strategy(price_df, update_freq, FAIR_PRICE, PRICE_RANGE)
    all_positions[update_freq] = positions
    
    # Calculate strategy returns
    strategy_returns = positions.shift(1) * price_df['returns']
    strategy_returns = strategy_returns.dropna()
    
    # Calculate cumulative returns
    cumulative_returns = (1 + strategy_returns).cumprod() - 1
    
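    # Calculate performance metrics
    # NOTE: the factor 252 treats each row as one trading day. The rows here are
    # intraday timestamps, so the "annualized" figures are only meaningful for
    # comparing update frequencies with each other, not as true annual numbers.
    total_return = cumulative_returns.iloc[-1]
    annualized_return = (1 + total_return) ** (252 / len(strategy_returns)) - 1
    annualized_volatility = strategy_returns.std() * np.sqrt(252)
    sharpe_ratio = annualized_return / annualized_volatility if annualized_volatility != 0 else 0
    # Drawdown is measured in cumulative-return points below the running peak
    max_drawdown = (cumulative_returns - cumulative_returns.cummax()).min()
    win_rate = (strategy_returns > 0).mean()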
    
    # Calculate position changes for transaction cost analysis
    position_changes = positions.diff().fillna(0)
    num_trades = (position_changes != 0).sum()
    avg_position_size = positions.abs().mean()
    
    # Store results
    results.append({
        'update_frequency': update_freq,
        'total_return': total_return,
        'annualized_return': annualized_return,
        'annualized_volatility': annualized_volatility,
        'sharpe_ratio': sharpe_ratio,
        'max_drawdown': max_drawdown,
        'win_rate': win_rate,
        'num_trades': num_trades,
        'avg_position_size': avg_position_size
    })

# Convert results to DataFrame
results_df = pd.DataFrame(results)

# Sort by total return
results_df = results_df.sort_values('total_return', ascending=False)

# Display results
print("Performance Metrics for Different Update Frequencies:")
display(results_df)
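
The metrics above annualize with a factor of 252, which treats each row as one trading day. As a hedged alternative, the sketch below makes the number of periods per year an explicit argument and measures drawdown relative to peak equity instead of in cumulative-return points. `summarize_returns` and `periods_per_year` are hypothetical names introduced for illustration, and the right value of `periods_per_year` for this dataset is an assumption you would need to supply.

```python
import numpy as np
import pandas as pd

def summarize_returns(period_returns, periods_per_year=252):
    """Hypothetical helper: summary statistics for a series of per-period returns."""
    r = pd.Series(period_returns).dropna()
    equity = (1 + r).cumprod()               # growth of 1 unit of capital
    total_return = equity.iloc[-1] - 1
    ann_return = (1 + total_return) ** (periods_per_year / len(r)) - 1
    ann_vol = r.std() * np.sqrt(periods_per_year)
    drawdown = equity / equity.cummax() - 1  # fraction below the running peak
    return {
        'total_return': total_return,
        'annualized_return': ann_return,
        'annualized_volatility': ann_vol,
        'sharpe_ratio': ann_return / ann_vol if ann_vol != 0 else 0.0,
        'max_drawdown': drawdown.min(),
    }

toy_returns = [0.10, -0.50, 0.20]
print(summarize_returns(toy_returns, periods_per_year=3))
```

In the toy series, equity goes 1.10, 0.55, 0.66, so the worst drawdown is 50% below the peak of 1.10.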

### 2.2 Visualize Positions for Different Update Frequencies

Let's visualize how positions change with different update frequencies.

In [None]:
# Select a subset of update frequencies to visualize
frequencies_to_plot = [1, 10, 50]

# Create a figure with two subplots
fig, axes = plt.subplots(2, 1, figsize=(15, 12), sharex=True)

# Plot price
axes[0].plot(price_df.index, price_df['price'], label='Squid_Ink Price')
axes[0].axhline(y=FAIR_PRICE, color='r', linestyle='--', label=f'Fair Price ({FAIR_PRICE})')
axes[0].axhline(y=FAIR_PRICE + PRICE_RANGE, color='g', linestyle='--', label=f'Upper Band ({FAIR_PRICE + PRICE_RANGE})')
axes[0].axhline(y=FAIR_PRICE - PRICE_RANGE, color='g', linestyle='--', label=f'Lower Band ({FAIR_PRICE - PRICE_RANGE})')
axes[0].set_title('Squid_Ink Price with Scaling Bands')
axes[0].set_ylabel('Price')
axes[0].legend()
axes[0].grid(True)

# Plot positions for selected update frequencies
for freq in frequencies_to_plot:
    axes[1].plot(price_df.index, all_positions[freq], label=f'Update Frequency = {freq}')

axes[1].axhline(y=0, color='r', linestyle='--')
axes[1].set_title('Positions for Different Update Frequencies')
axes[1].set_xlabel('Timestamp Index')
axes[1].set_ylabel('Position Size')
axes[1].set_ylim(-1.1, 1.1)
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()

### 2.3 Visualize Cumulative Returns

Let's visualize the cumulative returns for different update frequencies.

In [None]:
# Calculate cumulative returns for all update frequencies
cumulative_returns_dict = {}

for freq in update_frequencies:
    # Calculate strategy returns
    strategy_returns = all_positions[freq].shift(1) * price_df['returns']
    strategy_returns = strategy_returns.dropna()
    
    # Calculate cumulative returns
    cumulative_returns = (1 + strategy_returns).cumprod() - 1
    cumulative_returns_dict[freq] = cumulative_returns

# Calculate buy and hold returns for comparison
buy_hold_returns = price_df['returns']
buy_hold_cumulative_returns = (1 + buy_hold_returns).cumprod() - 1

# Plot cumulative returns
plt.figure(figsize=(12, 6))

# Plot buy and hold returns
plt.plot(buy_hold_cumulative_returns, 'k--', label='Buy & Hold')

# Plot returns for selected update frequencies
for freq in frequencies_to_plot:
    plt.plot(cumulative_returns_dict[freq], label=f'Update Frequency = {freq}')

plt.title('Cumulative Returns for Different Update Frequencies')
plt.xlabel('Timestamp Index')
plt.ylabel('Cumulative Return')
plt.legend()
plt.grid(True)
plt.show()
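
The strategy returns plotted above pair each return with the position that was already held when that return was earned, which is why the code uses `positions.shift(1)`. The toy example below, with made-up prices and positions, illustrates the alignment.

```python
import pandas as pd

toy_prices = pd.Series([100.0, 110.0, 99.0])
toy_positions = pd.Series([1.0, -1.0, 0.0])  # decided at each row's price

returns = toy_prices.pct_change()            # row i: return from row i-1 to row i
# shift(1) pairs the return earned into row i with the position set at row i-1
strategy_returns = (toy_positions.shift(1) * returns).dropna()

print(strategy_returns.round(6).tolist())
# [0.1, 0.1]
```

The long position set at row 0 earns the +10% move into row 1, and the short position set at row 1 earns the -10% move into row 2. Without the shift, each position would be credited with a return that happened before it was opened.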

## 3. Implement Transaction Costs

Now, let's implement transaction costs to make our analysis more realistic.

In [None]:
# Transaction cost as a fraction of traded size: 1.5 / 2000 = 0.00075 (0.075%)
transaction_cost = 1.5 / 2000

# Collect per-frequency results with transaction costs
results_with_costs = []
cumulative_returns_with_costs_dict = {}

for freq in update_frequencies:
    # Get positions
    positions = all_positions[freq]
    
    # Calculate position changes
    position_changes = positions.diff().fillna(0)
    
    # Calculate transaction costs
    transaction_costs = position_changes.abs() * transaction_cost
    
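    # Calculate strategy returns with transaction costs.
    # transaction_costs.shift(1) charges a trade made at row i on row i+1, the first
    # row on which the new position earns a return. Opening the initial position is
    # not charged, because diff() has no prior row and fillna(0) records no change there.
    strategy_returns_with_costs = positions.shift(1) * price_df['returns'] - transaction_costs.shift(1)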
    strategy_returns_with_costs = strategy_returns_with_costs.dropna()
    
    # Calculate cumulative returns with transaction costs
    cumulative_returns_with_costs = (1 + strategy_returns_with_costs).cumprod() - 1
    cumulative_returns_with_costs_dict[freq] = cumulative_returns_with_costs
    
    # Calculate performance metrics with transaction costs
    total_return_with_costs = cumulative_returns_with_costs.iloc[-1]
    annualized_return_with_costs = (1 + total_return_with_costs) ** (252 / len(strategy_returns_with_costs)) - 1
    annualized_volatility_with_costs = strategy_returns_with_costs.std() * np.sqrt(252)
    sharpe_ratio_with_costs = annualized_return_with_costs / annualized_volatility_with_costs if annualized_volatility_with_costs != 0 else 0
    max_drawdown_with_costs = (cumulative_returns_with_costs - cumulative_returns_with_costs.cummax()).min()
    win_rate_with_costs = (strategy_returns_with_costs > 0).mean()
    
    # Calculate number of trades and total transaction costs
    num_trades = (position_changes != 0).sum()
    total_transaction_costs = transaction_costs.sum()
    
    # Store results
    results_with_costs.append({
        'update_frequency': freq,
        'total_return': total_return_with_costs,
        'annualized_return': annualized_return_with_costs,
        'annualized_volatility': annualized_volatility_with_costs,
        'sharpe_ratio': sharpe_ratio_with_costs,
        'max_drawdown': max_drawdown_with_costs,
        'win_rate': win_rate_with_costs,
        'num_trades': num_trades,
        'total_transaction_costs': total_transaction_costs
    })

# Convert results to DataFrame
results_with_costs_df = pd.DataFrame(results_with_costs)

# Sort by total return
results_with_costs_df = results_with_costs_df.sort_values('total_return', ascending=False)

# Display results
print("Performance Metrics with Transaction Costs:")
display(results_with_costs_df)
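
To make the cost calculation concrete, here is a small stand-alone example on made-up positions. It reuses the same cost rate as above. The `turnover_incl_entry` variant is an assumption for illustration, not what the cell above does: it also charges for opening the first position from flat.

```python
import pandas as pd

cost_rate = 1.5 / 2000  # same proportional cost as above (0.075% of traded size)
toy_positions = pd.Series([0.5, 0.5, -0.5, 0.0])

# Turnover per row: absolute change in position (the first row has no prior row)
turnover = toy_positions.diff().abs().fillna(0)
costs = turnover * cost_rate

# Optional variant (an assumption): also charge for opening the initial position
turnover_incl_entry = toy_positions.diff().fillna(toy_positions.iloc[0]).abs()

print(turnover.tolist())             # [0.0, 0.0, 1.0, 0.5]
print(turnover_incl_entry.tolist())  # [0.5, 0.0, 1.0, 0.5]
print(costs.sum())
```

Flipping from +0.5 to -0.5 counts as a full unit of turnover, so reversals are the most expensive moves under this cost model.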

### 3.1 Visualize Cumulative Returns with Transaction Costs

Let's visualize the cumulative returns with transaction costs for different update frequencies.

In [None]:
# Plot cumulative returns with transaction costs
plt.figure(figsize=(12, 6))

# Plot buy and hold returns
plt.plot(buy_hold_cumulative_returns, 'k--', label='Buy & Hold')

# Plot returns for selected update frequencies
for freq in frequencies_to_plot:
    plt.plot(cumulative_returns_with_costs_dict[freq], label=f'Update Frequency = {freq} (with costs)')

plt.title('Cumulative Returns with Transaction Costs')
plt.xlabel('Timestamp Index')
plt.ylabel('Cumulative Return')
plt.legend()
plt.grid(True)
plt.show()

### 3.2 Compare Transaction Costs Impact

Let's compare the impact of transaction costs on different update frequencies.

In [None]:
# Build one comparison row per update frequency
comparison_rows = []

for freq in update_frequencies:
    # Get results without transaction costs
    without_costs = results_df[results_df['update_frequency'] == freq].iloc[0]
    
    # Get results with transaction costs
    with_costs = results_with_costs_df[results_with_costs_df['update_frequency'] == freq].iloc[0]
    
    # Calculate impact of transaction costs
    base_return = without_costs['total_return']
    return_impact = with_costs['total_return'] - base_return
    comparison_rows.append({
        'update_frequency': freq,
        'return_without_costs': base_return,
        'return_with_costs': with_costs['total_return'],
        'return_impact': return_impact,
        'return_impact_pct': return_impact / abs(base_return) * 100 if base_return != 0 else 0,
        'sharpe_without_costs': without_costs['sharpe_ratio'],
        'sharpe_with_costs': with_costs['sharpe_ratio'],
        'sharpe_impact': with_costs['sharpe_ratio'] - without_costs['sharpe_ratio'],
        'num_trades': with_costs['num_trades'],
        'total_transaction_costs': with_costs['total_transaction_costs']
    })

# Create the comparison DataFrame in one step
comparison_df = pd.DataFrame(comparison_rows)

# Sort by update frequency
comparison_df = comparison_df.sort_values('update_frequency')

# Display comparison
print("Transaction Costs Impact by Update Frequency:")
display(comparison_df)

## 4. Conclusion

In this notebook, we implemented a mean reversion strategy that:
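1. Scales position size linearly with the distance from a fair price of 2000: from 0 at 2000 up to +1 (fully long) at 1800, and down to -1 (fully short) at 2200, capped at ±1 beyond those levels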
2. Updates positions every X timestamps, where X is a parameter

Key findings:

1. The strategy dynamically adjusts position size based on the distance from the fair price, taking larger positions when the mispricing is more significant.

2. We tested different update frequencies and found that [insert findings about optimal update frequency].

3. Transaction costs are charged on every position change, so total costs grow with turnover. Strategies that update more frequently typically trade more and give up more of their gross return.

4. The optimal update frequency balances the benefits of timely position adjustments with the costs of frequent trading.

Future improvements could include:

1. Testing different scaling functions (e.g., exponential, logarithmic) instead of linear scaling
2. Implementing adaptive update frequencies based on market conditions
3. Combining with other indicators to filter trades
4. Optimizing the price range parameter
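
As a starting point for the first idea, the sketch below shows one possible nonlinear scaling function. It uses a power law rather than the exponential or logarithmic forms mentioned above, simply because it reduces to the linear rule when the exponent is 1. `power_scaled_position` and `gamma` are hypothetical names, and this variant has not been evaluated on the data.

```python
import numpy as np

def power_scaled_position(price, fair_price=2000, price_range=200, gamma=2.0):
    """Hypothetical nonlinear variant of the capped linear rule.

    gamma = 1 reproduces linear scaling; gamma > 1 keeps positions smaller
    near the fair price; gamma < 1 builds positions faster.
    """
    # Normalized deviation in [-1, 1], positive when price is below fair price
    x = np.clip((fair_price - np.asarray(price, dtype=float)) / price_range, -1.0, 1.0)
    # Apply the power to the magnitude and restore the sign
    return np.sign(x) * np.abs(x) ** gamma

prices = np.array([1800, 1900, 2000, 2100, 2200])
print(power_scaled_position(prices, gamma=1.0))
print(power_scaled_position(prices, gamma=2.0))
```

With `gamma=2.0`, a price of 1900 gives a position of 0.25 instead of 0.5, so the strategy commits less capital to small mispricings.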