We should implement and back-test trading algorithms using historical market data.

The project is divided into three main strategies:

1. Moving Average and Momentum Strategies,
2. Value-Based Strategies, and
3. Sentiment-Based Strategies.

Write our own Back Testing Code

▪ Back-testing simulates how a trading strategy would have performed in the past using historical data.
▪ The purpose is to evaluate the strategy's effectiveness, identify potential issues, and refine it before deploying it in live trading.
▪ The back-testing code needs to handle these key components:
    1. Load and process price and other data
    2. Clean and prepare data
    3. Implement logic to buy and sell based on signals
    4. Define trades and measure their performance over time
    5. Incorporate realistic transaction costs
    6. Calculate metrics: return, drawdown, Sharpe ratio, etc.
    7. Visualise results
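
Components 5 and 6 above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names `performance_summary` and `apply_costs`, the 0.1% proportional cost, the zero risk-free rate, and the 252 trading days per year are all assumptions.

```python
import numpy as np
import pandas as pd

def apply_costs(gross_returns, positions, cost_per_trade=0.001):
    """Subtract a proportional cost whenever the position changes.

    cost_per_trade=0.001 (0.1% of traded value) is an assumed figure.
    """
    gross_returns = pd.Series(gross_returns, dtype=float)
    positions = pd.Series(positions, dtype=float)
    # Turnover is the absolute change in position; the first bar is
    # charged for opening whatever position it starts with.
    turnover = positions.diff().abs().fillna(positions.abs())
    return gross_returns - cost_per_trade * turnover

def performance_summary(daily_returns, periods_per_year=252):
    """Total return, maximum drawdown and annualised Sharpe ratio."""
    daily_returns = pd.Series(daily_returns, dtype=float).dropna()
    equity = (1.0 + daily_returns).cumprod()  # growth of 1 unit of capital
    total_return = equity.iloc[-1] - 1.0
    # Drawdown is measured from the running peak, which is at least the
    # starting capital of 1.0.
    running_peak = equity.cummax().clip(lower=1.0)
    max_drawdown = (equity / running_peak - 1.0).min()
    # Sharpe ratio with an assumed risk-free rate of zero.
    std = daily_returns.std(ddof=1)
    sharpe = np.nan if std == 0 else np.sqrt(periods_per_year) * daily_returns.mean() / std
    return {"total_return": total_return, "max_drawdown": max_drawdown, "sharpe": sharpe}
```

Both functions expect returns as fractions (0.01 for 1%), and `positions` holds the exposure on each bar (for example 0 for flat and 1 for long).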

Validation and Sanity Checks

▪ Out-of-Sample Testing:
    ▪ After optimizing the strategy on historical data, test it on a separate dataset (out-of-sample data) to verify its robustness.
▪ Sanity Checks:
    ▪ Ensure the backtest is realistic (e.g., no future data leakage, no unrealistic execution assumptions) to prevent overestimating the strategy’s performance.
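
The two checks above might look like this in code. It is a sketch under stated assumptions: the helper names `chronological_split` and `lagged_position` and the 70/30 split are illustrative choices, not part of the brief.

```python
import pandas as pd

def chronological_split(df, train_fraction=0.7):
    """Split a date-sorted DataFrame into in-sample and out-of-sample parts.

    The split is by position, never by random shuffling, so no future
    rows end up in the in-sample set.
    """
    split_at = int(len(df) * train_fraction)
    return df.iloc[:split_at], df.iloc[split_at:]

def lagged_position(signal):
    """Hold the position one bar after the signal is observed.

    A signal computed from today's close can only be traded on the next
    bar; shifting by one avoids using information from the future.
    """
    return pd.Series(signal).shift(1).fillna(0)
```

Parameters would be tuned on the first part only, and the second part used once for the final evaluation.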

In [None]:
import pandas as pd

# Load the stock data
file_path = "/mnt/data/stock_data.csv"
df = pd.read_csv(file_path)

# Display basic information and the first few rows
df.info()
print(df.head())

# Convert Date column to datetime format
df["Date"] = pd.to_datetime(df["Date"])

# Sort data by Date
df = df.sort_values("Date").reset_index(drop=True)

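# Drop columns where more than 50% of the values are missing.
# This runs before forward-filling, which would otherwise hide most gaps.
min_non_missing = int(len(df) * 0.5)  # thresh is the minimum count of non-NaN values
df = df.dropna(axis=1, thresh=min_non_missing)

# Forward-fill remaining gaps to maintain continuity
# (fillna(method="ffill") is deprecated in recent pandas versions)
df = df.ffill()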

# Remove any duplicate rows if present
df = df.drop_duplicates()

# Display cleaned dataset info
df.info()
print(df.head())

# Define the new file path
cleaned_file_path = "/mnt/data/cleaned_stock_data.csv"

# Save the cleaned dataset to CSV
df.to_csv(cleaned_file_path, index=False)

# Return the file path for download
cleaned_file_path

# Drop rows where any column has missing values to keep only complete data
df_complete = df.dropna()

# Save the fully cleaned dataset
complete_file_path = "/mnt/data/fully_cleaned_stock_data.csv"
df_complete.to_csv(complete_file_path, index=False)

# Return the file path for download
complete_file_path


Strategy 1: Moving Average and Momentum Strategies

Rule-based moving average strategy

Understand & implement moving average strategies
    ▪ Simple Moving Average (SMA)
    ▪ Write code that calculates this for different periods

Tasks:
    1. Implement a strategy where a short-term moving average (e.g., S-day SMA) crosses above or below a long-term moving average (e.g., L-day SMA).
    2. Write code to execute buy orders when the short-term average crosses above the long-term average and sell orders when the opposite occurs.
    3. Test the algorithm on a broad range of stocks (at least 100) from the S&P 500 index for a range of values of S and L.
    4. Report the average P&L and the variance of P&L for each combination of moving average periods.
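
The crossover rule in tasks 1 and 2 can also be written without an explicit loop. The sketch below only flags the crossover bars and does not track positions; the function name `sma_crossover_signals` and the column names are assumptions.

```python
import pandas as pd

def sma_crossover_signals(price, short_window, long_window):
    """Flag bars where the short SMA crosses the long SMA."""
    price = pd.Series(price, dtype=float)
    sma_short = price.rolling(window=short_window).mean()
    sma_long = price.rolling(window=long_window).mean()
    diff = sma_short - sma_long
    prev = diff.shift(1)
    # Comparisons with NaN are False, so no signal is raised until both
    # averages exist on the current and the previous bar.
    buy = (diff > 0) & (prev <= 0)   # short SMA crosses above long SMA
    sell = (diff < 0) & (prev >= 0)  # short SMA crosses below long SMA
    return pd.DataFrame({"sma_short": sma_short, "sma_long": sma_long,
                         "buy": buy, "sell": sell})
```

For example, `sma_crossover_signals(prices, 5, 50)` returns one row per bar, with `buy` and `sell` as boolean columns.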

In [None]:
import pandas as pd
import numpy as np
from joblib import Parallel, delayed

# -------------------------------
# Helper Functions to Compute Indicators
# -------------------------------

def compute_indicators_for_stock(df, ma_periods, rsi_period=14):
    """
    Given a DataFrame with columns 'Date' and 'Price', compute moving averages for each period in ma_periods
    and compute the RSI using the specified rsi_period.
    """
    df = df.sort_values('Date').reset_index(drop=True)
    
    # Compute moving averages
    for period in ma_periods:
        df[f"MA {period}"] = df['Price'].rolling(window=period).mean()
    
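    # Compute RSI from simple rolling means of gains and losses (14-day period by default).
    # Note: Wilder's original RSI uses exponential smoothing instead of a simple mean.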
    delta = df['Price'].diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.rolling(window=rsi_period).mean()
    avg_loss = loss.rolling(window=rsi_period).mean()
    rs = avg_gain / avg_loss
    df['RSI'] = 100 - (100 / (1 + rs))
    
    return df

# -------------------------------
# Simulation Functions for Strategies
# -------------------------------

def simulate_ma_strategy(df, short_window, long_window):
    """
    Simulate a moving average crossover strategy:
      - Buy when the short MA crosses above the long MA.
      - Sell when the short MA crosses below the long MA.
    """
    trades = []
    in_position = False
    entry_price = 0.0

    ma_short = df[f"MA {short_window}"]
    ma_long  = df[f"MA {long_window}"]
    price    = df['Price']

    for i in range(1, len(df)):
        # Buy signal: short MA crosses above long MA
        if not in_position and (ma_short.iloc[i] > ma_long.iloc[i]) and (ma_short.iloc[i-1] <= ma_long.iloc[i-1]):
            in_position = True
            entry_price = price.iloc[i]
        # Sell signal: short MA crosses below long MA
        elif in_position and (ma_short.iloc[i] < ma_long.iloc[i]) and (ma_short.iloc[i-1] >= ma_long.iloc[i-1]):
            exit_price = price.iloc[i]
            trades.append(exit_price - entry_price)
            in_position = False

    # Close any open position at the end of the series
    if in_position:
        trades.append(price.iloc[-1] - entry_price)
        
    return trades

def simulate_rsi_strategy(df, rsi_buy_threshold, rsi_sell_threshold):
    """
    Simulate an RSI-based strategy:
      - Buy when RSI exceeds the buy threshold.
      - Sell when RSI falls below the sell threshold.
    """
    trades = []
    in_position = False
    entry_price = 0.0
    price = df['Price']
    rsi   = df['RSI']

    for i in range(len(df)):
        if not in_position and (rsi.iloc[i] > rsi_buy_threshold):
            in_position = True
            entry_price = price.iloc[i]
        elif in_position and (rsi.iloc[i] < rsi_sell_threshold):
            exit_price = price.iloc[i]
            trades.append(exit_price - entry_price)
            in_position = False

    if in_position:
        trades.append(price.iloc[-1] - entry_price)
    return trades

def simulate_combined_strategy(df, short_window, long_window, rsi_buy_threshold, rsi_sell_threshold):
    """
    Simulate a combined strategy:
      - Buy when a short MA crosses above a long MA and RSI exceeds the buy threshold.
      - Sell when a short MA crosses below a long MA and RSI falls below the sell threshold.
    """
    trades = []
    in_position = False
    entry_price = 0.0
    price = df['Price']
    ma_short = df[f"MA {short_window}"]
    ma_long  = df[f"MA {long_window}"]
    rsi      = df['RSI']

    for i in range(1, len(df)):
        if not in_position and (ma_short.iloc[i] > ma_long.iloc[i]) and (ma_short.iloc[i-1] <= ma_long.iloc[i-1]) and (rsi.iloc[i] > rsi_buy_threshold):
            in_position = True
            entry_price = price.iloc[i]
        elif in_position and (ma_short.iloc[i] < ma_long.iloc[i]) and (ma_short.iloc[i-1] >= ma_long.iloc[i-1]) and (rsi.iloc[i] < rsi_sell_threshold):
            exit_price = price.iloc[i]
            trades.append(exit_price - entry_price)
            in_position = False

    if in_position:
        trades.append(price.iloc[-1] - entry_price)
    return trades

# -------------------------------
# Wrapper Functions for Parallel Processing
# -------------------------------

def get_trades_for_ticker_ma(df, s, l):
    if f"MA {s}" in df.columns and f"MA {l}" in df.columns:
        return simulate_ma_strategy(df, s, l)
    return []

def get_trades_for_ticker_rsi(df, rsi_buy, rsi_sell):
    if 'RSI' in df.columns:
        return simulate_rsi_strategy(df, rsi_buy, rsi_sell)
    return []

def get_trades_for_ticker_combined(df, s, l, rsi_buy, rsi_sell):
    if (f"MA {s}" in df.columns) and (f"MA {l}" in df.columns) and ('RSI' in df.columns):
        return simulate_combined_strategy(df, s, l, rsi_buy, rsi_sell)
    return []

# -------------------------------
# Main Backtesting Process
# -------------------------------

# Load the CSV file.
# The CSV is assumed to have the first column as "Date" and all subsequent columns are adjusted close prices for S&P 500 stocks.
# Use the same path the cleaning cell wrote the file to.
data = pd.read_csv('/mnt/data/fully_cleaned_stock_data.csv', parse_dates=['Date'])
data.sort_values('Date', inplace=True)

# All columns except 'Date' represent different stocks.
tickers = data.columns[1:]

# Define the moving average periods of interest.
short_windows = [5, 7, 10, 15, 20]
long_windows  = [50, 70, 100, 150, 200]
ma_periods = sorted(set(short_windows + long_windows))

# Build a dictionary with processed DataFrames for each ticker.
stock_dfs = {}
for ticker in tickers:
    df_stock = data[['Date', ticker]].rename(columns={ticker: 'Price'})
    df_stock = compute_indicators_for_stock(df_stock, ma_periods, rsi_period=14)
    stock_dfs[ticker] = df_stock

# -------------------------------
# 1. Evaluate the MA Crossover Strategy using Parallel Processing
# -------------------------------
ma_results = []
for s in short_windows:
    for l in long_windows:
        if s < l:
            trades_list = Parallel(n_jobs=-1)(
                delayed(get_trades_for_ticker_ma)(df, s, l) for df in stock_dfs.values()
            )
            # Flatten the list of trade results
            all_trades = [trade for sublist in trades_list for trade in sublist]
            if all_trades:
                avg_pnl = np.mean(all_trades)
                var_pnl = np.var(all_trades)
            else:
                avg_pnl = np.nan
                var_pnl = np.nan
            ma_results.append({
                'Short_MA': s,
                'Long_MA': l,
                'Average_PnL': avg_pnl,
                'Variance_PnL': var_pnl
            })

ma_results_df = pd.DataFrame(ma_results)
print("Moving Average Crossover Strategy Results:")
print(ma_results_df)

# -------------------------------
# 2. Evaluate the RSI-only Strategy using Parallel Processing
# -------------------------------
# Define a list of RSI threshold pairs to test: (RSI_buy_threshold, RSI_sell_threshold)
rsi_thresholds = [
    (70, 30),
    (75, 25),
    (65, 35),
    (80, 20),
    (60, 40)
]

rsi_results = []
for (rsi_buy, rsi_sell) in rsi_thresholds:
    trades_list = Parallel(n_jobs=-1)(
        delayed(get_trades_for_ticker_rsi)(df, rsi_buy, rsi_sell) for df in stock_dfs.values()
    )
    all_trades = [trade for sublist in trades_list for trade in sublist]
    if all_trades:
        avg_pnl = np.mean(all_trades)
        var_pnl = np.var(all_trades)
    else:
        avg_pnl = np.nan
        var_pnl = np.nan
    rsi_results.append({
        'RSI_Buy': rsi_buy,
        'RSI_Sell': rsi_sell,
        'Average_PnL': avg_pnl,
        'Variance_PnL': var_pnl
    })

rsi_results_df = pd.DataFrame(rsi_results)
print("\nRSI Strategy Results:")
print(rsi_results_df)

# -------------------------------
# 3. Evaluate the Combined MA & RSI Strategy using Parallel Processing
# -------------------------------
combined_results = []
for s in short_windows:
    for l in long_windows:
        if s < l:
            for (rsi_buy, rsi_sell) in rsi_thresholds:
                trades_list = Parallel(n_jobs=-1)(
                    delayed(get_trades_for_ticker_combined)(df, s, l, rsi_buy, rsi_sell) for df in stock_dfs.values()
                )
                all_trades = [trade for sublist in trades_list for trade in sublist]
                if all_trades:
                    avg_pnl = np.mean(all_trades)
                    var_pnl = np.var(all_trades)
                else:
                    avg_pnl = np.nan
                    var_pnl = np.nan
                combined_results.append({
                    'Short_MA': s,
                    'Long_MA': l,
                    'RSI_Buy': rsi_buy,
                    'RSI_Sell': rsi_sell,
                    'Average_PnL': avg_pnl,
                    'Variance_PnL': var_pnl
                })

combined_results_df = pd.DataFrame(combined_results)
print("\nCombined MA & RSI Strategy Results:")
print(combined_results_df)


Dataset loaded successfully with shape: (6763, 252)
Running strategy for SMA(5) & SMA(50)
Processing stock: AAPL_adjclose
Cumulative return for AAPL_adjclose: 3.5534
Processing stock: ABC_adjclose
Cumulative return for ABC_adjclose: -0.7143
Processing stock: ABMD_adjclose
Cumulative return for ABMD_adjclose: -0.8776
Processing stock: ABT_adjclose
Cumulative return for ABT_adjclose: -0.8392
Processing stock: ADI_adjclose
Cumulative return for ADI_adjclose: -0.9985
Processing stock: ADM_adjclose
Cumulative return for ADM_adjclose: 0.3716
Processing stock: ADP_adjclose
Cumulative return for ADP_adjclose: -0.9348
Processing stock: ADSK_adjclose
Cumulative return for ADSK_adjclose: -0.6565
Processing stock: AEP_adjclose
Cumulative return for AEP_adjclose: -0.4058
Processing stock: AJG_adjclose
Cumulative return for AJG_adjclose: -0.6291
Processing stock: ALB_adjclose
Cumulative return for ALB_adjclose: -0.9123
Processing stock: ALK_adjclose
Cumulative return for ALK_adjclose: -0.9645

Rule-based momentum strategy

Understand the concept of momentum
    ▪ Relative Strength Index (RSI)
    ▪ Write code to calculate these

Tasks:
    1. Implement a strategy that buys assets when momentum indicators signal strength (e.g., RSI > 70) and sells when they signal weakness (e.g., RSI < 30).
    2. Combine momentum signals with moving averages to enhance the strategy.
    3. Test the algorithm on a broad range of stocks (at least 100) from the S&P 500 index.
    4. Report the average P&L and the variance of P&L for each combination of moving average periods.
    5. Try the five best combinations of RSI thresholds you can find, both alone and combined with moving averages.
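
For task 5, one simple way to pick the five best combinations is to rank a results table by average P&L. The sketch assumes a DataFrame shaped like the notebook's `rsi_results_df` or `combined_results_df`, with an `Average_PnL` column; the helper name `top_combinations` is hypothetical.

```python
import pandas as pd

def top_combinations(results, n=5, by="Average_PnL"):
    """Return the n parameter combinations with the highest value of `by`.

    Rows where the metric is NaN (no trades were generated) are ignored.
    """
    ranked = results.dropna(subset=[by]).sort_values(by, ascending=False)
    return ranked.head(n).reset_index(drop=True)
```

Ranking on the same data used for evaluation overstates performance, so the selection is best done on the in-sample period and the chosen thresholds then checked out of sample.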

ML-based moving average strategy

Tasks:
    1. Build a deep learning model that takes stock prices, MA indicators, and RSI indicators as features.
    2. Use a 3-layer neural network (input layer, one hidden layer, output layer) where the inputs are the indicators and the output is a buy, sell, or hold signal.
    3. Train it on one subset of the time series.
    4. Test it on a different part of the time series.
    5. See whether changing the number of layers or neurons per layer helps.
    6. Test the algorithm on a broad range of stocks (at least 100) from the S&P 500 index.
    7. Report the average P&L and the variance of P&L.
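
The network in task 2 could be prototyped in plain NumPy before moving to a deep learning library. The sketch below is one possible version, not a required design. The assumptions are: a tanh hidden layer, a softmax output, full-batch gradient descent, the class coding 0 = sell, 1 = hold, 2 = buy, and all function names and hyperparameters.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_network(X, y, n_hidden=8, n_classes=3, lr=0.1, epochs=300, seed=0):
    """Train input -> tanh hidden layer -> softmax output.

    X: (n_samples, n_features) array of standardised indicators.
    y: integer class labels (assumed coding: 0 = sell, 1 = hold, 2 = buy).
    Returns the parameters and the cross-entropy loss per epoch.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        P = softmax(H @ W2 + b2)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        # Backward pass for mean cross-entropy
        dlogits = (P - Y) / len(X)
        dZ1 = (dlogits @ W2.T) * (1.0 - H ** 2)
        # Gradient descent update
        W2 -= lr * (H.T @ dlogits)
        b2 -= lr * dlogits.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, losses

def predict(params, X):
    """Return the most probable class for each row of X."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return np.argmax(H @ params["W2"] + params["b2"], axis=1)
```

For task 5, `n_hidden` can be varied directly. Features should be standardised using statistics from the training period only, so that nothing from the test period leaks into training.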

ML-based momentum strategy

Tasks:
    1. Implement a strategy that buys assets when momentum indicators signal strength (e.g., RSI > 70) and sells when they signal weakness (e.g., RSI < 30).
    2. Combine momentum signals with moving averages to enhance the strategy.
    3. Try the five best combinations of RSI thresholds you can find, both alone and combined with moving averages.
    4. Use a machine learning algorithm with price, various MA values, and RSI as features to see whether you can predict buy and sell signals.
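
Before any model can be trained for task 4, prices have to be turned into features and labels. The sketch below shows one way to do that. The labelling rule is an assumption, not something the brief specifies: a bar is labelled buy (2) if the price rises by more than `threshold` over the next `horizon` bars, sell (0) if it falls by more than `threshold`, and hold (1) otherwise. The function name and defaults are also illustrative.

```python
import numpy as np
import pandas as pd

def build_dataset(price, ma_windows=(5, 50), rsi_period=14, horizon=5, threshold=0.02):
    """Build a feature/label table from a single price series."""
    price = pd.Series(np.asarray(price, dtype=float))
    data = pd.DataFrame({"price": price})
    # Moving-average features
    for window in ma_windows:
        data[f"ma_{window}"] = price.rolling(window=window).mean()
    # RSI feature (simple rolling-mean variant, as in the rule-based code)
    delta = price.diff()
    avg_gain = delta.clip(lower=0).rolling(window=rsi_period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window=rsi_period).mean()
    data["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    # The forward return looks into the future, so it is used ONLY to
    # build the label and must never be fed to the model as a feature.
    fwd_return = price.shift(-horizon) / price - 1.0
    label = pd.Series(1, index=price.index)  # 1 = hold
    label[fwd_return > threshold] = 2        # 2 = buy
    label[fwd_return < -threshold] = 0       # 0 = sell
    data["label"] = label
    data["fwd_return"] = fwd_return
    # Drop warm-up rows (incomplete indicators) and the final rows
    # whose forward return is unknown.
    return data.dropna()
```

The resulting table should be split chronologically, with the earlier rows used for training and the later rows for testing.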

Strategy 2: Value-Based Strategies

Objectives:
• Understand fundamental metrics such as the P/E ratio and book value.
• Backtest value-based strategies using historical data.
• Evaluate the performance of the strategies.

Tasks:
    1. Buy stocks whose P/E ratio is low compared to their historical average.
    2. The Price-to-Book (P/B) ratio compares a company's market value to its book value (the net asset value on the balance sheet). A low P/B ratio may indicate that the stock is undervalued relative to its assets.
    3. Use a machine learning algorithm with price, P/E, and P/B values as features to see whether you can predict buy and sell signals.
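
Tasks 1 and 2 could be turned into signals roughly as follows. The cut-offs are assumptions chosen for illustration: "low P/E" is taken to mean below 80% of the stock's own trailing average, and "low P/B" to mean below 1.0. Requiring both conditions for a buy is also an assumption, as are the function and column names.

```python
import pandas as pd

def value_signals(pe, pb, lookback=252, pe_discount=0.8, pb_max=1.0):
    """Flag bars where a stock looks cheap on P/E and P/B."""
    pe = pd.Series(pe, dtype=float)
    pb = pd.Series(pb, dtype=float)
    # Historical average of P/E up to the previous bar, so the current
    # observation is compared only with information already known.
    pe_hist_avg = pe.rolling(window=lookback).mean().shift(1)
    # Negative ratios (losses or negative book value) are excluded.
    low_pe = (pe > 0) & (pe < pe_discount * pe_hist_avg)
    low_pb = (pb > 0) & (pb < pb_max)
    return pd.DataFrame({"pe_hist_avg": pe_hist_avg, "low_pe": low_pe,
                         "low_pb": low_pb, "buy": low_pe & low_pb})
```

Fundamental data is published with a delay, so in a back-test each ratio should be dated by when it became public rather than by the fiscal period it describes.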

Strategy 3: Sentiment-Based Strategies

▪ Understand and implement sentiment-based trading strategies.
▪ Analyze sentiment data from news articles, social media, and other sources.
▪ Backtest sentiment-based strategies using historical sentiment data.
▪ Evaluate the performance of sentiment-based strategies.

Implement a Sentiment-Based Trading Strategy:
    ▪ Develop a strategy that buys stocks with positive sentiment and sells stocks with negative sentiment.
    ▪ Experiment with different thresholds for sentiment scores to refine the strategy.
    ▪ Combine sentiment analysis with the moving average and value-based strategies from previous sections.
    ▪ Explore how sentiment signals can enhance or detract from other strategies.
    ▪ Use historical sentiment data alongside market data to backtest the sentiment-based strategy.
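
A threshold-based version of the strategy above might start from the sketch below. It assumes a daily sentiment score per stock in the range -1 to 1, aligned with the price series. The thresholds, the optional moving-average filter, and the name `sentiment_signals` are all illustrative assumptions.

```python
import pandas as pd

def sentiment_signals(sentiment, price, buy_threshold=0.2, sell_threshold=-0.2,
                      ma_window=None):
    """Buy on positive sentiment, sell on negative sentiment.

    If ma_window is given, buys are additionally restricted to bars where
    the previous close was above its simple moving average.
    """
    sentiment = pd.Series(sentiment, dtype=float)
    price = pd.Series(price, dtype=float)
    # Lag by one bar: a score published today can only be traded on the
    # next bar, which keeps future information out of the back-test.
    score = sentiment.shift(1)
    buy = score > buy_threshold
    sell = score < sell_threshold
    if ma_window is not None:
        sma = price.rolling(window=ma_window).mean()
        uptrend = (price > sma).shift(1, fill_value=False)
        buy = buy & uptrend
    return pd.DataFrame({"buy": buy, "sell": sell})
```

Running the function over a grid of `buy_threshold` and `sell_threshold` values, with and without `ma_window`, shows whether the sentiment signal enhances or detracts from the moving average rule.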