We should implement and back-test trading algorithms using historical market data.

The project is divided into three main strategies:

1. Moving Average and Momentum Strategies,
2. Value-Based Strategies, and
3. Sentiment-Based Strategies.

Write our own Back Testing Code

▪ Back Testing simulates how a trading strategy would have performed in the past using historical data.
▪ The purpose is to evaluate the strategy's effectiveness, identify potential issues, and refine it before deploying it in live trading.
▪ Here are key components a back-testing code needs to handle:
    1. Load up and process price and other data
    2. Clean and prepare data
    3. Implement logic to buy and sell based on signals
    4. Define trades and measure their performance over time
    5. Incorporate realistic transaction costs
    6. Calculate metrics: return, drawdown, Sharpe ratio, etc.
    7. Visualise results
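
As a rough illustration of components 3 to 6, the sketch below turns a position signal into net daily returns and a few summary metrics. The function names `run_backtest` and `summarize`, the proportional cost of 0.1% per unit of turnover, the zero risk-free rate in the Sharpe ratio, and the toy prices are all assumptions made for this example, not part of the project brief.

```python
import numpy as np
import pandas as pd

def run_backtest(prices: pd.Series, signal: pd.Series,
                 cost_per_trade: float = 0.001) -> pd.Series:
    """Return net daily strategy returns for a position signal in {-1, 0, 1}."""
    daily_return = prices.pct_change().fillna(0.0)
    # Hold today the position decided at the previous close (no look-ahead).
    position = signal.shift(1).fillna(0.0)
    # Charge costs in proportion to the change in position (turnover).
    turnover = position.diff().abs().fillna(position.abs())
    return position * daily_return - cost_per_trade * turnover

def summarize(net_returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Compute total return, maximum drawdown and annualised Sharpe ratio."""
    equity = (1.0 + net_returns).cumprod()
    drawdown = equity / equity.cummax() - 1.0
    std = net_returns.std()
    # Sharpe ratio with an assumed risk-free rate of zero.
    sharpe = np.sqrt(periods_per_year) * net_returns.mean() / std if std > 0 else np.nan
    return {
        "total_return": equity.iloc[-1] - 1.0,
        "max_drawdown": drawdown.min(),
        "sharpe": sharpe,
    }

# Toy data (hypothetical): five closes and a hand-written signal.
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0])
signal = pd.Series([1, 1, 0, 1, 1])
net = run_backtest(prices, signal, cost_per_trade=0.001)
stats = summarize(net)
print(stats)
```

Charging costs on turnover rather than on every day in the market keeps a buy-and-hold position cheap and penalises strategies that flip frequently.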

Validation and Sanity Checks

▪ Out-of-Sample Testing:
    ▪ After optimizing the strategy on historical data, test it on a separate dataset (out-of-sample data) to verify its robustness.
▪ Sanity Checks:
    ▪ Ensure the backtest is realistic (e.g., no future data leakage, no unrealistic execution assumptions) to prevent overestimating the strategy’s performance.
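
The two checks above could be sketched as follows, assuming a date-indexed DataFrame. The helper names `chronological_split` and `lagged_signal` and the split date are hypothetical and chosen only for illustration.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, split_date: str):
    """Split a date-indexed frame into in-sample and out-of-sample parts."""
    split = pd.Timestamp(split_date)
    in_sample = df.loc[df.index < split]
    out_of_sample = df.loc[df.index >= split]
    return in_sample, out_of_sample

def lagged_signal(signal: pd.Series) -> pd.Series:
    """Delay the signal by one bar so day t trades only use data up to t-1."""
    return signal.shift(1).fillna(0)

# Toy frame (hypothetical): ten consecutive days of prices.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
frame = pd.DataFrame({"price": range(10)}, index=dates)
train, test = chronological_split(frame, "2020-01-08")
print(len(train), len(test))
```

Parameters would be tuned on `train` only, and `test` would be touched once at the end; a random (non-chronological) split would leak future information into the optimisation.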

In [None]:
import pandas as pd

# Load the stock data
file_path = "/mnt/data/stock_data.csv"
df = pd.read_csv(file_path)

# Display basic information and the first few rows
df.info(), df.head()

# Convert Date column to datetime format
df["Date"] = pd.to_datetime(df["Date"])

# Sort data by Date
df = df.sort_values("Date").reset_index(drop=True)

# Forward-fill missing values to maintain continuity
# (fillna(method="ffill") is deprecated in recent pandas; ffill() is the equivalent call)
df = df.ffill()

# Drop columns where more than 50% of data is missing
threshold = len(df) * 0.5
df = df.dropna(axis=1, thresh=threshold)

# Remove any duplicate rows if present
df = df.drop_duplicates()

# Display cleaned dataset info
df.info(), df.head()

# Define the new file path
cleaned_file_path = "/mnt/data/cleaned_stock_data.csv"

# Save the cleaned dataset to CSV
df.to_csv(cleaned_file_path, index=False)

# Return the file path for download
cleaned_file_path

# Drop rows where any column has missing values to keep only complete data
df_complete = df.dropna()

# Save the fully cleaned dataset
complete_file_path = "/mnt/data/fully_cleaned_stock_data.csv"
df_complete.to_csv(complete_file_path, index=False)

# Return the file path for download
complete_file_path


Strategy 1: Moving Average and Momentum Strategies

Rules-based Moving Average Strategy

Understand & implement moving average strategies
    ▪ Simple Moving Average (SMA)
    ▪ Write code that calculates it for different periods

Tasks:
    1. Implement a strategy where a short-term moving average (e.g., S-day SMA) crosses above or below a long-term moving average (e.g., L-day SMA).
    2. Write code to execute buy orders when the short-term average crosses above the long-term average and sell orders when the opposite occurs.
    3. Test the algorithm on a broad range of stocks (at least 100) from the S&P index for a range of values of S and L
    4. You should report average P&L and variance of P&L for each combination of moving average periods
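
Tasks 1 and 2 describe trading on the crossing events themselves. A minimal sketch of detecting those events is shown below; it assumes full look-back windows (rows before the long SMA is available generate no signal), and the function name `sma_crossover_events` and the toy price series are hypothetical.

```python
import pandas as pd

def sma_crossover_events(prices: pd.Series, short_window: int,
                         long_window: int) -> pd.DataFrame:
    """Flag the days on which the short SMA crosses the long SMA."""
    out = pd.DataFrame({"price": prices})
    # rolling() defaults to min_periods=window, so warm-up rows are NaN.
    out["sma_short"] = prices.rolling(short_window).mean()
    out["sma_long"] = prices.rolling(long_window).mean()
    above = out["sma_short"] > out["sma_long"]
    valid = out["sma_long"].notna()
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)
    # Buy: short SMA moves from at-or-below to above the long SMA.
    out["buy"] = valid & prev_valid & above & ~prev_above
    # Sell: short SMA moves from above to at-or-below the long SMA.
    out["sell"] = valid & prev_valid & ~above & prev_above
    return out

# Toy series (hypothetical): a dip, a rally, then a decline.
prices = pd.Series([5, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2], dtype=float)
events = sma_crossover_events(prices, short_window=2, long_window=4)
print(events.index[events["buy"]].tolist(), events.index[events["sell"]].tolist())
```

Requiring both the current and the previous row to be valid avoids counting the first day the long SMA becomes available as a crossover.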

In [4]:
import pandas as pd
import numpy as np
from itertools import product

# Load dataset
file_path = "fully_cleaned_stock_data.csv"
df = pd.read_csv(file_path)
df['Date'] = pd.to_datetime(df['Date'])  # Convert date column to datetime format

print("Dataset loaded successfully with shape:", df.shape)

df

Dataset loaded successfully with shape: (6763, 253)


Unnamed: 0,Date,AAPL_adjclose,ABC_adjclose,ABMD_adjclose,ABT_adjclose,ADI_adjclose,ADM_adjclose,ADP_adjclose,ADSK_adjclose,AEP_adjclose,...,WEC_adjclose,WHR_adjclose,WM_adjclose,WMB_adjclose,WRB_adjclose,WST_adjclose,WY_adjclose,XEL_adjclose,XOM_adjclose,ZION_adjclose
0,1996-02-01,0.215958,2.986069,7.250000,5.315176,5.811177,8.108320,9.436206,7.100580,12.827035,...,5.974281,28.367496,11.960501,4.755478,1.971051,4.115513,6.994101,7.907068,8.867104,12.244139
1,1996-02-02,0.222618,2.889744,6.750000,5.330627,5.659843,8.108320,9.524950,6.983215,12.718023,...,5.974281,27.745419,12.033428,4.767860,1.971051,4.136405,6.994101,7.887832,8.826117,12.244139
2,1996-02-05,0.222618,2.841581,6.750000,5.361532,5.932241,8.108320,9.406628,7.335310,12.790698,...,5.998176,27.869825,12.106361,4.743094,1.922020,4.136405,6.974671,7.926310,8.826117,12.244139
3,1996-02-06,0.225472,2.817500,6.375000,5.268825,5.962509,8.161662,9.465786,7.217945,12.718023,...,5.998176,27.745419,11.814639,4.705940,1.971051,4.136405,7.091239,8.041742,8.894431,12.284556
4,1996-02-07,0.215007,2.913825,6.500000,5.361532,6.537573,8.161662,9.524950,8.127523,12.790698,...,5.998176,28.056448,11.814639,4.705940,1.971051,4.073730,7.188381,8.022505,8.908089,12.324958
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6758,2022-12-06,142.910004,169.320007,377.950012,103.860001,166.509995,90.849998,257.283325,194.610001,96.370003,...,95.419998,141.860001,165.559998,33.049026,75.410004,234.990005,31.020000,69.320000,103.879997,47.139999
6759,2022-12-07,140.940002,170.289993,377.790009,104.809998,166.009995,93.169998,257.970001,193.339996,96.669998,...,94.589996,142.759995,165.210007,33.365002,74.650002,235.889999,31.690001,68.870003,103.650002,47.049999
6760,2022-12-08,142.649994,170.460007,380.750000,106.919998,169.649994,92.400002,260.049988,196.630005,97.720001,...,95.660004,145.360001,166.009995,32.910000,74.440002,236.029999,31.670000,69.809998,104.419998,47.380001
6761,2022-12-09,142.160004,165.330002,380.750000,107.510002,168.679993,91.879997,257.200012,194.309998,96.580002,...,95.730003,143.809998,166.830002,32.590000,73.559998,239.009995,31.480000,69.839996,103.540001,47.490002


In [5]:
import plotly.express as px
# Plotting the adjusted close price of AAPL over time
px.line(df,x='Date',y='AAPL_adjclose', title='AAPL Adjusted Close Price')

In [1]:
import pandas as pd
import numpy as np
from itertools import product

# Load dataset
file_path = "fully_cleaned_stock_data.csv"
df = pd.read_csv(file_path)
df['Date'] = pd.to_datetime(df['Date'])  # Convert date column to datetime format
df.set_index('Date', inplace=True)  # Set Date as index for time-series analysis

print("Dataset loaded successfully with shape:", df.shape)

# Backtest a long/short SMA crossover strategy on every stock column
def moving_average_crossover_strategy(df, short_window, long_window):
    results = {}
    print(f"Running strategy for SMA({short_window}) & SMA({long_window})")
    
    for stock in df.columns:
        print(f"Processing stock: {stock}")
        data = df.loc[:, [stock]].copy()  # Extract specific stock data and create a copy
        data.dropna(inplace=True)  # Remove missing values to prevent calculation errors
        
        data['SMA_short'] = data[stock].rolling(window=short_window, min_periods=1).mean()  # Compute short-term SMA
        data['SMA_long'] = data[stock].rolling(window=long_window, min_periods=1).mean()  # Compute long-term SMA
        
        # Generate buy/sell signals based on SMA crossovers
        data['Signal'] = np.where(data['SMA_short'] > data['SMA_long'], 1, -1)
        
        # Calculate daily returns
        data['Daily_Return'] = data[stock].pct_change().fillna(0)
        
        # Apply strategy: shift signal by 1 to avoid look-ahead bias
        data['Strategy_Return'] = data['Signal'].shift(1).fillna(0) * data['Daily_Return']
        
        # Compute cumulative returns using cumulative product method
        cumulative_return = (1 + data['Strategy_Return']).cumprod().iloc[-1] - 1 if not data.empty else 0
        results[stock] = cumulative_return  # Store cumulative return for each stock
        print(f"Cumulative return for {stock}: {cumulative_return:.4f}")
    
    return results

# Define moving average periods to test
short_windows = [5, 10, 20]  # List of short-term SMA periods
long_windows = [50, 100, 200]  # List of long-term SMA periods

# Store results
strategy_results = {}

for short, long in product(short_windows, long_windows):
    if short < long:  # Ensure short-term is smaller than long-term
        pnl_results = moving_average_crossover_strategy(df, short, long)
        avg_pnl = np.mean(list(pnl_results.values()))  # Calculate average profit & loss
        var_pnl = np.var(list(pnl_results.values()))  # Calculate variance of profit & loss
        strategy_results[(short, long)] = {'Average P&L': avg_pnl, 'Variance P&L': var_pnl}
        print(f"Results for SMA({short}) & SMA({long}): Avg P&L = {avg_pnl:.4f}, Variance P&L = {var_pnl:.6f}")

# Convert results to DataFrame for better readability and exportability
results_df = pd.DataFrame.from_dict(strategy_results, orient='index', columns=['Average P&L', 'Variance P&L'])
print("Final strategy results:")
print(results_df)

Dataset loaded successfully with shape: (6763, 252)
Running strategy for SMA(5) & SMA(50)
Processing stock: AAPL_adjclose
Cumulative return for AAPL_adjclose: 3.5534
Processing stock: ABC_adjclose
Cumulative return for ABC_adjclose: -0.7143
Processing stock: ABMD_adjclose
Cumulative return for ABMD_adjclose: -0.8776
Processing stock: ABT_adjclose
Cumulative return for ABT_adjclose: -0.8392
Processing stock: ADI_adjclose
Cumulative return for ADI_adjclose: -0.9985
Processing stock: ADM_adjclose
Cumulative return for ADM_adjclose: 0.3716
Processing stock: ADP_adjclose
Cumulative return for ADP_adjclose: -0.9348
Processing stock: ADSK_adjclose
Cumulative return for ADSK_adjclose: -0.6565
Processing stock: AEP_adjclose
Cumulative return for AEP_adjclose: -0.4058
Processing stock: AJG_adjclose
Cumulative return for AJG_adjclose: -0.6291
Processing stock: ALB_adjclose
Cumulative return for ALB_adjclose: -0.9123
Processing stock: ALK_adjclose
Cumulative return for ALK_adjclose: -0.9645
Proces

Rules-based Momentum Strategy

Understand the concept of momentum
    ▪ Relative Strength Index (RSI)
    ▪ Write code to calculate these
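
For reference, the usual definition of the RSI over an n-period look-back can be written as below. The symbols are introduced here for illustration; the averaging method (simple, Wilder or exponential smoothing) is an implementation choice that the definition leaves open.

```latex
\[
G_t = \max(P_t - P_{t-1},\, 0), \qquad
L_t = \max(P_{t-1} - P_t,\, 0)
\]
\[
RS_t = \frac{\operatorname{avg}_n(G)_t}{\operatorname{avg}_n(L)_t}, \qquad
RSI_t = 100 - \frac{100}{1 + RS_t}
\]
```

The RSI is bounded between 0 and 100: it approaches 100 when recent moves are almost all gains and 0 when they are almost all losses.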

Tasks:
    1. Implement a strategy that buys assets when momentum indicators signal strength (e.g., RSI > 70) and sells when they signal weakness (e.g., RSI < 30).
    2. Combine momentum signals with moving averages to enhance the strategy.
    3. Test the algorithm on a broad range of stocks (at least 100) from the S&P index
    4. You should report average P&L and variance of P&L for each combination of moving average periods
    5. Report the five best RSI threshold combinations you can find, both alone and combined with moving averages

In [2]:
import pandas as pd
import numpy as np
import itertools

# -----------------------
# Helper Functions
# -----------------------

def compute_rsi(series, period=14):
    """
    Compute the Relative Strength Index (RSI) for a pandas Series.
    """
    delta = series.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    
    # Use exponential moving average for smoothing
    avg_gain = gain.ewm(alpha=1/period, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1/period, min_periods=period).mean()
    
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi

def calculate_moving_average(series, window):
    """
    Compute the moving average over a specified window.
    """
    return series.rolling(window=window, min_periods=1).mean()

# -----------------------
# Backtesting Functions
# -----------------------

def backtest_rsi_strategy(prices, rsi_upper=70, rsi_lower=30, rsi_period=14):
    """
    Backtest a simple RSI-based strategy for a single stock price series:
      - Enter a long position when RSI > rsi_upper.
      - Exit the position when RSI < rsi_lower.
    
    Returns a list of trade returns.
    """
    df = pd.DataFrame({'Close': prices})
    df['RSI'] = compute_rsi(df['Close'], period=rsi_period)
    position = 0  # 0 means no position, 1 means long
    entry_price = 0
    trade_returns = []
    
    for idx, row in df.iterrows():
        if position == 0 and row['RSI'] > rsi_upper:
            # Enter long position
            position = 1
            entry_price = row['Close']
        elif position == 1 and row['RSI'] < rsi_lower:
            # Exit position: calculate return percentage
            ret = (row['Close'] - entry_price) / entry_price
            trade_returns.append(ret)
            position = 0  # close the position
    # Exit any open trade at the end
    if position == 1:
        ret = (df.iloc[-1]['Close'] - entry_price) / entry_price
        trade_returns.append(ret)
    return trade_returns

def backtest_rsi_ma_strategy(prices, rsi_upper=70, rsi_lower=30, rsi_period=14,
                             short_window=10, long_window=50):
    """
    Backtest a strategy combining RSI signals with moving averages for a single stock price series:
      - Enter long when RSI > rsi_upper AND short MA > long MA.
      - Exit when RSI < rsi_lower OR the moving average condition fails.
    
    Returns a list of trade returns.
    """
    df = pd.DataFrame({'Close': prices})
    df['RSI'] = compute_rsi(df['Close'], period=rsi_period)
    df['MA_short'] = calculate_moving_average(df['Close'], window=short_window)
    df['MA_long'] = calculate_moving_average(df['Close'], window=long_window)
    
    position = 0
    entry_price = 0
    trade_returns = []
    
    for idx, row in df.iterrows():
        ma_condition = row['MA_short'] > row['MA_long']
        if position == 0 and (row['RSI'] > rsi_upper) and ma_condition:
            # Enter long position if trend is confirmed by MA condition
            position = 1
            entry_price = row['Close']
        elif position == 1 and ((row['RSI'] < rsi_lower) or (not ma_condition)):
            # Exit position: calculate return percentage
            ret = (row['Close'] - entry_price) / entry_price
            trade_returns.append(ret)
            position = 0
    # Exit any open trade at the end
    if position == 1:
        ret = (df.iloc[-1]['Close'] - entry_price) / entry_price
        trade_returns.append(ret)
    return trade_returns

# -----------------------
# Load and Prepare Data
# -----------------------

# Read CSV file. The first column is 'Date' and the rest are stock price series.
data = pd.read_csv("fully_cleaned_stock_data.csv", parse_dates=['Date'])
data.sort_values('Date', inplace=True)

# Get list of tickers from the columns (all columns except 'Date')
tickers = data.columns.drop('Date')
print(f"Number of tickers: {len(tickers)}")

# -----------------------
# Parameter Combinations
# -----------------------

# Define RSI threshold combinations for the RSI-only strategy.
rsi_threshold_combos = [
    {'rsi_upper': 70, 'rsi_lower': 30},
    {'rsi_upper': 65, 'rsi_lower': 35},
    {'rsi_upper': 75, 'rsi_lower': 25},
    {'rsi_upper': 60, 'rsi_lower': 40},
    {'rsi_upper': 80, 'rsi_lower': 20}
]

# Define moving average combinations for the combined strategy.
ma_combos = list(itertools.product([5, 10, 15], [20, 30, 50]))
ma_combos = [combo for combo in ma_combos if combo[0] < combo[1]]  # Ensure short_window < long_window

# -----------------------
# Backtest Across Stocks
# -----------------------

results_rsi = []
results_rsi_ma = []

# Backtest RSI-only strategy across all tickers
for combo in rsi_threshold_combos:
    all_trade_returns = []
    for ticker in tickers:
        prices = data[ticker]
        returns = backtest_rsi_strategy(prices,
                                        rsi_upper=combo['rsi_upper'],
                                        rsi_lower=combo['rsi_lower'],
                                        rsi_period=14)
        all_trade_returns.extend(returns)
    if all_trade_returns:
        avg_pnl = np.mean(all_trade_returns)
        var_pnl = np.var(all_trade_returns)
    else:
        avg_pnl = np.nan
        var_pnl = np.nan
    results_rsi.append({
        'Ticker_Strategy': 'RSI-only',
        'RSI_upper': combo['rsi_upper'],
        'RSI_lower': combo['rsi_lower'],
        'Avg_PnL': avg_pnl,
        'Var_PnL': var_pnl,
        'Num_Trades': len(all_trade_returns)
    })

# Backtest RSI + MA combined strategy across all tickers
for rsi_combo in rsi_threshold_combos:
    for ma_combo in ma_combos:
        all_trade_returns = []
        for ticker in tickers:
            prices = data[ticker]
            returns = backtest_rsi_ma_strategy(prices,
                                               rsi_upper=rsi_combo['rsi_upper'],
                                               rsi_lower=rsi_combo['rsi_lower'],
                                               rsi_period=14,
                                               short_window=ma_combo[0],
                                               long_window=ma_combo[1])
            all_trade_returns.extend(returns)
        if all_trade_returns:
            avg_pnl = np.mean(all_trade_returns)
            var_pnl = np.var(all_trade_returns)
        else:
            avg_pnl = np.nan
            var_pnl = np.nan
        results_rsi_ma.append({
            'Ticker_Strategy': 'RSI+MA',
            'RSI_upper': rsi_combo['rsi_upper'],
            'RSI_lower': rsi_combo['rsi_lower'],
            'MA_short': ma_combo[0],
            'MA_long': ma_combo[1],
            'Avg_PnL': avg_pnl,
            'Var_PnL': var_pnl,
            'Num_Trades': len(all_trade_returns)
        })

# -----------------------
# Reporting the Results
# -----------------------

results_rsi_df = pd.DataFrame(results_rsi)
results_rsi_ma_df = pd.DataFrame(results_rsi_ma)

print("RSI-only Strategy Results (Top 5 by Avg_PnL):")
print(results_rsi_df.sort_values(by='Avg_PnL', ascending=False).head(5))

print("\nRSI + Moving Averages Combined Strategy Results (Top 5 by Avg_PnL):")
print(results_rsi_ma_df.sort_values(by='Avg_PnL', ascending=False).head(5))


Number of tickers: 252
RSI-only Strategy Results (Top 5 by Avg_PnL):
  Ticker_Strategy  RSI_upper  RSI_lower   Avg_PnL     Var_PnL  Num_Trades
4        RSI-only         80         20  1.833465  200.588793        1008
2        RSI-only         75         25  0.279897    1.237309        2774
0        RSI-only         70         30  0.114664    0.595637        6010
1        RSI-only         65         35  0.050799    0.071060       11283
3        RSI-only         60         40  0.027161    0.033086       19394

RSI + Moving Averages Combined Strategy Results (Top 5 by Avg_PnL):
   Ticker_Strategy  RSI_upper  RSI_lower  MA_short  MA_long   Avg_PnL  \
17          RSI+MA         65         35        15       50  0.027349   
8           RSI+MA         70         30        15       50  0.026804   
26          RSI+MA         75         25        15       50  0.026693   
14          RSI+MA         65         35        10       50  0.026277   
5           RSI+MA         70         30        10   

ML based Moving average strategy

Tasks:
    1. Build a deep learning model that takes the stock prices, moving average (MA) indicators and RSI indicators as features
    2. Use a 3-layer neural network (input layer, 1 hidden layer, output layer) where the inputs are the indicators and the output is a buy, sell or hold signal
    3. Train it on one subset of the time series
    4. Test it on another part of the time series
    5. See if changing the number of layers or neurons per layer helps
    6. Test the algorithm on a broad range of stocks (at least 100) from the S&P index
    7. You should report average P&L and variance of P&L
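
One possible shape for the network in task 2 is sketched below in plain NumPy: a single tanh hidden layer followed by a softmax over three classes, trained with full-batch gradient descent. The class encoding (0 = sell, 1 = hold, 2 = buy), the hyperparameters, the function names and the synthetic features are all assumptions for this example; a real run would use the indicator features and a framework of your choice.

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, n_classes=3, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer softmax classifier; return weights and loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                      # hidden layer
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
        losses.append(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))
        # Backpropagate the mean cross-entropy loss.
        dlogits = (P - Y) / len(y)
        dW2 = H.T @ dlogits
        db2 = dlogits.sum(axis=0)
        dZ1 = (dlogits @ W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Return the most likely class (0 = sell, 1 = hold, 2 = buy) per row."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Synthetic stand-in for indicator features (hypothetical data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] > 0.5, 2, np.where(X[:, 0] < -0.5, 0, 1))
# Chronological split: first 200 rows for training, last 100 for testing.
params, losses = train_mlp(X[:200], y[:200])
preds = predict(params, X[200:])
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Changing `n_hidden`, or stacking a second hidden layer, is the kind of variation task 5 asks about.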

ML based Momentum strategy

Tasks:
    1. Implement a strategy that buys assets when momentum indicators signal strength (e.g., RSI > 70) and sells when they signal weakness (e.g., RSI < 30).
    2. Combine momentum signals with moving averages to enhance the strategy.
    3. Report the five best RSI threshold combinations you can find, both alone and combined with moving averages
    4. Use a machine learning algorithm with price, various MA values and RSI as features to see if you can predict buy and sell signals
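
For task 4, the feature matrix and labels might be assembled along these lines. This is only a sketch: the labelling rule (1 if the next day's close is higher, else 0), the window lengths, the simple-average RSI (rather than the exponentially smoothed version used in the notebook cell above) and the random-walk prices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def build_features(prices: pd.Series, ma_windows=(5, 20), rsi_period=14) -> pd.DataFrame:
    """Build price, SMA and RSI features plus a next-day up/down label."""
    feats = pd.DataFrame({"price": prices})
    for w in ma_windows:
        feats[f"sma_{w}"] = prices.rolling(w).mean()
    delta = prices.diff()
    avg_gain = delta.clip(lower=0).rolling(rsi_period).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(rsi_period).mean()
    feats["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    # Hypothetical label: 1 (buy) if the next close is higher, else 0 (sell).
    next_ret = prices.pct_change().shift(-1)
    feats["label"] = (next_ret > 0).astype(int)
    # Drop the last row (unknown next-day return) and the indicator warm-up rows.
    return feats[next_ret.notna()].dropna()

# Synthetic random-walk prices (hypothetical data).
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 200).cumsum())
feats = build_features(prices)

# Chronological 70/30 split so the test set lies strictly after the training set.
split = int(len(feats) * 0.7)
train, test = feats.iloc[:split], feats.iloc[split:]
print(train.shape, test.shape)
```

The label uses tomorrow's price while every feature uses only data up to today, which keeps the target from leaking into the inputs.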

Strategy 2: Value-Based Strategies

Objectives:
• Understand fundamental metrics such as the P/E ratio and book value.
• Backtest value-based strategies using historical data.
• Evaluate the performance of the strategies.

Tasks:
    1. Buy stocks with a low P/E ratio compared to the historical average
    2. Buy stocks with a low Price-to-Book (P/B) ratio. The P/B ratio compares a company's market value to its book value (the net asset value on the balance sheet); a low P/B ratio may indicate that the stock is undervalued relative to its assets.
    3. Use a machine learning algorithm with price, P/E and P/B values as features to see if you can predict buy and sell signals
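
The first two tasks could be expressed as a screening rule like the one below. It is a sketch under stated assumptions: "historical average" is taken to mean the stock's own trailing average P/E, and the look-back of 4 periods, the 20% discount, the P/B cut-off of 1.0, the function name `value_signals` and the sample ratios are all hypothetical.

```python
import pandas as pd

def value_signals(pe: pd.Series, pb: pd.Series, lookback: int = 4,
                  pe_discount: float = 0.8, pb_max: float = 1.0) -> pd.DataFrame:
    """Flag periods where P/E is well below its trailing average and P/B is low."""
    # Shift by one period so the average uses only earlier observations.
    pe_hist_avg = pe.shift(1).rolling(lookback).mean()
    out = pd.DataFrame({"pe": pe, "pe_hist_avg": pe_hist_avg, "pb": pb})
    out["cheap_pe"] = pe < pe_discount * pe_hist_avg
    out["cheap_pb"] = pb < pb_max
    out["buy"] = out["cheap_pe"] & out["cheap_pb"]
    return out

# Hypothetical quarterly ratios for one stock.
pe = pd.Series([20.0, 22.0, 18.0, 20.0, 15.0, 25.0])
pb = pd.Series([1.5, 1.4, 1.2, 1.1, 0.9, 1.3])
signals = value_signals(pe, pb)
print(signals)
```

Rows without enough history compare against NaN and therefore never trigger a buy, which avoids acting before the average is defined.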

Strategy 3: Sentiment-Based Strategies

▪ Understand and implement sentiment-based trading strategies.
▪ Analyze sentiment data from news articles, social media, and other sources.
▪ Backtest sentiment-based strategies using historical sentiment data.
▪ Evaluate the performance of sentiment-based strategies.

Implement a Sentiment-Based Trading Strategy:
    ▪ Develop a strategy that buys stocks with positive sentiment and sells stocks with negative sentiment.
    ▪ Experiment with different thresholds for sentiment scores to refine the strategy.
    ▪ Combine sentiment analysis with the moving average and value-based strategies from previous sections.
    ▪ Explore how sentiment signals can enhance or detract from other strategies.
    ▪ Use historical sentiment data alongside market data to backtest the sentiment-based strategy.
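
A threshold rule and one way of combining it with a moving average filter are sketched below. The sentiment score is assumed to lie in [-1, 1], and the thresholds, window lengths, function names and sample values are hypothetical; how sentiment is actually scored depends on the data source.

```python
import pandas as pd

def sentiment_signal(sentiment: pd.Series, buy_threshold: float = 0.2,
                     sell_threshold: float = -0.2) -> pd.Series:
    """Map sentiment scores to +1 (buy), -1 (sell) or 0 (no action)."""
    signal = pd.Series(0, index=sentiment.index)
    signal[sentiment > buy_threshold] = 1
    signal[sentiment < sell_threshold] = -1
    return signal

def combine_with_trend(sent_signal: pd.Series, prices: pd.Series,
                       short_window: int = 2, long_window: int = 3) -> pd.Series:
    """Keep buy signals only when the short SMA is above the long SMA."""
    trend_up = prices.rolling(short_window).mean() > prices.rolling(long_window).mean()
    combined = sent_signal.copy()
    # Suppress buys that the trend filter does not confirm; sells pass through.
    combined[(sent_signal == 1) & ~trend_up] = 0
    return combined

# Hypothetical daily sentiment scores and closing prices.
sentiment = pd.Series([0.5, -0.4, 0.1, 0.3, 0.6])
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0])
raw = sentiment_signal(sentiment)
combined = combine_with_trend(raw, prices)
print(raw.tolist(), combined.tolist())
```

Comparing the backtest of `raw` against `combined` is one way to see whether the trend filter enhances or detracts from the sentiment signal.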