Franco Carlos | frnccc@uw.edu

CFRM 523: Advanced Trading Systems

Paper Replication Project:

**A New Anomaly: The Cross-Sectional Profitability of Technical Analysis**

by Yufeng Han, Ke Yang, and Guofu Zhou

The full summary of the paper and the literature review are submitted separately. In this notebook, we begin with an abbreviated summary of the hypothesis and then replicate the trading strategy.

# Summary of Strategy

This trading strategy applies moving average timing, a technical analysis tool, to portfolios sorted by volatility.

The key analytical technique for the strategy that we will replicate is as follows:

1.   **Obtain** the list of stocks (in the study, all stocks listed on the NYSE/Amex were used).
2.   **Sort** the stocks by their volatility, measured as the standard deviation of daily returns within a given year. **Divide** these stocks into ten equally sized groups, or deciles. This means that the first decile holds the 10% of stocks with the lowest volatility, and the tenth decile holds the 10% of stocks with the highest volatility. Each stock is equally weighted in a decile portfolio, and each year the portfolios are rebalanced according to the updated volatility.
3.   For each decile portfolio, **compute** the 10-day moving average of the daily closing prices (or index levels) of the portfolio. Each day, if a portfolio's current price is above its 10-day MA, the **signal** is to buy or hold the portfolio. If below, the **signal** is to switch to the risk-free asset. **Apply the MA timing strategy** daily for each decile portfolio, moving between the portfolio and a risk-free asset based on the timing signals.
4. **Track the returns** generated by the MA timing strategy for each decile portfolio. The paper uses the CAPM and Fama-French three-factor models to calculate risk-adjusted returns (alphas) of the MA timing strategy for comparison against the benchmark (buy-and-hold strategy).
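
The timing rule in step 3 can be sketched on a single price series. This is a minimal illustration on synthetic data, not the paper's implementation: the helper name `ma_timing_returns`, the constant daily risk-free rate `rf_daily`, and the simulated price path are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def ma_timing_returns(prices: pd.Series, window: int = 10, rf_daily: float = 0.0) -> pd.Series:
    """Daily returns of a moving-average timing rule on one portfolio (hypothetical helper)."""
    ma = prices.rolling(window=window).mean()
    # The signal observed at the close of day t-1 decides the position held on day t.
    in_market = (prices > ma).shift(1, fill_value=False)
    asset_returns = prices.pct_change().fillna(0.0)
    # Earn the portfolio return when in the market, the risk-free rate otherwise.
    return asset_returns.where(in_market, rf_daily)


# Synthetic price path standing in for a decile portfolio index (assumption)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=250)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, len(dates)))), index=dates)

timed = ma_timing_returns(prices, window=10, rf_daily=0.0001)
buy_hold = prices.pct_change().fillna(0.0)
print(pd.DataFrame({"timed": timed, "buy_hold": buy_hold}).tail())
```

Lagging the signal by one day keeps the rule from trading on information that is not yet available when the position is taken.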


Our hypothesis is that applying moving average timing to volatility-sorted decile portfolios will produce a pattern of returns that differs from simply buying and holding these same portfolios.
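
One way to quantify such a difference on a risk-adjusted basis is the CAPM alpha mentioned in step 4: the intercept of a regression of a portfolio's excess returns on the market's excess returns. The sketch below estimates it with ordinary least squares in NumPy on synthetic data. The function name `capm_alpha` and the simulated series are assumptions for illustration; `statsmodels`, which this notebook imports, could be used instead.

```python
import numpy as np


def capm_alpha(excess_returns: np.ndarray, market_excess: np.ndarray) -> tuple[float, float]:
    """OLS intercept (alpha) and slope (beta) of excess_returns on market_excess."""
    # Design matrix with a constant column for the intercept
    X = np.column_stack([np.ones_like(market_excess), market_excess])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    return float(coef[0]), float(coef[1])


# Synthetic daily series with a known alpha and beta plus noise (assumption)
rng = np.random.default_rng(1)
market_excess = rng.normal(0.0004, 0.01, 5000)
true_alpha, true_beta = 0.0002, 0.8
portfolio_excess = true_alpha + true_beta * market_excess + rng.normal(0.0, 0.002, 5000)

alpha, beta = capm_alpha(portfolio_excess, market_excess)
print(f"alpha={alpha:.6f}, beta={beta:.3f}")
```

An alpha that is reliably different from zero would indicate returns that market exposure alone does not explain.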

To assess the probability that the strategy was overfit, the paper offered several robustness measures. First, the paper considered alternative lag lengths for the moving averages, repeating the strategy for the MA(20), MA(50), MA(100), and MA(200) moving averages. Second, the authors repeated the strategy with the stocks ordered by market cap instead of volatility, and created the decile portfolios based on this ordering.

We replicate their key analytical techniques as well as their robustness measures for overfitting.

# Data Preparation

**Data Description and Source**

The paper obtained all their data from the Center for Research in Security Prices (CRSP). They obtained the adjusted closing prices for all listed NYSE/Amex stocks from July 1, 1963 to December 31, 2009.

However, their data was obtained under an institutional license granted to them, inaccessible to individual academic users. In that regard, the replication will need to address the following limitations in obtaining data:

1.   Data needs to be readily available without an institutional license from the respective source.
2.   Data acquisition needs to be automated (i.e. run through a library such as yfinance) so that files do not have to be downloaded individually by hand.
3.   Automated data acquisition must adhere to the rate limits of the respective library.

Thus, in this replication, the 10 volatility decile portfolios are constructed from the stocks in the S&P 500, with prices obtained using the yfinance library. Our sample period runs from January 1, 1980 to December 31, 2022. This extends the analysis with more recent data and a different asset universe. We acknowledge the fundamental differences between the NYSE/Amex universe and the S&P 500, and take them into consideration when comparing our results to the paper's.

**Loading, Cleaning, and Preparing Data**

The first step is to load all necessary packages.

In [None]:
import pandas as pd
import yfinance as yf
import time
import os
import numpy as np
import pandas_datareader as pdr
import datetime
import statsmodels.api as sm

This next step imports the tickers of all stocks listed on the S&P500.

In [None]:
# Load the data from the file
datapath = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/SP500.csv'
data = pd.read_csv(datapath, delimiter=',')
tickers = data['Symbol'].tolist()
descriptions = data['Details'].tolist()

# Display the first few rows of the dataframe
print(data.head())
print(data.columns)
# Display the first 10 tickers and descriptions separately
print(tickers[:10])
print(descriptions[:10])

  Symbol                Details
0   MSFT  Microsoft Corporation
1   AAPL              Apple Inc
2   GOOG          Alphabet Inc.
3  GOOGL          Alphabet Inc.
4   NVDA     NVIDIA Corporation
Index(['Symbol', 'Details'], dtype='object')
['MSFT', 'AAPL', 'GOOG', 'GOOGL', 'NVDA', 'AMZN', 'META', 'BRK.B', 'LLY', 'AVGO']
['Microsoft Corporation', 'Apple Inc', 'Alphabet Inc.', 'Alphabet Inc.', 'NVIDIA Corporation', 'Amazon.com, Inc.', 'Meta Platforms, Inc.', 'Berkshire Hathaway Inc.', 'Eli Lilly and Company', 'Broadcom Inc.']


This code defines the start and end dates, as well as the function that fetches and downloads the adjusted closing price data.

In [None]:
# Define a start and end date for the historical data
start_date = '1980-01-01'
end_date = '2022-12-31'

# Path to save the CSV file
file_path = '/content/sp500_adj_close_data.csv'

# Download historical data for each ticker
def get_data(tickers, batch_size=10, sleep_time=2, file_path=file_path):
    if os.path.exists(file_path):
        print("Data file exists. Loading data...")
        return pd.read_csv(file_path, index_col='Date', parse_dates=True)

    print("Downloading new data...")
    all_data = pd.DataFrame()
    for i in range(0, len(tickers), batch_size):
        batch = tickers[i:i + batch_size]
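        # auto_adjust=False keeps the 'Adj Close' column, which recent yfinance releases otherwise omit
        data = yf.download(batch, start=start_date, end=end_date, progress=False, auto_adjust=False)['Adj Close']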
        all_data = pd.concat([all_data, data], axis=1)
        time.sleep(sleep_time)  # Pause for a few seconds between each batch
    all_data.to_csv(file_path)
    return all_data

The following cell does not need to be run. It was used to download the closing prices and export them to a .csv file.

For computational efficiency, the data has already been saved and exported to an external GitHub repository.

In [None]:
# Calculate daily returns and annualized volatility
def calculate_volatility(data):
    daily_returns = data.pct_change()
    volatility = daily_returns.std() * np.sqrt(252)  # Annualizing the standard deviation
    return volatility

# Sort stocks into deciles based on volatility
def sort_into_deciles(volatility):
    deciles = pd.qcut(volatility, 10, labels=False, duplicates='drop')
    return deciles

# Function to process all steps
def process_stock_data(tickers):
    data = get_data(tickers)
    if not data.empty:
        volatility = calculate_volatility(data)
        deciles = sort_into_deciles(volatility)
        return deciles, data
    else:
        return None, None

# Execute the function
deciles, stock_data = process_stock_data(tickers)

# Check and print the decile information if available
if deciles is not None and stock_data is not None:
    for decile in range(10):
        print(f"Stocks in Decile {decile+1}:")
        print(stock_data.columns[deciles == decile])
else:
    print("No data available.")

This code takes our adjusted closing prices, computes the volatility of each asset, and sorts the assets into ten decile portfolios based on their volatilities. It begins by obtaining the closing prices from the linked GitHub .csv file.

In [None]:
# URL of the CSV file holding the adjusted closing prices
file_path = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/sp500_adj_close.csv'

# Load the data
def load_data(file_path):
    return pd.read_csv(file_path, index_col='Date', parse_dates=True)

# Calculate daily returns and annualized volatility
def calculate_volatility(data):
    daily_returns = data.pct_change()
    volatility = daily_returns.std() * np.sqrt(252)  # Annualizing the standard deviation
    return volatility

# Sort stocks into deciles based on volatility
def sort_into_deciles(volatility):
    deciles = pd.qcut(volatility, 10, labels=False, duplicates='drop')
    return deciles

# Function to process all steps
def process_stock_data(file_path):
    data = load_data(file_path)
    if not data.empty:
        volatility = calculate_volatility(data)
        deciles = sort_into_deciles(volatility)
        return deciles, data
    else:
        return None, None

# Execute the function
deciles, stock_data = process_stock_data(file_path)

# Check and print the decile information if available
if deciles is not None and stock_data is not None:
    for decile in range(10):
        print(f"Stocks in Decile {decile+1}:")
        print(stock_data.columns[deciles == decile])
else:
    print("No data available.")

Stocks in Decile 1:
Index(['JNJ', 'PG', 'XOM', 'KO', 'PEP', 'MCD', 'ABT', 'VZ', 'PM', 'NEE', 'T',
       'UPS', 'MDLZ', 'DUK', 'SO', 'CL', 'BDX', 'ZTS', 'AJG', 'MMM', 'AEP',
       'SRE', 'KMB', 'D', 'GIS', 'EXC', 'HSY', 'PEG', 'VRSK', 'ED', 'XEL',
       'WEC', 'AWK', 'DTE', 'ETR', 'GPC', 'FE', 'ES', 'AEE', 'CBOE', 'HRL',
       'K', 'PPL', 'ATO', 'AMCR', 'CPB', 'LNT', 'NI', 'SJM', 'EVRG'],
      dtype='object')
Stocks in Decile 2:
Index(['LLY', 'WMT', 'ABBV', 'CVX', 'MRK', 'IBM', 'PFE', 'RTX', 'UNP', 'ADP',
       'BMY', 'MMC', 'GD', 'ITW', 'MO', 'NOC', 'AON', 'ECL', 'EMR', 'WELL',
       'TRV', 'GWW', 'KDP', 'PSA', 'SYY', 'DG', 'PPG', 'XYL', 'FTV', 'EIX',
       'WTW', 'CHD', 'BR', 'IFF', 'WRB', 'BAX', 'INVH', 'MKC', 'CINF', 'CLX',
       'LDOS', 'L', 'ESS', 'CAG', 'MAA', 'UDR', 'ALLE', 'HII', 'FRT', 'PNW'],
      dtype='object')
Stocks in Decile 3:
Index(['V', 'ACN', 'LIN', 'GE', 'SPGI', 'ETN', 'HON', 'PGR', 'LMT', 'MDT',
       'CVS', 'SHW', 'RSG', 'NSC', 'APD', 'HLT', 'KHC', 'O',
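
The strategy summary describes volatility as the standard deviation of daily returns within a given year, with the deciles re-sorted annually. A minimal sketch of that per-year sort is shown below on synthetic prices. The helper name `yearly_volatility_deciles`, the ticker labels, and the simulated data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def yearly_volatility_deciles(prices: pd.DataFrame) -> pd.DataFrame:
    """Decile label (1 = lowest volatility) per stock for each calendar year."""
    daily_returns = prices.pct_change()
    # Standard deviation of daily returns within each calendar year
    yearly_vol = daily_returns.groupby(daily_returns.index.year).std()
    # Rank within each year, then cut the ranks into ten equally sized groups
    return yearly_vol.apply(
        lambda row: pd.qcut(row.rank(method="first"), 10, labels=False) + 1,
        axis=1,
    )


# Synthetic universe of 30 stocks with increasing volatility (assumption)
rng = np.random.default_rng(2)
dates = pd.bdate_range("2019-01-01", "2021-12-31")
n_stocks = 30
vols = np.linspace(0.005, 0.03, n_stocks)
log_returns = rng.normal(0.0, 1.0, (len(dates), n_stocks)) * vols
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_returns, axis=0)),
    index=dates,
    columns=[f"S{i:02d}" for i in range(n_stocks)],
)

deciles_by_year = yearly_volatility_deciles(prices)
print(deciles_by_year.iloc[:, :5])
```

To avoid look-ahead, the labels computed from one year would typically be applied to the following year's trading, for example via `deciles_by_year.shift(1)`.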

**Summary Statistics**

We report the same summary statistics as presented in the original research paper. This includes the average returns, standard deviation, skewness, and Sharpe ratios for each volatility decile portfolio, the MA(10) timing strategy portfolios, and each Moving Average Portfolio (MAP), expressed as the difference between the MA(10) and volatility decile portfolios.

In [None]:
# Function to calculate statistics from data
def calculate_statistics_test1(data):
    daily_returns = data.pct_change()
    mean_returns = daily_returns.mean().mean() * 252
    std_dev = daily_returns.std().mean() * np.sqrt(252)
    skewness = daily_returns.skew().mean()
    sharpe_ratio = mean_returns / std_dev if std_dev != 0 else np.nan
    return mean_returns, std_dev, skewness, sharpe_ratio

# Assume 'deciles' and 'stock_data' are predefined variables
results_test1 = {}

if deciles is not None and stock_data is not None:
    for decile in range(10):
        decile_data = stock_data.loc[:, deciles == decile]
        if not decile_data.empty:
            results = calculate_statistics_test1(decile_data)
            results_test1[f"Decile {decile + 1}"] = results
        else:
            results_test1[f"Decile {decile + 1}"] = None

if deciles is not None and stock_data is not None:
    for decile in range(10):
        decile_data = stock_data.loc[:, deciles == decile]
        if not decile_data.empty:
            mean_returns, std_dev, skewness, sharpe_ratio = calculate_statistics_test1(decile_data)
            print(f"\nSummary for Decile {decile + 1}:")
            print(f"Average Returns: {mean_returns:.6f}")
            print(f"Standard Deviation: {std_dev:.6f}")
            print(f"Skewness: {skewness:.6f}")
            print(f"Sharpe Ratio: {sharpe_ratio:.6f}")
        else:
            print(f"\nDecile {decile + 1} has no data.")
else:
    print("No data available.")


Summary for Decile 1:
Average Returns: 0.142380
Standard Deviation: 0.233098
Skewness: 0.029865
Sharpe Ratio: 0.610816

Summary for Decile 2:
Average Returns: 0.161559
Standard Deviation: 0.270982
Skewness: 0.124398
Sharpe Ratio: 0.596197

Summary for Decile 3:
Average Returns: 0.167197
Standard Deviation: 0.290885
Skewness: 0.183948
Sharpe Ratio: 0.574788

Summary for Decile 4:
Average Returns: 0.154358
Standard Deviation: 0.313884
Skewness: -0.037878
Sharpe Ratio: 0.491765

Summary for Decile 5:
Average Returns: 0.199001
Standard Deviation: 0.338661
Skewness: 0.124165
Sharpe Ratio: 0.587612

Summary for Decile 6:
Average Returns: 0.199437
Standard Deviation: 0.366211
Skewness: 0.017831
Sharpe Ratio: 0.544594

Summary for Decile 7:
Average Returns: 0.199749
Standard Deviation: 0.396372
Skewness: 0.156311
Sharpe Ratio: 0.503943

Summary for Decile 8:
Average Returns: 0.224954
Standard Deviation: 0.437704
Skewness: 0.264194
Sharpe Ratio: 0.513942

Summary for Decile 9:
Average Returns:

Computing the MA(10) portfolio and MAP returns is part of the analytical technique replication, and their summary statistics will follow.
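
As a rough illustration of how the MAP statistics relate to the other two series, the sketch below computes the MAP as the daily difference between a timed series and its buy-and-hold counterpart, then reports the same four statistics for each. The return series are synthetic placeholders and the helper `summary_stats` is hypothetical. As in the cells above, the Sharpe ratio here does not subtract a risk-free rate.

```python
import numpy as np
import pandas as pd


def summary_stats(daily_returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Annualized mean and standard deviation, skewness, and a simple Sharpe ratio."""
    mean = daily_returns.mean() * periods_per_year
    std = daily_returns.std() * np.sqrt(periods_per_year)
    return {
        "mean": mean,
        "std": std,
        "skew": daily_returns.skew(),
        "sharpe": mean / std if std != 0 else np.nan,
    }


# Synthetic daily returns standing in for one decile (assumption)
rng = np.random.default_rng(3)
buy_hold = pd.Series(rng.normal(0.0005, 0.010, 1000))
timed = pd.Series(rng.normal(0.0006, 0.007, 1000))

# MAP: return of the timed portfolio minus the buy-and-hold portfolio, day by day
map_returns = timed - buy_hold

table = pd.DataFrame(
    {"buy_hold": summary_stats(buy_hold), "timed": summary_stats(timed), "MAP": summary_stats(map_returns)}
).T
print(table)
```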

# Analytical Technique Replication


In [None]:
def calculate_moving_average(data, window=10):
    return data.rolling(window=window).mean()

# Apply the MA strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)  # Apply risk-free returns when out of the market
    strategy_returns = pd.Series(strategy_returns, index=prices.index)  # Convert numpy array to pandas Series
    return strategy_returns.cumsum()  # Cumulative sum of daily returns (arithmetic, not compounded)

def implement_ma_strategy_for_deciles(stock_data, deciles, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for i in range(10):
        decile_stocks = stock_data.columns[deciles == i]
        decile_prices = stock_data[decile_stocks].mean(axis=1)  # Average price for each decile
        ma_prices = calculate_moving_average(decile_prices)
        strategy_results[f"Decile {i+1}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
    return strategy_results

if deciles is not None and stock_data is not None:
    strategy_results = implement_ma_strategy_for_deciles(stock_data, deciles)
    for key, value in strategy_results.items():
        print(f"{key}: Last cumulative return: {value.iloc[-1]}")  # Now 'value' is a pandas Series
else:
    print("No data available.")

Decile 1: Last cumulative return: 2.762424685029065
Decile 2: Last cumulative return: 2.837077816665813
Decile 3: Last cumulative return: 4.175456641337996
Decile 4: Last cumulative return: 3.9969945720764
Decile 5: Last cumulative return: 4.270129454834881
Decile 6: Last cumulative return: 4.129935741437478
Decile 7: Last cumulative return: 2.670110749025901
Decile 8: Last cumulative return: 3.169593747712552
Decile 9: Last cumulative return: 5.771735768127628
Decile 10: Last cumulative return: 6.349655200848867


In [None]:
# Function to calculate statistics from daily returns
def calculate_statistics_test2(daily_returns):
    mean_returns = daily_returns.mean() * 252
    std_dev = daily_returns.std() * np.sqrt(252)
    skewness = daily_returns.skew()
    sharpe_ratio = mean_returns / std_dev if std_dev != 0 else np.nan
    return mean_returns, std_dev, skewness, sharpe_ratio

def calculate_daily_returns_from_cumulative(cumulative_returns):
    daily_returns = (cumulative_returns.pct_change().fillna(0) + 1).pow(1 / 252) - 1
    return daily_returns

# Assume 'strategy_results' contains the cumulative returns for each decile
results_test2 = {}

if strategy_results:
    for key, cumulative_returns in strategy_results.items():
        daily_returns = calculate_daily_returns_from_cumulative(cumulative_returns)
        results = calculate_statistics_test2(daily_returns)
        results_test2[key] = results
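
Because `apply_ma_strategy` accumulates daily returns with `cumsum()`, one possible way to recover the daily series from its output is first differencing, which inverts a cumulative sum exactly. The sketch below checks this on synthetic data. The helper `daily_returns_from_cumsum` is hypothetical and is not part of the cells above.

```python
import numpy as np
import pandas as pd


def daily_returns_from_cumsum(cumulative: pd.Series) -> pd.Series:
    """Invert a cumulative sum of daily returns by first differencing."""
    daily = cumulative.diff()
    # The first difference is undefined; the first cumulative value equals the first return.
    daily.iloc[0] = cumulative.iloc[0]
    return daily


# Synthetic daily returns (assumption), accumulated and then recovered
rng = np.random.default_rng(4)
original = pd.Series(rng.normal(0.0004, 0.01, 500))
recovered = daily_returns_from_cumsum(original.cumsum())

print(np.allclose(original, recovered))
```

Statistics computed on a series recovered this way are on the same scale as the buy-and-hold daily returns, which makes a side-by-side comparison easier to read.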

# Results of Hypothesis Tests, Comparison to Original Paper Results

In [None]:
def calculate_statistics(daily_returns):
    """Calculate aggregated mean return, standard deviation, skewness, and Sharpe Ratio for the provided data."""
    mean_returns = daily_returns.mean() * 252  # Annualize the mean returns
    std_dev = daily_returns.std() * np.sqrt(252)  # Annualize the standard deviation
    skewness = daily_returns.skew()  # Calculate skewness
    sharpe_ratio = mean_returns / std_dev if std_dev != 0 else np.nan  # Calculate Sharpe Ratio
    return mean_returns, std_dev, skewness, sharpe_ratio

def calculate_daily_returns_from_cumulative(cumulative_returns):
    """Convert cumulative returns to daily returns."""
    daily_returns = (cumulative_returns.pct_change().fillna(0) + 1).pow(1 / 252) - 1  # Convert cumulative to daily and normalize to daily returns
    return daily_returns

In [None]:
# Function to print results side by side with differences, sorted by decile number
def print_results_side_by_side(results1, results2):
    print(f"{'Metric':>20} {'Test 1':>20} {'Test 2':>20} {'Difference':>20}")

    # Custom sorting function for keys that likely contain numeric values
    sorted_keys = sorted(results1.keys(), key=lambda x: int(x.split()[1]))

    for key in sorted_keys:
        result1 = results1.get(key, ('n/a', 'n/a', 'n/a', 'n/a'))
        result2 = results2.get(key, ('n/a', 'n/a', 'n/a', 'n/a'))
        differences = []

        # Calculate differences where applicable
        for i in range(len(result1)):
            if isinstance(result1[i], (float, int)) and isinstance(result2[i], (float, int)):
                differences.append(result2[i] - result1[i])
            else:
                differences.append('n/a')

        print(f"\n{key}:")
        print(f"{'Average Returns:':>20} {result1[0]:>20.6f} {result2[0]:>20.6f} {differences[0]:>20.6f}")
        print(f"{'Standard Deviation:':>20} {result1[1]:>20.6f} {result2[1]:>20.6f} {differences[1]:>20.6f}")
        print(f"{'Skewness:':>20} {result1[2]:>20.6f} {result2[2]:>20.6f} {differences[2]:>20.6f}")
        print(f"{'Sharpe Ratio:':>20} {result1[3]:>20.6f} {result2[3]:>20.6f} {differences[3]:>20.6f}")

# Call the function to display results
print_results_side_by_side(results_test1, results_test2)


              Metric               Test 1               Test 2           Difference

Decile 1:
    Average Returns:             0.142380             0.000948            -0.141432
 Standard Deviation:             0.233098             0.001910            -0.231188
           Skewness:             0.029865            34.118681            34.088816
       Sharpe Ratio:             0.610816             0.496211            -0.114605

Decile 2:
    Average Returns:             0.161559             0.000866            -0.160692
 Standard Deviation:             0.270982             0.003311            -0.267671
           Skewness:             0.124398             5.197377             5.072979
       Sharpe Ratio:             0.596197             0.261617            -0.334579

Decile 3:
    Average Returns:             0.167197             0.001423            -0.165774
 Standard Deviation:             0.290885             0.005016            -0.285870
           Skewness:             0.183948  

**Results Comparison**

Ultimately, the results from replicating this strategy using the S&P500 as the universe of assets did not demonstrate abnormal or outperforming returns when compared to buying-and-holding the same asset portfolios. These results are contrary to the original results from the paper.

Though this replication produced results contrary to those of the original hypothesis test, the discrepancy can be attributed to the differences between NYSE/Amex and the S&P500. We compare the application of the paper's techniques in its original setting, using NYSE/Amex, with the replicated techniques using the S&P500.

The NYSE American lists a broad range of companies, including small to mid-cap stocks. The S&P500 is an index that includes the 500 largest companies listed on exchanges in the United States. This means that it is more heavily weighted towards large-cap industry leaders within their asset class.

Small- and mid-cap stocks tend to exhibit more volatile returns, in part because smaller companies have more room for growth. In comparison, the large-cap stocks in the S&P500 are less reactive, which potentially leads to more stable moving averages and lower volatilities. Both of these technical analysis indicators are at the core of this trading strategy. Additionally, the S&P500 is less diversified across company sizes, as it focuses on the largest stocks in the US. Smaller stocks listed on NYSE/Amex can potentially react more strongly to macroeconomic conditions.

In conclusion, implementing the moving average timing strategy for volatility decile portfolios appears to have a less significant impact on indices whose constituents share similar traits, such as the S&P500. The S&P500 was chosen for its computational efficiency given the scope of the project and for data availability, given that the data source of the original paper was proprietary. The NYSE/Amex volatility deciles were already available as sorted portfolios from the Center for Research in Security Prices, whereas in this replication they had to be recomputed and sorted. Finally, the Python library yfinance limits requests to 2,000 per hour per IP, limiting the amount of data that can be acquired from an individual retail perspective.

# Overfitness Assessment

Although the hypothesis test did not return the same results, the results are still tested for overfitness in the same manner as proposed by the original paper. There are two ways that the paper assesses overfitness in order to ensure a more robust and comprehensive hypothesis test.

The first method was to extend the same test to alternative moving average lag lengths of 20, 50, 100, and 200 days.
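
Since only the window length changes between these tests, the whole sweep can be expressed as one function with a `window` parameter. The sketch below runs it over all five lag lengths on a synthetic price series. The function name `ma_strategy_cumulative`, the risk-free rate of zero, and the simulated prices are assumptions, so the printed values are not comparable to the results reported in this notebook.

```python
import numpy as np
import pandas as pd


def ma_strategy_cumulative(prices: pd.Series, window: int, rf_daily: float = 0.0) -> pd.Series:
    """Cumulative sum of daily returns for an MA(window) timing rule (hypothetical helper)."""
    ma = prices.rolling(window=window).mean()
    # Yesterday's price-versus-MA comparison decides today's position
    signal = (prices > ma).shift(1, fill_value=False)
    returns = prices.pct_change().fillna(0.0)
    return returns.where(signal, rf_daily).cumsum()


# Synthetic price path standing in for one decile portfolio (assumption)
rng = np.random.default_rng(5)
dates = pd.bdate_range("2015-01-01", periods=1000)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, len(dates)))), index=dates)

# One pass per lag length, collecting the final cumulative return of each
final_by_window = {w: ma_strategy_cumulative(prices, w).iloc[-1] for w in (10, 20, 50, 100, 200)}
for window, value in final_by_window.items():
    print(f"MA({window}): last cumulative return: {value:.4f}")
```

Parameterizing the window keeps the signal and return logic identical across lag lengths, so any difference in results comes from the window alone.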

**MA(20) Portfolio**

In [None]:
def calculate_moving_average20(data, window=20):
    return data.rolling(window=window).mean()

# Apply the MA strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)  # Apply risk-free returns when out of the market
    strategy_returns = pd.Series(strategy_returns, index=prices.index)  # Convert numpy array to pandas Series
    return strategy_returns.cumsum()  # Cumulative returns

def implement_ma_strategy_for_deciles(stock_data, deciles, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for i in range(10):
        decile_stocks = stock_data.columns[deciles == i]
        decile_prices = stock_data[decile_stocks].mean(axis=1)  # Average price for each decile
        ma_prices = calculate_moving_average20(decile_prices)
        strategy_results[f"Decile {i+1}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
    return strategy_results

print("MA(20) Return Data:")
if deciles is not None and stock_data is not None:
    strategy_results = implement_ma_strategy_for_deciles(stock_data, deciles)
    for key, value in strategy_results.items():
        print(f"{key}: Last cumulative return: {value.iloc[-1]}")  # Now 'value' is a pandas Series
else:
    print("No data available.")

MA(20) Return Data:
Decile 1: Last cumulative return: 3.2513505444172006
Decile 2: Last cumulative return: 3.2569565559988973
Decile 3: Last cumulative return: 4.431888703850937
Decile 4: Last cumulative return: 4.096161499400203
Decile 5: Last cumulative return: 3.905553091793939
Decile 6: Last cumulative return: 4.316364260435685
Decile 7: Last cumulative return: 2.6169873204573855
Decile 8: Last cumulative return: 3.8244068420207435
Decile 9: Last cumulative return: 5.151978881562911
Decile 10: Last cumulative return: 9.36108466059101


**MA(50) Portfolio**

In [None]:
def calculate_moving_average50(data, window=50):
    return data.rolling(window=window).mean()

# Apply the MA strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)  # Apply risk-free returns when out of the market
    strategy_returns = pd.Series(strategy_returns, index=prices.index)  # Convert numpy array to pandas Series
    return strategy_returns.cumsum()  # Cumulative returns

def implement_ma_strategy_for_deciles(stock_data, deciles, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for i in range(10):
        decile_stocks = stock_data.columns[deciles == i]
        decile_prices = stock_data[decile_stocks].mean(axis=1)  # Average price for each decile
        ma_prices = calculate_moving_average50(decile_prices)
        strategy_results[f"Decile {i+1}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
    return strategy_results

print("MA(50) Return Data:")
if deciles is not None and stock_data is not None:
    strategy_results = implement_ma_strategy_for_deciles(stock_data, deciles)
    for key, value in strategy_results.items():
        print(f"{key}: Last cumulative return: {value.iloc[-1]}")  # Now 'value' is a pandas Series
else:
    print("No data available.")

MA(50) Return Data:
Decile 1: Last cumulative return: 3.1830539407735445
Decile 2: Last cumulative return: 3.0067136814452278
Decile 3: Last cumulative return: 4.369687863354437
Decile 4: Last cumulative return: 4.083189979965746
Decile 5: Last cumulative return: 4.275888519532658
Decile 6: Last cumulative return: 4.734519340664028
Decile 7: Last cumulative return: 3.625534471395323
Decile 8: Last cumulative return: 3.3989876306092053
Decile 9: Last cumulative return: 5.298297924770311
Decile 10: Last cumulative return: 9.03834261706009


**MA(100) Portfolio**

In [None]:
def calculate_moving_average100(data, window=100):
    return data.rolling(window=window).mean()

# Apply the MA strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)  # Apply risk-free returns when out of the market
    strategy_returns = pd.Series(strategy_returns, index=prices.index)  # Convert numpy array to pandas Series
    return strategy_returns.cumsum()  # Cumulative returns

def implement_ma_strategy_for_deciles(stock_data, deciles, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for i in range(10):
        decile_stocks = stock_data.columns[deciles == i]
        decile_prices = stock_data[decile_stocks].mean(axis=1)  # Average price for each decile
        ma_prices = calculate_moving_average100(decile_prices)
        strategy_results[f"Decile {i+1}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
    return strategy_results

print("MA(100) Return Data:")
if deciles is not None and stock_data is not None:
    strategy_results = implement_ma_strategy_for_deciles(stock_data, deciles)
    for key, value in strategy_results.items():
        print(f"{key}: Last cumulative return: {value.iloc[-1]}")  # Now 'value' is a pandas Series
else:
    print("No data available.")

MA(100) Return Data:
Decile 1: Last cumulative return: 3.2932045914775134
Decile 2: Last cumulative return: 3.597665967575709
Decile 3: Last cumulative return: 4.165724498624777
Decile 4: Last cumulative return: 4.633507106567501
Decile 5: Last cumulative return: 4.824700502376047
Decile 6: Last cumulative return: 4.695362238985849
Decile 7: Last cumulative return: 3.895736174113976
Decile 8: Last cumulative return: 3.080697254426471
Decile 9: Last cumulative return: 5.619524140451109
Decile 10: Last cumulative return: 7.752378240878296


**MA(200) Portfolio**

In [None]:
def calculate_moving_average200(data, window=200):
    return data.rolling(window=window).mean()

# Apply the MA strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)  # Apply risk-free returns when out of the market
    strategy_returns = pd.Series(strategy_returns, index=prices.index)  # Convert numpy array to pandas Series
    return strategy_returns.cumsum()  # Cumulative returns

def implement_ma_strategy_for_deciles(stock_data, deciles, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for i in range(10):
        decile_stocks = stock_data.columns[deciles == i]
        decile_prices = stock_data[decile_stocks].mean(axis=1)  # Average price for each decile
        ma_prices = calculate_moving_average200(decile_prices)
        strategy_results[f"Decile {i+1}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
    return strategy_results

print("MA(200) Return Data:")
if deciles is not None and stock_data is not None:
    strategy_results = implement_ma_strategy_for_deciles(stock_data, deciles)
    for key, value in strategy_results.items():
        print(f"{key}: Last cumulative return: {value.iloc[-1]}")  # Now 'value' is a pandas Series
else:
    print("No data available.")

MA(200) Return Data:
Decile 1: Last cumulative return: 3.818912230267257
Decile 2: Last cumulative return: 3.9199394728428785
Decile 3: Last cumulative return: 4.169707833346385
Decile 4: Last cumulative return: 4.275336658973091
Decile 5: Last cumulative return: 4.470790876525682
Decile 6: Last cumulative return: 3.915161491096401
Decile 7: Last cumulative return: 3.2638025573355973
Decile 8: Last cumulative return: 3.8256868800916015
Decile 9: Last cumulative return: 5.114735770237555
Decile 10: Last cumulative return: 5.883137887832329


The paper originally reported that as the moving averages' time horizons increased, the reported returns from the study decreased. Replicating their alternative moving averages' lag lengths produced the same results.

The second assessment for overfitness used an alternative method to sort and partition the stocks into their decile portfolios. In addition to the volatility sort, the paper constructed value-weighted decile portfolios by sorting all of the assessed stocks on market capitalization. The moving average timing strategy was then applied to these portfolios. We first partition the decile portfolios according to market cap rather than volatility:

In [None]:
# Function to load stock tickers from a CSV file
def load_stock_tickers(datapath):
    return pd.read_csv(datapath)

# Function to fetch market cap data using yfinance
def fetch_data(tickers):
    market_caps = {}
    for ticker in tickers:
        try:
            market_cap = yf.Ticker(ticker).info.get('marketCap', None)
            market_caps[ticker] = market_cap
        except Exception as e:
            print(f"Failed to fetch data for {ticker}: {e}")
    return pd.DataFrame({'Ticker': tickers, 'MarketCap': [market_caps.get(ticker) for ticker in tickers]})

# Function to sort stocks into market cap deciles
def sort_into_market_cap_deciles(data):
    # Remove rows where MarketCap is None; copy so the new column is added to an independent frame
    data = data.dropna(subset=['MarketCap']).copy()
    data = data.sort_values(by='MarketCap', ascending=False)
    # Decile 1 holds the smallest market caps, decile 10 the largest
    data['MarketCapDecile'] = pd.qcut(data['MarketCap'], 10, labels=range(1, 11))
    return data

# Main function to process stock data
def process_stock_data(datapath):
    tickers_df = load_stock_tickers(datapath)
    tickers = tickers_df['Symbol'].tolist()
    fetched_data = fetch_data(tickers)
    if not fetched_data.empty:
        sorted_data = sort_into_market_cap_deciles(fetched_data)
        return sorted_data
    return pd.DataFrame()  # Return an empty DataFrame if no data is fetched

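# Run the pipeline on the S&P 500 constituent list
datapath = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/SP500.csv'
stock_data = process_stock_data(datapath)

# Check and print the decile information if available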
if not stock_data.empty:
    for decile in range(1, 11):
        decile_tickers = stock_data[stock_data['MarketCapDecile'] == decile]['Ticker']
        print(f"Stocks in Market Cap Decile {decile}:")
        print(decile_tickers.tolist())
else:
    print("No data available.")

Stocks in Market Cap Decile 1:
['CRL', 'AES', 'LW', 'JNPR', 'IPG', 'INCY', 'DVA', 'PODD', 'EMN', 'REG', 'ALLE', 'WYNN', 'HII', 'PAYC', 'UHS', 'SOLV', 'KMX', 'FFIV', 'CPT', 'RL', 'QRVO', 'CTLT', 'MOS', 'BBWI', 'TECH', 'TFX', 'BXP', 'AAL', 'HSIC', 'TPR', 'DAY', 'AIZ', 'MTCH', 'PARA', 'PNW', 'FRT', 'CZR', 'CHRW', 'GNRC', 'BIO', 'NCLH', 'ETSY', 'HAS', 'MKTX', 'BWA', 'RHI', 'FMC', 'MHK', 'IVZ', 'CMA', 'GL']
Stocks in Market Cap Decile 2:
['CFG', 'MRO', 'SWKS', 'WBA', 'BG', 'AKAM', 'MAA', 'NRG', 'ENPH', 'TER', 'NDSN', 'CAG', 'CF', 'DGX', 'TRMB', 'JBL', 'FOXA', 'FOX', 'EPAM', 'SNA', 'POOL', 'NWS', 'NWSA', 'ZBRA', 'KEY', 'SWK', 'TAP', 'HST', 'BEN', 'MGM', 'CPB', 'VTRS', 'ALB', 'UDR', 'PNR', 'AMCR', 'GEN', 'LKQ', 'DOC', 'LNT', 'AOS', 'KIM', 'NI', 'SJM', 'RVTY', 'WRK', 'APA', 'IP', 'JKHY', 'EVRG']
Stocks in Market Cap Decile 3:
['STE', 'AEE', 'K', 'HBAN', 'HRL', 'ILMN', 'TDY', 'PFG', 'APTV', 'CBOE', 'CINF', 'FSLR', 'CCL', 'VRSN', 'DRI', 'OMC', 'CNP', 'J', 'TXT', 'CLX', 'EXPE', 'CMS', 'COO', 'HOL

In [None]:
# Function to load stock tickers from a CSV file
def load_stock_tickers(datapath):
    return pd.read_csv(datapath)

# Function to fetch market cap data using yfinance
def fetch_data(tickers):
    market_caps = {}
    for ticker in tickers:
        try:
            market_cap = yf.Ticker(ticker).info.get('marketCap', None)
            market_caps[ticker] = market_cap
        except Exception as e:
            print(f"Failed to fetch data for {ticker}: {e}")
    return pd.DataFrame({'Ticker': tickers, 'MarketCap': [market_caps.get(ticker) for ticker in tickers]})

# Function to sort stocks into market cap deciles (decile 1 = smallest, decile 10 = largest)
def sort_into_market_cap_deciles(data):
    # Drop missing market caps; copy so the new column is not written to a slice of the input
    data = data.dropna(subset=['MarketCap']).copy()
    data = data.sort_values(by='MarketCap', ascending=False)
    # pd.qcut assigns label 1 to the lowest-value bin
    data['MarketCapDecile'] = pd.qcut(data['MarketCap'], 10, labels=range(1, 11))
    return data

# Function to calculate statistics from data
def calculate_statistics(data):
    daily_returns = data.pct_change()
    portfolio_returns = daily_returns.mean(axis=1)
    mean_returns = portfolio_returns.mean() * 252
    std_dev = portfolio_returns.std() * np.sqrt(252)
    skewness = portfolio_returns.skew()
    sharpe_ratio = mean_returns / std_dev if std_dev != 0 else np.nan
    return mean_returns, std_dev, skewness, sharpe_ratio

# Main function to process stock data
def process_stock_data(datapath, historical_data_path):
    tickers_df = load_stock_tickers(datapath)
    tickers = tickers_df['Symbol'].tolist()
    fetched_data = fetch_data(tickers)
    historical_data = pd.read_csv(historical_data_path, index_col=0)

    if not fetched_data.empty:
        sorted_data = sort_into_market_cap_deciles(fetched_data)
        decile_stats = {}
        for decile in range(1, 11):
            decile_tickers = sorted_data[sorted_data['MarketCapDecile'] == decile]['Ticker'].tolist()
            decile_data = historical_data[decile_tickers]
            if not decile_data.empty:
                stats = calculate_statistics(decile_data)
                decile_stats[f"Decile {decile}"] = stats
            else:
                decile_stats[f"Decile {decile}"] = "No data available."
        return decile_stats
    return {}

datapath = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/SP500.csv'
historical_data_path = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/sp500_adj_close.csv'
decile_statistics = process_stock_data(datapath, historical_data_path)

for decile, stats in decile_statistics.items():
    if isinstance(stats, tuple):
        mean_returns, std_dev, skewness, sharpe_ratio = stats
        print(f"\nSummary for Market Cap {decile}:")
        print(f"Average Returns: {mean_returns:.6f}")
        print(f"Standard Deviation: {std_dev:.6f}")
        print(f"Skewness: {skewness:.6f}")
        print(f"Sharpe Ratio: {sharpe_ratio:.6f}")
    else:
        print(f"\nSummary for Market Cap {decile}: {stats}")


Summary for Market Cap Decile 1:
Average Returns: 0.203169
Standard Deviation: 0.200430
Skewness: -0.231571
Sharpe Ratio: 1.013665

Summary for Market Cap Decile 2:
Average Returns: 0.188167
Standard Deviation: 0.184553
Skewness: -0.347091
Sharpe Ratio: 1.019582

Summary for Market Cap Decile 3:
Average Returns: 0.191625
Standard Deviation: 0.181705
Skewness: -0.419724
Sharpe Ratio: 1.054598

Summary for Market Cap Decile 4:
Average Returns: 0.227776
Standard Deviation: 0.237059
Skewness: 26.879787
Sharpe Ratio: 0.960840

Summary for Market Cap Decile 5:
Average Returns: 0.204059
Standard Deviation: 0.183402
Skewness: -0.400493
Sharpe Ratio: 1.112635

Summary for Market Cap Decile 6:
Average Returns: 0.200680
Standard Deviation: 0.181565
Skewness: -0.510929
Sharpe Ratio: 1.105281

Summary for Market Cap Decile 7:
Average Returns: 0.197629
Standard Deviation: 0.183237
Skewness: -0.456821
Sharpe Ratio: 1.078539

Summary for Market Cap Decile 8:
Average Returns: 0.191918
Standard
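
The statistics above weight every stock in a decile equally (`daily_returns.mean(axis=1)`). If value weighting by market capitalization is wanted instead, a minimal sketch follows. The tickers, prices, and market caps are made-up placeholders, and using a single static market-cap snapshot as the weights is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical prices for two placeholder tickers (not real data)
prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.01], "BBB": [50.0, 49.5, 49.995]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
# Hypothetical market caps, treated as a fixed snapshot (an assumption)
market_caps = pd.Series({"AAA": 300e9, "BBB": 100e9})

daily_returns = prices.pct_change().dropna()

# Value weights sum to one across the stocks in the portfolio
weights = market_caps / market_caps.sum()

# Equal-weighted return: simple cross-sectional average
equal_weighted = daily_returns.mean(axis=1)

# Value-weighted return: weight each stock's return by its share of total market cap
value_weighted = daily_returns.mul(weights, axis=1).sum(axis=1)

print(pd.DataFrame({"equal_weighted": equal_weighted, "value_weighted": value_weighted}))
```

With a static snapshot the weights do not drift with prices, so this is closer to a daily-rebalanced portfolio than to a buy-and-hold value-weighted index.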

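We then apply the moving average timing strategy to these market-cap decile portfolios. The lag lengths considered are MA(10), MA(20), MA(50), MA(100), and MA(200); the cell below uses the default 10-day window of `calculate_moving_average`.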

In [None]:
# Function to calculate the moving average for the given data over a specified window
def calculate_moving_average(data, window=10):
    return data.rolling(window=window).mean()

# Function to apply the Moving Average (MA) trading strategy
def apply_ma_strategy(prices, ma_prices, risk_free_rate):
    # Signal uses only the previous day's data: True if yesterday's price was above yesterday's MA
    signals = prices.shift(1) > ma_prices.shift(1)
    daily_returns = prices.pct_change()
    # Earn the portfolio return when the signal is on, otherwise the daily risk-free rate
    strategy_returns = np.where(signals, daily_returns, risk_free_rate)
    strategy_returns = pd.Series(strategy_returns, index=prices.index)
    # Running sum of daily simple returns (additive, not compounded)
    return strategy_returns.cumsum()

# Function to implement the moving average strategy for each market cap decile
def implement_ma_strategy_for_deciles(sorted_data, historical_data, risk_free_rate=0.0001 / 252):
    strategy_results = {}
    for decile in range(1, 11):
        decile_tickers = sorted_data[sorted_data['MarketCapDecile'] == decile]['Ticker'].tolist()
        if decile_tickers:
            decile_prices = historical_data[decile_tickers].mean(axis=1)  # Calculate average prices for the decile
            ma_prices = calculate_moving_average(decile_prices)
            strategy_results[f"Decile {decile}"] = apply_ma_strategy(decile_prices, ma_prices, risk_free_rate)
        else:
            strategy_results[f"Decile {decile}"] = "No data available"
    return strategy_results

# Load and process the stock data
datapath = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/SP500.csv'
historical_data_path = 'https://raw.githubusercontent.com/franco-rey/cfrm-523-replication-project/main/sp500_adj_close.csv'
tickers_df = load_stock_tickers(datapath)
tickers = tickers_df['Symbol'].tolist()
fetched_data = fetch_data(tickers)
historical_data = pd.read_csv(historical_data_path, index_col=0)

# Sort the data into market cap deciles and apply the moving average strategy
if not fetched_data.empty:
    sorted_data = sort_into_market_cap_deciles(fetched_data)
    strategy_results = implement_ma_strategy_for_deciles(sorted_data, historical_data)
    for key, value in strategy_results.items():
        if isinstance(value, pd.Series):
            print(f"{key}: Last cumulative return: {value.iloc[-1]}")
        else:
            print(f"{key}: {value}")
else:
    print("No data available.")

Decile 1: Last cumulative return: 4.543385627855605
Decile 2: Last cumulative return: 3.940010001932284
Decile 3: Last cumulative return: 3.458903704702103
Decile 4: Last cumulative return: 3.4257126370512
Decile 5: Last cumulative return: 3.900175625099975
Decile 6: Last cumulative return: 4.006173735871204
Decile 7: Last cumulative return: 2.74501971015376
Decile 8: Last cumulative return: 5.034718115316057
Decile 9: Last cumulative return: 5.1451773125227245
Decile 10: Last cumulative return: 2.6108429573064087
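
The run above uses a single 10-day window. A minimal sketch of repeating the same timing rule across the five lag lengths follows, using a simulated price series in place of the decile portfolios. The helper `ma_timing_returns`, the `LAGS` list, and the simulated prices are illustrative assumptions and not part of the original notebook.

```python
import numpy as np
import pandas as pd

LAGS = [10, 20, 50, 100, 200]  # lag lengths named in the text

def ma_timing_returns(prices, window, risk_free_rate=0.0):
    """Daily returns of the MA timing rule: hold the portfolio when
    yesterday's price was above yesterday's moving average, otherwise
    earn the daily risk-free rate."""
    ma = prices.rolling(window=window).mean()
    # Shift by one day so today's position uses only yesterday's information
    signal = (prices > ma).shift(1, fill_value=False)
    daily = prices.pct_change().fillna(0.0)
    return daily.where(signal, risk_free_rate)

# Simulated price path (placeholder for a decile portfolio price series)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1000)
log_steps = rng.normal(0.0003, 0.01, size=len(dates))
prices = pd.Series(100 * np.exp(np.cumsum(log_steps)), index=dates)

# Annualized mean and standard deviation of the timed returns for each lag
rows = {}
for lag in LAGS:
    r = ma_timing_returns(prices, lag)
    rows[f"MA({lag})"] = {"ann_mean": r.mean() * 252, "ann_std": r.std() * np.sqrt(252)}
summary = pd.DataFrame(rows).T
print(summary)
```

On real data, `prices` would be replaced by each decile's price series, and the loop would run once per decile and lag.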


In [None]:
# Function to convert cumulative returns to daily returns
def calculate_daily_returns_from_cumulative(cumulative_returns):
    # apply_ma_strategy accumulates with cumsum, so first differences recover the daily returns
    daily_returns = cumulative_returns.diff()
    # diff leaves the first value NaN; the first cumulative value is the first day's return
    daily_returns.iloc[0] = cumulative_returns.iloc[0]
    return daily_returns


# Function to calculate statistics
def calculate_statistics_from_returns(daily_returns):
    mean_returns = daily_returns.mean() * 252
    std_dev = daily_returns.std() * np.sqrt(252)
    skewness = daily_returns.skew()
    sharpe_ratio = mean_returns / std_dev if std_dev != 0 else np.nan
    return mean_returns, std_dev, skewness, sharpe_ratio

# Prepare for the comparison
comparison_results = {}

# Make sure to calculate statistics for both the original and strategy returns
for decile in range(1, 11):
    decile_key = f"Decile {decile}"
    original_stats = decile_statistics[decile_key] if decile_key in decile_statistics else (np.nan, np.nan, np.nan, np.nan)
    strategy_series = strategy_results[decile_key] if decile_key in strategy_results and isinstance(strategy_results[decile_key], pd.Series) else pd.Series(np.nan, index=historical_data.index)

    # Convert cumulative strategy returns to daily returns
    strategy_daily_returns = calculate_daily_returns_from_cumulative(strategy_series)
    strategy_stats = calculate_statistics_from_returns(strategy_daily_returns)

    # Calculate the differences
    differences = tuple(s - o for o, s in zip(original_stats, strategy_stats))

    # Store results
    comparison_results[decile_key] = (original_stats, strategy_stats, differences)

# Printing function
def print_comparative_results(comparison_results):
    metrics = ['Average Returns', 'Standard Deviation', 'Skewness', 'Sharpe Ratio']
    print(f"{'Decile':<10}{'Metric':<20}{'Buy-and-Hold':<20}{'MA Strategy':<20}{'Difference':<20}")

    for decile in range(1, 11):
        decile_key = f"Decile {decile}"
        print(f"{decile_key:<10}")
        for i, metric in enumerate(metrics):
            original, strategy, differences = comparison_results[decile_key]
            print(f"{'':<10}{metric:<20}{original[i]:<20.6f}{strategy[i]:<20.6f}{differences[i]:<20.6f}")
        print('-' * 80)

# Run the printing function
print_comparative_results(comparison_results)

Decile    Metric              Buy-and-Hold        MA Strategy         Difference          
Decile 1  
          Average Returns     0.203169            -6.541615           -6.744784           
          Standard Deviation  0.200430            49.086749           48.886319           
          Skewness            -0.231571           -95.688716          -95.457145          
          Sharpe Ratio        1.013665            -0.133266           -1.146932           
--------------------------------------------------------------------------------
Decile 2  
          Average Returns     0.188167            31.931501           31.743334           
          Standard Deviation  0.184553            203.988456          203.803903          
          Skewness            -0.347091           104.061960          104.409050          
          Sharpe Ratio        1.019582            0.156536            -0.863046           
------------------------------------------------------------------------------

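The inconsistency of these results may be partly attributable to the differences in characteristics between the S&P 500 and NYSE/Amex that were mentioned previously. The magnitudes for the timed portfolios (for example, an annualized standard deviation of 49.09 for Decile 1 against 0.20 for buy-and-hold) also depend on how daily returns are recovered from the cumulative return series, so they should be read with caution.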
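
Because `apply_ma_strategy` returns a running sum of daily returns, the comparison statistics depend on how that series is converted back to daily returns. A minimal sketch on made-up numbers (the `daily` values are hypothetical) contrasts first-differencing with `pct_change`:

```python
import numpy as np
import pandas as pd

# Hypothetical daily strategy returns, for illustration only
daily = pd.Series([0.001, -0.002, 0.003, 0.0005, -0.001])

# Same accumulation as apply_ma_strategy: a running sum of simple returns
cumulative = daily.cumsum()

# First differences invert cumsum; the first element is the first day's return
recovered = cumulative.diff()
recovered.iloc[0] = cumulative.iloc[0]

# pct_change divides each step by the previous cumulative level, which is not
# a daily return and becomes large when the running sum is close to zero
ratio = cumulative.pct_change().fillna(0)

print(pd.DataFrame({"daily": daily, "diff": recovered, "pct_change": ratio}))
```

In this example the differenced series matches the original daily returns, while the `pct_change` series contains values greater than 1 in absolute value even though no daily return exceeds 0.3%.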

#Opportunities for Future Research



1.   The first natural opportunity for further research would be to extend this strategy to other markets and asset classes, investigating its cross-sectional profitability when exposed to different conditions.
2.   In the study, the Capital Asset Pricing Model and the Fama-French three-factor model were used to model the returns of the moving average portfolios. The findings could be made more robust by assessing returns under additional pricing and return models, such as Stephen Ross's Arbitrage Pricing Theory, the later Fama-French five-factor model (which adds profitability and investment factors), or conditional asset pricing models. The hypothesis and method would remain the same, but would be augmented by verifying the returns under further models.
3.   Using the moving average, and technical analysis more generally, is inherently a trend-following strategy. Issues that have been investigated for the momentum strategy can therefore also be investigated for the moving average strategy.
4.   The hypotheses for future research would remain the same. The fundamental philosophy of this strategy is to partition a market segment or asset class into several portfolios and apply a chosen technical-analysis trading strategy to each. Further research therefore lies in different asset classes or market segments, different models of returns, or different technical analysis methods.
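
For item 2, the risk-adjusted return (alpha) under a factor model is the intercept of a regression of excess strategy returns on factor returns. A minimal sketch for the one-factor (CAPM) case using ordinary least squares on simulated data follows. The function name `estimate_alpha`, the simulated market series, and the chosen true alpha and beta are assumptions for illustration only; additional factors would enter as extra columns of `factors`.

```python
import numpy as np

def estimate_alpha(excess_returns, factors):
    """OLS of excess returns on a constant plus factor returns.
    Returns (alpha, betas), where alpha is the regression intercept."""
    factors = np.asarray(factors, dtype=float)
    if factors.ndim == 1:
        factors = factors.reshape(-1, 1)
    # Design matrix: a column of ones for the intercept, then the factors
    X = np.column_stack([np.ones(len(factors)), factors])
    y = np.asarray(excess_returns, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Simulated daily excess returns (placeholders, not real market data)
rng = np.random.default_rng(42)
n = 2000
market_excess = rng.normal(0.0004, 0.01, size=n)
true_alpha, true_beta = 0.0002, 0.8
noise = rng.normal(0.0, 0.002, size=n)
strategy_excess = true_alpha + true_beta * market_excess + noise

alpha, betas = estimate_alpha(strategy_excess, market_excess)
print(f"daily alpha: {alpha:.6f}, beta: {betas[0]:.3f}")
```

This sketch returns point estimates only; judging whether an alpha is statistically significant would also require standard errors, which are omitted here.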