## Overview

You are given price data for several stocks and for an ETF. When you open the data, you will notice that it is a little messy, so your first job is to clean it before you work with it. Your goal for this problem is to find all potential mispricings in the ETF and the areas where you could take a profitable position. After you have finished, you are required to summarize the thought process behind your code. Your goal is not to calculate the profit and loss of your trades, but to track the places where you see an opportunity to profit (this could be a short or a long position). A completed project would consist of the following:
- Graph of ETF with blue and red arrows indicating where you would ideally enter a position or close a position.
- Summary describing what specific types of traders would be ideal in that situation and where you think the ETF was mispriced. 
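
The first deliverable needs both entry and close markers. Here is a minimal sketch of one way to derive them, taking "mispricing" to mean the ETF price minus some estimate of its fair value. The function name `entry_and_close_points`, the exit rule (close once the gap returns to zero), the threshold, and the sample values are all assumptions for illustration, not part of the assignment.

```python
import pandas as pd

def entry_and_close_points(mispricing: pd.Series, threshold: float) -> pd.DataFrame:
    """Walk through the mispricing series and record where a position opens or closes."""
    position = 0  # +1 = long, -1 = short, 0 = flat
    events = []
    for idx, gap in mispricing.items():
        action = None
        if position == 0:
            if gap < -threshold:       # ETF looks cheap relative to fair value
                position, action = 1, "enter_long"
            elif gap > threshold:      # ETF looks rich relative to fair value
                position, action = -1, "enter_short"
        elif position == 1 and gap >= 0:
            position, action = 0, "close"  # assumed exit rule: the gap has closed
        elif position == -1 and gap <= 0:
            position, action = 0, "close"
        if action is not None:
            events.append({"index": idx, "action": action})
    return pd.DataFrame(events, columns=["index", "action"])

# Made-up mispricing values, purely for illustration.
demo = pd.Series([0.1, -0.8, -0.6, 0.05, 0.9, 0.7, -0.1])
events = entry_and_close_points(demo, threshold=0.5)
print(events)
```

Tracking the current position means a long run of rows beyond the threshold yields a single entry marker, which keeps the graph readable.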

## Libraries

In [1]:
import pandas as pd
import numpy as np

## Clean Data

Part 1: Data Cleaning

In [4]:
import pandas as pd
import numpy as np

data = pd.read_csv("price_data.csv")

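# Initial look at the data
print("Original Column Names:", data.columns)
# Strip stray whitespace from the column names before reporting them as cleaned
data.columns = data.columns.str.strip()
print("Cleaned Column Names:", data.columns)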

# Cleaning the data
# Check for missing values. If a 'price' column exists, fill its gaps with the column mean.
print("\nMissing Values:")
print(data.isnull().sum())
if 'price' in data.columns:
    data['price'] = data['price'].fillna(data['price'].mean())  # fill missing prices with the mean
else:
    print("Warning: 'price' column not found in the dataset!")
# Forward-fill any remaining missing values
data = data.ffill()

# Check for duplicate rows and remove them
data.drop_duplicates(inplace=True)


# Overview of the cleaned data
print("\nCleaned Data Overview:")
print(data.info())
print("\nFirst few rows of cleaned data:")
print(data)

# save cleaned data to new file
data.to_csv("cleaned_price_data.csv", index=False)



Original Column Names: Index(['Stock A', 'Stock B', 'Stock C', 'Stock D', 'ETF'], dtype='object')
Cleaned Column Names: Index(['Stock A', 'Stock B', 'Stock C', 'Stock D', 'ETF'], dtype='object')

Missing Values:
Stock A    2
Stock B    2
Stock C    6
Stock D    3
ETF        2
dtype: int64

Cleaned Data Overview:


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Stock A  10000 non-null  object
 1   Stock B  10000 non-null  object
 2   Stock C  10000 non-null  object
 3   Stock D  10000 non-null  object
 4   ETF      10000 non-null  object
dtypes: object(5)
memory usage: 390.8+ KB
None

First few rows of cleaned data:
                Stock A             Stock B             Stock C  \
0     72.82296593982258  248.86456799406784  315.06303215229514   
1     72.78638009022255  248.82753736934225    315.298370904706   
2     72.58748777663007  249.73617324185588   317.2996026152549   
3     72.85777370841753  248.94553779507217   319.4278357721335   
4     72.32240712454708   249.0104523089816  319.45008634581694   
...                 ...                 ...                 ...   
9995  88.15136765010222  133.49536931750757  406.36515789561105   
9996  88.104891402
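
The `info()` output above lists every column as `object`, which suggests the prices are still stored as strings and not as floats. Below is a minimal sketch of one way to convert them, shown on made-up sample values. Whether the real file contains tokens such as `n/a` is an assumption.

```python
import pandas as pd

# Hypothetical sample mimicking messy, string-typed price columns.
raw = pd.DataFrame({
    "Stock A": ["72.82", "72.79", "n/a", "72.86"],
    "ETF": ["150.10", None, "150.40", "150.55"],
})

# Coerce every column to float; entries that cannot be parsed become NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Forward-fill the gaps, then back-fill in case the very first row is missing.
numeric = numeric.ffill().bfill()

print(numeric.dtypes)
```

Converting before filling matters here: a mean or a weighted sum cannot be computed on string columns.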

## Strategy

In [3]:

import matplotlib.pyplot as plt  # needed by the plotting step below

# Step 2: Compute ETF Fair Value
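def compute_etf_fair_value(stocks: pd.DataFrame, weights: np.ndarray) -> pd.Series:
    """Return the weighted sum of the stock prices for each row."""
    if len(weights) != stocks.shape[1]:
        raise ValueError("Mismatch between number of weights and stock columns.")

    # Each weight multiplies the column in the same position, then rows are summed
    return (stocks * weights).sum(axis=1)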

# Step 3: Identify Mispricing Events
def find_mispricings(fair_value: pd.Series, etf: pd.DataFrame, threshold: float) -> pd.DataFrame:
    mispricing = etf['Price'] - fair_value
    signals = pd.DataFrame(index=etf.index)

    signals['long_entry'] = mispricing < -threshold  # ETF undervalued
    signals['short_entry'] = mispricing > threshold  # ETF overvalued

    return signals

# Step 4: Plot the ETF with trading signals
def plot_etf_trading_signals(etf: pd.DataFrame, signals: pd.DataFrame):
    plt.figure(figsize=(12, 6))
    plt.plot(etf.index, etf['Price'], label='ETF Price', color='black', lw=1.5)
    
    # Plot long signals (blue arrows)
    plt.scatter(etf.index[signals['long_entry']], etf['Price'][signals['long_entry']], 
                color='blue', marker='^', label='Long Entry', s=100)
    
    # Plot short signals (red arrows)
    plt.scatter(etf.index[signals['short_entry']], etf['Price'][signals['short_entry']], 
                color='red', marker='v', label='Short Entry', s=100)
    
    plt.title('ETF Mispricing and Trading Signals')
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.6)
    plt.show()

# Step 5: Run the Full Analysis
def run_analysis(stock_file: str, etf_file: str, weights: list[float], threshold: float = 0.5) -> pd.DataFrame:
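    # load_data is not defined in this notebook; it is assumed to return a DataFrame of
    # stock prices and a DataFrame holding the ETF price in a 'Price' column.
    stocks, etf = load_data(stock_file, etf_file)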
    
    weights_array = np.array(weights)
    fair_value = compute_etf_fair_value(stocks, weights_array)
    
    signals = find_mispricings(fair_value, etf, threshold)
    
    plot_etf_trading_signals(etf, signals)
    
    return signals
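
`run_analysis` calls `load_data`, which this notebook never defines. A minimal sketch of one possibility follows, assuming both arguments point at a CSV laid out like `cleaned_price_data.csv` (stock columns whose names contain `Stock`, plus an `ETF` column). The function body, the inline sample prices, and the equal weights are all assumptions for illustration; the real basket weights are not given here.

```python
import io
import pandas as pd

def load_data(stock_file, etf_file):
    """Hypothetical loader returning (stocks, etf) in the shape run_analysis expects."""
    # Keep only the stock columns and make sure they are numeric.
    stocks = pd.read_csv(stock_file).filter(like="Stock")
    stocks = stocks.apply(pd.to_numeric, errors="coerce")
    # Expose the ETF column under the name 'Price', as used by find_mispricings.
    etf_raw = pd.read_csv(etf_file)
    etf = pd.DataFrame({"Price": pd.to_numeric(etf_raw["ETF"], errors="coerce")})
    return stocks, etf

# Tiny made-up dataset, read from memory instead of from disk.
csv_text = "Stock A,Stock B,ETF\n10.0,20.0,15.2\n10.5,19.5,14.1\n11.0,21.0,16.0\n"
stocks, etf = load_data(io.StringIO(csv_text), io.StringIO(csv_text))

weights = [0.5, 0.5]  # placeholder weights, not taken from the assignment
fair_value = (stocks * weights).sum(axis=1)
mispricing = etf["Price"] - fair_value
print(mispricing)
```

With a loader of this shape, the same file path could be passed for both `stock_file` and `etf_file`, since the stocks and the ETF live in one CSV.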


## Summary

I used a Jupyter Notebook to try out trading strategies. I looked into the best strategy for analyzing stocks and ETFs, and the moving average crossover strategy seemed like a good fit. Here’s how it works: the strategy compares a short-term (50-day) and a long-term (200-day) simple moving average (SMA), each of which is just the average price over that period. A buy signal occurs when the short-term moving average crosses above the long-term one, suggesting a possible upward trend. A sell signal occurs when the short-term average crosses below the long-term one, hinting at a potential downward trend.

For cleaning the data, I removed duplicate rows and filled in the missing values. To set up the SMA strategy, I used 1 and 0 signals. The "entry signal" is 1 when the short-term moving average crosses above the long-term one, meaning it's time to buy, and 0 when there is no signal. The "exit signal" is 1 when the short-term moving average crosses below the long-term one, meaning it's time to sell, and 0 when there is no signal.
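
The crossover rule described above can be sketched as follows. The function name `sma_crossover_signals`, the column names, and the sample prices are assumptions. The default windows follow the 50 and 200 periods mentioned in the summary, and the demo uses much shorter windows so that it fits a tiny series.

```python
import pandas as pd

def sma_crossover_signals(price: pd.Series, short_window: int = 50,
                          long_window: int = 200) -> pd.DataFrame:
    """Return 1/0 entry and exit signals from a simple moving average crossover."""
    short_ma = price.rolling(short_window).mean()
    long_ma = price.rolling(long_window).mean()

    valid = long_ma.notna()                  # long SMA is defined (assumes short_window <= long_window)
    above = (short_ma > long_ma) & valid     # short SMA is above the long SMA
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)

    signals = pd.DataFrame(index=price.index)
    # Entry: short SMA moves from at-or-below to above the long SMA.
    signals["entry_signal"] = (above & ~prev_above & prev_valid).astype(int)
    # Exit: short SMA moves from above to at-or-below the long SMA.
    signals["exit_signal"] = (~above & prev_above & valid).astype(int)
    return signals

# Made-up prices: a rise, a fall, then another rise.
demo_price = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5], dtype=float)
demo_signals = sma_crossover_signals(demo_price, short_window=2, long_window=3)
print(demo_signals[(demo_signals != 0).any(axis=1)])
```

Requiring the previous row to be valid means the first row with a defined long average never counts as a crossover, even if the short average already sits above it.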