<a href="https://colab.research.google.com/github/piyush-an/INFO7374_Predict_StockPrice/blob/main/2_Feature_Mart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 2: Construct a feature database from the collected and discovered features/factors (e.g., FRED, the Fama-French website, ADS, AR, CAPM, momentum factors, volume, price/return lags)
- The target variable Y can be either price or return
- Frequency could be either daily or monthly

## Install dependencies and download files

In [1]:
%%bash

pip install pandas yfinance ta seaborn matplotlib pandas-datareader jinja2 fredapi openpyxl xgboost scikit-learn statsmodels mlflow

if [ ! -f "ads_index_most_current_vintage.xlsx" ]; then
    wget https://www.philadelphiafed.org/-/media/frbp/assets/surveys-and-data/ads/ads_index_most_current_vintage.xlsx
fi
if [ ! -f "F-F_Research_Data_Factors_daily.CSV" ]; then
    wget https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily_CSV.zip
    unzip F-F_Research_Data_Factors_daily_CSV.zip
fi





In [2]:
import numpy as np
import pandas as pd
from datetime import datetime
import yfinance as yf
import ta
import getpass



In [3]:
# Define dataset start and end date => Six years' worth of data
start_date = datetime(2018, 1, 1)
end_date = datetime(2023, 12, 31)

# Downloaded data
NVDA_STOCK = yf.download("NVDA", start_date, end_date)
NVDA_STOCK.describe()

[*********************100%%**********************]  1 of 1 completed


,Open,High,Low,Close,Adj Close,Volume
count,1509.0,1509.0,1509.0,1509.0,1509.0,1509.0
mean,157.530023,160.415691,154.546222,157.608532,157.330305,47684030.0
std,122.302656,124.287872,120.093586,122.260984,122.343514,21494360.0
min,31.622499,32.494999,31.115,31.77,31.523224,9788400.0
25%,59.950001,60.834999,59.125,60.049999,59.654922,33682400.0
50%,131.365005,133.824997,129.482498,131.477493,131.218185,43868800.0
75%,212.0,217.550003,208.110001,212.580002,212.32341,58000800.0
max,502.160004,505.480011,494.119995,504.089996,504.045685,251152800.0


In [4]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume
2018-01-02,48.945,49.875,48.625,49.837502,49.312794,35561600
2018-01-03,51.025002,53.424999,50.9375,53.1175,52.558266,91470400
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600


In [5]:
# "Returns" is the day-over-day change in adjusted close, in price units (not a percentage)
NVDA_STOCK["Returns"] = NVDA_STOCK["Adj Close"] - NVDA_STOCK["Adj Close"].shift(1)
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns
2018-01-02,48.945,49.875,48.625,49.837502,49.312794,35561600,
2018-01-03,51.025002,53.424999,50.9375,53.1175,52.558266,91470400,3.245472
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626


Calculating daily returns:

The daily log return is the difference between the logs of consecutive adjusted closing prices: (`np.log(NVDA_STOCK["Adj Close"]) - np.log(NVDA_STOCK["Adj Close"].shift(1))`)

The logarithm of the ratio between today's adjusted closing price and yesterday's closely approximates the daily percentage change when the move is small. Log returns are also additive over time: the log return over several days is the sum of the daily log returns, which simplifies multi-period calculations.
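
The additivity of log returns can be checked on a small example. This is a minimal sketch with made-up prices (the `prices` series is hypothetical and not taken from the NVDA data):

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted close prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 105.0], name="Adj Close")

log_ret = np.log(prices).diff().dropna()     # log(P_t / P_{t-1})
simple_ret = prices.pct_change().dropna()    # P_t / P_{t-1} - 1

# Log returns add up across days to the log return of the whole period
total_log_ret = log_ret.sum()
period_log_ret = np.log(prices.iloc[-1] / prices.iloc[0])

# Simple returns compound multiplicatively instead
total_simple_ret = (1 + simple_ret).prod() - 1
```

Summing the daily log returns gives the same number as the log of the last price over the first, while simple returns have to be compounded.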

In [6]:
# Daily return
NVDA_STOCK["Daily_Return"] = np.log(NVDA_STOCK["Adj Close"]) - np.log(NVDA_STOCK["Adj Close"].shift(1))
# Take an explicit copy so later column assignments do not trigger SettingWithCopyWarning
NVDA_STOCK = NVDA_STOCK.dropna().copy()
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return
2018-01-03,51.025002,53.424999,50.9375,53.1175,52.558266,91470400,3.245472,0.063739
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027


### Feature-set 1: Typical Price, Typical_Price_Return
- `Typical_Price` is the *mean* of the High, Low and Close prices
- `Typical_Price_Return` is the daily log return of `Typical_Price`, expressed in percent
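
As a quick illustration of the two features, here is a minimal sketch on three made-up bars (the `bars` frame and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical High/Low/Close rows, for illustration only
bars = pd.DataFrame(
    {
        "High": [11.0, 12.0, 12.5],
        "Low": [9.0, 10.0, 11.5],
        "Close": [10.0, 11.0, 12.0],
    }
)

# Row-wise mean of High, Low and Close
typical_price = bars[["High", "Low", "Close"]].mean(axis=1)

# Log return of the typical price, scaled to percent
typical_price_return = np.log(typical_price).diff() * 100.0
```

The first return is missing because there is no previous row to compare against, which is why the notebook drops NaN rows after computing it.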

In [7]:
NVDA_STOCK["Typical_Price"] = NVDA_STOCK[["High", "Low", "Close"]].mean(axis=1)
NVDA_STOCK["Typical_Price_Return"] = (
    np.log(NVDA_STOCK.Typical_Price) - np.log(NVDA_STOCK.Typical_Price.shift(+1))
) * 100.0
NVDA_STOCK = NVDA_STOCK.dropna().copy()
NVDA_STOCK.head()



Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257,53.694167,2.26182
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439,53.615833,-0.145995
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181,55.465,3.390777
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027,55.366667,-0.177445
2018-01-10,54.549999,55.955002,54.0,55.919998,55.331249,58266400,0.430416,0.007809,55.291667,-0.135554


### Feature-set 2: Common Transforms
- Log of volume
- Percentage change of volume
- Difference in volume (1-day and 10-day)
- Log of the 5-day moving average of volume
- Daily volume vs. 200-day moving average
- Daily closing price vs. 50-day exponential moving average
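
The "value vs. moving average" transforms behave differently at the start of the series depending on the kind of average. A minimal sketch on a made-up volume series (the data and the short `window` are stand-ins for the 5-, 50- and 200-day windows):

```python
import numpy as np
import pandas as pd

# Hypothetical volume series of 10 days, for illustration only
volume = pd.Series(np.arange(1, 11, dtype=float) * 1_000)

window = 5  # short stand-in for the longer windows used on real data

ma = volume.rolling(window).mean()       # NaN for the first window - 1 rows
vs_ma = volume / ma - 1                  # relative distance from the simple average

ema = volume.ewm(span=window).mean()     # defined from the first row onward
vs_ema = volume / ema - 1                # first value is 0 because the EMA starts at the data

n_missing_ma = int(ma.isna().sum())
n_missing_ema = int(ema.isna().sum())
```

A simple rolling mean leaves `window - 1` missing values at the top, whereas the exponentially weighted mean produces a value for every row, starting at exactly the first observation.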

In [8]:
NVDA_STOCK["Volume_Log"] = np.log(NVDA_STOCK.Volume)
NVDA_STOCK["Volume_Differencing"] = NVDA_STOCK.Volume.diff()
NVDA_STOCK["Volume_Differencing_10"] = NVDA_STOCK.Volume.diff(10)
NVDA_STOCK["Volumne_Percent_Change"] = NVDA_STOCK.Volume.pct_change()

In [9]:
# Log of 5 day moving average of volume
NVDA_STOCK["MA_5"] = np.log(NVDA_STOCK.Volume.rolling(5).mean())

# Daily volume vs. 200 day moving average
NVDA_STOCK["Volumne_MA_200"] = (
    NVDA_STOCK.Volume / NVDA_STOCK.Volume.rolling(200).mean() - 1
)

# Daily closing price vs. 50 day Exponential Moving Avg
NVDA_STOCK["Close_EMA_50"] = NVDA_STOCK.Close / NVDA_STOCK.Close.ewm(span=50).mean() - 1

In [10]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,Volume_Log,Volume_Differencing,Volume_Differencing_10,Volumne_Percent_Change,MA_5,Volumne_MA_200,Close_EMA_50
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257,53.694167,2.26182,17.881572,,,,,,0.0
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439,53.615833,-0.145995,17.876167,-314400.0,,-0.00539,,,0.004134
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181,55.465,3.390777,18.294228,30109200.0,,0.519013,,,0.022527
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027,55.366667,-0.177445,17.721515,-38421600.0,,-0.436007,,,0.016254
2018-01-10,54.549999,55.955002,54.0,55.919998,55.331249,58266400,0.430416,0.007809,55.291667,-0.135554,17.880536,8566400.0,,0.172362,17.950444,,0.018883


### Feature-set 3: Momentum Indicators

**1. Awesome Oscillator (AO)**

In [11]:
NVDA_STOCK['Momentum_AwesomeOscillatorIndicator'] = ta.momentum.AwesomeOscillatorIndicator(NVDA_STOCK.High, NVDA_STOCK.Low,window1 = 5,window2 = 34, fillna=False).awesome_oscillator()

**2. Kaufman’s Adaptive Moving Average (KAMA)**

In [12]:
NVDA_STOCK['Momentum_KAMA'] = ta.momentum.KAMAIndicator(NVDA_STOCK.Close, fillna=False).kama()

**3. Percentage Volume Oscillator (PVO)**

In [13]:
NVDA_STOCK['Momentum_PercentagePVolumneOscillator'] = ta.momentum.PercentageVolumeOscillator(NVDA_STOCK.Volume, fillna=False).pvo()

**4. Rate of Change (ROC)**

In [14]:
NVDA_STOCK['Momentum_ROC'] = ta.momentum.ROCIndicator(NVDA_STOCK.Close, fillna=False).roc()

**5. Relative Strength Index (RSI)**

In [15]:
NVDA_STOCK['Momentum_RSI'] = ta.momentum.RSIIndicator(NVDA_STOCK.Close, fillna=False).rsi()
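
To make the RSI column less of a black box, here is a minimal sketch of a Wilder-style RSI in plain pandas. The function `rsi_sketch` and the random-walk `close` series are hypothetical, and the result is not guaranteed to match the `ta` library value for value:

```python
import numpy as np
import pandas as pd


def rsi_sketch(close: pd.Series, window: int = 14) -> pd.Series:
    """Wilder-style RSI; an illustrative sketch, not the ta library's code."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)     # positive moves only
    loss = -delta.clip(upper=0.0)    # magnitude of negative moves only
    # Wilder smoothing is an exponential average with alpha = 1 / window
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical random-walk prices, for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100.0 + rng.normal(0, 1, 60).cumsum())
rsi = rsi_sketch(close)
```

The indicator is bounded between 0 and 100, and the first rows are missing until the smoothing window has enough observations.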

**6. Stochastic RSI**

In [16]:
NVDA_STOCK['Momentum_StochRSIIndicator'] = ta.momentum.StochRSIIndicator(NVDA_STOCK.Close, fillna=False).stochrsi()

**7. True Strength Index (TSI)**

In [17]:
NVDA_STOCK['Momentum_TSIIndicator'] = ta.momentum.TSIIndicator(NVDA_STOCK.Close, fillna=False).tsi()

In [18]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,MA_5,Volumne_MA_200,Close_EMA_50,Momentum_AwesomeOscillatorIndicator,Momentum_KAMA,Momentum_PercentagePVolumneOscillator,Momentum_ROC,Momentum_RSI,Momentum_StochRSIIndicator,Momentum_TSIIndicator
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257,53.694167,2.26182,...,,,0.0,,,,,,,
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439,53.615833,-0.145995,...,,,0.004134,,,,,,,
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181,55.465,3.390777,...,,,0.022527,,,,,,,
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027,55.366667,-0.177445,...,,,0.016254,,,,,,,
2018-01-10,54.549999,55.955002,54.0,55.919998,55.331249,58266400,0.430416,0.007809,55.291667,-0.135554,...,17.950444,,0.018883,,,,,,,


### Feature-set 4: Trend Indicators

**1. Average Directional Movement Index (ADX)**

In [19]:
NVDA_STOCK['Trend_ADX'] = ta.trend.ADXIndicator(NVDA_STOCK.High,NVDA_STOCK.Low, NVDA_STOCK.Close, window = 20,fillna=False).adx()

**2. Aroon Indicator**

In [20]:
# AroonIndicator takes the high and low series (ta >= 0.11), so pass High rather than Close
NVDA_STOCK['Trend_AroonIndicator'] = ta.trend.AroonIndicator(NVDA_STOCK.High, NVDA_STOCK.Low, window=20, fillna=False).aroon_indicator()

**3. Commodity Channel Index (CCI)**

In [21]:
NVDA_STOCK['Trend_CCI'] = ta.trend.CCIIndicator(NVDA_STOCK.High, NVDA_STOCK.Low, NVDA_STOCK.Close, window = 20,fillna=False).cci()

**4. Detrended Price Oscillator (DPO)**

In [22]:
NVDA_STOCK['Trend_DPO'] = ta.trend.DPOIndicator(NVDA_STOCK.Close, window = 20, fillna=False).dpo()

**5. EMA - Exponential Moving Average**

In [23]:
NVDA_STOCK['Trend_EMA'] = ta.trend.EMAIndicator(NVDA_STOCK.Close, window = 20, fillna=False).ema_indicator()

**6. Moving Average Convergence Divergence (MACD)**

In [24]:
NVDA_STOCK['Trend_MACD'] = ta.trend.MACD(NVDA_STOCK.Close, fillna=False).macd()
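
For reference, the MACD line is commonly defined as a fast EMA minus a slow EMA of the closing price. The sketch below assumes the conventional 12/26/9 spans; `macd_sketch` and the toy series are hypothetical, and the output is not claimed to be identical to the `ta` implementation:

```python
import numpy as np
import pandas as pd


def macd_sketch(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line, signal line and histogram from plain pandas EMAs (a sketch)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"macd": macd_line, "signal": signal_line, "hist": macd_line - signal_line}
    )


# Hypothetical inputs: a flat series and a steadily rising series
flat = macd_sketch(pd.Series(np.full(50, 100.0)))
trend = macd_sketch(pd.Series(np.linspace(100.0, 150.0, 50)))
```

On a flat series the MACD line stays at zero, and on a steadily rising series it is positive because the fast EMA tracks the price more closely than the slow one.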

**7. Mass Index (MI)**

In [25]:
NVDA_STOCK['Trend_MI'] = ta.trend.MassIndex(NVDA_STOCK.High, NVDA_STOCK.Low, fillna=False).mass_index()

In [26]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Momentum_RSI,Momentum_StochRSIIndicator,Momentum_TSIIndicator,Trend_ADX,Trend_AroonIndicator,Trend_CCI,Trend_DPO,Trend_EMA,Trend_MACD,Trend_MI
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257,53.694167,2.26182,...,,,,0.0,,,,,,
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439,53.615833,-0.145995,...,,,,0.0,,,,,,
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181,55.465,3.390777,...,,,,0.0,,,,,,
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027,55.366667,-0.177445,...,,,,0.0,,,,,,
2018-01-10,54.549999,55.955002,54.0,55.919998,55.331249,58266400,0.430416,0.007809,55.291667,-0.135554,...,,,,0.0,,,,,,


### Feature-set 5: Volume Indicators

**1. Chaikin Money Flow (CMF)**

In [27]:
NVDA_STOCK['Volumne_CMF'] = ta.volume.ChaikinMoneyFlowIndicator(NVDA_STOCK.High,NVDA_STOCK.Low,NVDA_STOCK.Close, NVDA_STOCK.Volume,window = 20,fillna=False).chaikin_money_flow()

**2. Ease of Movement (EoM, EMV)**

In [28]:
NVDA_STOCK['Volumne_EOM'] = ta.volume.EaseOfMovementIndicator(NVDA_STOCK.High,NVDA_STOCK.Low, NVDA_STOCK.Volume,window = 20,fillna=False).ease_of_movement()

**3. Force Index (FI)**

In [29]:
NVDA_STOCK['Volumne_FI'] = ta.volume.ForceIndexIndicator(NVDA_STOCK.Close, NVDA_STOCK.Volume,window = 20,fillna=False).force_index()

**4. Money Flow Index (MFI)**

In [30]:
NVDA_STOCK['Volumne_MFI'] = ta.volume.money_flow_index(NVDA_STOCK.High, NVDA_STOCK.Low, NVDA_STOCK.Close,NVDA_STOCK.Volume, window=20, fillna=False)

**5. Volume Weighted Average Price (VWAP)**

In [31]:
NVDA_STOCK['Volumne_VWAP'] = ta.volume.VolumeWeightedAveragePrice(NVDA_STOCK.High, NVDA_STOCK.Low, NVDA_STOCK.Close,NVDA_STOCK.Volume, window=20, fillna=False).volume_weighted_average_price()
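
A rolling VWAP weights each day's typical price by its volume. The sketch below shows the idea in plain pandas; `rolling_vwap` and the two-row inputs are hypothetical and only meant to make the arithmetic visible:

```python
import pandas as pd


def rolling_vwap(high, low, close, volume, window=20):
    """Volume-weighted average of the typical price over a rolling window (a sketch)."""
    typical = (high + low + close) / 3.0
    weighted_sum = (typical * volume).rolling(window).sum()
    return weighted_sum / volume.rolling(window).sum()


# Hypothetical bars where High = Low = Close, so the typical price is just the price
price = pd.Series([10.0, 20.0])
volume = pd.Series([1.0, 3.0])
vwap = rolling_vwap(price, price, price, volume, window=2)
```

With one unit traded at 10 and three units at 20, the two-day VWAP is (10 * 1 + 20 * 3) / 4 = 17.5, closer to the price that carried more volume.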

In [32]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Trend_CCI,Trend_DPO,Trend_EMA,Trend_MACD,Trend_MI,Volumne_CMF,Volumne_EOM,Volumne_FI,Volumne_MFI,Volumne_VWAP
2018-01-04,53.939999,54.512501,53.172501,53.397499,52.835304,58326800,0.277039,0.005257,53.694167,2.26182,...,,,,,,,,,,
2018-01-05,53.547501,54.227501,52.77,53.849998,53.283047,58012400,0.447742,0.008439,53.615833,-0.145995,...,,,,,,,-0.863636,,,
2018-01-08,55.099998,56.25,54.645,55.5,54.915672,88121600,1.632626,0.030181,55.465,3.390777,...,,,,,,,3.549348,,,
2018-01-09,55.555,55.955002,54.66,55.485001,54.900833,49700000,-0.014839,-0.00027,55.366667,-0.177445,...,,,,,,,-0.364788,,,
2018-01-10,54.549999,55.955002,54.0,55.919998,55.331249,58266400,0.430416,0.007809,55.291667,-0.135554,...,,,,,,,-1.107243,,,


### Feature-set 6: Volatility Indicators

**1. Average True Range (ATR)**

In [33]:
NVDA_STOCK['Volatility_ATR'] = ta.volatility.AverageTrueRange(NVDA_STOCK.High, NVDA_STOCK.Low, NVDA_STOCK.Close, window=20, fillna=False).average_true_range()
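
ATR is an average of the true range, which extends the daily high-low range to cover gaps from the previous close. A minimal sketch of the true range itself (the function `true_range` and the two toy bars are hypothetical; the smoothing used by `ta` is not reproduced here):

```python
import pandas as pd


def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Largest of high-low, |high - previous close| and |low - previous close|."""
    prev_close = close.shift(1)
    candidates = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()],
        axis=1,
    )
    # max skips NaN, so the first row falls back to high - low
    return candidates.max(axis=1)


# Hypothetical bars: the second day gaps up from a close of 9.0
high = pd.Series([10.0, 12.0])
low = pd.Series([8.0, 11.0])
close = pd.Series([9.0, 11.5])
tr = true_range(high, low, close)
```

On the second day the high-low range is only 1.0, but the distance from the previous close to the new high is 3.0, so the true range is 3.0.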

**2. Bollinger Bands**

In [34]:
NVDA_STOCK['Volatility_BB'] = ta.volatility.BollingerBands(NVDA_STOCK.Close, window=20, fillna=False).bollinger_wband()
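
The Bollinger feature stored here is the band width rather than the bands themselves. A minimal sketch of that width, assuming bands at two population standard deviations around a rolling mean (`bollinger_width` and the toy series are hypothetical, and exact agreement with `ta` is not claimed):

```python
import numpy as np
import pandas as pd


def bollinger_width(close: pd.Series, window: int = 20, n_dev: float = 2.0) -> pd.Series:
    """Band width as a percentage of the middle band (a sketch)."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)   # population standard deviation
    upper = mid + n_dev * std
    lower = mid - n_dev * std
    return (upper - lower) / mid * 100.0


# Hypothetical alternating prices: every 2-day window has mean 2 and std 1
close = pd.Series([1.0, 3.0, 1.0, 3.0])
width = bollinger_width(close, window=2)
```

Each window here has a mean of 2 and a standard deviation of 1, so the bands sit at 0 and 4 and the width is 4 / 2 * 100 = 200 percent.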

**3. Donchian Channel**

In [35]:
NVDA_STOCK['Volatility_DonchainChannel'] = ta.volatility.DonchianChannel(NVDA_STOCK.High, NVDA_STOCK.Low,NVDA_STOCK.Close, window=20, fillna=False).donchian_channel_wband()

**4. Ulcer Index**

In [36]:
NVDA_STOCK['Volatility_UlcerIndex'] = ta.volatility.UlcerIndex(NVDA_STOCK.Close, window=20, fillna=False).ulcer_index()

**5. Keltner Channel (KC)**

In [37]:
NVDA_STOCK['Volatility_KeltnerChannel'] = ta.volatility.keltner_channel_hband(NVDA_STOCK.High, NVDA_STOCK.Low,NVDA_STOCK.Close, window=20, fillna=False)

In [38]:
NVDA_STOCK.dropna(inplace = True)
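
Dropping every row that has any missing value means the longest look-back window decides where the data starts. Here the 200-day volume average leaves its first 199 rows empty, which is consistent with the first remaining date of 2018-10-18 in the output that follows. A minimal sketch of how to see which column drives the loss (the toy frame and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a short and a long rolling window
df = pd.DataFrame({"x": np.arange(30, dtype=float)})
df["ma_5"] = df["x"].rolling(5).mean()
df["ma_20"] = df["x"].rolling(20).mean()

# Count missing values per column, largest first
nan_counts = df.isna().sum().sort_values(ascending=False)

# Rows that survive a plain dropna()
rows_after_dropna = len(df.dropna())
```

The column with the most missing values (`ma_20`, with 19) sets the number of rows lost, so 30 rows shrink to 11.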

In [39]:
NVDA_STOCK.head()

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Volumne_CMF,Volumne_EOM,Volumne_FI,Volumne_MFI,Volumne_VWAP,Volatility_ATR,Volatility_BB,Volatility_DonchainChannel,Volatility_UlcerIndex,Volatility_KeltnerChannel
2018-10-18,61.465,61.852501,59.272499,59.8825,59.357967,52402000,-0.874771,-0.01463,60.335833,-1.370346,...,-0.166595,-3.975714,-25152140.0,45.203574,65.418863,2.260525,26.135972,22.298079,10.741495,67.986543
2018-10-19,60.439999,60.637501,56.924999,57.2925,56.790657,61360800,-2.56731,-0.044215,58.285,-3.458141,...,-0.18975,-10.777065,-37892360.0,44.660537,64.892034,2.333124,28.543358,24.96159,11.595971,67.696292
2018-10-22,57.82,58.830002,56.767502,57.805,57.298664,36884400,0.508007,0.008905,57.800835,-0.834155,...,-0.21926,-5.493928,-32483250.0,44.509744,64.570526,2.319593,30.310946,25.371115,12.354732,67.307376
2018-10-23,55.107498,56.047501,54.177502,55.264999,54.780914,62643600,-2.51775,-0.044935,55.163334,-4.670472,...,-0.216777,-8.018833,-44543400.0,40.195354,63.871194,2.384988,32.945426,29.643461,13.380735,66.757917
2018-10-24,54.877499,55.3475,49.712502,49.852501,49.415825,88428800,-5.365089,-0.103071,51.637501,-6.605032,...,-0.269425,-16.456613,-85884100.0,35.541378,62.703614,2.547489,38.472874,37.093141,15.020365,66.224375


In [40]:
NVDA_STOCK.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1308 entries, 2018-10-18 to 2023-12-29
Data columns (total 41 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   Open                                   1308 non-null   float64
 1   High                                   1308 non-null   float64
 2   Low                                    1308 non-null   float64
 3   Close                                  1308 non-null   float64
 4   Adj Close                              1308 non-null   float64
 5   Volume                                 1308 non-null   int64  
 6   Returns                                1308 non-null   float64
 7   Daily_Return                           1308 non-null   float64
 8   Typical_Price                          1308 non-null   float64
 9   Typical_Price_Return                   1308 non-null   float64
 10  Volume_Log                             1308 non-null  

### Feature-set 7: Fama-French Indicators

In [41]:
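# Skip the descriptive lines above the column header
df_fama = pd.read_csv("./F-F_Research_Data_Factors_daily.CSV", skiprows=3)
# Drop the last row, which is a footer line rather than data
df_fama = df_fama.iloc[:-1]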
df_fama.rename(columns={"Unnamed: 0": "Date"}, inplace=True)
df_fama["Date"] = pd.to_datetime(df_fama["Date"])
df_fama = df_fama[(df_fama["Date"] >= start_date) & (df_fama["Date"] <= end_date)]
fama = df_fama.set_index("Date")
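
The same parsing steps can be tried offline on an in-memory file. The sketch below uses a made-up CSV that only imitates the layout (an unnamed date column in `YYYYMMDD` form and a trailing text line); the values are hypothetical. It swaps in an explicit date format with `errors="coerce"` in place of `iloc[:-1]`, and converts the factors from percent to decimals so they are on the same scale as `Daily_Return`:

```python
import io

import pandas as pd

# Hypothetical file contents imitating the factor file layout
raw = """,Mkt-RF,SMB,HML,RF
20180102,0.85,0.36,-0.23,0.006
20180103,0.59,-0.39,-0.19,0.006
Copyright notice line
"""

ff = pd.read_csv(io.StringIO(raw))
ff = ff.rename(columns={"Unnamed: 0": "Date"})

# Non-date rows (such as the footer) become NaT and are dropped
ff["Date"] = pd.to_datetime(ff["Date"], format="%Y%m%d", errors="coerce")
ff = ff.dropna(subset=["Date"]).set_index("Date")

# Percent -> decimal, e.g. 0.85 becomes 0.0085
ff = ff / 100.0
```

Whether to rescale is a modelling choice; the notebook keeps the factors in percent, which is also workable as long as the scale is kept in mind.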

In [42]:
fama.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1489 entries, 2018-01-02 to 2023-11-30
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  1489 non-null   float64
 1   SMB     1489 non-null   float64
 2   HML     1489 non-null   float64
 3   RF      1489 non-null   float64
dtypes: float64(4)
memory usage: 58.2 KB


In [43]:
NVDA_STOCK = pd.concat([NVDA_STOCK, fama], axis=1)
NVDA_STOCK.dropna(inplace=True)
NVDA_STOCK

Date,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Volumne_VWAP,Volatility_ATR,Volatility_BB,Volatility_DonchainChannel,Volatility_UlcerIndex,Volatility_KeltnerChannel,Mkt-RF,SMB,HML,RF
2018-10-18,61.465000,61.852501,59.272499,59.882500,59.357967,52402000.0,-0.874771,-0.014630,60.335833,-1.370346,...,65.418863,2.260525,26.135972,22.298079,10.741495,67.986543,-1.54,-0.54,0.42,0.008
2018-10-19,60.439999,60.637501,56.924999,57.292500,56.790657,61360800.0,-2.567310,-0.044215,58.285000,-3.458141,...,64.892034,2.333124,28.543358,24.961590,11.595971,67.696292,-0.25,-1.33,0.71,0.008
2018-10-22,57.820000,58.830002,56.767502,57.805000,57.298664,36884400.0,0.508007,0.008905,57.800835,-0.834155,...,64.570526,2.319593,30.310946,25.371115,12.354732,67.307376,-0.38,0.48,-1.25,0.008
2018-10-23,55.107498,56.047501,54.177502,55.264999,54.780914,62643600.0,-2.517750,-0.044935,55.163334,-4.670472,...,63.871194,2.384988,32.945426,29.643461,13.380735,66.757917,-0.62,-0.10,-0.41,0.008
2018-10-24,54.877499,55.347500,49.712502,49.852501,49.415825,88428800.0,-5.365089,-0.103071,51.637501,-6.605032,...,62.703614,2.547489,38.472874,37.093141,15.020365,66.224375,-3.33,-0.93,0.77,0.008
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-24,484.700012,489.209991,477.450012,477.760010,477.718018,29464500.0,-9.399170,-0.019484,481.473338,-1.579108,...,464.689991,14.943944,27.269391,24.349504,6.059404,476.756838,0.11,0.59,0.19,0.021
2023-11-27,478.000000,485.299988,476.519989,482.420013,482.377594,39566200.0,4.659576,0.009707,481.413330,-0.012464,...,468.258318,14.635746,24.524721,24.148394,5.311905,480.384171,-0.23,-0.11,-0.08,0.021
2023-11-28,482.359985,483.230011,474.730011,478.209991,478.167938,40149100.0,-4.209656,-0.008765,478.723338,-0.560337,...,471.260490,14.328959,21.710106,23.978031,4.684455,483.534838,0.06,-0.29,0.05,0.021
2023-11-29,483.790009,487.619995,478.600006,481.399994,481.357666,38200500.0,3.189728,0.006649,482.539998,0.794097,...,475.865551,14.083011,17.658367,20.347052,3.786048,487.140170,0.01,0.44,0.69,0.021


### Feature-set 8: Extracting external factors using the FRED API

In [44]:
from fredapi import Fred
key = getpass.getpass()
fred = Fred(api_key=key)

In [45]:
# S&P 500 index (SP500)
# Japanese Yen to U.S. Dollar Spot Exchange Rate (DEXJPUS)
# U.S. Dollars to Euro Spot Exchange Rate (DEXUSEU)
# Coinbase Bitcoin (CBBTCUSD)
feat_list = ["SP500", "DEXJPUS", "DEXUSEU", "CBBTCUSD"]
feat_df = pd.DataFrame()
for feat in feat_list:
    feature = fred.get_series(feat, start_date, end_date)
    feature = feature.to_frame(feat)
    feature.dropna(inplace=True)
    feat_df = pd.concat([feat_df, feature], axis=1)
feat_df.dropna(inplace=True)
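
The series combined here follow different calendars (Bitcoin is quoted on weekends, equity indices are not), so concatenating on the date index and then dropping missing rows keeps only the dates present in every series. A minimal offline sketch with made-up stand-ins for two of the series:

```python
import pandas as pd

# Hypothetical stand-ins for two series with different calendars
sp500 = pd.Series(
    [2700.0, 2710.0],
    index=pd.to_datetime(["2018-01-05", "2018-01-08"]),  # Friday and Monday
    name="SP500",
)
btc = pd.Series(
    [16900.0, 17100.0, 16200.0, 15000.0],
    index=pd.date_range("2018-01-05", periods=4),         # includes the weekend
    name="CBBTCUSD",
)

combined = pd.concat([sp500, btc], axis=1)   # outer join on the date index
aligned = combined.dropna()                  # keep dates present in both series
```

The weekend rows exist only for the Bitcoin series, so they are removed by `dropna()` and two common dates remain.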
feat_df

,SP500,DEXJPUS,DEXUSEU,CBBTCUSD
2018-01-02,2695.81,112.18,1.2050,14781.51
2018-01-03,2713.06,112.28,1.2030,15098.14
2018-01-04,2723.99,112.78,1.2064,15144.99
2018-01-05,2743.15,113.18,1.2039,16960.01
2018-01-08,2747.71,113.08,1.1973,14993.74
...,...,...,...,...
2023-12-22,4754.63,142.60,1.1008,44015.60
2023-12-26,4774.75,142.48,1.1035,42520.26
2023-12-27,4781.58,142.05,1.1114,43444.45
2023-12-28,4783.35,141.08,1.1073,42613.04


In [46]:
NVDA_STOCK = pd.concat([NVDA_STOCK, feat_df], axis=1)
NVDA_STOCK.dropna(inplace=True)
NVDA_STOCK

,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Volatility_UlcerIndex,Volatility_KeltnerChannel,Mkt-RF,SMB,HML,RF,SP500,DEXJPUS,DEXUSEU,CBBTCUSD
2018-10-18,61.465000,61.852501,59.272499,59.882500,59.357967,52402000.0,-0.874771,-0.014630,60.335833,-1.370346,...,10.741495,67.986543,-1.54,-0.54,0.42,0.008,2768.78,112.11,1.1494,6394.96
2018-10-19,60.439999,60.637501,56.924999,57.292500,56.790657,61360800.0,-2.567310,-0.044215,58.285000,-3.458141,...,11.595971,67.696292,-0.25,-1.33,0.71,0.008,2767.78,112.52,1.1513,6382.99
2018-10-22,57.820000,58.830002,56.767502,57.805000,57.298664,36884400.0,0.508007,0.008905,57.800835,-0.834155,...,12.354732,67.307376,-0.38,0.48,-1.25,0.008,2755.88,112.78,1.1467,6407.65
2018-10-23,55.107498,56.047501,54.177502,55.264999,54.780914,62643600.0,-2.517750,-0.044935,55.163334,-4.670472,...,13.380735,66.757917,-0.62,-0.10,-0.41,0.008,2740.69,112.12,1.1480,6395.14
2018-10-24,54.877499,55.347500,49.712502,49.852501,49.415825,88428800.0,-5.365089,-0.103071,51.637501,-6.605032,...,15.020365,66.224375,-3.33,-0.93,0.77,0.008,2656.10,112.58,1.1389,6415.98
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-24,484.700012,489.209991,477.450012,477.760010,477.718018,29464500.0,-9.399170,-0.019484,481.473338,-1.579108,...,6.059404,476.756838,0.11,0.59,0.19,0.021,4559.34,149.57,1.0934,37745.94
2023-11-27,478.000000,485.299988,476.519989,482.420013,482.377594,39566200.0,4.659576,0.009707,481.413330,-0.012464,...,5.311905,480.384171,-0.23,-0.11,-0.08,0.021,4550.43,148.89,1.0937,37244.39
2023-11-28,482.359985,483.230011,474.730011,478.209991,478.167938,40149100.0,-4.209656,-0.008765,478.723338,-0.560337,...,4.684455,483.534838,0.06,-0.29,0.05,0.021,4554.89,147.41,1.1007,37840.46
2023-11-29,483.790009,487.619995,478.600006,481.399994,481.357666,38200500.0,3.189728,0.006649,482.539998,0.794097,...,3.786048,487.140170,0.01,0.44,0.69,0.021,4550.58,147.39,1.0969,37864.76


### Feature-set 9: ADS Index features

In [47]:
ads = pd.read_excel("ads_index_most_current_vintage.xlsx")
ads.rename(columns={"Unnamed: 0": "Date"}, inplace=True)
ads["Date"] = pd.to_datetime(ads["Date"], format="%Y:%m:%d")
ads = ads[(ads["Date"] >= start_date) & (ads["Date"] <= end_date)]
ads = ads.set_index("Date")
ads

Date,ADS_Index
2018-01-01,-0.261093
2018-01-02,-0.284242
2018-01-03,-0.305222
2018-01-04,-0.324039
2018-01-05,-0.340703
...,...
2023-12-19,-0.005191
2023-12-20,-0.006551
2023-12-21,-0.007530
2023-12-22,-0.008130


In [48]:
ads.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2183 entries, 2018-01-01 to 2023-12-23
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ADS_Index  2183 non-null   float64
dtypes: float64(1)
memory usage: 34.1 KB


In [49]:
NVDA_STOCK = pd.concat([NVDA_STOCK, ads], axis=1)
NVDA_STOCK.dropna(inplace=True)
NVDA_STOCK

,Open,High,Low,Close,Adj Close,Volume,Returns,Daily_Return,Typical_Price,Typical_Price_Return,...,Volatility_KeltnerChannel,Mkt-RF,SMB,HML,RF,SP500,DEXJPUS,DEXUSEU,CBBTCUSD,ADS_Index
2018-10-18,61.465000,61.852501,59.272499,59.882500,59.357967,52402000.0,-0.874771,-0.014630,60.335833,-1.370346,...,67.986543,-1.54,-0.54,0.42,0.008,2768.78,112.11,1.1494,6394.96,-0.479897
2018-10-19,60.439999,60.637501,56.924999,57.292500,56.790657,61360800.0,-2.567310,-0.044215,58.285000,-3.458141,...,67.696292,-0.25,-1.33,0.71,0.008,2767.78,112.52,1.1513,6382.99,-0.479024
2018-10-22,57.820000,58.830002,56.767502,57.805000,57.298664,36884400.0,0.508007,0.008905,57.800835,-0.834155,...,67.307376,-0.38,0.48,-1.25,0.008,2755.88,112.78,1.1467,6407.65,-0.471985
2018-10-23,55.107498,56.047501,54.177502,55.264999,54.780914,62643600.0,-2.517750,-0.044935,55.163334,-4.670472,...,66.757917,-0.62,-0.10,-0.41,0.008,2740.69,112.12,1.1480,6395.14,-0.468746
2018-10-24,54.877499,55.347500,49.712502,49.852501,49.415825,88428800.0,-5.365089,-0.103071,51.637501,-6.605032,...,66.224375,-3.33,-0.93,0.77,0.008,2656.10,112.58,1.1389,6415.98,-0.465132
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-24,484.700012,489.209991,477.450012,477.760010,477.718018,29464500.0,-9.399170,-0.019484,481.473338,-1.579108,...,476.756838,0.11,0.59,0.19,0.021,4559.34,149.57,1.0934,37745.94,-0.037251
2023-11-27,478.000000,485.299988,476.519989,482.420013,482.377594,39566200.0,4.659576,0.009707,481.413330,-0.012464,...,480.384171,-0.23,-0.11,-0.08,0.021,4550.43,148.89,1.0937,37244.39,-0.016501
2023-11-28,482.359985,483.230011,474.730011,478.209991,478.167938,40149100.0,-4.209656,-0.008765,478.723338,-0.560337,...,483.534838,0.06,-0.29,0.05,0.021,4554.89,147.41,1.1007,37840.46,-0.010576
2023-11-29,483.790009,487.619995,478.600006,481.399994,481.357666,38200500.0,3.189728,0.006649,482.539998,0.794097,...,487.140170,0.01,0.44,0.69,0.021,4550.58,147.39,1.0969,37864.76,-0.005185


Saving the final dataframe as the *feature mart.*

In [50]:
filename = "NVDA_feature_mart.csv"
NVDA_STOCK.to_csv(filename, index=True)
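
When the feature mart is loaded again, the date index has to be rebuilt from the first CSV column. A minimal round-trip sketch using an in-memory buffer in place of the real file (the two-row `frame` is hypothetical, and its index is left unnamed on the assumption that the saved frame's index is unnamed as well):

```python
import io

import pandas as pd

# Hypothetical two-row frame with an unnamed DatetimeIndex
frame = pd.DataFrame(
    {"Close": [59.88, 57.29]},
    index=pd.to_datetime(["2018-10-18", "2018-10-19"]),
)

buffer = io.StringIO()
frame.to_csv(buffer, index=True)
buffer.seek(0)

# Use the first column as the index and parse it as dates
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
```

For the saved file, the same call with the file name in place of `buffer` would be the analogous step.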