## Hidden Markov Models for Market Regimes

#### Hidden Markov Models (HMMs)

__Hidden Markov Models (HMMs)__ in market analysis assume financial markets operate in __distinct regimes (states)__ like "bull", "bear", or "volatile". 

***While the true regime is hidden, it can be inferred from observable market data like returns and volatility.***

__The model:__
- Has ***states*** that transition according to ***Markov probabilities***
- Each state has its own distribution parameters (mean, variance) for market variables
- Helps *identify current market regime* and *predict regime shifts*
- Common uses: Asset allocation, risk management, trading strategy adaptation

*The example below fits a 4-state model with multivariate Gaussian emissions on two observed variables: daily returns and the intraday high-low range.*

#### Markov Probabilities

__Markov probabilities__ describe how likely the market is to switch between different regimes from one period to the next. 

__Key properties:__

1. The next state depends only on the current state (memoryless).
2. They are collected in a transition matrix where P[i,j] = probability of moving from state i to state j, so each row sums to 1.
3. Regimes tend to persist, so the diagonal probabilities are often the highest.

__For example__, if the market is in a bull regime, the ***Markov probabilities*** tell us:
- The likelihood that it stays bullish next period
- The likelihood that it transitions to a bear or volatile regime
- Both of these regardless of how the market got into the bull regime
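
As a rough illustration of these properties, the sketch below builds a hypothetical 3-state transition matrix (the probabilities are made up for illustration, not estimated from data) and reads off the next-period regime probabilities, a five-period forecast, and the expected number of consecutive periods spent in each regime.

```python
import numpy as np

# Hypothetical transition matrix; the values are illustrative assumptions.
# Rows = current state, columns = next state.
states = ["bull", "bear", "volatile"]
P = np.array([
    [0.95, 0.02, 0.03],   # from bull
    [0.05, 0.90, 0.05],   # from bear
    [0.10, 0.10, 0.80],   # from volatile
])

# Each row is a probability distribution over next states.
assert np.allclose(P.sum(axis=1), 1.0)

# Next-period probabilities given the market is currently in "bull".
current = np.array([1.0, 0.0, 0.0])
next_period = current @ P

# Five periods ahead: apply the same matrix repeatedly (Markov property).
five_ahead = current @ np.linalg.matrix_power(P, 5)

# Expected number of consecutive periods spent in state i: 1 / (1 - P[i, i]).
expected_duration = 1.0 / (1.0 - np.diag(P))

for name, p1, p5, d in zip(states, next_period, five_ahead, expected_duration):
    print(f"{name:9s} next={p1:.3f}  in_5={p5:.3f}  expected_duration={d:.1f}")
```

The high diagonal entries are what make regimes persistent: with a 0.95 chance of staying bullish, a bull regime lasts 20 periods on average in this toy setup.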

In [10]:
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt

#### Data Management

In [11]:
# Data Extraction
start_date = "2017-01-01"
end_date = "2022-06-01"
# ticker: SPDR S&P 500 ETF Trust (SPY)
# https://finance.yahoo.com/quote/SPY/
symbol = "SPY"
data = yf.download(symbol, start=start_date, end=end_date)

[*********************100%***********************]  1 of 1 completed


In [12]:
# data = data[["Open", "High", "Low", "Adj Close", "Volume"]] # not sure why the course author has this here
# data.head()

In [13]:
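# Explicitly rename the columns to a single level.
# The downloaded frame has a two-level (nested) column index; adding columns later
# with .loc and a condition raises an index error if both levels are kept.
# Note: this assignment is positional, so the list must match the downloaded column order.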
data.columns = ["Open", "High", "Low", "Adj Close", "Volume"]
data.head()

| Date | Open | High | Low | Adj Close | Volume |
|---|---|---|---|---|---|
| 2017-01-03 | 197.288849 | 197.80563 | 196.097618 | 197.113657 | 91366500 |
| 2017-01-04 | 198.462555 | 198.611457 | 197.612926 | 197.62168 | 78744400 |
| 2017-01-05 | 198.304886 | 198.462556 | 197.499055 | 198.191027 | 78379000 |
| 2017-01-06 | 199.014389 | 199.487372 | 197.866942 | 198.418767 | 71559900 |
| 2017-01-09 | 198.357437 | 198.89174 | 198.322393 | 198.751592 | 46939700 |
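
Assigning column names by position works only if the order of the downloaded columns matches the list. A possibly safer alternative, sketched here on a small hand-built frame that mimics a two-level (field, ticker) column index, is to keep the field level and drop the ticker level. The frame `raw` and its values are placeholders, and the exact column layout returned by `yf.download` depends on the installed yfinance version.

```python
import pandas as pd

# Hand-built stand-in for a download with two column levels (field, ticker).
# The values are placeholders, not market data.
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["SPY"]]
)
raw = pd.DataFrame(
    [[101.0, 102.0, 99.5, 100.0, 1_000],
     [102.5, 103.0, 101.0, 101.5, 1_200]],
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
    columns=columns,
)

# Keep only the first level (field names) so each name stays attached to its own data.
flat = raw.copy()
flat.columns = flat.columns.get_level_values(0)

print(flat.columns.tolist())
```

Because the names come from the frame itself, this approach does not depend on the column order.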


In [14]:
# Add Returns and Range
df = data.copy()
df["Returns"] = (df["Adj Close"] / df["Adj Close"].shift(1)) - 1
df["Range"] = (df["High"] / df["Low"]) - 1
df.dropna(inplace=True)
df.head()

| Date | Open | High | Low | Adj Close | Volume | Returns | Range |
|---|---|---|---|---|---|---|---|
| 2017-01-04 | 198.462555 | 198.611457 | 197.612926 | 197.62168 | 78744400 | 0.002577 | 0.005053 |
| 2017-01-05 | 198.304886 | 198.462556 | 197.499055 | 198.191027 | 78379000 | 0.002881 | 0.004879 |
| 2017-01-06 | 199.014389 | 199.487372 | 197.866942 | 198.418767 | 71559900 | 0.001149 | 0.008189 |
| 2017-01-09 | 198.357437 | 198.89174 | 198.322393 | 198.751592 | 46939700 | 0.001677 | 0.002871 |
| 2017-01-10 | 198.357437 | 199.224574 | 197.963269 | 198.374946 | 63771900 | -0.001895 | 0.006371 |
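
To make the two features concrete, here is a small check on made-up prices (the frame `toy` and its numbers are illustrative only). It applies the same two formulas as the cell above.

```python
import pandas as pd

# Toy prices, chosen by hand for illustration.
toy = pd.DataFrame({
    "High": [101.0, 103.0, 102.0],
    "Low": [99.0, 100.0, 98.0],
    "Adj Close": [100.0, 102.0, 99.96],
})

# Simple daily return: today's close relative to yesterday's close.
toy["Returns"] = (toy["Adj Close"] / toy["Adj Close"].shift(1)) - 1
# Intraday range: how far the high sits above the low, as a fraction of the low.
toy["Range"] = (toy["High"] / toy["Low"]) - 1

# The first row has no previous close, so its return is NaN and the row is dropped.
toy = toy.dropna()
print(toy[["Returns", "Range"]])
```

Returns can be positive or negative, while Range is never negative, which is why it serves as a simple volatility proxy here.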


In [15]:
# Structure Data
X_train = df[["Returns", "Range"]]
X_train.head()

| Date | Returns | Range |
|---|---|---|
| 2017-01-04 | 0.002577 | 0.005053 |
| 2017-01-05 | 0.002881 | 0.004879 |
| 2017-01-06 | 0.001149 | 0.008189 |
| 2017-01-09 | 0.001677 | 0.002871 |
| 2017-01-10 | -0.001895 | 0.006371 |


#### HMM Learning

The [hmmlearn](https://hmmlearn.readthedocs.io/en/latest/) package provides simple algorithms and models to learn HMMs (Hidden Markov Models) in Python. Its use case is unsupervised learning and inference of Hidden Markov Models.

`hmmlearn` implements Hidden Markov Models (HMMs). An HMM is a generative probabilistic model in which a sequence of observable variables X is generated by a sequence of internal hidden states Z. The hidden states are not observed directly. The transitions between hidden states are assumed to have the form of a (first-order) Markov chain. They can be specified by the start probability vector π and a transition probability matrix A. The emission probability of an observable can be any distribution with parameters θ conditioned on the current hidden state. The HMM is completely determined by π, A and θ (https://hmmlearn.readthedocs.io/en/latest/tutorial.html).
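
The generative reading of π, A and θ can be sketched in a few lines of NumPy. The two-state parameters below are invented for illustration (a calm state and a turbulent state), and the helper `sample_hmm` is a hypothetical function, not part of `hmmlearn`.

```python
import numpy as np

def sample_hmm(pi, A, means, covs, n_steps, seed=0):
    """Draw hidden states Z and observations X from a Gaussian HMM (illustrative helper)."""
    rng = np.random.default_rng(seed)
    n_states = len(pi)
    Z = np.empty(n_steps, dtype=int)
    X = np.empty((n_steps, means.shape[1]))
    for t in range(n_steps):
        # The first state comes from pi; later states depend only on the previous state.
        probs = pi if t == 0 else A[Z[t - 1]]
        Z[t] = rng.choice(n_states, p=probs)
        # The observation is drawn from the emission distribution of the current state.
        X[t] = rng.multivariate_normal(means[Z[t]], covs[Z[t]])
    return Z, X

# Assumed parameters: state 0 = calm, state 1 = turbulent.
pi = np.array([0.8, 0.2])
A = np.array([[0.97, 0.03],
              [0.10, 0.90]])
means = np.array([[0.001, 0.007],     # (mean return, mean range)
                  [-0.004, 0.030]])
covs = np.array([np.diag([4e-5, 2e-5]),
                 np.diag([1e-3, 5e-4])])

Z, X = sample_hmm(pi, A, means, covs, n_steps=500)
print(Z[:20])
print(X[:3])
```

Fitting an HMM runs this process in reverse: only X is available, and π, A and θ are estimated so that the observed sequence becomes likely.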

__Markov chains__: Mathematical systems where future states depend only on the current state, not past history. Transitions between states follow fixed probabilities.

__Emission probability__: For each hidden state (market regime), the probability of observing specific market data (e.g., returns). In our case, these are ***modeled as Gaussian distributions with state-specific means and covariances***.

*Example: A bull market (hidden state) is more likely to "emit" positive returns (observable), while a bear market is more likely to emit negative returns.*
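
This intuition can be made concrete by evaluating Gaussian densities directly. In the sketch below, the bull and bear parameters are assumed values for a single variable (daily return), and the equal prior is also an assumption; a fitted HMM would additionally use the transition probabilities and the full observation sequence.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Assumed emission parameters for daily returns: (mean, std). Illustrative, not fitted.
params = {"bull": (0.001, 0.007), "bear": (-0.002, 0.020)}
prior = {"bull": 0.5, "bear": 0.5}

def regime_posterior(ret):
    """Bayes' rule: P(state | return) is proportional to P(return | state) * P(state)."""
    weighted = {s: gaussian_pdf(ret, m, sd) * prior[s] for s, (m, sd) in params.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}

for ret in (0.003, -0.030):
    print(ret, regime_posterior(ret))
```

With these assumed parameters, a small positive return points toward the bull state, while a 3% daily loss is far more plausible under the wider bear distribution.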

In [16]:
# from pyhhmm.gaussian import GaussianHMM does not work on M1 mac
# using hmmlearn instead
from hmmlearn.hmm import GaussianHMM

The code to train the model in the online course, written for `pyhhmm`, is as follows:

`model = GaussianHMM(n_states=4, covariance_type='full', n_emissions=2)`
<br/>
`model.train([np.array(X_train.values)])`

With the help of Claude.ai, it was rewritten for `hmmlearn` as:

`model = GaussianHMM(n_components=4, covariance_type='full')`
<br/>
`model.fit(X_train.values)`


__Key changes:__

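- `n_states` → `n_components`
- `n_emissions` is removed (the number of features is inferred from the data)
- `train()` → `fit()`
- No need to wrap the data in a list; `fit()` takes a 2-D array of shape (n_samples, n_features) directly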

In [17]:
# Train Model
model = GaussianHMM(n_components=4, covariance_type='full')
model.fit(X_train.values)

In [18]:
# Check Results
# hidden_states = model.predict([X_train.values])[0] # code for pyhhmm.gaussian
hidden_states = model.predict(X_train.values) # adapted by Claude.ai
# No need to wrap data in list since hmmlearn works directly with numpy arrays.
print(hidden_states[:40])
len(hidden_states)

[3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 3 3 3]


1361
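
Printing the first 40 labels shows only one regime, so it helps to summarise the whole sequence. The sketch below uses a short made-up label sequence (`demo_states`, standing in for `hidden_states`) to count how often each state occurs and how often one state follows another.

```python
import numpy as np

# Made-up decoded labels standing in for `hidden_states`.
demo_states = np.array([3, 3, 3, 1, 1, 0, 0, 0, 2, 2, 3, 3])
n_states = 4

# Number of periods assigned to each state.
counts = np.bincount(demo_states, minlength=n_states)

# Count observed transitions i -> j between consecutive periods.
transitions = np.zeros((n_states, n_states), dtype=int)
np.add.at(transitions, (demo_states[:-1], demo_states[1:]), 1)

# Row-normalise to get empirical transition frequencies
# (the maximum guards against dividing by zero for unvisited states).
row_totals = transitions.sum(axis=1, keepdims=True)
empirical = transitions / np.maximum(row_totals, 1)

print(counts)
print(empirical.round(2))
```

Applied to the real `hidden_states`, the same counts would show how many of the 1361 days fall into each regime.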

In [19]:
# Regime state means for each feature
# model.means

# Means for each state/feature
model.means_

array([[-0.00230912,  0.01968386],
       [-0.00074447,  0.01836595],
       [-0.00673463,  0.0456571 ],
       [ 0.00171039,  0.00723397]])

In [20]:
# Regime state covars for each feature
# model.covars

# Covariance matrices for each state
model.covars_

array([[[2.65668325e-04, 5.29844006e-05],
        [5.29844006e-05, 1.23343174e-04]],

       [[2.67213590e-04, 3.26599185e-05],
        [3.26599185e-05, 1.14276983e-04]],

       [[1.46232309e-03, 1.32992146e-04],
        [1.32992146e-04, 5.24792884e-04]],

       [[4.18100451e-05, 1.02779628e-05],
        [1.02779628e-05, 2.19796133e-05]]])
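
One way to read these arrays is to convert each state's return variance into a daily standard deviation and compare it with the state's mean return. The sketch below copies the values printed above. Calling a state "calm" or "stressed" is an informal interpretation, not something the model outputs.

```python
import numpy as np

# Values copied from the model.means_ output above: (mean return, mean range) per state.
means = np.array([[-0.00230912, 0.01968386],
                  [-0.00074447, 0.01836595],
                  [-0.00673463, 0.0456571],
                  [0.00171039, 0.00723397]])
# Return variances: the [0, 0] entry of each covariance matrix in model.covars_.
return_var = np.array([2.65668325e-04, 2.67213590e-04, 1.46232309e-03, 4.18100451e-05])

# Daily return volatility per state = square root of the return variance.
return_std = np.sqrt(return_var)

# Order states from lowest to highest volatility.
order = np.argsort(return_std)
for s in order:
    print(f"state {s}: mean return={means[s, 0]:+.5f}, "
          f"return std={return_std[s]:.5f}, mean range={means[s, 1]:.5f}")
```

In this run, state 3 combines the only positive mean return with the lowest volatility, while state 2 has the most negative mean return and the widest range. States 0 and 1 are nearly indistinguishable, which may hint that four states is more than this data needs.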