### Statistical Inference of Hidden Markov Models on High Frequency Quote Data

Benchmarking PSG's HMM parameter inference against hmmlearn

References:

PSG: http://aorda.com/html/PSG_Help_HTML/index.html?hmm_normal.htm

hmmlearn: https://hmmlearn.readthedocs.io/en/latest/index.html



In [1]:
import pandas as pd
import numpy as np
from hmmlearn.hmm import GaussianHMM
import matplotlib.pyplot as plt

import os
# Raw string so the backslashes in the Windows path are not read as escape sequences
os.add_dll_directory(r'C:\Aorda\PSG\lib')
import psgpython as psg 
from psg_loader import load_psg

load_psg()

Load the features and remove consecutive repeated values from each series


In [2]:

features=pd.read_csv('data/agg_features/grouped_features_2020-01-02.csv',index_col=0)
frac=0.75



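def remove_duplicates(series):
    """Drop every value that repeats the value immediately before it."""
    # Keep the first element, then each element that differs from its predecessor
    cleaned_series=series[np.insert(np.diff(series).astype(bool), 0, True)]
    dropped_els=len(series)-len(cleaned_series)

    print(f"Dropped {dropped_els} of original {len(series)} consecutive repeated values from input series")
    return cleaned_series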

bidsize=remove_duplicates(features['Bid_Size'].values)
offersize=remove_duplicates(features['Offer_Size'].values)
bookimbalance=remove_duplicates(features['OB_IB'].values)
spread=remove_duplicates(features['spread'].values)

# formatted as numpy float 
np.savetxt(r'psg_text_hmm/vector_bidsize.txt', bidsize)
np.savetxt(r'psg_text_hmm/vector_offersize.txt', offersize)
np.savetxt(r'psg_text_hmm/vector_bookimbalance.txt', bookimbalance)
np.savetxt(r'psg_text_hmm/vector_spread.txt', spread)



Dropped 2215 of original 13279 consecutive repeated values from input series
Dropped 1517 of original 13279 consecutive repeated values from input series
Dropped 412 of original 13279 consecutive repeated values from input series
Dropped 614 of original 13279 consecutive repeated values from input series
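A small illustration of the mask used in `remove_duplicates`, on a made-up series (the values below are hypothetical, not taken from the quote data). It shows that only consecutive repeats are dropped: a value that reappears later is kept. Since each feature is filtered independently, the four cleaned series end up with different lengths, as the counts above indicate.

```python
import numpy as np

# Hypothetical toy series: 1.0 appears again after 2.0 and must survive
series = np.array([1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 3.0])

# True wherever a value differs from the one immediately before it
changed = np.diff(series).astype(bool)
# The first element has no predecessor, so it is always kept
keep = np.insert(changed, 0, True)
cleaned = series[keep]

print(cleaned.tolist())  # [1.0, 2.0, 1.0, 3.0]
```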


In [3]:
features

sec,Bid_Price,Bid_Size,Offer_Price,Offer_Size,OB_IB,spread
2020-01-02 09:30:03,295.700,0.693147,296.750,1.945910,1.945910,0.717840
2020-01-02 09:30:05,296.140,1.609438,296.750,1.791759,0.810930,0.476234
2020-01-02 09:30:06,294.875,0.693147,296.435,1.504077,1.504077,0.940007
2020-01-02 09:30:15,295.120,0.693147,295.775,0.916291,0.916291,0.503801
2020-01-02 09:30:17,295.440,0.693147,295.630,0.693147,0.693147,0.173953
...,...,...,...,...,...,...
2020-01-02 15:59:41,300.410,1.098612,300.660,0.693147,0.405465,0.223144
2020-01-02 15:59:46,300.290,1.386294,300.460,0.693147,0.287682,0.157004
2020-01-02 15:59:50,300.290,1.386294,300.660,0.693147,0.287682,0.314811
2020-01-02 15:59:53,300.290,1.386294,300.660,0.693147,0.287682,0.314811


### Steps

- Train HMM on one feature at a time
- Assume each observation of a feature is drawn from one of two normal distributions, one per hidden state.
- Learn optimal parameterization of hidden states
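The modelling assumption in the steps above can be sketched as a generative process: a hidden two-state Markov chain picks which normal distribution emits each observation. The function name and all parameter values below are hypothetical and chosen only for illustration; they are not fitted values.

```python
import random

def sample_two_state_hmm(n, start, trans, means, stds, seed=0):
    """Draw n observations from a two-state Gaussian HMM.

    start : initial state probabilities [p1, p2]
    trans : 2 x 2 transition matrix, trans[i][j] = P(next=j | current=i)
    means, stds : per-state emission mean and standard deviation
    """
    rng = random.Random(seed)
    states, values = [], []
    # Draw the initial hidden state
    state = 0 if rng.random() < start[0] else 1
    for _ in range(n):
        states.append(state)
        # Emit an observation from the current state's normal distribution
        values.append(rng.gauss(means[state], stds[state]))
        # Move the hidden chain one step
        state = 0 if rng.random() < trans[state][0] else 1
    return states, values

# Hypothetical parameters: a tight "calm" state and a wide "volatile" state
states, values = sample_two_state_hmm(
    n=500,
    start=[0.5, 0.5],
    trans=[[0.95, 0.05], [0.25, 0.75]],
    means=[0.04, 0.10],
    stds=[0.01, 0.09],
)
```

Inference runs this process in reverse: given only `values`, recover the transition matrix, means and standard deviations.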

### PSG

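The PSG HMM_Normal optimization routine is run on each feature.

The initial point is estimated with the Baum-Welch algorithm.

Inference is then completed by constrained optimization; the solver log below reports a constraint on the sum of probabilities for the states.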
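For orientation, here is a minimal sketch of the quantity a Gaussian HMM fit maximizes, assuming the `hmm_normal` objective is the log-likelihood of the observations (the PSG help page is the authority on its exact definition). The function names are illustrative, and the parameter names `p`, `a`, `mu`, `si` mirror the labels PSG prints, with `si` taken to be a standard deviation.

```python
import math

def normal_pdf(x, mu, si):
    """Density of N(mu, si^2) at x; si is a standard deviation."""
    z = (x - mu) / si
    return math.exp(-0.5 * z * z) / (si * math.sqrt(2.0 * math.pi))

def hmm_loglik(obs, p, a, mu, si):
    """Scaled forward recursion; returns log P(obs | p, a, mu, si).

    p  : initial state probabilities, length K
    a  : K x K transition matrix, a[i][j] = P(next=j | current=i)
    mu : state means; si : state standard deviations
    """
    k = len(p)
    # Unnormalised forward probabilities for the first observation
    alpha = [p[j] * normal_pdf(obs[0], mu[j], si[j]) for j in range(k)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission
            alpha = [
                sum(alpha[i] * a[i][j] for i in range(k))
                * normal_pdf(obs[t], mu[j], si[j])
                for j in range(k)
            ]
        # Rescale each step to avoid underflow; the log scales sum to the log-likelihood
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [v / scale for v in alpha]
    return loglik

# Hypothetical observations and parameters, for illustration only
obs = [0.03, 0.04, 0.12]
value = hmm_loglik(obs, p=[0.5, 0.5], a=[[0.9, 0.1], [0.2, 0.8]],
                   mu=[0.04, 0.10], si=[0.01, 0.09])
```

Baum-Welch climbs this function by expectation-maximization, whereas a general-purpose solver can maximize it directly subject to the probability constraints.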



### Spread

In [4]:
psg_spread_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_spread.txt')
psg_spread_prob['problem_statement'] = '\n'.join(psg_spread_prob['problem_statement'])
spread_solution=psg.psg_solver(psg_spread_prob)
spread_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_spread was read
Start optimization
Ext.iteration=0  Objective=0.740725099987E+00  Residual=0.000000000000E+00
Ext.iteration=10  Objective=0.740725099987E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is optimal
Calculating resulting outputs. Writing solution.
Objective: objective = 32086.1760096 [-4.512213776820E+16]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = optimal
Timing: data_loading_time = 0.10, preprocessing_time = 12.55, solving_time = 1.16
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = 32086.1760096 [-4.512213776820E+16]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Function: hmm_normal(2,vector_spread) =  3.208617600959E+04
OK. Solver Finished



dict_values(['problem_hmm_normal', 'optimal', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_spread)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = optimal', 'Timing: data_loading_time = 0.10, preprocessing_time = 12.55, solving_time = 1.16', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = 32086.1760096 [-4.512213776820E+16]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_spread) =  3.208617600959E+04'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([0.        , 1.        , 0.94280774, 0.05719226, 0.25851767,
       0.74148233, 0.03607777, 0.01064202, 0.10616565, 0.09010888])], array([2., 2., 2., ..., 2., 2., 2.]), array([1., 1., 1.]), array([0.00000000e+00, 1.57651669e-14, 1.50990331e-14]), [['state1', 'state2'], array([[0.00000000e+000, 1.00000000e+000],
       [0.00000000e+000, 1.00000000e+0

In [5]:
p1,p2,a11,a12,a21,a22,mu1,si1,mu2,si2=list(spread_solution.values())[4][1]
mu1

0.036077765011157656
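Positional unpacking works, but the solution also carries the parameter names, so the values can be looked up by name. A minimal sketch, with the names and values copied from the spread output above so it runs without PSG; in the notebook they would come from `list(spread_solution.values())[4]`.

```python
# Names and values as printed for the spread problem above
names = ['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2']
values = [0.0, 1.0, 0.94280774, 0.05719226, 0.25851767,
          0.74148233, 0.03607777, 0.01064202, 0.10616565, 0.09010888]

params = dict(zip(names, values))

# Rebuild the pieces in the shapes used for comparison with hmmlearn
start_prob = [params['p1'], params['p2']]
transition = [[params['a1_1'], params['a1_2']],
              [params['a2_1'], params['a2_2']]]
means = [params['mu1'], params['mu2']]
sigmas = [params['si1'], params['si2']]

# Each row of a transition matrix should sum to one
row_sums = [sum(row) for row in transition]
print(row_sums)
```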

### Book Imbalance

In [6]:
psg_bookimbalance_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_bookimbalance.txt')
psg_bookimbalance_prob['problem_statement'] = '\n'.join(psg_bookimbalance_prob['problem_statement'])
bookimbalance_solution=psg.psg_solver(psg_bookimbalance_prob)
bookimbalance_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_bookimbalance was read
Start optimization
Ext.iteration=0  Objective=0.706870918395E+00  Residual=0.000000000000E+00
Ext.iteration=9  Objective=0.706870918395E+00  Residual=0.000000000000E+00
Ext.iteration=10  Objective=0.706870918395E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is optimal
Calculating resulting outputs. Writing solution.
Objective: objective = -147.294109485 [-2.170571175784E+14]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = optimal
Timing: data_loading_time = 0.16, preprocessing_time = 32.92, solving_time = 3.56
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = -147.294109485 [-2.170571175784E+14]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Function: hmm_normal(2,vector_bookimbalance) = -1.472941094850E+02
OK. Sol

dict_values(['problem_hmm_normal', 'optimal', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_bookimbalance)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = optimal', 'Timing: data_loading_time = 0.16, preprocessing_time = 32.92, solving_time = 3.56', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = -147.294109485 [-2.170571175784E+14]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_bookimbalance) = -1.472941094850E+02'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([1.        , 0.        , 0.93366698, 0.06633302, 0.03164285,
       0.96835715, 1.00815161, 0.28176739, 0.71286385, 0.20429735])], array([1., 1., 1., ..., 2., 2., 2.]), array([1., 1., 1.]), array([0.00000000e+00, 5.10702591e-15, 2.88657986e-15]), [['state1', 'state2'], array([[1.00000000e+00, 0.00000000e+00],
       [9.96862191e-01, 3

### Offer Size

In [7]:
psg_offersize_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_offersize.txt')
psg_offersize_prob['problem_statement'] = '\n'.join(psg_offersize_prob['problem_statement'])
offersize_solution=psg.psg_solver(psg_offersize_prob)
offersize_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_offersize was read
Start optimization
Ext.iteration=0  Objective=0.754943884276E+00  Residual=0.000000000000E+00
Ext.iteration=10  Objective=0.754943884276E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is optimal
Calculating resulting outputs. Writing solution.
Objective: objective = 1349.27644869 [-1.861722877679E+15]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = optimal
Timing: data_loading_time = 0.18, preprocessing_time = 21.94, solving_time = 1.27
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = 1349.27644869 [-1.861722877679E+15]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Function: hmm_normal(2,vector_offersize) =  1.349276448692E+03
OK. Solver Finished



dict_values(['problem_hmm_normal', 'optimal', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_offersize)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = optimal', 'Timing: data_loading_time = 0.18, preprocessing_time = 21.94, solving_time = 1.27', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = 1349.27644869 [-1.861722877679E+15]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_offersize) =  1.349276448692E+03'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([1.        , 0.        , 0.68130414, 0.31869586, 0.00224315,
       0.99775685, 1.66246961, 0.62988066, 1.21017954, 0.21892014])], array([1., 1., 1., ..., 2., 2., 2.]), array([1., 1., 1.]), array([0.00000000e+00, 2.22044605e-15, 6.88338275e-15]), [['state1', 'state2'], array([[1.00000000e+00, 0.00000000e+00],
       [9.94791772e-01, 5.20822824

### Bid Size

In [8]:
psg_bidsize_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_bidsize.txt')
psg_bidsize_prob['problem_statement'] = '\n'.join(psg_bidsize_prob['problem_statement'])
bidsize_solution=psg.psg_solver(psg_bidsize_prob)
bidsize_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_bidsize was read
Start optimization
Ext.iteration=0  Objective=0.526958401537E+00  Residual=0.000000000000E+00
Ext.iteration=10  Objective=0.526958401537E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is optimal
Calculating resulting outputs. Writing solution.
Objective: objective = -207.891640655 [-4.109504502021E+14]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = optimal
Timing: data_loading_time = 0.16, preprocessing_time = 18.14, solving_time = 0.66
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = -207.891640655 [-4.109504502021E+14]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Function: hmm_normal(2,vector_bidsize) = -2.078916406554E+02
OK. Solver Finished



dict_values(['problem_hmm_normal', 'optimal', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_bidsize)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = optimal', 'Timing: data_loading_time = 0.16, preprocessing_time = 18.14, solving_time = 0.66', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = -207.891640655 [-4.109504502021E+14]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_bidsize) = -2.078916406554E+02'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([1.        , 0.        , 0.78265447, 0.21734553, 0.00639426,
       0.99360574, 1.56456964, 0.44639881, 1.18190128, 0.23540001])], array([1., 1., 1., ..., 1., 1., 1.]), array([1., 1., 1.]), array([0.00000000e+00, 2.66453526e-15, 7.99360578e-15]), [['state1', 'state2'], array([[1.00000000e+00, 0.00000000e+00],
       [9.43674702e-01, 5.63252979e-0

### hmmlearn


- Set 2 hidden components
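- Parameters are fitted by EM (Baum-Welch, using the forward-backward algorithm); `algorithm='viterbi'` only selects the decoder used to recover the state sequence
- Spherical covariance with a `min_covar` floor to keep the variances from collapsing
- Tolerance set comparably to PSG (`tol=1e-8` here; the PSG problem statement uses `precision=9`)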
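To make the role of the Viterbi algorithm concrete, here is a log-domain sketch of what Viterbi decoding computes: the single most likely hidden-state path for a given set of parameters. This is an illustration, not hmmlearn's implementation. The transition matrix, means and variances are the spread values printed by the hmmlearn cells in this notebook; the start probabilities and the observations are assumptions made for the example.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x; var is a variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, start, trans, means, variances):
    """Most likely state path under a Gaussian HMM (log domain)."""
    k = len(start)
    score = [math.log(start[j]) + log_normal_pdf(obs[0], means[j], variances[j])
             for j in range(k)]
    back = []
    for x in obs[1:]:
        prev = score
        pointers, score = [], []
        for j in range(k):
            # Best predecessor state for landing in state j at this step
            best = max(range(k), key=lambda i: prev[i] + math.log(trans[i][j]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][j])
                         + log_normal_pdf(x, means[j], variances[j]))
        back.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

path = viterbi(
    obs=[0.03, 0.035, 0.30, 0.25, 0.04],        # hypothetical spreads
    start=[0.82, 0.18],                         # assumed start probabilities
    trans=[[0.94331442, 0.05668558], [0.25864479, 0.74135521]],
    means=[0.03611566, 0.10653405],
    variances=[0.0001152, 0.00816749],
)
print(path)  # [0, 0, 1, 1, 0]
```

The two wide spreads are assigned to the high-variance state and the narrow ones to the tight state.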

### Spread

In [9]:
spread_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8)
fitted_spread_model=spread_model.fit(spread.reshape(-1, 1))

In [10]:
print(f"Transition Matrix is {fitted_spread_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_spread_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_spread_model.covars_.flatten()}")

Transition Matrix is [0.94331442 0.05668558 0.25864479 0.74135521]
Mean Values are [0.03611566 0.10653405]
Covariance Matrix is [0.0001152  0.00816749]
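When comparing against PSG, note the units: hmmlearn reports variances in `covars_`, while the PSG parameters `si1` and `si2` appear to be standard deviations (an inference from the magnitudes, not something the output states). A short check using the spread numbers printed in this notebook:

```python
import math

# Spread results copied from the outputs in this notebook
psg_sigma = [0.01064202, 0.09010888]       # PSG si1, si2
hmmlearn_var = [0.0001152, 0.00816749]     # hmmlearn covars_

# Put both on the standard-deviation scale before comparing
hmmlearn_sigma = [math.sqrt(v) for v in hmmlearn_var]
rel_diff = [abs(h - p) / p for h, p in zip(hmmlearn_sigma, psg_sigma)]

for p, h, d in zip(psg_sigma, hmmlearn_sigma, rel_diff):
    print(f"PSG sigma={p:.6f}  hmmlearn sigma={h:.6f}  relative diff={d:.2%}")
```

On this scale the two solvers agree to within about one percent for the spread feature.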


### Book Imbalance

In [11]:
bookimbalance_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8)
fitted_bookimbalance_model=bookimbalance_model.fit(bookimbalance.reshape(-1, 1))

In [12]:
print(f"Transition Matrix is {fitted_bookimbalance_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_bookimbalance_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_bookimbalance_model.covars_.flatten()}")

Transition Matrix is [0.86808984 0.13191016 0.11530724 0.88469276]
Mean Values are [0.63978626 1.01772923]
Covariance Matrix is [0.04708052 0.09439633]


### Bid Size

In [13]:
bidsize_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8)
fitted_bidsize_model=bidsize_model.fit(bidsize.reshape(-1, 1))

In [14]:
print(f"Transition Matrix is {fitted_bidsize_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_bidsize_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_bidsize_model.covars_.flatten()}")

Transition Matrix is [0.95984676 0.04015324 0.08757776 0.91242224]
Mean Values are [1.25933958 0.98327013]
Covariance Matrix is [0.07917477 0.04151069]


### Offer Size

In [15]:
offersize_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8)
fitted_offersize_model=offersize_model.fit(offersize.reshape(-1, 1))

In [16]:
print(f"Transition Matrix is {fitted_offersize_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_offersize_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_offersize_model.covars_.flatten()}")

Transition Matrix is [0.92974086 0.07025914 0.12389664 0.87610336]
Mean Values are [1.29632304 1.05429898]
Covariance Matrix is [0.08579449 0.05060665]


### Stationary Distributions

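The stationary distribution of each fitted transition matrix gives the limiting marginal distribution over hidden states, i.e. the long-run fraction of time spent in each state.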

In [17]:
spread_stationary=fitted_spread_model.get_stationary_distribution()
bookimbalance_stationary=fitted_bookimbalance_model.get_stationary_distribution()
bidsize_stationary=fitted_bidsize_model.get_stationary_distribution()
offersize_stationary=fitted_offersize_model.get_stationary_distribution()

print(f"Stationary Distribution for Spread HMM is {spread_stationary}")
print(f"Stationary Distribution for Book Imbalance HMM is {bookimbalance_stationary}")
print(f"Stationary Distribution for Bidsize HMM is {bidsize_stationary}")
print(f"Stationary Distribution for Offersize HMM is {offersize_stationary}")

Stationary Distribution for Spread HMM is [0.82023432 0.17976568]
Stationary Distribution for Book Imbalance HMM is [0.46642041 0.53357959]
Stationary Distribution for Bidsize HMM is [0.68564219 0.31435781]
Stationary Distribution for Offersize HMM is [0.63813005 0.36186995]
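For a two-state chain the stationary distribution has a closed form, which gives a quick cross-check of the values above and a way to compute the same quantity from the PSG transition probabilities. Solving pi = pi A with pi1 + pi2 = 1 gives pi1 = a21 / (a12 + a21) and pi2 = a12 / (a12 + a21). The helper name below is hypothetical; the matrices are copied from the outputs in this notebook.

```python
def stationary_two_state(trans):
    """Stationary distribution of a 2 x 2 row-stochastic transition matrix."""
    a12 = trans[0][1]   # probability of leaving state 1
    a21 = trans[1][0]   # probability of leaving state 2
    total = a12 + a21
    return [a21 / total, a12 / total]

# hmmlearn transition matrices printed above
hmmlearn_spread = stationary_two_state([[0.94331442, 0.05668558],
                                        [0.25864479, 0.74135521]])
hmmlearn_bookimbalance = stationary_two_state([[0.86808984, 0.13191016],
                                               [0.11530724, 0.88469276]])

# PSG transition probabilities for spread (a1_1, a1_2, a2_1, a2_2)
psg_spread = stationary_two_state([[0.94280774, 0.05719226],
                                   [0.25851767, 0.74148233]])

print(hmmlearn_spread, hmmlearn_bookimbalance, psg_spread)
```

The hmmlearn results reproduce `get_stationary_distribution()` for spread and book imbalance, and the PSG spread chain yields a long-run split close to hmmlearn's.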
