### Statistical Inference of Hidden Markov Models on High Frequency Quote Data

Benchmarking PSG against HMMLearn on statistical inference of HMM parameters

References:

PSG: http://aorda.com/html/PSG_Help_HTML/index.html?hmm_normal.htm

HMMLearn: https://hmmlearn.readthedocs.io/en/latest/index.html



In [18]:
import pandas as pd
import numpy as np
from hmmlearn.hmm import GaussianHMM
import matplotlib.pyplot as plt

import os
os.add_dll_directory(r'C:\Aorda\PSG\lib')  # raw string so the backslashes are not read as escapes
import psgpython as psg 
from psg_loader import load_psg

load_psg()

Load the features and remove consecutive repeated values from each feature series


In [19]:

features=pd.read_csv('data/agg_features/grouped_features_2020-01-02.csv',index_col=0)
frac=0.75



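def remove_duplicates(series):
    """Drop each value that equals its immediate predecessor (consecutive repeats only)."""
    # np.diff is nonzero wherever the value changes; the first element is always kept
    cleaned_series=series[np.insert(np.diff(series).astype(bool), 0, True)]
    dropped_els=len(series)-len(cleaned_series)

    print(f"Dropped {dropped_els} of original {len(series)} consecutive repeated values from input series")
    return cleaned_series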

bidsize=remove_duplicates(features['Bid_Size'].values)
offersize=remove_duplicates(features['Offer_Size'].values)
bookimbalance=remove_duplicates(features['OB_IB'].values)
spread=remove_duplicates(features['spread'].values)

# formatted as numpy float 
np.savetxt(r'psg_text_hmm/vector_bidsize.txt', bidsize)
np.savetxt(r'psg_text_hmm/vector_offersize.txt', offersize)
np.savetxt(r'psg_text_hmm/vector_bookimbalance.txt', bookimbalance)
np.savetxt(r'psg_text_hmm/vector_spread.txt', spread)



Dropped 2215 of original 13279 consecutive repeated values from input series
Dropped 1517 of original 13279 consecutive repeated values from input series
Dropped 412 of original 13279 consecutive repeated values from input series
Dropped 614 of original 13279 consecutive repeated values from input series
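
To make the filtering rule concrete, here is a minimal sketch on a toy array (the values are hypothetical, not taken from the quote data). It applies the same `np.diff` mask used in `remove_duplicates` and shows that only consecutive repeats are dropped; a value that reappears later is kept.

```python
import numpy as np

# Hypothetical toy series: 1.0 repeats immediately, then reappears later
toy = np.array([1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 3.0])

# True wherever the value differs from its predecessor; first element always kept
keep = np.insert(np.diff(toy).astype(bool), 0, True)
cleaned = toy[keep]

print(keep)
print(cleaned)
```

Because the later `1.0` follows a `2.0`, it survives the filter, so the cleaned series can still contain repeated values that are not adjacent.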


In [20]:
features

sec,Bid_Price,Bid_Size,Offer_Price,Offer_Size,OB_IB,spread
2020-01-02 09:30:03,295.700,0.693147,296.750,1.945910,1.945910,0.717840
2020-01-02 09:30:05,296.140,1.609438,296.750,1.791759,0.810930,0.476234
2020-01-02 09:30:06,294.875,0.693147,296.435,1.504077,1.504077,0.940007
2020-01-02 09:30:15,295.120,0.693147,295.775,0.916291,0.916291,0.503801
2020-01-02 09:30:17,295.440,0.693147,295.630,0.693147,0.693147,0.173953
...,...,...,...,...,...,...
2020-01-02 15:59:41,300.410,1.098612,300.660,0.693147,0.405465,0.223144
2020-01-02 15:59:46,300.290,1.386294,300.460,0.693147,0.287682,0.157004
2020-01-02 15:59:50,300.290,1.386294,300.660,0.693147,0.287682,0.314811
2020-01-02 15:59:53,300.290,1.386294,300.660,0.693147,0.287682,0.314811


# Steps

- Train HMM on one feature at a time
- Assume each observation of a feature is drawn from one of two normal distributions, one per hidden state
- Learn optimal parameterization of hidden states
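
The generative assumption in the steps above can be sketched as a simulation: a two-state Markov chain picks the hidden state, and each observation is drawn from that state's normal distribution. The function name `sample_gaussian_hmm` and all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_gaussian_hmm(n, p, A, mu, sigma, seed=0):
    """Draw n observations from a Gaussian HMM with initial probs p and transitions A."""
    rng = np.random.default_rng(seed)
    p, A = np.asarray(p, dtype=float), np.asarray(A, dtype=float)
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)

    states = np.empty(n, dtype=int)
    states[0] = rng.choice(len(p), p=p)
    for t in range(1, n):
        # The next state depends only on the current state (Markov property)
        states[t] = rng.choice(len(p), p=A[states[t - 1]])

    # Each observation is normal with the mean and std of its hidden state
    obs = rng.normal(mu[states], sigma[states])
    return states, obs

# Hypothetical parameters: a persistent low-mean state and a less persistent high-mean state
states, obs = sample_gaussian_hmm(
    n=500,
    p=[0.5, 0.5],
    A=[[0.95, 0.05], [0.25, 0.75]],
    mu=[0.0, 1.0],
    sigma=[0.1, 0.3],
)
print(states[:10], obs[:3])
```

Fitting then runs in the opposite direction: only `obs` is observed, and the initial probabilities, transition matrix, means, and standard deviations are estimated from it.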

### PSG

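Each feature is fit with PSG's HMM_Normal optimization routine.

The initial point is estimated via the Baum-Welch algorithm.

Inference is then completed via constrained optimization: hmm_normal is maximized subject to the constraint listed in the solver log (sum_of_probabilities_for_states).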
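
The quantity being maximized can be made concrete. PSG's final objective for spread (32086.18) is close to the log-likelihood that hmmlearn reports in its convergence trace later in this notebook (about 32085.80), which suggests, although the solver output does not state it, that `hmm_normal` is the HMM log-likelihood. Below is a minimal NumPy sketch of that log-likelihood using the scaled forward recursion. The function name and argument layout are illustrative assumptions, not PSG's API, and the inputs are toy values.

```python
import numpy as np

def gaussian_hmm_loglik(x, p, A, mu, sigma):
    """Log-likelihood of a 1-D Gaussian HMM via the scaled forward recursion."""
    x = np.asarray(x, dtype=float)
    p, A, mu, sigma = (np.asarray(v, dtype=float) for v in (p, A, mu, sigma))

    # Emission densities b[t, k] = N(x_t | mu_k, sigma_k^2)
    b = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    alpha = p * b[0]
    loglik = 0.0
    for t in range(len(x)):
        if t > 0:
            alpha = (alpha @ A) * b[t]   # propagate one step, then weight by emission
        scale = alpha.sum()              # p(x_t | x_1..x_{t-1})
        loglik += np.log(scale)
        alpha = alpha / scale            # rescale to avoid numerical underflow
    return loglik

# Hypothetical toy inputs
x = np.array([0.1, 0.5, -0.2])
ll = gaussian_hmm_loglik(x, p=[0.3, 0.7], A=[[0.9, 0.1], [0.2, 0.8]],
                         mu=[0.0, 1.0], sigma=[1.0, 2.0])
print(ll)
```

Each scaling factor is the one-step predictive density, so summing their logs gives the log-likelihood without ever forming the product of many small probabilities.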



### Spread

In [21]:
psg_spread_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_spread.txt')
psg_spread_prob['problem_statement'] = '\n'.join(psg_spread_prob['problem_statement'])
spread_solution=psg.psg_solver(psg_spread_prob)
spread_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
    100.0% of scenarios is processed
100% of vector_spread was read
Start optimization
Ext.iteration=0  Objective=0.740725099987E+00  Residual=0.000000000000E+00
Ext.iteration=10  Objective=0.740725099987E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is optimal
Calculating resulting outputs. Writing solution.
Objective: objective = 32086.1760096 [-4.512213776820E+16]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = optimal
Timing: data_loading_time = 0.11, preprocessing_time = 12.23, solving_time = 1.27
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = 32086.1760096 [-4.512213776820E+16]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Function: hmm_normal(2,vector_spread) =  3.208617600959E+04
OK. Solver Finished



dict_values(['problem_hmm_normal', 'optimal', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_spread)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = optimal', 'Timing: data_loading_time = 0.11, preprocessing_time = 12.23, solving_time = 1.27', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = 32086.1760096 [-4.512213776820E+16]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_spread) =  3.208617600959E+04'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([0.        , 1.        , 0.94280774, 0.05719226, 0.25851767,
       0.74148233, 0.03607777, 0.01064202, 0.10616565, 0.09010888])], array([2., 2., 2., ..., 2., 2., 2.]), array([1., 1., 1.]), array([0.00000000e+00, 1.57651669e-14, 1.50990331e-14]), [['state1', 'state2'], array([[0.00000000e+000, 1.00000000e+000],
       [0.00000000e+000, 1.00000000e+0

In [22]:
p1,p2,a11,a12,a21,a22,mu1,si1,mu2,si2=list(spread_solution.values())[4][1]
mu1

0.036077765011157656
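
Positional unpacking depends on the order in which PSG returns the variables. A minimal sketch of a name-keyed alternative follows, assuming the `[names, values]` layout visible in the printed solution above. Here `point` is a hand-copied stand-in for `list(spread_solution.values())[4]`, not a call into PSG.

```python
import numpy as np

# Stand-in for list(spread_solution.values())[4], copied from the printed output
point = [
    ['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'],
    np.array([0.0, 1.0, 0.94280774, 0.05719226, 0.25851767,
              0.74148233, 0.03607777, 0.01064202, 0.10616565, 0.09010888]),
]

names, values = point
params = dict(zip(names, values))

# Rebuild the 2x2 transition matrix from its named entries
A = np.array([[params['a1_1'], params['a1_2']],
              [params['a2_1'], params['a2_2']]])

print(params['mu1'], params['si1'])
print(A.sum(axis=1))  # each row of a transition matrix should sum to 1
```

Looking parameters up by name keeps the later comparison against hmmlearn readable and makes a change in variable order fail loudly instead of silently.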

### Book Imbalance

In [23]:
psg_bookimbalance_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_bookimbalance.txt')
psg_bookimbalance_prob['problem_statement'] = '\n'.join(psg_bookimbalance_prob['problem_statement'])
bookimbalance_solution=psg.psg_solver(psg_bookimbalance_prob)
bookimbalance_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_bookimbalance was read
Start optimization
Ext.iteration=0  Objective=0.912566134485E+00  Residual=0.000000000000E+00
Ext.iteration=29  Objective=0.912566134485E+00  Residual=0.000000000000E+00
Ext.iteration=51  Objective=0.912566134485E+00  Residual=0.000000000000E+00
Ext.iteration=78  Objective=0.912566134548E+00  Residual=0.000000000000E+00
Ext.iteration=101  Objective=0.912566134549E+00  Residual=0.000000000000E+00
Ext.iteration=130  Objective=0.912566134549E+00  Residual=0.000000000000E+00
Ext.iteration=144  Objective=0.912566134549E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is feasible
Calculating resulting outputs. Writing solution.
Objective: objective = -2916.85483052 [-3.995401977018E-02]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = feasible
Timing: data_loading_time = 0.12, preproces

dict_values(['problem_hmm_normal', 'feasible', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_bookimbalance)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = feasible', 'Timing: data_loading_time = 0.12, preprocessing_time = 18.62, solving_time = 11.36', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = -2916.85483052 [-3.995401977018E-02]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_bookimbalance) = -2.916854830516E+03'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([1.        , 0.        , 0.88468661, 0.11531339, 0.13191529,
       0.86808471, 1.01773347, 0.30723564, 0.63978433, 0.21697457])], array([1., 1., 1., ..., 2., 2., 2.]), array([1., 1., 1.]), array([0.00000000e+00, 2.66453526e-15, 1.38777878e-14]), [['state1', 'state2'], array([[1.        , 0.        ],
       [0.97474474, 0.02525526

### Offer Size

In [24]:
psg_offersize_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_offersize.txt')
psg_offersize_prob['problem_statement'] = '\n'.join(psg_offersize_prob['problem_statement'])
offersize_solution=psg.psg_solver(psg_offersize_prob)
offersize_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_offersize was read
Start optimization
Ext.iteration=0  Objective=0.598659916621E+00  Residual=0.000000000000E+00
Ext.iteration=46  Objective=0.598659916621E+00  Residual=0.000000000000E+00
Ext.iteration=103  Objective=0.598659916631E+00  Residual=0.000000000000E+00
Ext.iteration=144  Objective=0.598659916631E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is feasible
Calculating resulting outputs. Writing solution.
Objective: objective = -1992.61805852 [-7.65680376719]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = feasible
Timing: data_loading_time = 0.09, preprocessing_time = 14.19, solving_time = 5.98
Variables: optimal_point = point_problem_hmm_normal
Objective: objective = -1992.61805852 [-7.65680376719]
Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states
Functio

dict_values(['problem_hmm_normal', 'feasible', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_offersize)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = feasible', 'Timing: data_loading_time = 0.09, preprocessing_time = 14.19, solving_time = 5.98', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = -1992.61805852 [-7.65680376719]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_offersize) = -1.992618058523E+03'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([1.        , 0.        , 0.9297239 , 0.0702761 , 0.12393203,
       0.87606797, 1.29632743, 0.29290249, 1.0542844 , 0.2249448 ])], array([1., 1., 1., ..., 1., 1., 1.]), array([1., 1., 1.]), array([0.00000000e+00, 1.77635684e-14, 1.73194792e-14]), [['state1', 'state2'], array([[1.00000000e+00, 0.00000000e+00],
       [9.98177645e-01, 1.82235547e-

### Bid Size

In [25]:
psg_bidsize_prob = psg.psg_importfromtext('./psg_text_hmm/problem_hmm_normal_bidsize.txt')
psg_bidsize_prob['problem_statement'] = '\n'.join(psg_bidsize_prob['problem_statement'])
bidsize_solution=psg.psg_solver(psg_bidsize_prob)
bidsize_solution.values()

OK. Problem Imported

Running solver
Reading problem formulation
Asking for data information
Getting data
100% of vector_bidsize was read
Start optimization
Ext.iteration=0  Objective=0.562587472260E+00  Residual=0.000000000000E+00
Ext.iteration=52  Objective=0.582621660320E+00  Residual=0.000000000000E+00
Ext.iteration=102  Objective=0.608856565830E+00  Residual=0.000000000000E+00
Ext.iteration=146  Objective=0.688106822136E+00  Residual=0.000000000000E+00
Ext.iteration=188  Objective=0.688221209447E+00  Residual=0.000000000000E+00
Ext.iteration=240  Objective=0.688221209447E+00  Residual=0.000000000000E+00
Ext.iteration=292  Objective=0.688221209447E+00  Residual=0.000000000000E+00
Optimization is stopped
Solution is feasible
Calculating resulting outputs. Writing solution.
Objective: objective = -1388.74744763 [-1.77816983722]
Solver has normally finished. Solution was saved.
Problem: problem_hmm_normal, solution_status = feasible
Timing: data_loading_time = 0.18, preprocessing_time

dict_values(['problem_hmm_normal', 'feasible', ['problem_HMM_Normal, maximize', '  hmm_normal(2,vector_bidsize)', '  Solver: VAN , precision=9, stages=10'], ['Problem: problem_hmm_normal, solution_status = feasible', 'Timing: data_loading_time = 0.18, preprocessing_time = 12.80, solving_time = 11.59', 'Variables: optimal_point = point_problem_hmm_normal', 'Objective: objective = -1388.74744763 [-1.77816983722]', 'Constraint: sum_of_probabilities_for_states = vector_sum_of_probabilities_for_states', 'Function: hmm_normal(2,vector_bidsize) = -1.388747447629E+03'], [['p1', 'p2', 'a1_1', 'a1_2', 'a2_1', 'a2_2', 'mu1', 'si1', 'mu2', 'si2'], array([0.72207366, 0.27792634, 0.95999742, 0.04000258, 0.087497  ,
       0.912503  , 1.25896764, 0.28146044, 0.98287768, 0.20335443])], array([1., 1., 2., ..., 1., 1., 1.]), array([1., 1., 1.]), array([4.4408921e-16, 4.4408921e-16, 8.8817842e-16]), [['state1', 'state2'], array([[0.77419653, 0.22580347],
       [0.87275135, 0.12724865],
       [0.7078130

### HMM Learn


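- Set 2 hidden components
- Parameters are fit with the Baum-Welch (EM) algorithm, which uses the forward-backward recursions; `algorithm='viterbi'` only selects the decoder used when predicting hidden states
- Spherical covariance (a single variance per state), with `min_covar` as a floor on the variance to prevent overfitting
- Tolerance set equivalently to PSG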
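
For reference, Viterbi decoding returns the single most likely hidden-state path given fixed model parameters; it is a decoding step rather than a parameter-estimation step. Below is a minimal log-space NumPy sketch for a one-dimensional Gaussian HMM. It is an illustration with hypothetical toy parameters, not hmmlearn's implementation.

```python
import numpy as np

def viterbi_gaussian(x, p, A, mu, sigma):
    """Most likely state path of a 1-D Gaussian HMM, computed in log space."""
    x = np.asarray(x, dtype=float)
    p, A, mu, sigma = (np.asarray(v, dtype=float) for v in (p, A, mu, sigma))
    with np.errstate(divide="ignore"):   # log(0) = -inf is acceptable here
        log_p, log_A = np.log(p), np.log(A)
    log_b = -0.5 * ((x[:, None] - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

    T, K = log_b.shape
    delta = log_p + log_b[0]             # best log-score of paths ending in each state
    back = np.zeros((T, K), dtype=int)   # back[t, j]: best predecessor of state j at t
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path into i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]

    # Trace the best path backwards from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical toy example with well-separated state means
path = viterbi_gaussian([0.0, 0.1, 5.0, 5.2, 0.05], p=[0.5, 0.5],
                        A=[[0.9, 0.1], [0.1, 0.9]], mu=[0.0, 5.0], sigma=[1.0, 1.0])
print(path)
```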

### Spread

In [26]:
spread_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8, verbose=True)
fitted_spread_model=spread_model.fit(spread.reshape(-1, 1))

         1        5984.5064             +nan
         2       30054.5619      +24070.0555
         3       31056.2370       +1001.6751
         4       31634.0862        +577.8492
         5       31914.4784        +280.3922
         6       32021.5371        +107.0587
         7       32060.4482         +38.9111
         8       32075.1683         +14.7201
         9       32081.0512          +5.8829
        10       32083.5412          +2.4900
        11       32084.6588          +1.1176
        12       32085.1916          +0.5328
        13       32085.4613          +0.2697
        14       32085.6054          +0.1441
        15       32085.6860          +0.0806
        16       32085.7327          +0.0467
        17       32085.7605          +0.0278
        18       32085.7774          +0.0168
        19       32085.7877          +0.0103
        20       32085.7940          +0.0064
        21       32085.7980          +0.0040
        22       32085.8005          +0.0025
        23

In [27]:
print(f"Transition Matrix is {fitted_spread_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_spread_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_spread_model.covars_.flatten()}")

Transition Matrix is [0.94331442 0.05668558 0.25864479 0.74135521]
Mean Values are [0.03611566 0.10653405]
Covariance Matrix is [0.0001152  0.00816749]
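
The PSG variables `si1`, `si2` and hmmlearn's `covars_` are on different scales. Squaring the PSG values reproduces the hmmlearn numbers closely, which suggests (an inference from the magnitudes, not something either output states) that PSG reports standard deviations while hmmlearn reports variances. A quick check with values copied from the spread outputs above:

```python
import numpy as np

psg_sigma = np.array([0.01064202, 0.09010888])   # si1, si2 from the PSG spread solution
hmm_var = np.array([0.0001152, 0.00816749])      # covars_ from the hmmlearn spread fit

# Convert PSG's values to variances before comparing
psg_var = psg_sigma ** 2
rel_diff = np.abs(psg_var - hmm_var) / hmm_var

print(psg_var)
print(rel_diff)
```

Under that assumption, the spread parameters from the two packages agree to within a few percent once they are put on the same scale.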


### Book Imbalance

In [28]:
bookimbalance_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8, verbose=True)
fitted_bookimbalance_model=bookimbalance_model.fit(bookimbalance.reshape(-1, 1))

         1       -6021.6746             +nan
         2       -3797.9273       +2223.7474
         3       -3671.8901        +126.0372
         4       -3578.4880         +93.4022
         5       -3472.9022        +105.5857
         6       -3348.9180        +123.9843
         7       -3220.1830        +128.7350
         8       -3113.1360        +107.0470
         9       -3043.4067         +69.7293
        10       -3004.3364         +39.0702
        11       -2982.0222         +22.3143
        12       -2967.5433         +14.4788
        13       -2957.0449         +10.4984
        14       -2948.9930          +8.0519
        15       -2942.6733          +6.3197
        16       -2937.6636          +5.0097
        17       -2933.6712          +3.9924
        18       -2930.4782          +3.1931
        19       -2927.9172          +2.5610
        20       -2925.8581          +2.0591
        21       -2924.1987          +1.6594
        22       -2922.8585          +1.3402
        23

In [29]:
print(f"Transition Matrix is {fitted_bookimbalance_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_bookimbalance_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_bookimbalance_model.covars_.flatten()}")

Transition Matrix is [0.8846931  0.1153069  0.13191061 0.86808939]
Mean Values are [1.01772862 0.63978568]
Covariance Matrix is [0.09439634 0.0470804 ]


### Bid Size

In [30]:
bidsize_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8, verbose=True)
fitted_bidsize_model=bidsize_model.fit(bidsize.reshape(-1, 1))

         1       -3344.8421             +nan
         2       -2234.5662       +1110.2759
         3       -2107.8322        +126.7340
         4       -2052.0698         +55.7625
         5       -2023.4304         +28.6394
         6       -2007.4193         +16.0111
         7       -1997.9702          +9.4491
         8       -1992.1736          +5.7966
         9       -1988.5077          +3.6658
        10       -1986.1288          +2.3789
        11       -1984.5491          +1.5798
        12       -1983.4772          +1.0719
        13       -1982.7350          +0.7422
        14       -1982.2108          +0.5241
        15       -1981.8337          +0.3771
        16       -1981.5574          +0.2763
        17       -1981.3513          +0.2060
        18       -1981.1952          +0.1562
        19       -1981.0749          +0.1203
        20       -1980.9809          +0.0940
        21       -1980.9064          +0.0745
        22       -1980.8465          +0.0598
        23

In [31]:
print(f"Transition Matrix is {fitted_bidsize_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_bidsize_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_bidsize_model.covars_.flatten()}")

Transition Matrix is [0.43495482 0.56504518 0.92495943 0.07504057]
Mean Values are [1.17178961 1.1733334 ]
Covariance Matrix is [0.08358476 0.08403393]


### Offer Size

In [32]:
offersize_model=GaussianHMM(n_components=2,algorithm='viterbi',covariance_type="spherical",min_covar=1e-4, n_iter=1000,tol=1e-8, verbose=True)
fitted_offersize_model=offersize_model.fit(offersize.reshape(-1, 1))

         1       -5842.4639             +nan
         2       -2294.1031       +3548.3608
         3       -2271.2022         +22.9009
         4       -2260.2018         +11.0004
         5       -2249.6979         +10.5040
         6       -2237.5510         +12.1469
         7       -2222.9242         +14.6268
         8       -2205.3038         +17.6204
         9       -2184.5991         +20.7047
        10       -2161.4581         +23.1410
        11       -2137.3850         +24.0731
        12       -2114.3084         +23.0766
        13       -2093.7643         +20.5441
        14       -2076.3774         +17.3869
        15       -2061.9998         +14.3776
        16       -2050.1584         +11.8414
        17       -2040.3749          +9.7836
        18       -2032.2684          +8.1065
        19       -2025.5483          +6.7201
        20       -2019.9842          +5.5641
        21       -2015.3854          +4.5989
        22       -2011.5898          +3.7955
        23

In [33]:
print(f"Transition Matrix is {fitted_offersize_model.transmat_.flatten()}")
print(f"Mean Values are {fitted_offersize_model.means_.flatten()}")
print(f"Covariance Matrix is {fitted_offersize_model.covars_.flatten()}")

Transition Matrix is [0.92974175 0.07025825 0.12389871 0.87610129]
Mean Values are [1.29632176 1.0542967 ]
Covariance Matrix is [0.08579435 0.05060624]


### Stationary Distributions

The stationary distribution of each fitted transition matrix gives the limiting marginal probabilities of the hidden states

In [34]:
spread_stationary=fitted_spread_model.get_stationary_distribution()
bookimbalance_stationary=fitted_bookimbalance_model.get_stationary_distribution()
bidsize_stationary=fitted_bidsize_model.get_stationary_distribution()
offersize_stationary=fitted_offersize_model.get_stationary_distribution()

print(f"Stationary Distribution for Spread HMM is {spread_stationary}")
print(f"Stationary Distribution for Book Imbalance HMM is {bookimbalance_stationary}")
print(f"Stationary Distribution for Bidsize HMM is {bidsize_stationary}")
print(f"Stationary Distribution for Offersize HMM is {offersize_stationary}")

Stationary Distribution for Spread HMM is [0.82023432 0.17976568]
Stationary Distribution for Book Imbalance HMM is [0.53358118 0.46641882]
Stationary Distribution for Bidsize HMM is [0.62077622 0.37922378]
Stationary Distribution for Offersize HMM is [0.63813686 0.36186314]
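
The stationary distribution is the probability vector pi satisfying pi = pi A for transition matrix A. As a cross-check on the values above, here is a minimal NumPy sketch that computes it from the left eigenvector of A with eigenvalue 1, using the spread transition matrix printed earlier. The helper name `stationary_distribution` is illustrative and is not part of hmmlearn.

```python
import numpy as np

def stationary_distribution(A):
    """Left eigenvector of A for eigenvalue 1, normalized to sum to 1."""
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A.T)   # right eigenvectors of A.T = left of A
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()                      # normalizing also fixes the sign

# Spread transition matrix as printed by the hmmlearn fit
A_spread = np.array([[0.94331442, 0.05668558],
                     [0.25864479, 0.74135521]])

pi = stationary_distribution(A_spread)
print(pi)

# For two states there is also a closed form: pi_1 = a21 / (a12 + a21)
pi_1 = A_spread[1, 0] / (A_spread[0, 1] + A_spread[1, 0])
print(pi_1)
```

Both routes should agree with the spread values reported by `get_stationary_distribution()` up to rounding of the printed matrix entries.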
