# Maximum Likelihood  #

It is perfectly acceptable to fit the Phat distribution to a univariate dataset using Maximum Likelihood Estimation (MLE), that is, by minimizing the negative log-likelihood. This process is available via the `fit` method (which inherits from the `statsmodels` `GenericLikelihoodModel`).
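As a rough illustration of what "MLE via negative log-likelihood" means, the sketch below uses a plain Gaussian as a stand-in for the Phat density and synthetic data rather than real returns. The function name `gaussian_nll` and the simulated sample are assumptions made for this example only; `Phat.fit` itself relies on a numerical optimizer rather than the closed-form solution used here.

```python
import math
import random
import statistics

def gaussian_nll(data, mu, sigma):
    """Average negative log-likelihood of `data` under N(mu, sigma^2)."""
    per_point = (
        0.5 * math.log(2 * math.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )
    return sum(per_point) / len(data)

random.seed(42)
# Synthetic "returns" (hypothetical values, not S&P 500 data).
data = [random.gauss(0.0005, 0.01) for _ in range(5_000)]

# For a Gaussian the MLE has a closed form: the sample mean and the
# population standard deviation (divisor n, not n - 1).
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)

best = gaussian_nll(data, mu_hat, sigma_hat)
# Any other parameter pair gives a higher (worse) negative log-likelihood.
worse = gaussian_nll(data, mu_hat + 0.002, sigma_hat * 1.2)
print(best < worse)
```

A numerical optimizer searches for the same minimum when no closed form is available, which is the situation for a distribution with a Gaussian body and Pareto-type tails.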

However, there is one major issue concerning the tails that must be considered.

So, let's start by fitting the Phat distribution to our familiar dataset of S&P 500 index level returns.

In [35]:
%load_ext autoreload
%autoreload 2

import seaborn as sns; sns.set(style = 'whitegrid')



In [2]:
import yfinance as yf
from phat import Phat

sp = yf.download('^GSPC')
sp_ret = sp.Close.pct_change()[1:]

res = Phat.fit(sp_ret)

[*********************100%***********************]  1 of 1 completed
Optimization terminated successfully.
         Current function value: -3.184517
         Iterations: 160
         Function evaluations: 272


In [3]:
res.params

array([0.000596  , 0.00354826, 0.07450248, 0.06371205])

In [4]:
res.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Close | Log-Likelihood: | 62553.0 |
| Model: | PhatFit | AIC: | -125100.0 |
| Method: | Maximum Likelihood | BIC: | -125100.0 |
| Date: | Wed, 21 Jul 2021 | | |
| Time: | 16:56:08 | | |
| No. Observations: | 19643 | | |
| Df Residuals: | 19642 | | |
| Df Model: | 0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0006 | 5.14e-05 | 11.586 | 0.000 | 0.000 | 0.001 |
| x1 | 0.0035 | 3.42e-05 | 103.757 | 0.000 | 0.003 | 0.004 |
| xi_l | 0.0745 | 0.009 | 8.371 | 0.000 | 0.057 | 0.092 |
| xi_r | 0.0637 | 0.009 | 7.464 | 0.000 | 0.047 | 0.080 |


We can see that both the left and right tail indices are much smaller than those we estimated using the [POT and Hill Double Bootstrap techniques](estimation.ipynb). This underfitting in the tails occurs because the contribution of extreme events to the likelihood is not large enough to offset the gains from optimizing the fit in the body. Hence, we end up with thinner tails and expose ourselves to all the risks that come with that.
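One way to see why the body dominates is to count how few observations actually sit in the tails. The log-likelihood is a sum with one term per observation, so a small tail share means few terms pulling the optimizer toward fatter tails. The sketch below is only an illustration under stated assumptions: it uses a simulated Student-t sample with 3 degrees of freedom (not the S&P 500 data), and the 3-standard-deviation cutoff and the helper `student_t` are arbitrary choices for this example.

```python
import random
import statistics

random.seed(0)

def student_t(df):
    """One Student-t draw: standard normal over sqrt(chi-square / df)."""
    chi2 = random.gammavariate(df / 2, 2)  # chi-square(df) as a gamma variate
    return random.gauss(0, 1) / (chi2 / df) ** 0.5

# Heavy-tailed synthetic sample (hypothetical stand-in for returns).
sample = [student_t(3) for _ in range(20_000)]

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)

# Observations more than 3 standard deviations from the mean.
tail = [x for x in sample if abs(x - mean) > 3 * sd]
tail_share = len(tail) / len(sample)

print(f"{tail_share:.2%} of observations lie in the tails")
```

Even for a distribution this heavy-tailed, only a small share of the per-observation likelihood terms comes from extreme events.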

So, we can instead estimate the tails separately and pass them as fixed values to our fit method. This results in just two free parameters, $\mu$ and $\sigma$, in the Gaussian body.

In [6]:
from phat import two_tailed_hill_double_bootstrap
xi_left, xi_right = two_tailed_hill_double_bootstrap(sp_ret)
res = Phat.fit(sp_ret, xi_left, xi_right)

100%|█████████████████████████████████████| 10/10 [00:21<00:00,  2.14s/it]
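For intuition about what a tail-index estimate involves, here is a minimal sketch of the plain Hill estimator applied to a simulated Pareto sample. It is not the implementation behind `two_tailed_hill_double_bootstrap`: the function `hill_estimator`, the fixed choice of `k`, and the synthetic data are all assumptions for illustration, whereas the double bootstrap chooses the number of order statistics from the data.

```python
import math
import random

def hill_estimator(values, k):
    """Hill estimate of the tail index xi from the k largest positive values."""
    ordered = sorted(values, reverse=True)
    threshold = ordered[k]  # the (k + 1)-th largest value
    return sum(math.log(x / threshold) for x in ordered[:k]) / k

random.seed(7)
alpha = 4.0  # Pareto shape; the true tail index is xi = 1 / alpha = 0.25
sample = [random.paretovariate(alpha) for _ in range(20_000)]

xi_hat = hill_estimator(sample, k=2_000)
print(f"Hill estimate of xi: {xi_hat:.3f}")
```

For a two-tailed dataset such as returns, the same idea would be applied once to the positive returns and once to the negated negative returns, giving separate right and left estimates.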


In [8]:
res.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Close | Log-Likelihood: | 61921.0 |
| Model: | PhatFit | AIC: | -123800.0 |
| Method: | Maximum Likelihood | BIC: | -123800.0 |
| Date: | Wed, 21 Jul 2021 | | |
| Time: | 14:50:07 | | |
| No. Observations: | 19643 | | |
| Df Residuals: | 19642 | | |
| Df Model: | 0 | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0006 | 4.8e-05 | 12.852 | 0.000 | 0.001 | 0.001 |
| x1 | 0.0032 | 3.2e-05 | 98.768 | 0.000 | 0.003 | 0.003 |


In [9]:
res.params

array([0.00061751, 0.00316166])

The difference may not appear too meaningful, but with the tails fixed we do get a slightly greater mean (0.00062 versus 0.00060) and lesser volatility (0.0032 versus 0.0035).
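To put the comparison in relative terms, the two `res.params` outputs shown above can be compared directly. The snippet assumes the first two entries of each array are $\mu$ and $\sigma$ of the Gaussian body (reported as `const` and `x1` in the summaries); the variable names are chosen for this example.

```python
# Parameter estimates copied from the two res.params outputs above.
free_mu, free_sigma = 0.000596, 0.00354826      # all four parameters free
fixed_mu, fixed_sigma = 0.00061751, 0.00316166  # tail indices held fixed

# Relative change when moving from the free fit to the fixed-tail fit.
mean_change = (fixed_mu - free_mu) / free_mu
sigma_change = (fixed_sigma - free_sigma) / free_sigma

print(f"mean:  {mean_change:+.1%}")   # about +3.6%
print(f"sigma: {sigma_change:+.1%}")  # about -10.9%
```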