# Author Model Demo

This notebook demonstrates how to use the author model from `bayesalpha`. As an example, we analyze contest author data.

In principle, it is as simple as:

```python
import pandas as pd
import bayesalpha as ba
data = pd.read_csv('foo.csv')
ba.fit_authors(data)
```

In [1]:
'''
When importing bayesalpha, if you get this error message:

    WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
    
It means that you don't have the low-level mkl linear algebra package,
so PyMC3 will run (significantly) slower.
The research team usually uses conda, since it ships with mkl.
'''

import pandas as pd
import bayesalpha as ba
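
If it is useful to know up front whether the warning above will appear, the following is a minimal sketch that only checks whether a module named `mkl` is importable, which is what the warning text complains about. The helper name `has_module` is hypothetical and not part of bayesalpha, and this check is an assumption about what matters; it does not inspect how Theano itself is configured.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# `mkl` is the module name taken from the warning message above.
if has_module('mkl'):
    print('mkl found')
else:
    print('mkl not found; consider `conda install mkl-service`')
```

Running this before `import bayesalpha` gives a quick yes/no answer without triggering the slower import first.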


In [2]:
# The data _must_ look like this!
# Column names must match too. 

sharpes = pd.read_csv('../tests/test_data/author_model_test_sharpes.csv', index_col=0)
returns = pd.read_csv('../tests/test_data/author_model_test_returns.csv', index_col=0)

In [3]:
sharpes.head()

| | meta_user_id | meta_algorithm_id | meta_code_id | meta_trading_days | sharpe_ratio |
|---|---|---|---|---|---|
| 0 | aaa | aaa111 | aaa111_0 | 163 | -1.164508 |
| 1 | aaa | aaa111 | aaa111_1 | 96 | 0.593194 |
| 2 | aaa | aaa111 | aaa111_2 | 232 | -1.164254 |
| 3 | aaa | aaa111 | aaa111_3 | 118 | 0.27807 |
| 4 | aaa | aaa111 | aaa111_4 | 220 | 1.041695 |


In [4]:
returns.head()

| | aaa111_0 | aaa111_1 | aaa111_2 | aaa111_3 | aaa111_4 | aaa111_5 | aaa111_6 | aaa111_7 | aaa111_8 | aaa111_9 | ... | ddd666_328 | ddd666_329 | ddd666_330 | ddd666_331 | ddd666_332 | ddd666_333 | ddd666_334 | ddd666_335 | ddd666_336 | ddd666_337 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.035051 | -0.272576 | -0.47418 | 0.064162 | 0.292978 | 0.023471 | 0.025942 | 0.100582 | -0.174277 | 0.126081 | ... | -0.111784 | -0.268968 | -0.075652 | 0.189964 | -0.55668 | 0.014488 | -0.124629 | -0.04844 | 0.381756 | 0.176951 |
| 1 | -0.315909 | 0.069661 | -0.015446 | -0.10102 | -0.225975 | 0.204022 | -0.079135 | -0.104554 | -0.34747 | -0.121504 | ... | -0.008342 | 0.248996 | -0.101882 | -0.652522 | -0.007524 | -0.272381 | -0.066879 | 0.041663 | 0.039876 | -0.0118 |
| 2 | -0.037757 | -0.173874 | 0.404682 | -0.037672 | -0.642219 | 0.27424 | 0.169435 | -0.098156 | 0.479511 | -0.330409 | ... | 0.32713 | 0.030426 | 0.014759 | 0.134476 | 0.156697 | -0.313864 | 0.104577 | -0.109029 | 0.296487 | -0.012178 |
| 3 | -0.058816 | -0.190401 | -0.367955 | 0.452142 | -0.450456 | -0.102206 | -0.281244 | -0.039853 | 0.004766 | -0.070515 | ... | -0.227059 | 0.322227 | 0.306905 | -0.086498 | 0.513551 | 0.017932 | -0.408598 | -0.098558 | -0.111911 | -0.292455 |
| 4 | -0.145389 | -0.144361 | 0.290843 | 0.260297 | -0.212217 | -0.093912 | -0.363812 | 0.436243 | -0.493436 | 0.160006 | ... | -0.29815 | -0.206716 | -0.08219 | -0.118802 | -0.201424 | -0.298939 | 0.073852 | -0.298209 | 0.314797 | -0.004759 |
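
Since the column names must match, it can help to check the inputs before starting a long sampling run. The following is a minimal sketch, not part of bayesalpha: the helper `check_inputs` is hypothetical, the required column list is copied from the `sharpes` header shown above, and the rule that every `meta_code_id` needs a matching column in `returns` is an assumption inferred from the sample data rather than a documented requirement.

```python
# Column names copied from the `sharpes` frame shown above.
REQUIRED_SHARPE_COLUMNS = (
    'meta_user_id',
    'meta_algorithm_id',
    'meta_code_id',
    'meta_trading_days',
    'sharpe_ratio',
)


def check_inputs(sharpe_columns, code_ids, returns_columns):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    present = set(sharpe_columns)
    missing = [c for c in REQUIRED_SHARPE_COLUMNS if c not in present]
    if missing:
        problems.append('sharpes is missing columns: {}'.format(missing))
    # Assumption: each backtest id should appear as a column of `returns`.
    no_returns = sorted(set(code_ids) - set(returns_columns))
    if no_returns:
        problems.append('backtests without a returns column: {}'.format(no_returns))
    return problems


# With the frames loaded above, this would be called as:
#     check_inputs(sharpes.columns, sharpes.meta_code_id, returns.columns)
# Toy stand-ins so the sketch runs without the CSV files:
print(check_inputs(REQUIRED_SHARPE_COLUMNS, ['aaa111_0', 'aaa111_1'], ['aaa111_0']))
```

The function takes plain iterables rather than data frames, so it works equally on pandas indexes and on ordinary lists.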


In [5]:
# Get some idea of how big our data set is
num_authors = sharpes.meta_user_id.nunique()
num_algos = sharpes.meta_algorithm_id.nunique()
num_backtests = sharpes.meta_code_id.nunique()

print('# authors:\t{}'.format(num_authors),
      '# algos:\t{}'.format(num_algos),
      '# backtests:\t{}'.format(num_backtests),
      sep='\n')

# authors:	4
# algos:	15
# backtests:	338


In [6]:
'''
Try the default `sampler_args` first and, if necessary, change them to fine-tune the MCMC sampler.
Talk to a Bayesian if you need help.

Sampling usually takes a while.
For reference: on QUACS, ingesting 30 authors, 900 algos, 40000 backtests,
with PyMC3 running 4 chains in 4 jobs, takes around 15 minutes.
'''

trace = ba.fit_authors(sharpes,
                       returns,
                       sampler_args={
                           # Only 1 draw and 1 tuning step so that this example runs quickly;
                           # a real analysis needs far more of both.
                           'draws': 1,
                           'tune': 1,
                           'nuts_kwargs': {'target_accept': 0.90}
                       },
                       save_data=False                        
                      )

Only 1 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu_algo_raw, mu_algo_sd_log__, mu_author_raw, mu_author_sd_log__, mu_global]
100%|██████████| 2/2 [00:01<00:00,  1.93it/s]
The chain contains only diverging samples. The model is probably misspecified.
The acceptance probability does not match the target. It is 5.250929150656001e-118, but should be close to 0.9. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.0, but should be close to 0.9. Try to increase the number of tuning steps.
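
The warnings above are what one would expect from a single draw and a single tuning step, and the messages themselves suggest increasing the number of tuning steps. The following is a minimal sketch of building a less extreme `sampler_args` dict with the same keys as the call above. The helper `make_sampler_args` is hypothetical, and the default values of 1000 draws and 1000 tuning steps are illustrative assumptions, not recommendations from bayesalpha.

```python
def make_sampler_args(draws=1000, tune=1000, target_accept=0.90):
    """Build a `sampler_args` dict with the same keys as the call above."""
    if draws < 1 or tune < 0:
        raise ValueError('draws must be >= 1 and tune must be >= 0')
    if not 0.0 < target_accept < 1.0:
        raise ValueError('target_accept must be strictly between 0 and 1')
    return {
        'draws': draws,
        'tune': tune,
        'nuts_kwargs': {'target_accept': target_accept},
    }


sampler_args = make_sampler_args()
print(sampler_args)

# This would then be passed in the same way as in the cell above:
#     trace = ba.fit_authors(sharpes, returns,
#                            sampler_args=sampler_args, save_data=False)
```

Keeping the settings in one small function makes it easy to rerun the fit with more tuning steps if the acceptance-probability warning persists.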


In [7]:
# Save the resulting trace object as a netCDF file.
trace.save('example.nc')