# LunAPI : models

Links to notebooks in this repository: [Index](./00_overview.ipynb) | [Luna tutorial](./tutorial.ipynb) | 
[Individuals](./01_indivs.ipynb) | [Projects](./02_projects.ipynb) | [Staging](./03_staging.ipynb) | [Models](./04_models.ipynb) | [Advanced](./98_advanced.ipynb) | [Reference](./99_reference.ipynb)

---

This page shows how to run prediction models using `lunapi`. Currently we are only distributing a single model (brain-age prediction based on the sleep EEG) as an example.  We plan to add more models in the near future.

In [1]:
import lunapi as lp
proj = lp.proj()

initiated lunapi v0.0.7 <lunapi.lunapi0.luna object at 0x117ad6a70> 



In [2]:
proj.sample_list( 'tutorial/s.lst' )

read 3 individuals from tutorial/s.lst


In [3]:
proj.sample_list()

Unnamed: 0,ID,EDF,Annotations
1,learn-nsrr01,tutorial/edfs/learn-nsrr01.edf,{tutorial/edfs/learn-nsrr01.xml}
2,learn-nsrr02,tutorial/edfs/learn-nsrr02.edf,{tutorial/edfs/learn-nsrr02.xml}
3,learn-nsrr03,tutorial/edfs/learn-nsrr03.edf,{tutorial/edfs/learn-nsrr03.xml}


## Pointing to model resources

The root directory of this notebook repository contains the `models` folder. Setting `MODEL_PATH` explicitly, as below, means this section will work whether you are running inside the Docker container or locally. (The Docker image already bundles the resources for this function, and the default value of `MODEL_PATH` already points to them.)

In [4]:
lp.resources.MODEL_PATH = 'models'

## Single individual

This fits the Sun _et al._ (2019) model of brain age based on the sleep EEG as described [here](https://zzz.bwh.harvard.edu/luna/ref/predict/#sun2019) using Luna's [PREDICT](http://zzz.bwh.harvard.edu/luna/ref/predict/) framework.

__Note:__ we're making up the true ages of these three test individuals for now, as true values of these anonymized, randomly selected PSGs are not currently available to us. 

As this function produces a lot of output, we'll also silence the verbose log outputs.  The main workhorse is `predict_SUN2019()`.  See the [reference](./99_reference.ipynb) page for more information on this function.   As with `pops()`, this function has both project-level and individual-level variants (both give identical results).  We'll start with the individual-level variant, using the second individual from the sample-list:

In [5]:
proj.silence()
p = proj.inst( 2 ) 
p.predict_SUN2019( 'EEG' , age = 62 )

Unnamed: 0,ID,NF,NF_OBS,OKAY,Y,Y1,YOBS
0,learn-nsrr02,13,13,1,60.597311,67.582365,62.0


The primary output is `Y1`, the bias-corrected age prediction.  The observed age (as input above) is `YOBS`:

In [6]:
p.table( 'PREDICT' )

Unnamed: 0,ID,NF,NF_OBS,OKAY,Y,Y1,YOBS
0,learn-nsrr02,13,13,1,60.597311,67.582365,62.0


In other words, for this individual, the PAD (_predicted age difference_) is 67.6 - 62 = 5.6 years.
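The PAD arithmetic above can be reproduced in plain Python. This is a minimal sketch that copies the values from the `PREDICT` table by hand; the helper name `predicted_age_difference` is purely illustrative and is not part of `lunapi`.

```python
# Values copied by hand from the PREDICT table above (learn-nsrr02).
row = {"ID": "learn-nsrr02", "Y": 60.597311, "Y1": 67.582365, "YOBS": 62.0}

def predicted_age_difference(row):
    """Hypothetical helper: PAD = bias-corrected prediction (Y1) minus observed age (YOBS)."""
    return row["Y1"] - row["YOBS"]

pad = predicted_age_difference(row)
print(f"{row['ID']}: PAD = {pad:.1f} years")
```

A positive PAD means the predicted age is older than the age supplied via `age`.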

It is also possible to look at the individual features from the model - in this case, there are 13.  See the main Luna documentation for more information on these outputs.

In [7]:
p.table( 'PREDICT' , 'FTR' ) 

Unnamed: 0,ID,FTR,B,D,IMP,M,REIMP,SD,X,Z
0,learn-nsrr02,COUPL_OVERLAP_C,-0.804678,-0.449101,0,366.302452,0,191.716141,256.0,-0.575343
1,learn-nsrr02,DENS_C,-1.665346,-0.667626,0,4.513583,0,1.9116,3.438596,-0.562349
2,learn-nsrr02,alpha_bandpower_kurtosis_C_N2,-3.184509,-0.271177,0,7.331549,0,2.598451,7.701239,0.142273
3,learn-nsrr02,alpha_bandpower_mean_C_N1,2.29108,-0.992484,0,0.068193,0,0.047436,0.039154,-0.612167
4,learn-nsrr02,delta_alpha_mean_C_N3,-1.348501,-0.514881,0,1.343991,0,0.548411,0.512308,-1.516532
5,learn-nsrr02,delta_bandpower_kurtosis_C_N2,-1.868672,-0.512531,0,17.017404,0,4.071176,11.987432,-1.235508
6,learn-nsrr02,delta_bandpower_mean_C_N3,-2.620558,-0.522081,0,1.445,0,0.618704,0.528062,-1.482031
7,learn-nsrr02,delta_theta_mean_C_N3,1.386207,-0.615108,0,1.224915,0,0.458186,0.487287,-1.60989
8,learn-nsrr02,kurtosis_N2_C,-0.052233,0.418827,0,2.851093,0,1.34911,1.838891,-0.750274
9,learn-nsrr02,kurtosis_N3_C,-1.247537,0.216441,0,1.086065,0,0.576482,0.600127,-0.842936
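As a quick sanity check on the feature table, the numbers shown are consistent with `Z` being the observed feature value `X` standardized by `M` and `SD`, i.e. `Z = (X - M) / SD`. This reading is inferred from the values above rather than taken from the Luna documentation, so treat the sketch below as a check on the printed numbers only; the helper `standardize` is illustrative.

```python
# (feature, X, M, SD, Z) copied by hand from the first three rows of the table above.
rows = [
    ("COUPL_OVERLAP_C", 256.0, 366.302452, 191.716141, -0.575343),
    ("DENS_C", 3.438596, 4.513583, 1.9116, -0.562349),
    ("alpha_bandpower_kurtosis_C_N2", 7.701239, 7.331549, 2.598451, 0.142273),
]

def standardize(x, mean, sd):
    """Assumed relationship (inferred from the table): Z = (X - M) / SD."""
    return (x - mean) / sd

# Compare the recomputed Z with the reported Z for each row.
max_abs_diff = max(abs(standardize(x, m, sd) - z) for _, x, m, sd, z in rows)
print(f"largest |recomputed Z - reported Z| = {max_abs_diff:.6f}")
```

Small discrepancies are expected because the table values are rounded for display.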


## Project-level invocation

The primary difference when running in _project_ mode is how to specify the observed age for each individual. Internally, the script that drives the prediction expects a variable called `${age}` to be set for each individual. We can specify this for each person in a sample-list by supplying the ages as _individual-level variables_, which can be imported from a simple tab-delimited file (with `ID` as the first column, which must align with the sample-list IDs). Here we have a previously made file with the ages of each individual, `misc/vars.txt`:

In [8]:
%%sh
cat misc/vars.txt

ID	age
learn-nsrr01	60
learn-nsrr02	61
learn-nsrr03	62
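
A file in this layout can be written with Python's standard `csv` module. The sketch below is one possible way to do it, not how `misc/vars.txt` was actually produced; the function name `write_vars_file` is hypothetical, and the output goes to a temporary directory so the tutorial file is not overwritten.

```python
import csv
import os
import tempfile

# Ages as used in this tutorial (made-up values, see the note above).
ages = {"learn-nsrr01": 60, "learn-nsrr02": 61, "learn-nsrr03": 62}

def write_vars_file(path, ages):
    """Hypothetical helper: write a tab-delimited file with ID as the first column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["ID", "age"])
        for indiv_id, age in ages.items():
            writer.writerow([indiv_id, age])

# Write to a temporary location and read the file back to inspect it.
vars_path = os.path.join(tempfile.mkdtemp(), "vars.txt")
write_vars_file(vars_path, ages)
with open(vars_path) as f:
    contents = f.read()
print(contents)
```

The IDs in the first column must match those in the sample-list exactly.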


Having generated such a file, use the `vars` _special variable_ to attach a set of individual-level variables to the project.  When processing each individual, the appropriate value of `${age}` will be swapped into the script. (Note: this mechanism is general, and not specific to this `predict_*()` function _per se_.)

In [9]:
proj.var( 'vars' , 'misc/vars.txt' )
proj.predict_SUN2019( 'EEG' )

Unnamed: 0,ID,NF,NF_OBS,OKAY,Y,Y1,YOBS
0,learn-nsrr01,13,13,1,48.989031,54.903227,60.0
1,learn-nsrr02,13,13,1,60.597311,67.046936,61.0
2,learn-nsrr03,13,13,1,50.259777,57.244831,62.0


As shown above, the project-level `predict_*()` functions return a small object with the key predictions for each individual.  (For learn-nsrr02, `Y` matches the individual-level run, whereas `Y1` differs because a different age, 61 rather than 62, was supplied here.)  You can explore the full set of results (which includes all the individual Luna commands performed in order to generate the features for the prediction model) as follows:

In [10]:
proj.strata()

Unnamed: 0,Command,Strata
0,EPOCH,BL
1,MASK,EMASK_STG
2,MTM,B1_B2_CH_STG
3,MTM,B_CH_STG
4,MTM,B_STG
5,MTM,CH_F_STG
6,PREDICT,BL
7,PREDICT,FTR
8,RE,BL
9,RE,STG


In [11]:
proj.table( 'PREDICT' )

Unnamed: 0,ID,NF,NF_OBS,OKAY,Y,Y1,YOBS
0,learn-nsrr01,13,13,1,48.989031,54.903227,60.0
1,learn-nsrr02,13,13,1,60.597311,67.046936,61.0
2,learn-nsrr03,13,13,1,50.259777,57.244831,62.0


(As noted above, the true ages are unknown for these test individuals: the values of 60, 61 and 62 were entered randomly for the purpose of this tutorial. Naturally, in a real analysis, it is critical to supply the individuals' known ages.)

---
## Notes

### Timing 
Prediction takes approximately 15-20 seconds per individual; for larger samples, you should run individuals in parallel.
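
One simple way to parallelize is to split the sample into batches and process each batch in a separate job or process. The sketch below only shows the batching step in plain Python; `split_into_batches` is a hypothetical helper, and how each batch is then run (separate Python processes, cluster jobs, and so on) is left open, since no particular `lunapi` parallel facility is assumed here.

```python
def split_into_batches(ids, n_batches):
    """Hypothetical helper: split ids into n_batches contiguous, near-equal batches."""
    size, extra = divmod(len(ids), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional ID.
        end = start + size + (1 if i < extra else 0)
        batches.append(ids[start:end])
        start = end
    return batches

ids = ["learn-nsrr01", "learn-nsrr02", "learn-nsrr03"]
batches = split_into_batches(ids, 2)
for i, batch in enumerate(batches, start=1):
    print(f"batch {i}: {batch}")
```

Each batch could then be written to its own sample-list and processed independently, with the resulting tables combined afterwards.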

### Training model location

If the model training data are located elsewhere, set `lp.resources.MODEL_PATH` to that location before running the `predict_*()` functions.

### Running with multiple central EEGs

Use a comma-delimited list of channels (e.g. `'C3,C4'` ), or a Python list (e.g. `[ 'C3', 'C4' ]` ); features will be computed for all channels and then averaged:

In [12]:
p.predict_SUN2019( 'EEG,EEG_sec' , age = 62 )

Unnamed: 0,ID,NF,NF_OBS,OKAY,Y,Y1,YOBS
0,learn-nsrr02,13,13,1,60.954578,67.939631,62.0
