# LunAPI : projects

Links to notebooks in this repository: [Index](./00_overview.ipynb) | [Luna tutorial](./tutorial.ipynb) | 
[Individuals](./01_indivs.ipynb) | [Projects](./02_projects.ipynb) | [Staging](./03_staging.ipynb) | [Models](./04_models.ipynb) | [Advanced](./98_advanced.ipynb) | [Reference](./99_reference.ipynb)

---

Here we show how to use the package to work with multiple individuals, applying a set of Luna commands to an entire _sample-list_.

In [1]:
import lunapi as lp
proj = lp.proj()

initiated lunapi v0.0.7 <lunapi.lunapi0.luna object at 0x12fbd7570> 



### Working with projects

A _project_ is defined by a list of individual samples (EDFs/annotations).  We can attach a sample list either from a file (`sample_list( <filename> )`) or by building one on-the-fly with `build()`.  The `build()` function returns the number of observations (EDF/annotation pairs), which is 3 in this case, and stores the information on the samples internally.

In [2]:
proj.build( [ 'tutorial/edfs' ] )

3

You can see the information of the attached sample list as follows:

In [3]:
proj.sample_list()

Unnamed: 0,ID,EDF,Annotations
1,learn-nsrr01,tutorial/edfs/learn-nsrr01.edf,{tutorial/edfs/learn-nsrr01.xml}
2,learn-nsrr02,tutorial/edfs/learn-nsrr02.edf,{tutorial/edfs/learn-nsrr02.xml}
3,learn-nsrr03,tutorial/edfs/learn-nsrr03.edf,{tutorial/edfs/learn-nsrr03.xml}


### Reading a sample-list from a file

If you have a pre-existing sample list (e.g. manually created, or generated by Luna's `--build` option), you can alternatively attach it with `sample_list( <filename> )`:

In [4]:
proj.sample_list( 'tutorial/s.lst' )

read 3 individuals from tutorial/s.lst


In [5]:
proj.sample_list()

Unnamed: 0,ID,EDF,Annotations
1,learn-nsrr01,tutorial/edfs/learn-nsrr01.edf,{tutorial/edfs/learn-nsrr01.xml}
2,learn-nsrr02,tutorial/edfs/learn-nsrr02.edf,{tutorial/edfs/learn-nsrr02.xml}
3,learn-nsrr03,tutorial/edfs/learn-nsrr03.edf,{tutorial/edfs/learn-nsrr03.xml}
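
If you need to create such a sample list by hand, a minimal sketch is shown below.  It assumes a tab-delimited layout with one row per individual (ID, EDF path, annotation file), matching the three columns in the table above.  The helper name `write_sample_list` and the temporary output location are hypothetical and are not part of `lunapi`.

```python
import csv
import os
import tempfile


def write_sample_list(rows, filename):
    """Write (ID, EDF, annotation) triples as a tab-delimited sample list."""
    with open(filename, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for indiv_id, edf, annot in rows:
            writer.writerow([indiv_id, edf, annot])
    return len(rows)


# The three tutorial individuals shown in the table above
rows = [
    (
        f"learn-nsrr0{i}",
        f"tutorial/edfs/learn-nsrr0{i}.edf",
        f"tutorial/edfs/learn-nsrr0{i}.xml",
    )
    for i in (1, 2, 3)
]

# Hypothetical location: a scratch folder rather than the tutorial folder
sample_list_path = os.path.join(tempfile.mkdtemp(), "s.lst")
n_written = write_sample_list(rows, sample_list_path)
print(f"wrote {n_written} individuals to {sample_list_path}")
```

With `lunapi` available, the resulting file could then be attached with `proj.sample_list( sample_list_path )`.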


Note: if the sample list contains relative paths that do not resolve from the current working folder, you can set the `path` special variable _before_ reading the sample-list, e.g.:
```
proj.var( 'path' , '/path/to/data/' )
proj.sample_list( '/path/to/data/s.lst' )
```
which is similar to the command-line version:
```
luna /path/to/data/s.lst path=/path/to/data/ -o out.db < cmd.txt
```

## Running a command across a project

To apply a Luna command to all individuals in a sample list use `proc()`:

In [6]:
res = proj.proc( 'STATS sig=EEG' )

___________________________________________________________________
Processing: learn-nsrr01 | tutorial/edfs/learn-nsrr01.edf
 duration 11.22.00, 40920s | time 21.58.17 - 09.20.17 | date 01.01.85

 signals: 14 (of 14) selected in a standard EDF file
  SaO2 | PR | EEG_sec | ECG | EMG | EOG_L | EOG_R | EEG
  AIRFLOW | THOR_RES | ABDO_RES | POSITION | LIGHT | OX_STAT
 ..................................................................
 CMD #1: STATS
   options: sig=EEG
 processing EEG ...
___________________________________________________________________
Processing: learn-nsrr02 | tutorial/edfs/learn-nsrr02.edf
 duration 09.57.30, 35850s | time 21.18.06 - 07.15.36 | date 01.01.85

 signals: 14 (of 14) selected in a standard EDF file
  SaO2 | PR | EEG_sec | ECG | EMG | EOG_L | EOG_R | EEG
  AIRFLOW | THOR_RES | ABDO_RES | POSITION | LIGHT | OX_STAT
 ..................................................................
 CMD #1: STATS
   options: sig=EEG
 processing EEG ...
____________________

It is generally useful to have the output logged as above.   If running programmatically, you can turn off output with `proj.silence()` (and back on with `proj.silence(False)`).   Note that, unlike command-line Luna, a Jupyter Lab environment will generally return all console text only after completing all analysis.   This makes it less useful for tracking the progress of longer-running jobs:  for these, the standard command-line is advised (or better, an HPC environment if available).
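
When output is silenced only around one step, it is easy to forget to turn it back on.  One possible pattern, sketched here under assumptions, wraps the two `silence()` calls in a context manager.  The `silenced` helper and the `StubProject` stand-in are hypothetical (they are not part of `lunapi`); the stub only exists so that the sketch runs without the package installed.

```python
from contextlib import contextmanager


@contextmanager
def silenced(project):
    """Temporarily turn off console output for a project-like object."""
    project.silence()            # turn logging off
    try:
        yield project
    finally:
        project.silence(False)   # always turn logging back on, even on error


class StubProject:
    """Stand-in for lp.proj(), recording each silence() call it receives."""

    def __init__(self):
        self.calls = []

    def silence(self, b=True):
        self.calls.append(b)


stub = StubProject()
with silenced(stub):
    pass  # with a real project: res = proj.proc( 'STATS sig=EEG' )

print(stub.calls)
```

With a real project the same helper would be used as `with silenced(proj): ...`, relying only on the `proj.silence()` and `proj.silence(False)` calls described above.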

The `lp.show()` convenience function simply prints all results in the current project-level results cache:

In [7]:
lp.show( res )

[1m[36mSTATS: CH[0m


Unnamed: 0,ID,CH,KURT,MAX,MEAN,MIN,P01,P02,P05,P10,...,P60,P70,P80,P90,P95,P98,P99,RMS,SD,SKEW
0,learn-nsrr01,EEG,4.657487,125.0,-0.301199,-124.019608,-124.019608,-124.019608,-70.098039,-24.019608,...,2.45098,4.411765,8.333333,23.039216,72.058824,125.0,125.0,37.801351,37.800154,0.086212
1,learn-nsrr02,EEG,3.546719,125.0,-0.370447,-124.019608,-124.019608,-124.019608,-78.921569,-33.823529,...,4.411765,8.333333,14.215686,30.882353,75.980392,125.0,125.0,41.044234,41.042566,-0.019268
2,learn-nsrr03,EEG,1.345607,125.0,-0.11223,-124.019608,-124.019608,-124.019608,-124.019608,-73.039216,...,3.431373,7.352941,15.196078,70.098039,125.0,125.0,125.0,54.2708,54.270689,-0.011078


You can see a table of the available output (organized by command and stratum) with `strata()`:

In [8]:
proj.strata()

Unnamed: 0,Command,Strata
0,STATS,CH


Note that these commands are basically identical to the individual-level variants (i.e. `proj.strata()` versus `p.strata()`) except they query the project-level cache, rather than the results cache for a particular individual (`inst` class, here called `p`).

We can pull out a single table of results with `table()`:

In [9]:
df = proj.table( 'STATS' , 'CH' )
df

Unnamed: 0,ID,CH,KURT,MAX,MEAN,MIN,P01,P02,P05,P10,...,P60,P70,P80,P90,P95,P98,P99,RMS,SD,SKEW
0,learn-nsrr01,EEG,4.657487,125.0,-0.301199,-124.019608,-124.019608,-124.019608,-70.098039,-24.019608,...,2.45098,4.411765,8.333333,23.039216,72.058824,125.0,125.0,37.801351,37.800154,0.086212
1,learn-nsrr02,EEG,3.546719,125.0,-0.370447,-124.019608,-124.019608,-124.019608,-78.921569,-33.823529,...,4.411765,8.333333,14.215686,30.882353,75.980392,125.0,125.0,41.044234,41.042566,-0.019268
2,learn-nsrr03,EEG,1.345607,125.0,-0.11223,-124.019608,-124.019608,-124.019608,-124.019608,-73.039216,...,3.431373,7.352941,15.196078,70.098039,125.0,125.0,125.0,54.2708,54.270689,-0.011078


We can extract various columns, e.g.:

In [10]:
df[ ['ID','KURT'] ] 

Unnamed: 0,ID,KURT
0,learn-nsrr01,4.657487
1,learn-nsrr02,3.546719
2,learn-nsrr03,1.345607


Instead of working with a Pandas dataframe, we can extract key columns and convert them to a NumPy array, e.g.:

In [11]:
df[ ['MEAN','KURT','SKEW' ] ].to_numpy(dtype=float)

array([[-0.3011987 ,  4.65748664,  0.08621198],
       [-0.37044745,  3.54671873, -0.01926794],
       [-0.11222989,  1.34560684, -0.01107757]])

## Adding individual-specific variables

Project-level variables are, by definition, shared across an entire project, and are set with `proj.var()` or `proj.varmap()`.  Also, some project-level variables may be _special variables_ that initiate a particular Luna function (e.g. `alias` to remap channel labels, etc).  See the main Luna documentation for more details on [special variables](https://zzz.bwh.harvard.edu/luna/luna/args/#special-variables).

An individual-level variable is defined per-individual.  When evaluating a Luna script for a given individual, project-level and individual-level variables are combined (with individual-level values overriding any project-level ones for that individual).   It is possible to specify a whole set of individual-level variables from a file.  For example, suppose we've made this example file (`misc/s1.txt`), which we'll use to set the variable `${s1}` to a different value per individual:

In [12]:
%%sh
cat misc/s1.txt

ID	s1
learn-nsrr01	EEG
learn-nsrr02	EMG
learn-nsrr03	ECG


We can then attach _all_ these variables via a single (project-level) command, using the `vars` special variable to load these values into an internal cache, which is accessed when an individual with a matching `ID` is analyzed.  (Note: these files should be tab-delimited, have `ID` as the first column, and use `.` as the missing value.)

In [13]:
proj.var( 'vars' , 'misc/s1.txt' )
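
To make the override rule concrete, here is a small pure-Python sketch.  It is only an illustration of the behaviour described above, not Luna's actual implementation: the helpers `read_indiv_vars` and `resolve` are hypothetical, and the project-level default `EEG_sec` is an assumed value chosen for the example.

```python
import csv
import io

# Contents of misc/s1.txt as shown above (tab-delimited, ID first)
VARS_TEXT = "ID\ts1\nlearn-nsrr01\tEEG\nlearn-nsrr02\tEMG\nlearn-nsrr03\tECG\n"


def read_indiv_vars(text):
    """Parse a tab-delimited variable file into {ID: {variable: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        indiv_id = row.pop("ID")
        # '.' marks a missing value, so that variable is simply not set
        table[indiv_id] = {k: v for k, v in row.items() if v != "."}
    return table


def resolve(command, project_vars, indiv_vars):
    """Substitute ${var} in a command; individual-level values win."""
    merged = {**project_vars, **indiv_vars}
    for name, value in merged.items():
        command = command.replace("${" + name + "}", value)
    return command


indiv_table = read_indiv_vars(VARS_TEXT)
project_vars = {"s1": "EEG_sec"}  # assumed project-level default

commands = {
    indiv_id: resolve("STATS sig=${s1}", project_vars, indiv_vars)
    for indiv_id, indiv_vars in indiv_table.items()
}
for indiv_id, command in commands.items():
    print(indiv_id, "->", command)
```

In this sketch, an individual with no entry for `s1` (or with `.` in the file) would fall back to the project-level value.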

Now when running a command for a whole _sample-list_ that uses `${s1}`, this will swap in the correct value for each individual:

In [14]:
res = proj.proc( 'STATS sig=${s1}' )

___________________________________________________________________
Processing: learn-nsrr01 | tutorial/edfs/learn-nsrr01.edf
 duration 11.22.00, 40920s | time 21.58.17 - 09.20.17 | date 01.01.85

 signals: 14 (of 14) selected in a standard EDF file
  SaO2 | PR | EEG_sec | ECG | EMG | EOG_L | EOG_R | EEG
  AIRFLOW | THOR_RES | ABDO_RES | POSITION | LIGHT | OX_STAT
 ..................................................................
 CMD #1: STATS
   options: sig=EEG
 processing EEG ...
___________________________________________________________________
Processing: learn-nsrr02 | tutorial/edfs/learn-nsrr02.edf
 duration 09.57.30, 35850s | time 21.18.06 - 07.15.36 | date 01.01.85

 signals: 14 (of 14) selected in a standard EDF file
  SaO2 | PR | EEG_sec | ECG | EMG | EOG_L | EOG_R | EEG
  AIRFLOW | THOR_RES | ABDO_RES | POSITION | LIGHT | OX_STAT
 ..................................................................
 CMD #1: STATS
   options: sig=EMG
 processing EMG ...
____________________

In [15]:
proj.strata()

Unnamed: 0,Command,Strata
0,STATS,CH


In [16]:
proj.table( 'STATS' , 'CH' )

Unnamed: 0,ID,CH,KURT,MAX,MEAN,MIN,P01,P02,P05,P10,...,P60,P70,P80,P90,P95,P98,P99,RMS,SD,SKEW
0,learn-nsrr03,ECG,11.165007,1.25,0.003571,-1.240196,-1.240196,-1.240196,-0.259804,-0.142157,...,0.02451,0.034314,0.063725,0.142157,0.259804,1.25,1.25,0.30003,0.300009,-0.094024
1,learn-nsrr01,EEG,4.657487,125.0,-0.301199,-124.019608,-124.019608,-124.019608,-70.098039,-24.019608,...,2.45098,4.411765,8.333333,23.039216,72.058824,125.0,125.0,37.801351,37.800154,0.086212
2,learn-nsrr02,EMG,-0.926233,31.5,-0.609767,-31.252941,-31.252941,-31.252941,-31.252941,-31.252941,...,1.111765,7.041176,20.629412,31.5,31.5,31.5,31.5,19.706464,19.69703,0.185887


## Saving results

You can use standard Python approaches to save results, including `pickle` for objects.

In [17]:
import pickle

For example, for a simple dataframe returned by `table()`:

In [18]:
df = proj.table( 'STATS' , 'CH' )

In [19]:
type(df)

pandas.core.frame.DataFrame

We can save `df`:

In [20]:
with open( "df.p", "wb" ) as f:
    pickle.dump( df , f )

and then later reload it:

In [21]:
with open( "df.p", "rb" ) as f:
    df2 = pickle.load( f )
df2

Unnamed: 0,ID,CH,KURT,MAX,MEAN,MIN,P01,P02,P05,P10,...,P60,P70,P80,P90,P95,P98,P99,RMS,SD,SKEW
0,learn-nsrr03,ECG,11.165007,1.25,0.003571,-1.240196,-1.240196,-1.240196,-0.259804,-0.142157,...,0.02451,0.034314,0.063725,0.142157,0.259804,1.25,1.25,0.30003,0.300009,-0.094024
1,learn-nsrr01,EEG,4.657487,125.0,-0.301199,-124.019608,-124.019608,-124.019608,-70.098039,-24.019608,...,2.45098,4.411765,8.333333,23.039216,72.058824,125.0,125.0,37.801351,37.800154,0.086212
2,learn-nsrr02,EMG,-0.926233,31.5,-0.609767,-31.252941,-31.252941,-31.252941,-31.252941,-31.252941,...,1.111765,7.041176,20.629412,31.5,31.5,31.5,31.5,19.706464,19.69703,0.185887


Note that you cannot use this approach to save project or instance objects (e.g. `proj` or `p`), as these are only references to underlying C/C++ structures.  Use Luna commands (`WRITE` and `WRITE-ANNOTS`) to save intermediate files in their standard formats (i.e. EDF, or Luna .annot).

That completes this project-level overview of the `lunapi` workflow.  You can see how to use POPS to perform automated staging in the [next notebook](./03_staging.ipynb).