# Adding Labels and Features in FEMR

To run models in FEMR (currently just CLMBR), you first need to assign `Label`s of interest to each patient timeline (i.e., a `Patient` and its corresponding list of events).

Once we have a `PatientDatabase` and a set of `Label`s, we can run CLMBR (covered in the next tutorial).


## Labels
A `Label` represents a time and value of interest that we'd like to use for training. During training, all event data before the `Label`'s time is used to predict the corresponding value.
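
To make the "events before the label time" idea concrete, here is a minimal sketch in plain Python. `SketchEvent`, `SketchLabel`, and `events_visible_to_label` are hypothetical names invented for illustration and are not part of FEMR; the strict `<` comparison is also an assumption about how "before" is interpreted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


# Hypothetical stand-ins for illustration only; these are NOT FEMR classes.
@dataclass
class SketchEvent:
    time: datetime
    code: str


@dataclass
class SketchLabel:
    time: datetime
    value: bool


def events_visible_to_label(events: List[SketchEvent], label: SketchLabel) -> List[SketchEvent]:
    """Return only the events that occur strictly before the label's time."""
    return [event for event in events if event.time < label.time]


timeline = [
    SketchEvent(datetime(2020, 1, 1), "ICD10CM/I10"),
    SketchEvent(datetime(2020, 6, 1), "ICD10CM/E11.4"),
    SketchEvent(datetime(2021, 1, 1), "ICD10CM/I10"),
]
label = SketchLabel(time=datetime(2020, 7, 1), value=False)

# Only the first two events precede the label time, so only they would be
# available as model input for predicting label.value.
visible = events_visible_to_label(timeline, label)
print([event.code for event in visible])
```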

FEMR currently provides pre-built `Labeler` classes that create labels from patient timelines. Since the `PatientDatabase` is a set of files on disk, the output of this step is also saved to disk for later use.

## Features

There are currently three types of features that can be constructed (using FEMR `Featurizer` classes):
1. `AgeFeaturizer`: Adds the patient's calculated age at each `Label`.
2. `CountFeaturizer`: Creates a column for each unique code, holding the count of that code per patient.
3. `NoteFeaturizer`: ...
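
As a rough illustration of what the first two feature types compute, the sketch below derives an age and a set of per-code counts in plain Python. The helper names `age_in_years` and `code_counts` are hypothetical and do not reflect FEMR's implementation; restricting the counts to events before the label time is an assumption carried over from the `Label` description above.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def age_in_years(birth: datetime, label_time: datetime) -> float:
    """Approximate age at the label time, in years."""
    return (label_time - birth).days / 365.25


def code_counts(events: List[Tuple[datetime, str]], label_time: datetime) -> Dict[str, int]:
    """Count occurrences of each code among events before the label time."""
    return dict(Counter(code for time, code in events if time < label_time))


birth = datetime(1970, 1, 7)
label_time = datetime(2020, 1, 7)
events = [
    (datetime(2019, 1, 1), "ICD10CM/I10"),
    (datetime(2019, 6, 1), "ICD10CM/I10"),
    (datetime(2019, 7, 1), "ICD10CM/E11.4"),
    (datetime(2021, 1, 1), "ICD10CM/E11.4"),  # after the label time, so ignored
]

age = age_in_years(birth, label_time)
counts = code_counts(events, label_time)
print(age)     # one age value per label
print(counts)  # one "column" per unique code
```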


### Using the PatientDatabase

In [1]:
from femr.datasets import PatientDatabase

patient_db = PatientDatabase("./example_data/example_etl_output")
print(type(patient_db))

<class 'femr.extension.datasets.PatientDatabase'>


### Getting some Labels

In [11]:
from femr.labelers import TimeHorizon
from femr.labelers.omop import CodeLabeler
from datetime import timedelta

# TODO: Does this example make semantic sense?
labeler = CodeLabeler(outcome_codes=["ICD10CM/E11.4"], time_horizon=TimeHorizon(timedelta(0), timedelta(365)))

labels = labeler.label(patient_db[3])
labels

[Label(time=datetime.datetime(1970, 1, 7, 0, 0), value=False),
 Label(time=datetime.datetime(1990, 1, 7, 0, 0), value=False),
 Label(time=datetime.datetime(2020, 7, 9, 0, 0), value=False),
 Label(time=datetime.datetime(2020, 8, 9, 0, 0), value=False)]
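
The `TimeHorizon(timedelta(0), timedelta(365))` argument can be read as "does the outcome occur within the 365 days following the prediction time?". The sketch below shows that reading in plain Python. The function `outcome_within_horizon` is hypothetical, the inclusive window boundaries are an assumption, and details a real labeler would need to handle (such as censoring when the timeline ends before the window does) are left out.

```python
from datetime import datetime, timedelta
from typing import List


def outcome_within_horizon(
    prediction_time: datetime,
    outcome_times: List[datetime],
    horizon_start: timedelta,
    horizon_end: timedelta,
) -> bool:
    """True if any outcome falls in [prediction_time + start, prediction_time + end]."""
    window_start = prediction_time + horizon_start
    window_end = prediction_time + horizon_end
    return any(window_start <= time <= window_end for time in outcome_times)


# Hypothetical outcome date for illustration; not taken from the example data.
outcome_times = [datetime(2021, 3, 1)]

near = outcome_within_horizon(
    datetime(2020, 7, 9), outcome_times, timedelta(0), timedelta(365)
)
far = outcome_within_horizon(
    datetime(2019, 1, 1), outcome_times, timedelta(0), timedelta(365)
)
print(near, far)
```

Under this reading, a list of labels that are all `False` would mean that no outcome code fell inside any of the windows.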

### Using Featurizers on the Patients

In [14]:
from femr.featurizers.featurizers import AgeFeaturizer


age_featurizer = AgeFeaturizer()

age_featurizer.featurize(patient_db[3], labels=labels, ontology=patient_db.get_ontology())

ValueError: Cannot compute variance with only 0 observations.
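
The error above is consistent with a featurizer that normalizes values using running statistics that have not yet seen any data. The sketch below only illustrates that general mechanism (Welford's online algorithm for mean and variance). The `RunningStats` class is hypothetical, and whether `AgeFeaturizer` works this way, or requires a separate preprocessing pass before `featurize`, is an assumption rather than something confirmed here.

```python
class RunningStats:
    """Hypothetical online mean/variance accumulator (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def variance(self) -> float:
        # Sample variance is undefined with fewer than two observations.
        if self.count < 2:
            raise ValueError(
                f"Cannot compute variance with only {self.count} observations."
            )
        return self.m2 / (self.count - 1)


# Asking for the variance before any value has been added raises the error.
empty = RunningStats()
try:
    empty.variance()
except ValueError as err:
    message = str(err)
    print(message)

# After accumulating some ages, the variance is well defined.
filled = RunningStats()
for age in [30.0, 40.0, 50.0]:
    filled.add(age)
print(filled.variance())
```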