# Labeling

A core component of FEMR is labeling patients.

Labels are represented and stored within FEMR as instances of the `Label` class.

A `Label` object contains the following two attributes:

* `time` (datetime): The time at which the model should make its prediction for this label
* `value` (bool | int | float | SurvivalValue): The target to predict

The definition of the `Label` class can be [found here](https://github.com/som-shahlab/femr/blob/main/src/femr/labelers/core.py#L51).

`Label.value` has a dynamic type to reflect the different kinds of labels used in clinical ML: boolean, numeric, categorical, and survival labels.
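To make the four label kinds concrete, here is a minimal self-contained sketch that uses stand-in types rather than FEMR itself. `SketchLabel` and `SketchSurvivalValue` are hypothetical names, and the survival fields shown (`time_to_event`, `is_censored`) are an assumption about what a survival target needs, not a confirmed copy of FEMR's `SurvivalValue`.

```python
import datetime
from typing import NamedTuple, Union


class SketchSurvivalValue(NamedTuple):
    # Hypothetical stand-in: assumed fields for a survival target
    time_to_event: datetime.timedelta
    is_censored: bool


class SketchLabel(NamedTuple):
    # Hypothetical stand-in with the two attributes described above
    time: datetime.datetime
    value: Union[bool, int, float, SketchSurvivalValue]


prediction_time = datetime.datetime(2000, 3, 2)

# One example label per kind, all sharing the same prediction time
sketch_labels = {
    "boolean": SketchLabel(prediction_time, True),
    "numeric": SketchLabel(prediction_time, 7.5),
    "categorical": SketchLabel(prediction_time, 2),  # a class index
    "survival": SketchLabel(
        prediction_time,
        SketchSurvivalValue(datetime.timedelta(days=30), is_censored=False),
    ),
}

for kind, label in sketch_labels.items():
    print(kind, type(label.value).__name__)
```

The prediction time is the same in every case; only the type of `value` changes with the task.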

In [1]:
import shutil
import os

TARGET_DIR = 'trash/tutorial_3'

if os.path.exists(TARGET_DIR):
    shutil.rmtree(TARGET_DIR)

os.mkdir(TARGET_DIR)

In [2]:
import femr.labelers
import datetime

# Predict False on March 2nd, 1994
example_label = femr.labelers.Label(time=datetime.datetime(1994, 3, 2), value=False)

# Predict True on March 2nd, 2000
example_label2 = femr.labelers.Label(time=datetime.datetime(2000, 3, 2), value=True)

# Label Storage

Labels are stored with the FEMR [`LabeledPatients` class](https://github.com/som-shahlab/femr/blob/main/src/femr/labelers/core.py#L96), which is conceptually a mapping from patient IDs to each patient's labels, plus some extra metadata and helper functions.

This class is constructed from a dictionary of labels and a string label type indicator.

LabeledPatients is serialized to disk using pickle.
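The "mapping from patient IDs to labels" idea can be sketched without FEMR. The example below is a rough illustration under stated assumptions: `SketchLabel` and `flatten_labels` are hypothetical names, and the flattening only imitates the shape of parallel sequences (patient IDs, values, times) that a helper such as `as_numpy_arrays` returns. It is not FEMR's implementation.

```python
import datetime
from typing import Dict, List, NamedTuple, Tuple


class SketchLabel(NamedTuple):
    # Hypothetical stand-in for a boolean label
    time: datetime.datetime
    value: bool


def flatten_labels(
    label_map: Dict[int, List[SketchLabel]],
) -> Tuple[List[int], List[bool], List[datetime.datetime]]:
    """Flatten a patient -> labels mapping into three parallel lists."""
    patient_ids: List[int] = []
    values: List[bool] = []
    times: List[datetime.datetime] = []
    for patient_id in sorted(label_map):
        for label in label_map[patient_id]:
            # A patient with several labels contributes several rows
            patient_ids.append(patient_id)
            values.append(label.value)
            times.append(label.time)
    return patient_ids, values, times


first = SketchLabel(datetime.datetime(1994, 3, 2), False)
second = SketchLabel(datetime.datetime(2000, 3, 2), True)

sketch_map = {2: [first], 6: [first, second]}
ids, vals, label_times = flatten_labels(sketch_map)
print(ids, vals)
```

Each label becomes one row, so a patient with several labels appears several times in the ID list.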

# Generating Labels Manually

You can create labels manually, or import them from outside FEMR, by constructing the appropriate `Label` and `LabeledPatients` objects and serializing them to disk with the `save` function.

Note that this is generally not recommended, because outside labels might not be in sync with FEMR's definition of time.

In [3]:
import pickle

# Note that a patient can have multiple labels

label_map = {
    2: [example_label],
    6: [example_label, example_label2],
}

labels = femr.labelers.LabeledPatients(label_map, labeler_type="boolean")

print(labels[6])

# This class also contains some useful helpers
print(labels.as_numpy_arrays())

labels.save(os.path.join(TARGET_DIR, 'manual_example.csv'))

[Label(time=datetime.datetime(1994, 3, 2, 0, 0), value=False), Label(time=datetime.datetime(2000, 3, 2, 0, 0), value=True)]
(array([2, 6, 6]), array([False, False,  True]), array([datetime.datetime(1994, 3, 2, 0, 0),
       datetime.datetime(1994, 3, 2, 0, 0),
       datetime.datetime(2000, 3, 2, 0, 0)], dtype=object))


# Generating Labels Programmatically Within FEMR

FEMR also supports generating labels algorithmically through a labeling function class. Labels generated within FEMR this way are guaranteed to be in sync with FEMR's definition of time, so this approach is highly recommended.

The core of FEMR's labeling code is the abstract base class [Labeler](https://github.com/som-shahlab/femr/blob/main/src/femr/labelers/core.py#L251).

Labeler has two abstract methods:

```
def label(self, patient: Patient) -> List[Label]:
    """Generate a list of labels for a patient."""

def get_labeler_type(self) -> LabelType:
    """Get the type of the labeler."""
```

Once these two methods are implemented, the `apply` function becomes available for generating labels on a particular `PatientDatabase`.
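The pattern behind this design is a standard abstract base class: subclasses supply the per-patient logic, and the base class supplies the loop over patients. The sketch below illustrates that pattern with stand-in types only. `SketchLabeler`, `SketchPatient`, and `HasManyEventsLabeler` are hypothetical, and this `apply` takes an in-memory list of patients, which is a simplifying assumption and not how FEMR's `apply` reads a patient database.

```python
import datetime
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple


class SketchLabel(NamedTuple):
    time: datetime.datetime
    value: bool


class SketchPatient(NamedTuple):
    # Hypothetical patient: an ID plus a list of event times
    patient_id: int
    event_times: List[datetime.datetime]


class SketchLabeler(ABC):
    @abstractmethod
    def label(self, patient: SketchPatient) -> List[SketchLabel]:
        """Generate a list of labels for a patient."""

    @abstractmethod
    def get_labeler_type(self) -> str:
        """Get the type of the labeler."""

    def apply(self, patients: List[SketchPatient]) -> Dict[int, List[SketchLabel]]:
        # Provided by the base class once the abstract methods exist
        return {p.patient_id: self.label(p) for p in patients}


class HasManyEventsLabeler(SketchLabeler):
    # Toy task: does the patient have at least three events?
    def label(self, patient: SketchPatient) -> List[SketchLabel]:
        if not patient.event_times:
            return []  # nothing to anchor a prediction time on
        has_many = len(patient.event_times) >= 3
        return [SketchLabel(time=patient.event_times[0], value=has_many)]

    def get_labeler_type(self) -> str:
        return "boolean"


day = datetime.datetime(1990, 1, 1)
sketch_patients = [
    SketchPatient(1, [day, day, day]),
    SketchPatient(2, [day]),
    SketchPatient(3, []),
]
sketch_result = HasManyEventsLabeler().apply(sketch_patients)
print(sketch_result)
```

Because `label` returns a list, a labeler can emit zero, one, or many labels per patient without any change to `apply`.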

In [4]:
from typing import List

class IsMaleLabeler(femr.labelers.Labeler):
    # Dummy labeler to predict gender at birth
    
    def label(self, patient: femr.Patient) -> List[femr.labelers.Label]:
        is_male = any('Gender/Gender' in event.code and event.value == "M" for event in patient.events)
        return [femr.labelers.Label(time=patient.events[1].start, value=is_male)]
    
    def get_labeler_type(self) -> femr.labelers.LabelType:
        return "boolean"
    
labeler = IsMaleLabeler()
labeled_patients = labeler.apply(path_to_patient_database="input/extract")

for i in range(10):
    print(labeled_patients[100 + i])
    
# Serialize with pickle

labeled_patients.save(os.path.join(TARGET_DIR, 'programatic.csv'))

[Label(time=datetime.datetime(1990, 11, 30, 0, 0), value=True)]
[Label(time=datetime.datetime(1991, 4, 15, 0, 0), value=False)]
[Label(time=datetime.datetime(1992, 4, 27, 0, 0), value=True)]
[Label(time=datetime.datetime(1991, 6, 28, 0, 0), value=True)]
[Label(time=datetime.datetime(1990, 12, 2, 0, 0), value=False)]
[Label(time=datetime.datetime(1990, 5, 21, 0, 0), value=True)]
[Label(time=datetime.datetime(1992, 7, 13, 0, 0), value=True)]
[Label(time=datetime.datetime(1991, 5, 3, 0, 0), value=True)]
[Label(time=datetime.datetime(1991, 3, 5, 0, 0), value=False)]
[Label(time=datetime.datetime(1990, 11, 7, 0, 0), value=True)]
