<h1 style="text-align:center">
Processing Annotations
</h1>

In [2]:
from utils import cluster_annots
import pylidc as pl
import numpy as np
import pandas as pd

## Importing and Handling Annotations

We used the same method as in **[4]**, producing a consensus list of nodules where *"the malignancy rating assigned was the average of the malignancy ratings assigned by the radiologists who annotated the nodule, rounded to the nearest integer"*.

1. Handling Multiple Radiologist Annotations:
> Since each nodule is annotated by up to four radiologists, we aggregate their assessments into a single set of averaged features per nodule, obtaining a consensus-like representation for each nodule.

2. Averaging Annotations:
> For each feature, the values from the individual radiologists are collected in a list, and the mean of each list is computed with **np.mean()**, giving one averaged value per feature for the nodule.

3. Rounding to Integer:
> The feature values are averaged as floats, and the final results are rounded to the nearest integer with Python's built-in `round()`. This decision is based on the fact that the original features are categorical and are represented as discrete values.

4. Handling Missing Data:
> Scans with no annotations are skipped.
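
The averaging and rounding steps can be sketched for a single feature as follows. The ratings are made-up values, not taken from the dataset, and `statistics.mean` stands in for `np.mean` so that the snippet needs only the standard library. One detail worth knowing: Python's built-in `round()` resolves exact ties (such as 2.5) to the nearest even integer. The hypothetical helper `round_half_up` is shown only for comparison, in case the half-up convention is preferred.

```python
import math
from statistics import mean


def round_half_up(value):
    """Round to the nearest integer, sending exact .5 ties upward (hypothetical helper)."""
    return math.floor(value + 0.5)


# Made-up malignancy ratings from four radiologists for one nodule.
ratings = [2, 3, 3, 2]

average = mean(ratings)                # 2.5
consensus_even = round(average)        # built-in round: ties go to the even integer
consensus_up = round_half_up(average)  # ties go upward

print(average, consensus_even, consensus_up)
```

The two conventions differ only when the mean falls exactly halfway between two ratings, which can happen whenever a nodule has an even number of annotations.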

In [None]:
patients_nodules_features = []

for scan in pl.query(pl.Scan):
    if len(scan.annotations) == 0:
        continue

    nodules = cluster_annots(scan)

    for nodule in nodules:
        # nodule is a list of annotations (one per radiologist who marked it)

        # The goal is to average the annotations so that each nodule ends up with a single annotation
        texture_list = []
        spiculation_list = []
        lobulation_list = []
        margin_list = []
        sphericity_list = []
        calcification_list = []
        internal_structure_list = []
        subtlety_list = []
        malignancy_list = []

        for annotation in nodule:
            texture_list.append(annotation.texture)
            spiculation_list.append(annotation.spiculation)
            lobulation_list.append(annotation.lobulation)
            margin_list.append(annotation.margin)
            sphericity_list.append(annotation.sphericity)
            calcification_list.append(annotation.calcification)
            internal_structure_list.append(annotation.internalStructure)
            subtlety_list.append(annotation.subtlety)
            malignancy_list.append(annotation.malignancy)

        # Round each averaged feature to the nearest integer
        features = {
            "ID": nodule[0].id,
            "Scan_ID": nodule[0].scan_id,
            "Patient_ID": scan.patient_id,
            "Texture": round(np.mean(texture_list)),
            "Spiculation": round(np.mean(spiculation_list)),
            "Lobulation": round(np.mean(lobulation_list)),
            "Margin": round(np.mean(margin_list)),
            "Sphericity": round(np.mean(sphericity_list)),
            "Calcification": round(np.mean(calcification_list)),
            "Internal Structure": round(np.mean(internal_structure_list)),
            "Subtlety": round(np.mean(subtlety_list)),
            "Malignancy": round(np.mean(malignancy_list)),
        }

        patients_nodules_features.append(features)


Failed to reduce all groups to <= 4 Annotations.
Some nodules may be close and must be grouped manually.
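
The per-feature lists in the loop above can also be expressed with a mapping from column name to annotation attribute. The following is a minimal sketch, not the notebook's code: `FEATURE_ATTRS`, `average_nodule`, and `make_annotation` are hypothetical names, and `SimpleNamespace` objects with made-up ratings stand in for pylidc annotations so that it runs without the LIDC-IDRI data. Only the attribute names (`texture`, `internalStructure`, and so on) are taken from the cell above.

```python
from statistics import mean
from types import SimpleNamespace

# Column name -> annotation attribute, as used in the cell above.
FEATURE_ATTRS = {
    "Texture": "texture",
    "Spiculation": "spiculation",
    "Lobulation": "lobulation",
    "Margin": "margin",
    "Sphericity": "sphericity",
    "Calcification": "calcification",
    "Internal Structure": "internalStructure",
    "Subtlety": "subtlety",
    "Malignancy": "malignancy",
}


def average_nodule(nodule):
    """Return one rounded mean per feature for a list of annotation-like objects."""
    return {
        column: round(mean(getattr(annotation, attr) for annotation in nodule))
        for column, attr in FEATURE_ATTRS.items()
    }


def make_annotation(**overrides):
    """Build a stand-in annotation: every rating is 3 unless overridden (made-up data)."""
    ratings = {attr: 3 for attr in FEATURE_ATTRS.values()}
    ratings.update(overrides)
    return SimpleNamespace(**ratings)


# Three made-up annotations of the same nodule.
nodule = [
    make_annotation(malignancy=4, texture=5),
    make_annotation(malignancy=5, texture=5),
    make_annotation(malignancy=4, texture=4),
]

features = average_nodule(nodule)
print(features["Malignancy"], features["Texture"])
```

Keeping the column-to-attribute mapping in one place means a feature can be added or renamed without touching the loop body.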

In [None]:
df_annotations = pd.DataFrame(patients_nodules_features)
df_annotations.to_csv('annotations_ds.csv')
df_annotations

Unnamed: 0,ID,Scan_ID,Patient_ID,Texture,Spiculation,Lobulation,Margin,Sphericity,Calcification,Internal Structure,Subtlety,Malignancy
0,2,1,LIDC-IDRI-0078,5,2,2,3,4,6,1,4,4
1,1,1,LIDC-IDRI-0078,4,2,3,3,4,6,1,5,4
2,8,1,LIDC-IDRI-0078,5,1,1,5,5,5,1,4,1
3,3,1,LIDC-IDRI-0078,5,3,3,3,4,5,1,5,4
4,16,2,LIDC-IDRI-0069,5,4,4,4,4,6,1,2,3
...,...,...,...,...,...,...,...,...,...,...,...,...
2656,6850,1016,LIDC-IDRI-0639,3,3,2,2,4,6,1,4,4
2657,6851,1016,LIDC-IDRI-0639,1,2,1,2,4,6,1,2,4
2658,6856,1017,LIDC-IDRI-0638,5,1,2,4,4,6,1,3,4
2659,6855,1017,LIDC-IDRI-0638,5,1,1,5,4,6,1,5,2
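
Because the averaged features are rounded back to discrete values, a quick range check on the saved table can catch aggregation mistakes. The sketch below is illustrative only. The rating scales in `RANGES` are an assumption not stated in this notebook (1 to 5 for most features, 1 to 6 for Calcification, 1 to 4 for Internal Structure), `out_of_range` is a hypothetical helper, and the three sample rows are copied from the table above without the index column.

```python
import csv
import io

# Assumed rating scales (not stated in the notebook): 1-5 by default,
# with Calcification on 1-6 and Internal Structure on 1-4.
RANGES = {
    name: (1, 5)
    for name in ["Texture", "Spiculation", "Lobulation", "Margin",
                 "Sphericity", "Subtlety", "Malignancy"]
}
RANGES["Calcification"] = (1, 6)
RANGES["Internal Structure"] = (1, 4)

# Three rows copied from the table above (index column omitted).
SAMPLE = """ID,Scan_ID,Patient_ID,Texture,Spiculation,Lobulation,Margin,Sphericity,Calcification,Internal Structure,Subtlety,Malignancy
2,1,LIDC-IDRI-0078,5,2,2,3,4,6,1,4,4
8,1,LIDC-IDRI-0078,5,1,1,5,5,5,1,4,1
16,2,LIDC-IDRI-0069,5,4,4,4,4,6,1,2,3
"""


def out_of_range(rows):
    """Return (ID, column, value) for every rating outside its assumed scale."""
    problems = []
    for row in rows:
        for column, (low, high) in RANGES.items():
            value = int(row[column])
            if not low <= value <= high:
                problems.append((row["ID"], column, value))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(out_of_range(rows))
```

An empty list means every rating in the sample lies within its assumed scale.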
