# Include Charité Data for Finetuning
The Charité dataset consists of multiple abdominal T1-, T2- and T1fs-weighted MRI scans.
20 classes (subset `A`) are manually annotated.

The method `map_merge_charite` in [map_merge.py](../scripts/map_merge.py) adds automatic segmentations of the remaining 20 classes (subset `B`) to the annotations.
I expect that finetuning on these merged labels will leave the segmentation quality for subset `B` unchanged, but will improve the segmentation quality for subset `A`.
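
The merging idea can be illustrated with a minimal sketch. This is not the actual implementation of `map_merge_charite`. It assumes that both label maps share the same shape, that `0` marks background, that the label IDs of subsets `A` and `B` do not overlap, and that manual annotations take precedence. The function name `merge_label_maps` and the toy label IDs are hypothetical.

```python
import numpy as np


def merge_label_maps(manual, predicted, background=0):
    """Fill voxels without a manual label with the automatic prediction.

    Hypothetical helper for illustration only.
    """
    manual = np.asarray(manual)
    predicted = np.asarray(predicted)
    if manual.shape != predicted.shape:
        raise ValueError("label maps must have the same shape")
    # Manual labels (subset A) win; predictions (subset B) only fill background.
    return np.where(manual != background, manual, predicted)


# Toy 2D example: IDs 1 and 2 stand for subset A, IDs 21 and 22 for subset B.
manual = np.array([[0, 1, 1], [0, 0, 2]])
predicted = np.array([[21, 21, 0], [0, 22, 22]])
merged = merge_label_maps(manual, predicted)
print(merged)
```

Where the two maps disagree, the manual label is kept, so the predicted classes never overwrite an annotated voxel.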

This notebook creates a combined CSV file with the annotations of both the UKBB and the Charité data.

In [5]:
import pandas as pd

# private libraries
import sys

if "../scripts" not in sys.path:
    sys.path.insert(1, "../scripts")
import config

## Load Charité Annotations

In [11]:
# load annotation documentation
data = pd.read_csv(config.mr_label_path + "annotations.csv")

# Rename columns
data = data.rename({"FinalPath": "label"}, axis=1)

# remove annotations that are severely incomplete (flagged in the boolean "Remove" column)
to_be_removed = data.loc[data["Remove"]].index
data = data.drop(to_be_removed).reset_index(drop=True)

print(
    f"Loaded data for {len(data)} annotations, {len(to_be_removed)} additional ones have been excluded."
)
for seq in data["Seq"].unique():
    print(f"{seq[:4]}:\t {sum(data['Seq'] == seq)} sequences")

# Build relative image paths and prediction file names, then save as csv
data["image"] = "MR/" + data["ID"].astype(str) + "/" + data["Seq"] + "/image.nii"
# second path component of the label path
data["pred"] = data["label"].apply(lambda x: x.split("/")[1])
charite_data = data[["Seq", "label", "image", "pred"]].rename({"Seq": "seq"}, axis=1)
charite_data.to_csv(config.ukbb + "csv/charite_annotations.csv", index=False)

Loaded data for 221 annotations, 4 additional ones have been excluded.
T1:	 90 sequences
T1fs:	 67 sequences
T2:	 64 sequences
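
The column derivation in the cell above can be checked on a toy frame. The IDs and label paths below are made up for illustration. The real layout of `FinalPath` is not shown in this notebook, so the single directory level is an assumption.

```python
import pandas as pd

# Hypothetical rows mimicking the columns used above.
toy = pd.DataFrame(
    {
        "ID": [101, 102],
        "Seq": ["T1", "T2"],
        "label": ["final/101_T1.nii.gz", "final/102_T2.nii.gz"],
    }
)

# Same string concatenation as in the notebook cell.
toy["image"] = "MR/" + toy["ID"].astype(str) + "/" + toy["Seq"] + "/image.nii"
# Second path component; this is the file name only if there is one directory level.
toy["pred"] = toy["label"].str.split("/").str[1]

print(toy[["image", "pred"]])
```

Note that `split("/")[1]` returns the second path component, not necessarily the last one. For deeper paths, `x.split("/")[-1]` would be the safer way to get the file name.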


## Combine Charité Annotations with UKBB Annotations

In [None]:
# load UKBB annotation documentation (excluding the small test set 'test_finetuning.csv')
ukbb_data = pd.read_csv(config.ukbb + "train_finetuning.csv")[
    ["dixon_type", "label", "image"]
].rename({"dixon_type": "seq"}, axis=1)
ukbb_data["label"] = config.ukbb + "annotations/" + ukbb_data["label"]
ukbb_data["image"] = config.ukbb + "nifti/" + ukbb_data["image"]

charite_data["label"] = (
    config.ukbb + "preds_charite_combined/" + charite_data["label"].apply(lambda x: x.split("/")[1])
)
charite_data["image"] = config.mr_path + charite_data["image"]
charite_data.iloc[0]["label"]

# Combine Charité and UKBB annotations, then shuffle with a fixed seed before saving
combined_data = pd.concat(
    [
        ukbb_data,
        charite_data,
    ]
).reset_index(drop=True)
combined_data.sample(frac=1, random_state=13).to_csv(
    config.ukbb + "csv/charite_ukbb_combined.csv", index=False
)
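
The following sketch reproduces the concatenation and shuffle on toy data to show two side effects. All values are hypothetical placeholders. Because only the Charité frame has a `pred` column, the UKBB rows get `NaN` there after `pd.concat`. Because `random_state` is fixed, the shuffle is reproducible.

```python
import pandas as pd

# Hypothetical stand-ins for ukbb_data and charite_data.
ukbb_toy = pd.DataFrame(
    {"seq": ["W", "F"], "label": ["u0.nii", "u1.nii"], "image": ["ui0.nii", "ui1.nii"]}
)
charite_toy = pd.DataFrame(
    {
        "seq": ["T1", "T2"],
        "label": ["c0.nii", "c1.nii"],
        "image": ["ci0.nii", "ci1.nii"],
        "pred": ["c0.nii", "c1.nii"],
    }
)

combined_toy = pd.concat([ukbb_toy, charite_toy]).reset_index(drop=True)

# Shuffling twice with the same seed yields the same row order.
shuffled_a = combined_toy.sample(frac=1, random_state=13)
shuffled_b = combined_toy.sample(frac=1, random_state=13)

print(combined_toy["pred"].isna().sum(), "rows without a 'pred' entry")
print(shuffled_a.equals(shuffled_b))
```

If downstream code reads the `pred` column, it has to handle the missing values for UKBB rows. Alternatively, the column could be dropped from the Charité frame before concatenation.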