## MRSegmentator v1.1 vs. v1.2
TL;DR: Better on NAKO data, same performance on the other datasets.

In our [preprint](https://arxiv.org/pdf/2405.06463) (version 1.1) we evaluated MRSegmentator on external data. At the time of writing, no suitable external dataset existed, so we created our own by annotating 900 scans from the NAKO dataset.
Since then, another dataset on which we can evaluate our model has been released: the TotalSegmentator MRI dataset ([zenodo](https://zenodo.org/records/11367005)).

Consequently, we moved the NAKO data to the training partition and retrained MRSegmentator (version 1.2). Below you can find the updated evaluation using our test datasets:

| Dataset | MRSegmentator v1.1 | MRSegmentator v1.2 |
| :-------- | :-------: | :-------: |
| NAKO GRE MRI | 0.88 | 0.91* |
| NAKO T2-HASTE MRI | 0.85 | 0.89* |
| AMOS MRI | 0.79 | 0.79 |
| AMOS CT | 0.84 | 0.84 |
| TotalSegmentator MRI | 0.74 | 0.74 |

*For version 1.2, we evaluated the NAKO scans in a 5-fold cross-validation setting, because these scans are now part of the training data.
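
To illustrate what the cross-validation footnote implies: every NAKO scan is scored by a model that did not see it during training. The following is a minimal sketch of such a split, not the procedure we actually used. The scan IDs, the `make_folds` helper, and the seed are hypothetical.

```python
import random


def make_folds(scan_ids, n_folds=5, seed=0):
    """Shuffle scan IDs and deal them into n_folds disjoint test sets."""
    ids = list(scan_ids)
    random.Random(seed).shuffle(ids)
    # Fold i takes every n_folds-th scan starting at position i,
    # so fold sizes differ by at most one.
    return [ids[i::n_folds] for i in range(n_folds)]


# Hypothetical identifiers standing in for the 900 annotated NAKO scans.
scan_ids = [f"nako_{i:03d}" for i in range(900)]
folds = make_folds(scan_ids)

for k, test_ids in enumerate(folds):
    held_out = set(test_ids)
    train_ids = [s for s in scan_ids if s not in held_out]
    # A model for fold k would be trained on train_ids (plus the other
    # training data) and evaluated on test_ids only.
    print(f"fold {k}: {len(train_ids)} train scans, {len(test_ids)} test scans")
```

If several scans belong to the same participant, dealing out participant IDs instead of scan IDs would keep all scans of one person within a single fold.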

Segmentation performance improved on the NAKO data (from 0.88 to 0.91 on GRE and from 0.85 to 0.89 on T2-HASTE), while performance on the other datasets remained the same. This suggests that the increase is partly caused by the bias introduced by changing the NAKO dataset from an "external" to an "internal" test/validation dataset.
Nevertheless, since version 1.2 is as good as version 1.1 on the other datasets and better on NAKO data, we believe it to be the better version overall.
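
The comparison can be made explicit by computing the per-dataset difference between the two versions. This is a small sketch using the rounded means copied from the table above. The variable names are only illustrative, and the rounded inputs limit the precision of the differences.

```python
# Mean scores copied from the table above: (v1.1, v1.2).
scores = {
    "NAKO GRE MRI": (0.88, 0.91),
    "NAKO T2-HASTE MRI": (0.85, 0.89),
    "AMOS MRI": (0.79, 0.79),
    "AMOS CT": (0.84, 0.84),
    "TotalSegmentator MRI": (0.74, 0.74),
}

# Difference v1.2 - v1.1, rounded to the precision of the table.
deltas = {name: round(new - old, 2) for name, (old, new) in scores.items()}

# Datasets on which v1.2 scores higher than v1.1.
improved = [name for name, delta in deltas.items() if delta > 0]

for name, delta in deltas.items():
    print(f"{name:<22} {delta:+.2f}")
print("improved:", improved)
```

Only the two NAKO rows show a non-zero difference, which matches the reading that the gain is tied to the NAKO data moving into the training partition.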

In [1]:
import pandas as pd

# Display floats with two decimals in the notebook output.
pd.options.display.float_format = "{:,.2f}".format

# Class-wise scores: one row per structure, one column per dataset and model version.
data = pd.read_csv("csvs/classwise_results.csv", index_col=0)

# Table label -> column prefix used in the CSV.
datasets = {
    "NAKO GRE MRI": "NAKO GRE",
    "NAKO T2-HASTE MRI": "NAKO T2",
    "AMOS MRI": "AMOS MRI",
    "AMOS CT": "AMOS CT",
    "TotalSegmentator MRI": "TotSeg MRI",
}

# Average over all classes; Series.mean() skips missing values by default.
averaged_data = pd.DataFrame(columns=["MRSegmentator v1.1", "MRSegmentator v1.2"])
for label, prefix in datasets.items():
    averaged_data.loc[label] = data[f"{prefix} v1.1"].mean(), data[f"{prefix} v1.2"].mean()
averaged_data

Unnamed: 0,MRSegmentator v1.1,MRSegmentator v1.2
NAKO GRE MRI,0.88,0.91
NAKO T2-HASTE MRI,0.85,0.89
AMOS MRI,0.79,0.79
AMOS CT,0.84,0.84
TotalSegmentator MRI,0.74,0.74


In [2]:
data

Unnamed: 0,NAKO GRE v1.1,NAKO GRE v1.2,NAKO T2 v1.1,NAKO T2 v1.2,AMOS MRI v1.1,AMOS MRI v1.2,AMOS CT v1.1,AMOS CT v1.2,TotSeg MRI v1.1,TotSeg MRI v1.2
spleen,0.91,0.93,0.91,0.92,0.95,0.94,0.95,0.95,0.88,0.89
right_kidney,0.95,0.97,0.95,0.97,0.95,0.95,0.94,0.94,0.88,0.89
left_kidney,0.93,0.95,0.95,0.96,0.95,0.94,0.94,0.94,0.77,0.77
gallbladder,0.74,0.74,0.75,0.83,0.74,0.73,0.8,0.8,0.83,0.84
liver,0.96,0.97,0.95,0.96,0.96,0.96,0.96,0.96,0.82,0.82
stomach,0.91,0.93,0.84,0.88,0.87,0.87,0.89,0.89,0.86,0.85
pancreas,0.82,0.85,0.8,0.84,0.81,0.79,0.79,0.79,0.45,0.44
right_adrenal_gland,0.69,0.74,0.72,0.75,0.56,0.54,0.7,0.69,0.56,0.59
left_adrenal_gland,0.69,0.74,0.69,0.69,0.56,0.53,0.69,0.69,0.66,0.65
left_lung,0.96,0.97,0.96,0.96,,,,,0.92,0.92
