- `conda activate mri`
  - (created in `0_setup.ipynb`)

---

- `jupyter lab` => open this file

---

- Selected Jupyter default kernel (`ipykernel`)

---

---

In [None]:
from typing import Callable

import pandas as pd

def get_stats(df_runs: pd.DataFrame,
              df_summary: pd.DataFrame,
              summary_to_wide: Callable[[pd.DataFrame], pd.DataFrame]) -> pd.DataFrame:
    """
    Compute median coverage + summary stats for one facet (either class_conditional=True or False).

    Parameters
    ----------
    df_runs : pd.DataFrame
        Must contain columns ['class_conditional','cal_test','variant_test_data','coverage', ...],
        plus 'class' when class_conditional is True (it is then used as a grouping column).
    df_summary : pd.DataFrame
        Must contain a matching 'class_conditional' column and be suitable for summary_to_wide().
    summary_to_wide : Callable
        Function that pivots df_summary into "wide" form with columns ['binom_p','fisher_p',...];
        its output must include every grouping column.

    Returns
    -------
    pd.DataFrame
        Columns ['class_conditional','cal_test','variant_test_data',
                 'median_cov','binom_p','fisher_p','prop_sig',
                 'q25_p','median_p','q75_p'].
    """
    # --- 1) check that each DF is all-True or all-False, and they agree ---
    runs_flags    = set(df_runs['class_conditional'])
    summary_flags = set(df_summary['class_conditional'])

    valid = ({True}, {False})
    if runs_flags not in valid or summary_flags not in valid:
        raise ValueError(
            "Each input must have class_conditional uniformly True or uniformly False"
        )
    if runs_flags != summary_flags:
        raise ValueError(
            "df_runs and df_summary must agree on the value of class_conditional"
        )

    # pull out the boolean
    is_conditional = runs_flags.pop()

    # --- 2) pick grouping columns based on that flag ---
    grp_cols = ['class_conditional', 'cal_test', 'variant_test_data']
    if is_conditional:
        grp_cols.append('class')

    # --- 3a) median coverage from raw runs ---
    summary_runs = (
        df_runs
        .groupby(grp_cols)['coverage']
        .median()
        .reset_index()
        .rename(columns={'coverage': 'median_cov'})
    )

    # --- 3b) pivot & median on summary stats ---
    wide = summary_to_wide(df_summary)
    summary_stats = (
        wide
        .groupby(grp_cols)
        .median()
        .reset_index()
    )

    # --- 3c) merge & pick final column order ---
    # In the class-conditional facet, keep 'class' so per-class rows stay distinguishable.
    cols = grp_cols + [
        'median_cov', 'binom_p', 'fisher_p', 'prop_sig',
        'q25_p', 'median_p', 'q75_p'
    ]
    result = (
        summary_runs
        .merge(summary_stats, on=grp_cols)
        [cols]
    )

    return result
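
`get_stats` leaves the shape of `summary_to_wide` to the caller. As a rough sketch only, assuming `df_summary` is stored in long format with hypothetical `stat` and `value` columns (neither name comes from this notebook), such a pivot might look like the following. The real helper may differ.

```python
import pandas as pd

# Identifier columns shared with get_stats' grouping columns.
ID_COLS = ["class_conditional", "cal_test", "variant_test_data"]

def summary_to_wide_sketch(df_summary: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format rows into one column per statistic (hypothetical layout)."""
    wide = df_summary.pivot_table(
        index=ID_COLS, columns="stat", values="value", aggfunc="first"
    )
    wide.columns.name = None  # drop the 'stat' label left behind by the pivot
    return wide.reset_index()

# Made-up toy input: two facets, two statistics each.
df_long = pd.DataFrame({
    "class_conditional": [False] * 4,
    "cal_test": ["a", "a", "b", "b"],
    "variant_test_data": ["x"] * 4,
    "stat": ["binom_p", "fisher_p", "binom_p", "fisher_p"],
    "value": [0.04, 0.20, 0.50, 0.70],
})
wide = summary_to_wide_sketch(df_long)
print(wide)
```

The result has one row per facet and one column per statistic, which is the form `get_stats` groups and takes medians over.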


# Scan Counts

---

In [1]:
import os
import glob
import pandas as pd

files_ms_muslim_15t = glob.glob(os.path.expanduser('~/dissertation/data/MRI/Muslim_et_al/Patient-*/*[0-9]-T2.nii'), recursive=True)         # Muslim et al.
files_ms_isbi_ph3_train = glob.glob(os.path.expanduser('~/dissertation/data/MRI/ISBI/training/training*/orig/*t2.nii.gz'), recursive=True)  # ISBI 2015
files_ms_isbi_ph3_test = glob.glob(os.path.expanduser('~/dissertation/data/MRI/ISBI/testdata_website/*/orig/*t2.nii.gz'), recursive=True)   # ISBI 2015
files_healthy_ph3 = glob.glob(os.path.expanduser('~/dissertation/data/MRI/IXI/*HH*T2.nii.gz'), recursive=True)                              # IXI
files_healthy_ph15 = glob.glob(os.path.expanduser('~/dissertation/data/MRI/IXI/*Guy*T2.nii.gz'), recursive=True)                            # IXI
files_healthy_ge15 = glob.glob(os.path.expanduser('~/dissertation/data/MRI/IXI/*IOP*T2.nii.gz'), recursive=True)                            # IXI

# MS patients
print(f"MS: Muslim et al. (Iraq) on 1.5T systems: {len(files_ms_muslim_15t)}\n")
print(f"MS: ISBI 2015 Challenge 5-patient longitudinal Philips 3T system: {len(files_ms_isbi_ph3_train) + len(files_ms_isbi_ph3_test)}\n")

# Healthy patients
print(f"Healthy: Hammersmith Hospital using Philips 3T system: {len(files_healthy_ph3)}\n")
print(f"Healthy: Guy's Hospital using Philips 1.5T system: {len(files_healthy_ph15)}\n")
print(f"Healthy: Institute of Psychiatry using GE 1.5T system: {len(files_healthy_ge15)}\n")

MS: Muslim et al. (Iraq) on 1.5T systems: 60

MS: ISBI 2015 Challenge 5-patient longitudinal Philips 3T system: 82

Healthy: Hammersmith Hospital using Philips 3T system: 185

Healthy: Guy's Hospital using Philips 1.5T system: 319

Healthy: Institute of Psychiatry using GE 1.5T system: 74
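
The per-dataset counting above can also be written as one small helper driven by a label-to-pattern mapping. This is a minimal sketch, not the notebook's own code: `count_scans` is a hypothetical name, and the demo runs on a throwaway directory with made-up, IXI-style file names. `recursive=True` is omitted because it only changes how a `**` path component is matched, and none of the patterns contain one.

```python
import glob
import os
import tempfile

def count_scans(root: str, patterns: dict) -> dict:
    """Return {label: number of files under root matching the glob pattern}."""
    return {
        label: len(glob.glob(os.path.join(root, pattern)))
        for label, pattern in patterns.items()
    }

# Demo on a temporary directory; the file names are invented for illustration.
with tempfile.TemporaryDirectory() as root:
    for name in ("IXI012-HH-1211-T2.nii.gz",
                 "IXI013-HH-1212-T2.nii.gz",
                 "IXI002-Guys-0828-T2.nii.gz",
                 "IXI035-IOP-0873-T2.nii.gz",
                 "IXI012-HH-1211-T1.nii.gz"):  # T1 file, should not be counted
        open(os.path.join(root, name), "w").close()

    counts = count_scans(root, {
        "Healthy, Philips 3T (HH)": "*HH*T2.nii.gz",
        "Healthy, Philips 1.5T (Guy's)": "*Guy*T2.nii.gz",
        "Healthy, GE 1.5T (IOP)": "*IOP*T2.nii.gz",
    })

for label, n in counts.items():
    print(f"{label}: {n}")
```

Keeping the patterns in one mapping makes it easier to add a dataset or to reuse the same labels in later tables.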



---

---

---

# Datasets

---

# The 2015 Longitudinal MS Lesion Segmentation Challenge: Data

https://smart-stats-tools.org/lesion-challenge-2015

---
The Data Set Summary Table (below) includes demographic details for the training data and both test data sets. The top line is the information of the entire data set, while subsequent lines within a section are specific to the patient diagnoses. The codes are __RR__ for relapsing remitting MS, __PP__ for primary progressive MS, and __SP__ for secondary progressive MS. N (M/F) denotes the number of patients and the male/female ratio, respectively. Timepoints is the mean (and standard deviation) of the number of time-points provided to participants. Age is the mean age (and standard deviation), in years, at baseline. Follow-up is the mean (and standard deviation), in years, of the time between follow-up scans.

Data Set Summary Table  


|Data Set|N (M/F)|Time-Points Mean (SD)|Age Mean (SD)|Follow-Up Mean (SD)|
|---|---|---|---|---|
|Training|5 (1/4)|4 (±0.55)|43.5 (±10.3)|1.0 (±0.13)|
|RR|4 (1/3)|4.4 (±0.50)|43.5 (±10.3)|1.0 (±0.14)|
|PP|1 (0/1)|4.0|57.9|1.0 (±0.04)|
|Test|14 (3/11)|4.4 (±0.63)|39.3 (±8.9)|1.0 (±0.23)|
|RR|12 (3/9)|4.4 (±0.67)|39.2 (±9.6)|1.0 (±0.25)|
|PP|1 (0/1)|4.0|39.0|1.0 (±0.04)|
|SP|1 (0/1)|4.0|41.7|1.0 (±0.05)|

Each scan was imaged and preprocessed in the same manner, with data acquired on a 3.0 Tesla MRI scanner (Philips Medical Systems, Best, The Netherlands) using the following sequences: ...; __a double spin echo (DSE) which produces__ the PD-w and __T2-w images with TR = 4177 ms, TE1 = 12.31 ms, TE2 = 80 ms, & 0.82 × 0.82 × 2.2 mm³ voxel size__; and .... The imaging protocols were approved by the local institutional review board. Each subject underwent the following preprocessing: the baseline (first time-point) MPRAGE was inhomogeneity-corrected using N4 (Tustison et al., 2010), skull-stripped (Carass et al., 2007, 2010), dura stripped (Shiee et al., 2014), followed by a second N4 inhomogeneity correction, and rigid registration to a 1 mm isotropic MNI template. We have found that running N4 a second time after skull and dura stripping is more effective (relative to a single correction) at reducing any inhomogeneity within the images. Once the baseline MPRAGE is in MNI space, it is used as a target for the remaining images. The remaining images include the baseline T2-w, PD-w, and FLAIR, as well as the scans from each of the follow-up time-points. These images are N4 corrected and then rigidly registered to the 1 mm isotropic baseline MPRAGE in MNI space. Our registration steps are inverse consistent and thus any registration-based biases are avoided (Reuter and Fischl, 2011). The skull & dura stripped mask from the baseline MPRAGE is applied to all the subsequent images, which are then N4 corrected again.

For each time-point of every subject’s scans in the Training Set and Test Set, the following data are provided: the original scan images consisting of T1-w MPRAGE, T2-w, PD-w, and FLAIR, as well as the preprocessed images (in MNI space) for each of the scan modalities. The Training Set also included manual delineations by two experts identifying and segmenting WMLs on MR images.

https://iacl.ece.jhu.edu/index.php/MSChallenge/data

---

---

# Muslim et al. - MS 1.5T Baghdad/Iraq

https://data.mendeley.com/datasets/8bctsm8jz7/1  (data)  

`~/dissertation/data/MRI/Muslim_et_al/Supplementary Table 1 for patient info .xlsx`  (demographic/metadata)  

`~/dissertation/data/MRI/Muslim_et_al/Supplementary Table 2 for  sequence parameters .xlsx`  (metadata)  

---

| Parameter                         | Description                                                                                                        |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------------|
| **Dataset Source**                | Baghdad Teaching Hospital, Medical City Complex, Iraq                                                              |
| **Patient Count**                 | 60 confirmed MS patients                                                                                           |
| **Data Type**                     | NIfTI image format, segmented lesion masks for T1, T2, FLAIR MRI sequences                                         |
| **MRI Machines**                  | 1.5T MRI from 20 centers                                                                                           |
| **Segmentation Method**           | Consensus manual lesion segmentation by radiologist and neurologist experts                                        |
| **Demographics**                  | 46 females, 14 males; Age range: 15–56 years, average age 33                                                       |
| **EDSS Score Range**              | 0 to 6 (avg 2.3), with 78% of patients scoring below 4                                                             |
| **Clinical Metadata**             | Includes EDSS, general patient data, clinical exams across neurological functions                                  |
| **MRI Acquisition Dates**         | Between 2019–2020                                                                                                  |
| **MRI Sequences**                 | T1-weighted, T2-weighted, and FLAIR                                                                                |
| **Data Accessibility**            | Available on Mendeley Data [10.17632/8bctsm8jz7.1](https://data.mendeley.com/datasets/8bctsm8jz7/1)                |

---

The patients' MRI scans were acquired on 1.5 Tesla scanners at __twenty different__ centres, with different MRI sequence parameters as listed in Supplementary Table 2.  

https://doi.org/10.1016/j.dib.2022.108139

In [2]:
apprx_healthcare_contexts = \
    pd.read_excel('~/dissertation/data/MRI/Muslim_et_al/Supplementary Table 2 for  sequence parameters .xlsx', 
                  header=1, 
                  usecols=[0, 8, 9, 10, 11, 12]).groupby(['Slice Thickness .2', 'Spacing Between Slices.2']).count()

print(f'Publication says MRI data is from 20 sites.\nUnique thickness/spacing combinations from T2 scans: {len(apprx_healthcare_contexts)}')
apprx_healthcare_contexts

Publication says MRI data is from 20 sites.
Unique thickness/spacing combinations from T2 scans: 19


| Slice Thickness .2 | Spacing Between Slices.2 | ID | Spacing Between Slices.1 | Repetition Time (TR).2 | Echo Time (TE).2 |
|---|---|---|---|---|---|
| 3.0 | 4.86 | 1 | 1 | 1 | 1 |
| 3.5 | 6.3 | 1 | 1 | 1 | 1 |
| 4.0 | 5.2 | 1 | 1 | 1 | 1 |
| 4.5 | 4.725 | 1 | 1 | 1 | 1 |
| 5.0 | 5.25 | 1 | 1 | 1 | 1 |
| 5.0 | 5.5 | 9 | 9 | 9 | 9 |
| 5.0 | 5.75 | 1 | 1 | 1 | 1 |
| 5.0 | 5.849 | 1 | 1 | 1 | 1 |
| 5.0 | 6.0 | 11 | 11 | 11 | 11 |
| 5.0 | 6.15 | 1 | 1 | 1 | 1 |


The publication reports MRI data from 20 sites, while the T2 scans show 19 unique slice thickness/spacing combinations, so these combinations only approximate the acquisition sites.
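
Since `groupby(...)` leaves out rows whose grouping key is missing by default, it can be worth counting the combinations a second way. The sketch below does this on a made-up toy frame, not the real spreadsheet; only the two column names are taken from the cell above, and it makes no claim about why the real count is 19 rather than 20.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the T2 columns of Supplementary Table 2 (values are invented).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Slice Thickness .2": [5.0, 5.0, 5.0, 3.0, np.nan],
    "Spacing Between Slices.2": [5.5, 5.5, 6.0, 4.86, 6.0],
})
keys = ["Slice Thickness .2", "Spacing Between Slices.2"]

# Scans per combination; rows with a missing key are left out by default.
per_combo = df.groupby(keys).size()

# Distinct combinations, without and with rows that have a missing key.
n_complete = len(df[keys].dropna().drop_duplicates())
n_all = len(df[keys].drop_duplicates())

print(per_combo)
print(f"complete combinations: {n_complete}, including missing keys: {n_all}")
```

If the two counts differ on the real table, at least one scan has a missing thickness or spacing value and is absent from the grouped output.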

---

---

# IXI Data - Normal Healthy

- `~/dissertation/data/MRI/IXI.xls`    
  - IXI_ID
  - SEX_ID
  - HEIGHT
  - WEIGHT
  - ETHNIC_ID
  - MARITAL_ID
  - OCCUPATION_ID
  - QUALIFICATION_ID
  - DATE_AVAILABLE
  - STUDY_DATE
  - AGE
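
To relate these demographics to the scan files, the subject ID and site can be parsed from each file name and joined on `IXI_ID`. This is a sketch under assumptions: it presumes names of the form `IXI<id>-<site>-<number>-T2.nii.gz`, `parse_ixi_name` is a hypothetical helper, and the `AGE` values in the toy table are invented.

```python
import re
import pandas as pd

# Assumed file-name layout: IXI<id>-<site>-<number>-T2.nii.gz
IXI_NAME = re.compile(r"IXI(?P<ixi_id>\d+)-(?P<site>[A-Za-z]+)-\d+-T2\.nii\.gz$")

def parse_ixi_name(path: str) -> dict:
    """Extract the numeric IXI_ID and the site code from a T2 file name."""
    match = IXI_NAME.search(path)
    if match is None:
        raise ValueError(f"Unexpected IXI file name: {path}")
    return {"IXI_ID": int(match["ixi_id"]), "site": match["site"]}

# Example paths following the assumed layout.
paths = [
    "IXI/IXI002-Guys-0828-T2.nii.gz",
    "IXI/IXI012-HH-1211-T2.nii.gz",
    "IXI/IXI035-IOP-0873-T2.nii.gz",
]
scans = pd.DataFrame([parse_ixi_name(p) for p in paths])

# Toy demographics table; the AGE values are made up for illustration.
demo = pd.DataFrame({"IXI_ID": [2, 12, 35], "AGE": [35.8, 38.8, 37.2]})

# Left join keeps every scan, even if its ID has no demographics row.
merged = scans.merge(demo, on="IXI_ID", how="left")
print(merged)
```

In the notebook, `demo` would presumably be read from `IXI.xls` instead of being typed in.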

---

Hammersmith Hospital Philips 3T Parameters
---
```
Scanner: Philips Medical Systems Intera 3T

T2 parameters:

Repetition time = 5725.79
Echo time = 100
Number of Phase Encoding Steps = 187
Echo Train Length = 16
Reconstruction Diameter = 240
Flip Angle = 90
```
https://brain-development.org/scanner-philips-medical-systems-intera-3t/

---

Guy's Hospital Philips 1.5T Parameters
---
```
Scanner: Philips Medical Systems Gyroscan Intera 1.5T

T2 parameters:

Repetition time = 8178.34
Echo time = 100
Number of Phase Encoding Steps = 187
Echo Train Length = 16
Reconstruction Diameter = 240
Flip Angle = 90
```
https://brain-development.org/scanner-philips-medical-systems-gyroscan-intera-1-5t/

---

Institute of Psychiatry using a GE 1.5T system
---

(details of the scan parameters not available at the moment)

---

---