# Shanoir to Fed-BioMed converter - POC 2


Goal of the PoC: **provide a way to convert Shanoir dataset folders into Fed-BioMed medical folder dataset**
https://notes.inria.fr/HsVQiufgRIGNTZs2umh2DA#

**Description of the Shanoir dataset**: in this updated version, we handle a new type of dataset, containing the following specificities:



### Structure of the Shanoir Dataset

**1. Basic Shanoir folder, containing**:

 - **1 patient (record done on 2024-05-03)**
 - **1 series**
 - **several acquisitions constituting a 3D image**

   
```
Fed-BioMed_Example_ShUp_One_DICOM_Study_Phantom/
└── 1319E9-238B-11_2024_05_03_13_19_38_651
    ├── 1.4.9.12.34.1.8527.4108713574735248556281156520855496517752
    │   ├── 1.4.9.12.34.1.8527.1008064111753223271000195301476908655974.dcm
    │   ├── 1.4.9.12.34.1.8527.1024888726043321613379622790008460371276.dcm
    ...
    ├── import-job.json
    ├── nominative-data-job.xml
    └── upload-job.xml

```

**2. A more complex dataset, containing**:
* **2 patients**
* **1 modality each**
* **different series depending on the imaging modality used**

```
workFolder
├── 2032B8-7289-11_2024_08_20_15_19_40_983
│   ├── 1.4.9.12.34.1.8527.9190420633949044258273493601325945099590
│   │   ├── 1.4.9.12.34.1.8527.1033245378080688965564802929478239431354.dcm
│   │   ├── 1.4.9.12.34.1.8527.1036405439974105761803058139139517309811.dcm
...
│   ├── import-job.json
│   ├── nominative-data-job.xml
│   └── upload-job.xml
├── 8A-118B5-753B5_2024_08_20_15_11_36_849
│   ├── 1.4.9.12.34.1.8527.2113412453823604682567869637250601969230
│   │   ├── 1.4.9.12.34.1.8527.1017463926793201091226867287913393935142.dcm
...
│   │   └── 1.4.9.12.34.1.8527.9339932517692019105020851679467540742415.dcm
│   ├── 1.4.9.12.34.1.8527.3009454799332674087203494258205245364651
│   │   ├── 1.4.9.12.34.1.8527.1000246565108798698400749889204485890049.dcm
...
│   │   └── 1.4.9.12.34.1.8527.9325608176760251441966417937124838059276.dcm
│   ├── 1.4.9.12.34.1.8527.3131876202492289682835393691074293979434
│   │   ├── 1.4.9.12.34.1.8527.1109837059386394718736173763001851315204.dcm
...
│   │   └── 1.4.9.12.34.1.8527.9220051936018747611752876514965567446653.dcm
│   ├── 1.4.9.12.34.1.8527.4164112393443258790186552529327488575550
│   │   ├── 1.4.9.12.34.1.8527.1137130666940367323865649738623725229183.dcm
...
│   │   └── 1.4.9.12.34.1.8527.9308443060601907228417563222217683184040.dcm
│   ├── 1.4.9.12.34.1.8527.5146138466355875700234504467782152743194
│   │   ├── 1.4.9.12.34.1.8527.1239448152484875728572659021360403579923.dcm
...
│   │   └── 1.4.9.12.34.1.8527.5308747961323743069701581462251244703029.dcm
│   ├── import-job.json
│   ├── nominative-data-job.xml
│   └── upload-job.xml
└── tmp

```
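The two layouts above share one pattern: each upload folder holds an `import-job.json` plus one sub-folder of `.dcm` files per series. A minimal sketch of an inventory over such a tree follows; the function name `inventory_shanoir_folder` is hypothetical, and it assumes that entries without an `import-job.json` (such as `tmp`) can be ignored.

```python
from pathlib import Path


def inventory_shanoir_folder(root):
    """Map each upload folder to its series folders and their .dcm counts."""
    inventory = {}
    for upload in sorted(Path(root).iterdir()):
        # Skip files and stray folders (e.g. `tmp`) that carry no import-job.json
        if not upload.is_dir() or not (upload / "import-job.json").is_file():
            continue
        inventory[upload.name] = {
            series.name: len(list(series.glob("*.dcm")))
            for series in sorted(upload.iterdir())
            if series.is_dir()
        }
    return inventory
```

Calling `inventory_shanoir_folder('data/shanoir/workFolder')` would return one entry per upload folder, which gives a quick way to compare the files on disk with what `import-job.json` declares.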

**General Assumptions**:

1. a patient visit == a patient study
2. a single visit per patient (multiple visits per patient over time are not supported)

**Difficulties**:
1. no notion of `series` in Fed-BioMed
2. no notion of `modalities` in Shanoir Dataset
3. images are difficult to convert from DICOM (Shanoir dataset) to NIfTI (Fed-BioMed) -> how is it done in Shanoir?

In [None]:
import pydicom as dicom
import matplotlib.pylab as plt
import json
import pandas as pd
import numpy as np
import os
import re
import nibabel
import shutil

from datetime import datetime, timedelta
# you will also need dcm2niix
#!conda install -c conda-forge dcm2niix 

## 1. Parse Shanoir Folder and get patient information

In this section, we try to infer the patient IDs and their modalities from the folder names

Shanoir dataset specificities:
- one patient can have several series
- 2 folders can correspond to the same patient (with different modalities)
  

**NOTA**: the date in the folder name does not match the one in `seriesDate`
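As an illustration of the folder naming, here is a minimal sketch that splits a folder name into the patient ID and the timestamp it embeds. The helper `split_folder_name` is hypothetical, and the pattern assumes that every upload folder ends with `_YYYY_MM_DD_HH_MM_SS_` followed by a numeric suffix, as in the examples above.

```python
import re
from datetime import datetime

# Assumed layout: <patient id>_<YYYY_MM_DD_HH_MM_SS>_<numeric suffix>
FOLDER_PATTERN = re.compile(
    r"^(?P<patient_id>.+?)_(?P<stamp>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})_\d+$"
)


def split_folder_name(name):
    """Return (patient_id, folder_datetime), or None if the name does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%Y_%m_%d_%H_%M_%S")
    return match.group("patient_id"), stamp
```

The returned datetime can then be compared with the `seriesDate` values read from `import-job.json` to quantify the mismatch mentioned in the note.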

In [None]:
shanoir_datasets = {'1': {'name': 'Fed-BioMed_Example_ShUp_One_DICOM_Study_Phantom', 'version': 'v1'},
                    '2': {'name': 'Fed-BioMed_Example_ShUp_2', 'version': 'v1'},
                    '3': {'name': 'workFolder', 'version': 'v2'},
                    '4': {'name': 'workFolder_2', 'version': 'v2'},
                    '5': {'name': 'workFolder_3', 'version': 'v2'}}



In [None]:
# constant
SHANOIR_DATASET_NUMBER = '5'

REMOVE_CORRECTIONS = True
TIMEDELTA = timedelta(days=32)

In [None]:
# field mapper: handles the different ways of parsing `import-job.json`, depending on the Shanoir dataset version


field_mapper = {'v1': {
    'patient_id': lambda demogr: demogr['patients'][0]['patientID'],
    'patient_data' : lambda demogr: (demogr['patients'][0]['studies'][0]['series'][i] for i in range(len(demogr['patients'][0]['studies'][0]['series'])))},
                'v2': {
                    'patient_id': lambda demogr: demogr['subject']['identifier'],
                    'patient_data': lambda demogr: (demogr['selectedSeries'][i] for i in range(len(demogr['selectedSeries'])))
                }
               }

# version selected 

version = shanoir_datasets[SHANOIR_DATASET_NUMBER]['version']

# global variables
patient_info_func = field_mapper[version]['patient_id']
patient_data_func = field_mapper[version]['patient_data']

In [None]:
from typing import Iterable


def list_to_datetime_converter(date: list):
    return datetime(date[0], date[1], date[2])

def avg_dates(dates: list):
    """Returns the average of a list of datetimes, as a string"""
    reference_date = datetime(1900, 1, 1)
    return str(reference_date + sum([date - reference_date for date in dates], timedelta()) / len(dates))
    
def get_patient_id_from_file_name(name: str):
    """Extracts patient_id and date"""
    match = re.search(r'_\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_', name)
    if match is None:
        print("discarding ", name)
        return None
    patient_id = name[:int(match.span()[0])]
    #match_date = re.search(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}', match.group())
    #date = datetime.strptime(match_date.group(), '%Y_%m_%d_%H_%M_%S')
    return patient_id

def refine_shanoir_dataset_patients(folder_path: str, time_delta) -> dict:
    """Detects patients and groups their modality folders.

    Folders are considered to belong to the same patient visit if their series dates
    are close (less than `time_delta` apart).
    Relies on `parse_shanoir_json`, defined in section 2: that cell must be run first.
    """
    n_visit = 0
    refined_patients = {f't_{n_visit}': {}}
    for detected_patient in os.listdir(folder_path):
        patient_file_id = get_patient_id_from_file_name(detected_patient)
        if patient_file_id is None:
            continue
        patient_json_id, series = parse_shanoir_json(os.path.join(folder_path, detected_patient, 'import-job.json'))
        dates = []
        for serie_entry in series:
            date = serie_entry['seriesDate']
            dates.append(list_to_datetime_converter(date))
            
        if patient_file_id not in refined_patients[f't_{n_visit}']:
            refined_patients[f't_{n_visit}'][patient_file_id] = {'date': dates, 'modalities': [detected_patient]}
        else:
            check1 = min(dates) - max(refined_patients[f't_{n_visit}'][patient_file_id]['date']) <= time_delta and min(dates) - max(refined_patients[f't_{n_visit}'][patient_file_id]['date']) >= timedelta(0) 
            check2 = min(refined_patients[f't_{n_visit}'][patient_file_id]['date']) - max(dates) <= time_delta and  min(refined_patients[f't_{n_visit}'][patient_file_id]['date']) - max(dates) >= timedelta(0)
            if check1 or check2:
                print("same patient, different modality detected")
    
                refined_patients[f't_{n_visit}'][patient_file_id]['modalities'].append(detected_patient)
                refined_patients[f't_{n_visit}'][patient_file_id]['date'].extend(dates)
            else:
                n_visit += 1
                refined_patients[f't_{n_visit}'] = {patient_file_id: {'date': dates, 'modalities': [detected_patient]}}
                print("same patient, different visit detected")
                print("WARNING: several visits for same patient for a given modality is not yet supported by the poc and by Fed-BioMed")
    return refined_patients

In [None]:
main_folder_path = os.path.join(os.getcwd(), 'data', 'shanoir', shanoir_datasets[SHANOIR_DATASET_NUMBER]['name'])


refined_patients = refine_shanoir_dataset_patients(main_folder_path, TIMEDELTA)

In [None]:
refined_patients
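The grouping rule used in `refine_shanoir_dataset_patients` can be isolated as a small predicate. The sketch below reproduces the two checks (`check1` and `check2`) on plain lists of datetimes; the function name `belongs_to_same_visit` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def belongs_to_same_visit(known_dates, new_dates, max_gap):
    """True if one group of dates starts at most `max_gap` after the other ends."""
    zero = timedelta(0)
    gap_after = min(new_dates) - max(known_dates)   # new folder comes later
    gap_before = min(known_dates) - max(new_dates)  # new folder comes earlier
    return zero <= gap_after <= max_gap or zero <= gap_before <= max_gap


mr_dates = [datetime(2024, 8, 20)]
ct_dates = [datetime(2024, 9, 10)]
late_dates = [datetime(2025, 1, 15)]

print(belongs_to_same_visit(mr_dates, ct_dates, timedelta(days=32)))    # gap of 21 days
print(belongs_to_same_visit(mr_dates, late_dates, timedelta(days=32)))  # gap of 148 days
```

One consequence of writing the rule this way: when the two date ranges strictly overlap, both gaps are negative and the folders are treated as different visits, which may or may not be the intended behaviour.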

### Understanding the `import-job.json` structure

`import-job.json` is structured as follows:
- the `patients` entry holds all the patients considered in the dataset, and their `studies` (*first dataset*)
- the `studies` entry holds all studies for a given patient. A study has one or several `series`
- the `series` entry holds one or several patient acquisitions (image data) for a given study. Image data can have several `instances`, which correspond to all the files of a given patient acquisition. `series` can have different modalities (e.g. CT, XR, ...)
- the `subject` entry holds the patient details (*second dataset*)
- the `selectedSeries` entry holds the patient data (its `studies`) (*second dataset*)
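To make the two layouts concrete, here is a minimal sketch with two hand-written payloads that contain only the keys used in this notebook. All values (IDs, UIDs, dates) are made up for illustration, and `read_job` is a hypothetical helper that detects the layout from the keys instead of a version string.

```python
import json

# First layout: patients -> studies -> series (values are illustrative)
v1_job = json.loads("""
{"patients": [{"patientID": "P-001",
               "studies": [{"series": [
                   {"seriesInstanceUID": "1.2.3", "modality": "MR",
                    "protocolName": "t1_se_tra", "seriesDate": [2024, 5, 3]}]}]}]}
""")

# Second layout: subject + selectedSeries (values are illustrative)
v2_job = json.loads("""
{"subject": {"identifier": "P-002"},
 "selectedSeries": [
     {"seriesInstanceUID": "4.5.6", "modality": "CT",
      "protocolName": "CRANE_NR", "seriesDate": [2024, 8, 20]}]}
""")


def read_job(job):
    """Return (patient_id, list_of_series) for either layout."""
    if "subject" in job:  # second layout
        return job["subject"]["identifier"], list(job["selectedSeries"])
    patient = job["patients"][0]  # first layout: one patient, one study assumed
    return patient["patientID"], list(patient["studies"][0]["series"])
```

Detecting the layout from the keys is only one possible design; the notebook keeps an explicit version per dataset, which makes a wrong choice fail loudly.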

## 2. Creating Fed-BioMed demographics csv file

**GENERAL ASSUMPTIONS**

We assume that there is only one patient per study and one study per `import-job.json` file

In [None]:
def create_fedbiomed_demographics(shanoir_folder_path: str, refined_patients: dict):
    # collect all modalities
    modalities_detected = set()
    csv_demographics = pd.DataFrame(columns=['patientID',  'date'])  #'protocolName'
    
    modalities_mapper = {}
    demographic_entry = 0
    
    for patient_file_id, patient_details in refined_patients.items():
        # extract information about patient / study from the `import-job.json`
        patient_modalities = patient_details['modalities']
    
        # add patient id
        
        modalities_mapper.update({patient_file_id: {}})
        for patient_modality in patient_modalities:
            
            # use the `shanoir_folder_path` argument rather than the global `main_folder_path`
            patient_id, series = parse_shanoir_json(os.path.join(shanoir_folder_path, patient_modality, 'import-job.json'))
            for j, serie_entry in enumerate(series):
                
                if  modalities_mapper[patient_id].get(serie_entry['modality']) is None:
                    modalities_mapper[patient_id][serie_entry['modality']] = []
                modalities_mapper[patient_id][serie_entry['modality']].append((serie_entry['seriesInstanceUID'], serie_entry['protocolName'], patient_modality))
                modalities_detected.add('modality_' + serie_entry['modality'] + '_' + serie_entry['protocolName'])
            
                # warning: should not iterate over series, since series could be different modalities
                #modalities_mapper[patient_id].append((serie_entry['seriesInstanceUID'],  serie_entry['protocolName']))
            
        csv_demographics.loc[demographic_entry] = [patient_id,  avg_dates(patient_details['date'])]
        demographic_entry += 1
    return csv_demographics, modalities_mapper, modalities_detected

def parse_shanoir_json(import_json_path):
    """Returns the patient id and the series found in an `import-job.json` file"""
    with open(import_json_path, 'r') as f:
        demographics = json.load(f)
    try:
        patient_id = patient_info_func(demographics)
    except Exception as e:
        raise ValueError(f"Error in `patient_info_func`. Have you used the correct version? Details: {e}") from e

    try:
        series = patient_data_func(demographics)
    except Exception as e:
        raise ValueError(f"Error in `patient_data_func`. Have you used the correct version? Details: {e}") from e
    return patient_id, series

In [None]:
demographics, modalities_mapper, modalities_detected = create_fedbiomed_demographics(main_folder_path, refined_patients['t_0'])

In [None]:
demographics, modalities_mapper

This gives us the demographics CSV file for the Fed-BioMed medical folder dataset.



The same patientID can appear with 2 modalities (this is the case when the dates differ)

In [None]:
demographics
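For readers without pandas at hand, the content of the demographics file can be sketched with the standard library only. The snippet below averages the series dates of a patient as `avg_dates` does and writes one row per patient; `average_date` and `write_participants` are hypothetical names, the dates are illustrative, and unlike `DataFrame.to_csv` with default arguments no index column is written.

```python
import csv
import io
from datetime import datetime, timedelta


def average_date(dates):
    """Average a list of datetimes by averaging their offsets to a fixed reference."""
    reference = datetime(1900, 1, 1)
    total = sum((d - reference for d in dates), timedelta())
    return reference + total / len(dates)


def write_participants(rows, handle):
    """Write (patient_id, list_of_dates) pairs as a two-column CSV."""
    writer = csv.writer(handle)
    writer.writerow(["patientID", "date"])
    for patient_id, dates in rows:
        writer.writerow([patient_id, average_date(dates).isoformat(sep=" ")])


buffer = io.StringIO()
write_participants(
    [("8A-118B5-753B5", [datetime(2024, 8, 20), datetime(2024, 8, 22)])],
    buffer,
)
print(buffer.getvalue())
```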

Display a few layers of the patient MRI

## 3. Convert DICOM images into a Fed-BioMed medical folder

### Creating folder to store data

Now we convert the Shanoir images into Fed-BioMed images.

The Fed-BioMed medical folder dataset should have the following structure:

**For Fed-BioMed_Example_ShUp_One_DICOM_Study_Phantom**
```
├── fbm_medical_folder_dataset
        ├── 1319E9-238B-11
        │   └── modality_MR_t1_se_tra
        │       ├── t1_se_tra.json
        │       └── t1_se_tra.nii
        └── participants.csv


```

with:

- `1319E9-238B-11` being the patient folder
- `modality_MR_t1_se_tra` the modality folder, named after the modality and the protocol (in the provided dataset we only have one modality: MR)
- `participants.csv` the demographics file

**For Fed-BioMed second example**

```
fbm_medical_folder_dataset_workFolder_2
├── 2032B8-7289-11
│   └── modality_CT_CRANE_NR
│       ├── CRANE_NR.json
│       └── CRANE_NR.nii
├── 8A-118B5-753B5
│   ├── modality_MR_gre_field_mapping
│   │   ├── gre_field_mapping_e2_ph.json
│   │   └── gre_field_mapping_e2_ph.nii
│   ├── modality_MR_localizer
│   │   ├── localizer.json
│   │   └── localizer.nii
│   ├── modality_MR_loca_t2_tse_SAG
│   │   ├── loca_t2_tse_SAG.json
│   │   └── loca_t2_tse_SAG.nii
│   ├── modality_MR_MPRAGE_iso
│   │   ├── MPRAGE_iso.json
│   │   └── MPRAGE_iso.nii
│   └── modality_MR_t2_flair_3d_iso_PRESAT
│       ├── t2_flair_3d_iso_PRESAT.json
│       └── t2_flair_3d_iso_PRESAT.nii
└── participants.csv


```
Here I name the new NIfTI image file without dots (`.`), using only the protocol name (due to a [limitation of Fed-BioMed - issue 1105](https://github.com/fedbiomed/fedbiomed/issues/1105))
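The naming convention shown in the trees above can be captured by two small helpers. This is only a sketch: `modality_folder_name` and `has_dot_free_stem` are hypothetical names, and the second one assumes uncompressed `.nii` files (a `.nii.gz` name would be reported as containing a dot).

```python
import os


def modality_folder_name(modality, protocol):
    """Build the folder name used for one modality/protocol pair."""
    return f"modality_{modality}_{protocol}"


def has_dot_free_stem(filename):
    """True when the only dot in the file name is the one before the extension."""
    stem, _extension = os.path.splitext(filename)
    return "." not in stem


print(modality_folder_name("MR", "t1_se_tra"))
print(has_dot_free_stem("CRANE_NR.nii"), has_dot_free_stem("1.4.9.12.nii"))
```

Running `has_dot_free_stem` over the generated files before loading the dataset is a cheap way to catch names that would hit the limitation referenced above.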

In [None]:


def create_fedbiomed_medical_dataset(medical_folder_dataset_path, demographics, modalities_mapper, remove_corrections: bool = True):
    # create demographics file
    demographics.to_csv(os.path.join(medical_folder_dataset_path, 'participants.csv'))

    for patient_id in modalities_mapper:
        patient_folder = os.path.join(medical_folder_dataset_path, patient_id)
        os.makedirs(patient_folder, exist_ok=True)
        
        for modality in modalities_mapper[patient_id]:
            for i in range(len(modalities_mapper[patient_id][modality])):
                serie, protocol, detected_patient = modalities_mapper[patient_id][modality][i]
            
                modality_folder = os.path.join(medical_folder_dataset_path, patient_id, f'modality_{modality}_{protocol}')
                os.makedirs(modality_folder, exist_ok=True)
                print(f"[LOG] - parsing {os.path.join(main_folder_path, detected_patient, serie)}")

                os.environ['OUTPUT_FOLDER'] =  modality_folder
                os.environ['PATIENT_FOLDER'] = os.path.join(main_folder_path, detected_patient, serie)
            
                !dcm2niix --terse -m y  -f %p -o $OUTPUT_FOLDER $PATIENT_FOLDER
                !echo $?
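                # --terse omits filename post-fixes, -m y merges the 2D slices of a series,
                # -f %p names the output after the protocol, -o sets the output folder
                # (the -y argument, not used here, is for disabling the flipping)
                # check if the DICOM to NIfTI converter has created several images, and remove the inappropriate one(s)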
    
                if remove_corrections and len(os.listdir(modality_folder)) > 2:
                    for file in os.listdir(modality_folder):
                        if file.endswith('Tilt_1.nii'):
                            # remove gantry tilt file generated
                            os.remove(os.path.join(modality_folder, file))
                            print(f"[LOG] file removed: {file}")
                        if file.endswith('Eq_1.nii'):
                            # remove file got through equalization (if any)
                            os.remove(os.path.join(modality_folder, file))
                            print(f"[LOG] file removed: {file}")
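The `!dcm2niix` line relies on IPython shell magic, so the function above only runs inside a notebook. A minimal sketch of the same call through `subprocess` follows, with the same flags as above; it assumes `dcm2niix` is available on the `PATH`, and the helper names `build_dcm2niix_command` and `convert_series` are hypothetical.

```python
import shutil
import subprocess


def build_dcm2niix_command(series_folder, output_folder):
    """Same flags as the notebook cell: --terse, -m y, -f %p, -o <output>."""
    return ["dcm2niix", "--terse", "-m", "y", "-f", "%p",
            "-o", str(output_folder), str(series_folder)]


def convert_series(series_folder, output_folder):
    """Run dcm2niix on one series folder and return (exit code, captured stdout)."""
    if shutil.which("dcm2niix") is None:
        raise RuntimeError("dcm2niix was not found on PATH")
    result = subprocess.run(
        build_dcm2niix_command(series_folder, output_folder),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Passing the arguments as a list avoids going through environment variables and keeps paths containing spaces intact.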



In [None]:
# remove existing Fed-BioMed data folder 
medical_folder_dataset_path = os.path.join('data', 'shanoir', f'fbm_medical_folder_dataset_{os.path.basename(main_folder_path)}')

if os.path.exists(medical_folder_dataset_path):
    shutil.rmtree(medical_folder_dataset_path)

# create new folder for fedbiomed's medical folder dataset
os.makedirs(medical_folder_dataset_path,)


create_fedbiomed_medical_dataset(medical_folder_dataset_path, demographics, modalities_mapper, remove_corrections = REMOVE_CORRECTIONS)

Now we can load the newly created dataset into Fed-BioMed!

## 4. Load the converted dataset into Fed-BioMed

In [None]:
#os.environ["FEDBIOMED_DIR"] = '../github/fedbiomed'

os.environ["FEDBIOMED_DIR"] = '..'
! $FEDBIOMED_DIR/scripts/fedbiomed_run node gui --data-folder $PWD start

**Warning**: the Shanoir dataset `workFolder` has an incorrect structure: it cannot be loaded in Fed-BioMed because the two patients do not have the same modalities

For this dataset, we are going to complete each patient's data by copying data when a modality is unavailable
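Before copying anything, it helps to know which modality folders are missing for which patient. The sketch below scans a converted dataset and reports, per patient, the modality folders that exist for another patient but not for this one; `missing_modalities` is a hypothetical helper, and it assumes the `patient/modality_*` layout shown in section 3.

```python
from pathlib import Path


def missing_modalities(dataset_root):
    """Map each patient folder to the modality folders it lacks."""
    root = Path(dataset_root)
    per_patient = {
        patient.name: {m.name for m in patient.iterdir() if m.is_dir()}
        for patient in root.iterdir()
        if patient.is_dir()
    }
    # Union of every modality folder seen for any patient
    all_modalities = set().union(*per_patient.values())
    return {
        patient: sorted(all_modalities - found)
        for patient, found in per_patient.items()
        if all_modalities - found
    }
```

An empty result means every patient has the same set of modality folders.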

## 5. Open Questions





## DICOM to NIfTI specificities:

- differences in the origins
- gantry tilt
- inter-slice distance
- localizer edge case (DICOM specificities)


### Differences in the origin

**NIfTI**

Uses the Talairach-Tournoux coordinates
- X: Increasing value toward the Right
- Y: Increasing value toward the Anterior
- Z: Increasing value toward the Superior

**DICOM**

- X: Increasing value toward the Left
- Y: Increasing value toward the Posterior
- Z: Increasing value toward the Superior


Hence the X and Y axes are flipped between the two formats, which amounts to a 180-degree rotation about the Z axis
<img src="./imgs/300px-Dcm2nii_Mni_v_dicom.jpg" alt="img1" width="500"/>
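A short sketch of what the two conventions imply for a point given in patient coordinates: going from the DICOM axes to the NIfTI axes negates X and Y and leaves Z unchanged, which is the same as rotating by 180 degrees about the Z axis. The function names are hypothetical, and this operates on world coordinates only, not on the voxel arrays.

```python
import math


def lps_to_ras(point):
    """Convert (x, y, z) from the DICOM axes to the NIfTI axes."""
    x, y, z = point
    return (-x, -y, z)


def rotate_about_z(point, degrees):
    """Rotate a point about the Z axis by the given angle."""
    x, y, z = point
    angle = math.radians(degrees)
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z)


point = (10.0, -20.0, 5.0)
print(lps_to_ras(point))
print(rotate_about_z(point, 180))  # same point, up to floating-point error
```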


### Gantry tilt
The gantry tilt is the angle formed between the X-ray tube plane and the vertical plane. It usually ranges between -25 and 25 degrees.

It is useful for a better visualization of some features of the human body.

DICOM has a specific entry for gantry tilt, `(0018,1120)`, whereas NIfTI does not handle such a case
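To get a feel for the size of the effect, the geometry can be sketched as follows: if tilted slices are stacked as if they were not tilted, slice `k` ends up shifted in-plane by `k * spacing * tan(tilt)`, where `spacing` is the distance between slices measured along the slice normal. This is a geometric illustration under that assumption, not a description of what a given converter does; the function name is hypothetical and the sign of the shift depends on the tilt convention.

```python
import math


def gantry_tilt_shift_mm(slice_index, slice_spacing_mm, tilt_degrees):
    """In-plane shift (mm) of a slice when a tilted stack is treated as untilted."""
    return slice_index * slice_spacing_mm * math.tan(math.radians(tilt_degrees))


# Illustrative values: 1 mm spacing, 100th slice, tilt at the upper end of the usual range
print(round(gantry_tilt_shift_mm(100, 1.0, 25.0), 1), "mm")
```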


On the image, left is the image with gantry tilt, and right without
![img2](./imgs/gantry_tilt.webp)



Use of the CT scan with a gantry tilt

![img3](./imgs/gantry_tilt_scan.png)



**Action**: I would suggest keeping the image without gantry tilt correction, so that the size of the NIfTI images is consistent with the other DICOM images (otherwise the size will differ from that of the images obtained through other modalities, making it not possible to use it for )

In [None]:
dicoms_gt_folder = 'data/shanoir/workFolder/2032B8-7289-11_2024_08_20_15_19_40_983/1.4.9.12.34.1.8527.9190420633949044258273493601325945099590'
dicoms_gt_json = 'data/shanoir/workFolder/2032B8-7289-11_2024_08_20_15_19_40_983/import-job.json'
with open(dicoms_gt_json, 'r') as f:
    dicoms_gt_details = json.load(f)


dicoms_gt = [f for f in os.listdir(dicoms_gt_folder) if f.endswith('.dcm')]
niftii_no_gt = 'fbm_medical_folder_dataset_workFolder/2032B8-7289-11/modality_CT_CRANE_NR/CRANE_NR.nii'
niftii_gt = 'fbm_medical_folder_dataset_workFolder/2032B8-7289-11/modality_CT_CRANE_NR/CRANE_NR_Tilt_1.nii'

niftii_no_gt = nibabel.load(niftii_no_gt)
niftii_gt = nibabel.load(niftii_gt)

f, axarr = plt.subplots(3, 3)
d1 = dicoms_gt_details['selectedSeries'][0]['instances'][0]['sopInstanceUID']
d2 = dicoms_gt_details['selectedSeries'][0]['instances'][5]['sopInstanceUID']
d3 = dicoms_gt_details['selectedSeries'][0]['instances'][-1]['sopInstanceUID']

f.suptitle("Gantry tilt handling in dicom and niftii images", fontweight='semibold')
axarr[0,0].imshow(dicom.dcmread(os.path.join(dicoms_gt_folder, d1 + '.dcm')).pixel_array)
axarr[0,1].imshow(dicom.dcmread(os.path.join(dicoms_gt_folder, d2 + '.dcm')).pixel_array)
axarr[0,2].imshow(dicom.dcmread(os.path.join(dicoms_gt_folder, d3 + '.dcm')).pixel_array)


axarr[1, 0].imshow(niftii_no_gt.dataobj[:,:,0])
axarr[1,1].imshow(niftii_no_gt.dataobj[:,:,5])
axarr[1,2].imshow(niftii_no_gt.dataobj[:,:,-1])

axarr[2, 0].imshow(niftii_gt.dataobj[:,:,0])
axarr[2,1].imshow(niftii_gt.dataobj[:,:,5])
axarr[2,2].imshow(niftii_gt.dataobj[:,:,-1])

In [None]:
# retrieving gantry tilt (at [0x0018, 0x1120])
for d in (d1, d2, d3):
    print("gantry tilt value: ", dicom.dcmread(os.path.join(dicoms_gt_folder, d + '.dcm'))[0x0018, 0x1120].value)

In [None]:

max_shape = max(niftii_no_gt.dataobj.shape, niftii_gt.dataobj.shape)
min_shape = min(niftii_no_gt.dataobj.shape, niftii_gt.dataobj.shape)

In [None]:
dim = 0
z = np.zeros((max_shape[0], max_shape[1], 3))

v = np.array(niftii_no_gt.dataobj)
v.resize(max_shape)
z[:,:, 0] = v[:, :, dim]  #red
z[:, :, 1] = niftii_gt.dataobj[:, :, dim]  #green


plt.imshow(z)
plt.title("niftii with no gantry tilt correction (red) vs with gantry tilt correction (green)")

### Inter-slice distance

While variations in the distance between slices are accepted in DICOM, this is not the case in NIfTI

Field "slice location" is not compatible with 

In this case, the converter equalizes the image: it writes an additional `_Eq` volume resampled to a constant slice spacing (interpolation)
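Whether a series is affected can be checked from the slice locations alone. The sketch below sorts the locations, computes the gaps between consecutive slices and reports whether they are all equal within a tolerance; the function names and the `0.01` mm tolerance are assumptions chosen for illustration.

```python
def slice_gaps(locations):
    """Distances between consecutive slices, after sorting by location."""
    ordered = sorted(locations)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def is_equidistant(locations, tolerance_mm=0.01):
    """True if every gap equals the first one within the tolerance."""
    gaps = slice_gaps(locations)
    return all(abs(gap - gaps[0]) <= tolerance_mm for gap in gaps)


print(is_equidistant([0.0, 5.0, 10.0, 15.0]))
print(is_equidistant([0.0, 5.0, 10.0, 30.0]))
```

The locations would typically be the values of the slice location field read from each DICOM file of the series.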

In [None]:
import copy
dicoms_di_folder = 'workFolder/8A-118B5-753B5_2024_08_20_15_11_36_849/1.4.9.12.34.1.8527.5146138466355875700234504467782152743194'
dicoms_di_json = 'workFolder/8A-118B5-753B5_2024_08_20_15_11_36_849/import-job.json'

# extracting the import-job.json
with open(dicoms_di_json, 'r') as f:
    dicoms_di_details = json.load(f)


dicoms_di = [f for f in os.listdir(dicoms_di_folder) if f.endswith('.dcm')]
niftii_no_eq = 'fbm_medical_folder_dataset_workFolder/8A-118B5-753B5/modality_MR_localizer/localizer.nii'
niftii_eq = 'fbm_medical_folder_dataset_workFolder/8A-118B5-753B5/modality_MR_localizer/localizer_Eq_1.nii'

niftii_no_eq = nibabel.load(niftii_no_eq)
niftii_eq = nibabel.load(niftii_eq)

f, axarr = plt.subplots(3, 5)

for i in range(5):
    d1 = dicoms_di_details['selectedSeries'][0]['instances'][i]['sopInstanceUID']
    
    
    f.suptitle("interslice distance dicom and niftii images (equalizer applied or not)", fontweight='semibold')
    axarr[0,i].imshow(dicom.dcmread(os.path.join(dicoms_di_folder, d1 + '.dcm')).pixel_array)
    
    
    #axarr[0, 1].imshow(ds2.pixel_array)
    #axarr[0, 2].imshow(ds3.pixel_array)
    axarr[1, i].imshow(niftii_no_eq.dataobj[:,:,i])

    
    axarr[2, i].imshow(niftii_eq.dataobj[:,:,i])
    if i == 1:
        copied_array = copy.deepcopy(niftii_eq.dataobj[:,:,1])
    elif i > 1:
        copied_array += niftii_eq.dataobj[:,:,i]

Investigating: getting the slice location of each DICOM image

In [None]:
for i in range(5):
    d1 = dicoms_di_details['selectedSeries'][0]['instances'][i]['sopInstanceUID']
    ds = dicom.dcmread(os.path.join(dicoms_di_folder, d1 + '.dcm'))  # read each file only once
    print("image number", ds[0x0020, 0x0013].value)
    print("slice location", ds[0x0020, 0x1041].value)

In [None]:
#dicoms_gt_folder = 'workFolder/2032B8-7289-11_2024_08_20_15_19_40_983/1.4.9.12.34.1.8527.9190420633949044258273493601325945099590'
dicoms_gt_folder = 'workFolder/8A-118B5-753B5_2024_08_20_15_11_36_849/1.4.9.12.34.1.8527.3131876202492289682835393691074293979434'
dicoms_gt_json = 'workFolder/8A-118B5-753B5_2024_08_20_15_11_36_849/import-job.json'
with open(dicoms_gt_json, 'r') as f:
    dicoms_gt_details = json.load(f)

dicoms_gt = [f for f in os.listdir(dicoms_gt_folder) if f.endswith('.dcm')]
# iterate over the instances listed in the json (the original loop enumerated the characters of the folder path)
for instance in dicoms_gt_details['selectedSeries'][1]['instances']:
    ds = dicom.dcmread(os.path.join(dicoms_gt_folder, instance['sopInstanceUID'] + '.dcm'))
    print("image number", ds[0x0020, 0x0013].value)
    print("slice location", ds[0x0020, 0x1041].value)

In [None]:
dicoms_gt[-1]



Other issues:

Some of the images in the directory are not listed in the `import-job.json` file:

**should we consider them anyway?**
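To quantify the mismatch, the file names found on disk can be compared with the `sopInstanceUID` values listed for the series. The sketch below is a minimal version of that comparison; `unlisted_dicom_files` is a hypothetical helper, and it assumes that each file is named `<sopInstanceUID>.dcm`, as in the cells above.

```python
from pathlib import Path


def unlisted_dicom_files(series_folder, instances):
    """Return (files on disk but not in the json, json entries with no file)."""
    listed = {instance["sopInstanceUID"] for instance in instances}
    on_disk = {path.stem for path in Path(series_folder).glob("*.dcm")}
    return sorted(on_disk - listed), sorted(listed - on_disk)
```

With the variables used earlier, `unlisted_dicom_files(dicoms_gt_folder, dicoms_gt_details['selectedSeries'][1]['instances'])` would list the files concerned by the question above.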

### Localizer edge case

Localizer (scout) images are the first scans acquired for any scanning session, and are used to plan the location for subsequent images. Localizers are not used in subsequent analyses (due to resolution, artefacts, etc). Localizers are often acquired with three orthogonal image planes (sagittal, coronal and axial). The NIfTI format requires that all slices in a volume are co-planar, so these localizers will generate naming conflicts. The solution is to use '-i y' which will ignore (not convert) localizers (it will also ignore derived images and 2D slices). This command helps exclude images that are not required for subsequent analyses.

This could explain why localizer data are hard to process.

**Should we handle the case where images are localizers?**