You can download and run this notebook locally, or you can run it for free in a cloud environment using Colab or Sagemaker Studio Lab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kirbyju/nifti-curation/blob/main/Examples.ipynb)

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github.com/kirbyju/nifti-curation/blob/main/Examples.ipynb)

# Summary

This notebook contains examples for testing out the functionality in curation.py.

# Setup
Install the relevant packages if you haven't already.

In [None]:
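import os
import sys  # needed for sys.executable in the pip commands below

!{sys.executable} -m pip install --upgrade -q pandas
!{sys.executable} -m pip install --upgrade -q nibabel
!{sys.executable} -m pip install --upgrade -q nilearn  # imported in the next cell
!{sys.executable} -m pip install --upgrade -q matplotlib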

Next we'll import curation.py and use some customized BraTS LGG files originally obtained from The Cancer Imaging Archive at https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF to test the functions. These images can be found in the **data** folder of this repository. They were purposefully modified to create some duplicate pixel data under different file names to demonstrate some of the functionality shown below.

In [29]:
import curation
import pandas as pd

# these ones are only necessary for the last example
import nibabel as nib
import nilearn.plotting as nlp
import matplotlib.pyplot as plt
from nilearn.image import resample_img
import textwrap

# Check for duplicate data
The **niftiDups(data_dir, format)** function takes a folder and compares hashes of the .nii and .nii.gz files in it to see whether any contain the same image pixel content. The first column of the output contains the hash values and the second column contains the file paths. Files with duplicate content share the same hash value in the first column.

Results will be returned as a dataframe or you can optionally specify **format = "csv"** as a parameter to save a CSV file.  

Windows users should specify paths with forward slashes, e.g. **"C:/data/"**.
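
If your paths come from Windows Explorer with backslashes, one way to convert them is Python's standard `pathlib` module. This is a minimal sketch; the folder name is only an illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows-style path copied from Explorer
windows_path = r"C:\data\TCGA-CS-4942"

# as_posix() rewrites the separators as forward slashes
forward_slash_path = PureWindowsPath(windows_path).as_posix()
print(forward_slash_path)
```

The resulting string can then be passed as the folder argument.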



In [2]:
curation.niftiDups("data", format = "csv")

2024-03-25 17:27:41,232:INFO:CSV file created: nifti_duplicates_2024-03-25_17-27.csv


| | Hash | File Path |
|---|---|---|
| 0 | 20ccab5b8288786bbf5000159d82ebcc6cb74de303e734... | data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n... |
| 1 | 20ccab5b8288786bbf5000159d82ebcc6cb74de303e734... | data/TCGA-CS-5393/TCGA-CS-4942_1997.02.22_t3.n... |
| 2 | fc40c798c881622f6bc796b54b7eaebaa916665a6412b2... | data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_flai... |
| 3 | fc40c798c881622f6bc796b54b7eaebaa916665a6412b2... | data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flai... |
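
To make the idea behind this output concrete, here is a rough sketch of grouping items by a content hash. It is not the code from curation.py: the choice of SHA-256, the function name `find_duplicates`, and the in-memory byte strings standing in for pixel data are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicates(payloads):
    """Group names by the SHA-256 digest of their bytes; keep groups of two or more."""
    groups = defaultdict(list)
    for name, data in payloads.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


# Hypothetical stand-ins for the pixel data read from three files
payloads = {
    "subject1/t2.nii.gz": b"\x00\x01\x02\x03",
    "subject2/t3.nii.gz": b"\x00\x01\x02\x03",  # same content, different name
    "subject1/flair.nii.gz": b"\x09\x08\x07",
}

duplicates = find_duplicates(payloads)
for digest, names in duplicates.items():
    print(digest[:12], names)
```

Hashing the pixel content rather than the whole file means two files can be flagged as duplicates even when their names or header fields differ.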


# Analyze header data

The **niftiHeaderAnalysis(path, unique, format)** function can be pointed at a directory to generate a dataframe or spreadsheet of header data from all .nii or .nii.gz files in that directory.  Specify **format = "csv"** to create the spreadsheet.


In [3]:
curation.niftiHeaderAnalysis("data", format = "csv")

2024-03-25 17:28:19,730:INFO:CSV file created: nifti_metadata_2024-03-25_17-28.csv


Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_t2.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
1,TCGA-CS-4942_1997.02.22_t1Gd.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1Gd...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
2,TCGA-CS-4942_1997.02.22_flair.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_flai...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
3,TCGA-CS-4942_1997.02.22_GlistrBoost_ManuallyCo...,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
4,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
5,TCGA-CS-4942_1997.02.22_t1.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.n...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
6,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
7,TCGA-CS-4944_2001.02.08_t1.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_t1.n...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
8,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
9,TCGA-CS-4944_2001.02.08_flair.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_flai...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'



You can also use the optional parameter **unique = "yes"** to distill the results down to unique values for each column.

**Note:** When using the unique option, there is no meaningful relationship between the items in any given row. Each column is just an independent list of unique values that were contained in that field across all files.

In [4]:
curation.niftiHeaderAnalysis("data", unique = "yes", format = "csv")

2024-03-25 17:28:44,130:INFO:CSV file created: nifti_metadata_2024-03-25_17-28.csv


Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_t2.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n...,348.0,b'',b'',0.0,0.0,b'r',0.0,[ 3 240 240 155 1 1 1 1],...,0.0,1.0,-0.0,239.0,0.0,[-1. 0. 0. -0.],[ 0. -1. 0. 239.],[0. 0. 1. 0.],b'',b'n+1'
1,TCGA-CS-4942_1997.02.22_t1Gd.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1Gd...,,,,,,,,,...,,,0.0,,-0.0,[-1. -0. -0. -0.],[ -0. -1. -0. 239.],[ 0. 0. 1. -0.],,
2,TCGA-CS-4942_1997.02.22_flair.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_flai...,,,,,,,,,...,,,,,,[-1. -0. -0. 0.],,,,
3,TCGA-CS-4942_1997.02.22_GlistrBoost_ManuallyCo...,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,,,,,,,,,...,,,,,,,,,,
4,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,,,,,,,,,...,,,,,,,,,,
5,TCGA-CS-4942_1997.02.22_t1.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.n...,,,,,,,,,...,,,,,,,,,,
6,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,,,,,,,,,...,,,,,,,,,,
7,TCGA-CS-4944_2001.02.08_t1.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_t1.n...,,,,,,,,,...,,,,,,,,,,
8,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,,,,,,,,,...,,,,,,,,,,
9,TCGA-CS-4944_2001.02.08_flair.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_flai...,,,,,,,,,...,,,,,,,,,,
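
The note about unrelated rows can be illustrated with a small standard-library sketch. It assumes nothing about how curation.py builds the table; `unique_per_column` and the three sample rows are hypothetical.

```python
from itertools import zip_longest


def unique_per_column(rows):
    """Collect the distinct values of each column, in first-seen order."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            seen = columns.setdefault(key, [])
            if value not in seen:
                seen.append(value)
    return columns


# Hypothetical header rows: only c.nii.gz has a non-empty db_name
rows = [
    {"Filename": "a.nii.gz", "sizeof_hdr": 348, "db_name": b""},
    {"Filename": "b.nii.gz", "sizeof_hdr": 348, "db_name": b""},
    {"Filename": "c.nii.gz", "sizeof_hdr": 348, "db_name": b"John Smith"},
]

columns = unique_per_column(rows)

# Lining the independent lists up side by side pads short columns with None
table = list(zip_longest(*columns.values()))
for line in table:
    print(line)
```

In the second printed row, `b.nii.gz` sits next to `b'John Smith'` even though that value came from `c.nii.gz`, which is why a row of the unique report should not be read as describing one file.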


# Edit header data

Next let's use **niftiHeaderEdit(file_path, tag, new_value, input_type)** to edit some of this header data we've analyzed.  In the first parameter you specify the path of a .nii or .nii.gz file to edit.  Then specify the tag name (as shown in the output above) that you want to edit.  Third, you specify the value you'd like to replace it with.

Let's insert some fake PHI into the **db_name** field of a few of the files as an example, and then we'll clean it up in the subsequent steps.

In [9]:
# single file example
curation.niftiHeaderEdit("data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flair.nii.gz", "db_name", "John Smith")
curation.niftiHeaderEdit("data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz", "db_name", "Jane Doe")

2024-03-25 17:35:12,468:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flair.nii.gz' has been updated to 'John Smith'.
2024-03-25 17:35:12,490:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz' has been updated to 'Jane Doe'.
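
For context on what this edit touches: in the NIfTI-1 format, **db_name** is a fixed 18-byte character field stored at byte offset 14 of the 348-byte header. The sketch below only illustrates that layout with Python's standard library on an in-memory buffer. It is not how curation.py works (its implementation is not shown here), and the helper names `read_db_name` and `write_db_name` are hypothetical.

```python
import struct

HEADER_SIZE = 348      # sizeof_hdr for NIfTI-1
DB_NAME_OFFSET = 14    # follows sizeof_hdr (4 bytes) and data_type (10 bytes)
DB_NAME_LENGTH = 18


def read_db_name(header):
    """Return the db_name field as bytes, without the trailing NUL padding."""
    raw = header[DB_NAME_OFFSET:DB_NAME_OFFSET + DB_NAME_LENGTH]
    return raw.rstrip(b"\x00")


def write_db_name(header, value):
    """Return a copy of the header with db_name replaced by value."""
    encoded = value.encode("ascii")
    if len(encoded) > DB_NAME_LENGTH:
        raise ValueError("db_name holds at most 18 bytes")
    field = encoded.ljust(DB_NAME_LENGTH, b"\x00")
    return header[:DB_NAME_OFFSET] + field + header[DB_NAME_OFFSET + DB_NAME_LENGTH:]


# Hypothetical, otherwise empty header used only for this demonstration
buffer = bytearray(HEADER_SIZE)
struct.pack_into("<i", buffer, 0, HEADER_SIZE)  # sizeof_hdr
buffer[344:348] = b"n+1\x00"                    # magic
header = bytes(buffer)

edited = write_db_name(header, "John Smith")
cleared = write_db_name(edited, "")
print(read_db_name(edited), read_db_name(cleared))
```

Because the field is raw bytes, values read back from it appear as `b'...'` in Python.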


Now let's re-run the analysis so you can see the change. Note that the **db_name** column now shows the fake names we inserted.

We're also going to save the resulting dataframe to the **analysis** variable so we can use it in the next step.  

In [18]:
analysis = curation.niftiHeaderAnalysis("data")
display(analysis)

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_t2.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
1,TCGA-CS-4942_1997.02.22_t1Gd.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1Gd...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
2,TCGA-CS-4942_1997.02.22_flair.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_flai...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
3,TCGA-CS-4942_1997.02.22_GlistrBoost_ManuallyCo...,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
4,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
5,TCGA-CS-4942_1997.02.22_t1.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.n...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
6,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'Jane Doe',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
7,TCGA-CS-4944_2001.02.08_t1.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_t1.n...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
8,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
9,TCGA-CS-4944_2001.02.08_flair.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_flai...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'


Now let's say you want to clean this up without having to copy and paste the path of each file like we did above, which could be very tedious with a larger dataset.

To do this, we'll filter the full analysis report down to the files you want to edit and feed that output to **niftiHeaderEdit()** using the **input_type** parameter.

You can set **input_type** to "df" or "csv" depending on whether you saved your report to a dataframe variable or a spreadsheet. When performing bulk edits, file_path should be the dataframe variable or the path to the CSV file. Whichever type of input you provide, the file/dataframe must contain a "Path" column that gives the location of each file you want to edit. The other columns will be ignored.

In our example case, let's select only the scans that contain "John Smith" or "Jane Doe" in the db_name field and set those to be blank again.

In [26]:
phi = ['John Smith', 'Jane Doe'] 
filtered_analysis = analysis[analysis.apply(lambda row: any(name in row['db_name'] for name in phi), axis=1)].copy()

display(filtered_analysis)

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
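
The empty result above is consistent with **db_name** holding bytes-typed values (as displayed, e.g. b'John Smith') rather than Python strings, in which case a comparison against plain strings never matches. That explanation is an assumption, since the exact cell type depends on curation.py. The standard-library sketch below shows the general pitfall with hypothetical rows and a hypothetical `as_text` helper.

```python
# Hypothetical rows standing in for the analysis dataframe
rows = [
    {"Path": "data/a.nii.gz", "db_name": b"John Smith"},
    {"Path": "data/b.nii.gz", "db_name": b""},
    {"Path": "data/c.nii.gz", "db_name": b"Jane Doe"},
]
phi = ["John Smith", "Jane Doe"]


def as_text(value):
    """Decode bytes to str; convert anything else with str()."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


# A bytes value never equals a str, so this finds nothing
direct_matches = [r["Path"] for r in rows if r["db_name"] in phi]

# Converting to text first lets the names match
decoded_matches = [
    r["Path"] for r in rows
    if any(name in as_text(r["db_name"]) for name in phi)
]

print(direct_matches)
print(decoded_matches)
```

The next cell sidesteps the problem in a similar spirit by converting each row to strings before searching it.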


In [30]:
phi = ['John Smith', 'Jane Doe']

for name in phi:
    tmp = analysis[analysis.apply(lambda row: any(row.astype(str).str.contains(name, case=False)), axis=1)]
    filtered_analysis = pd.concat([filtered_analysis, pd.DataFrame(tmp)], ignore_index=True)
    
display(filtered_analysis)

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'Jane Doe',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
1,TCGA-CS-4942_1997.02.22_t2.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
2,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
3,TCGA-CS-4942_1997.02.22_t1.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.n...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
4,TCGA-CS-4942_1997.02.22_flair.nii.gz,data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flai...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
5,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,348,b'',b'Jane Doe',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'


Note that only the files we want to edit have been saved to the **filtered_analysis** dataframe.  Now let's edit them.

In [31]:
curation.niftiHeaderEdit(filtered_analysis, "db_name", "", input_type= "df")

2024-03-25 18:09:29,082:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz' has been updated to ''.
2024-03-25 18:09:29,165:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.nii.gz' has been updated to ''.
2024-03-25 18:09:29,182:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz' has been updated to ''.
2024-03-25 18:09:29,253:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.nii.gz' has been updated to ''.
2024-03-25 18:09:29,329:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flair.nii.gz' has been updated to ''.
2024-03-25 18:09:29,352:INFO:Header tag 'db_name' in file 'data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz' has been updated to ''.


And then let's check one more time using the **unique** option to ensure the **db_name** field is empty for all files.  The only value that should appear is **b''**.  

**Note:** The value of this field is displayed as a Python bytes literal, with a **b** prefix and single quote marks around the contents, which is why it was b'John Smith' earlier and b'' after emptying the contents.

In [32]:
curation.niftiHeaderAnalysis("data", unique = "yes")

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_t2.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.n...,348.0,b'',b'',0.0,0.0,b'r',0.0,[ 3 240 240 155 1 1 1 1],...,0.0,1.0,-0.0,239.0,0.0,[-1. 0. 0. -0.],[ 0. -1. 0. 239.],[0. 0. 1. 0.],b'',b'n+1'
1,TCGA-CS-4942_1997.02.22_t1Gd.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1Gd...,,,,,,,,,...,,,0.0,,-0.0,[-1. -0. -0. -0.],[ -0. -1. -0. 239.],[ 0. 0. 1. -0.],,
2,TCGA-CS-4942_1997.02.22_flair.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_flai...,,,,,,,,,...,,,,,,[-1. -0. -0. 0.],,,,
3,TCGA-CS-4942_1997.02.22_GlistrBoost_ManuallyCo...,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,,,,,,,,,...,,,,,,,,,,
4,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_Glis...,,,,,,,,,...,,,,,,,,,,
5,TCGA-CS-4942_1997.02.22_t1.nii.gz,data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.n...,,,,,,,,,...,,,,,,,,,,
6,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,,,,,,,,,...,,,,,,,,,,
7,TCGA-CS-4944_2001.02.08_t1.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_t1.n...,,,,,,,,,...,,,,,,,,,,
8,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_Glis...,,,,,,,,,...,,,,,,,,,,
9,TCGA-CS-4944_2001.02.08_flair.nii.gz,data/TCGA-CS-4944/TCGA-CS-4944_2001.02.08_flai...,,,,,,,,,...,,,,,,,,,,


# Create PNG image grids for reviewing pixel data
The **nifti2png()** function takes an input directory containing .nii and .nii.gz files and converts them into PNG files, each showing a 3x3 grid of images. It saves the resulting files in a new folder, **pngOutput**, unless an output directory is also specified.

In [34]:
curation.nifti2png("data")

# Generate PNG image grids for reviewing images with segmentation mask overlays
This will require some customization depending on your image and mask file names. It is currently set up for testing with https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF.

This first step builds a CSV that has 2 columns. The first is the path/filename of the image series and the second is the path/filename of the corresponding mask.

In this particular example, the same segmentation file is used with all of the various image data, since that is how the BraTS data are organized.

In [35]:
# Specify the root directory where the NIfTI files are located
root_directory = "data"

# Create a dictionary to store image and mask file paths
image_and_mask_paths = {}

# Recursively search for .nii.gz files in the root directory
for root, dirs, files in os.walk(root_directory):
    images = []
    mask_boost_manual = None
    mask_boost = None

    for file in files:
        if file.endswith(".nii.gz"):
            file_path = os.path.join(root, file)
            if "GlistrBoost_ManuallyCorrected" in file:
                mask_boost_manual = file_path
            elif "GlistrBoost" in file:
                mask_boost = file_path
            else:
                images.append(file_path)

    if mask_boost_manual:
        for image in images:
            image_and_mask_paths[image] = mask_boost_manual
    elif mask_boost:
        for image in images:
            image_and_mask_paths[image] = mask_boost

# Create a list of dictionaries for the DataFrame
data = []
for image, mask in image_and_mask_paths.items():
    data.append({"image": image, "mask": mask})

# Create a DataFrame and save it to a CSV file
df = pd.DataFrame(data)
df.to_csv("image_mask_paths.csv", index=False)


After the manifest CSV has been created, this step creates the merged image + overlay PNG files.

In [36]:
# Read the CSV file containing image and mask paths
csv_file = "image_mask_paths.csv"  # Replace with the path to your CSV file
df = pd.read_csv(csv_file)

# Create a directory to store the PNG images
output_dir = "data/seg-grids"
os.makedirs(output_dir, exist_ok=True)

# Set the opacity (alpha) for the mask overlay
opacity = 1

# Number of representative images to display
num_images = 9  # 3x3 grid

# Iterate through rows in the CSV file
for index, row in df.iterrows():
    image_path = row["image"]
    mask_path = row["mask"]

    # Load the NIfTI image
    image = nib.load(image_path)

    # Load the segmentation mask and resample it to match the image's dimensions
    mask = nib.load(mask_path)
    mask = resample_img(mask, target_affine=image.affine, target_shape=image.shape, interpolation='nearest')

    # Get the file name for the title and output file
    # (splitext strips only the final extension, so "x.nii.gz" becomes "x.nii")
    image_file_name = os.path.splitext(os.path.basename(image_path))[0]
    mask_file_name = os.path.splitext(os.path.basename(mask_path))[0]

    # Wrap the title text into multiple lines
    title = textwrap.fill(f"{image_file_name} with mask {mask_file_name}", width=80)  # Adjust the width as needed

    # Create a figure for the image with mask overlay
    fig, axes = plt.subplots(3, 3, figsize=(9, 9))
    fig.suptitle(title, color='white')

    for i in range(num_images):
        row_index, col_index = divmod(i, 3)
        slice_index = int(i * image.shape[-1] / num_images)

        # Get the slice from the image and mask using "..."
        image_slice = image.dataobj[..., slice_index]
        mask_slice = mask.dataobj[..., slice_index]

        # Overlay the mask on the image slice
        overlaid_slice = image_slice.copy()
        overlaid_slice[mask_slice > 0] = (1 - opacity) * image_slice[mask_slice > 0] + opacity * 255

        # Display the overlaid slice
        axes[row_index, col_index].imshow(overlaid_slice, cmap='gray')
        axes[row_index, col_index].axis('off')
        axes[row_index, col_index].set_title(f"Slice {slice_index}", color='white')

    # Save the plot as a PNG file
    output_file = os.path.join(output_dir, f"{image_file_name}_mask.png")
    plt.savefig(output_file, bbox_inches='tight', pad_inches=0, format='png', dpi=300, facecolor='black')

    # Close the figure
    plt.close()
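
The slice selection and overlay arithmetic used in the loop above can be checked in isolation. The sketch below restates those formulas as small functions; the function names are hypothetical and the depth of 155 matches the **dim** values shown in the header analysis.

```python
def representative_slices(depth, count=9):
    """Evenly spaced slice indices, using the same formula as the loop above."""
    return [int(i * depth / count) for i in range(count)]


def grid_positions(count=9, columns=3):
    """(row, column) position of each image in the grid."""
    return [divmod(i, columns) for i in range(count)]


def blend(pixel, opacity, mask_value=255):
    """Value written where the mask is non-zero."""
    return (1 - opacity) * pixel + opacity * mask_value


slices = representative_slices(155)
positions = grid_positions()

print(slices)
print(positions)
print(blend(100, 1), blend(100, 0.5), blend(100, 0))
```

With `opacity = 1`, as set in the cell above, masked pixels are replaced outright by 255, so the underlying anatomy is hidden under the mask. A value below 1 keeps part of the original intensity visible. Note also that the last sampled index is 137, so the slices closest to the end of the volume are not shown.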
