You can download and run this notebook locally, or you can run it for free in a cloud environment using Colab or SageMaker Studio Lab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kirbyju/TCIA_Notebooks/blob/main/NIfTI_Curation_Testing.ipynb)

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github.com/kirbyju/TCIA_Notebooks/blob/main/NIfTI_Curation_Testing.ipynb)

# Summary

This notebook contains examples for testing the beta release of the NIfTI curation tools in tcia_utils (https://pypi.org/project/tcia-utils/).

# Setup
Install the relevant packages if you haven't already.

In [None]:
!pip install -q --upgrade tcia_utils


In [None]:
from tcia_utils import curation

Customized sample BraTS LGG files originally obtained from https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF can be found at https://drive.google.com/drive/folders/1mnWbkl3HCFRwHM4G87i34n9FSXXZSkpx?usp=sharing. These data were purposefully modified to create some duplicate pixel data under different file names to demonstrate some of the functionality shown below.

It's recommended to go to that link, copy the data to your own Google Drive folder and then mount it using the command below.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Check for duplicate data
The **niftiDups(data_dir, format)** function compares the hashes of all .nii and .nii.gz files in the folder you specify to check whether any contain the same image pixel content.  The first column of the output contains the hash values and the second column contains the file paths.  Files with duplicate content share the same hash value in the first column.

Results are returned as a dataframe.  Optionally, specify **format = "csv"** as a parameter to save a CSV file.

Windows users should specify paths with forward slashes, e.g. **"C:/data/"**.
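
If your paths are written with backslashes, one way to convert them is Python's standard `pathlib` module. This is only a small sketch, and `C:\data\nifti` is a made-up folder used for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows folder, written as a raw string so the backslashes are kept
windows_path = r"C:\data\nifti"

# as_posix() returns the same path written with forward slashes
forward_slash_path = PureWindowsPath(windows_path).as_posix()
print(forward_slash_path)
```

The resulting string can then be passed as the folder argument.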



In [None]:
curation.niftiDups("/content/drive/MyDrive/NIfTI testing/data", format = "csv")



Unnamed: 0,Hash,File Path
0,fc40c798c881622f6bc796b54b7eaebaa916665a6412b2...,/content/drive/MyDrive/NIfTI testing/data/TCGA...
1,fc40c798c881622f6bc796b54b7eaebaa916665a6412b2...,/content/drive/MyDrive/NIfTI testing/data/TCGA...
2,20ccab5b8288786bbf5000159d82ebcc6cb74de303e734...,/content/drive/MyDrive/NIfTI testing/data/TCGA...
3,20ccab5b8288786bbf5000159d82ebcc6cb74de303e734...,/content/drive/MyDrive/NIfTI testing/data/TCGA...
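
To illustrate the idea behind the output above, here is a minimal sketch of hash-based duplicate detection. It is not the tcia_utils implementation: the choice of SHA-256, the `find_duplicates` helper, and the byte strings standing in for pixel data are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict

def find_duplicates(pixel_data_by_file):
    """Group file names by the SHA-256 hash of their pixel bytes (hypothetical helper)."""
    files_by_hash = defaultdict(list)
    for file_name, pixel_bytes in pixel_data_by_file.items():
        digest = hashlib.sha256(pixel_bytes).hexdigest()
        files_by_hash[digest].append(file_name)
    # Keep only the hashes shared by more than one file
    return {h: names for h, names in files_by_hash.items() if len(names) > 1}

# Made-up stand-ins for the pixel data of three files
sample = {
    "scan_a.nii.gz": bytes([0, 1, 2, 3]),
    "scan_a_copy.nii.gz": bytes([0, 1, 2, 3]),
    "scan_b.nii.gz": bytes([9, 9, 9, 9]),
}

duplicates = find_duplicates(sample)
for digest, names in duplicates.items():
    print(digest[:12], names)
```

Because only the pixel bytes are hashed in this sketch, two files with different names still land in the same group when their content matches.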


# Analyze header data

The **niftiHeaderAnalysis(path, unique, format)** function can be pointed at a directory to generate a dataframe or spreadsheet of header data from all .nii or .nii.gz files in that directory.  Specify **format = "csv"** to create the spreadsheet.


In [None]:
curation.niftiHeaderAnalysis("/content/drive/MyDrive/NIfTI testing/data", format = "csv")

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4944_2001.02.08_t1Gd.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
1,TCGA-CS-4944_2001.02.08_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
2,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
3,TCGA-CS-4944_2001.02.08_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
4,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
5,TCGA-CS-4944_2001.02.08_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
6,TCGA-CS-4942_1997.02.22_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
7,TCGA-CS-5393_1999.06.06_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
8,TCGA-CS-5393_1999.06.06_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
9,TCGA-CS-5393_1999.06.06_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'



You can also use the optional parameter **unique = "yes"** to distill the results down to unique values for each column.

**Note:** When using the unique option, there is no meaningful relationship between the items in any given row. Each column is just an independent list of unique values that were contained in that field across all files.

In [None]:
curation.niftiHeaderAnalysis("/content/drive/MyDrive/NIfTI testing/data", unique = "yes", format = "csv")

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4944_2001.02.08_t1Gd.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348.0,b'',b'',0.0,0.0,b'r',0.0,[ 3 240 240 155 1 1 1 1],...,0.0,1.0,-0.0,239.0,0.0,[-1. -0. -0. -0.],[ -0. -1. -0. 239.],[0. 0. 1. 0.],b'',b'n+1'
1,TCGA-CS-4944_2001.02.08_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,0.0,,-0.0,[-1. 0. 0. -0.],[ 0. -1. 0. 239.],[ 0. 0. 1. -0.],,
2,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,[-1. -0. -0. 0.],,,,
3,TCGA-CS-4944_2001.02.08_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
4,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
5,TCGA-CS-4944_2001.02.08_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
6,TCGA-CS-4942_1997.02.22_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
7,TCGA-CS-5393_1999.06.06_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
8,TCGA-CS-5393_1999.06.06_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
9,TCGA-CS-5393_1999.06.06_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
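
The mostly empty rows in the output above follow from each column being reduced to its unique values independently. The sketch below reproduces that effect with plain Python. The three rows of header values are made up, and this is an illustration of the concept rather than the code tcia_utils uses.

```python
from itertools import zip_longest

# Made-up header values for three files (one dict per file)
rows = [
    {"Filename": "a.nii.gz", "sizeof_hdr": 348, "qoffset_y": 239.0},
    {"Filename": "b.nii.gz", "sizeof_hdr": 348, "qoffset_y": 239.0},
    {"Filename": "c.nii.gz", "sizeof_hdr": 348, "qoffset_y": 0.0},
]

# Reduce every column to its unique values, keeping first-seen order
columns = list(rows[0])
unique_by_column = {
    col: list(dict.fromkeys(row[col] for row in rows)) for col in columns
}

# Line the columns up side by side, padding the shorter ones with None
table = list(zip_longest(*unique_by_column.values()))
for line in table:
    print(line)
```

In the second printed row, `b.nii.gz` sits next to `0.0` even though that file's own value was `239.0`, which is why rows should not be read across when the unique option is used.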


# Edit header data

Next let's use **niftiHeaderEdit(file_path, tag, new_value, input_type)** to edit some of this header data we've analyzed.  In the first parameter you specify the path of a .nii or .nii.gz file to edit.  Then specify the tag name (as shown in the output above) that you want to edit.  Third, you specify the value you'd like to replace it with.

Let's insert some fake PHI into the db_name field for a few of the files as an example, and then we'll clean it up in the subsequent steps.

In [None]:
# single-file examples: each call edits the header of one file
curation.niftiHeaderEdit("/content/drive/MyDrive/NIfTI testing/data/TCGA-CS-4944/TCGA-CS-4942_1997.02.22_flair.nii.gz", "db_name", "John Smith")
curation.niftiHeaderEdit("/content/drive/MyDrive/NIfTI testing/data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz", "db_name", "John Smith")
curation.niftiHeaderEdit("/content/drive/MyDrive/NIfTI testing/data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t1.nii.gz", "db_name", "John Smith")
curation.niftiHeaderEdit("/content/drive/MyDrive/NIfTI testing/data/TCGA-CS-4942/TCGA-CS-4942_1997.02.22_t2.nii.gz", "db_name", "John Smith")

Now let's re-run the analysis so you can see the change.  Note the db_name column now reflects the fake name we inserted for these 4 files.

We're also going to save the resulting dataframe to the **analysis** variable so we can use it in the next step.  

In [None]:
analysis = curation.niftiHeaderAnalysis("/content/drive/MyDrive/NIfTI testing/data")
display(analysis)

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4944_2001.02.08_t1Gd.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
1,TCGA-CS-4944_2001.02.08_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
2,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
3,TCGA-CS-4944_2001.02.08_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
4,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
5,TCGA-CS-4944_2001.02.08_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
6,TCGA-CS-4942_1997.02.22_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
7,TCGA-CS-5393_1999.06.06_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
8,TCGA-CS-5393_1999.06.06_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
9,TCGA-CS-5393_1999.06.06_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'


Now let's say you want to clean this up without having to copy and paste the path of each file like we did above, which could be very tedious with a larger dataset.

To do this, we'll leverage the **searchDf(query, column_header, dataframe_name)** function in tcia_utils to filter what you want to edit from the full analysis report and feed that output to **niftiHeaderEdit()** using the **input_type** parameter.

You can set **input_type** to "df" or "csv" depending on whether you saved your report to a dataframe variable or a spreadsheet. When performing bulk edits, pass the dataframe variable or the path to the CSV file as the file_path parameter.  Whichever type of input you provide, it must contain a "Path" column giving the location of each file you want to edit.  The other columns are ignored.

In our example, let's select only the scans that contain "John Smith" in the db_name field and set that field back to blank.

In [None]:
results = curation.searchDf("John Smith", "db_name", analysis)
display(results)


Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
17,TCGA-CS-4942_1997.02.22_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
18,TCGA-CS-4942_1997.02.22_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, 0.0, 0.0, -0.0]","[0.0, -1.0, 0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'
19,TCGA-CS-4942_1997.02.22_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348,b'',b'John Smith',0,0,b'r',0,"[3, 240, 240, 155, 1, 1, 1, 1]",...,0.0,1.0,-0.0,239.0,0.0,"[-1.0, -0.0, -0.0, -0.0]","[-0.0, -1.0, -0.0, 239.0]","[0.0, 0.0, 1.0, 0.0]",b'',b'n+1'


Note that only the 4 files we want to edit have been saved to the **results** dataframe.  Now let's edit them.

In [None]:
curation.niftiHeaderEdit(results, "db_name", "", input_type = "df")
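
The filter-then-edit pattern used above can be sketched with the standard library alone. The `search_rows` and `bulk_apply` helpers and the CSV contents are hypothetical and only illustrate the workflow, in which the "Path" column drives the edit and the other columns are ignored. They are not part of tcia_utils.

```python
import csv
import io

# Made-up report contents standing in for a saved analysis CSV
report_csv = """Filename,Path,db_name
a.nii.gz,/data/a.nii.gz,b'John Smith'
b.nii.gz,/data/b.nii.gz,b''
c.nii.gz,/data/c.nii.gz,b'John Smith'
"""

def search_rows(query, column, rows):
    """Return the rows whose value in `column` contains `query` (hypothetical helper)."""
    return [row for row in rows if query in row[column]]

def bulk_apply(rows, edit_one):
    """Call `edit_one` on the Path of every row; all other columns are ignored."""
    for row in rows:
        edit_one(row["Path"])

rows = list(csv.DictReader(io.StringIO(report_csv)))
matches = search_rows("John Smith", "db_name", rows)

edited = []
bulk_apply(matches, edited.append)  # stand-in for a real per-file header edit
print(edited)
```

Here `edited.append` simply records which paths would be touched, which can be a useful dry run before changing any files.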

And then let's check one more time using the **unique** option to ensure the **db_name** field is empty for all files.  The only value that should appear is **b''**.  

**Note:** The value appears between single quotes with a leading **b** because it is displayed as a Python bytes value, which is why it was b'John Smith' earlier and b'' after emptying the contents.

In [None]:
curation.niftiHeaderAnalysis("/content/drive/MyDrive/NIfTI testing/data", unique = "yes")

Unnamed: 0,Filename,Path,sizeof_hdr,data_type,db_name,extents,session_error,regular,dim_info,dim,...,quatern_c,quatern_d,qoffset_x,qoffset_y,qoffset_z,srow_x,srow_y,srow_z,intent_name,magic
0,TCGA-CS-4942_1997.02.22_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,348.0,b'',b'',0.0,0.0,b'r',0.0,[ 3 240 240 155 1 1 1 1],...,0.0,1.0,-0.0,239.0,0.0,[-1. 0. 0. -0.],[ 0. -1. 0. 239.],[0. 0. 1. 0.],b'',b'n+1'
1,TCGA-CS-4944_2001.02.08_t1Gd.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,0.0,,-0.0,[-1. -0. -0. -0.],[ -0. -1. -0. 239.],[ 0. 0. 1. -0.],,
2,TCGA-CS-4944_2001.02.08_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,[-1. -0. -0. 0.],,,,
3,TCGA-CS-4944_2001.02.08_GlistrBoost_ManuallyCo...,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
4,TCGA-CS-4944_2001.02.08_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
5,TCGA-CS-4944_2001.02.08_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
6,TCGA-CS-4944_2001.02.08_flair.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
7,TCGA-CS-5393_1999.06.06_t2.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
8,TCGA-CS-5393_1999.06.06_GlistrBoost.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
9,TCGA-CS-5393_1999.06.06_t1.nii.gz,/content/drive/MyDrive/NIfTI testing/data/TCGA...,,,,,,,,,...,,,,,,,,,,
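
The b'' shown in the db_name column is how Python prints an empty bytes value. The short sketch below shows the same notation using only the standard library. It assumes the 18-byte width that the NIfTI-1 header defines for db_name and does not read any real file.

```python
import struct

# db_name is an 18-byte character field in the NIfTI-1 header;
# struct pads the value with null bytes up to that width
stored = struct.pack("18s", b"John Smith")
value = stored.rstrip(b"\x00")

print(repr(value))            # bytes are shown with a b prefix and quotes
print(value.decode("ascii"))  # decoding gives an ordinary string

# An emptied field contains only padding, which leaves empty bytes
emptied = struct.pack("18s", b"").rstrip(b"\x00")
print(repr(emptied))
```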


# Create PNG image grids for reviewing pixel data
The **nifti2png()** function converts the .nii and .nii.gz files in an input directory into 3x3 PNG image grids.  It saves the resulting files in a new folder, **pngOutput**, unless an output directory is also specified.

Once you've created the PNG output, it's recommended that you review those files in the Photos app on Windows or the Preview app on Mac.  In both cases you should be able to move through the images quickly using the left and right arrow keys while scanning for burned-in PHI.

In [None]:
curation.nifti2png("/content/drive/MyDrive/NIfTI testing/data", "/content/drive/MyDrive/NIfTI testing/data/png")

# Generate PNG image grids for reviewing images with segmentation mask overlays
This will require some customization depending on your image and mask file names.  It is currently set up for testing with https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF.  Contact Justin for assistance if you're using it with other datasets so we can revise the code to work with however the submitter has named and organized their files.

This first step builds a CSV that has 2 columns.  The first is the path/filename of the image series and the second is the path/filename of the corresponding mask.

In this particular example, the same segmentation file is used with all of the image data in a folder, since that is how the BraTS data are organized.

In [None]:
import os
import pandas as pd

# Specify the root directory where the NIfTI files are located
root_directory = "/content/drive/MyDrive/NIfTI testing/data"

# Create a dictionary to store image and mask file paths
image_and_mask_paths = {}

# Recursively search for .nii.gz files in the root directory
for root, dirs, files in os.walk(root_directory):
    images = []
    mask_boost_manual = None
    mask_boost = None

    for file in files:
        if file.endswith(".nii.gz"):
            file_path = os.path.join(root, file)
            if "GlistrBoost_ManuallyCorrected" in file:
                mask_boost_manual = file_path
            elif "GlistrBoost" in file:
                mask_boost = file_path
            else:
                images.append(file_path)

    if mask_boost_manual:
        for image in images:
            image_and_mask_paths[image] = mask_boost_manual
    elif mask_boost:
        for image in images:
            image_and_mask_paths[image] = mask_boost

# Create a list of dictionaries for the DataFrame
data = []
for image, mask in image_and_mask_paths.items():
    data.append({"image": image, "mask": mask})

# Create a DataFrame and save it to a CSV file
df = pd.DataFrame(data)
df.to_csv("image_mask_paths.csv", index=False)
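
The mask-selection rule in the cell above can be tried out without touching the file system. The sketch below applies the same rule to a list of file names, preferring the manually corrected GlistrBoost mask and otherwise falling back to the plain GlistrBoost mask. The `pair_images_with_mask` helper and the folder listing are hypothetical.

```python
def pair_images_with_mask(file_names):
    """Map each image file to the preferred mask found in the same folder."""
    images, manual_mask, auto_mask = [], None, None
    for name in file_names:
        if not name.endswith(".nii.gz"):
            continue
        if "GlistrBoost_ManuallyCorrected" in name:
            manual_mask = name
        elif "GlistrBoost" in name:
            auto_mask = name
        else:
            images.append(name)
    # The manually corrected mask wins whenever it is present
    mask = manual_mask or auto_mask
    return {image: mask for image in images} if mask else {}

# Hypothetical folder listing
folder = [
    "case_t1.nii.gz",
    "case_flair.nii.gz",
    "case_GlistrBoost.nii.gz",
    "case_GlistrBoost_ManuallyCorrected.nii.gz",
]
pairs = pair_images_with_mask(folder)
print(pairs)
```

A folder with images but no mask produces an empty mapping, so those images would not appear in the manifest.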


After the manifest CSV has been created, this step creates the merged image + overlay PNG files.

In [None]:
import os
import nibabel as nib
import matplotlib.pyplot as plt
import pandas as pd
from nilearn.image import resample_img
import textwrap

# Read the CSV file containing image and mask paths
csv_file = "image_mask_paths.csv"  # Replace with the path to your CSV file
df = pd.read_csv(csv_file)

# Create a directory to store the PNG images
output_dir = "output_images"
os.makedirs(output_dir, exist_ok=True)

# Set the opacity (alpha) for the mask overlay
opacity = 1

# Number of representative images to display
num_images = 9  # 3x3 grid

# Iterate through rows in the CSV file
for index, row in df.iterrows():
    image_path = row["image"]
    mask_path = row["mask"]

    # Load the NIfTI image
    image = nib.load(image_path)

    # Load the segmentation mask and resample it to match the image's dimensions
    mask = nib.load(mask_path)
    mask = resample_img(mask, target_affine=image.affine, target_shape=image.shape, interpolation='nearest')

    # Get the file name without the .nii or .nii.gz extension for the title and output file
    # (os.path.splitext would only strip ".gz" and leave "name.nii")
    image_file_name = os.path.basename(image_path).split(".nii")[0]
    mask_file_name = os.path.basename(mask_path).split(".nii")[0]

    # Wrap the title text into multiple lines
    title = textwrap.fill(f"{image_file_name} with mask {mask_file_name}", width=80)  # Adjust the width as needed

    # Create a figure for the image with mask overlay
    fig, axes = plt.subplots(3, 3, figsize=(9, 9))
    fig.suptitle(title, color='white')

    for i in range(num_images):
        row_index, col_index = divmod(i, 3)
        slice_index = int(i * image.shape[-1] / num_images)

        # Get the slice from the image and mask using "..."
        image_slice = image.dataobj[..., slice_index]
        mask_slice = mask.dataobj[..., slice_index]

        # Overlay the mask on the image slice
        overlaid_slice = image_slice.copy()
        overlaid_slice[mask_slice > 0] = (1 - opacity) * image_slice[mask_slice > 0] + opacity * 255

        # Display the overlaid slice
        axes[row_index, col_index].imshow(overlaid_slice, cmap='gray')
        axes[row_index, col_index].axis('off')
        axes[row_index, col_index].set_title(f"Slice {slice_index}", color='white')

    # Save the plot as a PNG file
    output_file = os.path.join(output_dir, f"{image_file_name}_mask.png")
    plt.savefig(output_file, bbox_inches='tight', pad_inches=0, format='png', dpi=300, facecolor='black')

    # Close the figure
    plt.close()
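
The slice selection and mask blending used in the loop above can be checked in isolation. The sketch below reuses the same two formulas in small helper functions, which are named here only for illustration. The depth of 155 slices is taken from the dim values shown in the header analysis.

```python
def slice_indices(depth, num_images=9):
    """Evenly spaced slice indices, using the same formula as the loop above."""
    return [int(i * depth / num_images) for i in range(num_images)]

def blend(pixel_value, opacity, mask_value=255):
    """Blend a masked pixel toward mask_value; opacity=1 replaces it entirely."""
    return (1 - opacity) * pixel_value + opacity * mask_value

indices = slice_indices(155)  # 155 slices along the last axis
print(indices)

print(blend(100, 1))    # fully opaque mask
print(blend(100, 0.5))  # half-transparent mask
```

With this formula the first panel is always slice 0 and the last panel is slice 137 of 155, so the final slices of the volume are not displayed in the grid.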
