mikevoets/dicom_anonymizer

Written by Mike Voets, UiT Arctic University of Norway, 2018. Report: link

Python DICOM Anonymizer

This is a Python script for anonymizing DICOM files from the command line. It takes a source csv file with variables and a source directory containing DICOM files, anonymizes the files, and writes them to the specified destination. The source csv file is also de-identified and written to the specified destination. The anonymized DICOM files are renamed so that their file names no longer contain sensitive information.

Notice

The source csv file should contain variables from Kreftregisteret in a specific order. The personal ID, invitation ID, screening date, and diagnosis date are assumed to be the 1st, 2nd, 3rd, and 10th variables in each row of the csv file, respectively.
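As a rough illustration of that column layout, the sketch below pulls the four positional fields out of one row. It is not taken from the script: the function name, the constant names, the dictionary keys, and the comma delimiter are all assumptions, and the sample row is made up.

```python
import csv

# Zero-based positions of the 1st, 2nd, 3rd, and 10th variables.
# These constant names are illustrative, not taken from the script.
PERSONAL_ID, INVITATION_ID, SCREENING_DATE, DIAGNOSIS_DATE = 0, 1, 2, 9


def extract_key_fields(row):
    """Return the four positional fields from one csv row (a list of strings)."""
    if len(row) <= DIAGNOSIS_DATE:
        raise ValueError("expected at least 10 columns, got %d" % len(row))
    return {
        "personal_id": row[PERSONAL_ID],
        "invitation_id": row[INVITATION_ID],
        "screening_date": row[SCREENING_DATE],
        "diagnosis_date": row[DIAGNOSIS_DATE],
    }


# Made-up sample line, not real registry data; a comma delimiter is assumed.
sample_lines = ["p-001,inv-42,2015-03-01,a,b,c,d,e,f,2015-04-10"]
for row in csv.reader(sample_lines):
    print(extract_key_fields(row))
```

A check like this before running the anonymizer can catch a csv file whose columns are in a different order or whose rows are too short.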

Important!

The function .find_dicom_path in anonymize_dicom_files.py (line 65) must be implemented before this script can run successfully.
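What the implementation looks like depends entirely on how your DICOM files are stored. The sketch below is only one possibility: it assumes a hypothetical layout with one sub-directory per study, named after the study identifier. The function name, its parameters, and its return value are assumptions; take the actual signature from the stub in anonymize_dicom_files.py.

```python
import os
import shutil
import tempfile


def find_dicom_path_example(source_dir, study_id):
    """Return the directory holding the DICOM files for one study.

    Assumes a hypothetical layout of <source_dir>/<study_id>/.
    """
    candidate = os.path.join(source_dir, str(study_id))
    if not os.path.isdir(candidate):
        raise IOError("no DICOM directory for study %s in %s" % (study_id, source_dir))
    return candidate


# Demonstration against a throwaway directory tree.
root = tempfile.mkdtemp()
try:
    os.mkdir(os.path.join(root, "study-001"))
    print(find_dicom_path_example(root, "study-001"))
finally:
    shutil.rmtree(root)
```

Raising an error for a missing study, rather than returning an empty value, makes a wrong layout visible immediately.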

Prerequisites

The script runs with Python 2.7. See requirements.txt for the third-party packages you need to have installed.

You can install all requirements by using pip:

pip install -U -r requirements.txt

You will also need to load the dicom-anon submodule (assuming you have Git installed):

git submodule update --init

Notice: On Windows, cloning may only work from Git Bash.

To test whether the script runs successfully, execute it in test mode with -t and without any other arguments:

python anonymize_dicom_files.py -t

If the test ran successfully, the output should be as follows:

Start anonymizing DICOMs of 1 patients.
Anonymization has finished.
=== Test has run smoothly!

Example

Assume the identified DICOM files are in a directory called identified in your home directory, and you want the de-identified files to be placed in a directory called cleaned in your home directory.

The csv file that contains the links between screenings (StudyIDs) and variables from Kreftregisteret is called links.csv.

The variables from Kreftregisteret are placed in a csv file called variables.csv, and you want the de-identified variables to be placed in a new csv file called cleaned_variables.csv.

The following example starts the script, which uses dicom-anon to de-identify the DICOM files. dicom-anon attempts to comply with the Basic Application Level Confidentiality Profile as specified in DICOM 3.15, Annex E (page 85).

The de-identifier script creates an SQLite database with a table containing the original and cleaned version of every attribute. This database file can be removed after the script has run. Files that are explicitly marked as containing burnt-in data, as well as files with a series description of "Patient Protocol", are copied to the quarantine folder.

python anonymize_dicom_files.py variables.csv links.csv cleaned_variables.csv identified cleaned
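The quarantine rule described above can be pictured as a simple predicate. The sketch below is not dicom-anon's code: it works on a plain dictionary of attribute values, and it assumes that "explicitly marked as containing burnt-in data" means the standard DICOM attribute BurnedInAnnotation (0028,0301) is set to "YES". The function name is hypothetical.

```python
def should_quarantine(attributes):
    """Decide whether a file belongs in the quarantine folder.

    `attributes` is a plain dict mapping DICOM keywords to values; a real
    implementation would read them from the DICOM header instead.
    """
    burned_in = str(attributes.get("BurnedInAnnotation", "")).strip().upper()
    series_description = str(attributes.get("SeriesDescription", "")).strip()
    return burned_in == "YES" or series_description == "Patient Protocol"


# Made-up headers for illustration.
print(should_quarantine({"BurnedInAnnotation": "YES"}))
print(should_quarantine({"SeriesDescription": "Patient Protocol"}))
print(should_quarantine({"BurnedInAnnotation": "NO", "SeriesDescription": "L CC"}))
```

Files without either attribute are not quarantined in this sketch, which matches the "explicitly marked" wording; a stricter policy could quarantine files where the burnt-in marker is missing.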

By default, only the modalities MG and OT are allowed. If you need to allow other modalities, use the --modalities argument and list the allowed modalities yourself, separated by commas.
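To illustrate how a comma-separated --modalities value might be turned into a list, here is a minimal argparse sketch. How the script itself parses the argument is not shown in this README, so the helper name parse_modalities and the upper-casing of values are assumptions; only the flag name and the MG and OT defaults come from the text above.

```python
import argparse


def parse_modalities(value):
    """Split a comma-separated string such as 'MG,OT' into a list of codes."""
    modalities = [m.strip().upper() for m in value.split(",") if m.strip()]
    if not modalities:
        raise argparse.ArgumentTypeError("at least one modality is required")
    return modalities


parser = argparse.ArgumentParser()
parser.add_argument("--modalities", type=parse_modalities, default=["MG", "OT"])

# Explicit list: MG and OT plus CT.
print(parser.parse_args(["--modalities", "MG,OT,CT"]).modalities)
# No flag given: the default applies.
print(parser.parse_args([]).modalities)
```

Note that passing --modalities replaces the default list in this sketch rather than extending it, so MG and OT must be repeated if they should stay allowed.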

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

About

Data Retrieval from PACS for BreastScreen Norway
