
# Anonymize DICOM data

This example is a starting point for anonymizing DICOM data.

It shows how to read a file, replace identifying tags (person names, patient ID),
optionally remove curves and private tags, and write the result to a new file.


## Anonymize a single file



In [2]:
# authors : Darcy Mason
#           Guillaume Lemaitre <g.lemaitre58@gmail.com>
# license : MIT

import tempfile
import pydicom
import matplotlib.pyplot as plt

# Path to the source DICOM file
dicom_path = "../dicoms_downloaded/1_ORIGINAL.dcm"

# Read the DICOM file into a dataset
ds = pydicom.dcmread(dicom_path)

#ds = examples.mr

for keyword in ["PatientID", "PatientBirthDate"]:
    print(ds.data_element(keyword))

(0010, 0020) Patient ID                          LO: '339833062'
(0010, 0030) Patient's Birth Date                DA: ''


We can define a callback function that finds every element holding a person
name (VR ``PN``) in the dataset. We can also define a callback function that
removes curve elements, which live in groups ``0x5000`` to ``0x50FF``.



In [4]:
def person_names_callback(ds, elem):
    if elem.VR == "PN":
        elem.value = "anonymous"


def curves_callback(ds, elem):
    if elem.tag.group & 0xFF00 == 0x5000:
        del ds[elem.tag]
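
The mask in ``curves_callback`` relies on curve data occupying a repeating group whose number starts with ``0x50``. The sketch below isolates that test with plain integers so it can be checked without a DICOM file. The helper name ``is_curve_group`` is hypothetical and not part of pydicom.

```python
def is_curve_group(group: int) -> bool:
    """Return True if a 16-bit group number falls in 0x5000-0x50FF."""
    # Masking with 0xFF00 zeroes the low byte, so every group from
    # 0x5000 to 0x50FF collapses to 0x5000.
    return (group & 0xFF00) == 0x5000


# Try a patient group, several curve groups, and an overlay group.
for group in (0x0010, 0x5000, 0x5002, 0x50FF, 0x6000):
    print(f"0x{group:04X}: {is_curve_group(group)}")
```

In Python, ``&`` binds more tightly than ``==``, so the unparenthesized form used in the callback evaluates the same way.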

We can now walk the dataset with each callback, and set other identifying
elements, such as the patient ID, directly.



In [5]:
ds.PatientID = "id"
ds.walk(person_names_callback)
ds.walk(curves_callback)

pydicom can remove all private tags with the ``remove_private_tags`` method.



In [6]:
ds.remove_private_tags()

Data elements of type 3 (optional) can be easily deleted using ``del`` or
``delattr``.



In [7]:
if "OtherPatientIDs" in ds:
    delattr(ds, "OtherPatientIDs")

if "OtherPatientIDsSequence" in ds:
    del ds.OtherPatientIDsSequence

A data element of type 2 must stay in the dataset, but it can be blanked by
assigning an empty string. Here the birth date is replaced with a dummy value
instead.



In [8]:
tag = "PatientBirthDate"
if tag in ds:
    ds.data_element(tag).value = "19000101"

Finally, the anonymized dataset can be written to a new file.



In [10]:
for keyword in ["PatientID", "PatientBirthDate"]:
    print(ds.data_element(keyword))

# Write the anonymized dataset to a new file
ds.save_as("output-2.dcm")

(0010, 0020) Patient ID                          LO: 'id'
(0010, 0030) Patient's Birth Date                DA: '19000101'


In [3]:
# Path to the anonymized DICOM file
dicom_path = "../dicom_processed/anonymized_1_ORIGINAL.dcm"

# Read it back; force=True also reads files that lack the file meta header
dicom_data = pydicom.dcmread(dicom_path, force=True)

#ds = examples.mr

print(dicom_data)

Dataset.file_meta -------------------------------
(0002, 0002) Media Storage SOP Class UID         UI: Secondary Capture Image Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 1.2.826.0.1.3680043.8.498.94650191124222492089842785732057361379
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.2.826.0.1.3680043.8.498.1
-------------------------------------------------
(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.826.0.1.3680043.8.498.94650191124222492089842785732057361379
(0008, 0023) Content Date                        DA: '20240528'
(0008, 0033) Content Time                        TM: '105350'
(0010, 0010) Patient's Name                      PN: 'Anonymized'
(0010, 0020) Patient ID                          LO: 'Anonymized'
(0020, 000d) Study Instance UID                  UI: 1.2.826.0.1.3680043.8.4

In [26]:
# Get the pixel array
pixel_array = dicom_data.pixel_array

# Create the figure and the axes
fig, ax = plt.subplots()

# Show the image in grayscale
ax.imshow(pixel_array, cmap=plt.cm.gray)

# Remove the title and the axes
ax.set_title("")
ax.axis("off")

# Save the image as PNG
output_name = dicom_path.split("/")[-1]
output_path = f"../dicoms_downloaded/{output_name}.png"
plt.savefig(output_path, bbox_inches='tight', pad_inches=0)

# Close the figure
plt.close(fig)

print(f"Image saved to {output_path}")

Image saved to ../dicoms_downloaded/1_ORIGINAL.dcm.png
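
The output name above is built by splitting the path on ``"/"``. A possible alternative, sketched here with the standard ``pathlib`` module, takes the file name from the path object instead. The helper ``png_path_for`` is hypothetical, and ``PurePosixPath`` is used so the result keeps forward slashes on every platform.

```python
from pathlib import PurePosixPath


def png_path_for(dicom_path: str, out_dir: str) -> str:
    """Return the PNG path for a DICOM file, keeping its full file name."""
    # .name is the last path component, e.g. "1_ORIGINAL.dcm".
    name = PurePosixPath(dicom_path).name
    return str(PurePosixPath(out_dir) / f"{name}.png")


print(png_path_for("../dicoms_downloaded/1_ORIGINAL.dcm", "../dicoms_downloaded"))
# ../dicoms_downloaded/1_ORIGINAL.dcm.png
```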


In [16]:
import pydicom
from pydicom.dataset import Dataset, FileDataset
import datetime
import numpy as np
from PIL import Image

# Path to the input PNG file
png_file = '../dicoms_downloaded/1_ORIGINAL.dcm.png'
# Path to the output DICOM file
dicom_file = '1_ORIGINAL.dcm'

# Read the PNG image with Pillow
image = Image.open(png_file)
image = image.convert('L')  # Convert to 8-bit grayscale
np_image = np.array(image)

# Create a new DICOM dataset
ds = Dataset()

# Set some basic DICOM metadata
ds.PatientName = "Anonymized"
ds.PatientID = "Anonymized"
ds.StudyInstanceUID = pydicom.uid.generate_uid()
ds.SeriesInstanceUID = pydicom.uid.generate_uid()
ds.SOPInstanceUID = pydicom.uid.generate_uid()
ds.SOPClassUID = pydicom.uid.SecondaryCaptureImageStorage

# Creation date and time of the file
dt = datetime.datetime.now()
ds.ContentDate = dt.strftime('%Y%m%d')
ds.ContentTime = dt.strftime('%H%M%S')

# Describe the image dimensions and pixel format
ds.SamplesPerPixel = 1
ds.PhotometricInterpretation = "MONOCHROME2"
ds.Rows = np_image.shape[0]
ds.Columns = np_image.shape[1]
ds.BitsAllocated = 8
ds.BitsStored = 8
ds.HighBit = 7
ds.PixelRepresentation = 0
ds.PixelData = np_image.tobytes()

# Build the file meta information
file_meta = pydicom.dataset.FileMetaDataset()
file_meta.MediaStorageSOPClassUID = ds.SOPClassUID
file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
file_meta.ImplementationClassUID = pydicom.uid.PYDICOM_IMPLEMENTATION_UID
file_meta.TransferSyntaxUID = pydicom.uid.ExplicitVRLittleEndian

# Attach the file meta information and set the encoding to match
# the transfer syntax (explicit VR, little endian)
ds.file_meta = file_meta
ds.is_little_endian = True
ds.is_implicit_VR = False

# Save the DICOM file
ds.save_as(dicom_file)

print(f'DICOM file saved as {dicom_file}')


DICOM file saved as 1_ORIGINAL.dcm
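
The image attributes set in this cell have to agree with the number of bytes stored in ``PixelData``. The sketch below shows that arithmetic for uncompressed data, using a small stand-in buffer instead of the real image. The helper ``expected_pixel_bytes`` is hypothetical, and it assumes ``BitsAllocated`` is a multiple of 8.

```python
def expected_pixel_bytes(rows, columns, samples_per_pixel=1, bits_allocated=8):
    """Return the byte length of one uncompressed frame."""
    # Assumes bits_allocated is a multiple of 8 (8 or 16 in practice).
    return rows * columns * samples_per_pixel * (bits_allocated // 8)


rows, columns = 4, 6
pixels = bytes(rows * columns)  # stand-in for np_image.tobytes()

# With 8-bit grayscale, one byte per pixel is expected.
print(len(pixels), expected_pixel_bytes(rows, columns))
```

For the 8-bit grayscale image produced by ``image.convert('L')``, this reduces to ``Rows * Columns``.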
