# DICOM
DICOM® — Digital Imaging and Communications in Medicine — is the international standard for medical images and related information. It defines the formats for medical images that can be exchanged with the data and quality necessary for clinical use.

DICOM® is implemented in almost every radiology, cardiology imaging, and radiotherapy device (X-ray, CT, MRI, ultrasound, etc.), and increasingly in devices in other medical domains such as ophthalmology and dentistry. With hundreds of thousands of medical imaging devices in use, DICOM® is one of the most widely deployed healthcare messaging Standards in the world. There are literally billions of DICOM® images currently in use for clinical care.

Since its first publication in 1993, DICOM® has revolutionized the practice of radiology, allowing the replacement of X-ray film with a fully digital workflow. Much as the Internet has become the platform for new consumer information applications, DICOM® has enabled advanced medical imaging applications that have “changed the face of clinical medicine”. From the emergency department, to cardiac stress testing, to breast cancer detection, DICOM® is the standard that makes medical imaging work — for doctors and for patients.

DICOM® is recognized by the International Organization for Standardization as the ISO 12052 standard.

https://www.dicomstandard.org/about


# FHIR
FHIR (Fast Healthcare Interoperability Resources) is an HL7 standard for exchanging healthcare information electronically.

https://www.hl7.org/fhir/overview.html

# De-identifying sensitive burnt-in text in DICOM images

1. Redact protected health information (PHI) text that is burnt into the pixels of DICOM images
2. Visually compare original DICOM images with their redacted versions


# Tools for Health Data Anonymization 
https://github.com/microsoft/Tools-for-Health-Data-Anonymization/tree/master


## Prerequisites
Before getting started, make sure Presidio and the latest version of Tesseract OCR are installed. For detailed documentation, see the [installation docs](https://microsoft.github.io/presidio/installation).


Tesseract

```
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```

```
tesseract --version

OUTPUT
tesseract 5.3.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.2) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.2.4 : libopenjp2 2.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
 Found libcurl/7.88.1 OpenSSL/3.0.14 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3 OpenLDAP/2.5.13
```
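
The same check can be done from inside the notebook. The following is a minimal sketch using only the standard library; `tesseract_version` is a hypothetical helper name, and it assumes the binary is called `tesseract` and is expected on `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    banner = (result.stdout or result.stderr).strip()
    return banner.splitlines()[0] if banner else None


print(tesseract_version() or "Tesseract not found on PATH")
```

If this prints the "not found" message, install Tesseract first; the redactor relies on it for OCR.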

In [None]:
!pip install presidio_analyzer presidio_anonymizer presidio_image_redactor -q
!python -m spacy download en_core_web_lg -q 

## Dataset
Sample DICOM files are available for use in this notebook in `./sample_data`. Copies of the original DICOM data were saved into the folder with permission from the dataset owners. Please see the original dataset information below:
> Rutherford, M., Mun, S.K., Levine, B., Bennett, W.C., Smith, K., Farmer, P., Jarosz, J., Wagner, U., Farahani, K., Prior, F. (2021). A DICOM dataset for evaluation of medical image de-identification (Pseudo-PHI-DICOM-Data) [Data set]. The Cancer Imaging Archive. DOI: https://doi.org/10.7937/s17z-r072

In [None]:
import glob
from pathlib import Path
import matplotlib.pyplot as plt
import pydicom
from presidio_image_redactor import DicomImageRedactorEngine

## 1. Setup

In [None]:
def compare_dicom_images(
    instance_original: pydicom.dataset.FileDataset,
    instance_redacted: pydicom.dataset.FileDataset,
    figsize: tuple = (11, 11)
) -> None:
    """Display the DICOM pixel arrays of both original and redacted as images.

    Args:
        instance_original (pydicom.dataset.FileDataset): A single DICOM instance (with text PHI).
        instance_redacted (pydicom.dataset.FileDataset): A single DICOM instance (redacted PHI).
        figsize (tuple): Figure size in inches (width, height).
    """
    _, ax = plt.subplots(1, 2, figsize=figsize)
    ax[0].imshow(instance_original.pixel_array, cmap="gray")
    ax[0].set_title('Original')
    ax[1].imshow(instance_redacted.pixel_array, cmap="gray")
    ax[1].set_title('Redacted')

In [None]:
engine = DicomImageRedactorEngine()

## 2. Redacting from loaded DICOM image data

If you are already working with loaded DICOM data, the `.redact()` method is the most appropriate choice.

In [None]:
# Load in and process your DICOM file as needed
dicom_instance = pydicom.dcmread('sample_data/0_ORIGINAL.dcm')

In [None]:
#dicom_instance.PixelData

In [None]:
dicom_instance

In [None]:
dicom_instance.pixel_array.shape

In [None]:
type(dicom_instance.pixel_array)

In [None]:
dicom_instance.PatientName = "kkk"

In [None]:

plt.figure(figsize=(10,10))
plt.imshow(dicom_instance.pixel_array)

In [None]:
# Redact
redacted_dicom_instance = engine.redact(dicom_instance, fill="contrast")

In [None]:
redacted_dicom_instance

In [None]:

plt.figure(figsize=(10,10))
plt.imshow(redacted_dicom_instance.pixel_array)

In [None]:
# Alternatively, redact a loaded DICOM image and also return the redacted regions
redacted_dicom_image2, bboxes = engine.redact_and_return_bbox(dicom_instance, fill="contrast")

In [None]:
bboxes

### 2.2 Verify performance
Let's look at the original input and compare against the de-identified output.

In [None]:
compare_dicom_images(dicom_instance, redacted_dicom_instance)

We can also set the "fill" to match the background color to blend in more with the image.

In [None]:
redacted_dicom_instance_2 = engine.redact(dicom_instance, fill="background")
compare_dicom_images(dicom_instance, redacted_dicom_instance_2)

### 2.3 Adjust parameters
With the `use_metadata` parameter, we can toggle whether the DICOM metadata is used to augment the analyzer which determines which text to redact.

In [None]:
redacted_dicom_instance = engine.redact(dicom_instance, use_metadata=False) # default is use_metadata=True
compare_dicom_images(dicom_instance, redacted_dicom_instance)
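
One way to picture metadata augmentation is as a deny list: text values from the DICOM header (for example the patient name) are treated as known PHI, and any OCR word matching one of them is flagged. The block below is a conceptual sketch only, not Presidio's implementation. The helper names `build_deny_list` and `flag_words`, and the example values, are hypothetical.

```python
from typing import Dict, List, Set


def build_deny_list(metadata: Dict[str, str]) -> Set[str]:
    """Collect lowercase tokens from metadata values, splitting DICOM name parts on '^'."""
    tokens: Set[str] = set()
    for value in metadata.values():
        for part in value.replace("^", " ").split():
            tokens.add(part.lower())
    return tokens


def flag_words(ocr_words: List[str], deny_list: Set[str]) -> List[str]:
    """Return the OCR words that match a deny-list token (case-insensitive)."""
    return [w for w in ocr_words if w.lower() in deny_list]


# Hypothetical values, not taken from the sample data
metadata = {"PatientName": "DOE^JANE", "PatientID": "12345"}
ocr_words = ["JANE", "DOE", "CHEST", "PA", "12345"]
print(flag_words(ocr_words, build_deny_list(metadata)))
```

With `use_metadata=False`, no such header-derived hints are available, so detection depends entirely on what the analyzer recognizes in the OCR text.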

We can also return the bounding box information for the pixel regions that were redacted.

In [None]:
redacted_dicom_instance, bbox = engine.redact_and_return_bbox(dicom_instance)
compare_dicom_images(dicom_instance, redacted_dicom_instance)
print(f"Number of redacted regions: {len(bbox)}")
print(bbox)
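
The bounding boxes can be post-processed, for example to estimate how much of the image was covered. This is a rough sketch that assumes each box is a dict with `width` and `height` keys in pixels; check the printed output above to confirm the actual key names before relying on it. `redacted_area` is a hypothetical helper.

```python
from typing import Dict, List


def redacted_area(bboxes: List[Dict[str, int]]) -> int:
    """Sum width * height over all boxes (overlapping boxes are counted twice)."""
    return sum(b["width"] * b["height"] for b in bboxes)


# Hypothetical boxes; the key names are an assumption about the returned format
example = [
    {"top": 0, "left": 0, "width": 100, "height": 20},
    {"top": 40, "left": 10, "width": 50, "height": 10},
]
print(redacted_area(example))  # 2500
```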

## 3. Redacting from DICOM files
Before running the `DicomImageRedactorEngine` on files, decide where the input will come from and where the output will be written.

To protect against overwriting the original DICOM files, the `redact_from_file()` and `redact_from_directory()` methods will not run if the `output_dir` is a directory which already contains any content.
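
To avoid a failed run, the output location can be checked up front. The helper below is a small sketch using only `pathlib`; `is_safe_output_dir` is a hypothetical name, and treating a not-yet-existing path as safe is an assumption about what the engine accepts.

```python
from pathlib import Path


def is_safe_output_dir(output_dir: str) -> bool:
    """True if the path does not exist yet or is an empty directory."""
    path = Path(output_dir)
    if not path.exists():
        return True
    return path.is_dir() and not any(path.iterdir())


print(is_safe_output_dir("output/"))
```

If this returns `False`, pick a fresh directory or clear the old output before calling the engine.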

In [None]:
# Single DICOM (.dcm) file or directory containing DICOM files
input_path = 'sample_data/'

# Directory where the output will be written
output_parent_dir = 'output/'

### 3.1. Run de-identification
Use the `DicomImageRedactorEngine` class to process your DICOM images. If you have only one image to process and want to specify that directly instead of a directory, use `.redact_from_file()` instead of `.redact_from_directory()`.

In [None]:
# Redact text PHI from DICOM images
engine.redact_from_directory(
    input_dicom_path = input_path,
    output_dir = output_parent_dir,
    fill="contrast",
    save_bboxes=True # if True, saves the redacted region bounding box info to .json files in the output dir
)

Get file paths

In [None]:
# Original DICOM images
p = Path(input_path).glob("**/*.dcm")
original_files = [x for x in p if x.is_file()]

# Redacted DICOM images
p = Path(output_parent_dir).glob("**/*.dcm")
redacted_files = [x for x in p if x.is_file()]

Preview images

In [None]:
for original_path, redacted_path in zip(original_files, redacted_files):
    original_file = pydicom.dcmread(original_path)
    redacted_file = pydicom.dcmread(redacted_path)

    compare_dicom_images(original_file, redacted_file)
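
Pairing files by position assumes both lists contain the same files in the same order. A more defensive option is to pair them by file name. The sketch below assumes the redacted copies keep the original file names; `pair_by_name` and the example paths are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def pair_by_name(originals: List[Path], redacted: List[Path]) -> List[Tuple[Path, Path]]:
    """Pair each original with the redacted file that has the same file name."""
    redacted_by_name: Dict[str, Path] = {p.name: p for p in redacted}
    return [
        (orig, redacted_by_name[orig.name])
        for orig in sorted(originals)
        if orig.name in redacted_by_name
    ]


# Hypothetical paths for illustration
pairs = pair_by_name(
    [Path("sample_data/b.dcm"), Path("sample_data/a.dcm")],
    [Path("output/sample_data/a.dcm"), Path("output/sample_data/b.dcm")],
)
for original, redacted in pairs:
    print(original.name, "->", redacted)
```

Originals without a matching redacted file are skipped rather than mismatched.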

# DOTNET LINUX
```
wget https://download.visualstudio.microsoft.com/download/pr/4e3b04aa-c015-4e06-a42e-05f9f3c54ed2/74d1bb68e330eea13ecfc47f7cf9aeb7/dotnet-sdk-8.0.404-linux-x64.tar.gz

mkdir -p $HOME/dotnet && tar zxf dotnet-sdk-8.0.404-linux-x64.tar.gz -C $HOME/dotnet
export DOTNET_ROOT=$HOME/dotnet
export PATH=$PATH:$HOME/dotnet
```



In [None]:
# Load in and process your DICOM file as needed
dicom_instance = pydicom.dcmread('sample_data/0_ORIGINAL_anonymized.dcm')

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(dicom_instance.pixel_array)

In [None]:
dicom_instance