# HIPAA Privacy Rule-based De-identification on DICOM Dataset

HIPAA provides two methods for de-identification: the "Safe Harbor" method and the "Expert Determination" method. The Safe Harbor method is more straightforward and involves anonymizing/redacting 18 specific types of identifiers from the data.

Here, we will focus on the Safe Harbor method, which includes removing or redacting identifiers such as names, geographic subdivisions smaller than a state, dates directly related to an individual, phone numbers, email addresses, and more.
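
As a small illustration of the date rule, Safe Harbor allows the year of a date tied to an individual to be kept, but not the month or day. The sketch below lists the 18 identifier categories (paraphrased) and generalizes a DICOM-style `YYYYMMDD` date to its year. The helper name `generalize_date` and the choice to pad with `0101` are assumptions made for illustration; they are not part of the `ProcessMedImage` class used later.

```python
import re

# The 18 Safe Harbor identifier categories, paraphrased from 45 CFR 164.514(b)(2).
SAFE_HARBOR_CATEGORIES = (
    "names",
    "geographic subdivisions smaller than a state",
    "dates (except year) directly related to an individual",
    "telephone numbers",
    "fax numbers",
    "email addresses",
    "Social Security numbers",
    "medical record numbers",
    "health plan beneficiary numbers",
    "account numbers",
    "certificate/license numbers",
    "vehicle identifiers and serial numbers",
    "device identifiers and serial numbers",
    "web URLs",
    "IP addresses",
    "biometric identifiers",
    "full-face photographs and comparable images",
    "any other unique identifying number, characteristic, or code",
)

_DA_PATTERN = re.compile(r"^(\d{4})(\d{2})(\d{2})$")


def generalize_date(da_value: str) -> str:
    """Keep only the year of a YYYYMMDD date; blank out anything unparseable."""
    match = _DA_PATTERN.match(da_value.strip())
    if match is None:
        return ""  # assumed policy: drop values that cannot be interpreted
    return f"{match.group(1)}0101"
```

Note that Safe Harbor also treats ages over 89 specially; this sketch does not handle that case.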

After de-identification, the DICOM file is updated, uploaded to destination storage, and evaluated with the AWS services Amazon Rekognition, Amazon Comprehend, and Amazon Comprehend Medical.

## Setup De-identification Environment

Let's start by setting up the environment for de-identifying a DICOM file:
1) Set the local path of the DICOM image folder.
2) Set the source and destination S3 buckets.
3) Set the source and destination prefixes for the DICOM file.
4) Clean up the de-identified DICOM directory and the evaluation DICOM directory.
5) Create an AWS session with a user profile name.
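
Step 4 does not appear explicitly in the cell that follows, and it may happen inside `ProcessMedImage`. If it has to be done by hand, one possible approach uses only the standard library. The helper name `reset_dir` is hypothetical, and the sketch assumes it is acceptable to delete everything under the output folders before each run.

```python
import shutil
from pathlib import Path


def reset_dir(path: str) -> Path:
    """Delete a directory tree if it exists, then recreate it empty."""
    target = Path(path)
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True, exist_ok=True)
    return target
```

It could be called once for the de-identified folder and once for the evaluation folder, for example `reset_dir('../images/med_de_id_img/')`.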

In [None]:
from med_img_de_id_class import ProcessMedImage
from common.utils import get_boto3_session, draw_img, dump_dict_to_tsv, dump_object_to_text
# setup environment
LOC_DICOM_FOLDER = '../images/med_phi_img/'
LOC_DE_ID_DICOM_FOLDER = '../images/med_de_id_img/'
LOC_EVAL_DICOM_FOLDER = '../images/med_eval_img/'
SOURCE_BUCKET = "de-id-src"
DESTINATION_BUCKET = "de-id-dst"
SOURCE_PREFIX = "dicom-images/"
DESTINATION_PREFIX = "de-id-dicom-images/"
EVAL_BUCKET = "de-id-evl"
EVAL_PREFIX = "eval-de-id-dicom-images/"

aws_session = get_boto3_session("esi")
rule_config_file_path = '../configs/de-id/de_id_rules_auto.yaml'

processor = ProcessMedImage(aws_session, rule_config_file_path)

## Parse DICOM Image

In [None]:
local_img_file = "1-053.dcm" 
# local_img_file ="RiveraMichael.dcm"
# local_img_file = "MeyerStephanie.dcm"
# local_img_file = "hefe.dcm"
# local_img_file = "TeFain.dcm"
# local_img_file = 'MartinChad-1-1.dcm'
# local_img_file = 'ScottKauf-Man.dcm'
# local_img_file = '1-043.dcm'
# local_img_file = 'lung-1-1.dcm'
# local_img_file = "DavidsonDouglas.dcm"
# local_img_file = "00002024.dcm"
# local_img_file = "00000044.dcm"
# local_img_file = "00000001.dcm"
# local_img_file = "00000027.dcm"
# build the local path and the S3 keys for the selected file
local_img_path = LOC_DICOM_FOLDER + local_img_file
src_key = SOURCE_PREFIX + local_img_file
dist_key = DESTINATION_PREFIX + local_img_file
result = processor.parse_dicom_file(None, None, local_img_path)
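
Before handing a file to the parser, it can help to confirm that it looks like a DICOM Part 10 file, which begins with a 128-byte preamble followed by the four bytes `DICM`. The helper `looks_like_dicom` is an illustrative assumption and not part of `ProcessMedImage`; some legacy DICOM files omit the preamble and would fail this check.

```python
def looks_like_dicom(path: str) -> bool:
    """Return True if the file carries the 'DICM' marker at byte offset 128."""
    with open(path, "rb") as handle:
        header = handle.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"
```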

## De-Identification in Metadata of DICOM

In [None]:
# dump tags before De-id
dump_object_to_text(processor.ds, '../temp/ds_before_de_id.txt')
processor.de_identify_dicom()
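
The internals of `de_identify_dicom()` and the schema of `de_id_rules_auto.yaml` are not shown in this notebook. As a minimal sketch of rule-driven metadata redaction in general, assume the tags are held in a plain dictionary keyed by DICOM keyword and that each rule names one of three hypothetical actions: `remove`, `replace`, or `keep`.

```python
from typing import Dict

# Hypothetical rules; the real config file may use a different structure.
EXAMPLE_RULES: Dict[str, Dict[str, str]] = {
    "PatientName": {"action": "replace", "value": "ANONYMOUS"},
    "PatientBirthDate": {"action": "remove"},
    "PatientAddress": {"action": "remove"},
    "Modality": {"action": "keep"},
}


def apply_rules(tags: Dict[str, str],
                rules: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return a redacted copy of `tags`; tags without a rule pass through unchanged."""
    redacted = {}
    for keyword, value in tags.items():
        rule = rules.get(keyword, {"action": "keep"})
        if rule["action"] == "remove":
            continue  # drop the tag entirely
        if rule["action"] == "replace":
            redacted[keyword] = rule.get("value", "")
        else:
            redacted[keyword] = value
    return redacted
```

Passing unknown tags through is a permissive default chosen to keep the sketch short. A stricter policy would drop any tag that has no rule.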

## Draw DICOM Image Before De-identification

In [None]:
# show med image before de-identification
draw_img(processor.image_data)

## De-identification in Pixel Data of DICOM

In [None]:
# detect identifying text in the image pixels
id_text_detected, text_detected = processor.detect_id_in_img(None, None)
if id_text_detected:
    print(f'Sensitive text detected in {local_img_file}')
    print(id_text_detected)
    processor.redact_id_in_image(id_text_detected)
    print('Sensitive text in the image has been redacted')
else:
    print(f'No sensitive text detected in {local_img_file}')
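
Amazon Rekognition's `DetectText` reports each detection with a `BoundingBox` whose `Left`, `Top`, `Width`, and `Height` are ratios of the overall image size. How `redact_id_in_image` uses those boxes is not shown; the sketch below is one assumed approach that converts a ratio box to pixel coordinates and fills it with a constant value (rather than blurring it). It works on a list of rows instead of a NumPy array so that it has no dependencies.

```python
import math
from typing import Dict, List


def redact_box(pixels: List[List[int]], box: Dict[str, float], fill: int = 0) -> None:
    """Overwrite, in place, the pixels covered by a ratio-based bounding box."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Round outward so partially covered pixels are redacted as well.
    left = max(0, math.floor(box["Left"] * width))
    top = max(0, math.floor(box["Top"] * height))
    right = min(width, math.ceil((box["Left"] + box["Width"]) * width))
    bottom = min(height, math.ceil((box["Top"] + box["Height"]) * height))
    for row in range(top, bottom):
        for col in range(left, right):
            pixels[row][col] = fill
```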


## Update the DICOM with Redacted Metadata and Blurred Sensitive Text in the Image

In [None]:
import os
# mirror the patient/study/series hierarchy under the de-identified output folder
local_de_id_dicom_dir = f"{LOC_DE_ID_DICOM_FOLDER}test/{processor.patient_id}/{processor.studyInstanceUID}/{processor.seriesInstanceUID}/"
os.makedirs(local_de_id_dicom_dir, exist_ok=True)
local_de_id_dicom = os.path.join(local_de_id_dicom_dir, local_img_file)
processor.save_de_id_dicom(local_de_id_dicom)

## Parse Redacted DICOM for Evaluation with Amazon Comprehend and Comprehend Medical

In [None]:
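# swap only the file extension; str.replace("dcm", "png") would also alter "dcm" inside the name
src_key = DESTINATION_PREFIX + os.path.splitext(local_img_file)[0] + ".png"
dist_key = EVAL_PREFIX + local_img_file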
result = processor.parse_dicom_file(DESTINATION_BUCKET, src_key, local_de_id_dicom, True)

## Show Image and Metadata in De-id DICOM File Before Evaluation

In [None]:
# dump tags after de-identification
dump_object_to_text(processor.ds, '../temp/ds_after_de_id.txt')
# show med image after de-identification
draw_img(processor.image_data)

## Evaluate Redacted DICOM Metadata

In [None]:
from common.utils import dump_dict_to_tsv, get_date_time
detected_elements, tags, ids = processor.detect_id_in_tags()
if ids:
    # print("Found PII/PHI in redacted DICOM: ", ids)
    # create an evaluation report pairing each tag with the PHI detected in it
    eval_dict_list = [{"tag": tag, "Detected PHI": phi} for tag, phi in zip(tags, ids)]
    dump_dict_to_tsv(eval_dict_list, f"../output/report/tags_de_id_evaluation_report_{get_date_time()}.tsv")
    # redact remaining PHI
    processor.redact_tags(detected_elements)
    print("Remaining PHI in the de-identified DICOM metadata has been redacted.")
else:
    print("No PII/PHI found in redacted DICOM")
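
`dump_dict_to_tsv` comes from the project's `common.utils` module, and its implementation is not shown here. A comparable helper could be written with the standard `csv` module. The function name `rows_to_tsv`, returning a string instead of writing a file, and taking the column names from the first row are all assumptions of this sketch.

```python
import csv
import io
from typing import Dict, List


def rows_to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialize a list of same-keyed dictionaries as tab-separated text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```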

## Evaluate Redacted DICOM Pixel Data

In [None]:
# re-check the pixel data after redaction
id_text_detected, text_detected = processor.detect_id_in_img(DESTINATION_BUCKET, src_key, True)
if id_text_detected:
    print(f'Sensitive text detected in pixel data of {local_de_id_dicom}')
    dump_dict_to_tsv(id_text_detected, f"../output/report/img_de_id_evaluation_report_{get_date_time()}.tsv")
    print(id_text_detected)
    processor.redact_id_in_image(id_text_detected)
else:
    print(f'No sensitive text detected in {local_de_id_dicom}')
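
Amazon Comprehend's `DetectPiiEntities` returns entities with a `Type`, a confidence `Score`, and `BeginOffset`/`EndOffset` positions into the submitted text. How `detect_id_in_img` consumes that response is not shown. As a sketch, the hypothetical helper below keeps the entities at or above an assumed threshold of 0.5 and recovers the matched text from the offsets.

```python
from typing import Any, Dict, List


def confident_entities(text: str, response: Dict[str, Any],
                       min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Return type, score, and matched substring for each entity at or above min_score."""
    kept = []
    for entity in response.get("Entities", []):
        if entity["Score"] >= min_score:
            kept.append({
                "Type": entity["Type"],
                "Score": entity["Score"],
                "Text": text[entity["BeginOffset"]:entity["EndOffset"]],
            })
    return kept
```

The right threshold depends on how costly a missed identifier is compared with an unnecessary redaction; for de-identification a low threshold is usually the safer choice.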

## Update Evaluated DICOM File if Remaining PHI Was Detected and Redacted

In [None]:
local_eval_dicom_dir = f"{LOC_EVAL_DICOM_FOLDER}test/{processor.patient_id}/{processor.studyInstanceUID}/{processor.seriesInstanceUID}/"
os.makedirs(local_eval_dicom_dir, exist_ok=True)
# save into the patient/study/series directory created above
local_eval_dicom = os.path.join(local_eval_dicom_dir, local_img_file)
processor.save_de_id_dicom(local_eval_dicom)
if ids or id_text_detected:
    # show evaluated dicom
    result = processor.parse_dicom_file(None, None, local_eval_dicom)
    draw_img(processor.image_data)
    dump_object_to_text(processor.ds, '../temp/ds_after_eval.txt')

## Self-learning: Update Rules for Detecting PHI/PII in DICOM Files

In [None]:
if ids and len(ids) > 0:
    processor.update_rules_in_configs(rule_config_file_path)

# close the processor
processor.close()
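
What `update_rules_in_configs` writes to the config file is not shown. One plausible reading of "self-learning" is that tags in which the evaluation step still found PHI are added to the rule set, so that later runs redact them up front. The sketch below illustrates that idea on an in-memory dictionary; the rule structure and the function name `learn_rules` are hypothetical.

```python
from typing import Dict, Iterable


def learn_rules(rules: Dict[str, Dict[str, str]],
                leaked_tags: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Return a copy of `rules` with a 'remove' rule added for each newly leaked tag."""
    updated = dict(rules)
    for keyword in leaked_tags:
        # Existing rules are left alone so manual choices are not overwritten.
        updated.setdefault(keyword, {"action": "remove"})
    return updated
```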