# DICOM introduction

### Working with DICOM is easy in Python

1. Import the tool that we will use to work with DICOM.

In [None]:
import pydicom

### Now we are good.

Let's load the DICOM file 'C:\Users\oli_n\Desktop\SynoWork\Presentations\R4 Python\DICOM\US.dcm'

In [None]:
ds = pydicom.dcmread('US.dcm')
# ds = pydicom.dcmread(r'D:\1.3.6.1.4.1.25403.154027277515425.16036.20210617022156.1\1.3.6.1.4.1.25403.154027277515425.16036.20210617022156.3.dcm')


### Easy. Let's explore.

Firstly, let's take the pixel data and make an image object using the Pillow (PIL) library.

In [None]:
from PIL import Image
dicom_img = Image.fromarray(ds.pixel_array)

#### Let's display the image.

In [None]:
dicom_img

#### Now let's have a look at the metadata

In [None]:
ds

#### One more thing: let's not forget the 'preamble', which also needs inspecting.

In [None]:
ds.preamble
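A DICOM file starts with a 128-byte preamble followed by the four characters `DICM`. The preamble is free-form, so it can carry data that an anonymiser never looks at. The sketch below shows one way to check it on raw bytes. The helper names `extract_preamble` and `preamble_is_blank` are hypothetical, and the demonstration uses synthetic bytes rather than a real file.

```python
def extract_preamble(header):
    """Return the 128-byte preamble from the first 132 bytes of a file,
    or None if the 'DICM' marker does not follow it."""
    if len(header) < 132 or header[128:132] != b"DICM":
        return None
    return header[:128]


def preamble_is_blank(preamble):
    """True when there is no preamble or every byte in it is zero."""
    if preamble is None:
        return True
    return not any(preamble)


# Demonstration on synthetic bytes (not a real DICOM file)
blank_header = bytes(128) + b"DICM"
hidden_header = b"secret".ljust(128, b"\x00") + b"DICM"

print(preamble_is_blank(extract_preamble(blank_header)))
print(preamble_is_blank(extract_preamble(hidden_header)))
```

In the notebook the same check could be applied directly with `preamble_is_blank(ds.preamble)`.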

## Let's Anonymise the DICOM.

Anonymise by individual tags.

In [None]:
ds.PatientName = 'Anonymouse'
ds.PatientID = 'MouseZero'

ds.StudyID = 'SEUS123456'
ds.InstitutionName = 'St Elsewhere'
ds.ReferringPhysicianName = 'Dr Dave'
ds.StationName = 'US Room'
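When the list of tags grows, a lookup table can be easier to maintain than one assignment per line. The following is a minimal sketch, not the method used in this notebook: `REPLACEMENTS` and `apply_replacements` are hypothetical names, and the function relies only on normal attribute access, which is how the assignments above address tags. Unlike the direct assignments, it skips tags that are not present, so it does not add new tags to the file.

```python
# Hypothetical lookup table: DICOM keyword -> replacement value
REPLACEMENTS = {
    "PatientName": "Anonymouse",
    "PatientID": "MouseZero",
    "StudyID": "SEUS123456",
    "InstitutionName": "St Elsewhere",
    "ReferringPhysicianName": "Dr Dave",
    "StationName": "US Room",
}


def apply_replacements(dataset, replacements):
    """Overwrite each listed attribute that exists; return the names changed."""
    changed = []
    for keyword, new_value in replacements.items():
        if hasattr(dataset, keyword):
            setattr(dataset, keyword, new_value)
            changed.append(keyword)
    return changed
```

In the notebook this would be called as `apply_replacements(ds, REPLACEMENTS)`, and the returned list shows which tags were actually present.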

### Explore by Value Representation (VR)

Each DICOM tag has a Value Representation (VR) that defines its content type and therefore the rules for what content is allowed.

The complete list of these and exact descriptions of content can be found here:

http://dicom.nema.org/dicom/2013/output/chtml/part05/sect_6.2.html

In [None]:

def person_name_callback(dataset, data_element):
    if data_element.VR == 'PN':
        print(f'Person Name Tag: {data_element.description()}: "{data_element.value}"')
        data_element.value = 'Anonymous person'

ds.walk(person_name_callback)


This is quite a powerful technique.

It can be altered to replace all names (VR == 'PN') or all dates (VR == 'DA') etc.
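As an illustration of the date variant, here is a sketch of a callback with the same `(dataset, data_element)` signature as the one above. The name `blank_dates_callback` and the placeholder date are assumptions. The DA representation stores dates as `YYYYMMDD`, so the replacement keeps that format.

```python
PLACEHOLDER_DATE = "19000101"  # assumed placeholder, in DA format YYYYMMDD


def blank_dates_callback(dataset, data_element):
    """Replace every date element (VR == 'DA') with a fixed placeholder."""
    if data_element.VR == "DA":
        data_element.value = PLACEHOLDER_DATE
```

It would be applied in the same way as the person-name callback: `ds.walk(blank_dates_callback)`.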

### Private Tags

These are NOT specified in the DICOM standard and are inserted by manufacturers for their own purposes.

- Standard tags have an EVEN numbered group.

- Private tags have an ODD numbered group.
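The even/odd rule is easy to check by hand. Tags are written as `(group, element)` in hexadecimal, so the group has to be read as a base-16 number first. A small sketch, with hypothetical helper names:

```python
def parse_tag(text):
    """Turn a tag written like '(0017, 0010)' into a (group, element) pair."""
    group, element = text.strip("() ").split(",")
    return int(group, 16), int(element, 16)


def is_private_group(group):
    """Private tags have an odd group number."""
    return group % 2 == 1


for tag_text in ["(0010, 0010)", "(0017, 0010)", "(07a3, 1018)"]:
    group, element = parse_tag(tag_text)
    print(tag_text, "private" if is_private_group(group) else "standard")
```

pydicom exposes the same test on its tag objects as `tag.is_private`.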


In [None]:
foot_ds = pydicom.dcmread('foot.dcm')

In [None]:
foot_ds

#### Note the private tags e.g.:

- (0017, 0010) Private Creator                     LO: 'ACME GruntMaster 9000'

and:

- (07a3, 1018) Private tag data                    ST: 'Mr John Smith has an ouchy foot. Trodden on by badger.'

#### Also note the PHI in an unexpected location:

- (0010, 2110) Allergies                           LO: 'NKDA. But John Smith (X123456789) has faecal incontinence with milk.'
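Free-text fields like this one are easy to miss when anonymising tag by tag. One rough way to catch them is to search every value for the identifiers you already know, before you overwrite them. The sketch below is an assumption-laden illustration: `find_identifier_leaks` is a hypothetical helper that works on plain `(name, value)` pairs and uses a simple case-insensitive substring match.

```python
def find_identifier_leaks(elements, identifiers):
    """Return (element name, identifier) pairs where a known identifier
    appears inside an element's value."""
    leaks = []
    for name, value in elements:
        text = str(value).lower()
        for identifier in identifiers:
            if identifier.lower() in text:
                leaks.append((name, identifier))
    return leaks


# Demonstration with values copied from the listing above
sample = [
    ("Allergies", "NKDA. But John Smith (X123456789) has faecal incontinence with milk."),
    ("Private Creator", "ACME GruntMaster 9000"),
]
print(find_identifier_leaks(sample, ["John Smith", "X123456789"]))
```

In the notebook the pairs could be built from the dataset itself, for example `[(e.name, e.value) for e in foot_ds.iterall()]`. A substring match will miss misspellings and reordered names, so treat it as a first pass only.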

#### Let's remove all those pesky private tags:

In [None]:

def remove_private_tags(dataset, data_element):
    # Private tags have an odd group number
    if data_element.tag.group % 2 != 0:
        del dataset[data_element.tag]

foot_ds.walk(remove_private_tags)


#### Alternatively, use pydicom's built-in method, which is easier.

In [None]:
foot_ds.remove_private_tags()

# Done!

## Save the anonymised DICOM


In [None]:
ds.save_as('anonymised.dcm')


### But there's more!



# Go the extra mile: the image still shows burned-in data (patient name, IC, etc.)

### Let's use Tesseract to do OCR and feed us the text details.

We will use a Python wrapper library called 'pytesseract' to interact with Tesseract.

For those who want to try this for themselves, you will need to install both Tesseract _and_ pytesseract.

Tesseract: https://github.com/tesseract-ocr/tesseract#installing-tesseract

PyTesseract: https://pypi.org/project/pytesseract/

In [None]:
from PIL import Image, ImageDraw
import pytesseract

# If you don't have the tesseract executable in your PATH, include the following line:
# This works in my windows installation but will need to be changed to match your setup.

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\oli_n\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'
#                                                  ^^^^^ This bit will probably need to be changed
#                                                        according to your system.
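To avoid editing the path by hand on every machine, one option is to look for the executable on the PATH first and only fall back to a hard-coded location when that fails. This is a sketch using the standard library's `shutil.which`. The function name `resolve_tesseract` is hypothetical.

```python
import shutil


def resolve_tesseract(fallback):
    """Return the tesseract executable found on PATH, or the fallback path."""
    found = shutil.which("tesseract")
    return found if found is not None else fallback
```

The result can then be assigned in place of the hard-coded string: `pytesseract.pytesseract.tesseract_cmd = resolve_tesseract(r'C:\path\to\tesseract.exe')`, where the fallback path is a placeholder for your own installation.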

In [None]:
# Structured output: a dict of lists (text, position, size, confidence)
txt_data = pytesseract.image_to_data(dicom_img, output_type='dict')

# Run it a second time with the default output, which prints as a readable table.
print(pytesseract.image_to_data(dicom_img))


### Now we know what the text is and where it is, we can look through the list and blank it all out.

We will use the Pillow (PIL) library to draw a rectangle of colour #800080 (which is somewhat purple) over each of the text fields.

### There are many ways to do this, but for ease of use I have made a single function that marks the text on a given image, based on the text data from Tesseract.


- Create the ImageDraw object so that the image can be edited

- Run through the list of text found

- Filter out the empty fields (Tesseract is good, but not perfect right out of the box)

- Highlight the text with a rectangle

If anyone is interested, Tesseract can be trained on new data to better recognise the fonts from US machines!


In [None]:
# Single function to mark the text: just provide the PIL image and the Tesseract text data
def mark_text(img, txt_data):
    # ImageDraw edits the image in place
    draw = ImageDraw.Draw(img)

    for i in range(len(txt_data['level'])):

        # Skip empty fields
        if txt_data['text'][i].strip() == '':
            continue
        #if float(txt_data['conf'][i]) < 50:
        #    continue

        left = txt_data['left'][i]
        right = left + txt_data['width'][i]
        top = txt_data['top'][i]
        bottom = top + txt_data['height'][i]
        txt = txt_data['text'][i]

        print(f'{i}: ({left},{top})-({right},{bottom})\t"{txt}"')

        # Cover the text with a filled rectangle, one pixel larger than the box
        shape = [(left, top), (right + 1, bottom + 1)]
        draw.rectangle(shape, fill="#800080", outline="#FF8822")


In [None]:
mark_text(dicom_img, txt_data)
dicom_img
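The commented-out confidence check in `mark_text` hints at a second filter: Tesseract reports a confidence value for each entry, and low values often correspond to noise rather than real text. The sketch below separates the filtering from the drawing so the effect of a threshold can be inspected first. `confident_boxes` and the sample data are hypothetical. The dictionary keys are the same ones `mark_text` reads, plus `conf`.

```python
def confident_boxes(txt_data, min_conf=50.0):
    """Return (left, top, right, bottom, text) for non-empty entries whose
    confidence is at least min_conf."""
    boxes = []
    for i, text in enumerate(txt_data["text"]):
        if text.strip() == "":
            continue
        if float(txt_data["conf"][i]) < min_conf:
            continue
        left = txt_data["left"][i]
        top = txt_data["top"][i]
        right = left + txt_data["width"][i]
        bottom = top + txt_data["height"][i]
        boxes.append((left, top, right, bottom, text))
    return boxes


# Hand-made sample in the same shape as the Tesseract dict output
sample_data = {
    "text": ["", "Smith", "x"],
    "conf": ["-1", "91", "12"],
    "left": [0, 10, 200],
    "top": [0, 20, 300],
    "width": [640, 50, 5],
    "height": [480, 15, 5],
}
print(confident_boxes(sample_data))
```

For anonymisation a threshold is a trade-off: a high value leaves fewer stray rectangles on the image, but it also risks leaving real text uncovered, which is presumably why the check is disabled in `mark_text`.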

In [None]:
# Start again from the original pixel data and convert to greyscale ('L' mode)
test_dicom = Image.fromarray(ds.pixel_array)
dicom_bw = test_dicom.convert('L')
dicom_bw

In [None]:
# Keep only pixels brighter than 75% of full scale; set everything else to black
THRESHOLD = 0.75 * 255

def pixelThreshold(intensity):
    if intensity > THRESHOLD:
        return intensity
#         return 255
    else:
        return 0

highpass_img = dicom_bw.point(pixelThreshold)

highpass_img

In [None]:
print(pytesseract.image_to_data(highpass_img))

highpass_txt_data = pytesseract.image_to_data(highpass_img, output_type='dict')


In [None]:
dicom_bw2 = highpass_img.convert('RGB')

mark_text(img = dicom_bw2, txt_data = highpass_txt_data)

In [None]:
dicom_bw2

In [None]:
# Combine both OCR passes on a copy of the image
dicom_bw3 = dicom_img.copy()

mark_text(img = dicom_bw3, txt_data = highpass_txt_data)

mark_text(img = dicom_bw3, txt_data = txt_data)

dicom_bw3

# Thank you!

The libraries used were:

- pydicom: https://pypi.org/project/pydicom/
- Pillow: https://pypi.org/project/Pillow/
- numpy: https://pypi.org/project/numpy/
- pyTesseract: https://pypi.org/project/pytesseract/
