Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: copy the RSNA anonymizer protocol #206

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cfhammill
Copy link

Description

I've added to the deid/data/deid.dicom protocol
a large number of header removal and small number of blanks.
These header removals/blanks were extracted from the 2022-03-08
version of the RSNA anonymizer software
http://mirc.rsna.org/download/Anonymizer-installer.jar

The jar file was unzipped, the unzipped jar has a file
dicom-anonymizer.script with xml deidentification directives.
all @remove and @empty commands were extracted from the file
using grep, and transformed into the corresponding REMOVE and
BLANK lines using sed, copied in to a slightly modified
version of deid.dicom I've been modifying for work. The file was
post-processed using R to remove duplicated REMOVE directives.

The Anonymizer code is distributed under the RSNA public license
http://mirc.rsna.org/rsnapubliclicense.pdf which is I believe to be
compatible with the MIT license used in deid. So this derivative
work should satisfy the terms of that license to be included in
deid.

I've also updated the ADD directives to generate new UIDs as per
the example in the docs, and to copy back in the SOPClassUID field.
These can be removed as they may be too specialized to my use case.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

Open questions

Should the UID generating directives be removed?

@cfhammill
Copy link
Author

I've updated the PR to include just the deidentification directives from the RSNA anonymizer script, I added additional things that weren't in the first version of the PR and updated the commit message to reflect this. The most important changed:

I've also pulled tag processing functions @all @integer @round and @hashdate. These were added as additional REMOVEs, and then ADDed back, dates and UIDs get a func: processing lookup, key identifiers like PatientID get var: exact lookups. Please have a look when you get a chance.

Processing directives from the 2022-03-08 version of the
RSNA anonymizer software http://mirc.rsna.org/download/Anonymizer-installer.jar

The jar file was unzipped, the unzipped jar has a file
dicom-anonymizer.script with xml deidentification directives
all `@remove` and `@empty` commands were extracted from the file
using `grep`, and transformed into the corresponding REMOVE and
BLANK lines using `sed`.

Further, lines with @hashdate, @round, and @integer were extracted
these correspond to dates and uids. The tags were REMOVEd and a
ADD tag func: lookup was added for processing dates and UIDs.

@ALL directives correspond to key identifiers, those were REMOVEd
then ADDed as var: lookup.

The Anonymizer code is distributed under the RSNA public license
http://mirc.rsna.org/rsnapubliclicense.pdf which is I believe to be
compatible with the MIT license used in deid. So this derivative
work should satisfy the terms of that license to be included in
deid.
ADD PatientName var:patient_name
ADD PatientID var:patient_id
ADD PatientAge var:patient_age

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove empty space at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about adding func:process_uid here? This function won't exist for most users. We do have plan to support deid provided functions (discussion was here) #203 but no implementation yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to remove the trailing empty lines. I think having the functions make sense provided the tags are removed first, at least with var: a missing lookup meant the tag was not added back to the file, I think that behaviour is ok for these. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add functions to a config file that a user is not expecting to be there. Perhaps you can test commenting them out, and then we can re-add when #207 is implemented and we can use a function there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't quite understand, is this a difference between func: and var:? Because deid.dicom includes var:'s that are unexpected when working from within python (I haven't used the command line scripts). But I'm happy to comment them out anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Var and func both won't work without the user providing functions and variables in the lookup. E.g., I would need to define that function or a variable of a particular name and pass it to the function to perform the parsing. We generally want these core provided recipes to not require that extra customization - e.g., most people are going to try it and be surprised that there is an extra function or variable dependency!

However, if / when we have these deid provided functions, we will easily be able to add them to the defaults. Does that make sense? Some of the documentation examples of using func/var might help.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean they won't work - it appears to me that the default behaviour is to leave the tag unmodified, which is why for each ADD tag var: and ADD tag func: I preceded them with REMOVE tag, that way if the function or variable isn't supplied the tag is removed. Quick example:

import os
import tempfile
from pydicom.dataset import FileDataset, FileMetaDataset
from pydicom.uid import UID

# Create some temporary filenames
suffix = '.dcm'
filename_little_endian = tempfile.NamedTemporaryFile(suffix=suffix).name
filename_big_endian = tempfile.NamedTemporaryFile(suffix=suffix).name

print("Setting file meta information...")
# Populate required values for file meta information
file_meta = FileMetaDataset()
file_meta.MediaStorageSOPClassUID = UID('1.2.840.10008.5.1.4.1.1.2')
file_meta.MediaStorageSOPInstanceUID = UID("1.2.3")
file_meta.ImplementationClassUID = UID("1.2.3.4")

print("Setting dataset values...")
# Create the FileDataset instance (initially no data elements, but file_meta
# supplied)
ds = FileDataset(filename_little_endian, {},
                 file_meta=file_meta, preamble=b"\0" * 128)

ds.PatientAge = "206"

from deid.config import DeidRecipe
from deid.dicom.parser import DicomParser
recipe = DeidRecipe()
recipe.deid["header"].append({'action' : 'ADD', 'field' : 'PatientAge', 'value' : 'var:new_age'})
dparser = DicomParser(ds, recipe = recipe)

dparser.parse()
ds

Retains the PatientAge of 206 and does not error.

recipe2 = DeidRecipe()
recipe2.deid["header"].append({'action' : 'REMOVE', 'field' : 'PatientAge'})
recipe2.deid["header"].append({'action' : 'ADD', 'field' : 'PatientAge', 'value' : 'var:new_age'})

dparser = DicomParser(ds, recipe = recipe2)
dparser.parse()

ds

shows the PatientAge field deleted

and finally

dparser.lookup["new_age"] = 102
dparser.parse()

ds 

shows the modified age of 102

But your library your choice! I'm happy to comment those actions out until #207 is complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I can be more clear - it will run but it will emit this error message:

if value_option not in item:
bot.warning("%s not found in item lookup." % (value_option))
return None
.

Hmm. So I guess I don't feel very strongly about this, so if your preference is to leave them, that's okay with me. But likely we will need to refactor a bit if/when we add default functions (do you agree we should provide some default to plug in to this recipe?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh interesting, I've never seen those warnings when working in the python repl!

Yes I think including defaults would be helpful, autogenerating new UIDs is a particularly appealing example. The RSNA anonymizer hashes the date, we could provide functionality like that as well if it was desired. I was just trying to stick to the spirit of their recipe.

Copy link
Member

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay so I'm okay if you want to leave the func/var lines. A few more tweaks before this is ready to merge:

  1. please bump the version in deid/version.py
  2. add a corresponding note to the CHANGELOG.md
  3. find a good place in the docs to mention this template. Perhaps on this page after the last section, provide a simple list of recipes provided (and description?) E.g.:

We provide a set of default and pre-propulated recpies for you to use in the deid/data directory, including:

  • deid.dicom: A default recipe that will make a best effort to provide header and image pixel filters based on experience and CTP.
  • deid.dicom.ultrasound: intended for ultrasound
  • deid.dicom.xray.chest: intended for chest xray
  • [deid.dicom.rsna.anon](link here): your description

The above also has me thinking we might want to name the file according to the others, with deid.dicom as the first part to better identify it's purpose, and then perhaps remove the year so it's not dated? We can always derive the last update date from git.

@cfhammill
Copy link
Author

Thanks @vsoch, I will make the changes suggested above, as soon as I get a chance.

@vsoch
Copy link
Member

vsoch commented Apr 12, 2022

Sounds good! I'm giving the other folks mentioned in the issue a little bit to respond, and if they don't it probably means they don't have bandwidth and I'll make some time in a coming evening / weekend to implement these deid-provided functions.

@vsoch
Copy link
Member

vsoch commented Apr 14, 2022

@cfhammill please see #208 - when this is merged you will be able to provide a func: with a deid defined function and it will work without a hitch.

@vsoch
Copy link
Member

vsoch commented Oct 8, 2022

@cfhammill are you still interested in contributing this recipe? We've since added deid-provided replace uid functions (deid_func:<name>), so it would be feasible to do here: https://pydicom.github.io/deid/user-docs/recipe-funcs/

@cfhammill
Copy link
Author

yes I am still interested in contributing, thanks @vsoch. I'll try to find time in the next couple weeks to make the relevant changes.

@vsoch
Copy link
Member

vsoch commented Oct 11, 2022

Sounds good! Thank you!

@cclauss
Copy link

cclauss commented Mar 12, 2023

Status?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants