Applying whitelist and blacklist filters #33

Closed
BrianKolowitz opened this issue Dec 22, 2017 · 4 comments

@BrianKolowitz
Contributor

Hi, I have a question regarding the filter section of my config and my source code. In my configuration https://github.com/BrianKolowitz/deid/blob/development/my_examples/deid/deid.dicom I specify a whitelist:

%filter whitelist

LABEL Xray
  contains Modality CR|DX

In my code https://github.com/BrianKolowitz/deid/blob/development/my_examples/dicom/my_deid.py I pass that configuration to replace_identifiers:

cleaned_files = replace_identifiers(dicom_files=dicom_files,
                                    ids=updated_ids,
                                    deid=deid,
                                    config=config_file_path,
                                    remove_private=True,
                                    output_folder=output_path)

but I see images with modalities PR and RG in my output_folder.

Is this a bug or am I not properly using the library?

@vsoch
Member

vsoch commented Dec 22, 2017

hey @BrianKolowitz ! The library is agnostic to the names of the filters, actually. The ones we are using (whitelist, blacklist, greylist) correspond with different actions that we implement. For example:

  • whitelist: we pass it through, it likely doesn't have burned in pixels
  • blacklist: definitely has burned in pixels, quarantine
  • greylist: not sure, a human needs to look over and possibly customize filter

When you run replace_identifiers as above, the filtering criteria are represented in that deid recipe, but replace_identifiers itself doesn't look at any kind of filtering. You would have already done the filtering and removed the files you didn't want from your list of dicom_files. Actually, I made an object called a DicomCleaner for just this task (as opposed to handing stuff around to different functions).

So what you want to do then is a workflow that looks like:

  1. start with raw dicom files, and design your filters (looks like you are done here!)
  2. Then you will want to create a DicomCleaner and run it with your input list of dicoms, specifically the "detect" function. That looks like:

from deid.dicom import DicomCleaner

# Here is some dummy file you have to test
dicom_file = "example.dcm"

# Create the cleaner
client = DicomCleaner()

# If you intend to run cleaning, you can provide an output folder. Otherwise just skip this
client = DicomCleaner(output_folder='/home/vanessa/Desktop')

Running detect is just handing the file to the cleaner client. This is likely the extent of how you will want to use the cleaner. The output is a data structure with the result; detect means we take your deid recipe and parse the headers looking for matches to the filters.

client.detect(dicom_file)

{'flagged': True,
'results': [{'coordinates': [],
   'group': 'blacklist',
   'reason': ' ImageType missing  or ImageType empty '}]}

Then you could parse that data structure and deal with the files appropriately, and the ones that you want to continue processing could go into replace_identifiers. I haven't tested this fully yet, but we also have some (very basic) functions to perform a cleaning, and they depend on having known coordinates for PHI based on modality / image type, etc. That would look like this (after detect):

client.clean()

# And then there are a few saving functions (dcm and png)
client.save_png()
client.save_dicom()

If there are coordinates, they are blanked; otherwise there is no change. This is again reliant on how good your list is. Much better would be an OCR method, which I started, but it needs more testing and development. If you are interested: https://github.com/pydicom/dicom-scraper
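To make the "detect first, then replace_identifiers" step concrete, here is a minimal sketch of splitting a file list based on the detect result. It only relies on the 'flagged', 'results' and 'group' keys shown in the output above; the helper name partition_by_detect is hypothetical (not part of deid), and I am assuming an unflagged file returns a falsy 'flagged' value.

```python
def partition_by_detect(dicom_files, detect):
    """Split files into those that pass and those that a filter flagged.

    `detect` is any callable that takes a file path and returns a dict
    shaped like the DicomCleaner.detect output shown in this thread.
    This helper is a hypothetical illustration, not a deid function.
    """
    passed = []
    flagged = {}
    for path in dicom_files:
        result = detect(path)
        if result.get("flagged"):
            # Record which filter groups (e.g. "blacklist") matched this file
            groups = {r.get("group", "unknown") for r in result.get("results", [])}
            flagged[path] = sorted(groups)
        else:
            passed.append(path)
    return passed, flagged
```

With a cleaner client in hand this would be called as partition_by_detect(dicom_files, client.detect), and only the passed list would then go into replace_identifiers.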

A full example script is here:
https://github.com/pydicom/deid/blob/master/examples/dicom/pixels/run-cleaner-client.py

If you step through this and want to write up some docs for the (web friendly / readable) version it would be greatly appreciated! I wrote them into that script but didn't pass on the knowledge to the docs yet. Let me know if you have other questions.

@BrianKolowitz
Contributor Author

Thanks. Is there a way to accomplish this in one line?

FORMAT dicom
%filter whitelist
LABEL Xray
contains Modality CR|DX

%filter blacklist
LABEL Not Xray
equals Modality PR|RG

I'd like to specify something like this

%filter blacklist
LABEL Not Xray
notin Modality [CR,DX]

so I don't have to be exhaustive in the modalities I list

@vsoch
Member

vsoch commented Dec 22, 2017

Could you write out (in people terms) what you are trying to do? Basically "not in Modality CR or DX"? There should be a notequals, so like:

%filter blacklist
LABEL Not Xray
notequals Modality CR|DX

The whole list of filters is:

contains
notcontains (looks like there is a bug in filters.py for this, will fix soon!)
equals
notequals
missing
present
empty

The stuff on the right side is all regular expressions, so whatever regular expression string you might use is fair game! If there is a filter that you think would be useful to add, let's add it!
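As a rough illustration of how a regular expression like CR|DX behaves for these filters, here is a sketch using plain Python re. This is not the code in filters.py: I am assuming that contains means "the pattern matches somewhere in the value" and equals means "the pattern matches the whole value", and the function names below are just stand-ins.

```python
import re


def contains(value, pattern):
    # Assumption: "contains" is a search anywhere in the header value
    return re.search(pattern, value) is not None


def equals(value, pattern):
    # Assumption: "equals" requires the whole header value to match
    return re.fullmatch(pattern, value) is not None


def notequals(value, pattern):
    return not equals(value, pattern)


# Which modalities would a "notequals Modality CR|DX" filter flag?
for modality in ["CR", "DX", "PR", "RG"]:
    print(modality, notequals(modality, "CR|DX"))
# CR False
# DX False
# PR True
# RG True
```

Under those assumptions a single notequals line covers every modality other than CR and DX, so there is no need to list PR, RG, and so on one by one.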

@BrianKolowitz
Contributor Author

I think it's fine for my current needs. I'm trying to include only CR or DX.
