Fix for #165 - Ultrasound Pixel Deidentification #166

wetzelj · 2020-11-24T16:21:47Z

Description

Related issues: #165

Modified pixel cleaning to accommodate various dimensions of stored pixels encountered when cleaning ultrasounds. Depending on the study, the pixel data array may be:

2D - shape = (X, Y) - Greyscale image
3D - shape = (frames, X, Y) - Greyscale cine clip
3D - shape = (X, Y, channel) - RGB image
4D - shape = (frames, X, Y, channel) - RGB cine clip

This commit includes a fix as well as tests for each of the above conditions.

Checklist

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
My code follows the style guidelines of this project

Open questions

The pixel data in the test images that I've uploaded with this PR have not been modified from the way they were sent from the modality. It was a struggle to find a RGB cine clip that was under the 100Mb file size limitation imposed by github. While the sample I provided was 99.6Mb (as reported by Win10), github detected this as >100Mb. To circumvent this, I added code to utils.py - get_files to detect and decompress files. It's not great, however, I'd like to open an issue to tackle test image storage as a future effort. It might be beneficial to compress everything for upload and decompress the images prior to use in the tests. With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art. Given that these images may not contain any body part and are tracked only to give back to the employee for use in creating their art work, they are not in any way tied to any patient or demographics. If you think this data set would be beneficial, I can start to work on the logistics and approvals for sharing this data. Currently the majority of the images are XR.

Corrected bugs with "all" keyword. Added additional test methods.

Adding new images to test the variant dimensions that can be encountered when deidentifying pixel data. Introduced the ability to handle zipped test files to bypass the github 100Mb file size limit for the RGB cine clip. Renamed the test file for the pixel dimension tests.

vsoch · 2020-11-24T17:28:04Z

With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art.

I really like this idea. And actually, we can fairly easy separate the small example data (e.g., dicom cookies) from those that are just used for testing, which arguably shouldn't even be stored in the repository because they are so large. I think what would make sense to do is follow the convention of pydicom-data, and even first ask if there is room to accept files for deid here to that library (so pydicom can have a single, shared data library). @darcymason would this be something we could consider? If we want to keep the libraries separate, and you'd want to make a new dataset not just for deid but for other dicom enthusiasts then we could also have it branded differently and then used by the library here. Either way, we could definitely have a function to decompress after install.

wetzelj · 2020-12-01T15:16:56Z

Hey @vsoch, would you object to merging this PR to fix the ultrasound issue in advance of the potential changes for the test images?

vsoch · 2020-12-01T16:49:51Z

Is there a rush? And just to clarify - the cine clips added are all under 10MB?

I think my preference would be to figure out how to provide the images externally (and retrieve on test) - and we could even do that simply with putting them in a separate repository, and then downloading. My worry is that I would merge, and then you wouldn't follow up on doing that.

wetzelj · 2020-12-01T19:39:26Z

The rush is just that we're trying to deidentify a specific ultrasound which caused us to find this error. Obviously I can deploy from my fork to get around this, but the bug in question is really an extension of PR #134 and I'd like to fix this for other users as well.

The files that I've added are all under 10Mb - the two cine clips are compressed, uploaded as .zip files (the uncompressed dcm files are not <10Mb). I introduced a change into utils.py to allow for test files to be uploaded as .zip and decompressed into the temp directory for testing. This was done simply to enable the bug fix to be implemented with tests. If you would prefer, I can simply remove the cine clip tests from this commit.

I'm more than happy to work on the changes for the test files, but before I go down that path I want to ensure that I completely understand how you would want this to be implemented. Once the test file repository is established I would be happy to modify the tests to utilize it, but ultimately don't feel that a bugfix should be held up by a refactoring of the unit tests.

Please understand, with almost every change I've requested and or corrected, I've tried to implement corresponding tests to improve the project as a whole - but at the same time, my first priority has to be to my end user requests and satisfying their requirements and requests.

vsoch · 2020-12-01T20:03:36Z

Please understand, with almost every change I've requested and or corrected, I've tried to implement corresponding tests to improve the project as a whole - but at the same time, my first priority has to be to my end user requests and satisfying their requirements and requests.

I totally understand, you're completely fantastic @wetzelj ! If the images were still 100MB I would have larger issue, but since they are closer to 10MB I'd be happy to merge with the current fixes, because it puts the library in a better place. Thanks for clarifying, stay tuned!

vsoch · 2020-12-01T20:06:38Z

https://pypi.org/project/deid/0.2.24/

And for the scenario I have in mind, I am thinking something akin to what pydicom does. It looks like they have a data manager class and then the exact urls.

wetzelj added 3 commits November 18, 2020 12:04

Correcting issue with deidentifying pixels on RGB images

e7ae887

Correcting issue with deidentifying pixels on cine clips

a617d0b

Corrected bugs with "all" keyword. Added additional test methods.

vsoch approved these changes Dec 1, 2020

View reviewed changes

vsoch merged commit 1782354 into pydicom:master Dec 1, 2020

This was referenced Dec 7, 2020

Ultrasound Deidentification - Pixel Masking Failure #165

Closed

Refactor Unit Tests to utilize separate data store for sample images #167

Closed

wetzelj mentioned this pull request Sep 29, 2022

US data corrupted by pixel anonymization #228

Closed

wetzelj deleted the fix/rgb-ultrasound branch November 21, 2022 16:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for #165 - Ultrasound Pixel Deidentification #166

Fix for #165 - Ultrasound Pixel Deidentification #166

wetzelj commented Nov 24, 2020

vsoch commented Nov 24, 2020

wetzelj commented Dec 1, 2020

vsoch commented Dec 1, 2020

wetzelj commented Dec 1, 2020

vsoch commented Dec 1, 2020

vsoch commented Dec 1, 2020

Fix for #165 - Ultrasound Pixel Deidentification #166

Fix for #165 - Ultrasound Pixel Deidentification #166

Conversation

wetzelj commented Nov 24, 2020

Description

Checklist

Open questions

vsoch commented Nov 24, 2020

wetzelj commented Dec 1, 2020

vsoch commented Dec 1, 2020

wetzelj commented Dec 1, 2020

vsoch commented Dec 1, 2020

vsoch commented Dec 1, 2020