Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #165 - Ultrasound Pixel Deidentification #166

Merged
merged 3 commits into from
Dec 1, 2020

Conversation

wetzelj
Copy link
Contributor

@wetzelj wetzelj commented Nov 24, 2020

Description

Related issues: #165

Modified pixel cleaning to accommodate various dimensions of stored pixels encountered when cleaning ultrasounds. Depending on the study, the pixel data array may be:

  • 2D - shape = (X, Y) - Greyscale image
  • 3D - shape = (frames, X, Y) - Greyscale cine clip
  • 3D - shape = (X, Y, channel) - RGB image
  • 4D - shape = (frames, X, Y, channel) - RGB cine clip

This commit includes a fix as well as tests for each of the above conditions.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • My code follows the style guidelines of this project

Open questions

The pixel data in the test images that I've uploaded with this PR have not been modified from the way they were sent from the modality. It was a struggle to find a RGB cine clip that was under the 100Mb file size limitation imposed by github. While the sample I provided was 99.6Mb (as reported by Win10), github detected this as >100Mb. To circumvent this, I added code to utils.py - get_files to detect and decompress files. It's not great, however, I'd like to open an issue to tackle test image storage as a future effort. It might be beneficial to compress everything for upload and decompress the images prior to use in the tests. With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art. Given that these images may not contain any body part and are tracked only to give back to the employee for use in creating their art work, they are not in any way tied to any patient or demographics. If you think this data set would be beneficial, I can start to work on the logistics and approvals for sharing this data. Currently the majority of the images are XR.

Corrected bugs with "all" keyword.  Added additional test methods.
Adding new images to test the variant dimensions that can be encountered when deidentifying pixel data.  Introduced the ability to handle zipped test files to bypass the github 100Mb file size limit for the RGB cine clip.  Renamed the test file for the pixel dimension tests.
@vsoch
Copy link
Member

vsoch commented Nov 24, 2020

With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art.

I really like this idea. And actually, we can fairly easy separate the small example data (e.g., dicom cookies) from those that are just used for testing, which arguably shouldn't even be stored in the repository because they are so large. I think what would make sense to do is follow the convention of pydicom-data, and even first ask if there is room to accept files for deid here to that library (so pydicom can have a single, shared data library). @darcymason would this be something we could consider? If we want to keep the libraries separate, and you'd want to make a new dataset not just for deid but for other dicom enthusiasts then we could also have it branded differently and then used by the library here. Either way, we could definitely have a function to decompress after install.

@wetzelj
Copy link
Contributor Author

wetzelj commented Dec 1, 2020

Hey @vsoch, would you object to merging this PR to fix the ultrasound issue in advance of the potential changes for the test images?

@vsoch
Copy link
Member

vsoch commented Dec 1, 2020

Is there a rush? And just to clarify - the cine clips added are all under 10MB?

I think my preference would be to figure out how to provide the images externally (and retrieve on test) - and we could even do that simply with putting them in a separate repository, and then downloading. My worry is that I would merge, and then you wouldn't follow up on doing that.

@wetzelj
Copy link
Contributor Author

wetzelj commented Dec 1, 2020

The rush is just that we're trying to deidentify a specific ultrasound which caused us to find this error. Obviously I can deploy from my fork to get around this, but the bug in question is really an extension of PR #134 and I'd like to fix this for other users as well.

The files that I've added are all under 10Mb - the two cine clips are compressed, uploaded as .zip files (the uncompressed dcm files are not <10Mb). I introduced a change into utils.py to allow for test files to be uploaded as .zip and decompressed into the temp directory for testing. This was done simply to enable the bug fix to be implemented with tests. If you would prefer, I can simply remove the cine clip tests from this commit.

I'm more than happy to work on the changes for the test files, but before I go down that path I want to ensure that I completely understand how you would want this to be implemented. Once the test file repository is established I would be happy to modify the tests to utilize it, but ultimately don't feel that a bugfix should be held up by a refactoring of the unit tests.

Please understand, with almost every change I've requested and or corrected, I've tried to implement corresponding tests to improve the project as a whole - but at the same time, my first priority has to be to my end user requests and satisfying their requirements and requests.

@vsoch
Copy link
Member

vsoch commented Dec 1, 2020

Please understand, with almost every change I've requested and or corrected, I've tried to implement corresponding tests to improve the project as a whole - but at the same time, my first priority has to be to my end user requests and satisfying their requirements and requests.

I totally understand, you're completely fantastic @wetzelj ! If the images were still 100MB I would have larger issue, but since they are closer to 10MB I'd be happy to merge with the current fixes, because it puts the library in a better place. Thanks for clarifying, stay tuned!

@vsoch vsoch merged commit 1782354 into pydicom:master Dec 1, 2020
@vsoch
Copy link
Member

vsoch commented Dec 1, 2020

https://pypi.org/project/deid/0.2.24/

And for the scenario I have in mind, I am thinking something akin to what pydicom does. It looks like they have a data manager class and then the exact urls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants