US data corrupted by pixel anonymization #228

Closed
timothee-l opened this issue Sep 29, 2022 · 15 comments

Comments

@timothee-l

Input is Ultrasound multiframes, decompressed (LittleEndianExplicit), RGB encoded.
Output is a scrambled image.

Instructions are very simple:

client = DicomCleaner(output_folder=output_folder + sub_path, deid='config.txt')
client.detect(path + '\\' + file)
client.clean()
client.save_dicom()

And my config file is also very basic

FORMAT dicom

%filter graylist

LABEL Philips Ultrasound Header
    contains Manufacturer Philips
    + contains Modality US
    + contains ImageType Cardiology
    coordinates 0,0,1024,23

This is an input/output example. It is affecting all of the multiframes in the dataset:
https://imgur.com/a/GF7ZO2w

@vsoch
Member

vsoch commented Sep 29, 2022

It looks like it's not cleaning the right dimension, or possibly saving incorrectly. I'm not sure we've ever handled Ultrasound multiframes before (I'm not sure I've worked with them). Are you a Python developer, and would you be able to debug this and open a pull request?

@wetzelj
Contributor

wetzelj commented Sep 29, 2022

Seeing this issue pop through reminded me of #166. I thought we handled this, but evidently not - or at least not the specific situation @timothee-l is encountering. I don't have the bandwidth to research - but thought it would be beneficial to add #166 into the discussion.

@timothee-l
Author

timothee-l commented Sep 29, 2022

I can give it a try.
To clarify, the tool has worked with US multiframes before - as long as they were RGB encoded (YBR also caused some sort of corruption).

@vsoch
Member

vsoch commented Sep 29, 2022

I think step one, either way, is getting a test dummy dataset to reproduce the issue. @timothee-l is this something you could provide? We have an external data repository now https://github.com/pydicom/deid-data

@timothee-l
Author

I can share the pixel data and other tags you may need - but not the whole dicom (I do not own the data)

@vsoch
Member

vsoch commented Sep 29, 2022

Could you maybe make an empty dicom of that type and add the pixel data and tags to it? That would work!

@timothee-l
Author

sample.zip

Here is a sample of the data I am trying to process.
I will try the files on your data repo.

@vsoch
Member

vsoch commented Sep 30, 2022

Perfect! As long as I can reproduce your error, I should be able to debug and work on it.

@vsoch
Member

vsoch commented Sep 30, 2022

stupid question - do you provide this to deid as a .zip, or just the dicom on its own?

@timothee-l
Author

Just the dicom on its own

@vsoch
Member

vsoch commented Oct 1, 2022

okay I've reproduced! I've verified the format is:

  • 4D - shape = (frames, X, Y, channel) - RGB

and I've walked through the logic of clean. Nothing is jumping out at me as wrong, but indeed the image is mangled. @wetzelj do you have any ideas or suggestions for what to try or look at, beyond the obvious?
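
For reference, here is a minimal sketch of what masking a rectangle in that layout amounts to at the byte level. It is not deid's implementation. It assumes 8-bit samples stored interleaved (R, G, B per pixel, frame after frame) and that the four numbers on the `coordinates` line are read as xmin, ymin, xmax, ymax, with x indexing columns and y indexing rows. `blank_region` is a hypothetical helper introduced only for illustration.

```python
def blank_region(pixel_bytes, frames, rows, cols, samples, box):
    """Zero a rectangle in every frame of an interleaved 8-bit pixel buffer.

    box is (xmin, ymin, xmax, ymax); x indexes columns, y indexes rows.
    This layout and coordinate order are assumptions, not deid's API.
    """
    expected = frames * rows * cols * samples
    if len(pixel_bytes) != expected:
        # A length mismatch means the buffer is not the layout we think it is.
        raise ValueError(f"expected {expected} bytes, got {len(pixel_bytes)}")

    xmin, ymin, xmax, ymax = box
    # Clamp the box to the image so an oversized box cannot spill over.
    xmin, xmax = max(0, xmin), min(cols, xmax)
    ymin, ymax = max(0, ymin), min(rows, ymax)

    row_stride = cols * samples        # bytes per image row
    frame_stride = rows * row_stride   # bytes per frame
    width = max(0, xmax - xmin) * samples

    out = bytearray(pixel_bytes)
    for f in range(frames):
        for y in range(ymin, ymax):
            start = f * frame_stride + y * row_stride + xmin * samples
            out[start:start + width] = bytes(width)
    return bytes(out)
```

With the recipe above, the box would be `(0, 0, 1024, 23)`, i.e. the top 23 rows of each frame. If the strides are computed with the wrong axis order, or the buffer is not actually raw interleaved samples, the zeroed runs land in the wrong places, which is one way an output can end up scrambled rather than masked.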

@wetzelj
Contributor

wetzelj commented Oct 3, 2022

I didn't dive into the code for this response, but did run a few tests.

My speculation at this point is that it could have something to do with the fact that this image has undergone lossy compression at some point in its lifetime - in all of my use cases, we've always dealt with images that have not undergone lossy compression.

I would expect that this image must be decompressed to get back to a standard pixel array before we apply any sort of pixel masking rules.
https://pydicom.github.io/pydicom/dev/old/image_data_handlers.html

@vsoch
Member

vsoch commented Oct 3, 2022

Tried that just now:

# dcmread is the current name for the deprecated read_file alias
from pydicom import dcmread
from deid.dicom.pixels import DicomCleaner
import os
here = os.getcwd()
output_folder = os.path.join(here, 'out')
file = "ultrasound-multiframe.dcm"

# Decompress first
dcm = dcmread(file)
dcm.decompress()
file_decompressed = "ultrasound-multiframe-decompressed.dcm"
dcm.save_as(file_decompressed)

# Now clean
client = DicomCleaner(output_folder=output_folder, deid='config.txt')
client.detect(file_decompressed)
client.clean()
client.save_dicom()

Didn't seem to make a difference (still messed up!) but was a good idea to try.

@timothee-l
Author

There appears to be a mismatch between the pixel data (compressed) and the transfer syntax UID, which says uncompressed, as you said. So yes, the problem seems to originate from my data. I think the issue can be closed. Thanks!
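
A mismatch of this kind can be spotted before cleaning with a quick sanity check. The sketch below is a heuristic under stated assumptions, not part of deid or pydicom: it assumes `BitsAllocated` is a multiple of 8 and a photometric interpretation without chroma subsampling (such as RGB). The function names are hypothetical. It compares the length of the pixel data with what a native (uncompressed) transfer syntax implies, and flags data that starts with the Item tag (FFFE,E000) used by encapsulated pixel data.

```python
# Transfer syntaxes whose PixelData is stored natively (not encapsulated).
NATIVE_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2",       # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian
    "1.2.840.10008.1.2.1.99",  # Deflated Explicit VR Little Endian
    "1.2.840.10008.1.2.2",     # Explicit VR Big Endian (retired)
}

# Little-endian bytes of the Item tag (FFFE,E000) that opens encapsulated data.
ITEM_TAG_LE = b"\xfe\xff\x00\xe0"


def expected_native_length(rows, cols, samples, frames, bits_allocated):
    """Byte length of native pixel data, padded to an even length."""
    n = rows * cols * samples * frames * (bits_allocated // 8)
    return n + (n % 2)


def pixel_data_mismatch(transfer_syntax_uid, pixel_data,
                        rows, cols, samples, frames, bits_allocated):
    """Return a short description of a mismatch, or None if nothing looks off."""
    if transfer_syntax_uid not in NATIVE_TRANSFER_SYNTAXES:
        return None  # only the "declared uncompressed" case is checked here
    expected = expected_native_length(rows, cols, samples, frames, bits_allocated)
    if len(pixel_data) == expected:
        return None
    if pixel_data[:4] == ITEM_TAG_LE:
        return "declared native, but pixel data looks encapsulated"
    return f"declared native, but length is {len(pixel_data)}, expected {expected}"
```

With pydicom, the inputs would typically come from `ds.file_meta.TransferSyntaxUID`, `ds.PixelData`, `ds.Rows`, `ds.Columns`, `ds.SamplesPerPixel`, `ds.NumberOfFrames` and `ds.BitsAllocated`.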

@vsoch
Member

vsoch commented Oct 4, 2022

Sounds good thanks!

@vsoch vsoch closed this as completed Oct 4, 2022