Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic examples in docs fail due to missing metadata #106

Closed
Bonnevie opened this issue Aug 26, 2022 · 6 comments
Closed

Basic examples in docs fail due to missing metadata #106

Bonnevie opened this issue Aug 26, 2022 · 6 comments

Comments

@Bonnevie
Copy link

  • IDIS Core version: 1.0.1
  • Python version: 3.8
  • Operating System: Ubuntu

This library seems to still be undergoing maintenance, so a quick observation.

Description

Running Core.deidentify seems to not produce a copy as otherwise described in the docs, and in fact produces a DICOM object that cannot be saved as it has neither preamble nor filemeta. If metadata is restored with pydicom's fix_file_meta, it can be saved, but not restored.

I'd expect, based on the library's self-stated objectives, that operations were available to apply a confidentiality profile + custom rules and not otherwise transform the DICOM object.

What I Did

I have verified that the basic deidentification example of the library at https://github.com/sjoerdk/idiscore/blob/master/examples/deidentify_a_dataset_basic.py does not work:

import pydicom

from idiscore.core import Core, Profile
from idiscore.defaults import get_dicom_rule_sets

sets = get_dicom_rule_sets()  # Contains official DICOM deidentification rules
profile = Profile(  # Choose which rule sets to use
    rule_sets=[sets.basic_profile, sets.retain_modified_dates, sets.retain_device_id]
)
core = Core(profile)  # Create an deidentification core

# read a DICOM dataset from file and write to another
core.deidentify(pydicom.dcmread(my_own_file)).save_as("deidentified.dcm")

With traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-1f9729cc0bc5> in <cell line: 13>()
     11 
     12 # read a DICOM dataset from file and write to another
---> 13 core.deidentify(pydicom.dcmread(mydicom)).save_as("deidentified.dcm")

~/.cache/pypoetry/virtualenvs/xray-data-registry-U7OvTqMo-py3.8/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original)
   2059             Write a DICOM file from a :class:`FileDataset` instance.
   2060         """
-> 2061         pydicom.dcmwrite(filename, self, write_like_original)
   2062 
   2063     def ensure_file_meta(self) -> None:

~/.cache/pypoetry/virtualenvs/xray-data-registry-U7OvTqMo-py3.8/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original)
   1036     if None in encoding:
   1037         if tsyntax is None:
-> 1038             raise AttributeError(
   1039                 f"'{cls_name}.is_little_endian' and "
   1040                 f"'{cls_name}.is_implicit_VR' must be set appropriately "

AttributeError: 'Dataset.is_little_endian' and 'Dataset.is_implicit_VR' must be set appropriately before saving

Likely fix

In the apply_rules method, the output object is initialized with a plain default deidentified = Dataset(). This likely fails to copy the metadata over, and should be replaced with an actual deep copy (preferably optionally including pixel data!).

@Bonnevie
Copy link
Author

The problem is also largely fixed post hoc by:

def deidentify(dcm: FileDataset) -> FileDataset:
    deid_dcm = core.deidentify(dcm)
    deid_dcm.PixelData = dcm.PixelData
    deid_dcm.file_meta = dcm.file_meta
    deid_dcm.preamble = dcm.preamble
    return deid_dcm

@sjoerdk
Copy link
Owner

sjoerdk commented Aug 31, 2022

Thanks for the clear feedback!

Regarding documentation mentioning deepcopy
You are right that the documentation still mentions a deepcopy while this is actually a modification in place. This is sloppy maintenance on my part.

The change from deepcopy to modification in place is a very late commit (6be8769)
This change was made for pragmatic reasons. Memory usage was sometimes hard to control even with separate handling for pixeldata.

The docs should be updated. I made #108 for this. Thanks for noticing

Regarding the example not working
I cannot reproduce this. Are you sure the file you are loading has a correct header itself?
Does the following work?

pydicom.dcmread(my_own_file).save_as("deidentified.dcm")

I cannot find the output object being initialized with an empty Dataset() call. Here it is just passed through: https://github.com/sjoerdk/idiscore/blob/master/idiscore/core.py#L170

I'm sticking to pydicom conventions for reading and writing, so if a DICOM file has no preamble or header to begin with, I leave it up to the user to apply post or ad hoc fixes.

@Bonnevie
Copy link
Author

Hi @sjoerdk, sorry for not getting back to you more swiftly.

I think the problem with your reproduction is that you are on the master branch whereas I'm on the latest versioned branch with tag v1.0.1, which is what is installed with pip.

See here for deidentified = Dataset():

deidentified = Dataset()

But great news then, if you cannot reproduce - that must mean it's fixed on master? Maybe you could do a minor release?

The DICOM file has a preamble and file_meta, so the problem was that IDISCORE was deleting them when doing its rewrites.

@sjoerdk
Copy link
Owner

sjoerdk commented Sep 13, 2022

Good point! Thanks again for the checks. I think I missed a commit tag in v1.0.2, which made CI skip the pypi deploy in this run.

I'll make a new deploy. I'll update package management to pep 517 and start using poetry instead of setup tools. New pypi release will be somewhere next week.

@Bonnevie
Copy link
Author

That sounds great! Thanks for the fast response and the good work.

@sjoerdk
Copy link
Owner

sjoerdk commented Sep 15, 2022

Version 1.1.0 is now pushed to pypi. This fixes the issues when running the example files.
Updating the docs regarding deepcopy will be picked up in #108.

To prevent issues with the example files in the future they should be included in the standard tests. Created #112 for this.

@sjoerdk sjoerdk closed this as completed Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants