Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to parse private tags #125

Merged
merged 5 commits into from
Mar 28, 2020
Merged

Add support to parse private tags #125

merged 5 commits into from
Mar 28, 2020

Conversation

vsoch
Copy link
Member

@vsoch vsoch commented Mar 13, 2020

Description

This is a first shot to add parsing of private tag groups for deid. I'll definitely need help to do further testing, possibly add more tests for extracting the tags as fields, and tips based on dicom expertise / real world use cases that I might not be aware of. This PR addresses:

Related issues:

  • Discussion: Add class to handle get/set #119: there isn't a global get/set strategy, so I wound up needing to walk through the code and add handling for these special cases. The other option is doing some much larger refactor and wrapping the dicom object with some general get() and set() functions to return fields and values, respectively, but the class already has a get(), and the logic to get or set a value doesn't always seem consistent between cases. I'm a bit unhappy with the fact that the implementation doesn't have a clean solution like this, but I'm thinking this should be addressed in another PR - this one is already quite big and will require a lot of work.
  • Apply deid recipe actions to private DICOM tags #118 this is really the more important issue of the two - we want private tags to be included in the larger parsing, if they aren't removed.

@wetzelj here is my thinking for how we should go about this:

  • finish the PR here so that it works for your use case. I'm interested to know about spots that I miss, bugs that you run into, and (even better) some more tests that we can add to ensure that the newly added functionality works as expected! Note that I've added a test_dicom_fields.py file that does some basic parsing of dicom private tags. What other tests should we have there?
  • when this PR is merged, I want to focus on the user interaction. Specifically, there should be a wrapper for a Dicom that exposes some get_field and get_value function that will work for any of the data types that might be found, and also a DeidDicom class that makes it much easier to run a get and replace action.

Other small changes

  • I don't require an ids field to be provided to replace_identifiers. It could be that there is a recipe, and values will be extracted for a dicom without running get_identifiers first.
  • dicom_dir() is a new function in fields.py that will essentially return dicom.dir() plus the tags keys. This allows us to easily iterate through all dicom fields (including private tags) if they weren't removed.
  • I added the cat.dcm as a testing dataset under animals, namely because it's the only example I have with private tags and is needed for testing.

That's about as far as my dinosaur brain can think through this today. Hopefully this is a good start, definitely fun to work on!

Signed-off-by: vsoch <vsochat@stanford.edu>
this means small updates to most functions to have special handling in
the case of being provided a string (field name) versus a tag (BaseTag)
that will produce a DataElement for dicom.get(field). This will need
robust testing to ensure that we dont need to be more specific about
some if these if/else (e.g., checking for a type instead). Additionally,
Ive added the dicom_dir() function under fields that will serve the
same purpose as dicom.dir() but also provide the keys to private tags,
which are type pydicom.tag.BaseTag

Signed-off-by: vsoch <vsochat@stanford.edu>
@vsoch vsoch changed the title Add/private tag groups Add support to parse private tags Mar 13, 2020
@wetzelj
Copy link
Contributor

wetzelj commented Mar 17, 2020

Sorry for my delay in getting back to this. I started a review this morning and have been trying to do some testing.

I believe that there's an issue that may require a change in approach. Currently, the dicom_dir function with fields.py returns a list of strings or tags. Since the type of the key value is variable, at present, this causes exceptions throughout because once we start operating on the Tag-keyed items, we're trying to compare a Tag object to a string.

My gut reaction says that it may be better to convert everything to a string within dicom_dir() or somehow separate the string-key lists vs the tag-key lists - but this would then cause some other necessary conversion later. Otherwise, keeping it as is, we'll have to introduce conversions at the points listed above (which is probably not a exhaustive list - Knowing that you'd have a better perspective of impacts across the board, I stopped and decided to comment when I got to these three).

Overall, I'm starting to do some testing/review now and would be more than happy to add test cases as I work through the testing.

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

@wetzelj I'm aware of the list of both tags and strings, we need to preserve the tag objects to correctly index the dicom still.

For

if contender in skip:
can you show a case of when skip would want to reference a private tag? As it currently is implemented, the contenders will include private tags if the requester has not removed them already and then return them in the list. Why is this an issue?

I'm also aware of

expanded_fields = [x for x in item if re.search(expanded_regexp, x)]
, but since private tags are flat (or are they?) I didn't think this was an issue. I'm not sure how we would go about expanding a private tag index. Do you have an example? An actual example in data, and then how you would want it parsed, would be needed to figure this out. Right now, it would be essentially skipped because the expanded fields don't exist.

if action["field"] in fields:
is definitely a bug, because the user can only provide strings and we are comparing against a list of strings and tags. But also note that the section you linked is for the put config, which is separate from the deid recipe (it's provided by the library).

The approach to convert everything to a string will lose the ability to index the dicom using the tag. Do you have a suggested approach then for being able to:

  • given a string, detect if it's a tag
  • generate the tag from it, and then do the interaction
  • convert back if necessary

That seems like a lot of work to have to do, because it would mean any time a tag isn't found as a string we'd need to try to test it being a tag. And then try to generate it on the fly. It seems like the approach that I took to try to maintain using the actual tag to the largest extent possible avoids that, but you are correct that we need to look out for these edge cases / errors where the tag would be missed.

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

Overall, I'm starting to do some testing/review now and would be more than happy to add test cases as I work through the testing.

Great thank you! I greatly appreciate this - it's really been lovely to work with you and get your help and feedback after I do a large round of changes.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 17, 2020

can you show a case of when skip would want to reference a private tag?

Honestly, no. I can't. :) The problem isn't that the private tag is in skip. The problem is that the private tag becomes a contender, since dicom_dir() is used to build the contenders list. If we agree that private tags could never be used as "skips", the following would be perfectly acceptable.

if  isinstance(contender, str) and contender in skip:
            continue

but since private tags are flat (or are they?)

Unfortunately, no, they're not guaranteed to be flat. The image below shows a sample from one of our images to be deidentified where there is a private sequence. Regardless, as with the skip value above, the breaking issue here is that "x" ends up being a Tag object and we're trying to use it in the second parameter of re.search().

PrivateSequence

I could potentially see a need to say something like REMOVE ALL (0029, 1043) and have it remove the items from both of the sequence items in the image above. But to be quite honest, I think that is the lesser use-case to the need to include the private tags when performing the REMOVE ALL values:phi scenario.

is definitely a bug, because the user can only provide strings and we are comparing against a list of strings and tags. But also note that the section you linked is for the put config, which is separate from the deid recipe (it's provided by the library).

Yes, this is in the put config, so it's not 100% relevant to my recipe-based solution...

As far as a suggestion goes - As I've been trying to come up with a suggestion that I like, one (probably naive) question keeps coming up (and yes, I realize that this would upend the design, so I'm not necessarily suggesting it): Is there some benefit that we get by building the list of fields vs using the pydicom Dataset object directly? It seems to me that, rather than dealing with the string identifiers and then using the get() to actually get the tag, we could be acting directly on the Dataset object. If we were to be able to act on the Dataset object itself, the entire question of name vs Tag goes away. Everything would be a Tag.

Thoughts? Am I crazy? :)

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

The reason I decided to leave like this:

if contender in skip:
            continue

is because someone potentially could provide a tag in the list, in which case it would be skipped.

Is there some benefit that we get by building the list of fields vs using the pydicom Dataset object directly?

In that the dicom.dir() is just a list of strings, and that doing a dicom.get() works for a string or tag (on the Dataset object) we don't gain anything in not extracting some lookup (of strings) and tags first. What we lose, however, is being able to run a get_identifiers to preview the key value pairs, and then the replace_identifiers to move forward with it. You are correct that having the lookup is an extra layer, but I think it's useful for this purpose. Happy to consider another design if something more elegant is derived that allows for this functionality to persist.

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

A potential refactor that I could see is to derive a recursive model of a Field, where it would expose get, set, and (inner) handle all of the discrepancies between tag and other. But that would just be cleaning up the current PR here and doing a major refactor - if the design here is fundamentally flawed that probably won't work either. But it would be an overall better design to provide a consistent interface to interact with "some field object from a dicom Dataset"

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

And sharing ideas is never crazy - it's great and fun, and better to get them out for discussion than let them cause a ruckus in your frontal and parietal lobes!

@wetzelj
Copy link
Contributor

wetzelj commented Mar 17, 2020

There's still a bug with if contender in skip::
Given:
contenders = ['PatientID', (0009,0001)]
skip = ['Field1', 'Field2']

When performing this loop and on the iteration for (0009,0001), we get an exception "Cannot compare Tag with non-Tag item". If we want to enable Tags as skips, then I think we're going to need to include type comparisons as well.

I'm going to let the rest of it sit overnight and see what new ideas come to me tomorrow.

@vsoch
Copy link
Member Author

vsoch commented Mar 17, 2020

I'm looking over get_fields, and the primary use of the list skip is to prevent the lookup from returning PixelData and other fields that normally aren't parsed. Now that contenders comes from dicom_dir, I can see that this loop would be triggered. So - I'm going to test providing a list of tags to the function:

# skip is a list of private tags from your animal image
[(0019, 0010),
 (0019, 1007),
 (0019, 1021),
 (0019, 1028),
 (0019, 1030),
 (0019, 10f5),
 (0019, 10fa),
 (0019, 10fb),
 (0019, 10fc),
 (0019, 10fd),
 (0019, 10fe)]

# this should trigger an error?
get_fields(dicom, skip=skip)

but there isn't an error, I get back the full data structure (private tags included!)

{'AcquisitionContextSequence': '[]',
 'AcquisitionDate': '20200209',
...
 'ViewCodeSequence__CodeMeaning': 'antero-posterior',
 'ViewPosition': 'AP',
 'WindowCenter': '2048',
 'WindowWidth': '4096',
 (0011, 0003): 'Agfa DR'}   # this is the only private tag that wasn't in the skip list

so I'm not able to reproduce this (I don't get an exception):

When performing this loop and on the iteration for (0009,0001), we get an exception "Cannot compare Tag with non-Tag item". If we want to enable Tags as skips, then I think we're going to need to include type comparisons as well.

Could there be some differences in our versions of pydicom, or something like that? Can you provide me with an exact reproduction of this error? Also note in case you missed it that we've tested expand_field_expression to use a string to return a list of tags (works!)

print("Testing that we can also search private tags based on numbers.")
fields = expand_field_expression(
    dicom=dicom, field="contains:0019", contenders=contenders
)
[(0019, 0010),
 (0019, 1007),
 (0019, 1021),
 (0019, 1028),
 (0019, 1030),
 (0019, 10f5),
 (0019, 10fa),
 (0019, 10fb),
 (0019, 10fc),
 (0019, 10fd),
 (0019, 10fe)]

# We should have a tag object in the list now!
assert isinstance(fields[0], BaseTag)

@wetzelj
Copy link
Contributor

wetzelj commented Mar 18, 2020

Take a look at this (recreating the issue with the cat image): wetzelj@85de348

The two new test cases (in progress - for discussion purposes only) demonstrate the issues that I've seen above. Both of these cases generate exceptions and are representative of the issues that I hit when trying to run replace_identifiers on a "real" run.

I've also added a new dicom test file. The file was a Head CT. The image has been replaced and the header has been fully deidentified, but otherwise all fields included in the header maintain the structure as it was in the original image.

…ings

Signed-off-by: vsoch <vsochat@stanford.edu>
@vsoch
Copy link
Member Author

vsoch commented Mar 18, 2020

Thank you @wetzelj ! For the testing of fields, I've reproduced the issue with having expand_sequences parsing a list of fields that have private tags. The fix I added just now is to parse these as a string: 90e322b

What this means is that the list of expanded sequences, if a string for a private tag matches (unlikely) is unlikely to actually be useful for a further action. However, I don't see it being reasonable to support private tags nested in other private tags, so I think this is okay for now.

I'm not able to trigger this error locally when I run the test, however - is this leading to an error for you?


    def test_skip_list(self):
        # Demonstrates the bug with deid/dicom/fields.py - ln 222
        print("Test exclusion of items from skip list")
        from deid.dicom.fields import dicom_dir

        skip = ['PixelData']

        dicom = get_dicom(self.dataset)
        contenders = dicom_dir(dicom)        
        newcontenders = []

        for contender in contenders:
            if contender in skip:
                continue
            else:
                newcontenders.append(contender)

        with self.assertRaises(ValueError) as exc:
            pixeldata = newcontenders.index('PixelData')

        self.assertEqual("'PixelData' is not in list", str(exc.exception))

If you are getting an error (and I'm not) it is possibly related to a version of pydicom so we should test these details. Also, any changes that you add (tests, new datasets, which are greatly appreciated!) when all is said and done, you can PR to the branch here so that they are included.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

I agree that there shouldn't be a conflict between named and non-named fields.

However, I think at some point, we will have to handle private tag sequences. If you step through the new medical image through test_conditional_expansion, you'll see that fields = get_fields(dicom, expand_sequences=True) does not expand private tag sequences. The private tags that are sequences remain as sequence Tag objects.

It's probably okay to take a pass on this issue for now. Looking at the new medical dicom image I uploaded - we would just have to be aware that if the user was processing a rule like:

REMOVE CodeValue
Would remove tags within sequences:
'CodeValue'
'CTDIPhantomTypeCodeSequence__CodeValue'
'ProcedureCodeSequence__CodeValue'
'RequestAttributesSequence__RequestedProcedureCodeSequence__CodeValue'
'RequestedProcedureCodeSequence__CodeValue'

Whereas something like:
REMOVE (0029, 0010)
Would remove only:
(0029, 0010)
and would not remove the 2 occurrences of:
(0029, 1140)__(0029, 0010)

BTW... I'm about to hop into a few meetings, but am planning to first submit a PR to this branch for just the new medical test images.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

Regarding test_skip_list - yes, this is triggering an error. On the iteration of for contender in contenders when the contender value is (0011,0003) an exception is thrown: TypeError("Cannot compare Tag with non-Tag item")

Following is the complete list of packages with version that I'm testing against. Pydicom is still using 1.2.1 as this exact version is required in version.py. Could this be an issue with the version of python?

# Name                    Version                   Build  Channel
astroid                   2.3.1                    py36_0  
blas                      1.0                         mkl  
certifi                   2019.9.11                py36_0  
colorama                  0.4.1                    py36_0  
cycler                    0.10.0                   pypi_0    pypi
deid                      0.1.41                    dev_0    <develop>
icc_rt                    2019.0.0             h0cc432a_1  
intel-openmp              2019.4                      245  
isort                     4.3.21                   py36_0  
lazy-object-proxy         1.4.2            py36he774522_0  
matplotlib                2.1.2                    pypi_0    pypi
mccabe                    0.6.1                    py36_1  
mkl                       2019.4                      245  
mkl-service               2.3.0            py36hb782905_0  
mkl_fft                   1.0.14           py36h14836fe_0  
mkl_random                1.1.0            py36h675688f_0  
numpy                     1.17.2                   pypi_0    pypi
pandas                    0.25.1           py36ha925a31_0  
pip                       19.2.3                   py36_0  
pydicom                   1.2.1                    pypi_0    pypi
pylint                    2.4.2                    py36_0  
pynetdicom                1.4.1                    pypi_0    pypi
pyparsing                 2.4.2                    pypi_0    pypi
python                    3.6.9                h5500b2f_0  
python-dateutil           2.8.0                    py36_0  
pytz                      2019.2                     py_0  
pyyaml                    5.1.2            py36he774522_0  
setuptools                41.2.0                   py36_0  
six                       1.12.0                   pypi_0    pypi
sqlite                    3.29.0               he774522_0  
typed-ast                 1.4.0            py36he774522_0  
vc                        14.1                 h0510ff6_4  
vs2015_runtime            14.16.27012          hf0eaf9b_0  
wheel                     0.33.6                   py36_0  
wincertstore              0.2              py36h7fe50ca_0  
wrapt                     1.11.2           py36he774522_0  
yaml                      0.1.7                hc54c509_2

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

Weird, so the line for that particular error (https://github.com/pydicom/pydicom/blame/caf0db105ddf389ff5025937fd5f3aa1e61e85e7/pydicom/tag.py#L156) was added 8 years ago, so it should be triggering for me too. Based on the function that I see here it looks like it's a less than function, so it means we are asking if a tag is somehow less than another object? I'm wondering - if you open up a PR with this test, does the error reproduce? It didn't trigger for me locally when I ran the test file - only the second example did.

For the expansion of private tags - I see the issue. I think we could work on this, but do you have an example image that has a sequence of nested private tags that we could use to develop?

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

Oops and one correction to the above - we are using the equals comparator here https://github.com/pydicom/pydicom/blob/master/pydicom/tag.py#L168 and not less than (of course!). But this is still 8 years old, so the conclusions (that I should see the error too!) are the same.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

Oh! And here is a difference - I actually have pydicom 1.4.1

pydicom==1.4.1

which is (almost) at the latest - I think I installed it a while back because I wanted to test if everything continued to work. Could perhaps this be something that would explain the differences?

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

That did it! I downgraded to 1.2.1 and I reproduced your issue:

======================================================================
ERROR: test_conditional_expansion (test_dicom_fields.TestDicomFields)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vanessa/Desktop/Code/deid/deid/tests/test_dicom_fields.py", line 123, in test_conditional_expansion
    expanded_fields = [x for x in fields if re.search(expanded_regexp, x)]
  File "/home/vanessa/Desktop/Code/deid/deid/tests/test_dicom_fields.py", line 123, in <listcomp>
    expanded_fields = [x for x in fields if re.search(expanded_regexp, x)]
  File "/home/vanessa/anaconda3/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

======================================================================
ERROR: test_skip_list (test_dicom_fields.TestDicomFields)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vanessa/anaconda3/lib/python3.7/site-packages/pydicom/tag.py", line 168, in __eq__
    other = Tag(other)
  File "/home/vanessa/anaconda3/lib/python3.7/site-packages/pydicom/tag.py", line 97, in Tag
    long_value = int(arg, 16)
ValueError: invalid literal for int() with base 16: 'PixelData'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vanessa/Desktop/Code/deid/deid/tests/test_dicom_fields.py", line 93, in test_skip_list
    if contender in skip:
  File "/home/vanessa/anaconda3/lib/python3.7/site-packages/pydicom/tag.py", line 170, in __eq__
    raise TypeError("Cannot compare Tag with non-Tag item")
TypeError: Cannot compare Tag with non-Tag item

----------------------------------------------------------------------
Ran 3 tests in 0.060s

FAILED (errors=2)

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

We probably need to be able to still support both at this point, I'll dig into this now that I've reproduced! I love it when that happens :)

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

For the expansion of private tags - I see the issue. I think we could work on this, but do you have an example image that has a sequence of nested private tags that we could use to develop?

Yep! See PR #126. Sequence (0029, 0010) contains other private tags.

We probably need to be able to still support both at this point, I'll dig into this now that I've reproduced! I love it when that happens :)

I'm pretty sure the only reason I was back at 1.2.1 was for deid. I'm going to upgrade to 1.4.1 (or latest?) Either way... Yay! You were able to reproduce the issue! :)

wetzelj and others added 2 commits March 19, 2020 12:19
Adding new sample image created from a Head CT.
Updated number of expected test files.
currently, if you parse through a list of fields with a private tag,
for pydicom 1.2.1 you will get an error about comparison of a non tag
with a tag. This doesnt happen with 1.4.1, however 1.2.1 needs to still
be supported

Signed-off-by: vsoch <vsochat@stanford.edu>
@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

okay I just added a fix for the issue of parsing through tags (and not being able to do the comparison) 87a1103#diff-0c993b1a3139505b68bd0fe50eedc74eR223. What I'm looking at now is that dicom_dir() doesn't return private tag sequences, the reason being that get_private doesn't return sequences with private tags. I'm going to look at this next - my thinking is that if dicom.dir() can return a sequence as long as it's a string index, the get_private should also be able to return private tags that are of type Sequence (or a tag equivalent?) Haven't looked deeply into it yet.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

One small note - it's (0029, 1140) that is private that also contains other private (the one you mentioned isn't). And i check for sequence via data_element.VR == "SQ".

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

Ah I see what is happening - so we actually are getting all private tags with get_private, however we are unwrapping them from their structure. For example, to start we have:

(0029, 1140)  [Application Header Sequence]   2 item(s) ---- 
   (0029, 0010) Private Creator                     LO: 'SIEMENS MEDCOM HEADER'
   (0029, 1041) [Application Header Type]           CS: 'VIA_CARE_KV_WIND'
   (0029, 1042) [Application Header ID]             LO: 'ORIG WINDOW VALUES FROM CAREKV'
   (0029, 1043) [Application Header Version]        LO: 'V1 20120914'
   ---------
   (0029, 0010) Private Creator                     LO: 'SIEMENS MEDCOM HEADER'
   (0029, 1041) [Application Header Type]           CS: 'SOM 5 TPOS'
   (0029, 1042) [Application Header ID]             LO: 'SOM 5 NULLPOSITION'
   (0029, 1043) [Application Header Version]        LO: 'VB10A 20030626'

And then the result returns a flattened:

[(0029, 0010) Private Creator                     LO: 'SIEMENS MEDCOM HEADER',
 (0029, 1041) [Application Header Type]           CS: 'VIA_CARE_KV_WIND',
 (0029, 1042) [Application Header ID]             LO: 'ORIG WINDOW VALUES FROM CAREKV',
 (0029, 1043) [Application Header Version]        LO: 'V1 20120914',
 (0029, 0010) Private Creator                     LO: 'SIEMENS MEDCOM HEADER',
 (0029, 1041) [Application Header Type]           CS: 'SOM 5 TPOS',
 (0029, 1042) [Application Header ID]             LO: 'SOM 5 NULLPOSITION',
 (0029, 1043) [Application Header Version]        LO: 'VB10A 20030626']

So this get_private might work as we want, because it's essentially providing a lookup for private tags.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

So your concern is with respect to:

Would remove only:
(0029, 0010)
and would not remove the 2 occurrences of:
(0029, 1140)__(0029, 0010)

meaning that we somehow need to represent these nested tags as full fledged citizens when this action is happening... :/

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

I think that approach should work.

However, in starting to look in more detail at sequence expansion, I've noticed that there's actually a bug here. As currently implemented, when extract_sequences() flattens the sequence, it simply flattens the fields within the sequence, ignoring the fact that sequences can be arrays.

Given a header of:

(0008, 1032)	Procedure Code Sequence
  >BEGIN ITEM 1
  >(0008,0100)		Code Value : CT0001
  >(0008,0102)		Coding Scheme Designator : SIEMENS_RIS
  >(0008,0104)		Code Meaning : CT BRAIN WO IVCON
  >END ITEM 1
(0032,1032) 	Requesting Physician : HIBBARD^JULIUS^^

This get_fields()/extract_sequences() function converts this (correctly in this case) to:

ProcedureCodeSequence__CodeValue : CT0001
ProcedureCodeSequence__CodingSchemeDesignator : SIEMENS_RIS
ProcedureCodeSequence__CodeMeaning : CT BRAIN WO IVCON
Requesting_Physician :  HIBBARD^JULIUS^^

However, when there are multiple items in the sequence:

(0008, 1032)	Procedure Code Sequence
  >BEGIN ITEM 1
  >(0008,0100)		Code Value : CT0001
  >(0008,0102)		Coding Scheme Designator : SIEMENS_RIS
  >(0008,0104)		Code Meaning : CT BRAIN WO IVCON
  >END ITEM 1
  >BEGIN ITEM 2
  >(0008,0100)		Code Value : CT_6352
  >(0008,0102)		Coding Scheme Designator : AGFA_PACS
  >(0008,0104)		Code Meaning : CT BRAIN WO CONTRAST
  >END ITEM 2
(0032,1032) 	Requesting Physician : HIBBARD^JULIUS^^

The get_fields()/extract_sequences() function incorrectly converts this to below (losing the first occurrence item:

ProcedureCodeSequence__CodeValue : CT_6352
ProcedureCodeSequence__CodingSchemeDesignator : AGFA_PACS
ProcedureCodeSequence__CodeMeaning : CT BRAIN WO CONTRAST
Requesting_Physician :  HIBBARD^JULIUS^^

Instead, it is really important to maintain the integrity of these occurrences - potentially building something like this:

ProcedureCodeSequence__0__CodeValue : CT0001
ProcedureCodeSequence__0__CodingSchemeDesignator : SIEMENS_RIS
ProcedureCodeSequence__0__CodeMeaning : CT BRAIN WO IVCON
ProcedureCodeSequence__1__CodeValue : CT_6352
ProcedureCodeSequence__1__CodingSchemeDesignator : AGFA_PACS
ProcedureCodeSequence__1__CodeMeaning : CT BRAIN WO CONTRAST
Requesting_Physician :  HIBBARD^JULIUS^^

The cthead1.dcm file that I added with #126 has an instance of a private tag sequence with multiple occurrences. In my 125/new-samples branch, I've added an additional sample file with multiple-occurrence public tags. As well as a test case for this situation. Before submitting a PR, I want to make the test case less tightly coupled to the humans dataset.


All that said, do you think it would be better to create an additional issue for sequence expansion, and keep this PR to supporting private tags in non-Sequences?

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

I see what you mean about interacting with fields directly, because when we do dicom.dir() we don't get private tags, when we do dicom_dir(dicom) we get private tags, but then they are unwrapped. I'm not really sure what action to take at this point, because we are extracting flattened structures but deid doesn't have a way for the user to even reference sequences.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

I just saw your most recent comments... (after I posted my last one).

I noticed that too about how private sequences are unwrapped. Ultimately, I think there is quite a bit outstanding with sequences in general.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

But @wetzelj please see the example that I just posted - #125 (comment) at least in the case of get_private we do preserve each of entries indexed with 0 and 1. In the case that they are converted to strings, however, I can see this being lost.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

I can help only to the extent there is a reasonable idea / discussion for how to implement a change or fix. If it requires a total refactor of the library that sort of sucks, but I'm open to the idea.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 19, 2020

I understand what you're saying in #125 (comment), but with the multiple occurrences in the lookup, looking up (0029, 1041) would result in multiple values. Maybe this is okay?

Either way, do you think that it would be worthwhile to forego the sequence issue to a new issue and focus this PR on non-sequence private tags? From my perspective, being able to handle non-sequence private tags has huge benefit.

@vsoch
Copy link
Member Author

vsoch commented Mar 19, 2020

@wetzelj this work is all being done on your behalf, so I will go with whatever you prefer. If the library is improved from where it was, I say we should move forward. But I'd still like to work with you (in likely another scoped issue) to address these other issues, because I won't sleep at night if I've realized that my software sucks. :P

@vsoch
Copy link
Member Author

vsoch commented Mar 20, 2020

Here is something to think about - how important is it to have separate get and set identifiers functions? Because I can imagine implementing a version that removes the intermediate dictionary layer and just interacts with the pydicom Dataset. Instead of deriving values and looping through later, we figure out how to do it once. We’d still need to balance that with some ability to define custom functions and lists, however.

@wetzelj
Copy link
Contributor

wetzelj commented Mar 20, 2020

Your software is great! These are just some little tweaks to make it more awesome than it already is.

Regarding separate get/set identifiers functions - I'm formulating my thoughts on this, I want to make sure that I have a coherent, well thought out proposal before I respond.

@vsoch
Copy link
Member Author

vsoch commented Mar 20, 2020

Sounds good! Let me know after you'd had some time to think:

  1. what still remains for this PR, is the library overall improved? Worth a release?
  2. what your thoughts are for next steps

I do think it would be worth a try to maintain the current get/set identifiers, but write a class that takes in dicom files, custom functions and a recipe, and can just be used to clean the headers. It would be akin to the DicomCleaner but for header (and not pixels).

@vsoch
Copy link
Member Author

vsoch commented Mar 28, 2020

Ping! Hope you are doing ok :)

@vsoch
Copy link
Member Author

vsoch commented Mar 28, 2020

This PR isn't perfect, but it improves the library quite a bit, so I'm going to merge. Please open a new issue @wetzelj when you are ready to discuss next steps.

@vsoch vsoch merged commit 72a7daf into master Mar 28, 2020
@vsoch vsoch deleted the add/private-tag-groups branch March 28, 2020 16:01
@vsoch vsoch mentioned this pull request Mar 29, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants