-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
action order on same field #197
Comments
I would try to follow the example and do a REMOVE ALL and then add the fields you want to keep. If that is successful we can update the docs to reflect it - I think the spirit of the docs is that if you were to have all those commands in the same file for the same field, it's saying that Please test it out for your use case and report back! |
I tested on
and it ends up with everything removed. Same if I swap lines. So no idea how can I only concentrate on tags I need with dropping everything that I don't need with one-liner (aka whitelisting)? Writing documentation for open-source project based on blackbox testing sounds really weird. |
ah then the documentation is correct. :)
I don't understand this comment (it seems like it could be a bit insulting) but I'll just say that if you see an issue in the docs you are welcome to suggest changes. In this case the docs are correct, so there's that. In practice it's not hugely practical to just wipe every header field from a dicom - many of them are needed. I would suggest you look at your data and come up with reasonable filters (e.g., removing everything with a date, or name, the library supports regular expression matching for that). Insulting the maintainers of the library for their documentation is not going to get you very far. Just FYI. :) |
I'm sorry it it sounded insulting, just in my world documentation is based on actual code logic and is updated together with code changes, otherwise other teams depending on product will struggle. It is not exactly an open-source world, but I am trying to follow same practices when contributing into OSS :) As for this specific example in docs:
which doesn't make too much sense (even just-added Sadly my usecase is exactly about dropping everything except for tags needed to display image, and uuids required to distinguish between studies are being anonymized. The rule we got from healthtech guys from google is "use whitelists to avoid GDPR/etc problems". In total it makes about ~30 fields to be left in file. In one of example recipes I found there is 240 rows of tags being replaced - I just don't want to manage such a big list, if I could go with ~30 lines whitelist. What do you think if I i.e. raise PR with whitelisting support? Not sure yet, how, if I would like to keep existing
meaning that tag is removed unless any of existing actions is added to field (existing actions become whitelisting-actions). Smth like that. Would it fit into deid roadmap? |
No worries @glebsts - sometimes it's hard to tell with communication on these boards. Your use case is new, and I think we could definitely support it with the right tweaks, and it would also be good to fix that (now dated and not working) section of the docs. Note that we do have a "protected" set of fields https://github.com/pydicom/deid/blob/master/deid/dicom/config.json that would hurt the integrity of the dicom file if removed, so if you see something not being removed that you would expect, take a look there. We have whitelisting for the filter section of deid, so logically we could support some kind of the same thing for header fields. But I'm wondering can we handle this case just with the current tags? E.g., if REMOVE ALL and then a series of KEEP were to work, wouldn't that be sufficient? I ask because it would be messy in the format of the deid.dicom to have a header with both action statements and filters, especially when the action statements can already have regular expressions. Let me think more about it - and feel free to write up a spec in the meantime. I'm going into a few hours of work and ping me again if I don't remember to respond! |
Oh and could you give me an example (with comments) about how a blacklist/whitelist action would work? |
I myself go to sleep now, it is midnight here in GMT+3 DTS :) Will get back to OSS in couple of days.
But I was not yet too deep in deid internals, not sure if I can first iterate whole recipe and build list of fields to keep and check it upon every remove action. Will see.. |
To answer your first question - yes we would absolutely be open to a PR to make this change - looking forward to chatting in a few days when you are back! |
Hi there, quite long read for you - tests and my three statements and suggestions I'd like to implement. Vanessa, pls add your opinion on those before I start digging in code. So I added some tests to
that produces following:
and fails with
that produces following:
and fails with
that produces following:
and fails with Here I have following statements (order not related to tests):
|
Thank you for the detailed write-up - here are my thoughts so you can move forward:
I agree - I would remove the REMOVE > ADD tag from the docs, as I believe it reflects the original implementation of deid where we parse over actions first, decided on some "final" state and then proceeded. This version moves through the rules top to bottom and essentially follows them, so it makes sense the user can ADD, REMOVE, ADD, REMOVE in infinitude. It indeed does mimic replace, at least with that sequence of actions. I don't think we should change this behavior, but should just fix that section in the docs.
If ADD/REMOVE can be thought of as changing with state, KEEP would be akin to a whitelist, as to say "no matter what happens, do not remove this tag." And that probably means it is allowed to be edited (e.g., KEEP would work with REPLACE or ADD but just not remove). The easiest way to implement this would be to parse KEEP first, and then ensure if something is to be REMOVE it is not in the KEEP set.
I agree, and probably in the docs we would want to stress that most actions are done based on current state, but keep is more global (and thus doesn't matter where it's used).
I would absolutely love an improvement to the docs here! I likely documented the pattern to use a regex but not in context of a particular tag. |
Thank you for comments.
Could you please explain what do you mean by "based on current state"? In general I take that as "Yes" and will proceed with implementing whitelists (aka "KEEP overrules REMOVE") and updating docs. |
Sure! So imagine that we have a box with different colored balls (this is the dicom file, and each ball is a header entry). If we then have a list of actions, we can move down the list one by one and interact with the balls. E.g.,:
So those are stateful actions because using them at some timepoint will change the state of what is in the box. So KEEP would be a global command (not stateful) so if you did the same:
So (just generally) when I'm comparing actions like ADD/REMOVE I'm thinking of those as stateful because they are scoped to a specific point in time. A KEEP, however, is not specific to a point and time - no matter where you use it in your recipe it is going to "pin" some global state. Does that kind of make sense? |
Yes, thank you, nice example :) I'll think how to put it into words without using balls, but saying in advance - feel free to patch my wording, as you seem to be native speaker :) |
Haha definitely! It's probably not the best example - they always used colored balls in bags as examples in statistics courses, so I'm not sure I did it justice 😆 |
Hi there, |
I think the approach is fairly straight forward - the one issue I see is that self.skip is actually a property, so I'm not sure it makes sense to append to that. Perhaps that logic should be moved into the property? Also - are we sure that KEEP will only be relevant for a field? E.g., we are just appending the KEEP field to skips. What if there is a more fine tuned logic? |
As for "more logic" I could only come up with some expansion on |
It functionally works, but if someone wants to access the consistent list of things to skip, it doesn't make sense to have custom logic separate in the parse() function and then essentially return a different answer. This way also doesn't allow the disable_skips attribute to apply to the current recipe. So I would do: @property
def skip(self):
"""
Return a list of fields to skip, as defined in the self.config
"""
skips = []
if self.disable_skips:
return skips
if self.config:
skips = self.config.get("get", {}).get("skip", {})
if self.recipe.deid is not None:
for action in self.recipe.get_actions(action="KEEP"):
skips.append(action.get("field"))
return skips With the current functionality we:
Let me know if there is anything you want to discuss. |
Thank you for well-structured answer :)
(might be slightly shortened, but logically same) Would like to hear your opinion on that :) |
That's a good point - perhaps we should have two properties then? self.skip remains as it is now, and then self.keep could be the similar functionality, but run the logic that you have now in the parse function. That way, we'd have the logic separate but still not be appending to a property, and the other issues I raised. E.g., @property
def keep(self):
keeps = []
if self.recipe.deid is not None:
for action in self.recipe.get_actions(action="KEEP"):
keeps.append(action.get("field"))
return keeps And then in the function call to get_fields just include both of them: def get_fields(self, expand_sequences=True):
"""expand all dicom fields into a list, where each entry is
a DicomField. If we find a sequence, we unwrap it and
represent the location with the name (e.g., Sequence__Child)
"""
if not self.fields:
self.fields = get_fields(
dicom=self.dicom,
expand_sequences=expand_sequences,
seen=self.seen,
skip=self.skip + self.keep,
)
return self.fields as including the list of keeps as skips does make sense for applying the keep logic! |
yes, makes sense.. I'll check disable_skips usage and go the way you've suggested soon, together with docs update |
@vsoch you are welcome to approve (or not approve :)) and merge |
* #197 'remove and keep combo' - added tests that describe situation where remove overrules keep, and also explicitly show that except is using value as substring/regex match * #197 'remove and keep combo' - added quite straightforward "add fields marked with KEEP to skiplist", also fixed docstring on parser#perform_action and removed unused parameter, fixed typo in random place * #197 'remove and keep combo' - quotes changed as per code format convention * #197 'remove and keep combo' - changed keep list to be separate property, updated docs on action order, added two tests for ensuring blank action order * #197 'remove and keep combo' - changed keep fields collector to be more pythonic * #197 'remove and keep combo' - more safeguards in keep fields collector, clarified docs upon field that should not be None * #197 'remove and keep combo' - changelog update, typo fix, version bump
Closed with #198 |
#197 'remove and replace/jitter combo' - added once-evaluated list of…
* #193 'bump pydicom' - bumped pydicom to 2.2.2, local tests green, changelog update, added PyCharm to gitignore * WIP 197 remove + keep combo should keep (#198) * #197 'remove and keep combo' - added tests that describe situation where remove overrules keep, and also explicitly show that except is using value as substring/regex match * #197 'remove and keep combo' - added quite straightforward "add fields marked with KEEP to skiplist", also fixed docstring on parser#perform_action and removed unused parameter, fixed typo in random place * #197 'remove and keep combo' - quotes changed as per code format convention * #197 'remove and keep combo' - changed keep list to be separate property, updated docs on action order, added two tests for ensuring blank action order * #197 'remove and keep combo' - changed keep fields collector to be more pythonic * #197 'remove and keep combo' - more safeguards in keep fields collector, clarified docs upon field that should not be None * #197 'remove and keep combo' - changelog update, typo fix, version bump * #197 'remove and replace/jitter combo' - added once-evaluated list of fields being jittered or replaced to prevent them from being removed, tests, extended description of @keep property, changelog update, typo fix, version bump * #197 'remove and replace/jitter combo' - run black * Update deid/version.py - agreed to bump to 0.3 * Bug with return of derive_ctp_coordinate Downstream code expects coordinate definitions to be a comma separated list of values, not a list of ints. * Update deid/dicom/parser.py Committing suggestion. * change derive_ctp_coordinates to private function Co-authored-by: Gleb <gleb.stsenov@gmail.com> Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>
* add support for translating ctp coordinates * version bump * pin numpy and python 3.7, bump minor version * Add/remove keep behavior (Previous PR #199 - 0.2.29 rc) (#237) * #193 'bump pydicom' - bumped pydicom to 2.2.2, local tests green, changelog update, added PyCharm to gitignore * WIP 197 remove + keep combo should keep (#198) * #197 'remove and keep combo' - added tests that describe situation where remove overrules keep, and also explicitly show that except is using value as substring/regex match * #197 'remove and keep combo' - added quite straightforward "add fields marked with KEEP to skiplist", also fixed docstring on parser#perform_action and removed unused parameter, fixed typo in random place * #197 'remove and keep combo' - quotes changed as per code format convention * #197 'remove and keep combo' - changed keep list to be separate property, updated docs on action order, added two tests for ensuring blank action order * #197 'remove and keep combo' - changed keep fields collector to be more pythonic * #197 'remove and keep combo' - more safeguards in keep fields collector, clarified docs upon field that should not be None * #197 'remove and keep combo' - changelog update, typo fix, version bump * #197 'remove and replace/jitter combo' - added once-evaluated list of fields being jittered or replaced to prevent them from being removed, tests, extended description of @keep property, changelog update, typo fix, version bump * #197 'remove and replace/jitter combo' - run black * Update deid/version.py - agreed to bump to 0.3 * Bug with return of derive_ctp_coordinate Downstream code expects coordinate definitions to be a comma separated list of values, not a list of ints. * Update deid/dicom/parser.py Committing suggestion. * change derive_ctp_coordinates to private function Signed-off-by: vsoch <vsoch@users.noreply.github.com> Co-authored-by: Gleb <gleb.stsenov@gmail.com> Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com> Signed-off-by: vsoch <vsoch@users.noreply.github.com> Co-authored-by: wetzelj <wetzelj@ccf.org>
Hello,
I am confused by following in docs:
Also
But at the same time there is an example of
Here I naturally make assumption that
ALL
meansALL
, so each field is removed, and according to rules quoted above,REMOVE
rules overKEEP
, so tag kinda has to be removed.How does it work? How is
PixelData
expected to be kept?In my usecase I want to drop all information except for certain tags, part of them I want to keep, part of them I want to replace with my custom func.
I don't want to manage full set of DICOM tags, but rather remove all and preserve only certain subset.
How?
The text was updated successfully, but these errors were encountered: