Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#197 'remove and replace/jitter combo' - added once-evaluated list of… #204

Merged
merged 3 commits into from
Mar 18, 2022
Merged

#197 'remove and replace/jitter combo' - added once-evaluated list of… #204

merged 3 commits into from
Mar 18, 2022

Conversation

glebsts
Copy link
Contributor

@glebsts glebsts commented Mar 17, 2022

Description

Whitelisting was not possible for replacing. Now fields being replaced or jittered are excluded from removal (either REMOVE ALL or REMOVE SomeField).

It is done using separate property and private variable to keep result (lazy eval).

Made clarification regarding KEEP list - those fields are being removed from field list to process, meaning that KEEP SomeField + REPLACE SomeField ... won't work, as kept field is being excluded from list of fields to process for other actions.

Added tests.
Bumped release candidate number and changelog (+small fix for previous change)

Minor docs update.

Related issues: #197 , #198

Please include a summary of the change(s) and if relevant, any related issues above.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • [?] My code follows the style guidelines of this project
  • tests added
  • documentation updated

Open questions

codestyle on excluded_from_deletion property is discussable, opened for suggestions.

… fields being jittered or replaced to prevent them from being removed, tests, extended description of @keep property, changelog update, typo fix, version bump
@glebsts glebsts mentioned this pull request Mar 17, 2022
@vsoch
Copy link
Member

vsoch commented Mar 17, 2022

This looks perfecto! And it makes total sense - if you have an opinion about changing a field, it doesn't make sense to then remove it later. I'm wondering - is there any reason we wouldn't want to apply this to ADD too? Pinging @wetzelj too if he wants to take a look!

@glebsts
Copy link
Contributor Author

glebsts commented Mar 17, 2022

is there any reason we wouldn't want to apply this to ADD too

https://github.com/pydicom/deid/blob/0.2.29-rc/deid/tests/test_replace_identifiers.py#L749
test_remove_all_add_field_compounding_should_add
So it is kinda applied already, and docs say it :)

Copy link
Member

@vsoch vsoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. So this LGTM. I'm +1 to merge into the current rc branch, and @wetzelj would you prefer to review here (and then need to look again) or just review once after I merge here? I want your feedback / review because you are another user of deid (and you are super helpful and good with this line of stuffs!)

@wetzelj
Copy link
Contributor

wetzelj commented Mar 18, 2022

If it's okay with you two, I'll review it all once it's merged into 0.2.29-rc and I've been able to upgrade my application to that version.

To confirm my understanding of the functionality changes:

  • With the original changes (currently in 0.2.29-rc), the KEEP command takes precedence over all other commands within the recipe. I'm 100% on board with this.
  • With this PR (#197 'remove and replace/jitter combo' - added once-evaluated list of… #204), there's a change in recipe behavior regarding REMOVE, REPLACE, & JITTER.
    • In the past these were always applied sequentially - a REPLACE (or JITTER or ADD) followed by a REMOVE would remove the field.
    • Now, a REPLACE followed by a REMOVE would ignore the REMOVE. What this means is that once a field is referenced in a recipe, by a ADD, REPLACE, or JITTER, it's "locked" in the header and will persist (regardless of where that command is in relationship to other commands).

Assuming my understanding is correct, I think the second bullet above makes sense, and I'm on board with it too, even though its a change to the way current functional recipes currently work in 0.2.28. This change would help to ensure that we have cleaner recipes with less redundant and/or irrelevant commands.

Is my understanding correct?

@glebsts
Copy link
Contributor Author

glebsts commented Mar 18, 2022

@wetzelj your understood it correctly.
My recipe before was

REMOVE FieldA
REMOVE FieldC
REPLACE FieldB 'foo'

As my task is maximum anonymization and only leaving study data, and there are hundreds of tags, such recipe was very long and error-prone.

Now with this change I can do

REMOVE ALL
REPLACE FieldB 'foo'

only setting to keep what I need. Before it was blacklist, now it it whitelist, which is better,

@wetzelj
Copy link
Contributor

wetzelj commented Mar 18, 2022

Got it... I think this totally makes sense and is a good change to make.

@vsoch - Given the change in behavior for recipes, do you think it would warrant bumping this to 0.3.0 instead of 0.2.29?

@vsoch
Copy link
Member

vsoch commented Mar 18, 2022

yep I'm on board with that!

deid/version.py Outdated Show resolved Hide resolved
Co-authored-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants