Adding ability to specify a filter with REMOVE #124

Merged
merged 1 commit into from
Mar 12, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
Referenced versions in headers are tagged on Github, in parentheses are for pypi.

## [vxx](https://github.com/pydicom/deid/tree/master) (master)
- adding filters (contains through missing) for REMOVE (0.1.41)
- adding support for tag groups (values, fields) (0.1.4)
- Adding option to provide function to remove (must return boolean) (0.1.38)
- removing matplotlib version requirement (0.1.37)
10 changes: 9 additions & 1 deletion deid/config/standards.py
@@ -35,7 +35,7 @@
groups = ["values", "fields"]
group_actions = ("FIELD", "SPLIT")

# Valid actions for a filter action
# Valid actions for a field filter action
filters = (
"contains",
"notcontains",
@@ -45,3 +45,11 @@
"present",
"empty",
)

# valid actions for a value filter
value_filters = (
"contains",
"notcontains",
"equals",
"notequals",
)
21 changes: 17 additions & 4 deletions deid/dicom/actions.py
@@ -23,15 +23,16 @@
"""

from deid.logger import bot
from deid.config.standards import actions as valid_actions
from deid.config.standards import actions as valid_actions, value_filters

from .fields import expand_field_expression, find_by_values
from deid.dicom.fields import expand_field_expression, find_by_values
from deid.dicom.filter import apply_filter
from deid.dicom.tags import add_tag, update_tag, blank_tag, remove_tag

from deid.utils import get_timestamp, parse_value

from pydicom.dataset import Dataset
from pydicom.sequence import Sequence
from .tags import add_tag, update_tag, blank_tag, remove_tag

import re

@@ -234,7 +235,7 @@ def _remove_tag(dicom, item, field, value=None):

# The user can optionally provide a function to return a boolean
if re.search("[:]", value):
value_type, value_option = value.split(":")
value_type, value_option = value.split(":", 1)
if value_type.lower() == "func":

# An item must be provided
@@ -255,6 +256,18 @@ def _remove_tag(dicom, item, field, value=None):
"function %s returned an invalid type %s. Must be bool."
% (value_option, type(do_removal))
)

# A filter such as contains, notcontains, equals, etc.
elif value_type.lower() in value_filters:

# These functions are known to return boolean
do_removal = apply_filter(
dicom=dicom,
field=field,
filter_name=value_type,
value=value_option or None,
)

else:
bot.exit("%s is an invalid variable type for REMOVE." % value_type)
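
The change from `value.split(":")` to `value.split(":", 1)` in this hunk matters once the option part of a value can itself contain a colon, which a filter expression may. A minimal standalone sketch of the difference follows; the value string is made up for illustration and is not taken from the deid test data:

```python
# Illustrative only: a value string whose option itself contains a colon.
value = "contains:10:30"

# Splitting on every colon yields three parts, so two-name unpacking fails.
try:
    value_type, value_option = value.split(":")
    unpack_failed = False
except ValueError:
    unpack_failed = True

# Splitting at most once keeps everything after the first colon together.
value_type, value_option = value.split(":", 1)
print(unpack_failed, value_type, value_option)
```

With `maxsplit=1`, only the first colon separates the filter name from its expression, so the expression reaches the filter unchanged.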

19 changes: 19 additions & 0 deletions deid/tests/test_dicom_utils.py
@@ -175,6 +175,25 @@ def generate_uid(item, value, field):
updated = perform_action(dicom=dicom, action=ACTION, item=item)
self.assertEqual(updated.PatientID, "pancakes")

# Test each of filters for contains, not contains, equals, etc.
dicom = get_dicom(self.dataset)

print("Testing contains, equals, and empty action with REMOVE")
self.assertTrue("ReferringPhysicianName" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "contains:Dr."}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("ReferringPhysicianName" not in dicom)

self.assertTrue("InstitutionName" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "equals:STANFORD"}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("InstitutionName" not in dicom)

self.assertTrue("StudyID" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "empty"}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("StudyID" not in dicom)

def test_jitter_timestamp(self):

from deid.dicom.actions import jitter_timestamp
2 changes: 1 addition & 1 deletion deid/version.py
@@ -22,7 +22,7 @@

"""

__version__ = "0.1.4"
__version__ = "0.1.41"
AUTHOR = "Vanessa Sochat"
AUTHOR_EMAIL = "vsochat@stanford.edu"
NAME = "deid"
30 changes: 30 additions & 0 deletions docs/_docs/user-docs/recipe-headers.md
@@ -173,6 +173,7 @@ True if this is the case. The dicom is the dicom file (read in with Pydicom) tha
with (in the example above we grab the `PatientName`).

#### Header

We know that we are dealing with functions relevant to the header of the image
by way of the `%header` section. This section can have a series of commands called
actions that tell the software how to deal with different fields. For the header
@@ -224,6 +225,7 @@ JITTER PatientBirthDate -31
```

##### Field Expansion

In some cases, it might be extremely tedious to list every field that ends in the same thing
in order to perform the same action on each. For example:

@@ -361,6 +363,34 @@ REPLACE PatientID var:id
REPLACE InstanceSOPUID var:source_id
```


##### Value Expansion

These same filters can also be used with any action whose value is evaluated as a boolean,
for example, the `REMOVE` action. As we showed previously, you can remove using
a filter like "contains" to select some subset of fields:

```
REMOVE contains:Patient
```

which would remove all fields whose name contains "Patient". What if we want to perform
this same kind of check, but against a field's value? For example, let's say that we have
a regular expression that describes a number, and we want to remove any field
whose value matches it. We could do:

```
REMOVE ALL contains:(\d{7,})
```

This would parse through ALL fields and remove those whose value contains a match for the regular
expression. The supported value filters are:

- contains
- notcontains
- equals
- notequals
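
To make the four value filters concrete, here is a minimal standalone sketch of how such a check could behave. It is not deid's `apply_filter`: the helper `matches_value_filter`, the plain dictionary standing in for a DICOM header, and the exact semantics (regular expression search for `contains`/`notcontains`, exact string comparison for `equals`/`notequals`) are all assumptions made for illustration. The pattern `\d{7,}` matches seven or more digits.

```python
import re


def matches_value_filter(field_value, filter_name, expression):
    """Hypothetical helper (not part of deid): test one value against a filter."""
    text = str(field_value)
    name = filter_name.lower()
    if name == "contains":
        return re.search(expression, text) is not None
    if name == "notcontains":
        return re.search(expression, text) is None
    if name == "equals":
        return text == expression
    if name == "notequals":
        return text != expression
    raise ValueError("unsupported value filter: %s" % filter_name)


# A plain dict stands in for a DICOM header; the values are made up.
header = {"PatientID": "1234567", "InstitutionName": "STANFORD", "Modality": "CT"}

# Split the recipe value once into the filter name and its expression.
filter_name, expression = r"contains:\d{7,}".split(":", 1)

# Keep only the fields whose value does not match the filter.
kept = {
    field: value
    for field, value in header.items()
    if not matches_value_filter(value, filter_name, expression)
}
print(kept)
```

In this sketch only `PatientID` is dropped, since it is the only value containing a run of seven or more digits.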

Now that you know how configuration works, you have a few options.
You can learn how to define groups of tags based on fields or values in [groups]({{ site.baseurl }}/user-docs/recipe-groups/),
or if you want to write a text file and get going with cleaning your files, you should