adding ability to specify a filter with REMOVE
Currently, REMOVE <field> can handle a func:. Since that function
returns a boolean, we can easily also support the already existing
filters that return a boolean (contains, equals, etc.). This
change adds support for those filters.

Signed-off-by: vsoch <vsochat@stanford.edu>
vsoch committed Mar 12, 2020
1 parent 5e7175f commit 633a4bd
Showing 6 changed files with 77 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
Referenced versions in headers are tagged on Github, in parentheses are for pypi.

## [vxx](https://github.com/pydicom/deid/tree/master) (master)
- adding filters (contains through missing) for REMOVE (0.1.41)
- adding support for tag groups (values, fields) (0.1.4)
- Adding option to provide function to remove (must return boolean) (0.1.38)
- removing matplotlib version requirement (0.1.37)
10 changes: 9 additions & 1 deletion deid/config/standards.py
@@ -35,7 +35,7 @@
groups = ["values", "fields"]
group_actions = ("FIELD", "SPLIT")

# Valid actions for a filter action
# Valid actions for a field filter action
filters = (
"contains",
"notcontains",
@@ -45,3 +45,11 @@
"present",
"empty",
)

# valid actions for a value filter
value_filters = (
"contains",
"notcontains",
"equals",
"notequals",
)
21 changes: 17 additions & 4 deletions deid/dicom/actions.py
@@ -23,15 +23,16 @@
"""

from deid.logger import bot
from deid.config.standards import actions as valid_actions
from deid.config.standards import actions as valid_actions, value_filters

from .fields import expand_field_expression, find_by_values
from deid.dicom.fields import expand_field_expression, find_by_values
from deid.dicom.filter import apply_filter
from deid.dicom.tags import add_tag, update_tag, blank_tag, remove_tag

from deid.utils import get_timestamp, parse_value

from pydicom.dataset import Dataset
from pydicom.sequence import Sequence
from .tags import add_tag, update_tag, blank_tag, remove_tag

import re

@@ -234,7 +235,7 @@ def _remove_tag(dicom, item, field, value=None):

# The user can optionally provide a function to return a boolean
if re.search("[:]", value):
value_type, value_option = value.split(":")
value_type, value_option = value.split(":", 1)
if value_type.lower() == "func":

# An item must be provided
@@ -255,6 +256,18 @@ def _remove_tag(dicom, item, field, value=None):
"function %s returned an invalid type %s. Must be bool."
% (value_option, type(do_removal))
)

# A filter such as contains, notcontains, equals, etc.
elif value_type.lower() in value_filters:

# These functions are known to return boolean
do_removal = apply_filter(
dicom=dicom,
field=field,
filter_name=value_type,
value=value_option or None,
)

else:
bot.exit("%s is an invalid variable type for REMOVE." % value_type)
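
The switch to `value.split(":", 1)` in this hunk matters once filter values are regular expressions, since a pattern may itself contain a colon. The snippet below is only an illustrative sketch, not code from the repository; the sample `value` string is made up.

```python
# Hypothetical REMOVE value whose pattern contains a colon.
value = r"contains:\d{2}:\d{2}"

# An unbounded split yields three pieces, so unpacking into two names fails.
try:
    value_type, value_option = value.split(":")
except ValueError as exc:
    error = str(exc)

# Splitting once keeps everything after the first colon together.
value_type, value_option = value.split(":", 1)
print(value_type, value_option)
```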

19 changes: 19 additions & 0 deletions deid/tests/test_dicom_utils.py
@@ -175,6 +175,25 @@ def generate_uid(item, value, field):
updated = perform_action(dicom=dicom, action=ACTION, item=item)
self.assertEqual(updated.PatientID, "pancakes")

# Test each of filters for contains, not contains, equals, etc.
dicom = get_dicom(self.dataset)

print("Testing contains, equals, and empty action with REMOVE")
self.assertTrue("ReferringPhysicianName" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "contains:Dr."}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("ReferringPhysicianName" not in dicom)

self.assertTrue("InstitutionName" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "equals:STANFORD"}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("InstitutionName" not in dicom)

self.assertTrue("StudyID" in dicom)
REMOVE = {"action": "REMOVE", "field": "ALL", "value": "empty"}
dicom = perform_action(dicom=dicom, action=REMOVE)
self.assertTrue("StudyID" not in dicom)

def test_jitter_timestamp(self):

from deid.dicom.actions import jitter_timestamp
2 changes: 1 addition & 1 deletion deid/version.py
@@ -22,7 +22,7 @@
"""

__version__ = "0.1.4"
__version__ = "0.1.41"
AUTHOR = "Vanessa Sochat"
AUTHOR_EMAIL = "vsochat@stanford.edu"
NAME = "deid"
30 changes: 30 additions & 0 deletions docs/_docs/user-docs/recipe-headers.md
@@ -173,6 +173,7 @@ True if this is the case. The dicom is the dicom file (read in with Pydicom) tha
with (in the example above we grab the `PatientName`).

#### Header

We know that we are dealing with functions relevant to the header of the image
by way of the `%header` section. This section can have a series of commands called
actions that tell the software how to deal with different fields. For the header
@@ -224,6 +225,7 @@ JITTER PatientBirthDate -31
```

##### Field Expansion

In some cases, it might be extremely tedious to list every field ending in the same thing
in order to perform the same action on each. For example:

@@ -361,6 +363,34 @@ REPLACE PatientID var:id
REPLACE InstanceSOPUID var:source_id
```


##### Value Expansion

These same filters can also be used with any action that comes down to a boolean decision,
for example, the `REMOVE` action. As we showed previously, you can remove using
a filter like "contains" to select some subset of fields:

```
REMOVE contains:Patient
```

which would remove all fields whose name contains "Patient". What if we want to perform
this same kind of check, but against a field's value? For example, let's say that we have
a regular expression to describe a number, and we want to remove any field
whose value matches. We could do:

```
REMOVE ALL contains:(\d{7,})
```

This would parse through ALL fields and remove those whose value contains a match to the
regular expression. The supported value filters are:

- contains
- notcontains
- equals
- notequals
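
To make the behavior of these value filters concrete, here is a minimal sketch in plain Python. It does not use deid or pydicom: the `header` dictionary and the helpers `matches` and `remove_by_value` are hypothetical names invented for illustration, and the matching rules (regular expression search for contains, exact string comparison for equals) are simplifying assumptions rather than a description of deid's internals.

```python
import re

# Hypothetical stand-in for a DICOM header: field name -> value.
header = {
    "PatientID": "1234567",
    "ReferringPhysicianName": "Dr. Bombay",
    "InstitutionName": "STANFORD",
    "StudyID": "",
}


def matches(value, filter_name, pattern):
    """Return True if a field value satisfies the named value filter."""
    value = str(value)
    if filter_name == "contains":
        return re.search(pattern, value) is not None
    if filter_name == "notcontains":
        return re.search(pattern, value) is None
    if filter_name == "equals":
        return value == pattern
    if filter_name == "notequals":
        return value != pattern
    raise ValueError("unsupported value filter: %s" % filter_name)


def remove_by_value(header, expression):
    """Drop every field whose value satisfies a '<filter>:<pattern>' expression."""
    # Split only once so that patterns containing ':' stay intact.
    filter_name, pattern = expression.split(":", 1)
    return {
        field: value
        for field, value in header.items()
        if not matches(value, filter_name.lower(), pattern)
    }


cleaned = remove_by_value(header, r"contains:(\d{7,})")
print(sorted(cleaned))
```

In this sketch only `PatientID` is dropped, because it is the only value with a run of seven or more digits. In a real recipe the same expression would simply be written after `REMOVE ALL`.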

Now that you know how configuration works, you have a few options.
You can learn how to define groups of tags based on fields or values in [groups]({{ site.baseurl }}/user-docs/recipe-groups/),
or if you want to write a text file and get going with cleaning your files, you should