Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add checks for valid value #1543

Merged
merged 21 commits into from
Jan 4, 2022

Conversation

mrbean-bremen
Copy link
Member

@mrbean-bremen mrbean-bremen commented Nov 30, 2021

  • add config option to raise on writing invalid values
    (True by default)

This is a go at some validation of values, currently only done for the value length. This could be expanded by checks for the contained character set, the number of values, and some VR-specific validation, but I first wanted to check if this is what we want and will wait for some feedback before going on.

I introduced a separate config option for writing values which is True by default, and had to added the raise_on_error argument to some constructors, as we don't know if we are reading or writing inside the data elements. This introduces some behavior changes (mainly on writing), and some potential performance degradation due to the checks (haven't tested it yet).

Tasks

  • Unit tests added that reproduce the issue or prove feature is working
  • Fix or feature added
  • Code typed and mypy shows no errors
  • Documentation updated (if relevant)
    • No warnings during build
    • Preview link (CircleCI -> Artifacts -> doc/_build/html/index.html)
  • Unit tests passing and overall coverage the same or better

@codecov
Copy link

codecov bot commented Nov 30, 2021

Codecov Report

Merging #1543 (13cd058) into master (01a2fa5) will increase coverage by 0.08%.
The diff coverage is 99.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1543      +/-   ##
==========================================
+ Coverage   97.42%   97.51%   +0.08%     
==========================================
  Files          66       66              
  Lines       10185    10352     +167     
==========================================
+ Hits         9923    10095     +172     
+ Misses        262      257       -5     
Impacted Files Coverage Δ
pydicom/charset.py 96.04% <91.66%> (+1.99%) ⬆️
pydicom/config.py 97.43% <100.00%> (+1.18%) ⬆️
pydicom/dataelem.py 98.45% <100.00%> (+0.02%) ⬆️
pydicom/dataset.py 99.05% <100.00%> (-0.01%) ⬇️
pydicom/dicomdir.py 100.00% <100.00%> (ø)
pydicom/filereader.py 94.34% <100.00%> (ø)
pydicom/filewriter.py 97.82% <100.00%> (+0.02%) ⬆️
pydicom/multival.py 100.00% <100.00%> (ø)
pydicom/uid.py 96.36% <100.00%> (+0.18%) ⬆️
pydicom/valuerep.py 99.34% <100.00%> (+0.14%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01a2fa5...13cd058. Read the comment docs.

@mrbean-bremen mrbean-bremen force-pushed the validate-value branch 3 times, most recently from 7496dd2 to 9764f29 Compare November 30, 2021 20:34
pydicom/uid.py Outdated

Returns
-------
pydicom.uid.UID
The UID object.
"""
if isinstance(val, str):
validate_value("UI", val, raise_on_error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UI can be validated with UID.is_valid

pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
@pep8speaks
Copy link

pep8speaks commented Dec 4, 2021

Hello @mrbean-bremen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-03 19:21:01 UTC

@mrbean-bremen mrbean-bremen force-pushed the validate-value branch 2 times, most recently from ee51ca0 to c789be5 Compare December 4, 2021 12:33

def __init__(self) -> None:
self._reading_validation_mode: int = WARN_ON_ERROR
self.writing_validation_mode = RAISE_ON_ERROR
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going back and forth with how to define this settings, I'm still not sure what the correct way is. I prefer to have the settings in a class, as this makes it possible to change them later using properties, while for module-level attributes this is not possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good but would probably need to test it in practice to see how it all works out. But I also have some ideas to go over, I'll put those in a separate comment because they are not only related to this bit of code.

WARN_ON_ERROR = 1
"""A warning is issued on a value validation error."""

RAISE_ON_ERROR = 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had these as an enum first, but that caused problems with pickling.

return self.type_constructor(x) if x != '' else cast(_ItemType, x)

return ( # type: ignore[no-any-return]
self.type_constructor( # type: ignore[has-type]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to avoid these warnings.

Copy link
Member

@scaramallion scaramallion Jan 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the optional argument to type_constructor. You could define a Protocol, or just leave it. In 3.10 you can use ParamSpec, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it for the time being (actually missed that comment before...).

pydicom/tests/test_fileset.py Show resolved Hide resolved
@@ -941,7 +941,7 @@ def test_neither_ds_nor_palette_raises(self):
with pytest.raises(ValueError, match=msg):
apply_color_lut(ds.pixel_array)

def test_palette_unknown_raises(self):
def test_palette_unknown_raises(self, disable_value_validation):
"""Test using an unknown `palette` raise an exception."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled value validation in all tests that would cause warnings to avoid having them in the test log.

@@ -675,6 +670,13 @@ def test_hash(self):


class TestDSdecimal:
@pytest.fixture
def allow_ds_float(self):
old_value = config.allow_DS_float
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to add this to several tests as they only worked accidentally due to the execution order.

@mrbean-bremen mrbean-bremen force-pushed the validate-value branch 2 times, most recently from e7f9a0f to 72e548c Compare December 11, 2021 18:28
@mrbean-bremen mrbean-bremen changed the title [WIP] Add check for valid length of a value [MRG] Add check for valid length of a value Dec 11, 2021
@mrbean-bremen
Copy link
Member Author

This is ready for review. There are a few things I'm not sure about:

  • the config values - I'm still not happy with that, but I also not sure if adding more boolean attributes is a good idea, especially as it makes upwards compatibility for future changes difficult
  • validation during reading is only done on accessing the values, not while reading the values as RawDataElements - I think this is ok, as it does not affect the reading performance, but will be done for all values that are used are accessed, but I'm not sure
  • validation on writing is done on creating a DataElement, and a second time (only for text values) on encoding the values, as the byte size may exceed the maximum allowed length, even if the string size does not

@mrbean-bremen mrbean-bremen force-pushed the validate-value branch 3 times, most recently from fe2f869 to 399d805 Compare December 12, 2021 12:40
RAISE_ON_ERROR

.. autoclass:: Settings
:exclude-members: __init__, reading_validation_mode
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here separately to exclude these members, but this makes it visible only in the page for the config module, not in the overview page. Maybe there is a better way...

pydicom/config.py Outdated Show resolved Hide resolved
@mrbean-bremen
Copy link
Member Author

Hm, only just noticed that the typing check failed in the last commit - I didn't pay attention as the code had not changed, but mypy has been updated to 0.920 in that run😒

Copy link
Member

@darcymason darcymason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of quick thought, and more detailed comment to follow...

enforce_valid_values = False
"""Raise exceptions if any value is not allowed by DICOM Standard.
"""Obsolete.
Use :attr:`Settings.reading_validation_mode` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "Deprecated" here rather than "Obsolete". Sends a stronger message that it may be removed later. And can add a __getattr__ module method to warn on deprecation like is used in uid.py and other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, "deprecated" is what I actually meant - just my usual struggle with wording... Also __getattr__ makes sense to warn at least for the getters (as I mentioned, this does not work for Python 3.6, but this will be EOL in a few days anyway), though I see no way to do this for the setters, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I checked before I made my comment - it still does not appear possible to do module-level setters in any Python version yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a second thought, it probably doesn't really help to warn on the getters, as a user would not use them, just the setters. The getters are only used by our own code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - perhaps it can be warned in a generic upward-compatibility direction through a function like I list in my latest comment below? Perhaps then it can go away in pydicom code, but still be triggered in user code.


def __init__(self) -> None:
self._reading_validation_mode: int = WARN_ON_ERROR
self.writing_validation_mode = RAISE_ON_ERROR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good but would probably need to test it in practice to see how it all works out. But I also have some ideas to go over, I'll put those in a separate comment because they are not only related to this bit of code.

@darcymason
Copy link
Member

As I noted in review comments, I also some general thoughts: I wonder about generalizing the disable_value_validation idea to a validation context manager to give users temporary fine control over settings:

with config.validation(reading=RAISE_ON_ERROR):
   ...

Here I've used 'reading' as a shorter argument name which I think would make this more succinct.

Also, perhaps some simpler flag names, since these are user-facing. I would think STRICT could be a synonym for RAISE... and WARN for warn. So could have something like

with config.validation(reading=STRICT, writing=WARN):
   ...

However, I don't know if that can fit easily into the multiple flags we have now.

More random thoughts:
Perhaps validation could also accept a dict to allow user to store different sets of settings to switch back and forth more easily. Of course, users could always do use ** dict unpacking to do that, but perhaps that is less well known.

Also, perhaps passing a single bool could turn all validation on or off.

Just throwing some ideas out there to see what you think. This sounds somewhat familiar, though, so I may be revisiting ideas that didn't work out...

@darcymason
Copy link
Member

I also meant to mention another thought process. As is here, the Settings class ties back to the existing flags - I wonder if we could make a path away from them. If they were somewhat distinct, perhaps any location in pydicom where we use the config flags could check if the new system has been set, and use that if so, else fall back to the old. Likely this could be done through a helper function/functions that would be easy to use everywhere we need it. Again, apologies because I haven't thought it through, so don't know how realistic it is.

pydicom/valuerep.py Outdated Show resolved Hide resolved
pydicom/valuerep.py Outdated Show resolved Hide resolved
- fix regex for TM
- add DICOM standard URL to warning messages
pydicom/valuerep.py Outdated Show resolved Hide resolved
- really fix regex for TM/DT
- documentation rewording
@mrbean-bremen mrbean-bremen changed the title [MRG] Add check for valid length of a value [MRG] Add checks for valid value Jan 1, 2022
Copy link
Member

@darcymason darcymason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with everything - not sure if @scaramallion had anything else...?

@scaramallion
Copy link
Member

I'm still looking through, I'll try and finish today

"""1.2.840.10008.1.2.4.108"""
RLELossless = UID('1.2.840.10008.1.2.5')
"""1.2.840.10008.1.2.5"""
with disable_value_validation():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, it's more of a debugging aid for the validation itself - to avoid running into the validation of each predefined UID in each test. It's also a usage example... Performacne-wise it shouldn't matter, though.

pydicom/valuerep.py Outdated Show resolved Hide resolved
A tuple of a boolean validation result and the error message.
"""
if not re.match(regex, value):
return False, f"Invalid value for VR {vr}: '{value!r}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll end up with two single apostrophe marks '' here, as the {value!r} already gives 'value'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks!

return is_valid_len and is_valid_expr, msg


def validate_ae(vr: str, value: Union[str, bytes]) -> Tuple[bool, str]:
Copy link
Member

@scaramallion scaramallion Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows "\n" (encoded to 0x0A), although not "\r" or "\t" weirdly

>>> from pydicom import Dataset
>>> ds = Dataset()
>>> ds.MoveDestination = "\n"
>>> ds.MoveDestination
'\n'
>>> ds["MoveDestination"].VM
1
>>> ds.MoveDestination = "\t"
.../pydicom/valuerep.py:356: UserWarning: Invalid value for VR AE: ''\t'' Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
  warnings.warn(msg)

>>>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, turns out that a single newline at the end of a string is ignored by the regex matching - I wasn't aware of this, and didn't find documentation about this yet, especially about how to change this behavior. In the worst case, I can check this separately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the doc:

$ Matches the end of the string or just before the newline at the end of the string

This sounds like the problem...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't get it to work with regex alone, so I added an extra check. Maybe somebody has a better idea...

@scaramallion
Copy link
Member

I'm getting UserWarning for ds.SOPInstanceUID = "", which is probably not what we want.

@mrbean-bremen
Copy link
Member Author

I'm getting UserWarning for ds.SOPInstanceUID = ""

Hm, I think I used the existing regex here - but that seems not to allow an empty string. Will check this tonight, also the other used regexes.

@NPann NPann mentioned this pull request Jan 18, 2022
4 tasks
return True, ""


def validate_regex(vr: str, value: Union[str, bytes],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just testing this out, not sure how important this is but if you pass in a byte-string you get a TypeError, should this be changed to value: str?

Ex.:

/w/f/s/pydicom git:(master) ✗ $ python
Python 3.8.8 (default, Oct 11 2021, 09:15:09)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydicom.valuerep import validate_valu
>>> validate_value('DS', b"3000.0",1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 342, in validate_value
    is_valid, msg = validator(vr, value)
  File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 181, in validate_ds
    return validate_length_and_regex(vr, value, DS_REGEX)
  File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 143, in validate_length_and_regex
    is_valid_expr, msg2 = validate_regex(vr, value, regex)
  File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 119, in validate_regex
    if value and (not re.match(regex, value) or value and value[-1] == "\n"):
  File "/home/naterichman/.pyenv/versions/3.8.8/lib/python3.8/re.py", line 191, in match
    return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this code gets indeed only called for strings inside pydicom, and I think I had some trouble with typing here. But you are right, this should probably be typed for str only, or adapted to work for both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I should have clarified, this isn't a problem with the normal pydicom flow. But I saw this and assumed I could pass a byte string, more just user-error or needing docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand, and as this in the "public" API it should work as advertised. If it makes sense to have this work for byte strings, we should probably change it accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fair to require convert_values to be called beforehand, then it could be only str inputs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in this case this should probably be documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think I will change this to support both str and byte, as for some values it has to support bytes.
@darcymason : I would like to change this before the release, if that is ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed replying to this one - all good since merged now anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants