-
-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add checks for valid value #1543
Conversation
e11eef3
to
4109917
Compare
Codecov Report
@@ Coverage Diff @@
## master #1543 +/- ##
==========================================
+ Coverage 97.42% 97.51% +0.08%
==========================================
Files 66 66
Lines 10185 10352 +167
==========================================
+ Hits 9923 10095 +172
+ Misses 262 257 -5
Continue to review full report at Codecov.
|
7496dd2
to
9764f29
Compare
pydicom/uid.py
Outdated
|
||
Returns | ||
------- | ||
pydicom.uid.UID | ||
The UID object. | ||
""" | ||
if isinstance(val, str): | ||
validate_value("UI", val, raise_on_error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UI can be validated with UID.is_valid
Hello @mrbean-bremen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-01-03 19:21:01 UTC |
ee51ca0
to
c789be5
Compare
pydicom/config.py
Outdated
|
||
def __init__(self) -> None: | ||
self._reading_validation_mode: int = WARN_ON_ERROR | ||
self.writing_validation_mode = RAISE_ON_ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going back and forth with how to define this settings, I'm still not sure what the correct way is. I prefer to have the settings in a class, as this makes it possible to change them later using properties, while for module-level attributes this is not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good but would probably need to test it in practice to see how it all works out. But I also have some ideas to go over, I'll put those in a separate comment because they are not only related to this bit of code.
pydicom/config.py
Outdated
WARN_ON_ERROR = 1 | ||
"""A warning is issued on a value validation error.""" | ||
|
||
RAISE_ON_ERROR = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had these as an enum first, but that caused problems with pickling.
return self.type_constructor(x) if x != '' else cast(_ItemType, x) | ||
|
||
return ( # type: ignore[no-any-return] | ||
self.type_constructor( # type: ignore[has-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to avoid these warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the optional argument to type_constructor
. You could define a Protocol
, or just leave it. In 3.10 you can use ParamSpec
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it for the time being (actually missed that comment before...).
@@ -941,7 +941,7 @@ def test_neither_ds_nor_palette_raises(self): | |||
with pytest.raises(ValueError, match=msg): | |||
apply_color_lut(ds.pixel_array) | |||
|
|||
def test_palette_unknown_raises(self): | |||
def test_palette_unknown_raises(self, disable_value_validation): | |||
"""Test using an unknown `palette` raise an exception.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabled value validation in all tests that would cause warnings to avoid having them in the test log.
@@ -675,6 +670,13 @@ def test_hash(self): | |||
|
|||
|
|||
class TestDSdecimal: | |||
@pytest.fixture | |||
def allow_ds_float(self): | |||
old_value = config.allow_DS_float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to add this to several tests as they only worked accidentally due to the execution order.
e7f9a0f
to
72e548c
Compare
This is ready for review. There are a few things I'm not sure about:
|
fe2f869
to
399d805
Compare
doc/reference/config.rst
Outdated
RAISE_ON_ERROR | ||
|
||
.. autoclass:: Settings | ||
:exclude-members: __init__, reading_validation_mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this here separately to exclude these members, but this makes it visible only in the page for the config
module, not in the overview page. Maybe there is a better way...
b38a34a
to
6ac8bfa
Compare
Hm, only just noticed that the typing check failed in the last commit - I didn't pay attention as the code had not changed, but mypy has been updated to 0.920 in that run😒 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of quick thought, and more detailed comment to follow...
enforce_valid_values = False | ||
"""Raise exceptions if any value is not allowed by DICOM Standard. | ||
"""Obsolete. | ||
Use :attr:`Settings.reading_validation_mode` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "Deprecated" here rather than "Obsolete". Sends a stronger message that it may be removed later. And can add a __getattr__
module method to warn on deprecation like is used in uid.py and other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, "deprecated" is what I actually meant - just my usual struggle with wording... Also __getattr__
makes sense to warn at least for the getters (as I mentioned, this does not work for Python 3.6, but this will be EOL in a few days anyway), though I see no way to do this for the setters, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I checked before I made my comment - it still does not appear possible to do module-level setters in any Python version yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a second thought, it probably doesn't really help to warn on the getters, as a user would not use them, just the setters. The getters are only used by our own code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - perhaps it can be warned in a generic upward-compatibility direction through a function like I list in my latest comment below? Perhaps then it can go away in pydicom code, but still be triggered in user code.
pydicom/config.py
Outdated
|
||
def __init__(self) -> None: | ||
self._reading_validation_mode: int = WARN_ON_ERROR | ||
self.writing_validation_mode = RAISE_ON_ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good but would probably need to test it in practice to see how it all works out. But I also have some ideas to go over, I'll put those in a separate comment because they are not only related to this bit of code.
As I noted in review comments, I also some general thoughts: I wonder about generalizing the with config.validation(reading=RAISE_ON_ERROR):
... Here I've used 'reading' as a shorter argument name which I think would make this more succinct. Also, perhaps some simpler flag names, since these are user-facing. I would think STRICT could be a synonym for RAISE... and WARN for warn. So could have something like with config.validation(reading=STRICT, writing=WARN):
... However, I don't know if that can fit easily into the multiple flags we have now. More random thoughts: Also, perhaps passing a single bool could turn all validation on or off. Just throwing some ideas out there to see what you think. This sounds somewhat familiar, though, so I may be revisiting ideas that didn't work out... |
I also meant to mention another thought process. As is here, the Settings class ties back to the existing flags - I wonder if we could make a path away from them. If they were somewhat distinct, perhaps any location in pydicom where we use the config flags could check if the new system has been set, and use that if so, else fall back to the old. Likely this could be done through a helper function/functions that would be easy to use everywhere we need it. Again, apologies because I haven't thought it through, so don't know how realistic it is. |
- fix regex for TM - add DICOM standard URL to warning messages
- really fix regex for TM/DT - documentation rewording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with everything - not sure if @scaramallion had anything else...?
I'm still looking through, I'll try and finish today |
"""1.2.840.10008.1.2.4.108""" | ||
RLELossless = UID('1.2.840.10008.1.2.5') | ||
"""1.2.840.10008.1.2.5""" | ||
with disable_value_validation(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not, it's more of a debugging aid for the validation itself - to avoid running into the validation of each predefined UID in each test. It's also a usage example... Performacne-wise it shouldn't matter, though.
pydicom/valuerep.py
Outdated
A tuple of a boolean validation result and the error message. | ||
""" | ||
if not re.match(regex, value): | ||
return False, f"Invalid value for VR {vr}: '{value!r}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll end up with two single apostrophe marks ''
here, as the {value!r}
already gives 'value'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks!
return is_valid_len and is_valid_expr, msg | ||
|
||
|
||
def validate_ae(vr: str, value: Union[str, bytes]) -> Tuple[bool, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows "\n"
(encoded to 0x0A
), although not "\r"
or "\t"
weirdly
>>> from pydicom import Dataset
>>> ds = Dataset()
>>> ds.MoveDestination = "\n"
>>> ds.MoveDestination
'\n'
>>> ds["MoveDestination"].VM
1
>>> ds.MoveDestination = "\t"
.../pydicom/valuerep.py:356: UserWarning: Invalid value for VR AE: ''\t'' Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, turns out that a single newline at the end of a string is ignored by the regex matching - I wasn't aware of this, and didn't find documentation about this yet, especially about how to change this behavior. In the worst case, I can check this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the doc:
$ Matches the end of the string or just before the newline at the end of the string
This sounds like the problem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't get it to work with regex alone, so I added an extra check. Maybe somebody has a better idea...
a819afe
to
13cd058
Compare
I'm getting |
Hm, I think I used the existing regex here - but that seems not to allow an empty string. Will check this tonight, also the other used regexes. |
return True, "" | ||
|
||
|
||
def validate_regex(vr: str, value: Union[str, bytes], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just testing this out, not sure how important this is but if you pass in a byte-string you get a TypeError, should this be changed to value: str
?
Ex.:
➜ /w/f/s/pydicom git:(master) ✗ $ python
Python 3.8.8 (default, Oct 11 2021, 09:15:09)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydicom.valuerep import validate_valu
>>> validate_value('DS', b"3000.0",1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 342, in validate_value
is_valid, msg = validator(vr, value)
File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 181, in validate_ds
return validate_length_and_regex(vr, value, DS_REGEX)
File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 143, in validate_length_and_regex
is_valid_expr, msg2 = validate_regex(vr, value, regex)
File "/work/flywheel/sse/pydicom/pydicom/valuerep.py", line 119, in validate_regex
if value and (not re.match(regex, value) or value and value[-1] == "\n"):
File "/home/naterichman/.pyenv/versions/3.8.8/lib/python3.8/re.py", line 191, in match
return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this code gets indeed only called for strings inside pydicom
, and I think I had some trouble with typing here. But you are right, this should probably be typed for str
only, or adapted to work for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I should have clarified, this isn't a problem with the normal pydicom flow. But I saw this and assumed I could pass a byte string, more just user-error or needing docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand, and as this in the "public" API it should work as advertised. If it makes sense to have this work for byte strings, we should probably change it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fair to require convert_values
to be called beforehand, then it could be only str inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in this case this should probably be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think I will change this to support both str
and byte
, as for some values it has to support bytes
.
@darcymason : I would like to change this before the release, if that is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed replying to this one - all good since merged now anyway
(True by default)
This is a go at some validation of values, currently only done for the value length. This could be expanded by checks for the contained character set, the number of values, and some VR-specific validation, but I first wanted to check if this is what we want and will wait for some feedback before going on.
I introduced a separate config option for writing values which is
True
by default, and had to added theraise_on_error
argument to some constructors, as we don't know if we are reading or writing inside the data elements. This introduces some behavior changes (mainly on writing), and some potential performance degradation due to the checks (haven't tested it yet).Tasks
doc/_build/html/index.html
)