Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] IS float #1720

Merged
merged 6 commits into from
Nov 14, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/reference/elem.valuerep.rst
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ or TM and utilities.
DSfloat
DT
IS
ISfloat
MultiString
PersonName
PersonNameUnicode
Expand Down
2 changes: 2 additions & 0 deletions doc/release_notes/v2.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ Enhancements
* CLI commands now accept *pydicom* charset test files and CLI help shows
Python Version (:pr:`1674`)
* Added support for Python 3.11 (:issue:`1658`)
* Added :class:`~pydicom.valuerep.ISfloat` to allow non-strict reading of
existing files with float IS values (:issue:`1661`)

Fixes
-----
Expand Down
12 changes: 12 additions & 0 deletions pydicom/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -268,6 +268,18 @@ def disable_value_validation() -> Generator:
settings._writing_validation_mode = writing_mode


@contextmanager
def strict_reading() -> Generator:
"""Context manager to temporarily enably strict value validation
darcymason marked this conversation as resolved.
Show resolved Hide resolved
for reading."""
original_reading_mode = settings._reading_validation_mode
try:
settings.reading_validation_mode = RAISE
yield
finally:
settings._reading_validation_mode = original_reading_mode


convert_wrong_length_to_UN = False
"""Convert a field VR to "UN" and return bytes if bytes length is invalid.
Default ``False``.
Expand Down
30 changes: 18 additions & 12 deletions pydicom/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
"""
import copy
from bisect import bisect_left
from contextlib import nullcontext
import io
from importlib.util import find_spec as have_package
import inspect # for __dir__
Expand Down Expand Up @@ -2490,18 +2491,23 @@ def to_json_dict(
:class:`Dataset` representation based on the DICOM JSON Model.
"""
json_dataset = {}
for key in self.keys():
json_key = '{:08X}'.format(key)
try:
data_element = self[key]
json_dataset[json_key] = data_element.to_json_dict(
bulk_data_element_handler=bulk_data_element_handler,
bulk_data_threshold=bulk_data_threshold
)
except Exception as exc:
logger.error(f"Error while processing tag {json_key}")
if not suppress_invalid_tags:
raise exc
context = (
config.strict_reading() if suppress_invalid_tags
else nullcontext()
)
with context:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing ISfloat by default caused some json tests to fail. Added this strict context to try to keep previous behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some trouble understanding the logic here: if suppress_invalid_tags is set, we raise an exception, and will be cught and ignored here? I'll probably have another look tomorrow with a fresh brain...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the failing test (test_suppress_invalid_tags_with_failed_dataelement) - I wrote that, and I think it just should be changed to use another invalid value that would still raise, for example:

        ds[0x00082128] = RawDataElement(
            Tag(0x00082128), 'IS', 4, b'5:25', 0, True, True
        )

        with pytest.warns(UserWarning):
            ds_json = ds.to_json_dict(suppress_invalid_tags=True)
        assert "00082128" not in ds_json

and revert the change here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I think it just should be changed to use another invalid value that would still raise

I thought about that too, but that would still change existing behavior for IS - ones that did not get written before would now be accepted and written out. Probably not a big deal, but still something a bit different. I'm not sure just skipping values in the output is the best idea anyway. I'll think about this a bit before this is merged. I'd still like to have a "bigger picture" in mind for how to handle settings in v3.X... I'll start an issue and post some thoughts on that in the coming days.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that would still change existing behavior for IS - ones that did not get written before would now be accepted and written out

That is true, but I assumed that this is a change we want to make. Though maybe I haven't thought this through... I will wait your thoughts then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a look at this again ... This change has no effect (nullcontext()) for the default case where suppress_invalid_tags is False. So I think I was right, if suppress... is True (as seen only in tests probably at the moment), then we will catch the exception and ignore it, as is currently done (the new ISFloat is not allowed with strict_reading() turned on).

It is still messy, but I'd rather push real fixes to this messy config (and all the other messy config) to v3.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I leave the decision to you. I find it a bit ugly, but you are probably right - better defer the real fix to 3.0, where behavior changes are expected.

for key in self.keys():
json_key = '{:08X}'.format(key)
try:
data_element = self[key]
json_dataset[json_key] = data_element.to_json_dict(
bulk_data_element_handler=bulk_data_element_handler,
bulk_data_threshold=bulk_data_threshold
)
except Exception as exc:
logger.error(f"Error while processing tag {json_key}")
if not suppress_invalid_tags:
raise exc

return json_dataset

Expand Down
8 changes: 3 additions & 5 deletions pydicom/tests/test_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

import pytest

from pydicom import config
from pydicom import dcmread
darcymason marked this conversation as resolved.
Show resolved Hide resolved
from pydicom.data import get_testdata_file
from pydicom.dataelem import DataElement, RawDataElement
Expand Down Expand Up @@ -293,13 +294,10 @@ def test_suppress_invalid_tags_with_failed_dataelement(self):
# we have to add a RawDataElement as creating a DataElement would
# already raise an exception
ds[0x00082128] = RawDataElement(
Tag(0x00082128), 'IS', 4, b'5.25', 0, True, True)

with pytest.raises(TypeError):
ds.to_json_dict()
Tag(0x00082128), 'IS', 4, b'5.25', 0, True, True
)

ds_json = ds.to_json_dict(suppress_invalid_tags=True)

assert "00082128" not in ds_json


Expand Down
32 changes: 27 additions & 5 deletions pydicom/tests/test_valuerep.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from datetime import datetime, date, time, timedelta, timezone
from decimal import Decimal
from itertools import chain
from io import BytesIO
import pickle
import math
import sys
Expand All @@ -19,9 +20,10 @@
from pydicom.data import get_testdata_file
from pydicom.dataset import Dataset
from pydicom._dicom_dict import DicomDictionary, RepeatersDictionary
from pydicom.filereader import read_dataset
from pydicom.tag import Tag
from pydicom.valuerep import (
DS, IS, DSfloat, DSdecimal, PersonName, VR, STANDARD_VR,
DS, IS, DSfloat, DSdecimal, ISfloat, PersonName, VR, STANDARD_VR,
AMBIGUOUS_VR, STR_VR, BYTES_VR, FLOAT_VR, INT_VR, LIST_VR
)
from pydicom.values import convert_value
Expand Down Expand Up @@ -889,11 +891,31 @@ def test_valid_value(self, disable_value_validation):
assert 42 == IS("42.0")
assert 42 == IS(42.0)

def test_float_value(self):
"""Read binary value of IS that is actually a float"""
# from issue #1661
# Create BytesIO with single data element for Exposure (0018,1152)
# length 4, value "14.5"
bin_elem = b"\x18\x00\x52\x11\x04\x00\x00\x0014.5"
with BytesIO(bin_elem) as bio:
ds = read_dataset(bio, True, True)
assert isinstance(ds.Exposure, ISfloat)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary to go back to binary like this rather than just creating an IS, but I thought it a good idea in case of future hooks into reading invalid values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

assert ds.Exposure == 14.5

# Strict checking raises an error
with pytest.raises(ValueError):
_ = IS("14.5", validation_mode=config.RAISE)
with pytest.raises(TypeError):
_ = IS(14.5, validation_mode=config.RAISE)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit odd to me to have two different Errors for the same value, but that is the way it is done currently. Perhaps to be revisited for the grand redesign of validation settings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that I was not happy about this either, though forgot why I didn't change it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In part I guess there are two branches, so to speak, because some are by string processing (regex) and others are by value. Another thing to think about in harmonizing validation.

def test_float_init(self):
"""New ISfloat created from another behaves correctly"""
is1 = IS("14.5", validation_mode=config.IGNORE)
is2 = IS(is1)
assert is1 == is2
assert is2.original_string == is1.original_string

def test_invalid_value(self, disable_value_validation):
with pytest.raises(TypeError, match="Could not convert value"):
IS(0.9)
with pytest.raises(TypeError, match="Could not convert value"):
IS("0.9")
with pytest.raises(ValueError, match="could not convert string"):
IS("foo")

Expand Down
54 changes: 48 additions & 6 deletions pydicom/valuerep.py
Original file line number Diff line number Diff line change
Expand Up @@ -1248,6 +1248,48 @@ def DS(
return DSfloat(val, auto_format, validation_mode)


class ISfloat(float):
"""Store value for an element with VR **IS** as :class:`float`.

Stores original integer string for exact rewriting of the string
originally read or stored.

Note: By the DICOM standard, IS can only be an :class:`int`,
however, it is not uncommon to see float IS values. This class
is used if the config settings allow non-strict reading.

Generally, use :class:`~pydicom.valuerep.IS` to create IS values,
this is returned instead if the value cannot be represented as an
:class:`int`. See :class:`~pydicom.valuerep.IS` for details of the
parameters and return values.
"""
def __new__( # type: ignore[misc]
cls: Type["ISfloat"], val: Union[str, float, Decimal],
validation_mode: int = None
) -> float:
return super().__new__(cls, val)

def __init__(self, val: Union[str, float, Decimal],
validation_mode: int = None) -> None:
# If a string passed, then store it
if isinstance(val, str):
self.original_string = val.strip()
elif isinstance(val, (IS, ISfloat)) and hasattr(val, 'original_string'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently learned that black uses 88 instead of 79 characters as a default maximum line length, and I tend to agree with it... I think we had this discussion before, and it is not really important, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to raise this point as well. I find 79 very limiting, it would really help to have even 9 more characters, and useful to just let black do its thing on new code before posting a PR.

We can change to 88 in a separate PR maybe, and update the documentation for contributors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course - this was not meant for this PR, just wanted to mention it.

self.original_string = val.original_string
if validation_mode:
msg = (
f'Value "{str(self)}" is not valid for elements with a VR '
'of IS'
)
if validation_mode == config.WARN:
warnings.warn(msg)
elif validation_mode == config.RAISE:
msg += (
"\nSet reading_validation_mode to WARN or IGNORE to bypass"
)
raise TypeError(msg)


class IS(int):
"""Store value for an element with VR **IS** as :class:`int`.

Expand All @@ -1258,7 +1300,7 @@ class IS(int):
def __new__( # type: ignore[misc]
cls: Type["IS"], val: Union[None, str, int, float, Decimal],
validation_mode: int = None
) -> Optional[Union[str, "IS"]]:
) -> Optional[Union[str, "IS", "ISfloat"]]:
"""Create instance if new integer string"""
if val is None:
return val
Expand All @@ -1272,16 +1314,16 @@ def __new__( # type: ignore[misc]
validate_value("IS", val, validation_mode)

try:
newval = super().__new__(cls, val)
newval: Union[IS, ISfloat] = super().__new__(cls, val)
except ValueError:
# accept float strings when no integer loss, e.g. "1.0"
newval = super().__new__(cls, float(val))

# check if a float or Decimal passed in, then could have lost info,
# and will raise error. E.g. IS(Decimal('1')) is ok, but not IS(1.23)
# IS('1.23') will raise ValueError
# If a float or Decimal was passed in, check for non-integer,
# i.e. could lose info if converted to int
# If so, create an ISfloat instead (if allowed by settings)
if isinstance(val, (float, Decimal, str)) and newval != float(val):
raise TypeError("Could not convert value to integer without loss")
newval = ISfloat(val, validation_mode)

# Checks in case underlying int is >32 bits, DICOM does not allow this
if (not -2**31 <= newval < 2**31 and
Expand Down