Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential idea for further typing #1485

Open
SimonBiggs opened this issue Aug 24, 2021 · 2 comments
Open

Potential idea for further typing #1485

SimonBiggs opened this issue Aug 24, 2021 · 2 comments

Comments

@SimonBiggs
Copy link
Contributor

SimonBiggs commented Aug 24, 2021

I have been fleshing out a TypedDataset which makes it so that depending on the tag literal value the dataelement returned is aware of its own VR within mypy:

class SequenceDataElement(pydicom.DataElement):
    value: pydicom.Sequence


class BytesDataElement(pydicom.DataElement):
    value: bytes


class TagDataElement(pydicom.DataElement):
    value: pydicom.tag.BaseTag


class StringDataElement(pydicom.DataElement):
    value: str


class IntDataElement(pydicom.DataElement):
    value: int


class TypedDataset(pydicom.Dataset):
    @overload  # type: ignore
    def __getitem__(self, key: slice) -> pydicom.Dataset:
        pass

    @overload
    def __getitem__(self, key: _tags.SequenceTag) -> SequenceDataElement:
        pass

    @overload
    def __getitem__(self, key: _tags.BytesTag) -> BytesDataElement:
        pass

    @overload
    def __getitem__(self, key: _tags.TagTag) -> TagDataElement:
        pass

    @overload
    def __getitem__(self, key: _tags.StringTag) -> StringDataElement:
        pass

    @overload
    def __getitem__(self, key: _tags.IntTag) -> IntDataElement:
        pass

    @overload
    def __getitem__(self, key: TagType) -> pydicom.DataElement:
        pass

    def __getitem__(
        self, key: Union[slice, TagType]
    ) -> Union[pydicom.Dataset, pydicom.DataElement]:
        return super().__getitem__(key)

It has made the typing experience with pydicom be quite nice. I wanted to present the idea here to spark discussion.

Cheers,
Simon

@scaramallion
Copy link
Member

scaramallion commented Aug 24, 2021

It depends how accurate you want to be

For VR DS, probably the worst, the type can be:

Union[None, str, float, Decimal, valuerep.DSfloat, valuerep.DSDecimal, numpy.ndarray, numpy.float64, MutableSequence[str], MutableSequence[float], ..., MutableSequence[valuerep.DSDecimal]]

which is super unfun

But most elements are either Union[None, T, MutableSequence[T]] or Union[None, T].

You'd still need to cast the results from __getitem__ wouldn't you? How are _tags.SequenceTag, etc, defined?

Another option is to have a giant stub file for Dataset with all the elements given type hints:

StrType = Union[None, str, MutableSequence[str]]
PersonNameType = Union[None, str, MutableSequence[str], PersonName, MutableSequence[PersonName]]
BytesType = Optional[bytes]

class Dataset:
    PatientName: PersonNameType 
    PatientID: StrType
    ...
    PixelData: BytesType

@SimonBiggs
Copy link
Contributor Author

Another option is to have a giant stub file for Dataset with all the elements given type hints:

Yup, that'd certainly help, and it would best be an autogenerated file built from the datadict.


How are _tags.SequenceTag, etc, defined?

For now I've just been utilising it for a block of private tags that I am utilising. Here is an example of how I have built out SequenceTag and BytesTag. I would expect that a file like this would be autogenerated.

STORAGE_SEQUENCE_OFFSET = 0x00
DATASET_GROUPING_OFFSET = 0x80

StorageSequenceTag = tuple[TagGroup, Literal[0x1000]]
STORAGE_SEQUENCE_TAG = cast(
    StorageSequenceTag, (TAG_GROUP, 0x1000 + STORAGE_SEQUENCE_OFFSET)
)

ParentSequenceTag = tuple[TagGroup, Literal[0x1010]]
PARENT_SEQUENCE_TAG: ParentSequenceTag = (TAG_GROUP, 0x1010)

SearchIndexSequenceTag = tuple[TagGroup, Literal[0x1060]]
SEARCH_INDEX_SEQUENCE_TAG: SearchIndexSequenceTag = (TAG_GROUP, 0x1060)

EncryptedDataTag = tuple[TagGroup, Literal[0x1050]]
ENCRYPTED_DATA_TAG: EncryptedDataTag = (TAG_GROUP, 0x1050)

EncryptedIndexHashTag = tuple[TagGroup, Literal[0x1070]]
ENCRYPTED_INDEX_HASH_TAG = (TAG_GROUP, 0x1070)

DatasetGroupingTag = tuple[TagGroup, Literal[0x1080]]
DATASET_GROUPING_TAG = cast(
    DatasetGroupingTag, (TAG_GROUP, 0x1000 + DATASET_GROUPING_OFFSET)
)

SequenceTag = Union[StorageSequenceTag, ParentSequenceTag, SearchIndexSequenceTag]
BytesTag = Union[EncryptedDataTag, EncryptedIndexHashTag, DatasetGroupingTag]

Also, I don't actually know how to remove the duplication with Literal types and their values. Having to write each tag value twice bugs me...

eg:

EncryptedIndexHashTag = tuple[TagGroup, Literal[0x1070]]  # 0x1070 here
ENCRYPTED_INDEX_HASH_TAG = (TAG_GROUP, 0x1070)  # and 0x1070 written again here...

You'd still need to cast the results from __getitem__ wouldn't you?

No actually, I haven't had to do that. It's been working a treat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants