Implementation of structured reporting #824

Merged
merged 17 commits into from Dec 30, 2019


hackermd
Contributor

hackermd commented Mar 17, 2019

This pull request is intended to facilitate the creation of DICOM Structured Reports (SR).

It adds the sr package, which implements

A few things still need to be completed. However, I was hoping to already get your feedback on the approach in general and on whether you would be interested in including this code in the pydicom package.

I suggest @pieper, @fedorov, @seandoyle and @dclunie as reviewers. We had several discussions on how to implement SR at the 2019 NA-MIC project week and beyond.

Below is an example of creating an SR document using pydicom.sr (the resulting DICOM PS3.10 file is also included in the repository: pydicom/data/test_files/SR_comprehensive3d.dcm):

from pydicom.filereader import dcmread
from pydicom.uid import generate_uid
from pydicom.sr.context_groups.cid_4 import CERVICOTHORACIC_SPINE
from pydicom.sr.context_groups.cid_100 import CT_UNSPECIFIED_BODY_REGION
from pydicom.sr.context_groups.cid_218 import AREA_OF_DEFINED_REGION
from pydicom.sr.context_groups.cid_220 import NOT_SIGNIFICANT
from pydicom.sr.context_groups.cid_222 import NORMAL
from pydicom.sr.context_groups.cid_270 import PERSON, DEVICE
from pydicom.sr.context_groups.cid_6115 import VERTEBRAL_FORAMEN
from pydicom.sr.context_groups.cid_7151 import SPINAL_CORD
from pydicom.sr.context_groups.cid_7461 import SQUARE_CENTIMETER
from pydicom.sr.document import Comprehensive3DSR
from pydicom.sr.templates import (
    DeviceObserverIdentifyingAttributes,
    FindingSite,
    Measurement,
    MeasurementProperties,
    MeasurementReport,
    ObservationContext,
    ObserverContext,
    PersonObserverIdentifyingAttributes,
    PlanarROIMeasurementsAndQualitativeEvaluations,
    ReferencedRegion,
    ReferencedVolume,
    ROIMeasurements,
    SourceImageForRegion,
    SourceImageForSegmentation,
    SubjectContext,
    TrackingIdentifier,
    VolumetricROIMeasurementsAndQualitativeEvaluations,
)
from pydicom.sr.value_types import GraphicTypes

if __name__ == '__main__':

    ref_filename = 'pydicom/data/test_files/CT_small.dcm'
    ref_dataset = dcmread(ref_filename)

    observer_person_context = ObserverContext(
        observer_type=PERSON,
        observer_identifying_attributes=PersonObserverIdentifyingAttributes(
            name='Foo'
        )
    )
    observer_device_context = ObserverContext(
        observer_type=DEVICE,
        observer_identifying_attributes=DeviceObserverIdentifyingAttributes(
            uid=generate_uid()
        )
    )
    observation_context = ObservationContext(
        observer_person_context=observer_person_context,
        observer_device_context=observer_device_context,
    )

    region_reference = ReferencedRegion(
        graphic_type=GraphicTypes.CIRCLE,
        graphic_data=((58.0, 52.0), (58.0, 41.0)),
        source_image=SourceImageForRegion(
            sop_class_uid=ref_dataset.SOPClassUID,
            sop_instance_uid=ref_dataset.SOPInstanceUID
        )
    )
    finding_sites = [
        FindingSite(
            anatomic_location=CERVICOTHORACIC_SPINE,
            topographical_modifier=VERTEBRAL_FORAMEN
        ),
    ]
    measurements = [
        Measurement(
            name=AREA_OF_DEFINED_REGION,
            tracking_identifier=TrackingIdentifier(uid=generate_uid()),
            value=1.7,
            unit=SQUARE_CENTIMETER,
            properties=MeasurementProperties(
                normality=NORMAL,
                level_of_significance=NOT_SIGNIFICANT
            )
        )
    ]
    region_measurements = ROIMeasurements(
        measurements=measurements,
        finding_sites=finding_sites
    )
    imaging_measurements = PlanarROIMeasurementsAndQualitativeEvaluations(
        tracking_identifier=TrackingIdentifier(
            uid=generate_uid(),
            identifier='Planar ROI Measurements'
        ),
        referenced_region=region_reference,
        finding_type=SPINAL_CORD,
        measurements=region_measurements
    )
    measurement_report = MeasurementReport(
        observation_context=observation_context,
        procedure_reported=CT_UNSPECIFIED_BODY_REGION,
        imaging_measurements=imaging_measurements
    )

    document = Comprehensive3DSR(
        evidence=[ref_dataset],
        content=measurement_report,
        series_instance_uid=generate_uid(),
        series_number=1,
        series_description='Measurement Reports',
        sop_instance_uid=generate_uid(),
        instance_number=1,
        institution_name='Institution',
        institution_department_name='Institution Department',
        manufacturer='Manufacturer'
    )
    document_filename = '/tmp/{}.dcm'.format(document.SOPInstanceUID)
    document.save_as(document_filename)

@scaramallion
Member

This seems like it'd be better as a separate package, since it's facilitating the creation of SR datasets rather than the core functionality of reading/modifying/writing DICOM files, but I'd be interested to see what @darcymason thinks.

@dclunie

dclunie commented Mar 18, 2019

I heard from a lot of AI/ML researchers at the C-MIMI meeting
last year that many of them use pydicom, and the main barrier
to reading or writing labels/annotations in a standardized form
is the lack of support for DICOM SR in pydicom.

So the contribution from Markus comes at an opportune time.

David

@pieper
Contributor

pieper commented Mar 18, 2019

I agree, this is very timely. To me this is core dicom read/write/modify functionality, but I could see it being kept in a different repository under pydicom to help keep things manageable.

I'd also like to loop in @swederik since we have been working on related code in dcmjs. It would be great if the context_group.py code could also be used for JavaScript and dcmqi/C++ purposes for interoperability (unless the json can be used directly there?).

@hackermd
Contributor Author

This PR provides an interface for creating SR content items and documents; however, I have also started to work on an interface for accessing and searching SR content more conveniently and efficiently.
At the moment, every DICOM data set is represented in pydicom as a Dataset object. However, some properties and methods of the Dataset class only apply to image instances (e.g. pixel_array). Structured report instances would benefit from other methods for accessing individual content items or walking the content tree.

High-level Python interfaces for creation and access of DICOM SR document content would be very helpful in the research domain and may promote the wider adoption of SR for ML research.
We could place the code in a separate repository/package. However, I think this functionality would be most useful as a higher level pydicom interface and thus argue in favor of including the code in the library.

@fedorov
Contributor

fedorov commented Mar 18, 2019

Indeed, I hope this is helpful to the AI/ML community.

Just a few comments from the quick look:

  • did you consider providing an additional, simpler high-level API (e.g., a "make_document" style function) that would be parameterized with JSON? Looking at the example, I can see how it would be appreciated by folks in the DICOM community, but it may be too much for AI/ML-oriented developers
  • for the codes, I would suggest using the CODE_<coding scheme designator>_<code meaning> naming pattern, following the convention established in DCMTK.

@fedorov
Contributor

fedorov commented Mar 18, 2019

@hackermd I see you already thought about the first point! 👍

@scaramallion @darcymason please let us know what your verdict is, and I will definitely make the time to review this PR.

@hackermd
Contributor Author

@fedorov could you elaborate on your second point?

Let's consider CID 7021 MeasurementReportDocumentTitles as an example. Do you suggest renaming cid_7021.IMAGING_MEASUREMENT_REPORT to cid_7021.126000_DCM_IMAGINGMEASUREMENTREPORT? That would be a syntax error in Python, since identifiers cannot start with a digit.

DCMTK implements each context group as an enum (e.g., CID7021_MeasurementReportDocumentTitles).
Initially, I followed a similar approach and implemented each context group as an Enum and included all classes in a single context_group Python module:

from enum import Enum

from pydicom.sr.value_types import CodedConcept


class MeasurementReportDocumentTitles(Enum):

    IMAGING_MEASUREMENT_REPORT = CodedConcept(
        value="126000",
        meaning="Imaging Measurement Report",
        scheme_designator="DCM"
    )

    ...

However, the resulting module was too large and importing classes from the module took way too long. Further, the names of context groups can be quite long and the long class names were clunky to use.

Therefore, I decided to implement each context group as a separate Python module, using only the CID as the module name to keep names short (following the PEP 8 guidelines for package and module names, e.g., cid_7021), and to implement each coded concept as a Python constant in all capital letters with underscores separating words (following the PEP 8 guidelines for constants, e.g., IMAGING_MEASUREMENT_REPORT).

This simplifies importing and using coded concepts:

from pydicom.sr.context_groups.cid_7021 import IMAGING_MEASUREMENT_REPORT

coded_concept = IMAGING_MEASUREMENT_REPORT

instead of

from pydicom.sr.context_groups.cid_7021 import MeasurementReportDocumentTitles

coded_concept = MeasurementReportDocumentTitles.IMAGING_MEASUREMENT_REPORT.value

However, I admit that it would be useful to have an Enum for each context group; for example to test whether a provided coded concept is part of a context group:

coded_concept = CodedConcept(
    value="126000",
    meaning="Imaging Measurement Report",
    scheme_designator="DCM"
)
coded_concept = MeasurementReportDocumentTitles(coded_concept)  # raises ValueError if not a member
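A minimal, self-contained sketch of that membership test follows. It assumes a stand-in CodedConcept written as a frozen dataclass (so instances are hashable and compare field by field); the PR's own CodedConcept class may behave differently, and the is_member helper is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class CodedConcept:
    """Stand-in for the PR's CodedConcept (assumption): hashable, field-wise equality."""
    value: str
    meaning: str
    scheme_designator: str


class MeasurementReportDocumentTitles(Enum):
    IMAGING_MEASUREMENT_REPORT = CodedConcept(
        value="126000",
        meaning="Imaging Measurement Report",
        scheme_designator="DCM",
    )


def is_member(concept, group):
    """Return True if `concept` is the value of one of the Enum `group`'s members."""
    try:
        group(concept)  # Enum lookup by value raises ValueError if absent
    except ValueError:
        return False
    return True
```

Note that this stand-in compares all three fields, meaning included; comparing only scheme designator and code value would be a separate design choice.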

@fedorov
Contributor

fedorov commented Mar 18, 2019

Markus, I meant to consider replacing names of the items following this pattern:

IMAGING_MEASUREMENT_REPORT --> CODE_DCM_ImagingMeasurementReport

I just think this way it is more clear that this is a code, and where it is coming from.

Just a suggestion, not a requirement.

@pieper
Contributor

pieper commented Mar 18, 2019

IMAGING_MEASUREMENT_REPORT --> CODE_DCM_ImagingMeasurementReport

Yes, the less SCREAM CASE the better IMO.

@hackermd
Contributor Author

CODE_DCM_ImagingMeasurementReport may confuse Python programmers, since CamelCase is generally reserved for class names. I would stick to PEP 8 unless there is a really compelling reason not to.

I just think this way it is more clear that this is a code, and where it is coming from.

The import statement will make clear where constants are coming from. Prefixing each constant with CODE_ thus seems redundant.
However, there is another reason why I would consider this pattern. Several of the code meanings start with a number and are thus invalid Python constants. At the moment, meanings are transformed such that leading numbers are removed and appended at the end (e.g., "4 months to 1 year ago" becomes MONTHS_TO_1_YEAR_AGO_4). This is suboptimal and would be an argument in favor of using the CODE_ prefix. However, this would still not fully solve this issue, since some meanings contain other special characters that would lead to a Python syntax error (e.g., "> 1 year ago").
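To make the trade-off concrete, here is a rough sketch of the two naming transforms. suffix_style is only a reconstruction of the behaviour described above (not the PR's generator code), prefix_style is the CODE_ alternative, and Python's built-in str.isidentifier() shows which names would be legal.

```python
def suffix_style(meaning):
    """Reconstruction of the described transform: move a leading number to the end."""
    words = meaning.upper().split()
    if words and words[0].isdigit():
        words = words[1:] + words[:1]
    return "_".join(words)


def prefix_style(meaning):
    """CODE_ alternative: a leading digit is no longer the first character."""
    return "CODE_" + "_".join(meaning.upper().split())
```

suffix_style("4 months to 1 year ago") gives MONTHS_TO_1_YEAR_AGO_4 and prefix_style gives CODE_4_MONTHS_TO_1_YEAR_AGO, but for "> 1 year ago" neither result passes str.isidentifier(), which is the remaining problem described above.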

@fedorov @pieper How about CODE_DCM_IMAGING_MEASUREMENT_REPORT or CODE_IMAGING_MEASUREMENT_REPORT, depending on whether we would like to include the coding scheme designator?

Do you think it would be necessary to include the coding scheme designator in the constant name?

Including the scheme designator may help avoid name collisions. However, this risk is mitigated by the fact that constants reside in different module namespaces. If there were a name collision, it could be resolved upon import by assigning a different name:

from pydicom.sr.context_groups.cid_7021 import IMAGING_MEASUREMENT_REPORT as DCM_IMAGING_MEASUREMENT_REPORT

I think we all agree that this high level interface should be simple. To me, the scheme designator feels like a coding detail. I would argue it should either be {meaning} if we want to keep it simple and human friendly or {scheme}_{value} if we want to use unique codes instead (e.g., IMAGING_MEASUREMENT_REPORT or DCM_126000).

@fedorov
Contributor

fedorov commented Mar 18, 2019

IMHO, readability and ease of use take precedence here, given the complexity of the topic.

Yes, import will make it clear where the constants are coming from, but only on the line where they are imported explicitly. If I need to find the constant before I import it, I need to search outside of the code editor. If I can do from pydicom.sr.context_groups.cid_7021 import *, then type CODE_ or CODE_DCM_ and leave that search to my auto-completion (given I have it!), that makes my life easier. Constants whose names do not indicate what part of the toolkit they belong to will inevitably be harder to use. "CODE" + "coding scheme designator" introduces an implicit organizational hierarchy and makes the code more readable.

Now that we are talking about it, I would actually be very tempted to have a helper Python module that imports the codes from all CIDs. Having to look up which CID I need to import introduces yet another layer.
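One way such a helper could work, sketched under the assumption that each CID is available as a plain {name: (scheme, value, meaning)} dict (a hypothetical layout, not the PR's modules): merge them into a single namespace, allow the same concept to appear in several CIDs, and refuse names that would silently clash.

```python
def merge_context_groups(groups):
    """Merge per-CID {name: (scheme, value, meaning)} dicts into one namespace.

    Raises ValueError if the same name maps to two different codes.
    """
    merged = {}
    for cid, concepts in groups.items():
        for name, code in concepts.items():
            earlier = merged.get(name)
            # Compare only (scheme designator, code value); meaning is display text.
            if earlier is not None and earlier[:2] != code[:2]:
                raise ValueError(
                    f"{name} from CID {cid} clashes with an earlier definition"
                )
            merged.setdefault(name, code)
    return merged


# Codes taken from the examples in this thread.
CONTEXT_GROUPS = {
    270: {
        "PERSON": ("DCM", "121006", "Person"),
        "DEVICE": ("DCM", "121007", "Device"),
    },
    7021: {
        "IMAGING_MEASUREMENT_REPORT": ("DCM", "126000", "Imaging Measurement Report"),
    },
}
ALL_CODES = merge_context_groups(CONTEXT_GROUPS)
```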

I wish this naming convention was the panacea to all difficulties in using codes with DICOM. All I am saying is that I personally prefer the naming scheme used in DCMTK.

Anyhow, I don't think this is a super-critical decision to make at this point. Let's wait for the pydicom gatekeepers to let us know if this PR belongs here at all.

@pieper
Contributor

pieper commented Mar 18, 2019

Yes, being pep8 compatible is a good thing, but let's not forget it's just a set of guidelines. : D

Personally I like seeing some of the details exposed since we will see them again in other contexts like a dcmdump. So I'd prefer to see CODE_DCM_ImagingMeasurementReport because ImagingMeasurementReport matches what I would see in the standard and it 'looks like' DICOM. Similarly keeping the coding scheme designator as part of the identifier helps remind me what's in DICOM, what's in RadLex, what's in UCUM, etc. To my inner python programmer, the fact that it starts with CODE_ is enough to make me think it's a constant and the trailing CamelCase doesn't make me think it's a class.

For special characters, as in "> 1 year ago", I hope each one has an unambiguous meaning, so we could always map > to "greater than" and then follow the standard rules.
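A sketch of the naming rule discussed here, assuming a small hypothetical table that maps symbols to words (only the > mapping is proposed in this thread). The scheme designator used with "> 1 year ago" in the test is illustrative only.

```python
import re

# Hypothetical symbol-to-word table; only ">" is proposed in the thread.
SYMBOL_WORDS = {">": " greater than ", "<": " less than "}


def dcmtk_style_name(scheme_designator, meaning):
    """Build a CODE_<scheme>_<CamelCaseMeaning> identifier."""
    for symbol, words in SYMBOL_WORDS.items():
        meaning = meaning.replace(symbol, words)
    # Keep letters and digits only; spaces, hyphens etc. separate words.
    words = re.findall(r"[A-Za-z0-9]+", meaning)
    camel = "".join(word[:1].upper() + word[1:] for word in words)
    return f"CODE_{scheme_designator}_{camel}"
```

Because every name starts with CODE_, a meaning that begins with a digit still yields a valid identifier.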

I'd really want to avoid the possibility of namespace clashes and the need to do special case coding to work around them if they appear in practice.

@fedorov
Contributor

fedorov commented Mar 18, 2019

it'd be better as a separate package since it's facilitating the creation of SR datasets rather than the core functionality of reading/modifying/writing DICOM files

@scaramallion I have to say I do not exactly understand the argument here. SR corresponds to an important class of DICOM files. pydicom does not currently provide functionality to read/modify/write DICOM files at the level of SR content tree. Why would you single out DICOM SR from DICOM?

@rhaxton
Contributor

rhaxton commented Mar 18, 2019

I think this would be great to add to pydicom's core. Right now, when I create structured reports, I have to hunt down the value/scheme/meaning manually. Even if only to have a single place to look them up, this would be worth it. This would also help prevent cut-and-paste and typing errors for codes and meanings.

I would like to echo other comments here about naming, etc.

For cid_23.py, I think I would prefer something more like:

"""CID 23 CranioCaudadAngulation
auto-generated by generate_context_groups.py.
"""
from pydicom.sr.value_types import CodedConcept

concepts_in_cid23 = {
    # internal name: (code value, scheme, dicom meaning)
    "Caudal":   ("G-A108", "SRT", "Caudal"),
    "Cephalic": ("G-A107", "SRT", "Cephalic"),
}


class Concepts(object):
    def __getattr__(self, name):
        # Only called when normal lookup fails, so each concept is built
        # once and then served from the instance __dict__.
        try:
            concept = concepts_in_cid23[name]
        except KeyError:
            raise AttributeError(name)
        self.__dict__[name] = CodedConcept(
            value=concept[0],
            meaning=concept[2],
            scheme_designator=concept[1]
        )
        return self.__dict__[name]

This way, when I do a text search for a meaning, I will end up in a table of meanings for that cid and I won't have to scroll too much to find all the related concepts I might wish to include. Like others have mentioned, I would prefer mixed case names with underscores when I refer to these in my code.

I don't know if the lazy initialization is worth it at all, but it might save some resources for the larger CIDs. You might also consider having the lazy init on its own, importing all the CID dictionaries and then searching through each one until it finds the requested meaning. However, this might interfere with auto-completion in editors, so maybe it is not worth it....

Thanks for this, I would absolutely use this and having it in the core would help.

@darcymason
Member

Hello all,
Sorry for the delay in responding, I've been away and then 2FA wasn't working ... anyway, I think I'm good with SR abilities being added to pydicom. It does sound like there is a lot of enthusiasm for this, and I see it as, well, just adding some convenience onto creating files using pydicom.

I'd like to look through it in more detail, but my connectivity is somewhat limited until the weekend. I see there is some debate about naming conventions. As someone who doesn't (and hasn't) used SR, I'll weigh in when I can, but hopefully some kind of consensus about the details can emerge.

@darcymason
Member

So I've had some time to review some SR concepts and look through the code.

Clearly a lot of thought and great work has gone into this. I'd like to see it in core pydicom. But before a detailed code review, I think the syntax debate should be further discussed.

The cid_xxx files contain CodedConcept class instantiations with keyword parameters: value, meaning, scheme_designator. There is a lot of repeated code there. Would it not be simpler to instead use dictionaries like:

concepts = {
    "SRT": {
        # mapping keyword: (value, meaning)
        'CervicothoracicSpine': ("T-D00F7", "Cervico-thoracic spine"),
        'Esophagus': ("T-56000", "Esophagus"),
        # ...
    },
    "DCM": {
        "Person": ("121006", "Person"),
        # ...
    },
}

For performance reasons, there could be multiple dicts, only loaded as needed.

To me, it would be simpler and more readable to have a class or factory function create the CodedConcept instances when needed from dictionaries as above. How about a syntax something like the following (here I'm excerpting just the named parameters to constructors in @hackermd's original example):

from pydicom.sr import code # `code` is a class with special attribute access methods
observer_type=code.dcm.Person
anatomic_location = code.srt.CervicothoracicSpine

This is similar to what people were asking for in terms of dcmtk style, and is similar to ds.PatientName, etc., also not PEP8 compliant, and also using some behind-the-scenes methods to make dictionary access look like attributes. The class could also add methods for auto-completion, just like Dataset does.
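A minimal sketch of the proposed intermediary, assuming the dictionary layout above. CONCEPTS, _Scheme and _Codes are hypothetical names, and the stand-in CodedConcept dataclass replaces the PR's class. Implementing __dir__ is what lets interactive tab completion list the keywords, similar in spirit to what Dataset does for attribute keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConcept:
    """Stand-in for the PR's CodedConcept (assumption)."""
    value: str
    meaning: str
    scheme_designator: str


# Hypothetical tables in the layout proposed above: keyword -> (value, meaning)
CONCEPTS = {
    "SRT": {
        "CervicothoracicSpine": ("T-D00F7", "Cervico-thoracic spine"),
        "Esophagus": ("T-56000", "Esophagus"),
    },
    "DCM": {
        "Person": ("121006", "Person"),
        "Device": ("121007", "Device"),
    },
}


class _Scheme:
    """Attribute access into one coding scheme, e.g. code.dcm.Person."""

    def __init__(self, designator, table):
        self._designator = designator
        self._table = table

    def __getattr__(self, name):
        # Called only for names not found normally, i.e. concept keywords.
        try:
            value, meaning = self._table[name]
        except KeyError:
            raise AttributeError(
                f"{self._designator} has no concept {name!r}"
            ) from None
        return CodedConcept(
            value=value, meaning=meaning, scheme_designator=self._designator
        )

    def __dir__(self):
        # Lets interactive tab completion list the available keywords.
        return list(self._table)


class _Codes:
    """Top-level entry point: code.<scheme>.<keyword>."""

    def __getattr__(self, name):
        designator = name.upper()
        if designator not in CONCEPTS:
            raise AttributeError(f"unknown coding scheme {name!r}")
        return _Scheme(designator, CONCEPTS[designator])

    def __dir__(self):
        return [designator.lower() for designator in CONCEPTS]


code = _Codes()
```

With this sketch, code.dcm.Person returns CodedConcept(value='121006', meaning='Person', scheme_designator='DCM'), and the instances are only created when accessed.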

Can the scheme designator be assumed in some cases and only used when necessary to disambiguate? E.g., code.Person just works if Person is only defined in DCM? Or perhaps the scheme could be set as a default, or set via a Python context manager.
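Scheme-less lookup could be sketched as a search across all schemes that only succeeds when exactly one scheme defines the keyword. find_unique is hypothetical, and the "XX" scheme is a made-up placeholder added only to produce an ambiguous name.

```python
# Hypothetical tables: scheme -> {keyword: (value, meaning)}
CONCEPTS = {
    "DCM": {"Person": ("121006", "Person")},
    "SRT": {"Esophagus": ("T-56000", "Esophagus")},
    # Made-up placeholder scheme that reuses a keyword, to force ambiguity.
    "XX": {"Esophagus": ("placeholder", "Esophagus")},
}


def find_unique(name, concepts=CONCEPTS):
    """Return (scheme, value, meaning) if exactly one scheme defines `name`.

    Raises KeyError if no scheme defines it and ValueError if several do,
    in which case the caller has to name the scheme explicitly.
    """
    hits = [
        (scheme, *table[name])
        for scheme, table in concepts.items()
        if name in table
    ]
    if not hits:
        raise KeyError(name)
    if len(hits) > 1:
        schemes = ", ".join(hit[0] for hit in hits)
        raise ValueError(f"{name!r} is ambiguous: defined in {schemes}")
    return hits[0]
```

Raising on ambiguity, rather than picking a scheme silently, keeps code.Person-style shortcuts from changing meaning when a new scheme is added.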

For names starting with numbers as @hackermd mentioned, could we not e.g. prepend an underscore, or make a method to extract with a string passed:

x = code._4MonthsTo1YearAgo
# OR
x = code["4 months to 1 year ago"]

In all my above comments I have left out structuring things by the cid_x... because I don't understand if/what that adds. Some CodedConcepts appear to be repeated many times across these files with identical definitions. But if there is some need to refer to cid_x, they could be created from code.dcm.Person, etc. through the mechanism above. Perhaps code.cid_270.Person could be another form that the class or methods to make these instances could handle.

I'm just putting some ideas out there, to help the discussion (or get me better informed). I think I'd prefer to see dictionaries with an associated class or method that generates CodedConcept instances as needed. That would seem more like pydicom's style, and in any case the intermediary could allow different ways to get at the SR concepts in the most natural way for the situation (or according to people's preferences), while still remaining unambiguous to read.

@fedorov
Contributor

fedorov commented Mar 21, 2019

I like @darcymason proposal. It is similar to the DCMTK style in readability, but arguably achieves more in usability. Those dictionaries can grow really large though. How would you propose splitting them?

I would consider including coding scheme designator in the tuples. It is redundant (as many things in DICOM!), but might make things easier to use (potentially).

@darcymason
Member

Those dictionaries can grow really large though. How would you propose splitting them?

Yeah, that's an issue, but it would be handled by the intermediary, transparent to the user, so there are lots of options. I would suggest the most common concepts are loaded the first time the intermediary is called. Then any results found could be cached for quicker return next time. If not in the cache, then check the dictionary, if not in the dictionary, load other dictionaries. Those could be divided in multiple ways - alphabetic, by cid_x, etc. However, another aspect of this is that dictionaries are compact and fast. The dicom dictionary in pydicom is quite large but once the file is byte-compiled python, loading is pretty fast.
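The lazy strategy described above can be sketched as follows, assuming each table is produced by a loader callable (here two hypothetical in-memory loaders; in a package they might import generated modules). Names already loaded act as the cache, and further tables are only loaded on a miss.

```python
def _load_common():
    """Hypothetical loader for the most frequently used concepts."""
    return {"Person": ("DCM", "121006", "Person")}


def _load_anatomy():
    """Hypothetical loader for a larger, less frequently used table."""
    return {"Esophagus": ("SRT", "T-56000", "Esophagus")}


class LazyConceptLookup:
    """Resolve keywords, loading further tables only on a cache miss."""

    def __init__(self, loaders):
        self._pending = list(loaders)  # loaders not yet run, most common first
        self._cache = {}               # keyword -> (scheme, value, meaning)
        self.tables_loaded = 0

    def lookup(self, name):
        while name not in self._cache:
            if not self._pending:
                raise KeyError(name)
            table = self._pending.pop(0)()
            for key, concept in table.items():
                # Keep the first definition if a later table repeats a name.
                self._cache.setdefault(key, concept)
            self.tables_loaded += 1
        return self._cache[name]


concepts = LazyConceptLookup([_load_common, _load_anatomy])
```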

@hackermd
Contributor Author

@darcymason I am fine with using dictionaries, as long as we don't "expose" them directly to users. The CID coded concepts are auto-generated from FHIR value set resources in JSON format. So we could access these resources more dynamically. This may also be a good idea for extension, since some users may want to use custom codes.

I would argue in favor of keeping codes for CIDs separate, not only for performance reasons, but also because some of the context groups are not extensible, and there are use cases where we want to check whether a provided code is a member of a context group, i.e. represents an allowed value for a coded content item.
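A sketch of such a membership check that distinguishes extensible from non-extensible context groups. ContextGroup and check_concept are hypothetical; membership is keyed on (scheme designator, code value), and the extensibility flags below are chosen for illustration rather than taken from PS3.16.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ContextGroup:
    """Hypothetical description of a context group (CID)."""
    name: str
    members: FrozenSet[Tuple[str, str]]  # (scheme designator, code value)
    extensible: bool


def check_concept(scheme_designator, value, group):
    """Raise ValueError if the code is not allowed by a non-extensible group.

    Membership is decided by scheme designator and code value only; the
    code meaning is never used as a key.
    """
    if group.extensible or (scheme_designator, value) in group.members:
        return
    raise ValueError(
        f"({scheme_designator}, {value}) is not a member of {group.name}"
    )


# Member codes from the examples in this thread; flags are illustrative.
OBSERVER_TYPE_STRICT = ContextGroup(
    name="Observer Type (strict, illustrative)",
    members=frozenset({("DCM", "121006"), ("DCM", "121007")}),
    extensible=False,
)
OBSERVER_TYPE_OPEN = ContextGroup(
    name="Observer Type (extensible, illustrative)",
    members=OBSERVER_TYPE_STRICT.members,
    extensible=True,
)
```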

In terms of syntax, I would prefer if we would either use scheme_designator + value OR meaning but not mix the two, i.e. either code.ObserverType.Person or code.cid_270.dcm_121006.
I would definitely avoid code["4 months to 1 year ago"], since the point of using codes is to avoid dealing with free text! We should keep in mind that meaning is only intended for display purposes and there is no guarantee that its value will be stable. If we want to use meaning for representation of coded concepts, we should thus use a Python constant or enumerated item. The more I think about it, the more I am convinced that we should use code.cid_270.dcm_121006 or something similar and use meaning only for display (as the return value of a __str__ method).
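Keeping identity and display apart can be sketched with a stand-in CodedConcept whose equality and hash use only the code value and scheme designator, while __str__ returns the meaning. The class and the cid_270 namespace are hypothetical illustrations of the code.cid_270.dcm_121006 style, not the PR's API.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass(frozen=True)
class CodedConcept:
    """Stand-in for the PR's CodedConcept (assumption)."""
    value: str
    # Excluded from __eq__ and __hash__: the meaning is display text only.
    meaning: str = field(compare=False)
    scheme_designator: str

    def __str__(self):
        return self.meaning


# Hypothetical module-like namespace in the code.cid_270.dcm_121006 style.
cid_270 = SimpleNamespace(
    dcm_121006=CodedConcept(value="121006", meaning="Person", scheme_designator="DCM"),
    dcm_121007=CodedConcept(value="121007", meaning="Device", scheme_designator="DCM"),
)
```

Two concepts with the same scheme designator and code value then compare equal even if their meanings differ, which matches the rule that Code Meaning is never used as a key.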

@pieper
Contributor

pieper commented Mar 21, 2019

We should keep in mind that meaning is only intended for display purposes and there is no guarantee that its value will be stable.

While I agree this is technically true, in practice I would much rather see people writing code.dcm.Person and not code.cid_270.dcm_121006. Using the text is more consistent with the rest of pydicom, and I'm sure that if some meaning strings change we can maintain a backwards mapping so that nothing would break.
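Such a backwards mapping could be as small as an alias table that resolves old names and emits a DeprecationWarning. The alias "HumanObserver" and the resolve helper are hypothetical, invented for this sketch.

```python
import warnings

# Current generated names -> (scheme, value, meaning)
CURRENT_NAMES = {"Person": ("DCM", "121006", "Person")}

# Old name -> current name. "HumanObserver" is a hypothetical earlier name.
DEPRECATED_ALIASES = {"HumanObserver": "Person"}


def resolve(name):
    """Return the code for `name`, accepting deprecated aliases with a warning."""
    if name in DEPRECATED_ALIASES:
        current = DEPRECATED_ALIASES[name]
        warnings.warn(
            f"{name!r} is deprecated, use {current!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        name = current
    return CURRENT_NAMES[name]
```

Since the lookup key is still the generated name, existing user code keeps working after a meaning change, and the warning tells users which name to migrate to.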

@hackermd
Contributor Author

While I agree this is technically true, in practice I would much rather see people writing code.dcm.Person and not code.cid_270.dcm_121006. Using the text is more consistent with the rest of pydicom, and I'm sure that if some meaning strings change we can maintain a backwards mapping so that nothing would break.

@pieper I agree that this would improve readability and would be more consistent with the rest of pydicom. This is why I started to use PERSON in the first place.

However, the keyword of an attribute and the meaning of a coded concept are two different beasts. Therefore, I think we need to be careful with this approach.

The standard is also pretty clear about this (see Part 3 Section 8.3):

It should be noted that for a particular Coding Scheme Designator (0008,0102) and Code Value (0008,0100) or Long Code Value (0008,0119), or URN Code Value (0008,0120), several alternative values for Code Meaning (0008,0104) may be defined. These may be synonyms in the same language or translations of the Coding Scheme into other languages. Hence the value of Code Meaning (0008,0104) shall never be used as a key, index or decision value, rather the combination of Coding Scheme Designator (0008,0102) and Code Value (0008,0100), Long Code Value (0008,0119), or URN Code Value (0008,0120) may be used. Code Meaning (0008,0104) is a purely annotative, descriptive Attribute.

@pieper
Contributor

pieper commented Mar 22, 2019

It's true, we shouldn't be depending on 'meaning' for interoperability, but here we are discussing what are effectively variable names (in the form of enums or class members).

I do think the mapping between the 'descriptive' variable names and the corresponding codes should be explicit. I understand that @dclunie will be proposing something to the standards committee related to the use of 'imported business names' (like our variable names) and corresponding definitions, as in the examples linked below. This allows you to use the 'meaning' or other shorthand where it makes more sense. Perhaps the Python API can use a similar convention to maintain high-level readability.

ftp://d9-workgrps:Private15@medical.nema.org/MEDICAL/Private/Dicom/WORKGRPS/WG23/2019/2019-01-25/simpleSRinJSON_TID1500_CrowsCureCancerLinearMeasurement_20190125/measurementWithImportedBusinessNames.json

ftp://d9-workgrps:Private15@medical.nema.org/MEDICAL/Private/Dicom/WORKGRPS/WG23/2019/2019-01-25/simpleSRinJSON_TID1500_CrowsCureCancerLinearMeasurement_20190125/businessNameDeclarationsForMeasurement.json
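As a rough sketch of keeping the name-to-code mapping in a configuration document, the snippet below reads business-name declarations from JSON. The JSON layout is a hypothetical simplification using DICOM attribute keywords; it does not reproduce the format of the WG-23 examples linked above, and load_business_names is an illustrative helper.

```python
import json

# Hypothetical declaration document (simplified; not the WG-23 format).
DECLARATIONS = """
{
  "Person": {
    "CodeValue": "121006",
    "CodingSchemeDesignator": "DCM",
    "CodeMeaning": "Person"
  },
  "AreaOfDefinedRegion": {
    "CodeValue": "G-A16A",
    "CodingSchemeDesignator": "SRT",
    "CodeMeaning": "Area of defined region"
  }
}
"""


def load_business_names(text):
    """Map each declared business name to (scheme, value, meaning)."""
    names = {}
    for name, item in json.loads(text).items():
        names[name] = (
            item["CodingSchemeDesignator"],
            item["CodeValue"],
            item["CodeMeaning"],
        )
    return names


BUSINESS_NAMES = load_business_names(DECLARATIONS)
```

The Python layer then only refers to business names such as "Person", while the declaration document stays the single place where codes are bound to those names.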

@fedorov
Contributor

fedorov commented Mar 22, 2019

The standard is also pretty clear about this (see Part 3 Section 8.3):

[...] Hence the value of Code Meaning (0008,0104) shall never be used as a key, index or decision value [...]

Right, but I think in this particular case, the reason we are having this discussion is to make the code more usable and readable for developers. Where the standard applies is the object that comes out of the implementation, or how one interprets the attributes extracted from that object. I do not think it is appropriate to treat constraints on the use and interpretation of the object as constraints on the implementation.

@hackermd
Contributor Author

It's true, we shouldn't be depending on 'meaning' for interoperability, but here we are discussing what are effectively variable names (in the form of enums or class members).

My main point is that we should avoid codes['Person'], but use enums or class members instead. This is why I decided to auto-generate code (modules, constants and enumerated items) rather than dynamically accessing dictionaries using meaning as a key.
However, when we auto-generate variable names from other documents, such as FHIR ValueSet resources, we still need to be careful, since the value of meaning (or display as it's called in FHIR) may not remain stable over different versions or releases. We may thus unintentionally introduce backwards incompatible changes into the library when we update the underlying context group (value set) definitions.

Maybe we shouldn't provide any predefined codes in pydicom and let users of the library decide how they want to implement context groups at the application level based on CodedConcept.

This is how the above example would then look:

from pydicom.filereader import dcmread
from pydicom.uid import generate_uid

from pydicom.sr.document import Comprehensive3DSR
from pydicom.sr.templates import (
    DeviceObserverIdentifyingAttributes,
    FindingSite,
    Measurement,
    MeasurementProperties,
    MeasurementReport,
    ObservationContext,
    ObserverContext,
    PersonObserverIdentifyingAttributes,
    PlanarROIMeasurementsAndQualitativeEvaluations,
    ReferencedRegion,
    ReferencedVolume,
    ROIMeasurements,
    SourceImageForRegion,
    SourceImageForSegmentation,
    SubjectContext,
    TrackingIdentifier,
    VolumetricROIMeasurementsAndQualitativeEvaluations,
)
from pydicom.sr.value_types import CodedConcept, GraphicTypes

if __name__ == '__main__':

    ref_filename = 'pydicom/data/test_files/CT_small.dcm'
    ref_dataset = dcmread(ref_filename)

    observer_person_context = ObserverContext(
        observer_type=CodedConcept(
            value="121006",
            meaning="Person",
            scheme_designator="DCM"
        ),
        observer_identifying_attributes=PersonObserverIdentifyingAttributes(
            name='Foo'
        )
    )
    observer_device_context = ObserverContext(
        observer_type=CodedConcept(
            value="121007",
            meaning="Device",
            scheme_designator="DCM"
        ),
        observer_identifying_attributes=DeviceObserverIdentifyingAttributes(
            uid=generate_uid()
        )
    )
    observation_context = ObservationContext(
        observer_person_context=observer_person_context,
        observer_device_context=observer_device_context,
    )

    region_reference = ReferencedRegion(
        graphic_type=GraphicTypes.CIRCLE,
        graphic_data=((58.0, 52.0), (58.0, 41.0)),
        source_image=SourceImageForRegion(
            sop_class_uid=ref_dataset.SOPClassUID,
            sop_instance_uid=ref_dataset.SOPInstanceUID
        )
    )
    finding_sites = [
        FindingSite(
            anatomic_location=CodedConcept(
                value="T-D00F7",
                meaning="Cervico-thoracic spine",
                scheme_designator="SRT"
            ),
            topographical_modifier=CodedConcept(
                value="T-11531",
                meaning="Vertebral foramen",
                scheme_designator="SRT"
            )
        ),
    ]
    measurements = [
        Measurement(
            name=CodedConcept(
                value="G-A16A",
                meaning="Area of defined region",
                scheme_designator="SRT"
            ),
            tracking_identifier=TrackingIdentifier(uid=generate_uid()),
            value=1.7,
            unit=CodedConcept(
                value="cm2",
                meaning="square centimeter",
                scheme_designator="UCUM"
            ),
            properties=MeasurementProperties(
                normality=CodedConcept(
                    value="G-A460",
                    meaning="Normal",
                    scheme_designator="SRT"
                ),
                level_of_significance=CodedConcept(
                    value="R-00345",
                    meaning="Not significant",
                    scheme_designator="SRT"
                )
            )
        )
    ]
    region_measurements = ROIMeasurements(
        measurements=measurements,
        finding_sites=finding_sites
    )
    imaging_measurements = PlanarROIMeasurementsAndQualitativeEvaluations(
        tracking_identifier=TrackingIdentifier(
            uid=generate_uid(),
            identifier='Planar ROI Measurements'
        ),
        referenced_region=region_reference,
        finding_type=CodedConcept(
            value="T-A7010",
            meaning="Spinal cord",
            scheme_designator="SRT"
        ),
        measurements=region_measurements
    )
    measurement_report = MeasurementReport(
        observation_context=observation_context,
        procedure_reported=CodedConcept(
            value="25045-6",
            meaning="CT unspecified body region",
            scheme_designator="LN"
        ),
        imaging_measurements=imaging_measurements
    )

    document = Comprehensive3DSR(
        evidence=[ref_dataset],
        content=measurement_report,
        series_instance_uid=generate_uid(),
        series_number=1,
        series_description='Measurement Reports',
        sop_instance_uid=generate_uid(),
        instance_number=1,
        institution_name='Institution',
        institution_department_name='Institution Department',
        manufacturer='Manufacturer'
    )
    document_filename = '/tmp/{}.dcm'.format(document.SOPInstanceUID)
    document.save_as(document_filename)

Under the hood, we could still use context group definitions to check value set constraints for coded content items, i.e. assert that the values of the value and scheme_designator attributes of a provided CodedConcept object are valid in the given context.
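A minimal sketch of such a check, assuming a context group can be represented as a set of (scheme designator, code value) pairs. The CodedConcept namedtuple, the CID_270_OBSERVER_TYPE set and check_in_context_group are hypothetical stand-ins, not the classes in this PR; the two members are the DCM codes for Person (121006) and Device (121007) that appear in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a coded concept; the PR defines its own class.
CodedConcept = namedtuple("CodedConcept", ["value", "scheme_designator", "meaning"])

# Hypothetical encoding of CID 270 (Observer Type). Code Meaning is only
# intended for display, so it is deliberately left out of the lookup key.
CID_270_OBSERVER_TYPE = {
    ("DCM", "121006"),  # Person
    ("DCM", "121007"),  # Device
}


def check_in_context_group(concept, context_group, cid):
    """Raise ValueError if `concept` is not a member of `context_group`."""
    key = (concept.scheme_designator, concept.value)
    if key not in context_group:
        raise ValueError(
            f"Code {concept.value!r} ({concept.scheme_designator}) is not "
            f"a valid value in CID {cid}"
        )


device = CodedConcept("121007", "DCM", "Device")
check_in_context_group(device, CID_270_OBSERVER_TYPE, 270)  # no error
```

Keying on value and scheme designator only means a differently worded meaning still validates, which matches the point that meanings are not guaranteed to be stable.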

What do you think, @pieper, @fedorov, @darcymason?

@pieper
Contributor

pieper commented Mar 23, 2019

@hackermd did you look at the business names json example from @dclunie? I like the idea of putting some of the encoding details into a configuration document so the python level can be more at the conceptual level.

@darcymason
Member

I've been travelling again, just getting back to following up on this discussion...

I would definitely avoid code["4 months to 1 year ago"], since we want to use codes so that we don't have to deal with free text! We should keep in mind that meaning is only intended for display purposes and there is no guarantee that its value will be stable.

My main point is that we should avoid codes['Person'], but use enums or class members instead. This is why I decided to auto-generate code (modules, constants and enumerated items) rather than dynamically accessing dictionaries using meaning as a key.

I only showed that first example to answer your point about invalid python identifiers (starting with a number). My preferred option for that case would be the leading underscore that I also showed could be used. I certainly agree to forget about codes['Person'] as an option, and just discuss whether codes.Person, or codes.dcm.Person, or codes.cidXX.Person are options.
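Turning a Code Meaning into a Python attribute name could look like the following sketch. The CamelCase rule is inferred from generated names shown later in this thread (for example RtPatientPositionRegistration3dCtMarkerBased), and the leading underscore for meanings that start with a digit follows the suggestion above; keyword_for_meaning is a hypothetical helper, not code from the PR.

```python
import keyword
import re


def keyword_for_meaning(meaning):
    """Turn a Code Meaning into a CamelCase Python identifier (sketch)."""
    # Split on anything that is not a letter or digit (spaces, commas, hyphens).
    words = re.split(r"[^A-Za-z0-9]+", meaning)
    # str.capitalize() upper-cases the first character and lower-cases the
    # rest, so "RT" becomes "Rt" and "3D" becomes "3d".
    name = "".join(word.capitalize() for word in words if word)
    # Identifiers may not start with a digit or collide with a keyword.
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = "_" + name
    return name
```

With this rule, "4 months to 1 year ago" becomes _4MonthsTo1YearAgo and "Cervico-thoracic spine" becomes CervicoThoracicSpine.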

I'm sad that you seem to be backing off from the latter uses. Pydicom is a programmers' toolkit, not an end-user program, and it has to be the programmers' responsibility to adhere to the DICOM standard (or not). IMO the main value of an SR subpackage (as others have pointed out) is to abstract away a lot of the boilerplate and make the code easier to create, more readable, and more maintainable. I don't think that is well served by someone having to look up "G-A16A" etc, any more than they should use dataset[0x300A0020].

Sorry if I've belabored that point; it's tough to convey nuances in these kinds of discussions. Perhaps the best thing is to have some code to specifically discuss.

I've copied your generating code and modified it to produce the kind of intermediary dictionaries I was talking about. I haven't pushed into someone else's PR before, but I believe that it is possible and we can always unwind anything with a later push. If that doesn't work I'll post it in some other way and point to it.

My current version creates two files, one of about 1.5 MB and another of 0.8 MB. Compare this with pydicom's _dicom_dict.py at a little under 0.5 MB, so I don't think these are too bad. And the byte-compiled .pyc files load in roughly 10 microseconds or less on my machine.

I'm just going to make a last couple of edits, then I will post something for discussion. And I'll try to re-read some of the previous discussion and comment on any other issues that have been raised.

@darcymason
Member

Well, my changes are done, but I'm having trouble figuring out the push and I'll have to come back to this after work.

@hackermd, it may work if you give permission to push to your fork. But this is new to me. If you would rather not do that, I will push to a new branch.

But here are some examples of the output on my local branch:

>>> from pydicom.sr import codes
>>> codes.Person
(0008, 0100) Code Value                          SH: '121006'
(0008, 0102) Coding Scheme Designator            SH: 'DCM'
(0008, 0104) Code Meaning                        LO: 'Person'

>>> codes.cid270.dir()
['Device', 'Person']

>>> codes.cid270.Person
(0008, 0100) Code Value                          SH: '121006'
(0008, 0102) Coding Scheme Designator            SH: 'DCM'
(0008, 0104) Code Meaning                        LO: 'Person'

>>> codes.dir("Lumen")
['FalseLumen', 'Lumen', 'LumenAreaStenosis', 'LumenDiameterRatio',
 'LumenDiameterStenosis', 'LumenEccentricityIndex', 'LumenOfArtery',
 'LumenOfBloodVessel', 'LumenPerimeter', 'LumenShapeIndex', 'LumenVolume',
 'SiteOfLumenMaximum', 'SiteOfLumenMinimum', 'TrueLumen',
 'VesselLumenCrossSectionalArea', 'VesselLumenDiameter']

>>> codes.DCM.dir("marker")
['CoilMarker', 'CylinderMarker', 'FalseMarkersPerCase', 'FalseMarkersPerImage',
 'InfraredReflectorMarker', 'InterMarkerDistance', 'MarkerPlacement', 'MrMarker',
 'OtherMarker', 'PostProcedureMammogramsForMarkerPlacement',
 'RtPatientPositionRegistration3dCtMarkerBased', 'TransponderMarker',
 'ViewAndLateralityMarkerDoesNotHaveApprovedCodes',
 'ViewAndLateralityMarkerDoesNotHaveBothViewAndLaterality',
 'ViewAndLateralityMarkerIsIncorrect', 'ViewAndLateralityMarkerIsMissing',
 'ViewAndLateralityMarkerIsNotNearTheAxilla', 'ViewAndLateralityMarkerIsOffImage',
 'ViewAndLateralityMarkerIsPartiallyObscured',
 'ViewAndLateralityMarkerOverlapsBreastTissue', 'VisibleReflectorMarker',
 'WireMarker']

Autocompletion also works for a first-level object, e.g.

codes.Pers<tab>

in IPython shows Person, as well as PersianCat, etc. codes.DCM.Pers<tab> does not work, but dcm=codes.DCM; dcm.Pers<tab> does. That seems to be an IPython issue with dynamically generated attributes.
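One way such a namespace can behave, as a minimal sketch: unknown attributes fall through to __getattr__ on an internal dictionary, a dir() method filters keywords by substring, and __dir__ exposes the generated names so that dir() and IPython's completer can list them. The Codes class and the plain (value, scheme, meaning) tuples are assumptions for illustration; the output above comes from darcymason's branch, whose internals may differ (it returns Datasets, for one).

```python
class Codes:
    """Attribute-style access to a table of codes (sketch)."""

    def __init__(self, table):
        # table maps a keyword such as "Person" to (value, scheme, meaning)
        self._table = dict(table)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for code keywords.
        # Read _table via __dict__ to avoid recursion before __init__ runs.
        table = self.__dict__.get("_table", {})
        if name in table:
            return table[name]
        raise AttributeError(f"No code named {name!r}")

    def __dir__(self):
        # Consulted by dir(obj) and by IPython tab completion.
        return sorted(set(super().__dir__()) | set(self._table))

    def dir(self, *filters):
        """Return keywords containing every filter string (case-insensitive)."""
        return sorted(
            name for name in self._table
            if all(f.lower() in name.lower() for f in filters)
        )


codes = Codes({
    "Person": ("121006", "DCM", "Person"),
    "Device": ("121007", "DCM", "Device"),
})
```

Here codes.Person returns the Person tuple, codes.dir() returns ['Device', 'Person'], and codes.dir("per") returns ['Person'].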

@hackermd
Contributor Author

@darcymason The examples look really nice. Excited to see the code. I added you as a collaborator to the fork. You should have received an invite and should be able to push to the repository.

@hackermd
Contributor Author

If we follow through on the logic of using names instead of numeric identifiers, we may also want to consider using codes.ObserverType.Person instead of codes.cid270.Person.
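Supporting both spellings could be as simple as registering each context group object under its number and under a name derived from its title. A sketch, assuming CID 270 is titled "Observer Type" as in the DICOM content mapping resource; CONTEXT_GROUPS and build_codes are hypothetical, and SimpleNamespace merely stands in for whatever container the generated code would use.

```python
from types import SimpleNamespace

# Hypothetical generated table: CID number -> (title, {keyword: code}).
CONTEXT_GROUPS = {
    270: ("Observer Type", {
        "Person": ("121006", "DCM", "Person"),
        "Device": ("121007", "DCM", "Device"),
    }),
}


def build_codes(context_groups):
    """Expose each context group as cidNNN and as a CamelCase title."""
    codes = SimpleNamespace()
    for cid, (title, members) in context_groups.items():
        group = SimpleNamespace(**members)
        # Both attribute names point at the same object, so they cannot drift.
        setattr(codes, f"cid{cid}", group)
        setattr(codes, "".join(w.capitalize() for w in title.split()), group)
    return codes


codes = build_codes(CONTEXT_GROUPS)
```

Titles containing punctuation would need the same identifier clean-up as code meanings before they could be used as attribute names.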

@darcymason
Member

we may also want to consider using codes.ObserverType.Person instead of codes.cid270.Person

Sounds good. I'll look into that too.

Use name consistent with corresponding Information Entity (IE)
@hackermd
Contributor Author

hackermd commented Aug 27, 2019

I would still like to address the previous question I had about all the "pass" sub-classes in media_storage. It's not a big deal, but it seems wasteful, unless there is a good reason to expect these to have their own methods in future.

Yes, these classes should implement different Information Entities (IE), which are associated with different modules and thus require different methods/properties:

  • ImageStorage -> pixel_array of type numpy.ndarray (representing value of Pixel Data attribute)
  • SRDocumentStorage -> content_items of type pydicom.sequence.Sequence[pydicom.dataset.Dataset] (representing value of Content Sequence attribute). The PR previously contained the classes ContentSequence and ContentItem to facilitate querying the content tree. We could bring them back.

Shall we take this on in this PR or in a separate one?
This could be a larger endeavor and may require a bit of refactoring, which is why I was reluctant to implement this functionality in the first place.

@hackermd
Contributor Author

In terms of testing, coverage for codedict is not that high (~60%) ... we could deal with that in a separate PR if you like (it's getting difficult to navigate in this one). Important ones to add would be the _CID_Dict getattr 'misses' - no matches, more than 1 match, and some errors raised like the KeyError for unknown code name for scheme.

Good point. I agree, we should expand upon tests. It's probably best we add the proposed tests now.
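The three misses listed above (no match, more than one match, unknown name for a scheme) are easiest to test when each raises a distinct exception. A sketch under that assumption; lookup, AmbiguousCodeError and the table layout are hypothetical, not the _CID_Dict implementation in this PR, and "99LOCAL" is a made-up private scheme with a placeholder value used only to create an ambiguous keyword.

```python
class AmbiguousCodeError(LookupError):
    """Raised when a keyword matches codes in more than one scheme."""


def lookup(table, name, scheme=None):
    """Find (scheme, value) for a code keyword in a scheme -> keyword table."""
    if scheme is not None:
        # Unknown name for an explicitly requested scheme raises KeyError.
        return scheme, table[scheme][name]
    matches = [(s, members[name]) for s, members in table.items() if name in members]
    if not matches:
        raise AttributeError(f"No code named {name!r}")
    if len(matches) > 1:
        schemes = ", ".join(s for s, _ in matches)
        raise AmbiguousCodeError(f"{name!r} is defined in: {schemes}")
    return matches[0]


# Placeholder data: "99LOCAL" and "L-0001" are invented for the test only.
TABLE = {
    "DCM": {"Person": "121006", "Device": "121007"},
    "99LOCAL": {"Device": "L-0001"},
}


def test_lookup_misses():
    assert lookup(TABLE, "Person") == ("DCM", "121006")
    for args, exc in [
        (("Nonexistent",), AttributeError),      # no match
        (("Device",), AmbiguousCodeError),       # more than one match
        (("Person", "99LOCAL"), KeyError),       # unknown name for scheme
    ]:
        try:
            lookup(TABLE, *args)
        except exc:
            continue
        raise AssertionError(f"lookup{args} did not raise {exc.__name__}")


test_lookup_misses()
```

Under pytest, each loop entry would more naturally become its own pytest.raises block.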

@darcymason
Member

@hackermd, I left this hanging for a while, but am looking at it again; you should hear more from me later today or tomorrow.

@hackermd
Contributor Author

@darcymason thanks for looking into it

@darcymason
Member

Just an update ... I'm still mulling over the ImageStorage classes etc. I'm coming around to the idea to some extent, but am trying to think through whether there are alternatives. This could lead to a pretty fundamental conceptual framework for pydicom, so I want to think it over carefully.

@edmcdonagh
Contributor

Does this functionality have an expected target release version allocated?

@darcymason
Member

darcymason commented Dec 4, 2019

Does this functionality have an expected target release version allocated?

I still hope to get this out in the upcoming release this month (not as official, but provisional). I'm still struggling (apologies @hackermd) to find enough time to decide on the subclasses of Dataset and whether to go that route or something a little different. Actually, @mrbean-bremen, @scaramallion I would appreciate any thoughts you might have about that, if you can dig into the code a little for that part.

@darcymason
Member

Actually, @mrbean-bremen, @scaramallion I would appreciate any thoughts you might have about that, if you can dig into the code a little for that part.

Specifically here, talking about the classes in the media_storage.py file. In this PR, these placeholder classes are created in filereader.py based on the MediaStorageSOPClassUID.
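For readers new to the pattern, a UID-keyed registry is one way a reader could pick a class per SOP Class. This sketch wraps the dataset rather than subclassing Dataset, and uses a SimpleNamespace as a stand-in dataset. The two UIDs are the standard CT Image Storage and Comprehensive 3D SR Storage SOP Class UIDs; register, wrap and the wrapper classes are hypothetical and are not the classes in media_storage.py. In a real reader the UID would come from file_meta.MediaStorageSOPClassUID.

```python
from types import SimpleNamespace

_REGISTRY = {}


def register(*sop_class_uids):
    """Class decorator mapping SOP Class UIDs to a wrapper class."""
    def decorator(cls):
        for uid in sop_class_uids:
            _REGISTRY[uid] = cls
        return cls
    return decorator


class StorageWrapper:
    """Generic wrapper used when no specific class is registered."""

    def __init__(self, dataset):
        self.dataset = dataset


@register("1.2.840.10008.5.1.4.1.1.2")  # CT Image Storage
class ImageStorage(StorageWrapper):
    @property
    def pixel_data(self):
        return self.dataset.PixelData


@register("1.2.840.10008.5.1.4.1.1.88.34")  # Comprehensive 3D SR Storage
class SRDocumentStorage(StorageWrapper):
    @property
    def content_items(self):
        return self.dataset.ContentSequence


def wrap(dataset, sop_class_uid):
    """Return the registered wrapper, or the generic one for unknown UIDs."""
    return _REGISTRY.get(sop_class_uid, StorageWrapper)(dataset)


# Usage with a stand-in dataset:
sr = wrap(SimpleNamespace(ContentSequence=[]), "1.2.840.10008.5.1.4.1.1.88.34")
```

Wrapping keeps a single Dataset class while still letting SOP-Class-specific properties live in one place, which is roughly the trade-off discussed in the following comments.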

@mrbean-bremen
Member

@darcymason: I have largely ignored this PR so far (there are lots of people with more SR experience here), so it will take some time for me to get up to date. I will see if I can get to it over the weekend; having a preview version in the next release would indeed be nice. No promises, though...

@scaramallion
Member

Actually, @mrbean-bremen, @scaramallion I would appreciate any thoughts you might have about that, if you can dig into the code a little for that part.

I'll try and go over it in the next week.

@@ -0,0 +1,57 @@
from collections import namedtuple

from pydicom.dataset import Dataset
Member

Unused import

@hackermd
Contributor Author

hackermd commented Dec 6, 2019

Yeah, the subclassing of Dataset seems unnecessary.

How would you suggest implementing the functionality of exposing different methods depending on the SOP Class?

@scaramallion
Member

scaramallion commented Dec 6, 2019

I can see some advantages in creating something like a BaseDataset class which would contain the dict-like behaviour and reading/writing functionality we expect, and then having subclasses (DICOMDIR, Image Storage, SR, etc.) which would actually implement specific methods for the type of dataset (such as pixel_array). We already do this to a limited extent with DicomDir...

But on the other hand, increasing the complexity of the user interface by having methods that appear for one SOP Class and not another seems counterproductive. Functionality tends to get added and rarely removed so it would only get worse over time. It would also increase the maintenance load because changes to the base class would have to be tested across multiple subclass types.

And Dataset currently only has 4 public Image Storage specific methods/properties anyway (compared to ~50 base-class like methods), so it's not like there's a huge overhead. Mismatching of functionality and SOP Class could be dealt with just by raising an exception.

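class Dataset():
    @property
    def pixel_array(self):
        # UID.name gives the SOP Class name, e.g. "CT Image Storage"
        if self.SOPClassUID.name.endswith("Image Storage"):
            ...  # do stuff: decode Pixel Data and return the array
        else:
            raise AttributeError("DICOM says no")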

Eh, I don't really know about this one. I think keeping the single large Dataset implementation will probably end up being less work and easier to maintain (and use!), but my OCD wants everything split up.

@hackermd
Contributor Author

hackermd commented Dec 7, 2019

But on the other hand, increasing the complexity of the user interface by having methods that appear for one SOP Class and not another seems counterproductive.

We could just add additional methods/properties, which would map to particular Attributes (e.g., pixel_data() -> Pixel Data) and raise an exception (e.g., AttributeError/KeyError) if the corresponding Data Element is not contained in the Data Set. However, I'm not sure this would really make things easier, from either a usage or a maintenance perspective.

@darcymason
Member

darcymason commented Dec 7, 2019

I've been considering the concept of "handlers" registered to Dataset again, and by complete coincidence, I just happened to come across a solution that pandas uses: registering custom accessors.

As they use it, it would lead to something like:

ds.sr.some_property
ds.sr.some_function()

I'm not sure how this would work with nested datasets (i.e. sequences)... in this form it seems it would have the accessor named at each level - cumbersome particularly with SR. So perhaps it makes more sense to register a handler for certain tags/keywords or tag group numbers, or the SOPClassUID; a handler like that would automatically flow through to all nested datasets.

I haven't looked in detail yet, but in effect the pandas extender is creating a class which wraps the object (a dataset in our case), but then also makes it accessible through the standard class using a decorator.

Needs some more thought, but it looks interesting. It would at least provide a consistent method for different developers to add custom dataset behavior. And, as in the pandas example, a validator call in __init__ could ensure that the dataset contains the right information.
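pandas exposes this pattern through pandas.api.extensions.register_dataframe_accessor. The sketch below mimics it for a dataset-like class: a non-data descriptor builds the accessor on first access, caches it on the instance, and the accessor validates the dataset in __init__. register_dataset_accessor, SRAccessor and the stand-in Dataset are hypothetical names, not pydicom API, and the Modality check is a placeholder for whatever validation an SR accessor would really need.

```python
class _CachedAccessor:
    """Non-data descriptor that builds an accessor once per instance."""

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself, e.g. Dataset.sr
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Cache in the instance dict; since this descriptor has no __set__,
        # later lookups find the cached object first.
        obj.__dict__[self._name] = accessor_obj
        return accessor_obj


def register_dataset_accessor(name, cls):
    """Decorator attaching an accessor class to `cls` under `name`."""
    def decorator(accessor):
        setattr(cls, name, _CachedAccessor(name, accessor))
        return accessor
    return decorator


class Dataset:
    """Minimal stand-in for pydicom's Dataset, for illustration only."""

    def __init__(self, **elements):
        self.__dict__.update(elements)


@register_dataset_accessor("sr", Dataset)
class SRAccessor:
    def __init__(self, dataset):
        # Validate once, as pandas accessors do; AttributeError keeps
        # hasattr(ds, "sr") False for non-SR datasets.
        if getattr(dataset, "Modality", None) != "SR":
            raise AttributeError("The .sr accessor requires an SR dataset")
        self._ds = dataset

    @property
    def content_items(self):
        return self._ds.ContentSequence


ds = Dataset(Modality="SR", ContentSequence=[])
items = ds.sr.content_items
```

As noted above, this attaches the accessor to top-level objects only; datasets nested in sequences would each need their own accessor lookup.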

@darcymason
Member

Needs some more thought, but it looks interesting. It would at least provide a consistent method for different developers to add custom dataset behavior

I'm going to take a crack at this over the next few days and see what I can come up with.

@mrbean-bremen
Member

mrbean-bremen commented Dec 10, 2019

Just a quick note: I dug through all the comments (that took some time...) and the code changes over the weekend, and I think the writing part is ready to merge as a preliminary version. It looks quite good to me (though I have used SR very little, as already stated, so I can't really judge the usability too much). The test coverage is not bad (there are some tests missing for error handling, but these can be added later). We just have to state in the release notes that this is not stable yet, same as with the JSON stuff, so that we may get some early feedback.
I would even merge this if the reading part is not ready (probably remove these stub classes first, as already mentioned), as this seems to be quite a valuable addition to me.

darcymason mentioned this pull request Dec 21, 2019
darcymason added this to the v1.4 milestone Dec 22, 2019
@darcymason
Member

Okay, here's the plan ... I'll merge this in shortly, then open a new issue with changes to make prior to the 1.4 release. A new issue and PRs will let us focus on a few things at a time.

I do think we should drop the subclassing of Dataset until actually needed. We also need some documentation of what is in place so far. I will set out a list in the new issue.

Thanks @hackermd and everyone for this very good discussion and code. And apologies again for letting this slide for so long. We will have something "alpha" out with v1.4, and we can see how we can go forward with it.

darcymason merged commit 0c9bb3c into pydicom:master Dec 30, 2019
darcymason deleted the feature/structured-reporting branch December 30, 2019 15:57
darcymason mentioned this pull request Dec 30, 2019
darcymason mentioned this pull request Jan 20, 2020