Implementation of structured reporting #824
Conversation
This seems like it'd be better as a separate package, since it's facilitating the creation of SR datasets rather than the core functionality of reading/modifying/writing DICOM files, but I'd be interested to see what @darcymason thinks.
I heard from a lot of AI/ML researchers at the C-MIMI meeting, so the contribution from Markus comes at an opportune time. David
I agree, this is very timely. To me this is core dicom read/write/modify functionality, but I could see it being kept in a different repository under pydicom to help keep things manageable. I'd also like to loop in @swederik since we have been working on related code in dcmjs. It would be great if the context_group.py code could also be used for JavaScript and dcmqi/C++ purposes for interoperability (unless the json can be used directly there?).
This PR provides an interface for creating SR content items and documents; however, I have also started to work on an interface for accessing and searching SR content more conveniently and efficiently. High-level Python interfaces for creation and access of DICOM SR document content would be very helpful in the research domain and may promote the wider adoption of SR for ML research.
Indeed, I hope this is helpful to the AI/ML community. Just a few comments from the quick look:
@hackermd I see you already thought about the first point! 👍 @scaramallion @darcymason please let us know what your verdict is, and I will definitely make the time to review this PR.
@fedorov could you elaborate on your second point? Let's consider CID 7021 `MeasurementReportDocumentTitles` as an example. Do you suggest renaming the constants following that pattern? DCMTK implements each context group as a class. I initially implemented each context group as an `Enum`:

```python
class MeasurementReportDocumentTitles(Enum):
    IMAGING_MEASUREMENT_REPORT = CodedConcept(
        value="126000",
        meaning="Imaging Measurement Report",
        scheme_designator="DCM"
    )
    ...
```

However, the resulting module was too large and importing classes from the module took way too long. Further, the names of context groups can be quite long and the long class names were clunky to use. Therefore, I decided to implement each context group as a separate Python module, using only the CIDs as module names to keep names short, following PEP 8 guidelines for package and module names (e.g., `cid_7021`). This simplifies importing and using coded concepts:

```python
from pydicom.sr.context_groups.cid_7021 import IMAGING_MEASUREMENT_REPORT

coded_concept = IMAGING_MEASUREMENT_REPORT
```

instead of

```python
from pydicom.sr.context_groups.cid_7021 import MeasurementReportDocumentTitles

coded_concept = MeasurementReportDocumentTitles.IMAGING_MEASUREMENT_REPORT.value
```

However, I admit that it would be useful to have an `Enum`-like class as well, e.g. to check whether a given coded concept is a member of a context group:

```python
coded_concept = CodedConcept(
    value="126000",
    meaning="Imaging Measurement Report",
    scheme_designator="DCM"
)
coded_concept = MeasurementReportDocumentTitles(coded_concept)
```
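The `Enum`-based membership check described in this comment can be sketched in isolation. This is an illustration, not pydicom's API: `CodedConcept` is reduced to a `namedtuple` here, and the "000000" concept is a deliberately fake non-member.

```python
from collections import namedtuple
from enum import Enum

# Simplified stand-in for a coded concept (NOT pydicom's actual class).
CodedConcept = namedtuple("CodedConcept", ["value", "meaning", "scheme_designator"])


class MeasurementReportDocumentTitles(Enum):
    """Sketch of CID 7021 as an Enum whose values are coded concepts."""

    IMAGING_MEASUREMENT_REPORT = CodedConcept(
        "126000", "Imaging Measurement Report", "DCM"
    )


concept = CodedConcept("126000", "Imaging Measurement Report", "DCM")

# Enum lookup by value doubles as a membership check: it returns the member
# for concepts in the context group and raises ValueError otherwise.
member = MeasurementReportDocumentTitles(concept)
print(member.name)  # IMAGING_MEASUREMENT_REPORT

try:
    MeasurementReportDocumentTitles(CodedConcept("000000", "Made-up concept", "DCM"))
except ValueError:
    print("not a member of CID 7021")
```

Because `namedtuple` instances are hashable and compare by value, `Enum`'s value-to-member lookup works directly; a real implementation would need the actual `CodedConcept` class to be hashable.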
Markus, I meant to consider replacing names of the items following this pattern:
I just think this way it is clearer that this is a code, and where it is coming from. Just a suggestion, not a requirement.
Yes, the less SCREAM CASE the better IMO.
The import statement will make it clear where constants are coming from; prefixing each constant with the scheme designator seems redundant. @fedorov @pieper do you think it would be necessary to include the coding scheme designator in the constant name? Including the scheme designator may help avoid name collisions. However, this is mitigated by the fact that constants reside in different module namespaces. If there were a name collision, it could be resolved upon import by assigning a different name:

```python
from pydicom.sr.context_groups.cid_7021 import IMAGING_MEASUREMENT_REPORT as DCM_IMAGING_MEASUREMENT_REPORT
```

I think we all agree that this high-level interface should be simple. To me, the scheme designator feels like a coding detail.
IMHO, readability and ease of use take precedence here, given the complexity of the topic. Yes, the import will make it clear where the constants are coming from, but only on the line where they are imported explicitly. If I need to find the constant before I import it, I need to search outside of the code editor.

Now that we are talking about it, I would actually be very tempted to have helper Python code that imports codes from all CIDs. Having to look up which CID I need to import introduces yet another layer. I wish this naming convention were the panacea to all difficulties in using codes with DICOM; all I am saying is that I personally prefer the naming scheme used in DCMTK.

Anyhow, I don't think this is a super-critical decision to make at this point. Let's wait for the pydicom gatekeepers to let us know if this PR belongs here at all.
Yes, being PEP 8 compatible is a good thing, but let's not forget it's just a set of guidelines. :D

Personally I like seeing some of the details exposed, since we will see them again in other contexts like a dcmdump.

For the special characters, I'd really want to avoid the possibility of namespace clashes and the need to do special-case coding to work around them if they appear in practice.
@scaramallion I have to say I do not exactly understand the argument here. SR corresponds to an important class of DICOM files. pydicom does not currently provide functionality to read/modify/write DICOM files at the level of SR content tree. Why would you single out DICOM SR from DICOM?
I think this would be great to add to pydicom's core. Right now, when I create structured reports, I have to hunt down the value/scheme/meaning manually. Even if it just provided a single place to look for them, this would be worth it. It would also help prevent cut-n-paste/typing errors for codes and meanings. I would like to echo other comments here about naming, etc. For cid_23.py, I think I would prefer something more like:
This way, when I do a text search for a meaning, I will end up in a table of meanings for that CID and I won't have to scroll too much to find all the related concepts I might wish to include. Like others have mentioned, I would prefer mixed-case names with underscores when I refer to these in my code. I don't know if the lazy initialization is worth it at all, but it might save some resources for the larger CIDs. You might also consider having the lazy init on its own and have it import all the CID dictionaries and then search through each one until it finds the requested meaning. However, this might interfere with auto-completion in editors, so maybe it is not worth it.... Thanks for this, I would absolutely use this and having it in the core would help.
Hello all, I'd like to look through it in more detail, but my connectivity is somewhat limited until the weekend. I see there is some debate about naming conventions. As someone who doesn't (and hasn't) used SR, I'll weigh in when I can, but hopefully some kind of consensus about the details can emerge.
So I've had some time to review some SR concepts and look through the code. Clearly a lot of thought and great work has gone into this. I'd like to see it in core pydicom. But before a detailed code review, I think the syntax debate should be further discussed.

The cid_xxx files contain CodedConcept class instantiations with keyword parameters: value, meaning, scheme_designator. There is a lot of repeated code there. Would it not be simpler to instead use dictionaries like:

```python
concepts = {
    "SRT": {
        # mapping keyword: (value, meaning)
        'CervicothoracicSpine': ("T-D00F7", "Cervico-thoracic spine"),
        'Esophagus': ("T-56000", "Esophagus"),
        # ...
    },
    "DCM": {
        "Person": ("121006", "Person"),
        # ...
    },
}
```

For performance reasons, there could be multiple dicts, only loaded as needed. To me, it would be simpler and more readable to have a class or factory function create the CodedConcept instances when needed from dictionaries as above. How about a syntax something like (here I'm excerpting just the named parameters to constructors in @hackermd's original example):

```python
from pydicom.sr import code  # `code` is a class with special attribute access methods

observer_type = code.dcm.Person
anatomic_location = code.srt.CervicothoracicSpine
```

This is similar to what people were asking for in terms of dcmtk style. Can the scheme designator be assumed in some cases, only used if necessary to disambiguate? For names starting with numbers as @hackermd mentioned, could we not e.g. prepend an underscore, or make a method to extract with a string passed:

```python
x = code._4MonthsTo1YearAgo
# OR
x = code["4 months to 1 year ago"]
```

In all my above comments I have left out structuring things by the cid_x... because I don't understand if/what that adds. Some CodedConcepts appear to be repeated many times across these files with identical definitions. But if there is some need to refer to cid_x, they could be created from code.dcm.Person, etc. through the mechanism above.

Perhaps I'm just putting some ideas out there, to help the discussion (or get me better informed). I think I'd prefer to see dictionaries with an associated class or method that generates CodedConcept instances as needed. That would seem more like pydicom's style, and in any case the intermediary could allow different ways to get at the SR concepts in the most natural way for the situation (or according to people's preferences), while still remaining unambiguous to read.
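The `code.dcm.Person` syntax proposed in this comment can be prototyped with a small attribute-access intermediary. This is a sketch under the comment's own assumptions (dictionaries keyed by scheme designator; `CodedConcept` again reduced to a `namedtuple`), not an actual pydicom interface.

```python
from collections import namedtuple

# Simplified stand-in for a coded concept (NOT pydicom's actual class).
CodedConcept = namedtuple("CodedConcept", ["value", "meaning", "scheme_designator"])

# keyword -> (value, meaning), grouped by coding scheme designator
_CONCEPTS = {
    "DCM": {"Person": ("121006", "Person")},
    "SRT": {"CervicothoracicSpine": ("T-D00F7", "Cervico-thoracic spine")},
}


class _SchemeAccessor:
    """Creates CodedConcept instances on attribute access, e.g. code.dcm.Person."""

    def __init__(self, designator):
        self._designator = designator

    def __getattr__(self, keyword):
        try:
            value, meaning = _CONCEPTS[self._designator][keyword]
        except KeyError:
            raise AttributeError(keyword) from None
        return CodedConcept(value, meaning, self._designator)


class _Code:
    dcm = _SchemeAccessor("DCM")
    srt = _SchemeAccessor("SRT")


code = _Code()
print(code.dcm.Person)
print(code.srt.CervicothoracicSpine)
```

Instances are created on demand, so nothing forces all tables to load at import time, and a misspelled keyword fails loudly with `AttributeError`.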
I like @darcymason's proposal. It is similar to the DCMTK style in readability, but arguably achieves more in usability. Those dictionaries can grow really large though. How would you propose splitting them? I would consider including the coding scheme designator in the tuples. It is redundant (as many things in DICOM!), but might make things easier to use (potentially).
Yeah, that's an issue, but it would be handled by the intermediary, transparent to the user, so there are lots of options. I would suggest the most common concepts are loaded the first time the intermediary is called. Then any results found could be cached for quicker return next time. If not in the cache, then check the dictionary, if not in the dictionary, load other dictionaries. Those could be divided in multiple ways - alphabetic, by cid_x, etc. However, another aspect of this is that dictionaries are compact and fast. The dicom dictionary in pydicom is quite large but once the file is byte-compiled python, loading is pretty fast.
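The tiered cache-then-load flow described here can be sketched as follows. The names are hypothetical; `_load_extended_tables` stands in for importing additional byte-compiled dictionary modules on demand, and `lru_cache` plays the role of the result cache.

```python
from functools import lru_cache

# Small table of common concepts, loaded eagerly.
_COMMON = {"Person": ("121006", "Person", "DCM")}

_extended = None  # larger tables, loaded only if a lookup misses _COMMON


def _load_extended_tables():
    # Stand-in for importing further dictionary modules on demand.
    return {"Esophagus": ("T-56000", "Esophagus", "SRT")}


@lru_cache(maxsize=None)
def resolve(keyword):
    """Check the cache, then the common dict, then lazily load the rest."""
    global _extended
    if keyword in _COMMON:
        return _COMMON[keyword]
    if _extended is None:
        _extended = _load_extended_tables()
    return _extended[keyword]


print(resolve("Person"))
print(resolve("Esophagus"))  # first miss of _COMMON triggers the lazy load
```

Repeat lookups hit the memoized cache, so the cost of dividing the data across several files is paid at most once per keyword.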
@darcymason I am fine with using dictionaries, as long as we don't "expose" them directly to users. The CID coded concepts are auto-generated from FHIR value set resources in JSON format, so we could access these resources more dynamically. This may also be a good idea for extension, since some users may want to use custom codes. I would argue in favor of keeping codes for CIDs separate, not only for performance reasons, but also because some of the context groups are not extensible and there are use cases where we want to check whether a provided code is a member of a context group, i.e. represents an allowed value for a coded content item. In terms of syntax, I would prefer if we would either use …
While I agree this is technically true, in practice I would much rather see people writing …
@pieper I agree that this would improve readability and would be more consistent with the rest of pydicom, which is why I started to use this approach. However, the keyword of an attribute and the meaning of a coded concept are two different beasts. Therefore, I think we need to be careful with this approach. The standard is also pretty clear about this (see Part 3 Section 8.3):

It's true, we shouldn't be depending on 'meaning' for interoperability, but here we are discussing what are effectively variable names (in the form of enums or class members). I do think the mapping between the 'descriptive' variable names and the corresponding codes should be explicit. I understand that @dclunie will be proposing something to the standards committee related to the use of imported 'business names' (like our variable names) and corresponding definitions like in the examples linked below. This allows you to stick to the 'meaning' or other shorthand if it makes more sense. Perhaps the Python API can use a similar convention to maintain high-level readability.
Right, but I think in this particular case, the reason we have this discussion is to make the code more usable and readable by the developers. Where the standard applies is the end object that comes out of the implementation, or how one interprets the attributes extracted from that object. I do not think it is appropriate to apply the constraints on object use and interpretation as constraints on the implementation.
My main point is that we should avoid this. Maybe we shouldn't provide any predefined codes in pydicom and let users of the library decide how they want to implement context groups at the application level. This is how the above example would then look:

```python
from pydicom.filereader import dcmread
from pydicom.uid import generate_uid
from pydicom.sr.document import Comprehensive3DSR
from pydicom.sr.templates import (
    DeviceObserverIdentifyingAttributes,
    FindingSite,
    Measurement,
    MeasurementProperties,
    MeasurementReport,
    ObservationContext,
    ObserverContext,
    PersonObserverIdentifyingAttributes,
    PlanarROIMeasurementsAndQualitativeEvaluations,
    ReferencedRegion,
    ReferencedVolume,
    ROIMeasurements,
    SourceImageForRegion,
    SourceImageForSegmentation,
    SubjectContext,
    TrackingIdentifier,
    VolumetricROIMeasurementsAndQualitativeEvaluations,
)
from pydicom.sr.value_types import CodedConcept, GraphicTypes


if __name__ == '__main__':
    ref_filename = 'pydicom/data/test_files/CT_small.dcm'
    ref_dataset = dcmread(ref_filename)
    observer_person_context = ObserverContext(
        observer_type=CodedConcept(
            value="121006",
            meaning="Person",
            scheme_designator="DCM"
        ),
        observer_identifying_attributes=PersonObserverIdentifyingAttributes(
            name='Foo'
        )
    )
    observer_device_context = ObserverContext(
        observer_type=CodedConcept(
            value="121007",
            meaning="Device",
            scheme_designator="DCM"
        ),
        observer_identifying_attributes=DeviceObserverIdentifyingAttributes(
            uid=generate_uid()
        )
    )
    observation_context = ObservationContext(
        observer_person_context=observer_person_context,
        observer_device_context=observer_device_context,
    )
    region_reference = ReferencedRegion(
        graphic_type=GraphicTypes.CIRCLE,
        graphic_data=((58.0, 52.0), (58.0, 41.0)),
        source_image=SourceImageForRegion(
            sop_class_uid=ref_dataset.SOPClassUID,
            sop_instance_uid=ref_dataset.SOPInstanceUID
        )
    )
    finding_sites = [
        FindingSite(
            anatomic_location=CodedConcept(
                value="T-D00F7",
                meaning="Cervico-thoracic spine",
                scheme_designator="SRT"
            ),
            topographical_modifier=CodedConcept(
                value="T-11531",
                meaning="Vertebral foramen",
                scheme_designator="SRT"
            )
        ),
    ]
    measurements = [
        Measurement(
            name=CodedConcept(
                value="G-A16A",
                meaning="Area of defined region",
                scheme_designator="SRT"
            ),
            tracking_identifier=TrackingIdentifier(uid=generate_uid()),
            value=1.7,
            unit=CodedConcept(
                value="mm2",
                meaning="square millimeter",
                scheme_designator="UCUM"
            ),
            properties=MeasurementProperties(
                normality=CodedConcept(
                    value="G-A460",
                    meaning="Normal",
                    scheme_designator="SRT"
                ),
                level_of_significance=CodedConcept(
                    value="R-00345",
                    meaning="Not significant",
                    scheme_designator="SRT"
                )
            )
        )
    ]
    region_measurements = ROIMeasurements(
        measurements=measurements,
        finding_sites=finding_sites
    )
    imaging_measurements = PlanarROIMeasurementsAndQualitativeEvaluations(
        tracking_identifier=TrackingIdentifier(
            uid=generate_uid(),
            identifier='Planar ROI Measurements'
        ),
        referenced_region=region_reference,
        finding_type=CodedConcept(
            value="T-A7010",
            meaning="Spinal cord",
            scheme_designator="SRT"
        ),
        measurements=region_measurements
    )
    measurement_report = MeasurementReport(
        observation_context=observation_context,
        procedure_reported=CodedConcept(
            value="25045-6",
            meaning="CT unspecified body region",
            scheme_designator="LN"
        ),
        imaging_measurements=imaging_measurements
    )
    document = Comprehensive3DSR(
        evidence=[ref_dataset],
        content=measurement_report,
        series_instance_uid=generate_uid(),
        series_number=1,
        series_description='Measurement Reports',
        sop_instance_uid=generate_uid(),
        instance_number=1,
        institution_name='Institution',
        institution_department_name='Institution Department',
        manufacturer='Manufacturer'
    )
    document_filename = '/tmp/{}.dcm'.format(document.SOPInstanceUID)
    document.save_as(document_filename)
```

Under the hood, we could still use context group definitions to check value set constraints for coded content items, i.e. assert that the provided values are members of the relevant context groups. What do you think @pieper @fedorov @darcymason?
I've been travelling again, just getting back to following up on this discussion...
I only showed that first example to answer your point about invalid Python identifiers (starting with a number). My preferred option for that case would be the leading underscore that I also showed could be used. I certainly agree to forget about codes['Person'] as an option, and just discuss whether codes.Person, or codes.dcm.Person, or codes.cidXX.Person are options. I'm sad that you seem to be backing off from the latter uses.

Pydicom is a programmers' toolkit, not an end-user program, and it has to be the programmers' responsibility to adhere to the DICOM standard (or not). IMO the main value of an SR subpackage (as others have pointed out) is to abstract away a lot of the boilerplate and make the code easier to create, more readable, and more maintainable. I don't think that is well served by someone having to look up "G-A16A" etc., any more than they should use dataset[0x300A0020]. Sorry if I've belabored that point; it's tough to convey nuances in these kinds of discussions.

Perhaps the best thing is to have some code to specifically discuss. I've copied your generating code and modified it to produce the kind of intermediary dictionaries I was talking about. I haven't pushed into someone else's PR before, but I believe that it is possible and we can always unwind anything with a later push. If that doesn't work I'll post it in some other way and point to it.

My current version creates two files, one about 1.5 MB, and another of 0.8 MB. Compare this with pydicom's _dicom_dict.py at a little under 0.5 MB, so I don't think these are too bad. And the byte-code-compiled .pyc files load in under 10 microseconds on my machine. I'm just going to make a last couple of edits, then I will post something for discussion. And I'll try to re-read some of the previous discussion and comment on any other issues that have been raised.
Well, my changes are done, but I'm having trouble figuring out the push and I'll have to come back to this after work. @hackermd, it may work if you give permission to push to your fork. But this is new to me. If you would rather not do that, I will push to a new branch. But here are some examples of the output on my local branch:
and autocompletion works for a first-level object, e.g. tab completion in IPython shows Person, as well as PersianCat, etc.
@darcymason The examples look really nice. Excited to see the code. I added you as a collaborator to the fork. You should have received an invite and should be able to push to the repository.
If we follow through on the logic of using names instead of numeric identifiers, we may also want to consider using …
Sounds good. I'll look into that too.
Use name consistent with corresponding Information Entity (IE)
Yes, these classes should implement different Information Entities (IE), which are associated with different modules and thus require different methods/properties:
Shall we take this on in this PR or in a separate one?
Good point. I agree, we should expand upon tests. It's probably best we add the proposed tests now.
@hackermd, I left this hanging for a while, but am looking at it again; you should hear more from me later today or tomorrow.
@darcymason thanks for looking into it.
Just an update ... still mulling over the ImageStorage classes etc., coming around to the idea to some extent but trying to think through whether there are alternatives. This could lead to a pretty fundamental conceptual framework for pydicom, so I want to think it over carefully.
Does this functionality have an expected target release version allocated?
I'm hoping still to get this out in the upcoming release this month (not as official, but provisional). I'm still struggling (apologies @hackermd) to get enough time to decide on the subclasses of Dataset and whether to go that route or something a little different. Actually, @mrbean-bremen, @scaramallion, I would appreciate any thoughts you might have about that, if you can dig into the code a little for that part.
Specifically here, I'm talking about the classes in the media_storage.py file. In this PR, these placeholder classes are created in filereader.py based on the MediaStorageSOPClassUID.
@darcymason: I have largely ignored this PR so far - there are lots of people with more SR experience here - so it will take some time for me to get up to date. I will see if I can get to it over the weekend - having a preview version in the next release would indeed be nice. No promises, though...
I'll try and go over it in the next week.
```diff
@@ -0,0 +1,57 @@
+from collections import namedtuple
+
+from pydicom.dataset import Dataset
```
Unused import
How would you suggest implementing the functionality of exposing different methods depending on the SOP Class?
I can see some advantages in creating something like a BaseDataset class which would contain the dict-like behaviour and reading/writing functionality we expect, and then having subclasses (DICOMDIR, Image Storage, SR, etc.) which would actually implement specific methods for the type of dataset (such as `pixel_array`).

But on the other hand, increasing the complexity of the user interface by having methods that appear for one SOP Class and not another seems counterproductive. Functionality tends to get added and rarely removed, so it would only get worse over time. It would also increase the maintenance load because changes to the base class would have to be tested across multiple subclass types. And the alternative of checking inside the method isn't pretty either:

```python
class Dataset:
    def pixel_array(self):
        if self.SOPClassUID.name.endswith("Image Storage"):
            ...  # do stuff
        else:
            raise SomeException("DICOM says no")
```

Eh, I don't really know about this one. I think keeping the single large Dataset implementation will probably end up being less work and easier to maintain (and use!), but my OCD wants everything split up.
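One way to reconcile the two positions above is a factory that returns a subclass based on the SOP Class, so `pixel_array` never exists on non-image datasets and no `if`-checks are needed inside methods. All class and function names below are hypothetical stand-ins invented for this sketch, not pydicom code, and the string-based dispatch stands in for a real SOP Class UID lookup.

```python
class BaseDataset:
    """Dict-like behaviour and read/write functionality would live here."""

    def __init__(self, sop_class_name):
        self.sop_class_name = sop_class_name


class ImageDataset(BaseDataset):
    def pixel_array(self):
        return "decoded pixel data"  # placeholder for real decoding


class SRDataset(BaseDataset):
    def content_tree(self):
        return "SR content tree"  # placeholder for real SR traversal


def dataset_for(sop_class_name):
    """Pick a Dataset subclass from the SOP Class name."""
    if sop_class_name.endswith("Image Storage"):
        return ImageDataset(sop_class_name)
    if "SR" in sop_class_name:
        return SRDataset(sop_class_name)
    return BaseDataset(sop_class_name)


ds = dataset_for("CT Image Storage")
print(type(ds).__name__)  # ImageDataset
```

The maintenance concern raised above still applies: every change to `BaseDataset` would need testing against each subclass.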
We could just add additional methods/properties, which would map to particular Attributes (e.g., …
I've been considering the concept of "handlers" registered to Dataset again, and by complete coincidence, I just happened to come across a solution that pandas uses: registering custom accessors. As they use it, it would lead to something like:

```python
ds.sr.some_property
ds.sr.some_function()
```

I'm not sure how this would work with nested datasets (i.e. sequences)... in this form it seems it would have the accessor named at each level - cumbersome, particularly with SR. So perhaps it makes more sense to register a handler for certain tags/keywords or tag group numbers, or the SOPClassUID; a handler like that would automatically flow through to all nested datasets. I haven't looked in detail yet, but in effect the pandas extender is creating a class which wraps the object (a dataset in our case), but then also makes it accessible through the standard class using a decorator. Needs some more thought, but it looks interesting. It would at least provide a consistent method for different developers to add custom dataset behavior. And, as in the pandas example, a validator call in …
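The pandas-style accessor registration mentioned above (pandas provides `pandas.api.extensions.register_dataframe_accessor`) could be mimicked like this. The `Dataset`, `register_accessor`, and `SRAccessor` names here are invented for illustration, and pandas additionally caches the accessor instance per object, which this sketch skips.

```python
class Dataset:
    """Minimal stand-in for a dataset class (NOT pydicom's Dataset)."""


def register_accessor(name):
    """Attach a namespace (e.g. ds.sr) that wraps the dataset on access."""

    def decorator(accessor_cls):
        def getter(self):
            return accessor_cls(self)  # new wrapper per access (no caching)

        setattr(Dataset, name, property(getter))
        return accessor_cls

    return decorator


@register_accessor("sr")
class SRAccessor:
    """Hypothetical SR-specific behaviour grouped under ds.sr."""

    def __init__(self, dataset):
        self._dataset = dataset

    def document_title(self):
        # A real implementation would inspect the SR content tree.
        return getattr(self._dataset, "ConceptNameCodeSequence", None)


ds = Dataset()
print(ds.sr.document_title())  # None on this bare dataset
```

As noted in the comment, an accessor constructor is also a natural place for a validator call, since it runs before any accessor method is used.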
I'm going to take a crack at this over the next few days and see what I can come up with.
Just a quick note: I dug through all the comments (that took some time...) and the code changes over the weekend, and I think the writing part is ready to merge as a preliminary version - it looks quite good to me (though I have used SR very little, as already stated, so I can't really judge the usability too much). The test coverage is not bad (there are some tests missing for error handling, but these can be added later). We just have to state in the release notes that this is not stable yet, same as with the JSON stuff, so that we may get some early feedback.
Okay, here's the plan ... I'll merge this in shortly, then open a new issue with changes to make prior to the 1.4 release. A new issue and PRs will let us focus on a few things at a time. I do think we should drop the subclassing of Dataset until actually needed. We also need some documentation of what is in place so far. I will set out a list in the new issue. Thanks @hackermd and everyone for this very good discussion and code. And apologies again for letting this slide for so long. We will have something "alpha" out with v1.4, and we can see how we can go forward with it.
This pull request is intended to facilitate the creation of DICOM Structured Reports (SR).

It adds the `sr` package, which implements:
- `pydicom.sr.templates`
- `pydicom.sr.value_types`
- `pydicom.sr.context_groups`

A few things still need to be completed. However, I was hoping to already get your feedback on the approach in general and whether you would be interested in including this code in the `pydicom` package.

I suggest @pieper, @fedorov, @seandoyle and @dclunie as reviewers. We had several discussions on how to implement SR at the 2019 NA-MIC project week and beyond.

Below is an example for creating an SR document using `pydicom.sr` (the resulting DICOM PS3.10 file is also included in the repository: `pydicom/data/test_files/SR_comprehensive3d.dcm`):