Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for a custom seqinfo to extract from DICOMs any additional metadata desired for a heuristic #581

Merged
merged 18 commits into from
Feb 28, 2024

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jul 19, 2022

Getting back to #333 (attn @bpinsard ) and recent #580 (attn @kcho) decided may be to add to the army of custom ad-hoc callbacks etc heuristics could provide. So, WDYT if heuristic could provide custom_seqinfo callable (crude example is below in PR) where any information could be extracted from already loaded nibabel's dicom wrapper or just a full list of files for that sequence. I think it would be a slightly better solution to #333 since would not always store potentially sensitive path in the extracted metadata record and potentially could operate on already loaded DICOM metadata thus making it faster to extract extra metadata.

@bpinsard @kcho -- would smth like that work for your cases?

TODOs

  • replacecompliment hashing check with some more adequate, may be round trip check through json + verification that no \t is present. if \t -- kaboom. Add to documentation that \t must be sanitized, or add such sanitization (and document it)
  • add basic test. Since convertall already got that custom field filled, test should just check if it is there.

@yarikoptic yarikoptic requested a review from pvelasco July 19, 2022 20:13
@bpinsard
Copy link
Contributor

That's way better than what I had PR-ed years back and should perfectly fit our needs.
Thanks!

@kcho
Copy link

kcho commented Jul 20, 2022

This will also work for me. Thanks for the PR!

@bpinsard
Copy link
Contributor

bpinsard commented Oct 4, 2022

I finally tested the PR. I think that group_dicoms_into_seqinfos call in parser.py should also be given the custom_seqinfo for it to be accessible in infotoids. Apart from that it fit the needs.

@satra
Copy link
Member

satra commented Oct 4, 2022

@yarikoptic - perhaps the return from custom_seqinfo could be a dictionary to add multiple columns instead of a single column named custom. would allow more generalizability to fields.

i would suggest decoupling anonymity from this though, since any custom return could potentially return things that are identifiable. i would leave that part to heuristic developers.

@yarikoptic
Copy link
Member Author

@yarikoptic - perhaps the return from custom_seqinfo could be a dictionary to add multiple columns instead of a single column named custom. would allow more generalizability to fields.

ATM it indeed can be anything (e.g. that tuple in the convertall example. But indeed it would then be a single field custom, potentially not reloadable as a data structure in the round trip to that .tsv where it would be dumped -- I am yet to check what would happen/how it would work.

Indeed we could make it return a dict to formalize such extension, but I don't think we would be able to "dynamically" expand that seqinfo_fields list of fields defined at the module level https://github.com/nipy/heudiconv/pull/581/files#diff-2aa4d95adec423a8c619a45d9676cb573bfdcbdb53d84af3d815ec34aee99d75R27 . Pairing with aforementioned round trip we might though consider it to be limited to a data structure serializable through json and have round trip that way... (but because of ".tsv" might need special handling of tabs, uff)

i would suggest decoupling anonymity from this though, since any custom return could potentially return things that are identifiable. i would leave that part to heuristic developers.

as far as I see it -- it is "decoupled" in a sense "it doesn't care". What did you have in mind?

@codecov
Copy link

codecov bot commented Mar 3, 2023

Codecov Report

Attention: Patch coverage is 82.85714% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 82.04%. Comparing base (153d522) to head (41b22d7).

Files Patch % Lines
heudiconv/dicoms.py 72.72% 3 Missing ⚠️
heudiconv/heuristics/convertall_custom.py 75.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #581      +/-   ##
==========================================
- Coverage   82.05%   82.04%   -0.01%     
==========================================
  Files          41       42       +1     
  Lines        4173     4205      +32     
==========================================
+ Hits         3424     3450      +26     
- Misses        749      755       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

heudiconv/dicoms.py Outdated Show resolved Hide resolved
hash(custom_seqinfo_data)
except TypeError:
raise RuntimeError("Data returned by the heuristics custom_seqinfo is not hashable. "
"See https://heudiconv.readthedocs.io/en/latest/heuristics.html#custom_seqinfo for more details.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpinsard so WDYT about adding some check for roundtrip serialization into/from .tsv? a "weaker" alternative could be to test for presence of \t or \n symbols which might break it, or they wouldn't ?

apparently we did not add any test yet here, should have that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any part of heudiconv that might load the dicominfo*.tsv that get written in .heudiconv, in a second run with overwrite for instance?

For now it convert custom values to a str, eg. when using frozendict for hashable dict:
frozendict.frozendict({'patient_name': 'p02_ana012', 'pe_dir': 'ROW', 'pe_dir_pos': 1, 'body_part': 'BRAIN', 'scan_options': "['IR', 'WE']", 'image_comments': '', 'slice_orient': 'Sag', 'echo_number': 'None', 'rescale_slope': None, 'receive_coil': 'HC1-7;NC1,2', 'image_history': 'ChannelMixing:ND=true_CMM=1_CDM=1;ACCAlgo:9', 'ice_dims': 'X_1_1_1_1_1_4_1_1_1_1_1_35'})

So not necessarily easily unserializable depending on what the custom_seqinfo is returning.

@satra
Copy link
Member

satra commented Mar 30, 2023

the main intent is to give access to other fields in the dicom file as part of the heuristic. the only reason the tsv file existed is because that was produced by one of the early conversion tools we used, and acted as a cache between having to read all the dicom files and the heuristic. it is also the reason it had weird unused fields.

one could read a lot of dicom files very quickly now, much quicker than the original dicomconvert.py script :)

  • one could make the process completely dynamic and allow any custom fields beyond the ones hardcoded
  • allow for arbitrary expansion of the columns knowing that only the heuristic can use them.

you could add custom_ prefix to any column name if the custom extractor returned a dictionary of fields and values. this would also force the heuristic writer to use the custom_ prefix and provide round trip capabilities.

docs/heuristics.rst Outdated Show resolved Hide resolved
@bpinsard
Copy link
Contributor

Happy new year!
Is there a chance this will get merged?
What can I do to help finalize that feature that we have been using in production for a while now.
Thanks!

@yarikoptic
Copy link
Member Author

eh, we need to update (resolve conflicts) and follow original description to add tests

@yarikoptic yarikoptic changed the title Enh custom seqinfo Add support for a custom seqinfo to extract from DICOMs any additional metadata desired for a heuristic Feb 28, 2024
@yarikoptic yarikoptic added minor Increment the minor version when merged release Create a release when this pr is merged labels Feb 28, 2024
@yarikoptic yarikoptic merged commit 424dcdc into master Feb 28, 2024
15 checks passed
@yarikoptic yarikoptic deleted the enh-custom-seqinfo branch February 28, 2024 15:21
Copy link

🚀 PR was released in v1.1.0 🚀

Copy link

🚀 PR was released in v1.1.0 🚀

@bpinsard
Copy link
Contributor

Fantastic ! Many thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor Increment the minor version when merged release Create a release when this pr is merged released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants