Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetaElementGroupLength not found #53

Open
thavlik opened this issue Jan 1, 2020 · 8 comments
Open

MetaElementGroupLength not found #53

thavlik opened this issue Jan 1, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@thavlik
Copy link

thavlik commented Jan 1, 2020

The dicoms provided by https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection are not standardized to have MetaElementGroupLength come first. The pydicom library, which the dataset was probably intended for use with, accounts for this by reading until the first non-0x0002 tag then rewinding.

Example error output with the RSNA dataset (error is returned from dicom.NewParserFromBytes()):

2020/01/01 21:39:41 error processing stage_2_train/ID_014d9a502.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014ddb831.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014dfa44a.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e0a593.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e24cfc.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
@suyashkumar
Copy link
Owner

Thanks for raising this @thavlik, I think it makes sense for us to make a fix here to account for DICOMs like this!

I'll look deeper into this either this weekend or later next week when I'm back from vacation. The solution you indicated that PyDicom does seem workable, but it would be great to avoid rewinding if possible.

I think that we might be able to get by without rewinding (which would be awesome) so long as the correct MetaElementGroupLength exists somewhere before the end of the metadata tags (and if we keep track of metadata bytes read so far)--will look into this a bit more closely when I'm back from vacation!

We'll want to implement a fix for the current version and also the s/1.0-rewrite rewrite/refactor that I'm working on to update/fix many things from upstream branches.

Are you able to share an example DICOM?

Thank you so much for raising this!

@thavlik
Copy link
Author

thavlik commented Jan 3, 2020

I think that we might be able to get by without rewinding (which would be awesome) so long as the correct MetaElementGroupLength exists somewhere before the end of the metadata tags (and if we keep track of metadata bytes read so far)

That's a great idea! I'm unable to confirm the eventual existence of MetaElementGroupLength for each item, so it might be missing entirely. Maybe they are present - I haven't checked.

Here is an example DICOM:
ID_0015ba0f2.zip

EDIT: can confirm ID_0015ba0f2.dcm only has:
(0x0002, 0x0002)
(0x0002, 0x0003)
(0x0002, 0x0010)
(0x0002, 0x0012)
(0x0002, 0x0013)

@suyashkumar suyashkumar added the enhancement New feature or request label Jan 6, 2020
@suyashkumar
Copy link
Owner

@thavlik Thank you for the investigation and the sample dicom! To support unstandardized dicoms like this, it appears that we may indeed need to investigate a rewinding mechanism, or some sort of peek ahead capability (maybe a small buffer to allow something like this).

I think it will be a couple weeks before I have bandwidth to really dig into this. PRs are welcome so let me know if you might be interested in proposing a fix (no worries if not).

Will keep this issue updated as I revisit and in mind for the 1.0 rewrite too!

@thavlik
Copy link
Author

thavlik commented Jan 6, 2020

One possibility is to read until the first non-metadata tag (this much is unavoidable) and then store a pointer to said tag within internal state. Parsing the body then entails considering this initialTag if it is non-nil, and subsequently setting it to nil - resuming current behavior. Kinda hacky but it's all I've got.

@thavlik
Copy link
Author

thavlik commented May 23, 2020

@suyashkumar I'm looking to implement this fix now.

@thavlik
Copy link
Author

thavlik commented May 23, 2020

I added an example DCM and made some changes at thavlik@15f3120.

The tests do not pass. Currently digging into:

$ go test -v -run TestAllFiles
=== RUN   TestAllFiles
    TestAllFiles: parse_test.go:35: Reading CT-MONO2-16-ort.dcm
2020/05/23 18:23:50 Encountered odd length (vl=71042085) when reading implicit VR 'UN' for tag (042f,044f)[private] (file offset 328068)
--- FAIL: TestAllFiles (0.06s)
panic: Encountered odd length (vl=71042085) when reading implicit VR 'UN' for tag (042f,044f)[private] (file offset 328068) [recovered]
        panic: Encountered odd length (vl=71042085) when reading implicit VR 'UN' for tag (042f,044f)[private] (file offset 328068)

goroutine 19 [running]:
testing.tRunner.func1.1(0x7e3040, 0xc000811c90)
        /usr/local/go/src/testing/testing.go:941 +0x3d0
testing.tRunner.func1(0xc0002d8480)
        /usr/local/go/src/testing/testing.go:944 +0x3f9
panic(0x7e3040, 0xc000811c90)
        /usr/local/go/src/runtime/panic.go:967 +0x15d
log.Panic(0xc000307ec8, 0x1, 0x1)
        /usr/local/go/src/log/log.go:351 +0xac
github.com/suyashkumar/dicom_test.mustReadFile(0xc0001fe200, 0x1c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x456de9)
        /mnt/c/Users/tlhavlik/go/src/github.com/suyashkumar/dicom/parse_test.go:24 +0x12f
github.com/suyashkumar/dicom_test.TestAllFiles(0xc0002d8480)
        /mnt/c/Users/tlhavlik/go/src/github.com/suyashkumar/dicom/parse_test.go:36 +0x22d
testing.tRunner(0xc0002d8480, 0x8923e0)
        /usr/local/go/src/testing/testing.go:992 +0xdc
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:1043 +0x357
exit status 2
FAIL    github.com/suyashkumar/dicom    0.086s

@suyashkumar
Copy link
Owner

Hey @thavlik thanks for taking a look at this! I'll need to look a little closer at the test failure on your branch to help debug.

In terms of the general approach, off the top of my head I think it might be sufficient to read until the first non group 0x0002 group tag and then proceeding with typical parsing after that. I'll have to double check though if we need to enforce that certain metadata tags be present. To make the interface a little less hacky, we can have parseFileHeader return two []*element.Element slices--one containing the metadata tags, and one containing the extra element we read, to start off the elements to be read downstream. It'll still be a little hacky, but maybe an okay place to start.

Will need to double check that there isn't anything else special we need to consider when reading that extra non-metadata tag (I think it should be okay, but we need to look up and see if metadata tags have any interaction with reading some of the downstream non-metadata tags and how we might want to deal with it if so)

@thavlik
Copy link
Author

thavlik commented May 24, 2020

My branch is, functionally, doing exactly this. It's unclear why the rest of the file fails to parse as normal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants