
FellowOakDicom 5.0.1 - Invalid DICOM files are produced when the initial DicomFile is opened from a stream #1264

Closed
amoerie opened this issue Nov 15, 2021 · 18 comments · Fixed by #1266
Comments

@amoerie
Collaborator

amoerie commented Nov 15, 2021

Describe the bug
The saved DICOM file is corrupt/unreadable when DicomFile.Save is called after that DicomFile was created from a stream.

To Reproduce
1. DicomFile.Open(a stream)
2. DicomFile.Save(a path)
3. Trying to open the resulting DICOM file fails
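
A minimal sketch of these steps (fo-dicom 5 API; "input.dcm" and "output.dcm" are placeholder paths):

    using System.IO;
    using FellowOakDicom;

    // Open the DicomFile from a stream rather than directly from a path.
    using var stream = File.OpenRead("input.dcm");
    var file = DicomFile.Open(stream);

    // Save it to disk; with 5.0.1 the written file can be corrupt.
    file.Save("output.dcm");

    // Re-opening the saved file then fails.
    var reopened = DicomFile.Open("output.dcm");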

Expected behavior
The DICOM file should be valid

Environment
Fellow Oak DICOM version: 5.0.1

@amoerie amoerie added the new label Nov 15, 2021
@amoerie
Collaborator Author

amoerie commented Nov 15, 2021

Already fixed by this commit: b9bdd43

If urgent (I consider this a pretty bad bug), we can cherry-pick it and create a new release.
If not urgent, we can wait until the memory improvements PR is merged and release then.

@amoerie
Collaborator Author

amoerie commented Nov 15, 2021

@gofal This is instant karma for wanting to have quick release cycles :-)

@amoerie amoerie added bug and removed new labels Nov 15, 2021
@amoerie amoerie changed the title Invalid DICOM files are produced when the initial DicomFile is opened from a stream FellowOakDicom 5.0.1 - Invalid DICOM files are produced when the initial DicomFile is opened from a stream Nov 15, 2021
@mrbean-bremen
Collaborator

As I understand it, the memory improvements PR is almost done, so it may be OK to wait.

@gofal
Contributor

gofal commented Nov 15, 2021

OK, this must be super urgent. A storescp always receives images via a stream and then stores them to file. So all the files received via storescp are invalid??
I have to mark this release as "critical bug" on NuGet, don't I?

@mrbean-bremen
Collaborator

@gofal - If that were true, a lot of the tests would fail, wouldn't they?

@mrbean-bremen
Collaborator

Or maybe the files are correctly received, but not checked afterwards in the tests... in that case it would indeed be super urgent.

@mrbean-bremen
Collaborator

mrbean-bremen commented Nov 15, 2021

I just extended the test for the stream I used:

    [Theory]
    [InlineData("test_SR.dcm")]
    // ... more data
    public void Open_FileFromStream_YieldsValidDicomFile(string fileName)
    {
        // Open the test file from disk and round-trip it through a MemoryStream.
        var file = DicomFile.Open(TestData.Resolve(fileName));
        var stream = new MemoryStream();
        file.Save(stream);
        stream.Seek(0, SeekOrigin.Begin);

        // Opening from the stream must yield an identical dataset.
        var openFile = DicomFile.Open(stream);
        Assert.True(new DicomDatasetComparer().Equals(file.Dataset, openFile.Dataset));

        // Saving the stream-opened file to disk and re-opening it must also
        // preserve the dataset.
        var newFilename = Path.GetTempFileName();
        openFile.Save(newFilename);
        openFile = DicomFile.Open(newFilename);
        Assert.True(new DicomDatasetComparer().Equals(file.Dataset, openFile.Dataset));
        File.Delete(newFilename);
    }

I.e., create a stream from a file, open the stream, and save it back to a file. The tests all pass, so maybe I'm doing something wrong in trying to reproduce this? Or is a MemoryStream not the kind of stream that is problematic?

@amoerie
Collaborator Author

amoerie commented Nov 15, 2021

> OK, this must be super urgent. A storescp always receives images via a stream and then stores them to file. So all the files received via storescp are invalid??
> I have to mark this release as "critical bug" on NuGet, don't I?

I don't think it is that critical. The root issue is that StreamByteBuffer does not take its Size property into account, so it writes whatever data the underlying stream has to offer. In our scenario, we were using MemoryStreams that contained more than one file.

This resulted in DICOM files with a corrupted ending.

I'll check tomorrow whether I can determine the real severity, but I think the general impact will be limited. However, to be conservative, I think it would be a good idea to prepare a 5.0.2 release anyway so the impact stays minimal.
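
To illustrate the failure mode, here is a hypothetical sketch of the pattern (not the actual fo-dicom source; the type and member names are only for illustration). When the copy runs to the end of the stream instead of stopping after Size bytes, any unrelated bytes that follow the buffer's region leak into the saved file:

    using System.IO;

    // Hypothetical sketch: a byte buffer backed by a region of a stream,
    // starting at _position and Size bytes long.
    public class StreamByteBufferSketch
    {
        private readonly Stream _stream;
        private readonly long _position;
        public long Size { get; }

        public StreamByteBufferSketch(Stream stream, long position, long size)
        {
            _stream = stream;
            _position = position;
            Size = size;
        }

        // Buggy pattern described above: CopyTo runs until the end of the
        // source stream, ignoring Size. If the MemoryStream holds more than
        // one file, the extra bytes are appended to the saved DICOM file,
        // corrupting its ending.
        public void CopyToStream(Stream destination)
        {
            _stream.Position = _position;
            _stream.CopyTo(destination);
        }
    }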

@mrbean-bremen
Collaborator

> However, to be conservative, I think it would be a good idea to prepare a 5.0.2 release anyway so the impact stays minimal.

I agree. Your memory optimizations have at least the potential to break something (though I'm sure they don't), so they can wait a bit longer.

@gofal
Contributor

gofal commented Nov 15, 2021

Agreed. I marked the NuGet package 5.0.1 as "critical bug" anyway, because a version that creates invalid files (even if only under certain circumstances) is not acceptable. Runtime bugs or issues when reading a file are annoying, but they do not lead to data loss.

@gofal
Contributor

gofal commented Nov 15, 2021

I will review and test the memory optimization PR.

@amoerie
Collaborator Author

amoerie commented Nov 15, 2021

There are two remaining issues that I'm aware of:

  • The temp file does not exist in the TempFileBufferTests on .NET Framework, and I still can't figure out why.
  • Sometimes, when a DicomService is disposed, _writeStream.Dispose crashes because it tries to flush to the underlying NetworkStream, which is sometimes already disposed. I think we can catch and ignore this error (see the sketch below).
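
A minimal sketch of that catch-and-ignore idea for the second point (illustrative only; this is not the actual DicomService code):

    using System;
    using System.IO;

    public sealed class WriteStreamOwner : IDisposable
    {
        private readonly Stream _writeStream;

        public WriteStreamOwner(Stream writeStream) => _writeStream = writeStream;

        public void Dispose()
        {
            try
            {
                // Disposing flushes any buffered bytes to the underlying NetworkStream.
                _writeStream?.Dispose();
            }
            catch (ObjectDisposedException)
            {
                // The underlying stream was already disposed; there is nothing
                // left to flush, so the error can safely be ignored.
            }
        }
    }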

@gofal
Contributor

gofal commented Nov 16, 2021

Just wanted to mention some personal thoughts:

When talking about "optimization", many people think only of reducing the number of CPU cycles or bytes allocated in memory. But that should never be the main driver. It is good to reduce unnecessary memory consumption and to get rid of useless calculations, but the main focus should be on optimizing the source code, not the machine code. Meaning: stability of the code, readability of the source, and separation of concerns (even if that means some extra data copying) are at least as important as having code that executes fast and with low memory consumption.

When I see you doing optimizations and showing the bytes you saved, while on the other hand we now have a data-loss version on NuGet, and you are talking about weird failing tests you have to dig into to find the reason, so the whole thing is now unstable, then I start asking myself whether it is worth it.
The first "obvious" and simple optimizations, like using rented arrays from an ArrayPool or copying data directly from a stream instead of buffering it in extra streams, were effective and did not bring instability.
But going further and further, it seems too much focus went to runtime optimization and source code optimization was forgotten.

I will now start reviewing the code, and maybe I will find some bug or typo. But I feel a little uncomfortable with a PR that deals with the very inner heart of the code, and where you admit there are some not-easy-to-find issues as well as some issues you are aware of. And all this without any real need, like changing the architecture to provide more features or reported performance problems.
If we dropped the PR about adding a value to a VM>1 element with the argument that the problems it would introduce are not worth the benefit, even though that was very low-risk code, then the same argument should apply here, where we are talking about a crucial part of the whole library.

@amoerie, are you absolutely sure that the current state of the PR is completely stable and that no issues are introduced with it? If you cannot answer this question with "yes", then let's drop the PR and do a new one containing only the first ArrayPool optimizations, which seemed to be effective and stable.

@amoerie
Collaborator Author

amoerie commented Nov 16, 2021

You are right to play the safety card here, @gofal. I am already in the process of making a separate pull request that demonstrates ONLY the byte issue we have.

I'll finish that first and then we can see about splitting up the memory optimizations PR into smaller chunks that are more easily reviewable.

@gofal
Contributor

gofal commented Nov 16, 2021

Thanks for your understanding.

@amoerie
Collaborator Author

amoerie commented Nov 16, 2021

No worries, we're all here to help and make a positive impact. As I like to say, I think I am not a difficult person. :-)

@gofal
Contributor

gofal commented Nov 16, 2021

Now I see that this PR mainly fixes issues that were introduced in #1258 (rented arrays from an ArrayPool may be larger than the requested length; removal of a deprecated method). Do you think the other still-open issues, like the temp file issue or the disposed DicomService, were also introduced there?
Should we consider reverting that PR and creating a new one with fewer memory optimizations?
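
For reference, the ArrayPool pitfall mentioned above is that Rent may return an array larger than the requested length, so the requested count must always be tracked separately; a small sketch:

    using System;
    using System.Buffers;

    var pool = ArrayPool<byte>.Shared;
    var buffer = pool.Rent(1000);
    try
    {
        // buffer.Length is only guaranteed to be >= 1000; it is typically
        // rounded up (e.g. to 1024). Never use buffer.Length as the data
        // length; always carry the requested count alongside the array.
        Console.WriteLine($"requested 1000, got {buffer.Length}");
    }
    finally
    {
        pool.Return(buffer);
    }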

@amoerie
Collaborator Author

amoerie commented Nov 16, 2021

No, I think #1258 is actually very conservative and I'm comfortable with releasing it. In fact, the bug has nothing to do with the ArrayPool story at all.

I simply implemented the StreamByteBuffer.CopyToStream(Async) method poorly; this bug could have occurred without ArrayPools entirely. I also assumed that this class would be covered better by the existing tests, but it looks like I was wrong. The new GH1264 test adds coverage there now.
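
A sketch of what a size-respecting copy could look like (hypothetical code under the assumptions above, not the actual fix in #1266):

    using System;
    using System.IO;
    using System.Threading;
    using System.Threading.Tasks;

    public static class StreamCopy
    {
        // Copies exactly 'size' bytes from 'source' (starting at its current
        // position) to 'destination', rather than copying to the end of
        // 'source' as Stream.CopyTo would.
        public static async Task CopyExactlyAsync(
            Stream source, Stream destination, long size,
            CancellationToken cancellationToken = default)
        {
            var buffer = new byte[81920];
            var remaining = size;
            while (remaining > 0)
            {
                var toRead = (int)Math.Min(buffer.Length, remaining);
                var read = await source.ReadAsync(buffer, 0, toRead, cancellationToken);
                if (read == 0)
                {
                    throw new EndOfStreamException("Stream ended before the requested number of bytes was copied.");
                }
                await destination.WriteAsync(buffer, 0, read, cancellationToken);
                remaining -= read;
            }
        }
    }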
