Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SkipPixelData option to Parser #243

Closed

Conversation

tcheever
Copy link

@tcheever tcheever commented Jun 3, 2022

  • For heavy imaging modalities, it is expensive to load the pixel
    data into memory, and for many use cases, unnecessary. This new
    option allows users to parse the other attributes without the
    resource hit.

Testing

$ go test -v -run TestNewParser ./...
=== RUN   TestNewParserSkipMetadataReadOnNewParserInit
--- PASS: TestNewParserSkipMetadataReadOnNewParserInit (0.00s)
=== RUN   TestNewParserSkipPixelData
--- PASS: TestNewParserSkipPixelData (0.01s)
=== RUN   TestNewParserPixelData
--- PASS: TestNewParserPixelData (0.62s)
PASS
ok      github.com/suyashkumar/dicom    0.677s
?       github.com/suyashkumar/dicom/cmd/dicomutil      [no test files]
?       github.com/suyashkumar/dicom/mocks/pkg/dicomio  [no test files]
?       github.com/suyashkumar/dicom/pkg/charset        [no test files]
testing: warning: no tests to run
PASS
ok      github.com/suyashkumar/dicom/pkg/dcmtime        0.011s [no tests to run]
?       github.com/suyashkumar/dicom/pkg/debug  [no test files]
?       github.com/suyashkumar/dicom/pkg/dicomio        [no test files]
testing: warning: no tests to run
PASS
ok      github.com/suyashkumar/dicom/pkg/frame  0.003s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/suyashkumar/dicom/pkg/personname     0.017s [no tests to run]
testing: warning: no tests to run
PASS
ok      github.com/suyashkumar/dicom/pkg/tag    0.007s [no tests to run]
?       github.com/suyashkumar/dicom/pkg/uid    [no test files]
?       github.com/suyashkumar/dicom/pkg/vrraw  [no test files]

 - For heavy imaging modalities, it is expensive to load the pixel
   data into memory, and for many use cases, unnecessary. This new
   option allows users to parse the other attributes without the
   resource hit.
@tcheever
Copy link
Author

tcheever commented Jun 3, 2022

Formal commit message aside, thank you for creating this library 😃. It has been very useful!

@suyashkumar suyashkumar self-requested a review August 1, 2022 01:06
@suyashkumar suyashkumar self-assigned this Aug 1, 2022
@suyashkumar suyashkumar added the enhancement New feature or request label Aug 1, 2022
Copy link
Owner

@suyashkumar suyashkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Overall makes sense to have a feature like this (we can link it to the open feature request #162). Had a couple thoughts below for you to consider. There also appear to be some merge conflicts with the latest code on main--if you could resolve those in your latest changes that'd be great (but no worries I can do it later too if needed).

Thanks!

Comment on lines +74 to +75
// SetSkipPixelData sets whether or not the reader will skip pixel data
SetSkipPixelData(bool)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we omit this setter and just add this as an option to NewReader? Does it ever need to change after we init the reader?

Copy link
Author

@tcheever tcheever Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we [...] just add this as an option to NewReader?

Are you suggesting another arg? Would it be optional/variable to maintain backwards-compatibility?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above, largely I don't think this belongs here in the first place. However, if we do keep it in the reader then I don't think we need a setter. We should not worry about backwards compatibility in this package (in /pkg) because it is intended for internal use. It will not go live until we tag a new version with this, and such a change will be included in the changelog. We can further consider using the newer go modules internal/ package namspacing to help with enforcing internal-only use, but by keeping it in pkg it allows those who really want to use it to have access without the compiler complaining. We can clarify this in the package comment as well.

Copy link
Author

@tcheever tcheever Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I should've better clarified my question. I was asking if you're suggesting another arg in the NewReader() constructor in place of the setter. My concern with that was just the backwards incompat change to the constructor, but if you don't mind that, I'll make that change.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, understood, though I think my point in the thread above is I'm not sure if this arg should exist in the dicomio.Reader at all (thus probably doesn't need to be in NewReader either). Please let me know if I'm missing anything. What do you think?

Comment on lines +72 to +73
// SkipPixelData returns whether or not the reader will skip pixel data
SkipPixelData() bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I like the model of adding this option to the dicomio.Reader for the sole purpose of passing it along to where it's needed later in read.go (e.g the dicomio.Reader is just serving to carry this option to readNativeFrames--the dicomio.Reader itself is not concerned with whether SkipPixelData is true or not).

I'll need to think about a better way of doing this (we can leave that to a later refactor), but I wonder if for now we should just pass a skipPixelData bool to readElement from the parsing logic in parse.go and pass it through to readPixelData.

WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I like to avoid constructors that have a long list of fixed position arguments, but I'm happy to remove the setter and add a fourth parameter to the constructor. In the beginning, I probably would've created a Config or Opts struct for the constructor that could be extended over time without adding more arguments to the constructor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although as I'm making the requested change now, I realize this is a backwards-incompat change to NewReader. You probably don't want that? That's another advantage of having an extensible struct arg for the constructor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my more important point is that I don't think this value belongs in dicomio.Reader in the first place, largely because it has nothing to do with any logic the dicomio.Reader is currently responsible for if I understand correctly. e.g my comment:

I wonder if for now we should just pass a skipPixelData bool to readElement from the parsing logic in parse.go and pass it through to readPixelData.

Longer term, it may be better to have something like a readOptions variadic parameters, or just pass along the ParseOption set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought your first comment suggested I add a new arg to NewReader()?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it makes sense for it to be an option in dicomio.Reader because it determines how the reader reads the DICOM data.

I don't think this is true, at least of the dicomio.Reader (which is my whole issue with this approach). The job of the dicomio.Reader is to perform low-level binary read options over the dicom data. The option you've passed into the dicomio.Reader is not used anywhere inside the dicomio.Reader struct or methods.

The option is used in the read* methods, which are distinct and seperate from the dicomio.Reader struct itself. The dicomio.Reader struct is used by the read* methods in read.go for sure.

I guess what I don't like about this is that it seems we are tagging on an extra property to dicomio.Reader (that the dicomio.Reader itself doesn't use/care about) just because the dicomio.Reader struct is passed around to some of the read* methods elsewhere so it's convinient.

Instead, what I think should be done is the read* methods should directly take a set of options, maybe in the form of a specific configuration struct (whose sole role it is to hold a collection of config options) or maybe a set of options similar to ParseOption... if that makes sense. I'd rather not collect these things into dicomio.Reader and just use dicomio.Reader to "hold onto" stuff we want to pass around elsewhere.

What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's an interesting approach in #245. There, the whole Parse Option Set is added to the Dataset. At first it seems a bit similar to this, but what I like about it is that anyone who holds a Dataset struct can quickly see what ParseOptions it was constructed with. This seems useful in of itself as a property of the output Dataset.

It happens to also be useful for passing around the option set to the read* methods, though I still feel something is a bit odd with it. But maybe that's an okay way to go for now, and I'll take another pass at some refactors to option passing later if needed.

WDYT about doing a similar thing here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The dataset should just be the dataset. Parsing options should be the concern of the parser, not the underlying dataset.
  2. Read options should be the concern of the reader.
  3. The reader interface should offer and implement the read function via receivers, much like the various reader interfaces in the stdlib.
  4. When you combine 2 & 3, it all comes together nicely, I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset should just be the dataset. Parsing options should be the concern of the parser, not the underlying dataset.

That is the ethos behind my objection in this comment on this PR, however there is one very very appealing feature in the case of the Dataset. The Dataset is an output, and any caller code handling the Dataset may find it useful to know what parameters (options) it was generated with. But like I said, other than that I still don't love it and would like a different long term solution along the lines of my first comment in this thread.

The reader interface should offer and implement the read function via receivers, much like the various reader interfaces in the stdlib.

I think there's still some conflation of multiple things going on. There's the dicomio.Reader struct, which is responsible for reading dicom primitives from the underlying byte stream. Then there's the separate code in read.go that uses this primitive reader to do much more complicated things (this "primitive" is similar to the "primitive" io.Reader that allows one to read bytes). The primitive reader has no need for nor any concern for the SkipPixelData option you added here. In this PR, it's attached to the dicomio.Reader for the sole purpose of being "passed around" to other functions. This is my concern.

The read.go functions do way more than what a primitive reader might (and note they're unexported). Semantics of "read" and "parse" aside this is what happens today (one may say the functions in read.go are doing "parsing" but imo it's a bit immaterial to this discussion. maybe one day we'll refactor these internal names, but not critical).

As for a short term solution, like I indicated in my prior comment #243 (comment), simply passing through a set of options (maybe in the form of a config struct) to functions in the read.go file seems acceptable to me.

Whether we should attach the read.go functions into a struct to me isn't a big deal either way. It's all an internal impl detail since they're unexported, and the only upside is saving maybe 1-2 parameters in some functions. for now, I'd prefer just adding a config struct that holds onto some of these options.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The read.go functions do way more than what a primitive reader might (and note they're unexported). Semantics of "read" and "parse" aside this is what happens today (one may say the functions in read.go are doing "parsing" but imo it's a bit immaterial to this discussion. maybe one day we'll refactor these internal names, but not critical).

I'd be a proponent of combining the parser and reader into a single interface to simplify everything here. From there, the interface could demand that a Read() function be implemented. The implementer then has free reign on how to read/parse any of the DICOM data, primitives and all. The library could offer a default reader implementation via receiver functions for folks that don't want to create their own reader (probably the large majority of library users).

Having said all of that, in the interest of avoiding massive scope creep for such a small addition, I can move forward with what you prefer. I respect that this is your library.

@@ -166,6 +169,20 @@ func readPixelData(r dicomio.Reader, t tag.Tag, vr string, vl uint32, d *Dataset
return nil, errors.New("the Dataset context cannot be nil in order to read Native PixelData")
}

if r.SkipPixelData() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test for this in read_test.go? It can be a similar pattern to TestReadNativeFrames, just that with the option set we expect there to be no data parsed.

See my comment below, but perhaps we can just test at the readPixelData level (as this behavior should be the same for native and encapsulated pixeldata)

Link:

func TestReadNativeFrames(t *testing.T) {

Comment on lines +174 to +184
return &pixelDataValue{
PixelDataInfo: PixelDataInfo{
Frames: []frame.Frame{
{
Encapsulated: false,
NativeData: frame.NativeFrame{},
},
},
},
}, nil
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider, should we still emit an empty PixelData element? I suppose it may still be useful to know the PixelData element did exist. In this case, it may also be useful to have a marker inside PixelDataInfo that indicates that pixel data was intentionally not parsed.

Consider adding PixelDataInfo.DataIntentionallyNotParsed bool or something.

You may want to handle this behavior directly in readPixelData instead of in readRawItem and readNativePixelData. Feels a bit odd to have readRawItem concerned with this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually knowing the PixelData element did exist is very important in my case. I just don't care about the content.
Thank you both for creating the library and adding this option to optimize performance. In my case 97% of the memory that my application is using stems from the Pixel Data that I don't need.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suyashkumar these changes are already in readPixelData?

@suyashkumar suyashkumar linked an issue Aug 1, 2022 that may be closed by this pull request
@suyashkumar
Copy link
Owner

Hi @tcheever just wanted to check in and see if you had a chance to consider the review comments? Thanks for the contribution! There may be some interplay with #245 as well as a heads up based on how these PRs progress.

@tcheever
Copy link
Author

Hi @suyashkumar, thanks for the bump and the review. I did not see your comments and suggestions. I will get back to you as soon as possible.

@suyashkumar
Copy link
Owner

Hi @tcheever thanks for the responses, added a couple more comments for your consideration!

@tcheever
Copy link
Author

@suyashkumar, if I'm understanding correctly, you do not want the option in the reader at all and you do not want the arg added to NewReader?

@NMerzVerily
Copy link

Having an option like this would also be useful for us.

@rronan
Copy link

rronan commented Feb 6, 2023

We are also interested in that option, I'd be happy to help if needed.

@suyashkumar
Copy link
Owner

suyashkumar commented Feb 20, 2023

Thanks folks for the interest! I've circled back to this in #255, including making structural changes I wanted in #254 to support better option availability to read* methods in read.go (which was the subject of my comments in the review discussion here above).

Thanks @tcheever and others for the discussion and interest in this useful feature!

suyashkumar added a commit that referenced this pull request Feb 20, 2023
This introduces an option to SkipPixelData processing, making use of the new reader struct introduced in #254.

See also, review discussion in #243 that helped motivate the changes I wanted to make in #254.
@suyashkumar
Copy link
Owner

Going to close this for now, since this feature was introduced in #255 with the new options availability (merged to main)! @NMerzVerily @rronan @b3n3d17 @tcheever please feel free to kick the tires on it and let us know how it goes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to not read PixelData
5 participants