Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting object file Reader will load content of even large blobs into memory #104

Closed
lafriks opened this issue Jun 19, 2020 · 12 comments
Closed
Labels
help wanted Extra attention is needed no-autoclose Issues/PRs to be ignored by stale bot performance

Comments

@lafriks
Copy link

lafriks commented Jun 19, 2020

When calling Reader method of object.File it will read all content into memory that can easily drain memory. Would be nice if there was option that it would return wrapper to FS reader limited to specific object pos & length

Ref:

obj, err := p.getNextMemoryObject(h)

@rgalanakis
Copy link

Was coming here to report the same issue... The blob Reader is a stream so it was disappointing to see that it loads the entire blob into memory.

@zeripath
Copy link
Contributor

in storage/filesystem/object.go I would guess it would be sensible to do a size check of the object that is being decoded and if it is greater than some reasonable limit the reader should not just read the object into memory but instead has a slightly more complex reader.

Now the question is how big should we reasonably be able to read in directly?

@rgalanakis
Copy link

My example use case was sniffing the first few hundred bytes to do content type sniffing, so anything more than that would be wasteful. If there needs to be a cutoff, it seems best to give callers some option to specify the limit. Why not always stream it/never load it into memory? (I can't remember the implementation of this code so I suspect I'm missing a reason)

@lafriks
Copy link
Author

lafriks commented Nov 12, 2020

@zeripath it could just return custom reader implementation that is limited by specific positions, this way it would not need to read anything into memory and would not need any limitations

@zeripath
Copy link
Contributor

My suspicion is that for very small blobs that actually could use more memory.

The majority of items in a git repo are probably less than 4kb which will fit in most default buffers and probably be already read completely in to memory by the time you've found them in a pack file.

It's really a question of at what point is it worth switching to a more complex mechanism.

@lafriks
Copy link
Author

lafriks commented Nov 13, 2020

depending on size it could fall back to different readers - memory or custom reader implementation from pack file

Copy link

github-actions bot commented Nov 5, 2023

To help us keep things tidy and focus on the active tasks, we've introduced a stale bot to spot issues/PRs that haven't had any activity in a while.

This particular issue hasn't had any updates or activity in the past 90 days, so it's been labeled as 'stale'. If it remains inactive for the next 30 days, it'll be automatically closed.

We understand everyone's busy, but if this issue is still important to you, please feel free to add a comment or make an update to keep it active.

Thanks for your understanding and cooperation!

@github-actions github-actions bot added the stale Issues/PRs that are marked for closure due to inactivity label Nov 5, 2023
@rgalanakis
Copy link

This can be a significant memory and performance issue for users (and because we expect most things to be streams, is unexpected). If it's too cumbersome to fix then close this issue but it's still relevant and important.

@lafriks
Copy link
Author

lafriks commented Nov 5, 2023

This is still an problem

@github-actions github-actions bot removed the stale Issues/PRs that are marked for closure due to inactivity label Nov 6, 2023
@pjbgf
Copy link
Member

pjbgf commented Nov 6, 2023

We should be able to resolve this for seekable storage. Around similar lines to what #799 proposes.

@pjbgf pjbgf added help wanted Extra attention is needed no-autoclose Issues/PRs to be ignored by stale bot performance labels Nov 6, 2023
@zeripath
Copy link
Contributor

This should have mostly been ameliorated by #330

@pjbgf
Copy link
Member

pjbgf commented Mar 8, 2024

Closing due to both #799 and #330 being merged and for lack of activity in this issue.

@pjbgf pjbgf closed this as completed Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed no-autoclose Issues/PRs to be ignored by stale bot performance
Projects
None yet
Development

No branches or pull requests

4 participants