Use cramjam to provide compression utilities #1089

Closed
nsmith- opened this issue Jan 19, 2024 · 6 comments · Fixed by #1090
Labels
feature New feature or request

Comments

@nsmith-
Collaborator

nsmith- commented Jan 19, 2024

@lgray mentioned https://pypi.org/project/cramjam/ to me, and it looks like a nice way to provide many of the ROOT compression algorithms in a single dependency-free package. Additionally, it lets you declare the output length so it can pre-allocate the buffer, which may provide some speedup for algorithms other than lz4, which is the only one currently using the uncompressed size hint:

return lz4_block.decompress(data, uncompressed_size=uncompressed_bytes)

This is an internal change and would not provide any user-facing enhancement other than a potential speed-up.
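
To illustrate the size-hint idea, here is a minimal sketch that uses the standard library's zlib as a stand-in (it is not cramjam and not Uproot's actual code). `zlib.decompress` takes a `bufsize` argument that sets the initial size of its output buffer, so passing the known uncompressed size lets it allocate once instead of growing the buffer. The helper name `decompress_with_hint` is hypothetical.

```python
import zlib


def decompress_with_hint(data: bytes, uncompressed_bytes: int) -> bytes:
    """Decompress zlib data, using the known output size as a buffer hint.

    Hypothetical helper: `bufsize` is only the *initial* output buffer size,
    so a wrong hint costs speed but does not change the result.
    """
    # Guard against a zero or negative hint by falling back to 1 byte.
    return zlib.decompress(data, bufsize=max(1, uncompressed_bytes))


if __name__ == "__main__":
    payload = b"uproot" * 1000
    compressed = zlib.compress(payload)
    restored = decompress_with_hint(compressed, len(payload))
    print(len(compressed), len(restored), restored == payload)
```

Whether the hint gives a measurable speedup depends on the algorithm and chunk size, so it would need benchmarking.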

@nsmith- nsmith- added the feature New feature or request label Jan 19, 2024
@lgray
Contributor

lgray commented Jan 19, 2024

Just to finish the chain: this was mentioned to me by @martindurant.

@martindurant

it lets you declare the output length so it can pre-allocate the buffer

You can also allocate buffers yourself and use decompress_into; I don't know if there's a use case for that.

@nsmith-
Collaborator Author

nsmith- commented Jan 19, 2024

Actually, yes! One thing uproot is often doing is decompressing many small chunks and then concatenating them into a larger contiguous buffer. We could save some additional allocation and copy time if we can decompress into a buffer at an arbitrary offset.
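
A rough sketch of the two patterns, assuming the uncompressed size of every chunk is known in advance. It uses the standard library's zlib as a stand-in, so each chunk still goes through one temporary `bytes` object; a true decompress-into call would write straight into the slice and skip that copy as well. The function names are hypothetical and this is not Uproot's implementation.

```python
import zlib


def concat_chunks(compressed_chunks):
    """Baseline: decompress each chunk, then concatenate (extra allocation and copy)."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed_chunks)


def decompress_to_offsets(compressed_chunks, uncompressed_sizes):
    """Allocate the final buffer once and place each chunk at its running offset."""
    out = bytearray(sum(uncompressed_sizes))
    with memoryview(out) as view:
        offset = 0
        for chunk, size in zip(compressed_chunks, uncompressed_sizes):
            # Slicing a memoryview does not copy; the assignment writes into `out`.
            # A size mismatch raises ValueError instead of silently corrupting data.
            view[offset:offset + size] = zlib.decompress(chunk, bufsize=max(1, size))
            offset += size
    return out


if __name__ == "__main__":
    parts = [b"a" * 10, b"bc" * 25, b"xyz" * 7]
    chunks = [zlib.compress(p) for p in parts]
    sizes = [len(p) for p in parts]
    print(bytes(decompress_to_offsets(chunks, sizes)) == concat_chunks(chunks))
```

The offsets are just the running sum of the uncompressed sizes, which is why the approach needs those sizes before any chunk is decompressed.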

@martindurant

if we can decompress into a buffer at an arbitrary offset.

Yes, you certainly can, I think just by slicing the base NumPy array.

@jpivarski
Member

Since we want Uproot to work in Pyodide, it's important to note that cramjam works in Pyodide.

Writing into a single, contiguous buffer with decompress_into would require some rearchitecting—possible, but a major project. Also, it could only work for non-ragged data (or only the outer indexes of ragged data). It could perhaps be an extension of uproot.AsDtypeInPlace.

@jpivarski
Member

I've split out the request to use cramjam's decompress-in-place into a new issue; this will be closed when #1090 is.

4 participants