Reimplement snappy-framed #138

Open
nemequ opened this issue Oct 10, 2015 · 5 comments

Comments

@nemequ
Member

nemequ commented Oct 10, 2015

The current snappy-framed parser is fragile, so I'm disabling it. I threw it together quickly without putting much effort into making it robust. It would be pretty easy to do it using the new splicing interface, though it would require at least one memcpy. Another option might be to use Ragel (which is already required for the ini stuff, so it doesn't add a dependency).

@AlaskaJoslin

Hi, I really like this library so far, but I want to use streaming for large files (larger than memory). Although I can't find explicit documentation saying so (maybe this is what #180 is for?), most of the plugins seem to try to fit the whole compressed file in memory even in streaming mode (even when calling flush). From the benchmarks it looks like snappy-framed is a perfect fit for these large files, so I'm considering fixing the plugin and at least using it locally. Is the current version of snappy-framed broken, does it not handle files from other versions well, or is it just stylistically bad code?

@nemequ
Member Author

nemequ commented Jul 17, 2019

I think most of the plugins actually support streaming, but you're right that snappy isn't one of them.

If memory serves, snappy-framed was just broken and couldn't safely handle invalid/malicious files.
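For anyone picking this up, here is a minimal sketch of what bounds-checked chunk-header parsing for the Snappy framing format might look like. It is not the existing plugin code: `parse_chunk`, `Chunk`, and `ParseStatus` are hypothetical names, and CRC verification and decompression are left out. The point is only that every length is validated against the available input before anything is dereferenced.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Chunk type bytes defined by the Snappy framing format. */
#define CHUNK_COMPRESSED   0x00u
#define CHUNK_UNCOMPRESSED 0x01u
#define CHUNK_STREAM_ID    0xffu

#define HEADER_LEN       4u      /* 1 type byte + 3-byte little-endian length */
#define CRC_LEN          4u      /* masked CRC-32C that prefixes data chunks */
#define MAX_UNCOMPRESSED 65536u  /* per-chunk limit on uncompressed data */

/* Hypothetical result codes for this sketch. */
typedef enum { PARSE_OK, PARSE_NEED_MORE, PARSE_INVALID } ParseStatus;

/* Hypothetical description of one parsed chunk. */
typedef struct {
  uint8_t        type;
  const uint8_t *body;       /* points into the caller's buffer */
  uint32_t       body_len;
  size_t         total_len;  /* header + body: bytes the caller should consume */
} Chunk;

ParseStatus parse_chunk(const uint8_t *in, size_t in_len, Chunk *out)
{
  if (in_len < HEADER_LEN)
    return PARSE_NEED_MORE;

  const uint8_t  type = in[0];
  const uint32_t len  = (uint32_t) in[1]
                      | ((uint32_t) in[2] << 8)
                      | ((uint32_t) in[3] << 16);

  /* 0x02..0x7f are reserved unskippable chunks: a decoder must stop. */
  if (type >= 0x02u && type <= 0x7fu)
    return PARSE_INVALID;

  if (type == CHUNK_COMPRESSED || type == CHUNK_UNCOMPRESSED) {
    /* Data chunks must at least contain the checksum. */
    if (len < CRC_LEN)
      return PARSE_INVALID;
    if (type == CHUNK_UNCOMPRESSED && len - CRC_LEN > MAX_UNCOMPRESSED)
      return PARSE_INVALID;
  }

  if (type == CHUNK_STREAM_ID && len != 6u)
    return PARSE_INVALID;

  /* Compare lengths rather than pointers, so nothing can wrap around. */
  if (in_len - HEADER_LEN < len)
    return PARSE_NEED_MORE;

  if (type == CHUNK_STREAM_ID && memcmp(in + HEADER_LEN, "sNaPpY", 6) != 0)
    return PARSE_INVALID;

  out->type      = type;
  out->body      = in + HEADER_LEN;
  out->body_len  = len;
  out->total_len = (size_t) HEADER_LEN + len;
  return PARSE_OK;
}
```

A caller would loop, feeding `parse_chunk` whatever input it has buffered: `PARSE_NEED_MORE` means wait for more bytes, and `PARSE_INVALID` means abort the stream rather than guess.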

Instead of reimplementing snappy-framed, you might want to consider adding streaming support to LZ4. IIRC LZ4 didn't support streaming when I wrote the plugin, but it does now… the API is a bit different so implementing it might require some creativity, but you should get better performance than snappy-framed.

@nemequ
Member Author

nemequ commented Jul 17, 2019

Sorry, LZ4 does support streaming; it's the lz4-raw codec that doesn't.

@AlaskaJoslin

So my main goal is to improve the memory usage. For instance, with Brotli I get very large heaps even with flushing and streaming. I can see from the benchmark that only about a dozen plugins support streaming and flushing (mostly zlib and zlib-ng, right?). If I can lower the memory usage for large files and get decent r/w speeds and tolerable compression with an existing plugin, then I will probably go that route.

As far as LZ4 streaming goes, I'm thinking an approach similar to this might offer good performance.

@nemequ
Member Author

nemequ commented Jul 17, 2019

In master, LZ4 supports streaming and flushing. So does Brotli. That table reflects the version used for the benchmark; support in git is better. You can query each codec with something like:

/* Combine the flags with |; using & would yield 0 and make the check always pass. */
const SquashCodecInfo required_caps =
    (SquashCodecInfo) (SQUASH_CODEC_INFO_NATIVE_STREAMING | SQUASH_CODEC_INFO_CAN_FLUSH);
bool streaming_and_flushing =
    ((squash_codec_get_info(codec) & required_caps) == required_caps);

It's also worth noting that, depending on how you define memory usage, streaming may actually increase it. A streaming codec will often create a large internal buffer to hold (potential) matches as it scans, whereas a buffer-to-buffer codec can just refer to the actual input data, since it has a complete copy. Then, if you use memory-mapped buffers for your input and output, the OS will transparently handle swapping stuff into and out of memory. Of course, if you're working with large files a 64-bit pointer size is pretty much required, but that's unlikely to be a problem these days.
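As a rough illustration of the memory-mapped approach, here is a sketch using POSIX `mmap`. Everything named here is an assumption for the example rather than Squash API: `stand_in_codec` is a hypothetical placeholder that just copies bytes where a real buffer-to-buffer compress call would go, and `map_file` and `transform_via_mmap` are illustrative helpers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for ftruncate() under strict C modes */
#endif

#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical placeholder for a buffer-to-buffer codec: it only copies. */
static size_t stand_in_codec(void *out, const void *in, size_t in_len)
{
  memcpy(out, in, in_len);
  return in_len;
}

/* Map a whole file. For reading, *len receives the file size; for writing,
 * the file is created and resized to *len first. Returns NULL on failure
 * (including empty files, which cannot be mapped). */
void *map_file(const char *path, size_t *len, int writable)
{
  int fd = open(path, writable ? (O_RDWR | O_CREAT | O_TRUNC) : O_RDONLY, 0644);
  if (fd < 0)
    return NULL;

  if (writable) {
    if (ftruncate(fd, (off_t) *len) != 0) { close(fd); return NULL; }
  } else {
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return NULL; }
    *len = (size_t) st.st_size;
  }

  void *p = mmap(NULL, *len, writable ? (PROT_READ | PROT_WRITE) : PROT_READ,
                 MAP_SHARED, fd, 0);
  close(fd);  /* the mapping remains valid after the descriptor is closed */
  return (p == MAP_FAILED) ? NULL : p;
}

/* Run the stand-in codec from one mapped file to another. Returns 0 on success. */
int transform_via_mmap(const char *src, const char *dst)
{
  size_t in_len = 0;
  void *in = map_file(src, &in_len, 0);
  if (in == NULL)
    return -1;

  size_t out_len = in_len;  /* a real codec would need its worst-case output size */
  void *out = map_file(dst, &out_len, 1);
  if (out == NULL) { munmap(in, in_len); return -1; }

  stand_in_codec(out, in, in_len);

  int rc = msync(out, out_len, MS_SYNC);  /* flush dirty pages to the file */
  munmap(out, out_len);
  munmap(in, in_len);
  return rc;
}
```

With a real compressor the output size isn't known up front, so you would map the codec's worst-case compressed size and truncate the file to the actual size afterwards. The process never holds more in RAM than the OS decides to keep paged in.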

Also, to be clear, in git the names for LZ4 have changed (based on talks with the developer of LZ4). lz4f and lz4 have been renamed to lz4 and lz4-raw, respectively, to try to push people to use the framed codec, which is compatible with the lz4 CLI tool. IIRC when I initially wrote the lz4 plugin the framed codec didn't exist yet.
