-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include an even-higher level interface? #115
Comments
Not sure if you want to go there, but it would also be useful to have an interface for deflating an async input, like: val inflate_pipe : string Async.Pipe.t -> string Async.Pipe.t |
Hi, thanks to try to use This question about an even simpler interface is well know (see #25). I agree with you that documentation is not the best and we definitely need to show a simple example like However, I would not like to propose such function in the package when responsibilities come. Indeed, the triptych memory/ratio/speed remains for any kind of inputs. For me, it's important to let the end-user to control such details.
So, several questions comes and none of them can perfectly fit into any uses. If we provide such function, we take the responsibility of its implementation - and, as I explained, such implementation leads several choices. We strictly knows that these choices can be really bad for some use-cases and to avoid any problems about that, we prefer to respect, at least, the interface provided by The current Another aspect is about time, And about I will try to improve documentation next week and if you want to participate, I will happy to introduce your incomes 👍. |
This comment says there's documentation somewhere on how to write a simple inflate/deflate function: #25 (comment) I think just linking to where that is from this section of the readme would help a lot: https://github.com/mirage/decompress#higher-interface I can understand the difficulty of making tradeoffs here, but it think it's generally fine to just pick something "good enough" for a high level interface and in the documentation/comments explain to people why they might want to use a lower-level version. I can understanding leaving that for a higher level library though, although you might want to mention that people looking for a higher-level interface could try [insert library here] in the docs too. Note: Based on what you said above, I ended up making my high-level interface look like: let inflate_bigstring_to_bigbuffer
?(buffer_size_bytes = 10000)
?(expected_compression_ratio = 3)
input
=
let buffer = Bigstring.create buffer_size_bytes
and output = Bigbuffer.create (Bigstring.length input * expected_compression_ratio) in
let refill s = Bigstring.length s
and flush s n = Bigbuffer.add_bigstring output (Bigstring.sub_shared s ~pos:0 ~len:n) in
Gz.Higher.uncompress ~refill ~flush input buffer |> Result.map ~f:(fun _ -> output)
;;
let inflate_bigstring ?buffer_size_bytes ?expected_compression_ratio input =
inflate_bigstring_to_bigbuffer ?buffer_size_bytes ?expected_compression_ratio input
|> Result.map ~f:Bigbuffer.big_contents
;;
let inflate_string ?buffer_size_bytes ?expected_compression_ratio input =
Bigstring.of_string input
|> inflate_bigstring_to_bigbuffer ?buffer_size_bytes ?expected_compression_ratio
|> Result.map ~f:Bigbuffer.contents
;; I think the biggest downside of this is that it uses Bigbuffer from Core_kernel, but that lets it be fairly efficient except for a copy at the end to get the contents. |
and flush s n = Bigbuffer.add_bigstring output (Bigstring.sub_shared s ~pos:0 ~len:n) in I'm not sure about Then, I added a documentation, add examples and clear some aspects of |
So I think what I was doing make sense (I'm trying to let the function do a copy of the substring without doing an additional copy). Thanks for adding more documentation to this! |
Thanks for the feedback 👍. |
CHANGES: - Add a well-know limitation about the encoding on the documentation, the output buffer must be upper than 2 bytes in any cases. (@dinosaure, mirage/decompress#114) - Improve the documentation (@dinosaure, @brendanlong, mirage/decompress#115) - **breaking changes**, the type of the window used to deflate according RFC 1951 was updated to `De.Lz77.window` (@dinosaure, mirage/decompress#116 & mirage/decompress#115) - Fix a bug when we want to add the EOB _op-code_ into the queue. The deflation takes care about that. Note that the queue must be larger than 2. (@dinosaure, @kluvin, mirage/decompress#117) - Improve the documentation a bit (@mseri, @dinosaure, mirage/decompress#119) - Fix the use of `optint` and how we handle large files with `Gz`. Fix an error when we encode the `isize` into the deflated stream. (@dinosaure, @igarnier, mirage/decompress#121 & mirage/decompress#120) - Add a _non-stream_ implementation (@clecat, @dinosaure, mirage/decompress#102 & mirage/decompress#92) The non-blocking stream API has a cost to maintain a _state_ across _syscall_ such as `read` and `write`. It's useful when we want to plug `decompress` behind something like a `socket` and care about memory-consumption but it has a big cost when we want to compress/decompress an object saved into one and unique buffer. The _non-stream_ API gives an opportunity to inflate/deflate one and unique buffer without the usual plumbing required by the non-blocking stream API. However, we are limited to compute only objects which can fit into a `bigarray`. About performance, the non-stream API is better than the non-blocking stream API. See the PR for more details about performances. On the `book2` (from the Calgary corpus) file: - `decompress` (stream): 15 Mb/s (deflation), 76 Mb/s (inflation), ratio: 42.46 % - `decompress` (non-stream): 17 Mb/s (deflation), 105 Mb/s (inflation), ratio: 34.66 % Even if we checked the implementation with our tests (we ran `ocaml-git` and `irmin` with this path), the implementation is young and we probably miss some details/bugs. So we advise the user to compare, at least, the non-stream implementation with the non-blocking stream implementation if something is wrong.
I had some trouble using this library since the higher-level interface is undocumented and still too-low-level for what I'm trying to do (decompress some data that I have in-memory).
I eventually came up with:
Would it make sense to include something like this in the code, or at least give an example in the documentation?
The text was updated successfully, but these errors were encountered: