New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster binary encoding/decoding #1030
Conversation
4a7f75a
to
ba5675f
Compare
I've removed the confusing |
I can review on Monday, if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Reviews are mostly nits. I think the staging + explicit Unboxed
APIs are much nicer to use (also has the advantage that we no longer rely on optional argument erasure in a way that breaks ppx_bisect
).
src/irmin/node.ml
Outdated
(* OK: ASSERT node to_bin_string: | ||
" | ||
\000\000\000\000\000\000\000\002 | ||
\000\000\000\000\000\000\000\003foo | ||
\255\000\000\000\000\000\000\000 | ||
\020\005\254@WS\022o\018UY\231\201\172U\134T\241\007\199\233 | ||
\000\000\000\000\000\000\000\000\000\003bar | ||
\000\000 | ||
\255\000\000\000\000\000\000\000 | ||
\020:\245\172\213`vT\179\167\031\015\158\018\195\026\023i\025s&" | ||
|
||
KO: ASSERT node to_bin_string: | ||
" | ||
\000\000\000\000\000\000\000\002 | ||
\000\000\000\000\000\000\000\003foo | ||
\255\000\000\000\000\000\000\000 | ||
\020\2189\163\238^kK\r2U\191\239\149`\024\144\175\216\007\t | ||
\000\000\000\000\000\000\000\000\000\003bar | ||
\000\000 | ||
\255\000\000\000\000\000\000\000 | ||
\020b\205\183\002\015\249\229\170d,=@f\149\r\209\240\031M" *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental comment slipped in here?
test/irmin/test_type.ml
Outdated
@@ -70,7 +111,8 @@ let ok x = Alcotest.result x error | |||
let test_json () = | |||
let s = T.to_json_string hex "foo" in | |||
Alcotest.(check string) "JSON hex" "\"666f6f\"" s; | |||
let s = T.to_bin_string hex "foo" in | |||
let s = to_bin_string hex "foo" in | |||
Fmt.epr "XXX %S\n%!" s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a stray debug statement.
@@ -446,37 +460,49 @@ val size_of : 'a t -> 'a size_of | |||
(** [size_of t x] is either the size of [encode_bin t x] or the binary encoding | |||
of [x], if the backend is not able to pre-compute serialisation lengths. *) | |||
|
|||
module Unboxed : sig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps some documentation here explaining the optimisation being performed.
if ofs = 0 && len = String.length buf then ok len (Bytes.of_string buf) | ||
let mk_boxed of_string of_bytes n = | ||
let sub len buf ofs = | ||
if ofs = 0 && len = String.length buf then (len, of_string buf) | ||
else | ||
let str = Bytes.create len in | ||
String.blit buf ofs str 0 len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but it might be worth failing more gracefully here in the case len > String.length buf
(which corresponds to trying to read a header that doesn't exist / malformed length header). That's quite easy to enounter as a user of Type.Unboxed
.
let unboxed_string = function | ||
| `Fixed len -> fun _ -> len (* fixed-size strings are never boxed *) | ||
| _ -> String.length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but I think it's unintuitive that Type.(string_of (`Fixed 8))` and `Type.(string_of `Int64)
are not the same w.r.t. this function (since the corresponding string doesn't need to be of the correct length). I think we should make a consistent choice here, but we could also just say that it's undefined.
test/irmin/test_type.ml
Outdated
let check msg ty = | ||
let msg f = Fmt.strf "%s: %s" msg f in | ||
let buf = with_buf (encode_bin ty foo) in | ||
Alcotest.(check string) (msg "boxed") buf "\003foo"; | ||
let buf = with_buf (Unboxed.encode_bin ty foo) in | ||
Alcotest.(check string) (msg "unboxed") buf "foo"; | ||
let buf = with_buf (Unboxed.encode_bin (T.boxed ty) foo) in | ||
Alcotest.(check string) (msg "force boxed") buf "\003foo" | ||
in | ||
check "string" T.string; | ||
check "like" (T.like T.string); | ||
check "map" (T.map T.string id id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just duplicate these tests for T.bytes
?
35d864e
to
8190345
Compare
I've tried to take all your comments into account (but the 2 about the |
- add staging to all binary encoding/decoding operations - replace the confusing ?headers argument by a specialized Type.Unboxed module - pre-alocate closures in generic operation as early as possible, whenever the type is known
8190345
to
41b471c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge when ready.
I'll create a separate issue about the length issues.
CI restarted... |
/cc @let-def