Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster binary encoding/decoding #1030

Merged
merged 4 commits into from Jul 15, 2020
Merged

Conversation

samoht
Copy link
Member

@samoht samoht commented Jul 10, 2020

/cc @let-def

@samoht
Copy link
Member Author

samoht commented Jul 11, 2020

I've removed the confusing headers option and added an explicit Unboxed sub-module for unboxed binary operations. I'll merge once the CI is green.

@craigfe
Copy link
Member

craigfe commented Jul 11, 2020

I can review on Monday, if needed

Copy link
Member

@craigfe craigfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Reviews are mostly nits. I think the staging + explicit Unboxed APIs are much nicer to use (also has the advantage that we no longer rely on optional argument erasure in a way that breaks ppx_bisect).

CHANGES.md Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
Comment on lines 405 to 425
(* OK: ASSERT node to_bin_string:
"
\000\000\000\000\000\000\000\002
\000\000\000\000\000\000\000\003foo
\255\000\000\000\000\000\000\000
\020\005\254@WS\022o\018UY\231\201\172U\134T\241\007\199\233
\000\000\000\000\000\000\000\000\000\003bar
\000\000
\255\000\000\000\000\000\000\000
\020:\245\172\213`vT\179\167\031\015\158\018\195\026\023i\025s&"

KO: ASSERT node to_bin_string:
"
\000\000\000\000\000\000\000\002
\000\000\000\000\000\000\000\003foo
\255\000\000\000\000\000\000\000
\020\2189\163\238^kK\r2U\191\239\149`\024\144\175\216\007\t
\000\000\000\000\000\000\000\000\000\003bar
\000\000
\255\000\000\000\000\000\000\000
\020b\205\183\002\015\249\229\170d,=@f\149\r\209\240\031M" *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental comment slipped in here?

@@ -70,7 +111,8 @@ let ok x = Alcotest.result x error
let test_json () =
let s = T.to_json_string hex "foo" in
Alcotest.(check string) "JSON hex" "\"666f6f\"" s;
let s = T.to_bin_string hex "foo" in
let s = to_bin_string hex "foo" in
Fmt.epr "XXX %S\n%!" s;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a stray debug statement.

@@ -446,37 +460,49 @@ val size_of : 'a t -> 'a size_of
(** [size_of t x] is either the size of [encode_bin t x] or the binary encoding
of [x], if the backend is not able to pre-compute serialisation lengths. *)

module Unboxed : sig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps some documentation here explaining the optimisation being performed.

if ofs = 0 && len = String.length buf then ok len (Bytes.of_string buf)
let mk_boxed of_string of_bytes n =
let sub len buf ofs =
if ofs = 0 && len = String.length buf then (len, of_string buf)
else
let str = Bytes.create len in
String.blit buf ofs str 0 len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but it might be worth failing more gracefully here in the case len > String.length buf (which corresponds to trying to read a header that doesn't exist / malformed length header). That's quite easy to enounter as a user of Type.Unboxed.

src/irmin/type/type_binary.ml Outdated Show resolved Hide resolved
Comment on lines +53 to +55
let unboxed_string = function
| `Fixed len -> fun _ -> len (* fixed-size strings are never boxed *)
| _ -> String.length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but I think it's unintuitive that Type.(string_of (`Fixed 8))` and `Type.(string_of `Int64) are not the same w.r.t. this function (since the corresponding string doesn't need to be of the correct length). I think we should make a consistent choice here, but we could also just say that it's undefined.

Comment on lines 57 to 68
let check msg ty =
let msg f = Fmt.strf "%s: %s" msg f in
let buf = with_buf (encode_bin ty foo) in
Alcotest.(check string) (msg "boxed") buf "\003foo";
let buf = with_buf (Unboxed.encode_bin ty foo) in
Alcotest.(check string) (msg "unboxed") buf "foo";
let buf = with_buf (Unboxed.encode_bin (T.boxed ty) foo) in
Alcotest.(check string) (msg "force boxed") buf "\003foo"
in
check "string" T.string;
check "like" (T.like T.string);
check "map" (T.map T.string id id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just duplicate these tests for T.bytes?

@samoht
Copy link
Member Author

samoht commented Jul 13, 2020

I've tried to take all your comments into account (but the 2 about the len which can be addressed later -- as I am not sure how to address these properly)

- add staging to all binary encoding/decoding operations
- replace the confusing ?headers argument by a specialized Type.Unboxed module
- pre-alocate closures in generic operation as early as possible, whenever the
  type is known
Copy link
Member

@craigfe craigfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge when ready.

I'll create a separate issue about the length issues.

@samoht
Copy link
Member Author

samoht commented Jul 15, 2020

CI restarted...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants