Remove some String to/from Bytes conversion to behave better with jsoo #13543
This looks okay to me from a distance. (The |
@gasche, I completed the PR, added a proper description, and undrafted it.
If I understand things correctly, one way to reason about performance for js_of_ocaml is to consider that the Besides the cases that you have already noticed, I think that there are |
It's currently a bit more complicated but could become exactly that if we merge ocsigen/js_of_ocaml#1229.
gasche
left a comment
I am okay with the PR, but I would prefer if you could get rid of the code duplication in the C code. (Unless I am missing something, this should be very easy.)
This is good to go; would you like to rebase to squash/fixup the relevant commits?
This is done.
Merged, thanks! Does it make a difference for you whether we cherry-pick in 5.3 (in which case I need to ask our glorious release manager)?
It could help with merging ocsigen/js_of_ocaml#1229 sooner, but it's not a big deal if it doesn't make it.
@Octachron let us know if you agree with backporting this in 5.3. This is a performance bugfix that only affects js_of_ocaml. It changes the set of runtime primitives (it exposes bytes-specific variants of |
As a well-scoped performance bug fix, this looks OK to integrate at this point of the release cycle.
Remove some String to/from Bytes conversion to behave better with jsoo (cherry picked from commit c256a92)
If we ever make |
I think that we are not opposed to such changes in principle, as long as the maintenance costs are reasonable. The liberal use of those unsafe conversions in the current codebase comes from a least-effort approach when we did the split.
Context
Js_of_ocaml uses different memory representations for bytes and strings. It leverages JavaScript's immutable strings to implement OCaml ones (initial change in ocsigen/js_of_ocaml#923, enabled by default in ocsigen/js_of_ocaml#976).
This means that Bytes.unsafe_to_string and Bytes.unsafe_of_string are no longer the identity.
The additional cost is usually acceptable because these conversions often sit next to computations that are linear in the size of the converted argument. However, some use cases convert large strings/bytes but only look at a small substring/subbytes.
This issue was spotted by @vouillon in ocsigen/js_of_ocaml#1703.
The Yojson parser/lexer uses a helper that currently ends up converting the whole lexbuf to a string over and over again.
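To make the cost pattern concrete, here is a minimal sketch. It is not the Yojson helper; the function names slice_via_string and slice_direct are hypothetical and exist only for illustration. Both functions return the same result, but under js_of_ocaml the first one pays for a conversion of the entire buffer on every call, while the second only copies the requested slice.

```ocaml
(* Hypothetical helpers, not the actual Yojson code: each one reads a
   small slice out of a potentially large bytes buffer. *)

(* Costly under js_of_ocaml: the whole buffer is converted to a string
   before a small slice is taken. In native code the conversion is free. *)
let slice_via_string (buf : Bytes.t) (pos : int) (len : int) : string =
  let s = Bytes.unsafe_to_string buf in
  String.sub s pos len

(* Cheaper: stay on bytes and copy only the requested slice. *)
let slice_direct (buf : Bytes.t) (pos : int) (len : int) : string =
  Bytes.sub_string buf pos len

let () =
  let buf = Bytes.of_string "{\"key\": 12345}" in
  assert (slice_via_string buf 8 5 = "12345");
  assert (slice_direct buf 8 5 = "12345")
```

When such a helper is called once per token, the first variant makes lexing cost grow with the buffer size times the number of tokens, which is the kind of usage this PR looks for in the stdlib.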
This PR
This PR tries to find and fix such problematic usages in the stdlib.
Buffer.add_subbytes no longer relies on the string implementation,
and as a result Buffer.add_bytes no longer performs a conversion either.
Digest.(sub)?bytes and Digest.MD5.(sub)?bytes no longer rely on the string implementation.
In addition, useless conversions were removed when hashing the content of a channel.
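The functions listed above keep their public signatures, so callers should see no behavioural difference. The following sketch only exercises the user-facing API on a small slice of a large bytes value, which is the situation where an intermediate whole-value conversion would be wasteful under js_of_ocaml. The buffer size and contents are arbitrary assumptions.

```ocaml
let () =
  (* A large bytes value of which only five bytes are of interest. *)
  let big = Bytes.make 1_000_000 'x' in
  Bytes.blit_string "hello" 0 big 10 5;

  (* Append only the slice [10, 15) to a buffer. *)
  let b = Buffer.create 16 in
  Buffer.add_subbytes b big 10 5;
  assert (Buffer.contents b = "hello");

  (* Hash only the slice; the result matches hashing the same text
     given as a string. *)
  let d1 = Digest.subbytes big 10 5 in
  let d2 = Digest.string "hello" in
  assert (Digest.equal d1 d2)
```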
The PR also contains another kind of change, which has to do with "fixing" the fast path of String.escaped with jsoo. String.escaped now avoids a conversion from bytes to string if the optimisation/fast path in Bytes.unsafe_escape triggers. This change could be dropped from this PR if it's problematic.
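The idea behind such a fast path can be sketched as follows. This is not the stdlib implementation: needs_escape and escaped_fast are hypothetical names, and the sketch works at the string level rather than through Bytes.unsafe_escape. It only illustrates that when no character needs escaping, the input string can be returned as-is, with no round trip through bytes.

```ocaml
(* Hypothetical sketch, not the stdlib code. *)

(* True for characters that String.escaped rewrites: the double quote,
   the backslash, a few control characters, and anything outside the
   printable ASCII range. *)
let needs_escape (c : char) : bool =
  match c with
  | '"' | '\\' | '\n' | '\t' | '\r' | '\b' -> true
  | ' ' .. '~' -> false
  | _ -> true

(* Return the input unchanged when nothing needs escaping; otherwise
   fall back to the regular implementation. *)
let escaped_fast (s : string) : string =
  if String.exists needs_escape s then String.escaped s else s

let () =
  let plain = "plain text" in
  (* Fast path: the very same string value is returned. *)
  assert (escaped_fast plain == plain);
  (* Slow path: agrees with String.escaped. *)
  assert (escaped_fast "a\nb" = "a\\nb")
```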