-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
irmin-pack: clarify uses of Bytes.unsafe_{to,of}_string #1970
Conversation
For each occurrence of Bytes.unsafe_to_string, I have commented on whether or not the use is safe. Some are, some are not, and for some it is not clear whether they are safe or not.
Each occurrence of unsafe_to_string in irmin-pack has been marked with a comment as to whether the use is safe or not.
Comments use the terminology from the manual ("unique ownership" and "shared ownership").
Codecov Report
@@ Coverage Diff @@
## main #1970 +/- ##
==========================================
- Coverage 63.97% 63.93% -0.04%
==========================================
Files 129 129
Lines 15479 15480 +1
==========================================
- Hits 9902 9897 -5
- Misses 5577 5583 +6
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! Left comments/suggestions.
My only other suggestion is to squash your commits into logical change(s). I could see it either being one commit for everything or one for comments and one for code changes. I like a tidy git history whenever possible. 😄
@@ -33,6 +33,11 @@ module Make_persistent (K : Irmin.Type.S) (V : Value.S) = struct | |||
assert (n = 4); | |||
(file_pos := Int63.Syntax.(!file_pos + Int63.of_int 4)); | |||
let pos_ref = ref 0 in | |||
(* Bytes.unsafe_to_string usage: We assume Io_legacy.read_block returns unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation to Io_legacy.read
@@ -179,6 +179,10 @@ module Make (Args : Args) : S with module Args := Args = struct | |||
read_exn ~off ~len buffer; | |||
let poff = Dispatcher.poff_of_entry_exn ~off ~len mapping in | |||
Bytes.set buffer Hash.hash_size magic_parent; | |||
(* Bytes.unsafe_to_string usage: We assume read_exn returns unique ownership of buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation to read_exn
(* really_write and following do not mutate the given buffer, so | ||
Bytes.unsafe_of_string is actually safe *) | ||
(* Bytes.unsafe_of_string usage: s has shared ownership; we assume that | ||
Util.really_write does not mutate buf (i.e., only needs shared ownership). This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document Util.really_write
@@ -195,6 +204,10 @@ struct | |||
let found = Dispatcher.read_if_not_gced t.dispatcher ~off ~len buf in | |||
if (not found) || gced buf then None | |||
else | |||
(* Bytes.unsafe_to_string usafe: buf is create in this function, uniquely owned; we | |||
assume Dispatcher.read_if_not_gced returns unique ownership; then call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document Dispatcher.read_if_not_gced
Bytes.unsafe_to_string buf | ||
|
||
let decode s pos : t = | ||
(* Bytes.unsafe_of_string usage: s is shared; buf is shared (we cannot mutate it); | ||
we assume Bytes.get_... functions need shared ownership only. This usage is | ||
safe. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to another comment. The lifecycle of the buffer seems important to the safety here.
@@ -253,6 +253,9 @@ end = struct | |||
else | |||
let buffer_off, off, missing_hash = | |||
match | |||
(* Bytes.unsafe_to_string usage: possibly safe, depending on details of | |||
implementation of decode_entry_exn TODO either justify clearly that this is | |||
safe, or change to use safe Bytes.to_string *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decode_entry_exn
is likely safe but it would take some work to verify all the decode functions are. (I think it is highly unlikely that they modify the string)
I agree we should squash these commits into one before merging! |
@tomjridge I opened an issue to document the functions that are being called with a shared buffer so we don't hold up this PR. |
Thanks! Squashed and merged! |
We have had serious bugs in the past related to our use of Bytes.unsafe functions (see eg #1814, #1815).
In #1967 it was found that recent code made use of Bytes.unsafe_to_string that looked clearly unsafe.
This draft PR contains recent work to clarify occurrences of Bytes.unsafe_to_string within irmin-pack.
For most occurrences, I have added a comment justifying why the use is safe. The comments follow the terminology of the OCaml documentation https://v2.ocaml.org/api/Bytes.html#unsafe which uses terms "unique ownership" and "shared ownership". The OCaml documentation needs to be understood first, before one can properly assess whether the uses of Bytes.unsafe functions are actually safe or not.
For some occurrences in the code, it is not clear if the use is safe or not. I have added a comment with text and a TODO marker to indicate that more work needs to be done.
For one occurrence, I changed the location of the buffer to make the call obviously safe. This should not affect performance. Similarly, in one place I changed Bytes.unsafe_to_string to Bytes.to_string, where I thought the performance would not be affected.