irmin-pack: clarify uses of Bytes.unsafe_{to,of}_string #1970

Merged
merged 8 commits into from
Jul 8, 2022

Conversation

tomjridge
Contributor

We have had serious bugs in the past related to our use of the Bytes.unsafe functions (see e.g. #1814, #1815).

In #1967 it was found that recent code made use of Bytes.unsafe_to_string that looked clearly unsafe.

This draft PR contains recent work to clarify occurrences of Bytes.unsafe_to_string within irmin-pack.

For most occurrences, I have added a comment justifying why the use is safe. The comments follow the terminology of the OCaml documentation https://v2.ocaml.org/api/Bytes.html#unsafe which uses terms "unique ownership" and "shared ownership". The OCaml documentation needs to be understood first, before one can properly assess whether the uses of Bytes.unsafe functions are actually safe or not.
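To make the two terms concrete, here is a minimal sketch (not code from irmin-pack; `make_greeting` and `aliasing_hazard` are hypothetical names invented for illustration). The first function follows the pattern the OCaml documentation describes as safe: the buffer is created locally, so the function has unique ownership of it, and nothing touches it after the conversion. The second keeps mutating the buffer after converting it, which is the kind of use the documentation warns against.

```ocaml
(* Sketch only: these functions are illustrative and not part of irmin-pack. *)

(* Intended-safe pattern: [b] is created here (unique ownership), filled,
   and never referenced again after the conversion. *)
let make_greeting (name : string) : string =
  let prefix = "hello, " in
  let plen = String.length prefix and nlen = String.length name in
  let b = Bytes.create (plen + nlen) in
  Bytes.blit_string prefix 0 b 0 plen;
  Bytes.blit_string name 0 b plen nlen;
  Bytes.unsafe_to_string b

(* Hazard: [b] is still mutated after the conversion, so [s] aliases a
   buffer that changes underneath it. [snapshot] is a real copy taken
   before the mutation; [s] is not guaranteed to still equal it. *)
let aliasing_hazard () : string * string =
  let b = Bytes.of_string "aaaa" in
  let s = Bytes.unsafe_to_string b in
  let snapshot = String.sub s 0 (String.length s) in
  Bytes.set b 0 'z';
  (snapshot, s)
```

In the second function the caller would typically see the two components differ, even though strings are supposed to be immutable. That is why each call site needs an ownership argument.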

For some occurrences in the code, it is not clear if the use is safe or not. I have added a comment with text and a TODO marker to indicate that more work needs to be done.

For one occurrence, I changed the location of the buffer to make the call obviously safe. This should not affect performance. Similarly, in one place I changed Bytes.unsafe_to_string to Bytes.to_string, where I thought the performance would not be affected.
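As a rough illustration of these two kinds of change (a sketch under assumptions, not the actual diff): `fake_read` is a hypothetical stand-in for an I/O call that fills a buffer, and `shared_buf` is a hypothetical long-lived buffer. Which direction the buffer was moved in the real code is not shown here; the sketch only demonstrates why a locally created buffer makes the unsafe conversion easy to justify, and what the copying alternative looks like.

```ocaml
(* Sketch only: [fake_read], [shared_buf] and the read_* functions are
   hypothetical and do not correspond to irmin-pack functions. *)

(* Stand-in for an I/O call that fills the first [len] bytes of [buf]. *)
let fake_read (buf : bytes) (len : int) (c : char) : unit =
  Bytes.fill buf 0 len c

(* A buffer that outlives any single call. Converting it with
   Bytes.unsafe_to_string would be hard to justify, because the next
   read overwrites the bytes behind the returned string. *)
let shared_buf : bytes = Bytes.create 8

(* Option 1: allocate the buffer inside the function. Ownership is
   unique, and the buffer is dropped right after the conversion. *)
let read_local (c : char) : string =
  let buf = Bytes.create 8 in
  fake_read buf 8 c;
  Bytes.unsafe_to_string buf

(* Option 2: keep the shared buffer but pay for a copy. *)
let read_copy (c : char) : string =
  fake_read shared_buf 8 c;
  Bytes.to_string shared_buf
```

With either option, a string returned by an earlier call keeps its contents when a later call reads new data. Option 2 costs one extra copy per call, which only matters on hot paths.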

For each occurrence of Bytes.unsafe_to_string, I have commented on
whether or not the use is safe. Some are, some are not, and for some it
is not clear whether they are safe or not.
Each occurrence of unsafe_to_string in irmin-pack has been marked with
a comment as to whether the use is safe or not.
Comments use the terminology from the manual ("unique ownership" and
"shared ownership").
@codecov-commenter

codecov-commenter commented Jul 7, 2022

Codecov Report

Merging #1970 (97340db) into main (347d885) will decrease coverage by 0.03%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main    #1970      +/-   ##
==========================================
- Coverage   63.97%   63.93%   -0.04%     
==========================================
  Files         129      129              
  Lines       15479    15480       +1     
==========================================
- Hits         9902     9897       -5     
- Misses       5577     5583       +6     
Impacted Files Coverage Δ
src/irmin-pack/inode.ml 79.23% <ø> (ø)
src/irmin-pack/unix/atomic_write.ml 86.02% <ø> (ø)
src/irmin-pack/unix/io.ml 58.99% <ø> (ø)
src/irmin-pack/unix/mapping_file.ml 16.03% <0.00%> (ø)
src/irmin-pack/unix/pack_index.ml 69.69% <ø> (ø)
src/irmin-pack/unix/traverse_pack_file.ml 56.61% <ø> (ø)
src/irmin-pack/unix/gc.ml 9.01% <50.00%> (ø)
src/irmin-pack/unix/pack_store.ml 81.86% <100.00%> (+0.10%) ⬆️
src/irmin-pack/unix/snapshot.ml 77.16% <100.00%> (ø)
src/irmin-fs/unix/irmin_fs_unix.ml 64.51% <0.00%> (-3.88%) ⬇️
... and 1 more

Member

@metanivek metanivek left a comment

Thanks for doing this! Left comments/suggestions.

My only other suggestion is to squash your commits into logical change(s). I could see it either being one commit for everything or one for comments and one for code changes. I like a tidy git history whenever possible. 😄

src/irmin-pack/unix/snapshot.ml (resolved)
src/irmin-pack/inode.ml (resolved)
@@ -33,6 +33,11 @@ module Make_persistent (K : Irmin.Type.S) (V : Value.S) = struct
assert (n = 4);
(file_pos := Int63.Syntax.(!file_pos + Int63.of_int 4));
let pos_ref = ref 0 in
(* Bytes.unsafe_to_string usage: We assume Io_legacy.read_block returns unique
Member

Add documentation to Io_legacy.read

@@ -179,6 +179,10 @@ module Make (Args : Args) : S with module Args := Args = struct
read_exn ~off ~len buffer;
let poff = Dispatcher.poff_of_entry_exn ~off ~len mapping in
Bytes.set buffer Hash.hash_size magic_parent;
(* Bytes.unsafe_to_string usage: We assume read_exn returns unique ownership of buffer
Member

Add documentation to read_exn

(* really_write and following do not mutate the given buffer, so
Bytes.unsafe_of_string is actually safe *)
(* Bytes.unsafe_of_string usage: s has shared ownership; we assume that
Util.really_write does not mutate buf (i.e., only needs shared ownership). This
Member

Document Util.really_write

src/irmin-pack/unix/pack_index.ml (outdated, resolved)
src/irmin-pack/unix/pack_index.ml (outdated, resolved)
@@ -195,6 +204,10 @@ struct
let found = Dispatcher.read_if_not_gced t.dispatcher ~off ~len buf in
if (not found) || gced buf then None
else
(* Bytes.unsafe_to_string usage: buf is created in this function, uniquely owned; we
assume Dispatcher.read_if_not_gced returns unique ownership; then call to
Member

Document Dispatcher.read_if_not_gced

Bytes.unsafe_to_string buf

let decode s pos : t =
(* Bytes.unsafe_of_string usage: s is shared; buf is shared (we cannot mutate it);
we assume Bytes.get_... functions need shared ownership only. This usage is
safe. *)
Member

Similar to another comment. The lifecycle of the buffer seems important to the safety here.
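A minimal sketch of the read-only pattern under discussion, assuming a hypothetical `decode_int32_be` helper (not the actual `decode` from this diff): the string is viewed as bytes only so that `Bytes.get_int32_be` can read from it, and the bytes view never leaves the function.

```ocaml
(* Sketch only: [decode_int32_be] is a hypothetical helper, not irmin-pack code. *)

(* [s] has shared ownership, so [buf] must be treated as read-only.
   [buf] is used for one read and does not escape this function, so no
   later code can mutate [s] through it. *)
let decode_int32_be (s : string) (pos : int) : int32 =
  let buf = Bytes.unsafe_of_string s in
  Bytes.get_int32_be buf pos
```

The safety argument rests on two things: the callee only reads, and the bytes view is short-lived. If `buf` were stored in a record or returned to a caller, the argument would have to cover every later use as well. Newer standard library versions also provide `String.get_int32_be`, which would avoid the conversion entirely where the minimum supported OCaml version allows it.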

@@ -253,6 +253,9 @@ end = struct
else
let buffer_off, off, missing_hash =
match
(* Bytes.unsafe_to_string usage: possibly safe, depending on details of
implementation of decode_entry_exn TODO either justify clearly that this is
safe, or change to use safe Bytes.to_string *)
Member

decode_entry_exn is likely safe but it would take some work to verify all the decode functions are. (I think it is highly unlikely that they modify the string)

@tomjridge
Contributor Author

I agree we should squash these commits into one before merging!

@metanivek
Member

@tomjridge I opened an issue to document the functions that are being called with a shared buffer so we don't hold up this PR.

@samoht samoht marked this pull request as ready for review July 8, 2022 05:28
@samoht samoht merged commit df93d50 into mirage:main Jul 8, 2022
@samoht
Member

samoht commented Jul 8, 2022

Thanks! Squashed and merged!

Labels
no-changelog-needed No changelog is needed here