-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup residual files #2095
Cleanup residual files #2095
Conversation
src/irmin-pack/unix/file_manager.ml
Outdated
| None | Some (`Branch | `Dict | `V1_or_v2_pack) -> false | ||
| Some (`Suffix g | `Prefix g | `Mapping g | `Gc_result g) -> | ||
g <> generation | ||
| Some (`Reachable _ | `Sorted _) -> true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should keep the V1_or_v2_pack
or the Gc_result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V1_or_v2_pack
is unreachable because migration to V3
never leaves a V1_or_v2_pack
behind. The choice doesn't matter.
Gc_result
has roughly the same lifetime as Reachable
and Sorted
. You could move it with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ok! The previous Gc_result
will also get removed now :)
@@ -586,6 +586,8 @@ module Make (Args : Args) = struct | |||
let () = Lwt.wakeup_later t.resolver (Ok !s) in | |||
Ok (`Finalised !s) | |||
| _ -> | |||
t.stats <- Some !s; | |||
Fm.cleanup ~root:t.root ~generation:(t.generation - 1); | |||
let err = gc_errors status gc_output in | |||
let () = Lwt.wakeup_later t.resolver err in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.stats <- Some !s
is the quickfix to mark the GC as finished (otherwise it's illegal to call wakeup_later
a second time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder if we should also set t.running_gc
to None
in the Error
case in ext.ml
's finalise_exn
? It sort of works without a change since it would get cleaned up with the next call to finalise_exn
(since you are setting stats here), but that doesn't seem like the best approach.
irmin/src/irmin-pack/unix/ext.ml
Lines 297 to 301 in 410b778
| Ok (`Finalised _ as x) -> | |
t.running_gc <- None; | |
Lwt.return x | |
| Ok waited -> Lwt.return waited | |
| Error e -> Errs.raise_error e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha thanks I missed that! It's indeed a lot cleaner to unset the running_gc
than the previous hack :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏 I think we probably need both -- t.running_gc <- None
to signal no GC is running for the repo and t.stats <- Some !s
to signal the gc process is finished. It would be nice to actually track the state of the GC process (instead of implicitly through stats option) but we can do that in a future PR.
|
||
let of_json_string string = | ||
match Irmin.Type.of_json_string io_err_result string with | ||
| Error (`Msg _) -> Base.of_json_string string | ||
| Ok result -> (result : (_, io_error) result :> (_, [> t ]) result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code didn't quite work to force repr
to union the polymorphic variant. The call to Irmin.Type.of_json_string io_err_result string
raises an "index out bounds" deep in repr
logic (and so Base.of_json_string
never gets a chance to run)
The fix of redefining the full polymorphic variant ain't quite satisfying, but at least the inlining is verified by the typechecker so we can't forget to update it too...
src/irmin-pack/unix/file_manager.ml
Outdated
| None | Some (`Branch | `Dict | `V1_or_v2_pack) -> false | ||
| Some (`Suffix g | `Prefix g | `Mapping g | `Gc_result g) -> | ||
g <> generation | ||
| Some (`Reachable _ | `Sorted _) -> true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V1_or_v2_pack
is unreachable because migration to V3
never leaves a V1_or_v2_pack
behind. The choice doesn't matter.
Gc_result
has roughly the same lifetime as Reachable
and Sorted
. You could move it with them.
@@ -586,6 +586,8 @@ module Make (Args : Args) = struct | |||
let () = Lwt.wakeup_later t.resolver (Ok !s) in | |||
Ok (`Finalised !s) | |||
| _ -> | |||
t.stats <- Some !s; | |||
Fm.cleanup ~root:t.root ~generation:(t.generation - 1); | |||
let err = gc_errors status gc_output in | |||
let () = Lwt.wakeup_later t.resolver err in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not 🤷
src/irmin-pack/unix/io_errors.ml
Outdated
type t = | ||
[ `Double_close | ||
| `File_exists of string | ||
| `Invalid_parent_directory | ||
| `No_such_file_or_directory | ||
| `Not_a_file | ||
| `Read_out_of_bounds | ||
| `Invalid_argument | ||
| `Decoding_error | ||
| `Not_a_directory of string | ||
| `Index_failure of string | ||
| `Invalid_layout | ||
| `Corrupted_legacy_file | ||
| `Corrupted_mapping_file of string | ||
| `Pending_flush | ||
| `Rw_not_allowed | ||
| `Migration_needed | ||
| `Corrupted_control_file | ||
| `Sys_error of string | ||
| `V3_store_from_the_future | ||
| `Gc_forbidden_during_batch | ||
| `Unknown_major_pack_version of string | ||
| `Only_minimal_indexing_strategy_allowed | ||
| `Commit_key_is_dangling of string | ||
| `Dangling_key of string | ||
| `Gc_disallowed | ||
| `Node_or_contents_key_is_indexed of string | ||
| `Commit_parent_key_is_indexed of string | ||
| `Gc_process_error of string | ||
| `Corrupted_gc_result_file of string | ||
| `Gc_process_died_without_result_file of string | ||
| `Gc_forbidden_on_32bit_platforms | ||
| `Invalid_prefix_read of string | ||
| `Invalid_mapping_read of string | ||
| `Invalid_read_of_gced_object of string | ||
| `Inconsistent_store | ||
| `Closed | ||
| `Ro_not_allowed | ||
| `Io_misc of Io.misc_error ] | ||
[@@deriving irmin ~pp] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicating the list of all these variants is very disappointing. I however understand the point of your diff. Let's keep what you did and keep in mind that this could be improved on the repr side (mirage/repr#99)
src/irmin-pack/unix/io_errors.ml
Outdated
type t = [ Base.t | `Io_misc of Io.misc_error ] | ||
|
||
val pp : Format.formatter -> [< t ] -> unit | ||
val pp : [< t ] Fmt.t | ||
val raise_error : [< t ] -> 'a | ||
val log_error : string -> [< t ] -> unit | ||
val catch : (unit -> 'a) -> ('a, [> t ]) result | ||
val catch : (unit -> 'a) -> ('a, t) result | ||
val raise_if_error : ('a, [< t ]) result -> 'a | ||
val log_if_error : string -> (unit, [< t ]) result -> unit | ||
val to_json_string : (int63, [< t ]) result -> string | ||
val of_json_string : string -> (int63, [> t ]) result | ||
val of_json_string : string -> (int63, t) result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you've implemented a typerep for type t
, I believe that pp
, to_json_string
and of_json_string
should go in favour of the typerepr (i.e. type t = [ Base.t | `Io_misc of Io.misc_error ] [@@deriving irmin.type]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I simplified a bit the code (but kept most of the functions you mentioned... I can still inline them at their callsites if you prefer?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes inlining is better, it corresponds to what is done elsewhere in irmin.
Codecov Report
@@ Coverage Diff @@
## main #2095 +/- ##
==========================================
+ Coverage 64.60% 64.68% +0.07%
==========================================
Files 132 132
Lines 15695 15708 +13
==========================================
+ Hits 10140 10160 +20
+ Misses 5555 5548 -7
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
src/irmin-pack/unix/gc.ml
Outdated
@@ -37,6 +37,8 @@ module Worker = struct | |||
len:int63 -> | |||
bytes -> | |||
unit | |||
|
|||
val gc_output_t : (int63, Args.Errs.t) result Irmin.Type.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would typically do: type gc_output : (int63, Args.Errs.t) result [@@deriving irmin]
both in the signature and the implementation.
Feel free to change or not
src/irmin-pack/unix/io_errors.ml
Outdated
@@ -24,54 +24,75 @@ module type S = sig | |||
|
|||
type t = [ Base.t | `Io_misc of Io.misc_error ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type t = [ Base.t | `Io_misc of Io.misc_error ] [@@deriving irmin]
is the typical way to go
7e01b79
to
85b9cf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, great!
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2) CHANGES: ### Added - **irmin** - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris) - **irmin-graphql** - Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris) - Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris) - **irmin-pack** - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w) - **irmin-tezos-utils** - Add package `irmin-tezos-utils` containing a graphical tool for manual pack files analysis. (mirage/irmin#1939, @clecat) ### Changed - **irmin-pack** - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081, @icristescu) - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack` to `irmin-pack.unix` (mirage/irmin#2084, @metanivek) - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w) ### Fixed - **irmin-pack** - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42) - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106, @icristescu) ### Fixed - **irmin-pack** - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with gced stores. (mirage/irmin#1919, @icristescu)
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2) CHANGES: ### Added - **irmin** - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris) - **irmin-graphql** - Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris) - Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris) - **irmin-pack** - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w) - **irmin-tezos-utils** - Add package `irmin-tezos-utils` containing a graphical tool for manual pack files analysis. (mirage/irmin#1939, @clecat) ### Changed - **irmin-pack** - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081, @icristescu) - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack` to `irmin-pack.unix` (mirage/irmin#2084, @metanivek) - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w) ### Fixed - **irmin-pack** - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42) - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106, @icristescu) - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with gced stores. (mirage/irmin#1919, @icristescu)
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2) CHANGES: ### Added - **irmin** - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris) - **irmin-graphql** - Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris) - Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris) - **irmin-pack** - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w) - **irmin-tezos-utils** - Add package `irmin-tezos-utils` containing a graphical tool for manual pack files analysis. (mirage/irmin#1939, @clecat) ### Changed - **irmin-pack** - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081, @icristescu) - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack` to `irmin-pack.unix` (mirage/irmin#2084, @metanivek) - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w) ### Fixed - **irmin-pack** - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42) - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106, @icristescu) - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with gced stores. (mirage/irmin#1919, @icristescu)
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2) CHANGES: ### Added - **irmin** - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris) - **irmin-graphql** - Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris) - Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris) - **irmin-pack** - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w) - **irmin-tezos-utils** - Add package `irmin-tezos-utils` containing a graphical tool for manual pack files analysis. (mirage/irmin#1939, @clecat) ### Changed - **irmin-pack** - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081, @icristescu) - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack` to `irmin-pack.unix` (mirage/irmin#2084, @metanivek) - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w) ### Fixed - **irmin-pack** - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42) - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106, @icristescu) - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with gced stores. (mirage/irmin#1919, @icristescu)
This PR removes files from the store that are not needed by the current generation:
open_rw
, to clean up a harshly closed store on rebootI'm not quite sure how to unit test the GC failures but there were two issues hiding in there:
Ok final_offset | Error ...
in a temporary file. TheFile_manager.read_gc_output
was unable to parse the error case in general asrepr
didn't quite like the polymorphic variants union (so the json decoding was crashing deep intorepr
internals)lib_context
keeps on callingfinalise ~wait:false
to know the status of the GC. On a GC crash, the error handling was not marking the GC as finished and so the nextfinalise
would attempt to wakeup a second time theresolver
.