Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup residual files #2095

Merged
merged 3 commits into from
Oct 4, 2022
Merged

Cleanup residual files #2095

merged 3 commits into from
Oct 4, 2022

Conversation

art-w
Copy link
Contributor

@art-w art-w commented Sep 27, 2022

This PR removes files from the store that are not needed by the current generation:

  • At open_rw, to clean up a harshly closed store on reboot
  • After a GC crash, to get rid of the failed new generation

I'm not quite sure how to unit test the GC failures but there were two issues hiding in there:

  • The GC worker writes its result Ok final_offset | Error ... in a temporary file. The File_manager.read_gc_output was unable to parse the error case in general as repr didn't quite like the polymorphic variants union (so the json decoding was crashing deep into repr internals)
  • lib_context keeps on calling finalise ~wait:false to know the status of the GC. On a GC crash, the error handling was not marking the GC as finished and so the next finalise would attempt to wakeup a second time the resolver.

| None | Some (`Branch | `Dict | `V1_or_v2_pack) -> false
| Some (`Suffix g | `Prefix g | `Mapping g | `Gc_result g) ->
g <> generation
| Some (`Reachable _ | `Sorted _) -> true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should keep the V1_or_v2_pack or the Gc_result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V1_or_v2_pack is unreachable because migration to V3 never leaves a V1_or_v2_pack behind. The choice doesn't matter.

Gc_result has roughly the same lifetime as Reachable and Sorted. You could move it with them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ok! The previous Gc_result will also get removed now :)

@@ -586,6 +586,8 @@ module Make (Args : Args) = struct
let () = Lwt.wakeup_later t.resolver (Ok !s) in
Ok (`Finalised !s)
| _ ->
t.stats <- Some !s;
Fm.cleanup ~root:t.root ~generation:(t.generation - 1);
let err = gc_errors status gc_output in
let () = Lwt.wakeup_later t.resolver err in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.stats <- Some !s is the quickfix to mark the GC as finished (otherwise it's illegal to call wakeup_later a second time)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 🤷

Copy link
Member

@metanivek metanivek Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder if we should also set t.running_gc to None in the Error case in ext.ml's finalise_exn? It sort of works without a change since it would get cleaned up with the next call to finalise_exn (since you are setting stats here), but that doesn't seem like the best approach.

| Ok (`Finalised _ as x) ->
t.running_gc <- None;
Lwt.return x
| Ok waited -> Lwt.return waited
| Error e -> Errs.raise_error e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha thanks I missed that! It's indeed a lot cleaner to unset the running_gc than the previous hack :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 I think we probably need both -- t.running_gc <- None to signal no GC is running for the repo and t.stats <- Some !s to signal the gc process is finished. It would be nice to actually track the state of the GC process (instead of implicitly through stats option) but we can do that in a future PR.


let of_json_string string =
match Irmin.Type.of_json_string io_err_result string with
| Error (`Msg _) -> Base.of_json_string string
| Ok result -> (result : (_, io_error) result :> (_, [> t ]) result)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code didn't quite work to force repr to union the polymorphic variant. The call to Irmin.Type.of_json_string io_err_result string raises an "index out bounds" deep in repr logic (and so Base.of_json_string never gets a chance to run)

The fix of redefining the full polymorphic variant ain't quite satisfying, but at least the inlining is verified by the typechecker so we can't forget to update it too...

| None | Some (`Branch | `Dict | `V1_or_v2_pack) -> false
| Some (`Suffix g | `Prefix g | `Mapping g | `Gc_result g) ->
g <> generation
| Some (`Reachable _ | `Sorted _) -> true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V1_or_v2_pack is unreachable because migration to V3 never leaves a V1_or_v2_pack behind. The choice doesn't matter.

Gc_result has roughly the same lifetime as Reachable and Sorted. You could move it with them.

@@ -586,6 +586,8 @@ module Make (Args : Args) = struct
let () = Lwt.wakeup_later t.resolver (Ok !s) in
Ok (`Finalised !s)
| _ ->
t.stats <- Some !s;
Fm.cleanup ~root:t.root ~generation:(t.generation - 1);
let err = gc_errors status gc_output in
let () = Lwt.wakeup_later t.resolver err in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 🤷

Comment on lines 51 to 91
type t =
[ `Double_close
| `File_exists of string
| `Invalid_parent_directory
| `No_such_file_or_directory
| `Not_a_file
| `Read_out_of_bounds
| `Invalid_argument
| `Decoding_error
| `Not_a_directory of string
| `Index_failure of string
| `Invalid_layout
| `Corrupted_legacy_file
| `Corrupted_mapping_file of string
| `Pending_flush
| `Rw_not_allowed
| `Migration_needed
| `Corrupted_control_file
| `Sys_error of string
| `V3_store_from_the_future
| `Gc_forbidden_during_batch
| `Unknown_major_pack_version of string
| `Only_minimal_indexing_strategy_allowed
| `Commit_key_is_dangling of string
| `Dangling_key of string
| `Gc_disallowed
| `Node_or_contents_key_is_indexed of string
| `Commit_parent_key_is_indexed of string
| `Gc_process_error of string
| `Corrupted_gc_result_file of string
| `Gc_process_died_without_result_file of string
| `Gc_forbidden_on_32bit_platforms
| `Invalid_prefix_read of string
| `Invalid_mapping_read of string
| `Invalid_read_of_gced_object of string
| `Inconsistent_store
| `Closed
| `Ro_not_allowed
| `Io_misc of Io.misc_error ]
[@@deriving irmin ~pp]
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicating the list of all these variants is very disappointing. I however understand the point of your diff. Let's keep what you did and keep in mind that this could be improved on the repr side (mirage/repr#99)

Comment on lines 25 to 34
type t = [ Base.t | `Io_misc of Io.misc_error ]

val pp : Format.formatter -> [< t ] -> unit
val pp : [< t ] Fmt.t
val raise_error : [< t ] -> 'a
val log_error : string -> [< t ] -> unit
val catch : (unit -> 'a) -> ('a, [> t ]) result
val catch : (unit -> 'a) -> ('a, t) result
val raise_if_error : ('a, [< t ]) result -> 'a
val log_if_error : string -> (unit, [< t ]) result -> unit
val to_json_string : (int63, [< t ]) result -> string
val of_json_string : string -> (int63, [> t ]) result
val of_json_string : string -> (int63, t) result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you've implemented a typerep for type t, I believe that pp, to_json_string and of_json_string should go in favour of the typerepr (i.e. type t = [ Base.t | `Io_misc of Io.misc_error ] [@@deriving irmin.type] ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I simplified a bit the code (but kept most of the functions you mentioned... I can still inline them at their callsites if you prefer?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes inlining is better, it corresponds to what is done elsewhere in irmin.

@codecov-commenter
Copy link

Codecov Report

Merging #2095 (ef92e1a) into main (a7baa08) will increase coverage by 0.07%.
The diff coverage is 70.37%.

@@            Coverage Diff             @@
##             main    #2095      +/-   ##
==========================================
+ Coverage   64.60%   64.68%   +0.07%     
==========================================
  Files         132      132              
  Lines       15695    15708      +13     
==========================================
+ Hits        10140    10160      +20     
+ Misses       5555     5548       -7     
Impacted Files Coverage Δ
src/irmin-pack/unix/io_errors.ml 55.00% <63.63%> (+20.00%) ⬆️
src/irmin-pack/unix/file_manager.ml 84.06% <69.23%> (+0.45%) ⬆️
src/irmin-pack/layout.ml 92.50% <100.00%> (+0.39%) ⬆️
src/irmin-pack/unix/gc.ml 44.30% <100.00%> (+3.76%) ⬆️
src/irmin-pack/unix/errors.ml 28.88% <0.00%> (ø)
src/irmin-fs/unix/irmin_fs_unix.ml 68.38% <0.00%> (+0.64%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@icristescu icristescu mentioned this pull request Sep 27, 2022
34 tasks
@Ngoguey42 Ngoguey42 mentioned this pull request Sep 30, 2022
@@ -37,6 +37,8 @@ module Worker = struct
len:int63 ->
bytes ->
unit

val gc_output_t : (int63, Args.Errs.t) result Irmin.Type.t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would typically do: type gc_output : (int63, Args.Errs.t) result [@@deriving irmin] both in the signature and the implementation.

Feel free to change or not

@@ -24,54 +24,75 @@ module type S = sig

type t = [ Base.t | `Io_misc of Io.misc_error ]
Copy link
Contributor

@Ngoguey42 Ngoguey42 Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type t = [ Base.t | `Io_misc of Io.misc_error ] [@@deriving irmin] is the typical way to go

@art-w art-w force-pushed the clean branch 3 times, most recently from 7e01b79 to 85b9cf8 Compare October 4, 2022 08:41
@art-w art-w mentioned this pull request Oct 4, 2022
Copy link
Contributor

@icristescu icristescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great!

@icristescu icristescu merged commit 4ab886c into mirage:main Oct 4, 2022
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)

### Fixed

- **irmin-pack**
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
icristescu pushed a commit to metanivek/opam-repository that referenced this pull request Oct 7, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
@irmaTS irmaTS added the tezos-support Support for bugs related to Tezos label Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tezos-support Support for bugs related to Tezos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants