Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sealing: More complete snapdeals abort cleanup #9648

Merged
merged 8 commits into from Nov 28, 2022

Conversation

magik6k
Copy link
Contributor

@magik6k magik6k commented Nov 15, 2022

Related Issues

Will fix #8491

Might fix:

Proposed Changes

  • Cleanup all things which need cleaning up in AbortUpgrade handler
  • Fix FinalizeSector
    • Move all logic around managing unsealed data to ReleaseUnsealed / extend ReleaseUnseaed to be able to partially free the unsealed file
    • Simplify Finalize* functions
      • Should fix a lot of issues with finalize where it can't be scheduled, or is scheduled on the wrong machine

Additional Info

This PR changes some worker APIs, so it bumps the worker API version - IT REQUIRES UPDATING BOTH lotus-miner AND lotus-worker

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@magik6k magik6k force-pushed the fix/snap-abort-cleanup branch 3 times, most recently from 8729c91 to 201e59a Compare November 16, 2022 17:48
Comment on lines +436 to +441
if err := m.sealer.ReleaseUnsealed(ctx.Context(), m.minerSector(sector.SectorType, sector.SectorNumber), []storiface.Range{}); err != nil {
log.Error(err)
}

if err := m.sealer.ReleaseUnsealed(ctx.Context(), m.minerSector(sector.SectorType, sector.SectorNumber), sector.keepUnsealedRanges(sector.CCPieces, true, cfg.AlwaysKeepUnsealedCopy)); err != nil {
// and makes sure sealed/cache files only exist in long-term-storage
if err := m.sealer.FinalizeSector(ctx.Context(), m.minerSector(sector.SectorType, sector.SectorNumber)); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main fix - we weren't cleaning up sealed/cache files at all, and we were passing ranges to keep to ReleaseUnsealed.

@magik6k magik6k changed the title [wip] itests: Fix snapdeals abort cleanup itests: Fix snapdeals abort cleanup Nov 16, 2022
@magik6k magik6k changed the title itests: Fix snapdeals abort cleanup fix: sealing: More complete snapdeals abort cleanup Nov 16, 2022
@magik6k magik6k marked this pull request as ready for review November 16, 2022 21:01
@magik6k magik6k requested a review from a team as a code owner November 16, 2022 21:01
@magik6k magik6k added impact/api-breakage Impact: API Breakage kind/bug Kind: Bug area/sealing labels Nov 18, 2022
if moveUnsealed != storiface.FTNone && len(keepUnsealed) != 0 {
err = multierr.Append(err, move(moveUnsealed))

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a code block here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just enclosing sealedStores in a tighter scope so it's easier to tell that it's not used anywhere below; basically just a code style thing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important that those are not used below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not important outside making it easier to see where variables are used

@@ -67,7 +69,8 @@ var shortNames = map[TaskType]string{
TTCommit1: "C1",
TTCommit2: "C2",

TTFinalize: "FIN",
TTFinalize: "FIN",
TTReleaseUnsealed: "FUS",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with FUS (to mean FinalizeUnsealed), to be consistent with other finalization-related short names; Tho I guess the task should be called FinalizeUnsealed in that case..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed but kept some other ReleaseUnsealed in less visible places to avoid breaking the API too much; This code will realistically all get replaced when the new scheduler comes, so just a non-confusing UX is probably good enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we should keep the names of the API and tasks consistent considering thats the pattern followed for others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change the already existing method on the miner api - ReturnReleaseUnsealed was already there; This has the subtle side effect of making lotus-miner technically forward-compatible with workers (e.g. if you downgrade lotus-miner when you have newer versions of workers still running, and trying to return ReleaseUnsealed those results will get recorded correctly)

tbh it doesn't matter that much, just felt like the right approach.

case id == wpaths[0].ID: // worker path
if workers != storiface.FTNone {
require.Len(t, decls, 1)
require.EqualValues(t, decls[0].SectorFileType.Strings(), workers.Strings())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected and actual args are reversed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that's a nitpick :p

@RobQuistNL
Copy link
Contributor

Good stuff! Will let you know if we run into it again with this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lotus-miner fails to remove files from workers (Both Snap and Regular)
3 participants