Purge orphan shards #1838

Merged
merged 21 commits into from
Feb 26, 2024

jotare
Contributor

@jotare jotare commented Feb 14, 2024

Description

Orphan shards exist for multiple reasons:

  • a node is not available when removing a KB and shard deletion is never retried
  • old versions of rollover/migration code left orphan shards

This PR extends the purge command to remove indexed shards that don't belong to any live KB.
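
A minimal sketch of the idea, not the code in this PR: it assumes the indexed shards are available as a mapping from shard id to KB id and the live KBs as a set of ids. The function name and the sample ids are hypothetical.

```python
def find_orphan_shards(indexed_shards: dict[str, str], live_kbs: set[str]) -> dict[str, str]:
    """Return the indexed shards whose KB is not among the live KBs."""
    return {
        shard_id: kb_id
        for shard_id, kb_id in indexed_shards.items()
        if kb_id not in live_kbs
    }


# Hypothetical data: shard-c belongs to a KB that no longer exists.
indexed = {"shard-a": "kb-1", "shard-b": "kb-2", "shard-c": "kb-deleted"}
live = {"kb-1", "kb-2"}
orphans = find_orphan_shards(indexed, live)
```

Only `shard-c` is reported here, because its KB is not in the live set.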

How was this PR tested?

Integration tests

@jotare jotare requested a review from a team February 14, 2024 16:33
This pull request has been linked to Shortcut Story #8866: Purge orphan shards.

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.89%. Comparing base (62a1240) to head (e2c1fb2).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1838      +/-   ##
==========================================
- Coverage   84.32%   83.89%   -0.44%     
==========================================
  Files         328      328              
  Lines       18697    18695       -2     
==========================================
- Hits        15767    15684      -83     
- Misses       2930     3011      +81     
Flag Coverage Δ
ingest 68.81% <ø> (-0.65%) ⬇️
sdk 87.85% <ø> (ø)
utils 81.81% <ø> (ø)

Contributor

@lferran lferran left a comment

WDYT about providing this logic in the purge command but only actually deleting orphaned shards if a --shards argument is passed?

If it is not passed (which would be the default), we simply log which ones were found so we can investigate, and then run the command manually with --shards enabled.
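
A rough sketch of that opt-in behaviour, assuming an argparse-based command. The `--shards` flag is the one proposed above; `parse_args`, `run`, and the in-memory "deletion" are hypothetical stand-ins for illustration.

```python
import argparse
import logging

logger = logging.getLogger("purge_sketch")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Purge command sketch (hypothetical)")
    # Off by default: without the flag, orphan shards are only logged.
    parser.add_argument(
        "--shards",
        action="store_true",
        default=False,
        help="Actually delete orphan shards instead of only logging them",
    )
    return parser.parse_args(argv)


def run(argv: list[str], orphan_shards: list[str]) -> list[str]:
    """Return the ids of the shards that were (pretend-)deleted."""
    args = parse_args(argv)
    deleted = []
    for shard_id in orphan_shards:
        if args.shards:
            deleted.append(shard_id)  # stand-in for the real deletion call
        else:
            logger.info("Found orphan shard %s (not deleted)", shard_id)
    return deleted
```

With no flag, `run([], ["s1"])` deletes nothing; with `run(["--shards"], ["s1"])` the shard is removed.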

@jotare jotare force-pushed the joanantoniriera4168/sc-8866/purge-orphan-shards branch from 04a5d0a to 6dd50c9 Compare February 16, 2024 12:30
# Log an error in case we found a shard stored but not indexed, this should
# never happen as shards are created in the index node and then stored in
# maindb
not_indexed_shards = stored_shards.keys() - indexed_shards.keys()
Contributor

You may want to use .items() instead of .keys() here. That way, it checks that (shard_id, node_id, kb_id) match, and it will also catch the case where the maindb and the index agree that a shard exists but disagree on which KB it belongs to or on which node it is present.

That shouldn't happen, and it would mean a shard is very broken, but since I think it would be easy to add, it might be worth it. If it gets even a bit complicated, this is probably not worth it, since we will learn about this from errors in other parts of the application.
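
To illustrate the difference, here is a small sketch. It assumes both mappings go from shard id to a hashable (node_id, kb_id) tuple, which may not match the real data structures; the sample values are made up.

```python
# Hypothetical contents: both sides know shard-b, but disagree on its KB.
stored_shards = {
    "shard-a": ("node-1", "kb-1"),
    "shard-b": ("node-2", "kb-2"),
}
indexed_shards = {
    "shard-a": ("node-1", "kb-1"),
    "shard-b": ("node-2", "kb-9"),
}

# Comparing keys only sees shard ids, so the disagreement goes unnoticed.
not_indexed_by_key = stored_shards.keys() - indexed_shards.keys()

# Comparing items also compares the (node_id, kb_id) values.
# This needs hashable values, which tuples are.
not_indexed_by_item = stored_shards.items() - indexed_shards.items()
```

Here `not_indexed_by_key` is empty, while `not_indexed_by_item` contains the `shard-b` entry as stored in maindb.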

Contributor Author

If I change what Ferran pointed above, we won't have the kbid for each one and this won't be possible

@jotare jotare force-pushed the joanantoniriera4168/sc-8866/purge-orphan-shards branch from 1ecff19 to eeac8cf Compare February 20, 2024 12:30
Comment on lines 236 to 254
orphan_shards = await detect_orphan_shards(driver)
logger.info(f"Orphan shards detect found {len(orphan_shards)} orphans")
await report_orphan_shards(orphan_shards, driver)
if args.purge:
    await purge_orphan_shards(driver)
Contributor

@lferran lferran Feb 23, 2024

If purge=True we will be running detect_orphan_shards twice.
You could change it to something like:

if args.purge:
   await purge_orphan_shards()
else:
   await report_orphan_shards()

And each internally calls detect_orphan_shards
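
A fuller sketch of that structure, under loud assumptions: the driver is faked as a dict holding two sets of shard ids, and "orphan" is taken to mean indexed but not stored. The function bodies are hypothetical; only the function names come from the snippet above.

```python
import asyncio
import logging

logger = logging.getLogger("purge_sketch")


async def detect_orphan_shards(driver: dict) -> set[str]:
    # Assumption: a shard is orphan when it is indexed but not stored.
    return set(driver["indexed"]) - set(driver["stored"])


async def report_orphan_shards(driver: dict) -> set[str]:
    orphans = await detect_orphan_shards(driver)
    for shard_id in sorted(orphans):
        logger.info("Found orphan shard %s", shard_id)
    return orphans


async def purge_orphan_shards(driver: dict) -> set[str]:
    orphans = await detect_orphan_shards(driver)
    for shard_id in orphans:
        driver["indexed"].discard(shard_id)  # stand-in for the real deletion
    return orphans


async def main(driver: dict, purge: bool) -> set[str]:
    # Detection runs exactly once on either branch.
    if purge:
        return await purge_orphan_shards(driver)
    return await report_orphan_shards(driver)
```

For example, `asyncio.run(main(driver, purge=False))` only logs, while `purge=True` also removes the orphans from the fake index.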

@jotare jotare force-pushed the joanantoniriera4168/sc-8866/purge-orphan-shards branch from 9c8f958 to e2c1fb2 Compare February 23, 2024 16:38
@jotare jotare merged commit 53edc50 into main Feb 26, 2024
83 checks passed
@jotare jotare deleted the joanantoniriera4168/sc-8866/purge-orphan-shards branch February 26, 2024 08:08

3 participants