-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Purge orphan shards #1838
Purge orphan shards #1838
Conversation
This pull request has been linked to Shortcut Story #8866: Purge orphan shards. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1838 +/- ##
==========================================
- Coverage 84.32% 83.89% -0.44%
==========================================
Files 328 328
Lines 18697 18695 -2
==========================================
- Hits 15767 15684 -83
- Misses 2930 3011 +81
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about providing this logic in the purge command but only really delete orphaned shards if a --shards
argument is passed?
If it is not passed (which would be the default), we simply log which ones were found so we can investigate and then run the command manually with the --shards
on.
04a5d0a
to
6dd50c9
Compare
# Log an error in case we found a shard stored but not indexed, this should | ||
# never happen as shards are created in the index node and then stored in | ||
# maindb | ||
not_indexed_shards = stored_shards.keys() - indexed_shards.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to use .items()
instead of .keys()
here. That way, it compares that (shard_id, node_id, kb_id) match and it will also catch the case where the maindb and index agree that a shard exist but disagree on which KB it belongs to or in which node it is present.
That shouldn't happen and it means a shard is very broken, but since I think it would be easy to add, it might be worth it. It it gets even a bit complicated, this is probably not worth it since we will learn about this from errors in other parts of the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I change what Ferran pointed above, we won't have the kbid for each one and this won't be possible
1ecff19
to
eeac8cf
Compare
orphan_shards = await detect_orphan_shards(driver) | ||
logger.info(f"Orphan shards detect found {len(orphan_shards)} orphans") | ||
await report_orphan_shards(orphan_shards, driver) | ||
if args.purge: | ||
await purge_orphan_shards(driver) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If purge=True
we will be running detect_orphan_shards
.
you could change it to something like:
if args.purge:
await purge_orphan_shards()
else:
await report_orphan_shards()
And each internally calls detect_orphan_shards
nucliadb can access ingest's fixtures
This reverts commit 206c45d0b07993bec029d666b5066b1675af4797.
9c8f958
to
e2c1fb2
Compare
Description
Orphan shards exists for multiple reasons:
This PR extends purge command to remove indexed shards that doesn't belong to any live KB
How was this PR tested?
Integration tests