Add delete API for torchstore distributed storage #39
Summary:
Implement delete functionality across the torchstore distributed storage system to allow removal of stored tensors and objects.

This adds the delete API to all layers of the storage system:
- API layer: Added delete() function with proper documentation
- Client layer: Implemented distributed delete across storage volumes with asyncio.gather
- Controller layer: Added notify_delete() to maintain key index consistency
- Storage layer: Added delete() method to storage implementations with proper error handling

The delete operation ensures data consistency by removing entries from all storage volumes where the key exists and updating the controller's key index accordingly.

Also includes comprehensive test coverage that verifies:
- Sequential deletion of multiple tensors
- Verification that deleted keys no longer exist
- Confirmation that remaining keys are unaffected during incremental deletion
- Proper error handling when attempting to access deleted keys

Test Plan:
Added test_delete() function that:
- Tests deletion across multiple storage volumes and processes
- Verifies each deletion incrementally (delete one, verify it's gone, verify others remain)
- Confirms deleted tensors cannot be retrieved (proper error handling)
- Runs with different transport and strategy configurations like existing tests
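For readers who want a concrete picture of the layering described in the summary, here is a minimal in-memory sketch. It is not the torchstore implementation: `InMemoryVolume`, `InMemoryController`, `notify_put`, `locate`, and `run_incremental_delete` are hypothetical stand-ins, and only `delete()`, `notify_delete()`, and the use of `asyncio.gather` come from the description. The order used here (volumes first, then the index) simply follows the wording of the summary and is an assumption.

```python
import asyncio


class InMemoryVolume:
    """Hypothetical stand-in for a storage volume (not a torchstore class)."""

    def __init__(self):
        self.data = {}

    async def put(self, key, value):
        self.data[key] = value

    async def delete(self, key):
        # Storage layer: raise if the key is unknown, otherwise drop it.
        if key not in self.data:
            raise KeyError(f"key {key!r} not found in volume")
        del self.data[key]


class InMemoryController:
    """Hypothetical key index mapping key -> ids of volumes that hold it."""

    def __init__(self):
        self.index = {}

    def notify_put(self, key, volume_id):
        self.index.setdefault(key, set()).add(volume_id)

    def locate(self, key):
        if key not in self.index:
            raise KeyError(f"key {key!r} not found")
        return set(self.index[key])

    def notify_delete(self, key):
        # Controller layer: forget the key so later lookups fail.
        self.index.pop(key, None)


async def delete(key, controller, volumes):
    # Client layer: fan the delete out to every volume holding the key,
    # run the per-volume deletes concurrently, then update the index.
    volume_ids = controller.locate(key)
    await asyncio.gather(*(volumes[v].delete(key) for v in volume_ids))
    controller.notify_delete(key)


async def run_incremental_delete():
    # Mirrors the test plan: delete one key at a time and record what remains.
    volumes = {0: InMemoryVolume(), 1: InMemoryVolume()}
    controller = InMemoryController()
    keys = ["t0", "t1", "t2"]
    for key in keys:
        for volume_id, volume in volumes.items():
            await volume.put(key, [1.0, 2.0])
            controller.notify_put(key, volume_id)

    remaining_after_each = []
    for key in keys:
        await delete(key, controller, volumes)
        remaining_after_each.append(sorted(controller.index))
    return remaining_after_each, volumes


if __name__ == "__main__":
    remaining, _ = asyncio.run(run_incremental_delete())
    print(remaining)
```

Each step removes exactly one key from the index while the others stay reachable, which is the property the incremental test is meant to check.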
Lgtm! Left a comment on correct order of operations. If we notify of delete first, this prevents other storage volumes from initiating new read requests while we're attempting to delete.
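To illustrate the ordering concern, here is a small deterministic sketch, not the actual torchstore code. `KeyIndex`, `volume_delete`, `reader`, and `race` are hypothetical names; only `notify_delete()` comes from the PR. An `asyncio.Event` holds the volume delete "in flight" until a concurrent reader has attempted its lookup, so the two orderings can be compared.

```python
import asyncio


class KeyIndex:
    """Hypothetical controller index; readers consult it before reading."""

    def __init__(self, key):
        self.locations = {key: {"volume-0"}}

    def locate(self, key):
        if key not in self.locations:
            raise KeyError(key)
        return self.locations[key]

    def notify_delete(self, key):
        self.locations.pop(key, None)


async def volume_delete(store, key, lookup_done):
    # Keep the delete in flight until the reader has done its lookup.
    await lookup_done.wait()
    store.pop(key, None)


async def delete_notify_first(index, store, key, lookup_done):
    index.notify_delete(key)  # 1. stop new lookups from succeeding
    await volume_delete(store, key, lookup_done)  # 2. remove the data


async def delete_data_first(index, store, key, lookup_done):
    await volume_delete(store, key, lookup_done)  # 1. remove the data
    index.notify_delete(key)  # 2. update the index afterwards


async def reader(index, key, lookup_done):
    # A reader that arrives while the delete is still in progress.
    try:
        index.locate(key)
        outcome = "read started"
    except KeyError:
        outcome = "read rejected"
    lookup_done.set()
    return outcome


async def race(delete_fn):
    key = "weights"
    index, store = KeyIndex(key), {key: b"payload"}
    lookup_done = asyncio.Event()
    _, outcome = await asyncio.gather(
        delete_fn(index, store, key, lookup_done),
        reader(index, key, lookup_done),
    )
    return outcome


if __name__ == "__main__":
    print(asyncio.run(race(delete_notify_first)))
    print(asyncio.run(race(delete_data_first)))
```

With notify-first, the reader's lookup fails before any data is touched. With data-first, the reader is told the key still exists and may start a read against data that is about to disappear.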
Wait for tests to pass please!
@kaiyuan-li Feel free to merge and I'll fix the CI tests.
Current problems
- actor_mesh.slice() seems to accept different arguments. The workaround is to use choose instead of explicitly choosing a slice.
- put(). See P1955672937 for error logs.
@LucasLLC @kaiyuan-li what is your monarch version?