Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot store #3643

Merged
merged 8 commits into from
Feb 28, 2024
Merged

Snapshot store #3643

merged 8 commits into from
Feb 28, 2024

Conversation

AlexandruCihodaru
Copy link
Contributor

@AlexandruCihodaru AlexandruCihodaru commented Feb 19, 2024

This PR is a first step in an attempt to fix #3324.
To be able to add S3 support and other storage back ends later I defined a trait that will contain all needed methods to handle created snapshots. I took a simplifying assumption that the snapshots will first be created in a temp local place and after the snapshotting is completed we will move the snapshot and checksum.
After S3 support is added, it should be straight forward to implement support for additional storage back ends, the only needed action being implementing the trait for the new type.

To keep the PR small and easy to review, currently it only handles create_snapshot,delete,list (collection/full/shard). If the direction is good others could be implemented in subsequent PRs without breaking compatibility.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@@ -53,7 +53,7 @@ pub struct StorageConfig {
pub storage_path: String,
#[serde(default = "default_snapshots_path")]
#[validate(length(min = 1))]
pub snapshots_path: String,
pub snapshots_path: Option<String>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is needed because in the future there could be either snapshot_path for local storage either some details for the storage back end. It would have been cleaner with an enum in here but I believe it would break backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:check:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@generall an enum would mean to also have a custom deserialization implementation and everything, overall I felt that it was cleaner how I implemented it here, but for reference this is how it would look like AlexandruCihodaru@887d426

snapshots_path: snapshots_path.to_owned(),
snapshot_storage_manager: Self::create_snapshot_storage_manager(snapshots_path),
channel_service,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in the future snapshots_path could be remove so we will avoid info duplication, the path can be retrieved from the snapshot_storage_manager.

@@ -50,6 +51,7 @@ pub struct Collection {
this_peer_id: PeerId,
path: PathBuf,
snapshots_path: PathBuf,
snapshot_storage_manager: Box<dyn SnapshotStorage>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be static dispatch by making Collection generic over SnapshotStorage. We could switch to it if snapshotting happens often because it is cheaper.

@generall generall self-requested a review February 23, 2024 11:07
@AlexandruCihodaru
Copy link
Contributor Author

Hey @generall, I have addressed your comments. In a subsequent PR we should also move the get_snapshot_description and get_snapshot/checksum_path into the trait because we will have different implementations for the LocalFS and S3.

* Define a trait with methods for managing snapshots after their
  creation. The goal of this trait is to allow implementation of
  snapshot management logic for different storage backends without major
  changes in the code.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>
* Move the snapshot deletion logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.
* Replace snapshot deletion logic with calls to the SnapshotStorage
  implementation.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>
* Move the snapshot listing logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>
* Move the snapshot creation logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>
@AlexandruCihodaru
Copy link
Contributor Author

Hey @generall I was wondering if the desired outcome is to use S3 like a "persistent" storage or to use it for any snapshot related actions. If we targeting the first option than I think it is possible to bring locally from S3 the snapshot dir before initializing the TableOfContent in here

let snapshots_path = Path::new(&storage_config.snapshots_path.clone()).to_owned();
and I believe that there will not be a lot of additional changes after this PR.

@generall
Copy link
Member

/tip $200

Copy link

algora-pbc bot commented Feb 28, 2024

Copy link

algora-pbc bot commented Feb 28, 2024

@AlexandruCihodaru: You just got a $200 tip! 👉 Complete your Algora onboarding to collect your payment.

Copy link

algora-pbc bot commented Feb 28, 2024

🎉🎈 @AlexandruCihodaru has been awarded $200! 🎈🎊

@generall generall merged commit 61fa4e7 into qdrant:dev Feb 28, 2024
17 checks passed
timvisee pushed a commit that referenced this pull request Mar 5, 2024
* [snapshots]: Define trait for snapshots managemet

* Define a trait with methods for managing snapshots after their
  creation. The goal of this trait is to allow implementation of
  snapshot management logic for different storage backends without major
  changes in the code.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>

* [snapshots]: Move snapshot deletion logic

* Move the snapshot deletion logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.
* Replace snapshot deletion logic with calls to the SnapshotStorage
  implementation.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>

* [snapshots]: Move snapshot list logic

* Move the snapshot listing logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>

* [snapshots]: Move snapshot creation logic

* Move the snapshot creation logic into the implementation of
  SnapshotStorage for LocalFileSystemConfig.

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>

* review fixes

* fmt

* refactor full storage snapshot

* create directory on get_stored_file + make sure temp archive file is removed

---------

Signed-off-by: Alexandru Cihodaru <alexandru.cihodaru@gmail.com>
Co-authored-by: generall <andrey@vasnetsov.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants