What's the recommended way of handling backups? #8649
Replies: 4 comments 8 replies
-
The recommended way is to use the backup functionality in Qdrant Cloud. It works by taking incremental snapshots of the whole volume, and you can restore a backup into a new cluster in one click. A less recommended way is to use Qdrant's own collection snapshots.
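For the collection-snapshot route, here is a minimal sketch of the REST calls involved, using only the Python standard library. It assumes a self-hosted instance at `http://localhost:6333` with no API key. The helper names are hypothetical; the paths are Qdrant's collection snapshot endpoints for creating a snapshot, downloading it, and recovering from a URL or a `file://` location the server can read. Building the `Request` objects separately from sending them keeps each call easy to inspect.

```python
import json
import urllib.request

# Assumption: self-hosted Qdrant on the default REST port, no API key configured.
QDRANT_URL = "http://localhost:6333"


def create_snapshot_request(base_url: str, collection: str) -> urllib.request.Request:
    # POST /collections/{collection}/snapshots writes a snapshot file on the Qdrant host
    return urllib.request.Request(
        f"{base_url}/collections/{collection}/snapshots", method="POST"
    )


def download_snapshot_request(base_url: str, collection: str, name: str) -> urllib.request.Request:
    # GET /collections/{collection}/snapshots/{name} streams that file back to the caller
    return urllib.request.Request(
        f"{base_url}/collections/{collection}/snapshots/{name}", method="GET"
    )


def recover_snapshot_request(base_url: str, collection: str, location: str) -> urllib.request.Request:
    # PUT /collections/{collection}/snapshots/recover restores from an http(s) URL
    # or a file:// path that the Qdrant server itself can read
    body = json.dumps({"location": location}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/collections/{collection}/snapshots/recover",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_snapshot(base_url: str, collection: str) -> str:
    # Sends the create request and returns the snapshot file name from the JSON response
    with urllib.request.urlopen(create_snapshot_request(base_url, collection)) as resp:
        return json.load(resp)["result"]["name"]
```

For example, `create_snapshot(QDRANT_URL, "my_collection")` returns a file name that you can pass to `download_snapshot_request` and fetch with `urllib.request.urlopen`.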
-
I had a similar setup with Qdrant where I needed to back up a large collection of vectors. I was running Qdrant locally, not on Qdrant Cloud, so I had to get a bit creative. I ended up using the local snapshot feature, which uses Qdrant's REST API to create snapshots of the collections. Here's how I did it:

First, I used the … Once the snapshots were created, I set up a cron job to automate the process so that a snapshot was created at regular intervals. After creating the snapshots, I copied them to external storage, such as AWS S3, for safety.

One thing to watch out for: a snapshot captures the state of the collection at the moment it is created, so if you're writing data frequently, you might need to adjust how often you create snapshots to match your data's volatility.

This method worked well for me until I moved to a setup where I could use Qdrant Cloud, which simplified things a lot. If you're running Qdrant locally, though, this approach should do the trick.
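One way to turn "match the snapshot frequency to your data's volatility" into a cron schedule is a small calculation, sketched here under loud assumptions: you can estimate your write rate in points per hour, and you choose a hypothetical budget, `max_points_to_replay`, of writes you are willing to re-ingest after restoring the latest snapshot. The gap between snapshots must then not exceed budget / rate. The hypothetical helper rounds that gap down to an interval that divides a day (or an hour) evenly, so `*/N` in the crontab fires at evenly spaced times.

```python
# Hour intervals that divide 24 and minute intervals that divide 60, so "*/N"
# in a crontab field fires at evenly spaced times.
HOUR_STEPS = [24, 12, 8, 6, 4, 3, 2, 1]
MINUTE_STEPS = [30, 20, 15, 12, 10, 6, 5, 4, 3, 2, 1]


def snapshot_schedule(points_per_hour: float, max_points_to_replay: int) -> str:
    """Crontab schedule whose gap between snapshots never exceeds
    max_points_to_replay / points_per_hour hours (hypothetical helper)."""
    if points_per_hour <= 0:
        return "0 0 * * *"  # no writes: a daily snapshot is an arbitrary default
    max_gap_minutes = 60 * max_points_to_replay / points_per_hour
    for hours in HOUR_STEPS:
        if hours * 60 <= max_gap_minutes:
            if hours == 24:
                return "0 0 * * *"  # once a day at midnight
            if hours == 1:
                return "0 * * * *"  # top of every hour
            return f"0 */{hours} * * *"
    for minutes in MINUTE_STEPS:
        if minutes <= max_gap_minutes:
            return f"*/{minutes} * * * *"
    raise ValueError("required gap is under one minute; cron cannot schedule that")
```

For example, at 100,000 points per hour with a budget of 500,000 points the gap may be at most 5 hours, so the helper picks every 4 hours: `0 */4 * * * /usr/bin/python3 /opt/backup/qdrant_snapshot.py` (the script path is hypothetical). Creating and uploading a snapshot of a large collection takes time, so the interval should also comfortably exceed the duration of one backup run.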
-
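We've handled collections at a similar scale in production, with 10 million+ vectors. For backups, we use a combination of Qdrant's snapshot feature and external storage. You can create a snapshot of your collection with the Python client, download it from the Qdrant server, and push it to S3:

```python
import boto3
import requests
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
s3 = boto3.client("s3")
# The snapshot file is written on the Qdrant server, not on this machine
snapshot = client.create_snapshot(collection_name="my_collection")
# Download it through the REST API and stream it straight into S3
url = f"http://localhost:6333/collections/my_collection/snapshots/{snapshot.name}"
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    resp.raw.decode_content = True  # undo any HTTP-level compression while streaming
    s3.upload_fileobj(resp.raw, "my-bucket", f"snapshots/{snapshot.name}")
```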
-
A few additions complementing the snapshot + object-storage flow above; these are the gotchas that bit us when we ran self-hosted Qdrant at a similar scale.

1. Use shard-level snapshots, not whole-collection snapshots, once you cross ~10M points. The collection-level …
2. Test the restore path quarterly, not just the backup path. Most snapshot pipelines I've reviewed back up successfully and never validate that the …
3. If you use … The URL-based recover path on self-hosted Qdrant fetches whatever …
4. Pin and verify the snapshot artefact itself. S3 object versioning + a …
5. Snapshot frequency vs. ingestion volatility: if your writes are steady, snapshot frequency should align with how much work you're willing to redo from your write-ahead source. With high-volatility writes, …

Combined recipe: shard-level snapshot → …
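For item 4, verifying the artefact comes down to comparing a checksum computed over the downloaded file against one recorded when the backup was taken. Below is a minimal sketch using `hashlib` that reads in 1 MiB chunks so a multi-GB snapshot never has to sit in memory. Where the expected value comes from is an assumption: you might store it as S3 object metadata at upload time, and recent Qdrant versions may also report a SHA-256 `checksum` field in snapshot listings (check the API reference for your version). The function names are hypothetical.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB reads keep memory flat for large snapshots


def sha256_of(stream: BinaryIO) -> str:
    """Hex SHA-256 of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(stream: BinaryIO, expected_sha256: str) -> None:
    """Raise ValueError if the stream's SHA-256 differs from the recorded value."""
    actual = sha256_of(stream)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"snapshot checksum mismatch: expected {expected_sha256}, got {actual}"
        )
```

In a restore drill this would run on the downloaded file before recovering it, e.g. `with open(path, "rb") as f: verify_snapshot(f, recorded_sha256)`, so a corrupted or swapped object fails loudly instead of being restored.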
-
I have a Qdrant collection with around 9 million 768-dimensional vectors. What's the recommended way to dump or back up that data periodically?
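For a sense of scale, here is a rough lower bound on what each backup has to move: the raw float32 vectors alone, at 4 bytes per component. This is a back-of-the-envelope sketch that assumes no quantization; payloads, point IDs and index data take additional space, so actual snapshot files will be larger.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Size of the raw vectors alone; float32 uses 4 bytes per component."""
    return num_vectors * dim * bytes_per_component


size = raw_vector_bytes(9_000_000, 768)
print(f"{size:,} bytes = {size / 2**30:.1f} GiB")  # 27,648,000,000 bytes = 25.7 GiB
```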