What's the recommended way of handling backups? #8649

alexisdrakopoulos · 2026-04-11T07:28:49Z


I have a Qdrant collection with around 9 million 768-dimensional vectors. What's the recommended way to dump or back up that data periodically?

generall · 2026-04-11T08:32:36Z

Maintainer

The recommended way is to use the backup functionality in Qdrant Cloud. It takes incremental snapshots of the whole volume, and you can restore one into a new cluster in a click.

The less recommended way is to use Qdrant's own collection snapshots.

2 replies

alexisdrakopoulos Apr 11, 2026
Author

I don't use Qdrant Cloud; I have my own local Docker deployment.

generall Apr 11, 2026
Maintainer

Then you can use the less recommended option.

omni-front · 2026-04-13T07:48:19Z


I had a similar setup with qdrant where I needed to back up a large collection of vectors. I was running qdrant locally, not on qdrant cloud, so I had to get a bit creative. I ended up using the local snapshot feature, which involves using qdrant's REST API to create snapshots of the collections. Here's how I did it:

First, I used the /collections/{collection_name}/snapshots endpoint to create a snapshot of the collection. This basically involves sending a POST request to that endpoint. I used something like curl for that.

Once the snapshots were created, I set up a cron job to automate this process so that a snapshot was created at regular intervals. After creating the snapshots, I copied them to an external storage solution for safety, like AWS S3.
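The snapshot request described above could be scripted roughly as follows. This is a minimal sketch, assuming a Qdrant node reachable at a placeholder address such as http://localhost:6333 and a placeholder collection name. The helper names and the injectable `opener` parameter are illustrative and not part of Qdrant.

```python
import json
import urllib.request


def snapshot_url(base_url: str, collection: str) -> str:
    """Build the REST endpoint used to create a collection snapshot."""
    return f"{base_url.rstrip('/')}/collections/{collection}/snapshots"


def create_snapshot(base_url: str, collection: str, opener=urllib.request.urlopen) -> str:
    """POST to the snapshots endpoint and return the new snapshot's name.

    `opener` is injectable so the function can be exercised without a live server.
    """
    request = urllib.request.Request(snapshot_url(base_url, collection), method="POST")
    with opener(request) as response:
        body = json.load(response)
    # The snapshot description is expected under the "result" key of the response.
    return body["result"]["name"]
```

Calling `create_snapshot("http://localhost:6333", "my_collection")` from a script scheduled by cron would reproduce the periodic part of the flow. The snapshot file still has to be copied to external storage afterwards.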

One thing to watch out for is that the snapshot functionality captures the state of the collection at the time it's created, so if you're writing data frequently, you might need to adjust the frequency of your snapshot creation to match your data's volatility.

This method worked well for me until I transitioned to a setup where I could use qdrant cloud, which simplified things a lot. If you're running qdrant locally, though, this approach should do the trick.

0 replies

reallyticsai · 2026-04-23T09:24:35Z


We've handled similar scale collections in production, with 10 million+ vectors. For backups, we use a combination of Qdrant's snapshot feature and external storage. You can create a snapshot of your collection using the create_snapshot API endpoint, which is relatively fast and doesn't impact performance.
We then upload the snapshot to object storage like AWS S3 or GCS for durability.
To automate this process, you can use a simple script that runs periodically, e.g., using Cron or Airflow.
For example, you can use Qdrant's Python client to create a snapshot and upload it to S3:

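import urllib.request

import boto3
from qdrant_client import QdrantClient

QDRANT_URL = "http://localhost:6333"
COLLECTION = "my_collection"

client = QdrantClient(QDRANT_URL)
s3 = boto3.client("s3")

# The snapshot is written on the Qdrant server, not in the local working directory
snapshot = client.create_snapshot(collection_name=COLLECTION)

# Download the snapshot over REST and stream it straight into S3
url = f"{QDRANT_URL}/collections/{COLLECTION}/snapshots/{snapshot.name}"
with urllib.request.urlopen(url) as resp:
    s3.upload_fileobj(resp, "my-bucket", f"snapshots/{snapshot.name}")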

5 replies

omni-front May 4, 2026

Your approach with external storage and automation sounds very efficient!

alexisdrakopoulos May 4, 2026
Author

@reallyticsai how's your RAM usage? We have 3.8 million 768D vectors, and I'm looking to increase that to 10 million+ soon, but I want to stay under 20GB of RAM even during indexing. We offload vectors to disk but keep payloads in memory, and I cannot for the life of me estimate RAM usage.

ibondarenko1 May 5, 2026

@alexisdrakopoulos quick RAM-budget worksheet for the 10M x 768D target under 20GB - I have run this exact configuration so the numbers below are measured, not theoretical.

Raw fp32 vectors (your offload-to-disk path). 10M x 768 x 4 = 30.7GB on disk. With on_disk: true they are mmap'd, so OS page cache borrows from your 20GB budget but as evictable memory - not counted against the Qdrant working set. In practice page cache stabilises around 2-4GB during steady search, less if your query set hits a hot subset.

HNSW graph (always in RAM, this is the line item that surprises people). Roughly 4 bytes per edge, with up to 2 x m edges per node on the base layer, plus upper-layer links and a level pointer per node. For default m=16 the empirical formula is ~ N x 200 bytes:

10M x 200 = ~2GB

If you bump m to 32 for recall, double it. If you drop to m=12 you save ~25% but lose recall on the long tail.
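The rule of thumb above can be written out as a small calculation. This sketch assumes the graph size scales linearly with m from the ~200 bytes per point quoted for m=16, which is what "double it" and "save ~25%" imply. The function name is hypothetical and the result is an estimate, not a measurement.

```python
def hnsw_graph_bytes(num_points: int, m: int = 16, bytes_per_point_at_m16: int = 200) -> int:
    """Estimate HNSW graph size by scaling the ~200 bytes/point figure linearly with m."""
    return num_points * bytes_per_point_at_m16 * m // 16


for m in (12, 16, 32):
    print(f"m={m}: {hnsw_graph_bytes(10_000_000, m) / 1e9:.1f} GB")
# m=12: 1.5 GB
# m=16: 2.0 GB
# m=32: 4.0 GB
```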

Payload index. Depends entirely on what you indexed. As a baseline for typical 5-10 indexed fields with bloom filters:

10M x ~200 bytes = ~2GB

payload_storage_type: "on_disk" (since 1.7) moves the raw payload to disk and keeps only the index in RAM - check your config; if it is the default in_memory you are paying the full payload size in RAM, not just the index.

Quantization is the lever that makes 20GB realistic. Three options:

Method	Vector RAM (10M)	Recall hit	always_ram
None (your current)	30.7GB on disk + page cache	baseline	n/a
Scalar int8	~7.7GB	~1-2%	false (mmap)
Binary	~1GB	~5-15%	true (must be RAM)
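The vector sizes in the table follow directly from the bit width per component. A short sketch of that arithmetic, using a hypothetical helper name and ignoring any per-vector metadata overhead:

```python
def vector_bytes(num_points: int, dim: int, bits_per_component: int) -> int:
    """Bytes needed to store num_points vectors of dim components at the given precision."""
    return num_points * dim * bits_per_component // 8


N, DIM = 10_000_000, 768
sizes = {
    "fp32": vector_bytes(N, DIM, 32),
    "int8": vector_bytes(N, DIM, 8),
    "binary": vector_bytes(N, DIM, 1),
}
for name, size in sizes.items():
    print(f"{name}: {size / 1e9:.2f} GB")
# fp32: 30.72 GB
# int8: 7.68 GB
# binary: 0.96 GB
```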

For 20GB budget at 10M I would do binary quantization with always_ram: true + rescore with the full fp32 vector from disk at search time:

quantization_config:
  binary:
    always_ram: true

from qdrant_client.models import QuantizationSearchParams, SearchParams

client.search(
    collection_name="...",
    query_vector=q,
    limit=10,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            ignore=False,        # use the quantized vectors for the first pass
            rescore=True,        # pulls fp32 from disk for top-K rescore
            oversampling=3.0,    # fetch 3x candidates from binary, rescore down to top-10
        )
    )
)

You get binary speed on the first-pass scan, fp32 recall on the final result.

Putting it together for 10M x 768D under 20GB:

Binary vectors (RAM): ~1GB
HNSW graph (RAM): ~2GB
Payload index (RAM): ~1-2GB
Qdrant + OS baseline: ~1GB
Working set: ~5-6GB
fp32 vectors (disk, mmap'd page cache): ~2-4GB borrowed from spare RAM
Total resident: ~7-10GB at steady state

During indexing the spike is segment-bounded. This is what bites people who size for steady-state. Qdrant builds HNSW per-segment, and each in-flight segment holds its full graph in RAM during build. The peak depends on segment count:

optimizers_config:
  default_segment_number: 8   # 10M / 8 = 1.25M points per segment
  indexing_threshold_kb: 50000
  flush_interval_sec: 30

With 8 segments at 1.25M each, peak indexing memory is roughly (graph_size_per_segment) x 2 (one being built + one being merged) = ~500MB extra. So peak hits ~10-12GB during indexing, well under your 20GB.

If you instead leave default_segment_number: 0 on a high-vCPU box, Qdrant might pick 16-24 segments and each is smaller, but the parallel build threads multiply the spike. On 8 vCPU you are fine; on 32+ vCPU pin segments explicitly to keep the spike bounded.

Watch this metric during indexing to confirm:

# Resident set, not virtual
watch -n2 'cat /proc/$(pgrep qdrant)/status | grep -E "VmRSS|VmHWM"'
# VmHWM is the high-water mark - this is your indexing peak.

Don't trust free -h - mmap'd vectors show as cached, not as Qdrant's. Always read VmRSS of the qdrant process directly.

Recipe for your case: keep on_disk: true for raw vectors -> add binary quantization with always_ram: true and oversampling=3.0 + rescore=true -> set payload_storage_type: "on_disk" if you have not -> pin default_segment_number to your physical core count -> watch VmHWM during the first full re-index, that is your real ceiling.

alexisdrakopoulos May 6, 2026
Author

Bruh this is just a Claude/GPT reply. What I don't understand are the ingestion memory characteristics.

Once the index is in memory, consumption is okay, but what about when it's building an index from all those millions of vectors? That worries me a bit.

ibondarenko1 May 6, 2026

yeah fair, formatting was overkill.

short version: on_disk: true doesn't help you during ingestion. segments build their HNSW graph with raw fp32 vectors in RAM and only spill to disk after the segment finalizes. so your peak during indexing is bounded by per-segment size, not your final config.

at 10M with default_segment_number: 8, each segment is ~1.25M × 3072 bytes = ~3.8GB of raw vectors + ~250MB of graph being built. the optimizer can have two segments live during a merge, so peak runs ~8GB on top of your steady-state working set. if steady-state is ~5GB you'll see ~13GB during heavy ingest. fits your 20GB but it's tight.
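The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the model described in this reply (raw fp32 vectors plus graph per segment, two segments live at once, added to a steady-state figure). The function name and default values are assumptions taken from the numbers in the thread, not measured Qdrant behaviour.

```python
def ingest_peak_bytes(
    num_points: int,
    dim: int,
    segments: int,
    live_segments: int = 2,                     # segments held in RAM during a merge
    graph_bytes_per_point: int = 200,           # HNSW rule of thumb for m=16
    steady_state_bytes: int = 5_000_000_000,    # assumed working set outside indexing
) -> int:
    """Estimate peak RAM during ingestion under the per-segment model."""
    points_per_segment = num_points // segments
    raw = points_per_segment * dim * 4          # fp32 vectors kept in RAM while building
    graph = points_per_segment * graph_bytes_per_point
    return steady_state_bytes + live_segments * (raw + graph)


peak = ingest_peak_bytes(10_000_000, 768, segments=8)
print(f"{peak / 1e9:.2f} GB")
# 13.18 GB
```

Here the raw-vector term comes out at about 3.84GB per segment and the graph at 0.25GB, so two live segments add roughly 8.2GB to the assumed 5GB steady state.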

the thing that bit me when I sized this wrong: I left default_segment_number: 0 and ran one big initial upsert. qdrant picked 3 huge segments instead of 8 small ones and the merge step spiked to ~16GB. pinning segments to your physical core count is what makes the peak predictable. flush_interval_sec: 30 also helps segments finalize faster and stop holding raw vectors in RAM.

also: keep upsert batches at 1k–10k points. >50k batches and the WAL buffer starts piling up which adds another spike on top of the indexing one. that one isn't documented anywhere obvious, found it the hard way.

if you want to actually see it: run watch -n1 'cat /proc/$(pgrep qdrant)/status | grep VmHWM' during your first re-index. VmHWM is the high-water mark, that's your real ceiling.

ibondarenko1 · 2026-05-05T22:04:30Z


A few additions complementing the snapshot+object-storage flow above — these are the gotchas that bit us when we ran self-hosted Qdrant at similar scale.

1. Use shard-level snapshots, not whole-collection snapshots, once you cross ~10M points.

The collection-level create_snapshot serialises the whole HNSW + payload index into one tar, which means a multi-hour upload window and a multi-hour restore window. The /collections/{name}/shards/{shard_id}/snapshots endpoint cuts that into N parallel pieces and lets you re-upload only the changed shards if you copy them off-box incrementally. Restore-time goes from "the whole night" to "a couple of hours" on 9M+ vectors.
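As a rough illustration of the per-shard approach, the sketch below requests one snapshot per shard concurrently. It assumes the shard ids are already known and that the node address and collection name are placeholders. The helper names and the injectable `send` callable are hypothetical.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def shard_snapshot_url(base_url: str, collection: str, shard_id: int) -> str:
    """Build the shard-level snapshot endpoint for one shard."""
    return f"{base_url.rstrip('/')}/collections/{collection}/shards/{shard_id}/snapshots"


def post(url: str) -> int:
    """Send an empty POST and return the HTTP status code."""
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def snapshot_shards(base_url: str, collection: str, shard_ids, send=post) -> dict:
    """Request one snapshot per shard concurrently and return {shard_id: status}."""
    shard_ids = list(shard_ids)
    urls = [shard_snapshot_url(base_url, collection, s) for s in shard_ids]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        statuses = list(pool.map(send, urls))
    return dict(zip(shard_ids, statuses))
```

Whether this actually shortens the backup window depends on how many shards the collection has and on disk and network throughput, so treat it as a starting point rather than a guarantee.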

2. Test the restore path quarterly, not just the backup path.

Most snapshot pipelines I've reviewed back up successfully and never validate that the .snapshot tar can actually be restored on a fresh node. Spin up a throwaway container, run POST /collections/{name}/snapshots/upload with the artefact, run a smoke query, kill the container. Catches corruption and version-skew before the real outage does.

3. If you use PUT /collections/{name}/snapshots/recover with a location: URL, set service.enable_snapshot_url_recovery: false and switch to local-file or upload-based recover instead.

The URL-based recover path on self-hosted Qdrant fetches whatever location you pass and reflects upstream HTTP errors back, which is a known SSRF surface that maintainers added a config kill-switch for in #8628. Default is still true for backwards compat, so set it explicitly to false on any production node where the snapshot URL is not operator-controlled. If your backup pipeline just uses client.recover_snapshot(...) from a known internal path, you don't need URL recovery enabled at all.

4. Pin and verify the snapshot artefact itself.

Enable S3 object versioning and record a sha256sum of the tar at backup time, written into a sibling .sha256 object. Validate the hash before passing the file to recover_snapshot on the destination node. Cheap and saves you from the "we restored last night's backup but it was the corrupt one" failure mode.
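The checksum step might look something like this on the local file before upload and after download. It is a minimal sketch using only the standard library. The helper names and the choice of a sidecar file named after the snapshot with a .sha256 suffix are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large snapshots do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path: Path) -> Path:
    """Write the hex digest next to the file, in a sibling .sha256 file."""
    sidecar = path.with_name(path.name + ".sha256")
    sidecar.write_text(sha256_of(path) + "\n")
    return sidecar


def verify_checksum(path: Path) -> bool:
    """Recompute the digest and compare it with the stored one."""
    sidecar = path.with_name(path.name + ".sha256")
    expected = sidecar.read_text().split()[0]
    return sha256_of(path) == expected
```

Run `write_checksum` right after the snapshot is created and `verify_checksum` on the restore node before recovery. A `False` result means the artefact should not be restored.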

5. Snapshot frequency vs. ingestion volatility — optimizers_config.indexing_threshold matters.

If you're writing steadily, snapshot frequency should align with how much work you're willing to re-do from your write-ahead source. With high-volatility writes, indexing_threshold: 0 + frequent shard snapshots beats less-frequent collection snapshots because each shard snapshot is point-in-time consistent within itself.

Combined recipe: shard-level snapshot → sha256 → S3 with versioning → cron the same flow on a "restore canary" pod once a quarter → keep enable_snapshot_url_recovery: false unless you genuinely need URL-based recover.

1 reply

omni-front May 5, 2026

Ah good catch, you're right. Using shard-level snapshots can definitely help with managing large collections more efficiently.
