Content-fingerprint dedup on the write path (cheap hash dedup before mem0 extraction)

## Summary
Add a cheap, deterministic content-fingerprint check on the `add_memory` / `POST /api/v1/memories` write path so that re-submitting identical content is a no-op **before** we pay for an LLM fact-extraction call.

## Idea from OB1
OB1's [`recipes/content-fingerprint-dedup`](https://github.com/NateBJones-Projects/OB1/tree/main/recipes/content-fingerprint-dedup) solves "if you import the same ChatGPT export twice, you get double the rows." Its algorithm:
1. **Normalize** — lowercase, strip leading/trailing whitespace, collapse runs of whitespace to a single space.
2. **SHA-256** the normalized string → a deterministic 64-char hex fingerprint.
3. **Upsert** — `INSERT ... ON CONFLICT (content_fingerprint) DO UPDATE` (merge metadata on collision, insert otherwise), enforced by a unique index so no caller can bypass it.

Result: "Re-running any import produces 0 new rows for already-imported content."
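
A minimal Python sketch of steps 1 and 2 above. It assumes the normalized string is UTF-8 encoded before hashing (the recipe summary here does not name an encoding), and `content_fingerprint` is a hypothetical helper name, not something OB1 or memserv defines:

```python
import hashlib
import re

# Any run of whitespace (spaces, tabs, newlines) collapses to one space.
_WHITESPACE_RUN = re.compile(r"\s+")


def content_fingerprint(content: str) -> str:
    """Return a 64-char hex SHA-256 of the normalized content.

    Hypothetical helper: normalization follows the three steps listed
    above (lowercase, trim, collapse whitespace). UTF-8 is an assumption.
    """
    normalized = _WHITESPACE_RUN.sub(" ", content.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Inputs that differ only in case or whitespace layout, such as `"  Hello   World\n"` and `"hello world"`, map to the same fingerprint, while `"a b"` and `"ab"` do not.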

## Why it fits memserv
- mem0 already deduplicates semantically, but it does so by **invoking the LLM** (Claude Haiku) on every `add()`; see PRD §19 ("Fact-extraction cost scales with `add_memory` call volume"). A pre-extraction hash check skips that cost entirely for re-submits whose content is identical after normalization, which is exactly what happens during the bulk imports proposed in the import-toolkit issue and on webhook/n8n retries.
- It's a write-path optimization that doesn't touch the single-user / dual-protocol invariants.

## Proposed approach
- Compute a SHA-256 fingerprint of the normalized `content` (lowercase, trim, collapse whitespace, as in OB1) in `app/rest.py` / `app/mcp_server.py` (or a shared helper in `app/memory.py`).
- Store the fingerprint in the memory's mem0 `metadata` (e.g. `metadata["content_fp"]`) on `add`.
- On a new `add`, first check Qdrant for an existing payload with that fingerprint (a payload filter, no vector search needed) and short-circuit with the existing record if found.
- Make it opt-out via a flag/param for callers who deliberately want re-extraction.
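
The flow in the bullets above could look roughly like the sketch below. Everything here is illustrative: `add_memory_with_dedup`, `FingerprintIndex`, `InMemoryIndex`, and the `dedup` parameter are hypothetical names, and the in-memory index stands in for the Qdrant lookup so the example runs on its own.

```python
import hashlib
import re
from typing import Callable, Dict, Optional, Protocol


def content_fingerprint(content: str) -> str:
    # Hypothetical helper: lowercase, trim, collapse whitespace, SHA-256.
    normalized = re.sub(r"\s+", " ", content.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintIndex(Protocol):
    """Assumed interface; in memserv this would wrap a Qdrant payload filter."""

    def find(self, fp: str) -> Optional[dict]: ...
    def put(self, fp: str, record: dict) -> None: ...


class InMemoryIndex:
    """Stand-in for the vector store, used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._by_fp: Dict[str, dict] = {}

    def find(self, fp: str) -> Optional[dict]:
        return self._by_fp.get(fp)

    def put(self, fp: str, record: dict) -> None:
        self._by_fp[fp] = record


def add_memory_with_dedup(
    content: str,
    index: FingerprintIndex,
    extract_and_store: Callable[[str, dict], dict],
    dedup: bool = True,
) -> dict:
    """Short-circuit on a known fingerprint, otherwise run the costly add."""
    fp = content_fingerprint(content)
    if dedup:
        existing = index.find(fp)
        if existing is not None:
            # No LLM call: return the record stored on the first add.
            return {"deduplicated": True, "record": existing}
    # extract_and_store represents the existing mem0 add path (assumption);
    # the fingerprint travels in the metadata dict.
    record = extract_and_store(content, {"content_fp": fp})
    index.put(fp, record)
    return {"deduplicated": False, "record": record}
```

Passing `dedup=False` covers the opt-out case. Note that check-then-add is not atomic the way OB1's unique index is, so two concurrent identical writes could both reach extraction; that may be acceptable for a single-user service.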

## Notes / scope
We can't reuse OB1's Postgres `ON CONFLICT` mechanic directly (we're on Qdrant), but the normalize→hash→check pattern ports cleanly as a payload-filter lookup. Keep it single-user; the fingerprint lives in metadata, not a new table.
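
For illustration, the payload-filter lookup could be expressed as a request body for Qdrant's scroll endpoint (`POST /collections/{collection_name}/points/scroll`). This sketch assumes the fingerprint ends up as a top-level `content_fp` key in the point payload; how mem0 actually lays out `metadata` inside the payload should be verified first. `fingerprint_lookup_body` is a hypothetical helper.

```python
import json


def fingerprint_lookup_body(fp: str, key: str = "content_fp") -> dict:
    """Build a scroll request body that matches one exact payload value.

    Assumption: `key` is where the fingerprint lives in the Qdrant payload.
    No query vector is involved, so this is a pure filter lookup.
    """
    return {
        "filter": {"must": [{"key": key, "match": {"value": fp}}]},
        "limit": 1,             # one hit is enough to short-circuit
        "with_payload": True,   # return the stored record's payload
        "with_vector": False,   # vectors are not needed for dedup
    }


if __name__ == "__main__":
    print(json.dumps(fingerprint_lookup_body("0" * 64), indent=2))
```

If this lookup runs on every write, a keyword payload index on the fingerprint field would likely be worth adding so the filter does not scan the whole collection.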

Source: https://github.com/NateBJones-Projects/OB1

Content-fingerprint dedup on the write path (cheap hash dedup before mem0 extraction) #48
