Conversation
Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
|
@copilot Review und Code Audit. Wurden alle Aufgaben des issue erfüllt? |
…8 tests Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
Audit complete. Here's the status of all 8 acceptance criteria:
Two gaps fixed in commit AC-4: AC-8: No overhead test existed. Added opt-in test |
There was a problem hiding this comment.
Pull request overview
Adds opt-in performance tests and a dedicated CI workflow for the perceptual-hashing content deduplication feature, and wires an enable_deduplication per-ingest config flag into ContentManager::ingestRawBlob() to support per-collection policy gating.
Changes:
- Add AC-4-focused correctness tests plus opt-in perf tests for pHash, MinHash+LSH lookup, and dedup overhead.
- Gate perceptual dedup check/registration in
ContentManager::ingestRawBlob()usingconfig["enable_deduplication"]. - Introduce a GitHub Actions workflow to build and run the focused content deduplication tests on relevant changes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tests/test_content_deduplication.cpp |
Adds AC-4 gate tests and opt-in perf benchmarks for perceptual dedup components. |
src/content/content_manager.cpp |
Adds a per-ingest enable_deduplication gate for perceptual dedup check + post-store registration. |
.github/workflows/content-dedup-perceptual-hashing-ci.yml |
New CI workflow to run focused test suite for perceptual dedup changes. |
You can also share your feedback on Copilot code review. Take the survey.
| // ---- Perceptual deduplication (opt-in via ContentPolicy::enable_deduplication) ---- | ||
| // Callers pass `config["enable_deduplication"] = policy.enable_deduplication`. | ||
| // When the key is absent, falls back to the ProcessorChainConfig stage flag so | ||
| // that existing callers that do not set this key are unaffected. | ||
| // Compute pHash (image) or MinHash (text) once; reuse for both the duplicate | ||
| // check and the post-storage registration to avoid redundant computation. | ||
| const bool dedup_policy_enabled = config.value("enable_deduplication", stage_cfg.deduplication.enabled); | ||
| const bool dedup_is_image = (category == ContentCategory::IMAGE); | ||
| const bool dedup_is_text = (category == ContentCategory::TEXT); | ||
| std::string cached_phash; | ||
| std::vector<uint32_t> cached_minhash; | ||
|
|
||
| if (stage_cfg.deduplication.enabled && dedup_checker_ && (dedup_is_image || dedup_is_text)) { | ||
| if (dedup_policy_enabled && dedup_checker_ && (dedup_is_image || dedup_is_text)) { | ||
| metrics_.dedup_checks_total.fetch_add(1); |
| /** | ||
| * @brief Verify that ContentPolicy::enable_deduplication defaults to false | ||
| * (no dedup key in config → falls back to ProcessorChainConfig stage default). | ||
| * When no checker is attached, no dedup happens regardless of config. | ||
| */ | ||
| TEST_F(ContentSHA256DedupTest, PerceptualDedupDefaultsToOffWithoutChecker) { | ||
| // No checker attached — dedup must never fire. | ||
| ASSERT_EQ(mgr_->getDeduplicationChecker(), nullptr); | ||
|
|
||
| const std::string blob = "some text content for default-off dedup test"; | ||
| auto res1 = mgr_->ingestRawBlob(blob, "a.txt", "text/plain"); | ||
| auto res2 = mgr_->ingestRawBlob(blob + " ", "b.txt", "text/plain"); | ||
| ASSERT_TRUE(res1.success); | ||
| ASSERT_TRUE(res2.success); | ||
| EXPECT_FALSE(res2.metadata.contains("duplicate_of")) | ||
| << "Dedup must not fire when no DeduplicationChecker is attached"; | ||
| } |
| ci-scope-classifier: | ||
| permissions: | ||
| contents: read | ||
| uses: ./.github/workflows/ci-scope-classifier.yml |
|
@copilot fix it |
…I workflow path Co-authored-by: makr-code <150588092+makr-code@users.noreply.github.com>
|
@copilot retry |
.github/workflows/content-dedup-perceptual-hashing-ci.ymlPerceptualDedupSkippedWhenPolicyDisabled,PerceptualDedupDefaultsToOffWhenKeyAbsentcontent_manager.cpp—enable_deduplicationdefault changed tofalse(opt-in); ANDed withstage_cfg.deduplication.enabledso stage config still acts as a global kill switchtest_content_deduplication.cpp—PerceptualDedupDefaultsToOffWhenKeyAbsentnow attaches a real checker, registers an image withenable_deduplication=true, then re-ingests without the key and verifies noduplicate_ofmetadata is producedci-scope-classifierpath to./.github/workflows/01-core/ci-scope-classifier.ymlOriginal prompt
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.