feat(pipeline): add deduplicate option to SparqlConstructExecutor #309

Merged
ddeboer merged 2 commits into main from worktree-deduplicate-construct on Mar 24, 2026

ddeboer commented on Mar 24, 2026

Summary

  • Add opt-in deduplicate option to SparqlConstructExecutor that removes duplicate triples from CONSTRUCT output streams using a streaming string-based identity set
  • Export standalone deduplicateQuads() async generator for use outside the executor
  • Quad identity follows rdf-string serialisation conventions (same approach as Comunica's distinctConstruct): exact identity, no hash collisions
  • Dedup set is scoped per execute() call, keeping memory bounded to the number of unique quads per batch
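
The streaming approach described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `Term` and `Quad` interfaces are simplified stand-ins for the RDF/JS types, `termKey` only approximates rdf-string's serialisation, and the names `iri`, `lit`, `quadKey` and `deduplicateQuadsSketch` are hypothetical.

```typescript
// Simplified stand-ins for RDF/JS terms (assumption: real code would use @rdfjs/types).
type TermType = 'NamedNode' | 'BlankNode' | 'Literal' | 'DefaultGraph';

interface Term {
  termType: TermType;
  value: string;
  language?: string;
  datatype?: string; // datatype IRI, kept as a plain string in this sketch
}

interface Quad {
  subject: Term;
  predicate: Term;
  object: Term;
  graph: Term;
}

// Hypothetical factories, used only to build example data.
const iri = (value: string): Term => ({ termType: 'NamedNode', value });
const lit = (value: string, language = '', datatype = ''): Term =>
  ({ termType: 'Literal', value, language, datatype });
const DEFAULT_GRAPH: Term = { termType: 'DefaultGraph', value: '' };

// Approximates rdf-string conventions: IRIs as-is, blank nodes as _:label,
// literals quoted with an optional @language or ^^datatype suffix.
function termKey(term: Term): string {
  switch (term.termType) {
    case 'BlankNode':
      return `_:${term.value}`;
    case 'Literal': {
      const quoted = JSON.stringify(term.value);
      if (term.language) return `${quoted}@${term.language}`;
      return term.datatype ? `${quoted}^^${term.datatype}` : quoted;
    }
    case 'DefaultGraph':
      return '';
    default:
      return term.value;
  }
}

// The key is the full serialisation rather than a hash, so two quads share
// a key only if all four of their terms serialise identically.
function quadKey(quad: Quad): string {
  return JSON.stringify([
    termKey(quad.subject),
    termKey(quad.predicate),
    termKey(quad.object),
    termKey(quad.graph),
  ]);
}

// Yields each distinct quad once, in first-seen order. The Set lives inside
// the generator, so every invocation starts with an empty set.
async function* deduplicateQuadsSketch(
  quads: AsyncIterable<Quad>,
): AsyncGenerator<Quad> {
  const seen = new Set<string>();
  for await (const quad of quads) {
    const key = quadKey(quad);
    if (!seen.has(key)) {
      seen.add(key);
      yield quad;
    }
  }
}
```

Because the set is created per invocation, memory grows with the number of distinct quads in one stream and is released when that stream ends. The trade-off is that duplicates spread across separate invocations are not detected.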

Fix #307

ddeboer force-pushed the worktree-deduplicate-construct branch from e8aebf4 to 76a6886 on March 24, 2026 18:21
ddeboer enabled auto-merge (squash) on March 24, 2026 18:22
ddeboer added 2 commits March 24, 2026 19:22
- Add opt-in 'deduplicate' option that removes duplicate triples from
  CONSTRUCT output using a streaming string-based identity set
- Export standalone deduplicateQuads() for use outside the executor
- Use rdf-string serialisation conventions (same approach as Comunica's
  distinctConstruct) for exact quad identity without hash collisions
- Dedup set scoped per execute() call, keeping memory bounded to batch
- Document the option in README with usage example
ddeboer force-pushed the worktree-deduplicate-construct branch from 76a6886 to a824f5e on March 24, 2026 18:24
ddeboer merged commit 4688d86 into main on Mar 24, 2026
2 checks passed
ddeboer deleted the worktree-deduplicate-construct branch on March 24, 2026 18:26

Development

Successfully merging this pull request may close these issues.

Deduplicate triples in CONSTRUCT output stream