feat(sparql-qlever): support JSON-LD and zipped distributions by ddeboer · Pull Request #397 · ldelements/lde

ddeboer · 2026-05-21T11:36:05Z

Refs netwerk-digitaal-erfgoed/dataset-knowledge-graph#284 — the consumer-side change that takes advantage of this lives in that repo.

Summary

Extends @lde/sparql-qlever so the QLever importer can ingest:

- JSON-LD distributions (plain, gzipped, or zipped), converted to N-Quads in Node before qlever-index runs.
- Zip-compressed RDF distributions: dcat:compressFormat=application/zip with a known inner dcat:mediaType. The inner mediaType drives which zip entries are accepted (application/ld+json, application/n-triples, application/n-quads).

Standalone dcat:mediaType=application/zip is intentionally not accepted: the inner RDF format must be declared so we know what to expect inside the archive. Publishers should declare e.g. application/ld+json+zip (which the dataset-register normalizes to mediaType=application/ld+json + compressFormat=application/zip).
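The acceptance rule above can be sketched as a small decision function. This is a minimal illustration, not the importer's actual code: `classifyZip` and `ZipDecision` are hypothetical names, and the function assumes the compression media type has already been reduced to a plain string such as `application/zip`.

```typescript
// Media types named in the PR text as acceptable inside a zip archive.
const zipInnerMediaTypes = new Set([
  'application/ld+json',
  'application/n-triples',
  'application/n-quads',
]);

type ZipDecision =
  | { kind: 'accept'; innerMediaType: string }
  | { kind: 'reject'; reason: string }
  | { kind: 'not-zip' };

// Hypothetical helper (assumption): decides how a distribution relates to zip handling.
function classifyZip(mediaType: string, compressMimeType?: string): ZipDecision {
  // A bare application/zip mediaType says nothing about what is inside the archive.
  if (mediaType === 'application/zip') {
    return { kind: 'reject', reason: 'inner RDF format is not declared' };
  }
  // Anything not compressed as zip is outside the scope of this rule.
  if (compressMimeType !== 'application/zip') {
    return { kind: 'not-zip' };
  }
  // Zip with a declared inner format: only the known RDF formats are accepted.
  if (!zipInnerMediaTypes.has(mediaType)) {
    return { kind: 'reject', reason: `unsupported inner media type: ${mediaType}` };
  }
  return { kind: 'accept', innerMediaType: mediaType };
}
```

With this shape, `classifyZip('application/ld+json', 'application/zip')` is accepted, while `classifyZip('application/zip')` is rejected because no inner format is declared.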

Changes

@lde/sparql-qlever
- New preprocess.ts module: JSON-LD → N-Quads via jsonld; zip extraction via yauzl; gzip fallback by file extension when compressFormat is missing; mtime-based output caching.
- Importer.import() now sorts distributions by preference — native formats first, JSON-LD last — so e.g. an nq distribution is tried before a ld+json one.
- Importer.doImport() dispatches through the preprocessor only when needed; the existing gunzip -c | qlever-index path is untouched for plain and gzipped native formats.
- New deps: jsonld, yauzl (+ types).
@lde/dataset
- Add Distribution.compressMimeType getter (strips the IANA prefix from compressFormat).
@lde/distribution-probe
- Add application/zip to compressionTypes so a zip Content-Type no longer raises a format-mismatch warning.
Tests: 12 new unit tests covering preference ordering, JSON-LD conversion, zip extraction, inner-mediaType validation, and the standalone-zip rejection.
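The preference ordering described for Importer.import() could look roughly like the sketch below. It is an illustration under assumptions: the exact order among the native formats is a guess (the PR only states that native formats come first and JSON-LD last), and `DistributionLike` and `sortByPreference` are hypothetical names.

```typescript
// Ordered from most to least preferred. The relative order of the native
// formats is an assumption; the PR only says JSON-LD comes last.
const acceptedMediaTypes: readonly string[] = [
  'application/n-triples',
  'application/n-quads',
  'text/turtle',
  'application/ld+json',
];

// Hypothetical minimal shape of a distribution for this sketch.
interface DistributionLike {
  mediaType: string;
  accessUrl: string;
}

// Returns a new array holding only accepted distributions, most preferred first.
function sortByPreference<T extends DistributionLike>(distributions: readonly T[]): T[] {
  const rank = (d: T): number => acceptedMediaTypes.indexOf(d.mediaType);
  return distributions
    .filter((d) => rank(d) !== -1) // drop unsupported media types
    .sort((a, b) => rank(a) - rank(b)); // filter() returned a copy, so the input is not mutated
}
```

Using one ordered list for both "is this accepted" and "how preferred is it" keeps the two concerns from drifting apart.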

- Add JSON-LD support via in-Node preprocessing to N-Quads (jsonld lib).
- Add zip extraction when compressFormat=application/zip (yauzl), with the inner mediaType driving entry filtering. Standalone application/zip distributions are rejected: the inner format must be declared.
- Sort distributions in Importer.import() to prefer native QLever formats (nt/nq/ttl) over JSON-LD; the preprocessor is only invoked when needed.
- Add compressMimeType getter on Distribution that strips the IANA prefix.
- Treat application/zip as a compression Content-Type in distribution-probe so a zip Content-Type no longer raises a format-mismatch warning.
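A getter that strips the IANA prefix might be sketched as follows. The prefix value is an assumption based on the IANA media type URIs commonly used for dcat:compressFormat; `DistributionSketch` is a hypothetical stand-in and not the real Distribution class from @lde/dataset.

```typescript
// Assumed IANA media type URI prefixes (both schemes, to be lenient).
const ianaPrefixes = [
  'https://www.iana.org/assignments/media-types/',
  'http://www.iana.org/assignments/media-types/',
];

// Hypothetical stand-in for the real Distribution model.
class DistributionSketch {
  readonly mediaType: string;
  readonly compressFormat?: string;

  constructor(mediaType: string, compressFormat?: string) {
    this.mediaType = mediaType;
    this.compressFormat = compressFormat;
  }

  // Returns e.g. 'application/zip' for an IANA URI, or the value unchanged
  // when it carries no known prefix.
  get compressMimeType(): string | undefined {
    const value = this.compressFormat;
    if (value === undefined) return undefined;
    const prefix = ianaPrefixes.find((p) => value.startsWith(p));
    return prefix === undefined ? value : value.slice(prefix.length);
  }
}
```

Callers can then compare against plain media type strings without caring whether the source data used the full URI form.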

…paths

…or native zips

- Replace the in-memory jsonld lib with jsonld-streaming-parser + n3.StreamWriter so JSON-LD documents flow through the pipeline as a stream and memory use stays bounded for large distributions.
- Restrict Node-side preprocessing to JSON-LD only. Native RDF (nt/nq/ttl) in a zip container is now handled by the shell pipeline via unzip -p, which is already available in the QLever Docker image.
- Add application/zip to the importer's compressionTypes set so a server returning that Content-Type doesn't get flagged as a format mismatch.
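The split between Node-side preprocessing and the shell pipeline can be illustrated with a sketch like the one below. It only builds the decompression half of the command line; the `cat` fallback for uncompressed files and the helper names are assumptions, and no qlever-index arguments are shown because the PR does not list them.

```typescript
// Quote a value for safe use as a single POSIX shell argument.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: picks the command that streams a file's decompressed
// contents to stdout, ready to be piped into the indexer.
function decompressCommand(file: string, compressMimeType?: string): string {
  switch (compressMimeType) {
    case 'application/gzip':
      return `gunzip -c ${shellQuote(file)}`;
    case 'application/zip':
      return `unzip -p ${shellQuote(file)}`;
    default:
      // Assumption: uncompressed files are simply streamed as-is.
      return `cat ${shellQuote(file)}`;
  }
}

// Per this commit, only JSON-LD goes through Node-side preprocessing;
// native RDF stays in the shell pipeline.
function needsNodePreprocessing(mediaType: string): boolean {
  return mediaType === 'application/ld+json';
}
```

For example, a zipped N-Quads file would yield `unzip -p 'data.nq.zip'`, while any JSON-LD distribution is routed to the preprocessor first.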

…pressionMediaTypes

Code-review follow-ups on the JSON-LD/zip support:

- Switch JSON-LD preprocessing to rdf-parse + rdf-serialize, matching the stack @lde/fastify-rdf already uses. Drops the direct deps on jsonld-streaming-parser and n3 (still present transitively via rdf-parse) and leaves the preprocessor format-agnostic for future formats.
- Lift the compression-content-type set into @lde/dataset as compressionMediaTypes and reuse it from the importer and the probe.
- Collapse the importer's supportedFormats Set + preferenceOrder Record into a single ordered acceptedMediaTypes list.
- preprocess.ts: open the output writable once per call and use a PassThrough tap to keep it open across zip entries; close the yauzl handle in a finally; tighten the mtime cache check to strict greater-than.
- Hoist basename(file) in index() and drop the unused PreprocessedFormat type export.
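A strict greater-than mtime cache check might look like the sketch below. The direction of the comparison (the cached output must be strictly newer than its source) is an assumption about what the commit means, and `isCacheFresh` and `MtimeLookup` are hypothetical names.

```typescript
import { statSync } from 'node:fs';

// Returns a file's modification time in milliseconds, or undefined if it is missing.
type MtimeLookup = (path: string) => number | undefined;

const mtimeOf: MtimeLookup = (path) => {
  try {
    return statSync(path).mtimeMs;
  } catch {
    return undefined; // file does not exist or is not readable
  }
};

// Hypothetical helper: the preprocessed output is reused only when it is
// strictly newer than its source. With equal timestamps the output may have
// been written before the source changed within the same clock tick, so it
// is treated as stale.
function isCacheFresh(source: string, output: string, lookup: MtimeLookup = mtimeOf): boolean {
  const sourceMtime = lookup(source);
  const outputMtime = lookup(output);
  if (sourceMtime === undefined || outputMtime === undefined) return false;
  return outputMtime > sourceMtime;
}
```

The lookup is injectable so the comparison can be exercised without touching the file system.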

…ough (#292)

Refs #284. Companion of ldelements/lde#397, which teaches `@lde/sparql-qlever` to handle JSON-LD and zip-compressed distributions. This PR is the consumer-side change here in the DKG.

## Changes

- Emit `dcat:compressFormat` in the CONSTRUCT so the LDE `Distribution` model receives the compression info it now uses to decide between `gunzip -c` (gzip), `unzip -p` (zip) and JSON-LD preprocessing.
- `OPTIONAL { ?distribution dcat:compressFormat ?distribution_compressFormat }` added to the WHERE.
- Drop the `application/{ld+json,n-quads,n-triples}+gzip` and `text/turtle+gzip` lines from the `FILTER`: the dataset register normalizes those suffixes into a separate `dcat:compressFormat` during ingestion, so those values never appear in `?distribution_mediaType`. The lines were dead.

## What it unlocks

Now selectable end-to-end:

- Gzipped JSON-LD, e.g. Nijmegen `LOD+Beelddocumenten.jsonld.gz` (`mediaType=application/ld+json` + `compressFormat=application/gzip`).
- Plain JSON-LD: already in the FILTER, now actually processable.
- Zipped JSON-LD when the publisher declares the inner format, i.e. `encodingFormat=application/ld+json+zip` on the schema.org side, which the register splits into `mediaType=application/ld+json` + `compressFormat=application/zip`.

## What it does NOT unlock

The Verhaal van Utrecht (`vvu_verhalen`) distribution mentioned in #284 declares `encodingFormat=application/zip` alone, with the inner format only in a free-text `description`. The pipeline intentionally rejects that: without a declared inner mediaType we can't safely process the archive. The publisher needs to update their schema.org to `application/ld+json+zip`.
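The suffix normalization attributed to the dataset register can be illustrated with a small sketch. This is not the register's actual implementation: `splitEncodingFormat` and `NormalizedFormat` are hypothetical names, and only the `+gzip` and `+zip` suffixes mentioned in the text are handled.

```typescript
// Compression suffixes mentioned in the PR text and the media type each maps to.
const compressionSuffixes: ReadonlyArray<[string, string]> = [
  ['+gzip', 'application/gzip'],
  ['+zip', 'application/zip'],
];

interface NormalizedFormat {
  mediaType: string;
  compressFormat?: string;
}

// Hypothetical helper: splits a schema.org encodingFormat such as
// 'application/ld+json+zip' into an inner media type and a compression type.
function splitEncodingFormat(encodingFormat: string): NormalizedFormat {
  for (const [suffix, compressFormat] of compressionSuffixes) {
    if (encodingFormat.endsWith(suffix)) {
      return {
        mediaType: encodingFormat.slice(0, -suffix.length),
        compressFormat,
      };
    }
  }
  // No compression suffix: a bare 'application/zip' lands here, which is why
  // its inner format stays unknown downstream.
  return { mediaType: encodingFormat };
}
```

Note that `application/zip` on its own has no `+zip` suffix, so it comes out with no compressFormat and no usable inner media type, matching the rejection described above.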

ddeboer added 6 commits May 21, 2026 13:35

test(sparql-qlever): cover importer JSON-LD dispatch and zip warning …

dadc2c7

…paths

test(sparql-qlever): cover gzipped JSON-LD and not-need-preprocess guard

47e2ceb

docs(sparql-qlever): update preprocess doc to reference rdf-parse

d63e5a7

ddeboer enabled auto-merge (rebase) May 21, 2026 13:02

refactor(sparql-qlever): use stream.pipe + finished for JSON-LD pipeline

659c9e7

ddeboer merged commit 260757d into main May 21, 2026
2 checks passed

ddeboer deleted the feat/zip-jsonld-distributions branch May 21, 2026 13:08

ddeboer mentioned this pull request May 22, 2026

feat: select JSON-LD/zipped distributions and pass compressFormat through netwerk-digitaal-erfgoed/dataset-knowledge-graph#292

Merged

ddeboer merged 7 commits into main from feat/zip-jsonld-distributions
