Refactor/valid expanded json ld #44

Merged: zonotope merged 15 commits into main from refactor/valid-expanded-json-ld on Aug 14, 2025

Refactor/valid expanded json ld#44
zonotope merged 15 commits intomainfrom
refactor/valid-expanded-json-ld

Conversation

@zonotope (Contributor) commented Jul 8, 2025

This patch updates the expand output to be valid JSON-LD instead of our proprietary format, while retaining the keyword context functionality that allows namespaced keywords to be used as compact IRIs. It changes the :id, :value, :type, :list, and :language keyword keys found in the expanded output to @id, @value, @type, @list, and @language keys, respectively, and it also removes the :idx key from the expanded output while still reporting the current index when any errors occur.

This changes the output contract, so we should bump the library version to 2.0.0 if we choose to merge this patch.

zonotope added 7 commits July 7, 2025 15:47
This change updates the JSON-LD expansion implementation to produce
output that conforms to the W3C JSON-LD specification, replacing
the previous proprietary format with standard JSON-LD keywords.

Key changes:
- Use "@value" instead of ":value" for literal values
- Use "@id" instead of ":id" for node identifiers
- Use "@type" instead of ":type" for data types
- Use "@language" instead of ":language" for language tags
- Use "@list" instead of ":list" for ordered lists
- Remove proprietary ":idx" metadata from expanded output

The expansion now produces valid JSON-LD expanded form that can be
processed by other JSON-LD processors, while maintaining backward
compatibility for input formats (both compact JSON-LD and Clojure
keyword forms are still supported).

Example transformation:
Before: {:id "uri", "prop" [{:value "text", :type nil, :idx [...]}]}
After:  {"@id" "uri", "prop" [{"@value" "text"}]}

Updated corresponding tests to expect the new standard format.
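The key migration described in this commit can be sketched roughly as follows (an illustrative Python sketch, not the library's Clojure code; the helper name and the rule of dropping nil-valued keys are assumptions based on the before/after example above):

```python
# Hypothetical sketch of the migration: rename the proprietary keyword-style
# keys to standard JSON-LD "@" keywords, drop ":idx" metadata, and omit
# nil-valued entries (as in the ":type nil" example above).
KEY_MAP = {":id": "@id", ":value": "@value", ":type": "@type",
           ":list": "@list", ":language": "@language"}

def to_jsonld(node):
    """Recursively rename proprietary keys and strip ':idx' and nil values."""
    if isinstance(node, list):
        return [to_jsonld(n) for n in node]
    if isinstance(node, dict):
        return {KEY_MAP.get(k, k): to_jsonld(v)
                for k, v in node.items()
                if k != ":idx" and v is not None}
    return node
```

Applied to the example transformation above, this maps the "Before" shape to the "After" shape.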
- Remove unused binding parameters (idx) from parse-node-val methods
- Replace unnecessary str wrapping of string literals in error messages
- All methods now use underscore (_) for unused parameters to satisfy clj-kondo
- Fixes 6 linting warnings with no functional changes to the code
…rmat

- Update all test expectations from old format (:id, :idx, :type, :value) to JSON-LD format (@id, @type, @value, @graph)
- Remove :idx fields entirely as they are no longer part of the expanded output
- Change :json type to "@JSON" string to match JSON-LD specification
- Update graph tests to use "@graph" instead of :graph keyword
- All tests now pass with the refactored expansion that produces valid JSON-LD

This completes the migration from the proprietary data format to standard JSON-LD
expansion output, ensuring compatibility with JSON-LD processors and tools.
@zonotope zonotope requested review from a team and bplatz July 8, 2025 03:47
@zonotope zonotope self-assigned this Jul 8, 2025
@bplatz (Contributor) commented Jul 13, 2025

I'm curious about the motivation for this change, and the downstream impacts from it. There are other standards-compliant libraries out there that were passed up because of tradeoffs that were important, at least at the time. If the time and need have changed, then that may be the situation now.

The original tradeoffs were:
a) speed - keyword based lookups (although I'm not sure what real world difference would be at this point and perhaps minimal)
b) optimizations [speed again] - the :type-id made short work of nested contexts (a context map in a context map) and if I recall proved useful beyond that
c) metadata - the :idx key allows us to give specific user feedback exactly where issues we discover downstream originated from - something that is completely lost in standard expansion
d) clojure keywords for the main "special values" that make it easy to check for conditions in idiomatic clojure code instead of a bunch of string-based lookups

If these things are no longer needed (or no longer the case, e.g. it isn't actually any faster) then we should decide if we want to have a library that does the same thing other libraries already do instead of continuing to maintain this.

@zonotope (Contributor, Author) commented Jul 14, 2025

I think on balance we should make this change, but I don't feel particularly strongly about it. We've had a number of bugs related to our expanded data not being valid JSON-LD, but those bugs have been fixed.

The motivation for this change is that expanded data no longer being valid JSON-LD breaks JSON-LD semantics. We currently can't expand anything twice, for example. This makes the code base harder to work with, and I expect that in the future it will make it more difficult for new people to understand and work on the code base, because much of the pre-existing W3C documentation won't apply.

Off the top of my head, I remember bugs due to serializing expanded documents and treating them as JSON-LD upon de-serialization, transmitting expanded documents between servers and treating them as JSON-LD on the receiver side, and attempting to expand a document that was already expanded, which threw errors because it wasn't valid JSON-LD. In each of these cases the best fix was to either compact the document beforehand or avoid expanding it more than once, but that won't always be practical, especially if the document is small (so the compaction benefits matter less) or we're dealing with input from multiple paths.
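The double-expansion problem can be sketched with a simple validity check (an illustrative Python sketch, not the library's code; the predicate name is hypothetical): a document in the old proprietary expanded form fails because of its keyword-style keys, while standard expanded form passes and can safely be run through expansion again, since expansion of valid JSON-LD is idempotent.

```python
def looks_expanded(node):
    """Shallow structural check: a document in standard expanded form uses
    "@"-prefixed JSON-LD keywords and never keyword-style (":"-prefixed)
    keys, so re-expanding it is safe."""
    if isinstance(node, list):
        return all(looks_expanded(n) for n in node)
    if isinstance(node, dict):
        return all(not k.startswith(":") and looks_expanded(v)
                   for k, v in node.items())
    return True
```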

I'll respond to the tradeoffs you mentioned in line.

> a) speed - keyword based lookups (although I'm not sure what real world difference would be at this point and perhaps minimal)

I think the real-world differences here are minimal, if anything. The difference in speed between keyword-based and string-based lookups comes from the runtime being able to compare memory addresses for keywords, because they're interned, versus a character-by-character equality check for strings. Both comparisons are fast, but keyword checks are slightly faster since the equality check is constant-time instead of linear. This would matter slightly more if we converted arbitrary IRIs into keywords, but we only convert a limited set of IRIs, all of which happen to be short strings.
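The interned-vs-scanned comparison above has a rough Python analogy (an illustrative assumption, not the library's code; Clojure keywords behave like interned strings here):

```python
import sys

# Interned strings can be compared by identity in constant time, much like
# Clojure keywords; generic string equality may scan character by character.
key_a = sys.intern("@" + "id")   # interning returns one canonical object
key_b = sys.intern("@id")
same = key_a is key_b            # identity check: pointer comparison only

# The keys involved here ("@id", "@value", ...) are all short, so even a
# character-by-character comparison touches only a few characters, which is
# why the practical difference is minimal.
```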

> b) optimizations [speed again] - the :type-id made short work of nested contexts (a context map in a context map) and if I recall proved useful beyond that

I'm not sure what you mean by :type-id. Could you elaborate?

> c) metadata - the :idx key allows us to give specific user feedback exactly where issues we discover downstream originated from - something that is completely lost in standard expansion

I think this matters most during the expansion process, and we still keep track of the current index in the document during expansion; any expansion errors we throw include the index into the document where the error occurred. We don't currently use the :idx metadata for any other type of error reporting, and if downstream code runs into a semantically different type of error post-expansion, we'll also have much more context related to that specific error to include in the error message, including the expanded and compact forms of any IRIs related to the error.
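Reporting the index at error time, rather than storing :idx in the output, can be sketched like this (a hypothetical Python sketch; the function name and the use of `None` as a stand-in for an invalid value are illustrative, not the library's API):

```python
# The expansion walk threads the current index path through each recursive
# call and includes it in any error it raises, so callers still learn
# exactly where in the source document the problem occurred.
def expand_value(value, idx):
    if isinstance(value, dict):
        return {k: expand_value(v, idx + [k]) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_value(v, idx + [i]) for i, v in enumerate(value)]
    if value is None:  # stand-in for any value that fails expansion
        raise ValueError(f"invalid value at index {idx}")
    return value
```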

> d) clojure keywords for the main "special values" that make it easy to check for conditions in idiomatic clojure code instead of a bunch of string-based look ups

We'll never be able to avoid string-based lookups because we can't convert arbitrary IRIs to keywords, and I think managing maps with mixed key types is even less idiomatic than having all string keys. Another problem that comes from this is that some input data we receive could already be in expanded form, meaning it would have "@id" as the id key anyway, so we still always have to check for it.
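The mixed-key problem can be illustrated with a small sketch (Python stand-in for the Clojure code; the helper is hypothetical, and the string ":id" stands in for the Clojure keyword):

```python
# With mixed key types, a node's identifier could arrive under either a
# keyword-style key (proprietary expanded form) or "@id" (input that was
# already valid JSON-LD), so every identifier lookup has to check both.
def get_id(node):
    if ":id" in node:           # proprietary expanded form
        return node[":id"]
    return node.get("@id")      # already-valid JSON-LD input
```

Standardizing on "@id" everywhere removes the first branch entirely.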

> If these things are no longer needed (or no longer the case, e.g. it isn't actually any faster) then we should decide if we want to have a library that does the same thing other libraries already do instead of continuing to maintain this.

I usually prefer to use a well-maintained library if one exists instead of re-implementing that functionality ourselves, but our library is relatively stable at this point, so I think we should probably stick with it for now unless we find thorny bugs or large gaps in the feature set.

Like I said, I don't feel particularly strongly about this, but if we do decide to stick with the current functionality, then I think we should at least change the name of our function. `expand` has a well-established meaning in a JSON-LD context, and what we currently do is different, so it should have a different name.

@dpetran (Contributor) commented Jul 15, 2025

As for the availability of other libraries: the ones that I tested were about an order of magnitude slower than ours, though they cover all of the JSON-LD processor API.

@zonotope (Contributor, Author) commented:
@fluree/core this is ready for review

CHANGELOG.md Outdated
### Changed
- **BREAKING**: Expansion now produces valid JSON-LD output conforming to JSON-LD 1.1 specification
- Expansion output now uses standard JSON-LD keywords (`@id`, `@type`, `@value`, `@graph`, `@list`) instead of proprietary format
- Removed `:idx` metadata from expansion output - expansion now produces clean JSON-LD
Can we update this - it looks like we actually do retain `:idx` as metadata. Nice idea, btw; I like that we don't have to deal with it while processing the data, but that it's still there if we ever want to use it.

@dpetran dpetran left a comment

🍇

@zonotope zonotope merged commit 7408353 into main Aug 14, 2025
10 checks passed
@zonotope zonotope deleted the refactor/valid-expanded-json-ld branch August 14, 2025 18:24