Conversation
This change updates the JSON-LD expansion implementation to produce output that conforms to the W3C JSON-LD specification, replacing the previous proprietary format with standard JSON-LD keywords. Key changes: - Use "@value" instead of ":value" for literal values - Use "@id" instead of ":id" for node identifiers - Use "@type" instead of ":type" for data types - Use "@language" instead of ":language" for language tags - Use "@list" instead of ":list" for ordered lists - Remove proprietary ":idx" metadata from expanded output The expansion now produces valid JSON-LD expanded form that can be processed by other JSON-LD processors, while maintaining backward compatibility for input formats (both compact JSON-LD and Clojure keyword forms are still supported). Example transformation: Before: {:id "uri", "prop" [{:value "text", :type nil, :idx [...]}]} After: {"@id" "uri", "prop" [{"@value" "text"}]} Updated corresponding tests to expect the new standard format.
- Remove unused binding parameters (idx) from parse-node-val methods - Replace unnecessary str wrapping of string literals in error messages - All methods now use underscore (_) for unused parameters to satisfy clj-kondo - Fixes 6 linting warnings with no functional changes to the code
…rmat - Update all test expectations from old format (:id, :idx, :type, :value) to JSON-LD format (@id, @type, @value, @graph) - Remove :idx fields entirely as they are no longer part of the expanded output - Change :json type to "@JSON" string to match JSON-LD specification - Update graph tests to use "@graph" instead of :graph keyword - All tests now pass with the refactored expansion that produces valid JSON-LD This completes the migration from the proprietary data format to standard JSON-LD expansion output, ensuring compatibility with JSON-LD processors and tools.
|
I'm curious the motivation for this change, and downstream impacts from it. There are other standards-compliant libraries out there that were passed up because there were tradeoffs that were important at least at the time. If the time and need has changed then that may be the current situation. The original tradeoffs were: If these things are no longer needed (or no longer the case, e.g. it isn't actually any faster) then we should decide if we want to have a library that does the same thing other libraries already do instead of continuing to maintain this. |
|
I think on balance we should make this change, but I don't feel particularly strongly about it. We've had a number of bugs related to our expanded data not being valid json-ld, but those bugs have been fixed. The motivation for this change expanded data no longer being valid json-ld breaks json-ld semantics. We currently can't expand anything twice, for example. This makes the code base harder to work with and I expect that in the future it will make it more difficult for new people to work on and understand the code base because lots of the pre-existing W3C documentation won't apply. I remember off of the top of my head bugs due to us serializing expanded documents and treating them as json-ld upon de-serialization, transmitting expanded documents between servers and treating them as json-ld on the receiver side, and attempting to expand a document that was already expanded throwing errors because it wasn't valid json-ld. Now, in each of these cases the best thing to do was to either compact the document beforehand or avoid trying to expand it more than once, but that won't always be practical, especially if the document is small and the compression benefits won't matter as much or we're dealing with input from multiple paths. I'll respond to the tradeoffs you mentioned in line.
I think the real world differences here are minimal if anything. The difference in speed between keyword based and string based look ups come from the runtime being able to compare memory addresses for keywords because they're interned vs a character by character equality check for strings. Both comparisons are fast, but keyword checks are slightly faster since the equality check complexity is constant instead of linear. I think this would matter slightly more if we converted arbitrary iris into keywords, but we only convert a limited set of iris, all of which happen to be short strings.
I'm not sure what you mean by :type-id. Could you elaborate?
I think this matters most during the expansion process, and we still keep track of the current index in the document during the expansion, and any expansion errors we throw include the index into the document where the error occurred. We don't currently use the idx metadata for any other type of error reporting, and if we have downstream code that runs into a semantically different type of error post expansion, we'll also have much more context related to that specific error to include in the error message, including the expanded and compact form of any iris related to the error.
We'll never be able to avoid string based look ups because we can't convert arbitrary iris to keywords, and I think managing maps with mixed key types is even less idiomatic then having all string keys. Another problem that comes from this is that some input data we receive could already be in expanded form, so we could always have "@id" as the id key anyway, so we still always have to check for it.
I usually prefer to use a well maintained library if one exists instead of re-implementing that functionality ourselves, but our library is relatively stable at this point, so I think we should probably stick to it for now unless we find thorny bugs or large gaps in the feature set. Like I said, I don't feel particularly strongly about this, but if we do decide to stick with the current functionality, then I think we should at least change the name of our function. |
|
As to the availability of other libraries, the ones that I tested were about an order of magnitude slower than ours, though they cover all of the json-ld processor api. |
|
@fluree/core this is ready for review |
CHANGELOG.md
Outdated
| ### Changed | ||
| - **BREAKING**: Expansion now produces valid JSON-LD output conforming to JSON-LD 1.1 specification | ||
| - Expansion output now uses standard JSON-LD keywords (`@id`, `@type`, `@value`, `@graph`, `@list`) instead of proprietary format | ||
| - Removed `:idx` metadata from expansion output - expansion now produces clean JSON-LD |
There was a problem hiding this comment.
Can we update this - it looks like we actually do retain :idx as metadata. Nice idea, btw, I like that we don't have to deal with it while processing the data but that it's still there if we ever want to use it.
This patch updates the expand output to be valid JSON-LD instead of our proprietary format while retaining keyword context functionality that allows for namespaced keywords as compact iris. It changes the
:id,:value,:type,:list, and:languagekeyword keys found in the expanded output to@id,@value,@type,@list, and@languagekeys, respectively, and it also removes the:idxkey from the expanded output while still reporting on the current index when any errors occur.This changes the output contract, so we should bump the library version to 2.0.0 if we choose to merge this patch.