Specify property-generator round-tripping algorithm #160
There might be a much easier solution if we decide to change the "semantics" of property generators. We could leave everything as is and interpret the context {
"term": {
"@id": [
"http://example.org/vocab#term1",
"http://example.org/vocab#term2"
]
}
} as {
"term": "http://example.org/vocab#term1",
"term": "http://example.org/vocab#term2"
} and say that expansion will use all IRIs to create expanded output. I think the advantage of this approach is that it is easy to implement and easy to understand/explain. |
I think @lanthaler's approach probably is simplest, but it requires changing the term redefinition logic to allow for additive definitions within a single level of context processing. |
Just to make it clear: the second code snippet should just illustrate how it would work; this should not and, in most languages, cannot be supported directly. If we choose to adopt this approach, the only thing we would need to decide is whether or not we eliminate duplicates during compaction (which might get tricky when it comes to lists). |
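The proposal above can be sketched in a few lines of Python. This is a toy model, not a spec algorithm: the `expand` helper and its dict-based output shape are assumptions for illustration only.

```python
# Toy model of the proposal above: a term whose "@id" is an array expands by
# emitting its value under every listed IRI. The expand() helper is an
# illustrative assumption, not the spec's expansion algorithm.

def expand(context, doc):
    expanded = {}
    for term, value in doc.items():
        mapping = context.get(term)
        if mapping is None:
            expanded[term] = value
            continue
        ids = mapping["@id"]
        for iri in (ids if isinstance(ids, list) else [ids]):
            # Every IRI of the property generator receives the same value.
            expanded.setdefault(iri, []).append(value)
    return expanded

context = {"term": {"@id": ["http://example.org/vocab#term1",
                            "http://example.org/vocab#term2"]}}
result = expand(context, {"term": "a value"})
# result == {'http://example.org/vocab#term1': ['a value'],
#            'http://example.org/vocab#term2': ['a value']}
```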
Gregg asked me to post here. I'll clarify how we would use property generators in Drupal. I am leaving out our use of language maps to keep things clear. In my example, a site admin has created a content type that has two distinct fields, one field for tags and one field for related news items. While they are distinct in Drupal's vocabulary, both map to schema:about. {
"@context": {
"site": "http://mysite.com/",
"field_tags": {
"@id": ["site:vocab/field_tags", "http://schema.org/about"]
},
"field_related": {
"@id": ["site:vocab/field_related", "http://schema.org/about"]
}
},
"@id": "site:node/1",
"field_tags": [
{
"@id": "site:term/this-is-tag"
}
],
"field_related": [
{
"@id": "site:node/this-is-related-news"
}
]
} So on expansion, this is what results: [
{
"@id": "http://mysite.com/node/1",
"http://schema.org/about": [
{
"@id": "http://mysite.com/term/this-is-tag"
}],
"http://mysite.com/vocab/field_tags": [
{
"@id": "http://mysite.com/term/this-is-tag"
}],
"http://mysite.com/vocab/field_related": [
{
"@id": "http://mysite.com/node/this-is-related-news"
}],
"http://schema.org/about": [
{
"@id": "http://mysite.com/node/this-is-related-news"
}]
}] My understanding is that, with some compaction algorithms, the data would get mixed together when going back to compact form. For example: "field_tags": [
{
"@id": "site:term/this-is-tag"
},
{
"@id": "site:node/this-is-related-news"
}
],
"field_related": [
{
"@id": "site:term/this-is-tag"
},
{
"@id": "site:node/this-is-related-news"
}
] We would prefer to compact to approximately the same form as we started with, which only has one value for each alias. |
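The mixing described above can be reproduced with a toy compaction that folds every value found under any of a term's IRIs into that term. This is a hypothetical sketch to show the failure mode; real compaction algorithms differ.

```python
# Hypothetical naive compaction: every value found under ANY of a term's IRIs
# is folded into that term, so both terms collect both values.

context = {
    "field_tags": ["http://mysite.com/vocab/field_tags", "http://schema.org/about"],
    "field_related": ["http://mysite.com/vocab/field_related", "http://schema.org/about"],
}

expanded = {
    "http://mysite.com/vocab/field_tags": [{"@id": "http://mysite.com/term/this-is-tag"}],
    "http://mysite.com/vocab/field_related": [{"@id": "http://mysite.com/node/this-is-related-news"}],
    "http://schema.org/about": [
        {"@id": "http://mysite.com/term/this-is-tag"},
        {"@id": "http://mysite.com/node/this-is-related-news"},
    ],
}

def naive_compact(context, expanded):
    compacted = {}
    for term, iris in context.items():
        seen = []
        for iri in iris:
            for v in expanded.get(iri, []):
                if v not in seen:  # de-duplicate by deep value equality
                    seen.append(v)
        compacted[term] = seen
    return compacted

out = naive_compact(context, expanded)
# Both terms now contain both node references: the mixing described above.
```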
I just finished listening to the audio from the last telecon. There was a question from Manu about why we care about property generation if we don't care about the representation in other RDF formats. Being able to match properties using universal identifiers has value in and of itself, even if something can't be transformed cleanly to N-Triples. For example, two sites might want to contribute data to the same central repository. They have different content types with different fields, but they can use common vocabularies to provide alignment. For one site, the context would be: {
"@context": {
"site": "http://one.com/",
"central_repo": "http://central.org/",
"field_about_modules": {
"@id": ["site:vocab/field_about_modules", "central_repo:about"]
}
}
} For the other site, the context would be: {
"@context": {
"site": "http://two.com/",
"central_repo": "http://central.org/",
"field_tutorial_about": {
"@id": ["site:vocab/field_tutorial_about", "central_repo:about"]
}
}
} The central repository will only be looking for central_repo:about, so it will expand the JSON-LD. It doesn't have to convert it to another RDF format, however... it can just work with the expanded JSON object. |
Lin, actually, your example would expand to use just a single key for http://schema.org/about: While they are distinct in Drupal's vocabulary, both map to schema:about. {
"@context": {
"site": "http://mysite.com/",
"field_tags": {
"@id": ["site:vocab/field_tags", "http://schema.org/about"]
},
"field_related": {
"@id": ["site:vocab/field_related", "http://schema.org/about"]
}
},
"@id": "site:node/1",
"field_tags": [
{ "@id": "site:term/this-is-tag"}
],
"field_related": [
{"@id": "site:node/this-is-related-news"}
]
} Results in: [
{
"@id": "http://mysite.com/node/1",
"http://schema.org/about": [
{"@id": "http://mysite.com/term/this-is-tag"},
{"@id": "http://mysite.com/node/this-is-related-news"}
],
"http://mysite.com/vocab/field_tags": [
{"@id": "http://mysite.com/term/this-is-tag"}
],
"http://mysite.com/vocab/field_related": [
{"@id": "http://mysite.com/node/this-is-related-news"}
]
}] This complicates the compaction algorithm, as it does require the combinatorial step of looking for all values across all IRI keys mapped to a property generator term and extracting only those values that are in common. So, for every property generator term, look at every property defined by that term for common values and only use the property generator term for those values. Potentially, there could be more than one property generator term with overlapping property IRIs associated with it, which could mean that the values would be replicated. Also, as noted before, the values could be nontrivial, including node definitions that are recursive and may or may not be equivalent. This is really complicated, and has a big spec-smell to me. |
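The combinatorial step described above boils down to intersecting the value lists of all IRIs mapped to a property generator term. A minimal sketch follows; the `common_values` helper is hypothetical, and Python's `in` performs the deep (structural) equality comparison mentioned above.

```python
# Sketch of the combinatorial compaction step: for a property generator term,
# keep only those values present under ALL of its IRIs. The membership test
# "v in list" does a deep structural comparison of the JSON values.

def common_values(expanded, iris):
    if not iris:
        return []
    first = expanded.get(iris[0], [])
    # A value survives only if an equal value exists under every other IRI.
    return [v for v in first
            if all(v in expanded.get(iri, []) for iri in iris[1:])]

expanded = {
    "http://schema.org/about": [
        {"@id": "http://mysite.com/term/this-is-tag"},
        {"@id": "http://mysite.com/node/this-is-related-news"},
    ],
    "http://mysite.com/vocab/field_tags": [
        {"@id": "http://mysite.com/term/this-is-tag"},
    ],
}

shared = common_values(expanded,
                       ["http://mysite.com/vocab/field_tags",
                        "http://schema.org/about"])
# Only the tag is common to both IRIs.
```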
For our use case, I don't think we would require the combinatorial step. I could be wrong, but I believe any one of Manu's proposals would give us the values we want for "field_tags" and "field_related" in my example. And it seems that only the first requires the combinatorial algorithm. |
I had a good chat with Lin today about Drupal's use case. The first part of the discussion focused on trying to see if Drupal could use a pre-processing or post-processing step to achieve what property generators provide. After a bit of discussion, it became clear that any sort of pre- or post-processing step would add unnecessary complexity to the Drupal system.

The second part of the discussion concerned the question of why property generators were difficult. The key issue is that round-tripping is hard, for two reasons. The first reason is that, upon expansion, it becomes impossible to tell which values came from a property generator and which ones did not. You are adding data to the graph without tracking where that data came from. This means that compacting back to what you had is a non-deterministic operation. The second reason is that, upon expansion and then compaction again, even if you could end up with what you started with, you would still have to process each item being associated with the property-generator term to check for duplicates. This operation would be very time-complex, as you would have to do a deep compare on each object against every other object being coalesced into the property generator.

During the conversation a new approach surfaced: what if we just marked all the things that were generated as a result of a property generator when going to expanded form? That is, if we have this: {
"@context": {
"foo": {
"@id": ["http://example.com/foo", "http://schema.org/foo"]
}
},
"foo": "bar"
} Expanded form would result in this: {
"http://example.com/foo": "bar",
"http://schema.org/foo": {
"@value": "bar",
"@processor": {"source": "property-generator"}
}
} Re-compacting (with the same context) would just throw out anything that had a "@processor": {"source": "property-generator"} in it, giving you the original input. The upsides to this approach are: 1) determining which terms came from a property generator is no longer non-deterministic, and 2) time complexity is reduced to a linear algorithm, where N is the number of property generator entries that have to be removed from the output. The downside to this approach is that someone compacting may not want to remove property-generator-sourced terms. We could have an API option, like {removePropertyGeneratedValues: false}, to the compact() call in this case. We could also list the @context that contained the property generators, and remove an item only if the @context being used for compaction is the one that contained them. I don't think we need to do either of these things, but perhaps others feel more strongly about it than I do. I think this proposal addresses all of the concerns related to this issue that the group has at the moment. Which corner cases am I missing? |
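The re-compaction rule in this proposal can be sketched as a filter over the expanded object. Note that the `@processor` keyword is part of this proposal only, not the JSON-LD spec, and `strip_generated` is an invented name.

```python
# Sketch of the proposed re-compaction rule: drop any value marked with
# "@processor": {"source": "property-generator"} before compacting.
# "@processor" is a proposed keyword, not part of the JSON-LD spec.

def strip_generated(expanded):
    cleaned = {}
    for prop, value in expanded.items():
        if (isinstance(value, dict)
                and value.get("@processor", {}).get("source") == "property-generator"):
            continue  # this value only exists because of a property generator
        cleaned[prop] = value
    return cleaned

expanded = {
    "http://example.com/foo": "bar",
    "http://schema.org/foo": {
        "@value": "bar",
        "@processor": {"source": "property-generator"},
    },
}
print(strip_generated(expanded))
# {'http://example.com/foo': 'bar'}
```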
Manu, even though your proposal solves the problem for Drupal’s use case, I don’t think it solves it in general. Your proposal relies on the fact that some metadata is embedded to facilitate round-trippability. There’s no way to ensure that such metadata is always there. What if some server directly generates expanded JSON-LD? This would be a quite reasonable thing to do in M2M communication, as there’s no need for nice, short terms. I’m still not sure I have really grasped Lin’s requirements. In her comment above she said:
Why do two fields exist in Drupal to express the same concept? Either it is the same, or it is not. Let's assume we have the following data: fieldA: 1, 3, 5 and fieldB: 2, 4, 6; both map to schema:about, which would thus hold 1, 2, 3, 4, 5, 6. Is it OK to end up with fieldA: 1, 2, 3, 4, 5, 6 if the data is re-imported into the same system? I assume it wouldn't be OK, otherwise there wouldn't be two distinct fields in the first place, right? The problem is that in the context you implicitly state that fieldA == schema:about == fieldB.

All these problems could be solved easily by not using property generators at all: include the relevant data multiple times. You would then end up with fieldA: 1, 3, 5 in the expanded output, and round-tripping that would be a no-brainer.

I would like to stress that I really want to support Drupal's use cases and try to find a solution for their problems, but I think we should not try at any price. The discussions around this specific feature show that it is not even clear how this feature should work in the optimal case, not to mention the corner cases. I would therefore prefer to put this feature on hold for the time being. The spec is purposely designed to ignore unknown data in the context, so the Drupal community could exploit that to introduce a proprietary extension to JSON-LD that satisfies their needs. Maybe something along the lines of
A simple preprocessing step could then just iterate over the data and duplicate all fieldA keys, using the IRIs in otherIds as keys. PROPOSAL: Do not support property generators in JSON-LD 1.0. |
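The snippet Markus refers to was not preserved in this thread, but a preprocessing step along the lines he describes might look like this. The `otherIds` key and the `duplicate_keys` helper are hypothetical; conforming JSON-LD processors would simply ignore the unknown `otherIds` member of the term definition.

```python
# Sketch of a proprietary pre-processing step, assuming a hypothetical
# "otherIds" key in the term definition. JSON-LD processors ignore unknown
# context keys, so this extension would not affect conforming parsers.

def duplicate_keys(context, doc):
    out = dict(doc)
    for term, definition in context.items():
        if isinstance(definition, dict) and term in doc:
            for iri in definition.get("otherIds", []):
                out[iri] = doc[term]  # copy the value under each extra IRI
    return out

context = {"fieldA": {"@id": "http://example.com/fieldA",
                      "otherIds": ["http://schema.org/about"]}}
doc = {"fieldA": [1, 3, 5]}
print(duplicate_keys(context, doc))
# {'fieldA': [1, 3, 5], 'http://schema.org/about': [1, 3, 5]}
```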
Why do two fields exist in Drupal to express the *same* concept? Within Drupal, these two fields would be handled differently, as different concepts. Tags would be configured to autocomplete from a free-tagging vocabulary. Related News would be configured to autocomplete from nodes of the "news article" type. They could also be formatted in separate ways. The values would be stored in different database tables. This is a distinction that we need to maintain internally and in the deployment use case (moving content from site to site), but it is not a distinction that is important to other consumers. Therefore, the two values are exposed using separate properties for Drupal consumers, but the same property for search engine consumers. I can ask the group whether we are OK with having property values repeated multiple times. However, many of the people anticipating this work in Drupal are concerned specifically with mobile. I'm not sure they will agree to doubling and tripling the size of the data. Adding our own custom preprocessing as part of a library might be an option. It would make it hard to interface with multiple non-Drupal consumers, though, as they could not be expected to have a JSON-LD library with this customization... unless it becomes a widely adopted non-specced feature. |
OK, I see. Would it be possible to coerce the values to different datatypes? Would this solve your problem?
Do you have a sample document? I'm quite sure gzipping it would actually
Not sure I understand what you are saying here. Normal consumers wouldn't |
On the telecon today, I tried to outline what I think would allow for effective round-tripping of property generators without needing to introduce pragma-like data to the expanded output. The basis of the proposal, common with most everything presented to date, is to associate multiple IRIs with a term:
Expanding this document results in both IRIs being used as properties with the same value:
Compacting this expanded object with the same context effectively requires the following (before 2.3):
The nodeComparison algorithm would compare values, node definitions, node references, and lists of values or node definitions/references for equivalence. Comparing node definitions is somewhat complicated. The set of expanded values may have at most one node definition. Comparison of node definitions/references is done by comparing the node identifier (@id). Expansion is modified to ensure that node definitions are output only once, and node references otherwise. This requires that all anonymous node definitions that are values of property generator properties be assigned a Blank Node identifier using a well-known prefix. Detecting any Blank Node identifier in the input graph using this prefix must result in an error. This algorithm should handle the case where simple terms are used alongside property generator terms, and only those terms having values that are common across all property generator IRIs are assigned to the property generator term on compaction. |
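The identifier-based comparison at the heart of this nodeComparison idea can be sketched as follows. This is simplified: the full algorithm also covers lists, and limits each set of expanded values to at most one node definition.

```python
# Simplified comparison rule: node definitions/references compare by "@id"
# alone; plain values compare directly. The full algorithm also covers lists
# and allows at most one node definition per set of expanded values.

def node_equal(a, b):
    a_is_node = isinstance(a, dict) and "@id" in a
    b_is_node = isinstance(b, dict) and "@id" in b
    if a_is_node and b_is_node:
        # Nodes are equivalent if their identifiers match.
        return a["@id"] == b["@id"]
    return a == b

# A full node definition and a node reference with the same blank node
# identifier (using some well-known prefix; "_:pg" here is a stand-in)
# compare as equal:
assert node_equal({"@id": "_:pg0", "name": "x"}, {"@id": "_:pg0"})
assert not node_equal({"@id": "_:pg0"}, {"@id": "_:pg1"})
```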
@dlongley and I looked at @gkellogg's proposal above and found the following issues:
(2) is the generalized solution to the problem (which we have discussed before and rejected because of the requirement to do the deep comparison). However, that is the correct solution to the particular problem in front of us - removing duplicates. There is no way to avoid a deep comparison between objects in the list - it's O(n^2). @dlongley came up with a different approach to this problem that doesn't create the duplicate problem, is far less complicated, but still solves the Drupal use case (and the general use case of sites using different IRIs to refer to data in expanded form). I'll put that proposal in below. |
Property Aliasing Proposal

The core problem that Drupal is attempting to solve is to ensure that Drupal sites can exchange data with one another while not having to tightly couple their context property IRIs. That is, each Drupal site has its own data internally, with its own property IRIs. Many of those property IRIs are specific to the Drupal site. For example, the title of a tag may be mapped to http://mydrupalsiteA.org/vocabs/title or http://schema.org/title or http://purl.org/dc/terms/title. Site A might know about http://schema.org/title but not http://purl.org/dc/terms/title; Site B might know about http://schema.org/title and http://purl.org/dc/terms/title but not http://mydrupalsiteA.org/vocabs/title. So the problem is: how do you export data such that Site A can map http://schema.org/title to "footitle", and Site B can map http://purl.org/dc/terms/title to "bartitle"?

Property Generators

One approach was to use property generators to duplicate data. So, Site A would duplicate the same data (in expanded form) for http://schema.org/title and http://purl.org/dc/terms/title and http://mydrupalsiteA.org/vocabs/title. The downside of this approach is that, when compacting, you must coalesce the data back into a single property (if the consuming site is using a property generator for the same property). The coalesce step requires a deep comparison to de-duplicate data and is a very expensive O(n^2) operation. If we have to implement this, we can, but there is a better solution that doesn't duplicate data in the first place.

Property Aliasing

Property aliasing has the benefit of addressing the Drupal use case above while not duplicating data. In order to use a property alias, Site A would do this (same syntax as we have for property generators): {
"@context": {
"footitle": { "@id": ["http://schema.org/title", "http://purl.org/dc/terms/title", "http://mydrupalsiteA.org/vocab/title"]}
},
"footitle": "baz"
} Site B contacts Site A to get the data. Site A expands the data using its context. The new property aliasing feature would use the first IRI in the list to expand the data: {
"http://schema.org/title": "baz"
} Site B would then use its own context to compact the data above and work with it: {
"@context": {
"bartitle": { "@id": ["http://purl.org/dc/terms/title", "http://schema.org/title", "http://mydrupalsiteB.org/vocab/title"]}
},
"bartitle": "baz"
} Site B could then communicate the same data back to Site A by following the same algorithm. Note that expanded form is different this time around (because the first IRI in the list was different on SiteB): {
"http://purl.org/dc/terms/title": "baz"
} Site A could then compact the data above and work with it, like so: {
"@context": {
"footitle": { "@id": ["http://schema.org/title", "http://purl.org/dc/terms/title", "http://mydrupalsiteA.org/vocab/title"]}
},
"footitle": "baz"
} So, the data round-trips to exactly what one would expect, without needing to duplicate data in expanded form and without the need for @processor statements.

Pathological Cases

There is one class of pathological cases. Basically, this is when developers manually inject full URL values into compacted or expanded data. Let's look at the compact case: {
"@context": {
"footitle": { "@id": ["http://schema.org/title", "http://purl.org/dc/terms/title", "http://mydrupalsiteA.org/vocab/title"]}
},
"footitle": "baz",
"http://purl.org/dc/terms/title": "baz",
"http://mydrupalsiteA.org/vocab/title": "baz"
} The above would expand out to: {
"http://schema.org/title": "baz",
"http://purl.org/dc/terms/title": "baz",
"http://mydrupalsiteA.org/vocab/title": "baz"
} and then compact to: {
"@context": {
"footitle": { "@id": ["http://schema.org/title", "http://purl.org/dc/terms/title", "http://mydrupalsiteA.org/vocab/title"]}
},
"footitle": ["baz", "baz", "baz"]
} Obviously, this is not preferable, but also keep in mind that the developer had to go out of their way to make this happen. If you continue to just use terms and not muck around with the expanded data, you're in good shape. There is no duplicate removal for property aliasing because it's a 'grouping' mechanism, not a 'same as' mechanism. When you use a property alias, you're saying "Any of the following URLs should be grouped under term X". You're not saying "Any of the following URLs are owl:sameAs the other URLs". While I do admit that this is a bit of weasel wording, it prevents us from having to do a deep compare. We /could/ do a deep compare, but the assertion is that if people use the type of markup above in compact form, they're doing it wrong(tm). This same line of reasoning applies to subject definitions that have the same @id, but different data. We don't merge in that case either. Also note that this pathological problem exists in the other proposals as well. |
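The two aliasing rules above (expand with the first IRI in the list; compact on a match against any listed IRI) can be sketched as toy helpers. These are not spec algorithms, and term definitions are reduced here to bare `@id` arrays.

```python
# Sketch of the property-aliasing rules: expansion uses only the FIRST IRI in
# the list; compaction folds a property into a term if it matches ANY of the
# term's listed IRIs. Toy helpers, not spec algorithms.

def expand_alias(context, doc):
    out = {}
    for term, value in doc.items():
        ids = context.get(term, {}).get("@id", term)
        out[ids[0] if isinstance(ids, list) else ids] = value
    return out

def compact_alias(context, expanded):
    out = {}
    for prop, value in expanded.items():
        for term, definition in context.items():
            ids = definition["@id"]
            if prop in (ids if isinstance(ids, list) else [ids]):
                out[term] = value
                break
        else:
            out[prop] = value  # no alias matched; keep the IRI
    return out

site_a = {"footitle": {"@id": ["http://schema.org/title",
                               "http://purl.org/dc/terms/title"]}}
site_b = {"bartitle": {"@id": ["http://purl.org/dc/terms/title",
                               "http://schema.org/title"]}}

wire = expand_alias(site_a, {"footitle": "baz"})
# wire == {'http://schema.org/title': 'baz'}
print(compact_alias(site_b, wire))
# {'bartitle': 'baz'}
```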
It addresses this by allowing only one value to be a node definition, the rest MUST be node references.
The fact that we have node references only requires that we compare @id elements.

> #2 is the generalized solution to the problem (which we have discussed before and rejected because of the requirement to do the deep comparison). However, that is the correct solution to the particular problem in front of us - removing duplicates. There is no way to avoid a deep comparison between objects in the list - it's O(n^2).

If only one node can be a node definition, then there is nothing to compare deeply.

Gregg

> @dlongley came up with a different approach to this problem that doesn't create the duplicate problem, is far less complicated, but still solves the Drupal use case (and the general use case of sites using different IRIs to refer to data in expanded form). I'll put that proposal in below. |
Gregg: I don’t think that Drupal could live with the fact that the data is just under one IRI, I think it has to be under all IRIs. The problem is that site B might not use a property generator but just use a term for the single property it is interested in (which then contains just a node reference). Manu: I don't understand how your proposal should work. Let's make it simple by saying site A expands "term" to "A" and "X", so the expanded output would just contain a property "A". Site B understands "B" and "X". Your proposal would only work if the order of the IRIs in A's context is "X", "A" and in B's context "X", "B". All other combinations wouldn't support round-tripping. I find it very dangerous to rely on the order of the IRIs here. |
Markus,
This is a general problem; not one having to do with property generators per se. For example, suppose site A put "A", "B", and "C" in its property generator, but site B needed "D". The assumption here is that there's at least one shared property -- and that one is listed first. So when building your context, you put the most "public" or "general" property name first. All that being said, I think we're probably just going to have to go with bnode generation + deep comparison. There are some issues with the proposal Manu and I put forward that would require including something like an @origin flag to keep track of where properties came from ... but that flag would be lost during compaction. It seems like the most readily available solution to this problem is doing deep comparisons to remove duplicates (which itself has drawbacks including maintaining read/write synchronicity between sites ... but that may not be a requirement). |
@dlongley, @davidlehn, @msporny and I had a discussion about the merits of either approach. As Dave says, I think we settled on a variation of my approach that uses node definitions for each expanded property, rather than a single node definition and one or more node references. This has the consequence of requiring deep node comparison to test for equivalence (although, IMO, a node reference could still be used for comparison too). This allows us to stay within the RDF data model and ensure that this is compatible with both from- and to-RDF. There should be a warning that exchanging data with a service that does not understand all terms in a property generator could result in data that is not round-trippable. For example, if I use this to describe a personal profile document in both FOAF and schema.org, and I provide it to an application that only updates the FOAF part of the data, it will not round-trip:
This would expand to the following:
If I use a service that updates this document to add an additional foaf:mbox:
it will not result in something that can be compacted using the property generator term:
In my mind, this is perfectly acceptable, and what I would want to have happen anyway. |
The problem you describe, Gregg, could easily be solved by merging the data, but that would break Drupal’s requirements. As I tried to explain in the last telecon, I find this quite problematic when looking at it from a consumer's point of view. If I’m a consumer and state that termA maps to the IRIs X, Y, Z, I would expect it to select X independently of whether there’s an equivalent Y or Z. What’s the relationship of the IRIs in a property generator? It’s definitely not an owl:sameAs... but the structure looks exactly like that, and I find that very problematic. I will post my proposal in a separate comment after lunch, as I don’t think it’s understandable from the minutes. |
Separating compaction from expansion for property generators

The problem with all proposals so far has been in supporting round-tripping of the results. We tried hard to find a solution that restores the original input document when expanding a document and then compacting it again using the same context. This sounds easy in principle but is difficult to implement efficiently, as it requires comparing all data for equality, which has a computational complexity of O(n²). The other problem with all of the current proposals is that the form in which property generators are specified in the context suggests that all IRIs denote the same concept (as in owl:sameAs), which is not true. In practice, the relationship between the IRIs is more that of a subproperty to several superproperties. Typically you would use the feature to give a term a very specific IRI (likely from a proprietary vocabulary) and map the same term to other widely used vocabularies, to allow applications that don't know the proprietary vocabulary to "understand" the data nevertheless. The solution I would thus like to propose explicitly separates the IRIs in the context to highlight that they are not equal. This should make it clear to developers that if they use this feature, their documents won't round-trip anymore, as shown in the following example:
expands to:
compacting it again with
yields
So the additionally produced data stays there. It is then up to the application to decide what to do with it. Most likely an application would only extract the data it is interested in anyway so I don't think this should be a big problem. Pros of this approach:
Cons of this approach:
|
Regardless of my proposal above I did a little experiment to see if property generators are really worth the effort. The only advantage they seem to bring is to save bandwidth. So I went to DBpedia and queried for all people that were born in Boston before 1950. Turns out there were 728 persons. I then went and constructed a JSON-LD document in which each of these persons is the friend of one other person:
The document above contains a property generator (which proposal is chosen is irrelevant for this experiment); thus every name would expand to several properties instead of just one. To evaluate how useful property generators are, I also created a second document, without property generators, that looks like this:
I then compared the size of document as-is and the sizes after gzipping the documents. Here are the numbers (in bytes):
Honestly, I was quite surprised to see that compression wasn't able to eliminate the cost of the repeated inclusion of properties with different values. This is probably due to the fact that we have quite few properties, and with short values. I was also surprised to see how a 72k document, which consumes 11k on the wire if transmitted compressed, expands to over 600k in memory - quite an easy way to launch a DoS attack. So if we are really going to implement this, we will need to at least add a flag to disable this and clearly outline the security consequences of this feature. |
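The DBpedia documents from this experiment aren't reproduced here, but the methodology is easy to repeat on synthetic data. The 728 generated names and the FOAF/schema.org property pair below are stand-ins, not the original dataset.

```python
# A small repeat of the experiment's methodology on synthetic data: compare
# raw and gzipped sizes of a document that duplicates every property (as
# property-generator expansion would) versus one that does not.

import gzip
import json

people = [{"@id": f"http://example.com/p{i}",
           "http://xmlns.com/foaf/0.1/name": f"Person {i}"}
          for i in range(728)]

# Duplicate each name under a second IRI, mimicking expanded output.
duplicated = [{**p, "http://schema.org/name": p["http://xmlns.com/foaf/0.1/name"]}
              for p in people]

for label, doc in (("single", people), ("duplicated", duplicated)):
    raw = json.dumps(doc).encode()
    print(label, len(raw), len(gzip.compress(raw)))
# The duplicated document stays larger even after gzip: the repeated property
# IRIs compress well, but each one sits next to a distinct value.
```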
On Oct 5, 2012, at 3:25 AM, Markus Lanthaler notifications@github.com wrote:
It can also be solved by flattening. However, Drupal's covered because they just need to compact what was previously expanded, not deal with a change from an outside party to only part of the data; that's mostly a theoretical issue.
The proposal is that you would get all of the data with either X, Y, or Z. I don't see what the problem is.
Definitely not sameAs, it could be similar to subPropertyOf, and we discussed this. In fact, I don't think that there is any implied relationship. |
No, you will only get the data that was expanded to X and Y and Z; otherwise that additional email address would be put under that term as well. I assume a developer would expect an "or" here instead, if all IRIs are just listed in an array. |
On Fri, Oct 5, 2012 at 5:15 PM, Markus Lanthaler
We can certainly warn people about the consequences of expansion with |
Compression is negotiable by the client and the server. Just switch it off and limit the data that you are going to accept. I believe the uncompressed size is also transferred at the end of the gzip stream. On the other hand you can’t just switch off property generators as the data you would get out of expansion is completely different. |
Wouldn't expansion in general leave you open to DoS, then? An attacker could provide an IRI that is extremely long for a term that is very short. HTTP doesn't specify an upper bound for URIs, and AFAIK, URIs up to 2080 chars are recognized by all browsers. {
"@context": {
"a": "http://aaaaa...."
},
"a:p1": {"@id": "a:one"}
....
} |
That’s true, but expansion doesn’t allow you to duplicate whole subtrees and is therefore much less “effective”. |
Markus, great work on getting the data together on size on the wire vs. size in memory for property generators. I do agree that there is a DDoS possibility there and that we should do something about it. I suggest that we put in a maximum memory limit as an option. I also suggest that we put in a maximum processing limit as an option as well. I think we should leave the default unspecified because it's almost completely dependent on the environment in which and hardware on which you're operating. The maximum memory limit is needed for the property generator feature. Units would be bytes? The maximum processing limit is needed for the normalization feature. There are certain graph isomorphisms that are NP. Units would ideally be time taken inside a particular JSON-LD call, but I don't know if there is a very good way to get that data. Units would be milliseconds? Barring that, we could do number of 'ticks', which would be per-processor dependent. All this said, it would be easier to DoS the site in other ways. However, I don't think we should allow ourselves to be the reason a site is DoS'd. |
Cons of this approach:
@linclark, correct me if I'm wrong, but I don't think Drupal developers would be very happy if their data didn't round-trip, or with finding extra data in their compacted form. I understand your concern, Markus. The way property generators are expressed makes it seem as if we are saying "these X properties are exactly the same". However, what we're really saying is "if you see property A, then expand to X, Y, and Z". There is no implied relationship between them... it's effectively a copy-and-paste operation. I think that's fairly easy to point out in the spec, and we have to carefully lay that out in tutorials as well. I'm not convinced that @alsoExpandTo or anything else would make it clear that there is no rigorous semantic relationship between the expanded IRIs. If people see A expand to X, Y, and Z, some of them are going to think there is a relationship regardless of what the spec states or the keywords imply. There is a very weak relationship, but it's not owl:sameAs, and I'd even go as far as to say it's not subPropertyOf either. I think the only relationship that's there is that one property was copy-pasted to other properties... nothing else is implied (either implicitly or explicitly). The lack of round-tripping is a big con, and addressing it is important if we're not going to confuse developers with the round-tripped data. We can do better, even if it is computationally costly. That is, I'd rather warn folks of the dangers of property generators and let them decide whether or not they want to burn the CPU cycles on the feature. What we do shouldn't surprise developers, and I think that anything that doesn't round-trip the information cleanly (when using the same context) is going to really confuse people. The best proposal I've seen so far is a hybrid: use @gkellogg's proposal above, use subject definitions when expanded, generate bnode IDs so you know what to eliminate when compacting, and use deep object comparisons when compacting.
This allows one to expand and compact cleanly and without error when the same context is used. It solves the Drupal use case and it solves the problem in a general way. We will also need to place the following warnings in the spec:
PROPOSAL: Adopt Gregg Kellogg's property generator algorithm when expanding/compacting with the following modifications: 1) use subject definitions everywhere when expanding, 2) generate bnode IDs for all subject definitions without an `@id`, 3) use deep comparisons when eliminating subject definitions during compaction.

PROPOSAL: Add warning language to the JSON-LD Syntax and API specs noting the most problematic issues when working with property generators.
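A minimal sketch of what modifications 1 and 2 might look like during expansion — helper names are hypothetical, not taken from any spec text: the value of a property-generator term is copied to every IRI the term maps to, and any copied node definition lacking an `@id` gets a generated blank node identifier so the duplicates can be recognized again during compaction.

```python
import copy
import itertools

# Hypothetical sketch: copy a property generator's value to every IRI,
# labeling unlabeled node definitions with a fresh blank node ID that
# is shared by all of the copies.

_bnode_ids = itertools.count()

def label_with_bnode(value):
    """Return a copy of `value`, adding a generated @id if it is a node
    definition that lacks one."""
    node = copy.deepcopy(value)
    if isinstance(node, dict) and "@id" not in node:
        node["@id"] = f"_:b{next(_bnode_ids)}"
    return node

def expand_property_generator(iris, value):
    """Copy `value` to every IRI of the property generator."""
    labeled = label_with_bnode(value)
    # every copy carries the same blank node identifier
    return {iri: [copy.deepcopy(labeled)] for iri in iris}

result = expand_property_generator(
    ["http://example.org/vocab#term1", "http://example.org/vocab#term2"],
    {"http://schema.org/name": [{"@value": "a value"}]})
# both IRIs now hold structurally identical subject definitions that
# share one generated blank node identifier
```

Because the copies share a blank node identifier and are deep-equal, a compaction algorithm can later eliminate all but one of them.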
That's true for expansion, but not at all for compaction. For compaction it is "if the same value exists for all (not any) of these IRIs, use this term with that value". It's exactly that all that worries me, but perhaps it turns out that it's not an issue in practice.

Frankly, I still don't like this feature and how we are going to implement it, but I won't object to it... so let's proceed.
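The all-versus-any asymmetry can be captured as a single predicate; here's a sketch (function name hypothetical) of the check a compaction algorithm would have to run before substituting a property-generator term:

```python
def generator_applies(expanded_node, generator_iris, value):
    """True only if a deep-equal `value` is present under *all* of the
    property generator's IRIs in the expanded node.

    Expansion copies a value to every IRI, but compaction may only use
    the term when the value survives under each of them -- the
    asymmetry noted above.
    """
    return all(
        any(candidate == value for candidate in expanded_node.get(iri, []))
        for iri in generator_iris)

node = {
    "http://example.org/vocab#term1": [{"@value": "v"}],
    "http://example.org/vocab#term2": [{"@value": "v"}],
}
iris = ["http://example.org/vocab#term1", "http://example.org/vocab#term2"]
# applies here; delete the value from either IRI and it no longer does
```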
RESOLVED: Adopt Gregg Kellogg's property generator algorithm when expanding/compacting with the following modifications; 1) use subject definitions everywhere when expanding, 2) generate bnode IDs for all subject definitions without an `@id`, 3) use deep comparisons when eliminating subject definitions during compaction.

RESOLVED: Add warning language to the JSON-LD Syntax and API specs noting the most problematic issues when working with property generators.

RESOLVED: Add a non-normative note to tell developers that their implementations may have a feature that allows all but one node definition created by a property generator to be collapsed into a node reference.
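The collapsing behaviour described in the last resolution could work roughly like this (a sketch with a hypothetical helper name, not normative text): the first full node definition produced by a property generator is kept, and later structurally identical copies are reduced to bare node references.

```python
def collapse_to_references(expanded_node, generator_iris):
    """Keep the first full node definition per @id; replace later
    duplicates with bare node references ({"@id": ...})."""
    seen = set()
    for iri in generator_iris:
        values = expanded_node.get(iri, [])
        for i, value in enumerate(values):
            if isinstance(value, dict) and "@id" in value and len(value) > 1:
                if value["@id"] in seen:
                    values[i] = {"@id": value["@id"]}  # collapse to reference
                else:
                    seen.add(value["@id"])
    return expanded_node

node = {
    "http://example.org/vocab#term1": [
        {"@id": "_:b0", "http://schema.org/name": [{"@value": "x"}]}],
    "http://example.org/vocab#term2": [
        {"@id": "_:b0", "http://schema.org/name": [{"@value": "x"}]}],
}
collapsed = collapse_to_references(
    node, ["http://example.org/vocab#term1", "http://example.org/vocab#term2"])
```

This only works because expansion labeled every copy with the same generated blank node identifier.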
I have a few more questions regarding property generators, as they seem to break basically every single algorithm we have at the moment.

What should happen if a term that is a property generator is used as the value of `@id` or `@type`? Throw an error?

Do we need to relabel all blank nodes in expansion? I think yes.

What about compaction? Property generator terms should probably be preferred, but what if you later find that one doesn't apply, i.e., not all property IRIs of that term contain the value? The only potential solution I see at the moment for this problem is to do IRI compaction in two steps: first get a set of candidates for a specific IRI/value pair, then check the candidates according to their rank and choose the first one that applies.
Having not put much thought into it, here are my off-the-cuff responses (which may change once I think about it a bit more):

> What should happen if a term that is a property generator is used as the value of @id or @type? Throw an error?

Yes, fatal error.

> Do we need to relabel all blank nodes in expansion? I think yes

Yes, probably. It's probably a good idea to do so anyway so that people won't depend on the blank node IDs. We will need to keep a global mapping table around for old blank node IDs -> new blank node IDs, which is kinda annoying.

> Do IRI compaction in two steps.. first get a set of candidates for a specific IRI/value pair, then check the candidates according their rank and choose the first one that applies.

Seems like a sensible first cut at the problem.
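The two-step idea could be sketched like this — all names are hypothetical, and real term rankings are more involved than a simple IRI count:

```python
def compact_iri(iri, value, term_definitions):
    """Two-step IRI compaction sketch: (1) collect every term that maps
    the IRI, (2) walk the candidates in rank order and pick the first
    whose applicability check passes; fall back to the IRI itself."""
    candidates = [term for term, defn in term_definitions.items()
                  if iri in defn["iris"]]
    # crude rank: prefer property-generator terms (more IRIs mapped)
    candidates.sort(key=lambda term: -len(term_definitions[term]["iris"]))
    for term in candidates:
        applies = term_definitions[term].get("applies", lambda v: True)
        if applies(value):
            return term
    return iri

terms = {
    # a property-generator term that only applies to one value here,
    # standing in for the "all IRIs contain the value" check
    "about": {"iris": ["http://schema.org/about",
                       "http://mysite.com/vocab/field_tags"],
              "applies": lambda v: v == "tag"},
    "schemaAbout": {"iris": ["http://schema.org/about"]},
}
# "tag" compacts to the generator term; other values fall through to
# the next candidate, and unmapped IRIs stay as-is
```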
Yes, I agree with Manu. Fortunately, we already have algorithms for renaming BNodes, so applying them to expansion shouldn't be a stretch, but it is something new.
This is more or less an exact copy of the example in issue #160. See #160 (comment)
RESOLVED: Rename all blank node identifiers when doing expansion.
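Combined with the global mapping table mentioned above, the relabeling might look roughly like this (class name hypothetical):

```python
import itertools

class BlankNodeRelabeler:
    """Issue fresh blank node identifiers during expansion, keeping a
    global old-to-new mapping so every occurrence of an input label is
    renamed to the same output label."""

    def __init__(self):
        self._mapping = {}
        self._counter = itertools.count()

    def relabel(self, identifier):
        if identifier not in self._mapping:
            self._mapping[identifier] = f"_:b{next(self._counter)}"
        return self._mapping[identifier]

relabeler = BlankNodeRelabeler()
# repeated calls with the same input label return the same fresh label,
# so graph structure is preserved while authors are discouraged from
# relying on their original blank node identifiers
```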
I've updated all algorithms; unless I hear objections, I will close this issue in 24 hours.
When using property generators, most authors will expect that a single term that is expanded to multiple properties via .expand() would result in the same single term if run through .compact() with the same @context. There are a number of proposed ways of achieving this round-tripping:
For the sake of clarity, the context for #1, #2, and #3 would look like this:
The context for #4 would look like this: