Should each member in a list contribute to term rank? #172

Closed
lanthaler opened this Issue · 12 comments

4 participants

@lanthaler
Owner

There have been some discussions on the mailing list about what the outcome of the following compaction should be.

Input

{
  "@id": "http://example.com/id1",
  "http://example.com/term": [
    {
      "@list": [
        { "@value": "v1.1", "@language": "de" },
        { "@value": "v1.2", "@language": "de" },
        { "@value": "v1.3", "@language": "de" },
        4,
        { "@value": "v1.5", "@language": "en" },
        { "@value": "v1.6", "@language": "en" }
      ]
    },
    {
      "@list": [
        { "@value": "v2.1", "@language": "en" },
        { "@value": "v2.2", "@language": "en" },
        { "@value": "v2.3", "@language": "en" },
        4,
        { "@value": "v2.5", "@language": "de" },
        { "@value": "v2.6", "@language": "de" }
      ]
    }
  ]
}

Context

{
  "@context": {
    "@language": "de",
    "term1": {
      "@id": "http://example.com/term", "@container": "@list" },
    "term2": {
      "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

Please note that term1 uses the context's default language, i.e., de, whereas term2 uses en; otherwise they have exactly the same definition.

The question is whether both lists should be compacted to term1 (and thus trigger an error) or whether term2 should be chosen instead for list 2, as it has more matches there (three en compared to two de).
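
To make the "three en compared to two de" count concrete, here is a minimal Python sketch that tallies the language tag of each list element. The helper name `language_tally` is hypothetical and not part of any JSON-LD library; untagged values such as the number 4 are counted under `None`.

```python
from collections import Counter


def language_tally(list_items):
    """Count the @language tag of each list element (None for untagged values)."""
    tally = Counter()
    for item in list_items:
        # Plain scalars such as the number 4 carry no language tag.
        lang = item.get("@language") if isinstance(item, dict) else None
        tally[lang] += 1
    return tally


# The two lists from the input above.
list1 = [
    {"@value": "v1.1", "@language": "de"},
    {"@value": "v1.2", "@language": "de"},
    {"@value": "v1.3", "@language": "de"},
    4,
    {"@value": "v1.5", "@language": "en"},
    {"@value": "v1.6", "@language": "en"},
]
list2 = [
    {"@value": "v2.1", "@language": "en"},
    {"@value": "v2.2", "@language": "en"},
    {"@value": "v2.3", "@language": "en"},
    4,
    {"@value": "v2.5", "@language": "de"},
    {"@value": "v2.6", "@language": "de"},
]

tally1 = language_tally(list1)
tally2 = language_tally(list2)
```

In both lists neither language covers every element, which is why the choice of term is not obvious.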

@msporny
Owner

OPTION 1: Both lists should be compacted to term1.
OPTION 2: term2 should be chosen for list 2.
PROPOSAL 1: Lists-of-lists should only throw an error in JSON-LD when converting the document to RDF.

@msporny
Owner

I don't want to complicate the term ranking algorithms any more than what they are right now. It seems as if this may be a corner-case. Since term2 doesn't match every item in the list, I think term1 should be picked instead for both lists. I don't know if this should throw an error... converting toRDF should throw an error, but probably not compaction? What do other folks feel about whether lists-of-lists should be allowed in regular JSON-LD, but when converting to RDF, they should throw an error?

OPTION 1: +1
OPTION 2: -1
PROPOSAL 1: +1

@tidoust

I thought lists-of-lists were not allowed in this version of JSON-LD specifically because of the added complexity with regards to algorithms?

Provided the term ranking algorithm is clear enough for implementers, I don't think there's a quick-and-easy way to improve it without introducing further complications.

OPTION 1: +1 (i.e. follow the current algorithm)
OPTION 2: -1 (no change to the algorithm to support this corner case)
PROPOSAL 1: -1. If we don't want lists-of-lists, it would be good not to fail silently when one is found.

@lanthaler
Owner

OPTION 1: -1, it is quite obvious what the best match would be and this is not a corner case IMHO
OPTION 2: +1, because term2 is able to compact 4 elements instead of just 2 as term1 does
PROPOSAL 1: -1 as most algorithms would probably have to be changed

Honestly, I'm a bit surprised by your choices. Would your opinion change if the context looked like this:

{
  "@context": {
    "term1": { "@id": "http://example.com/term", "@container": "@list", "@language": "de" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

Actually, I don't think this adds complexity; it would remove some.

@lanthaler
Owner

As I just found out, the playground fails completely in this example as it collapses the two lists into one: http://bit.ly/UQG2C7

@gkellogg
Owner

The problem with this example is that it subtly depends on narrow specifics of the term ranking algorithm. We might just consider this non-conforming and leave it up to the processor to select one; there's no obviously right answer. Alternatively, just make sure that the choice (and the examples) are appropriate for the specified algorithm, which this is not.

PROPOSAL 1: -1, lists of lists are not well formed, so Postel's rule applies.

@lanthaler
Owner

Why should such a list be non-conforming? Why is this an "inappropriate choice for the specified algorithm"? This is not a list of lists; there are two separate lists in this example.

@gkellogg
Owner

What could be non-conforming is the selection of lists having multiple languages along with language maps. If not conforming, it's certainly a pathological corner-case.

Regarding the example, I noted this in d1b3ad3 and in this email.

The ranks in this test add up as follows (using the spec'ed algorithm):

first list, term 1/term 2: 13/7
second list, term 1/term 2: 11/10

This is why the example isn't appropriate. If you definitely want to have term 2 selected for the second list, it should be less ambiguous as to how it turns out. If the point of the example is to show that your alternate algorithm is better, fine, but if we have tests, they should be consistent with the algorithm specified.

In any case, this is like rearranging armchairs on the Titanic. It's a corner case, and there's no absolutely right answer.

@lanthaler
Owner

We are not discussing language maps in this example.

I know what the spec'ed algorithm sums up to, and in my opinion it shouldn't sum up to these numbers. If you use the following context it would in fact separate the lists.

{
  "@context": {
    "term1": { "@id": "http://example.com/term", "@container": "@list", "@language": "de" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

I can't see any compelling argument why the context above should yield a different result than the context below:

{
  "@context": {
    "@language": "de",
    "term1": { "@id": "http://example.com/term", "@container": "@list" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}
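
As a rough illustration of this point, the following Python sketch resolves each term's effective language under both contexts. The helper `effective_languages` is hypothetical; it assumes that a term without its own `@language` inherits the context's default, and that all term definitions are written as objects, as they are here.

```python
def effective_languages(context):
    """Map each term to the language it ends up with (own @language, else the default)."""
    ctx = context["@context"]
    default = ctx.get("@language")
    result = {}
    for name, definition in ctx.items():
        # Skip keywords such as @language and any non-object term definitions.
        if name.startswith("@") or not isinstance(definition, dict):
            continue
        # An explicit @language on the term overrides the context default.
        result[name] = definition["@language"] if "@language" in definition else default
    return result


explicit_context = {
    "@context": {
        "term1": {"@id": "http://example.com/term", "@container": "@list", "@language": "de"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}
default_context = {
    "@context": {
        "@language": "de",
        "term1": {"@id": "http://example.com/term", "@container": "@list"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}

explicit_result = effective_languages(explicit_context)
default_result = effective_languages(default_context)
```

Under these assumptions both contexts give term1 the language de and term2 the language en, so the two term sets are equivalent as far as language is concerned.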

> If the point of the example is to show that your alternate algorithm is better, fine, but if we have tests, they should be consistent with the algorithm specified.
> In any case, this is like rearranging armchairs on the Titanic. It's a corner case, and there's no absolutely right answer.

Well, in my opinion the most important tests are the ones that test corner cases. I tried several times to have a discussion about the algorithms but it seems to be impossible because "there are already two implementations" and that's "how it is currently specified" so I'll just give up on this.

@gkellogg
Owner

If Term Rank needs to be rewritten anyway to take into consideration language maps and other resolutions, then I think it's fair game to play with and improve. If it doesn't need to be updated, I'd just say leave well enough alone.

@lanthaler
Owner

RESOLVED: When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected.
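
One possible reading of this resolution, sketched in Python. This is not the algorithm text from the spec: the function `terms_matching_whole_list` is hypothetical, it only checks language (not `@type`), it treats untagged values such as the number 4 as not matching a language-specific term, and it leaves out the "most specific" ranking among several matching terms.

```python
def terms_matching_whole_list(context, iri, list_items):
    """Return the @list terms for `iri` whose effective language fits every list element."""
    ctx = context["@context"]
    default = ctx.get("@language")
    matches = []
    for name, definition in ctx.items():
        # Skip keywords and non-object term definitions.
        if name.startswith("@") or not isinstance(definition, dict):
            continue
        if definition.get("@id") != iri or definition.get("@container") != "@list":
            continue
        # The default language applies unless the term sets its own @language.
        term_lang = definition["@language"] if "@language" in definition else default
        # Assumption: every element must be a value object tagged with term_lang.
        if all(isinstance(item, dict) and item.get("@language") == term_lang
               for item in list_items):
            matches.append(name)
    return matches


context = {
    "@context": {
        "@language": "de",
        "term1": {"@id": "http://example.com/term", "@container": "@list"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}
iri = "http://example.com/term"

all_de = [{"@value": "a", "@language": "de"}, {"@value": "b", "@language": "de"}]
all_en = [{"@value": "a", "@language": "en"}, {"@value": "b", "@language": "en"}]
mixed = [{"@value": "v2.1", "@language": "en"}, 4, {"@value": "v2.5", "@language": "de"}]

de_terms = terms_matching_whole_list(context, iri, all_de)
en_terms = terms_matching_whole_list(context, iri, all_en)
mixed_terms = terms_matching_whole_list(context, iri, mixed)
```

Under this reading, a mixed list like the ones in the original input matches neither term, so a processor would presumably fall back to the full IRI for such lists rather than pick term1 or term2.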

lanthaler referenced this issue from commit 3abfe61:
Add compaction test for mixed lists
This addresses #172.
@lanthaler
Owner

I've updated all algorithms. Unless I hear objections, I will close this issue in 24 hours.

@lanthaler lanthaler closed this