Should each member in a list contribute to term rank? #172

Closed
lanthaler opened this Issue Oct 22, 2012 · 12 comments

4 participants
Member

lanthaler commented Oct 22, 2012

There have been some discussions on the mailing list about what the outcome of the following compaction should be.

Input

{
  "@id": "http://example.com/id1",
  "http://example.com/term": [
    {
      "@list": [
        { "@value": "v1.1", "@language": "de" },
        { "@value": "v1.2", "@language": "de" },
        { "@value": "v1.3", "@language": "de" },
        4,
        { "@value": "v1.5", "@language": "en" },
        { "@value": "v1.6", "@language": "en" }
      ]
    },
    {
      "@list": [
        { "@value": "v2.1", "@language": "en" },
        { "@value": "v2.2", "@language": "en" },
        { "@value": "v2.3", "@language": "en" },
        4,
        { "@value": "v2.5", "@language": "de" },
        { "@value": "v2.6", "@language": "de" }
      ]
    }
  ]
}

Context

{
  "@context": {
    "@language": "de",
    "term1": { "@id": "http://example.com/term", "@container": "@list" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

Please note that term1 uses the context's default language, i.e., de, whereas term2 uses en; otherwise the two terms have exactly the same definition.

The question is whether both lists should be compacted to term1 (and thus trigger an error) or whether term2 should be chosen for list 2 instead, since it has more matches there (three en compared to two de).
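
To make the "more matches" comparison concrete, here is a small sketch that counts, for each list, how many members carry the language each term would apply. It assumes term1 resolves to de through the context default and term2 to en; the helper names `tagged` and `language_matches` are hypothetical. The plain number 4 has no language and is left out of the count.

```python
def tagged(language, *values):
    """Build expanded language-tagged values, as they appear in the input."""
    return [{"@value": v, "@language": language} for v in values]


# The two separate lists from the input above.
LIST_1 = tagged("de", "v1.1", "v1.2", "v1.3") + [4] + tagged("en", "v1.5", "v1.6")
LIST_2 = tagged("en", "v2.1", "v2.2", "v2.3") + [4] + tagged("de", "v2.5", "v2.6")

# Assumed effective language per term: term1 inherits "de", term2 declares "en".
TERM_LANGUAGES = {"term1": "de", "term2": "en"}


def language_matches(items, language):
    """Count language-tagged members whose @language equals `language`."""
    return sum(
        1 for item in items
        if isinstance(item, dict) and item.get("@language") == language
    )


for name, items in (("list 1", LIST_1), ("list 2", LIST_2)):
    counts = {term: language_matches(items, lang) for term, lang in TERM_LANGUAGES.items()}
    print(name, counts)
# list 1 {'term1': 3, 'term2': 2}
# list 2 {'term1': 2, 'term2': 3}
```

By this count each term wins one list, which is why the two lists could plausibly end up under different terms.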

Owner

msporny commented Nov 20, 2012

OPTION 1: Both lists should be compacted to term1.
OPTION 2: term2 should be chosen for list 2.
PROPOSAL 1: Lists-of-lists should only throw an error in JSON-LD when converting the document to RDF.
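
PROPOSAL 1 separates detecting a list of lists from rejecting it. A minimal sketch, assuming expanded JSON-LD input and a hypothetical `to_rdf_guard` entry point (not the API of any real processor), in which only the RDF conversion step raises. Note that the input in this issue, two sibling lists under one property, is not a list of lists.

```python
def has_list_of_lists(node):
    """Return True if any @list object directly contains another @list object."""
    if isinstance(node, list):
        return any(has_list_of_lists(item) for item in node)
    if isinstance(node, dict):
        members = node.get("@list", [])
        if any(isinstance(item, dict) and "@list" in item for item in members):
            return True
        # Recurse into every value, including deeper @list contents.
        return any(has_list_of_lists(value) for value in node.values())
    return False  # scalars cannot contain lists


def to_rdf_guard(document):
    """Hypothetical toRDF pre-check: compaction would accept the document, RDF conversion does not."""
    if has_list_of_lists(document):
        raise ValueError("lists of lists cannot be converted to RDF")
    return document
```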

Owner

msporny commented Nov 20, 2012

I don't want to complicate the term ranking algorithms any more than they already are. This seems to be a corner case. Since term2 doesn't match every item in the list, I think term1 should be picked for both lists. I don't know whether this should throw an error... converting toRDF should throw an error, but compaction probably shouldn't. How do other folks feel about allowing lists-of-lists in regular JSON-LD but throwing an error when converting to RDF?

OPTION 1: +1
OPTION 2: -1
PROPOSAL 1: +1

Contributor

tidoust commented Nov 20, 2012

I thought lists-of-lists were not allowed in this version of JSON-LD specifically because of the added complexity with regard to the algorithms?

Provided the term ranking algorithm is clear enough for implementers, I don't think there's a quick-and-easy way to improve it without introducing further complications.

OPTION 1: +1 (i.e. follow the current algorithm)
OPTION 2: -1 (no change to the algorithm to support this corner case)
PROPOSAL 1: -1. If we don't want lists-of-lists, it would be good not to fail silently when one is found.

Member

lanthaler commented Nov 20, 2012

OPTION 1: -1, it is quite obvious what the best match would be and this is not a corner case IMHO
OPTION 2: +1, because term2 is able to compact 4 elements instead of just 2 as term1 does
PROPOSAL 1: -1 as most algorithms would probably have to be changed

Honestly, I'm a bit surprised by your choices. Would your opinion change if the context looked like this:

{
  "@context": {
    "term1": { "@id": "http://example.com/term", "@container": "@list", "@language": "de" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

Actually, I don't think this adds complexity; if anything, it would remove some.

Member

lanthaler commented Nov 20, 2012

As I just found out, the playground fails completely on this example, as it collapses the two lists into one: http://bit.ly/UQG2C7
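
The question at the top expects an error when both lists land on term1, because a term with `"@container": "@list"` holds exactly one list per node; a second list would need nested arrays (a list of lists). A minimal sketch of that check, using a hypothetical helper `add_compacted_list` (not part of any JSON-LD library) that refuses the second list instead of merging it into the first:

```python
def add_compacted_list(node, term, items):
    """Place `items` under `term`, assumed to be a term with "@container": "@list".

    One node can carry only one list per @list-container term, so a second list
    for the same term is rejected rather than appended to the first one.
    """
    if term in node:
        raise ValueError(
            f"{term!r} already holds a list; a second list would require a list of lists"
        )
    node[term] = list(items)  # copy so later edits to `items` do not leak in
    return node
```

Raising keeps the two lists distinct; merging them, as the playground did, silently turns two lists into one.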

Owner

gkellogg commented Nov 20, 2012

The problem with this example is that it subtly depends on narrow specifics of the term ranking algorithm. We might just consider this non-conforming and leave it up to the processor to select one; there's no obviously right answer. Alternatively, we could just make sure that the choice (and the examples) are appropriate for the specified algorithm, which this is not.

PROPOSAL 1: -1, lists of lists are not well formed, so Postel's rule applies.

Member

lanthaler commented Nov 20, 2012

Why should such a list be non-conforming? Why is this an "inappropriate choice for the specified algorithm"? This is not a list of lists; there are two separate lists in this example.

Owner

gkellogg commented Nov 20, 2012

What could be non-conforming is the selection of lists having multiple languages along with language maps. If not conforming, it's certainly a pathological corner-case.

Regarding the example, I noted this in d1b3ad3 and in this email.

The ranks in this test add up as follows (using the spec'ed algorithm):

first list, term1/term2: 13/7
second list, term1/term2: 11/10

This is why the example isn't appropriate. If you definitely want to have term 2 selected for the second list, it should be less ambitious as to how it turns out. If the point of the example is to show that your alternate algorithm is better, fine, but if we have tests, they should be consistent with the algorithm specified.

In any case, this is like rearranging armchairs on the Titanic. It's a corner case, and there's no absolutely right answer.

Member

lanthaler commented Nov 20, 2012

We are not discussing language maps in this example.

I know what the spec'ed algorithm sums up to, and in my opinion it shouldn't sum up to these numbers. If you use the following context it would in fact separate the lists.

{
  "@context": {
    "term1": { "@id": "http://example.com/term", "@container": "@list", "@language": "de" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}

I can't see any compelling argument why the context above should yield a different result than the context below:

{
  "@context": {
    "@language": "de",
    "term1": { "@id": "http://example.com/term", "@container": "@list" },
    "term2": { "@id": "http://example.com/term", "@container": "@list", "@language": "en" }
  }
}
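
The claim that both contexts should behave identically can be checked by resolving the language each term effectively applies: a term's own `@language` wins, otherwise the context default is used. A short sketch with a hypothetical `effective_languages` helper:

```python
# The two contexts from the comment above.
EXPLICIT = {
    "@context": {
        "term1": {"@id": "http://example.com/term", "@container": "@list", "@language": "de"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}
WITH_DEFAULT = {
    "@context": {
        "@language": "de",
        "term1": {"@id": "http://example.com/term", "@container": "@list"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}


def effective_languages(context):
    """Map each term to the language its plain-string values would be tagged with."""
    ctx = context["@context"]
    default = ctx.get("@language")  # None when the context sets no default language
    languages = {}
    for term, definition in ctx.items():
        if term.startswith("@"):
            continue  # skip keywords such as "@language"
        if isinstance(definition, str):
            languages[term] = default  # simple term definition: no language of its own
        else:
            # An explicit "@language" on the term (even null) overrides the default.
            languages[term] = definition.get("@language", default)
    return languages


print(effective_languages(EXPLICIT) == effective_languages(WITH_DEFAULT))  # True
```

Both contexts resolve to term1 → de and term2 → en, so any selection that depends only on the effective language yields the same result for either context.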

> If the point of the example is to show that your alternate algorithm is better, fine, but if we have tests, they should be consistent with the algorithm specified.
> In any case, this is like rearranging armchairs on the Titanic. It's a corner case, and there's no absolutely right answer.

Well, in my opinion the most important tests are the ones that test corner cases. I tried several times to have a discussion about the algorithms, but it seems to be impossible because "there are already two implementations" and that's "how it is currently specified", so I'll just give up on this.

Owner

gkellogg commented Nov 20, 2012

If Term Rank needs to be rewritten anyway to take into consideration language maps and other resolutions, then I think it's fair game to play with and improve. If it doesn't need to be updated, I'd just say leave well enough alone.

Member

lanthaler commented Nov 27, 2012

RESOLVED: When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected.
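
The filtering step of this resolution can be sketched as follows: keep only the @list-container terms for the property whose effective language (own `@language`, else the context default) fits every member of the list. This is an illustration under stated assumptions, not the specified algorithm: `candidate_terms` and `item_matches` are hypothetical, native numbers are assumed to fit any term, other value forms are treated as non-matching, and choosing the "most specific" term among several candidates is not modeled.

```python
CONTEXT = {
    "@context": {
        "@language": "de",
        "term1": {"@id": "http://example.com/term", "@container": "@list"},
        "term2": {"@id": "http://example.com/term", "@container": "@list", "@language": "en"},
    }
}


def tagged(language, *values):
    return [{"@value": v, "@language": language} for v in values]


LIST_1 = tagged("de", "v1.1", "v1.2", "v1.3") + [4] + tagged("en", "v1.5", "v1.6")
LIST_2 = tagged("en", "v2.1", "v2.2", "v2.3") + [4] + tagged("de", "v2.5", "v2.6")


def item_matches(item, language):
    """Hypothetical per-member test covering only the value forms used in this issue."""
    if isinstance(item, dict) and "@language" in item:
        return item["@language"] == language  # language-tagged string
    if isinstance(item, (int, float)) and not isinstance(item, bool):
        return True  # assumption: native numbers carry no language, so any term fits
    return False  # assumption: anything else counts as non-matching in this sketch


def candidate_terms(context, iri, items):
    """@list terms for `iri` whose effective language fits every list member."""
    ctx = context["@context"]
    default = ctx.get("@language")
    return sorted(
        term
        for term, definition in ctx.items()
        if not term.startswith("@")
        and isinstance(definition, dict)
        and definition.get("@id") == iri
        and definition.get("@container") == "@list"
        and all(item_matches(i, definition.get("@language", default)) for i in items)
    )
```

Under this matching rule neither term fits all members of either list in the original input (each list mixes de and en), so neither term1 nor term2 would be selected for them; what a processor falls back to in that case is outside this sketch. A list containing only en values would yield `["term2"]`.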

lanthaler added a commit that referenced this issue Dec 8, 2012

lanthaler added a commit that referenced this issue Dec 8, 2012

lanthaler added a commit that referenced this issue Dec 20, 2012

Member

lanthaler commented Dec 20, 2012

I've updated all algorithms. Unless I hear objections, I will close this issue in 24 hours.

lanthaler closed this Dec 21, 2012
