Algorithm specification updates by editors #218

Closed

lanthaler opened this issue Feb 12, 2013 · 22 comments

@lanthaler (Member)

This issue is an attempt to streamline the editing of the API spec. I'll start by creating a todo list with the feedback Gregg, David L., and I gave so that we can tick off the things that have already been addressed.

@lanthaler (Member Author)

Gregg's feedback:

  • I think the wording is a bit folksy and should use more spec-like language in describing what is to be done, but this is not critical for this version of the document. The section uses "we" quite a bit, which I don't think is quite right. I'd rather see it written as "After searching is complete ..." rather than "When we have finished searching ...". It should also probably have an Algorithm section, even if it is brief.
  • The RDF algorithms indicate that more work is necessary, but I see that issue Define from/to RDF algorithms in terms of standard RDF #125, requiring that the to/from RDF algorithms be defined in terms of standard RDF, has been closed, so it might be okay after all. The note "This algorithm needs some clarification on its details" itself needs some clarification, as I don't see what's not clear. However, I think we can add some conversion examples to 5.18 that go through some of the cases.
  • This is also a good place to note that relative IRIs are resolved against the enclosing document, so that remote contexts are opened properly relative to either the referencing JSON-LD document or another remote context.
  • Rather than calling sub-sections "Problem", we should consider something like "Purpose".
  • I like having a "General Solution" sub-section to give a high-level overview of the basic operation of the algorithm. Both of these sections should be marked as "informative".
  • I find saying "xxx equals null" problematic, as this is often not how it is expressed in many contexts, or equivalence has a different meaning. Also, there is really only one "null" value. I prefer to say "xxx is null". I think this wording is better for other cases too, such as "If iri does not equal @type" => "If iri is not @type".
  • In the Create Term Definition Subalgorithm algorithm, what is the purpose of the special "@preserve" keyword? It is not defined or used anywhere else AFAICT.
  • Is it reasonable for a property generator to have @type as one of the @id values? Perhaps not (5.4.3 step 12.2.2.2).
  • "Otherwise ... otherwise" is awkward; we can always smooth this out post-LC.
  • Normative statements in the algorithm may duplicate such statements in the Syntax grammar, for example "Otherwise id must be a string" (5.4.3 step 12.3).
  • We should probably add something to Context Expansion to error if "key" is a keyword (section 5.3.3 step 3.6) because from the grammar "A term MUST NOT equal any of the JSON-LD keywords".

@lanthaler (Member Author)

David's feedback

  • Introduction: The text indicates the doc is for authors and developers who want details, or for implementors. This leaves out the common case of regular API users. Perhaps the Introduction should link to the Algorithms section, saying it's for details and implementors, and link to the API section, saying it's for regular use.
  • Features: The features link to the associated algorithms and perhaps should also link to the APIs.
  • Examples: The Compaction text mentions compacting numbers. Do we want to increase example complexity to show off features like this? It's a general issue for all examples. There are many use cases that would be informative but are only living in test cases right now.

@lanthaler (Member Author)

Markus' review (including Dave's and Gregg's replies).

2. Features

  • Considering the discussion on the mailing list, I would propose to keep using FOAF in these examples.

2.1 Expansion

  • 1st paragraph: Emphasize removal of context, not switching between contexts

Dave: Maybe. I like how the main purpose of removing contexts is explained
now. JSON-LD is often really, IMO, about being able to express the same
data using different contexts, so it makes sense to talk about
everything in that ... context.

Gregg: Hmm; the purpose of expansion is clearly to remove a context. Alternative contexts only come in when re-compacting.

  • Under Example2: “The difference is in their context information” – that’s wrong, the contexts are the same.
    I would thus propose to replace the paragraph with the intro from the corresponding paragraph in index.html

Dave: Are you sure? The two contexts look different to me, but we may be
talking about something else. The context in Example 1 has the term
"name", whilst Example 2's context does not.

Markus: You are right, "name" is missing. But it's very difficult to see because other things are highlighted, and even if "name" were there, it wouldn't make a difference.

2.3 Flattening

  • I would like to mention that all blank nodes will be labeled. This is an integral part of flattening. It makes the examples slightly more difficult but certainly helps to understand the algorithms later on.

Dave: This was removed because blank nodes and blank node labeling are a fairly complex issue, IMO. I didn't think it was appropriate for the introduction. Perhaps it needs more highlighting later on?

Gregg: All blank nodes are already relabeled in expansion, AFAIK, so I'm not sure why this stands out as being any different.

Unlabeled blank nodes are not labeled in expansion; only labeled blank nodes get relabeled. So I think it's important to highlight that right in the introduction. I think in the API spec we can work under the assumption that the data model is clear.

5. Algorithms

5.2 Remote Context Resolution

  • I really like the idea in general but would change it slightly. Instead of pre-processing the whole document I would call the algorithm as first step whenever a local context is being processed. So it would be a sub-algorithm of context processing. It should also be described more formally and include error handling, mention that other data in the remote context document is ignored, etc. Slightly adapting the steps under Context Processing 2.2 in index.html will do the job.

Dave: Well, one of the reasons for getting it all out of the way to begin
with, is to avoid introducing a lot of asynchronous complexity with the
other algorithms. If someone were implementing a new processor in an
asynchronous environment, and they tried to follow the algorithms, they
might find control flow to be a pain. Hopefully they'd notice that if
they got remote context resolution out of the way first their work would
be easier, but why not give them a head start by solving that problem for
them? I do think both approaches should be mentioned either way.

Gregg: For my part, I do synchronous processing anyway, and HTTP caching can eliminate most sources of latency; I find it better to keep it in the core Expansion algorithm, which works well in my case. I'm agnostic about separating the algorithm out, but it sounds like an implementation-specific optimization.

The API call itself is asynchronous; whether the algorithms it calls are doesn't really matter. In some languages you would spawn a thread, in others you rely on an event loop, and so on. So I agree with Gregg that this is, IMO, clearly an implementation-specific optimization. Whether you have one async call before expansion or a number of them during expansion doesn't really matter. Furthermore, the current pre-processing step directly modifies the input document, which you tried to avoid in all other operations. Passing around a map of external contexts just makes it more complex to explain.
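A minimal sketch (Python; all names illustrative, with no caching or real error handling) of what resolving every remote context up front, before the other algorithms run, could look like:

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

def resolve_remote_contexts(context, base_iri, active=frozenset()):
    # `context` is the value of an @context key: an IRI string, a dict,
    # or an array of those. Remote IRIs are dereferenced recursively so
    # that the later algorithms never have to perform I/O themselves.
    if isinstance(context, list):
        return [resolve_remote_contexts(c, base_iri, active) for c in context]
    if isinstance(context, str):
        # relative IRIs resolve against the referencing document/context
        iri = urljoin(base_iri, context)
        if iri in active:
            raise ValueError("recursive context inclusion: " + iri)
        doc = json.load(urlopen(iri))
        if not isinstance(doc, dict) or "@context" not in doc:
            raise ValueError("invalid remote context: " + iri)
        # any other data in the remote context document is ignored
        return resolve_remote_contexts(doc["@context"], iri, active | {iri})
    return context  # an inline context needs no resolution
```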

5.3 Context Processing

  • Steps 3.4.4 and 3.5.4 are unclear. Instead of removing the keys, they are set to true. This is confusing until you read the Create Term Definition Subalgorithm.

Dave: I don't know that it's confusing to mark those keys as having been
defined (their entries in the defined map are set to true). I will say
that the reason why those keys aren't removed is because, in this
rewrite, I tried to take the approach of not modifying inputs as well as
not having to say "copy the input" everywhere. This means it takes an
implementor less mental effort to have to tease out where they can
optimize away copying and where they can't.

... and yet you modify your input document when you dereference external contexts. As already said above, I believe it's much simpler to pre-process each local context and then pass the pre-processed result (i.e., a modified copy) to the context processing algorithm. You can then modify it as you like, which would simplify the algorithm.

5.4 Create Term Definition Subalgorithm

  • I’m not sure that it’s a good idea to separate these steps from Context Processing. I also find the introduction quite confusing. Is it talking about the term (the left hand side) or its IRI mapping? What if both sides contain compact IRIs?

Dave: Is the confusion in the choice of the word "term"? I have found some of
the terminology to be confusing myself; we create "term definitions" for
keys that are CURIEs ... are these "terms" or not? The terminology in
the spec indicates that a term is any "short word" defined in a context
that may be expanded to an IRI. Perhaps we should be saying "keys" (in
the local context) before terms are defined? I think this is a general
problem we have in the specification; I know it was there when I was
looking over the old spec during the rewrite. I think we should get some
consensus on our terminology for this moving forward.

Gregg: If a CURIE is on the LHS, it is a term, IMO.

+1, everything on the LHS is a term; it's a (more or less) opaque string. The only time you really look into the string on the LHS is when @id is missing from its definition, to check whether you can nevertheless expand it to an IRI.

  • Step 3 seems to suggest that it is the left hand side but it is not clear how to proceed after that step.

Dave: This may just arise from the general confusion w/"term". All you do is
invoke the create term definition algorithm recursively to define the
dependency (as step 3 states), and then you move onto step 4. There's
nothing special going on there.

And that's exactly what confuses me. Just by looking at the LHS you don't know if it's a dependency. It's a dependency if a compact IRI is used in @id or if @id is missing and the LHS is a compact IRI.

Gregg: The algorithm also causes curies to inherit the term definitions of the prefix, which isn't called out in the Syntax spec, AFAIKR.

  • Step 5 suggests that keyword aliases are managed separately. Why then is just the alias removed and not a previous IRI mapping?

Dave: Because the IRI mapping will be overwritten when you define the term.
I'm not sure if this needs to be made more clear with a note?

Markus: That's exactly the point. Just the IRI mapping is overwritten. The algorithm might return in step 8.3 without ever resetting the type, language, or container mapping.

Dave: Yes, this is for supporting the framing algorithm and it should be
removed at this time.

  • Step 8.2: vocabRelative must be set to true; the result might be a keyword, but it won't be handled properly as a keyword alias as in 8.1

Dave: I thought you were not allowed to use keyword aliases in the context. Has
this changed?

There are two issues: One is that vocabRelative must be set to true because all the IRIs in the context are vocab-relative. The other is that the result of the IRI expansion operation might be a keyword, but the description of the algorithm doesn't handle it correctly. Your implementation handles it, and it also supports keyword aliases (expand-0051).

Regarding keyword aliasing: we agreed that you cannot use keyword aliases as keys of term definitions, they have to be @id/@type/@language/@container. You can, however, use keyword aliases as values. Just as you can use other terms or compact IRIs. I think the whole separation of keyword aliases from term definitions causes more confusion than it helps. The algorithms shouldn't care (in most cases) and thus, shouldn't separate those mappings IMO.

  • Step 8.4 is empty
  • Step 11, really? Including type mapping etc. What’s the point?

Dave: Inheritance from dependencies. You may add/override a container, type,
or language mapping. For example, if you have a term that is a CURIE
that defines only a container, language, or type mapping, you want to
inherit the IRI mapping from the prefix's definition. It sounds like we
should add several test cases to the test-suite for this.

No, you don't inherit anything. Quote from the syntax spec:

"Duplicate context terms are overridden using a last-defined-wins mechanism"

The only thing that's special when you use a compact IRI on the LHS is that you might omit the IRI mapping, since it can be calculated automatically.
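To illustrate the point with a sketch (Python; hypothetical helper name, simplified active-context shape): when the term key is a compact IRI and @id is omitted, only the IRI mapping is derived from the prefix definition; type, language, and container mappings do not carry over.

```python
def iri_mapping_for_term(term, definition, active_context):
    # An explicit @id in the definition always wins.
    if isinstance(definition, dict) and "@id" in definition:
        return definition["@id"]
    # Otherwise, if the term itself is a compact IRI whose prefix is
    # defined, the IRI mapping is computed from the prefix's IRI
    # mapping plus the suffix -- and nothing else is "inherited".
    if ":" in term:
        prefix, suffix = term.split(":", 1)
        if prefix in active_context:
            return active_context[prefix]["@id"] + suffix
        return term  # e.g. an absolute IRI used as a term key
    raise ValueError("unable to determine IRI mapping for: " + term)

ctx = {"foaf": {"@id": "http://xmlns.com/foaf/0.1/"}}
assert iri_mapping_for_term("foaf:name", {"@container": "@set"}, ctx) \
    == "http://xmlns.com/foaf/0.1/name"
```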

  • Step 12.2.2.2 Property generators should just expand to IRIs, not @type, this should also be checked.

Dave: This has been fixed ("@type" is now disallowed as a property generator
IRI mapping).

Why a special case for @type? @graph is disallowed as well, just as dlfkj is if it doesn't expand to an IRI.

  • Step 12.3: Why a special case for @type? The vocabRelative flag is missing again

Dave: The special case has been removed; it was an artifact from a previous algorithm. The vocabRelative flag is missing for the same reason as above: no keyword aliases are permitted in the context.

Talked about keywords above. What about vocabRelative IRIs? See expand-0052.

  • Step 14: Where does the prefix mapping come from? Active ctx, local ctx?

Dave: From the active context (the step text is correct). All dependencies
are resolved at step 3.

  • The order of the algorithms is a bit confusing IMHO. It clearly makes sense to put context processing first, but then the IRI expansion algorithm should directly follow it. Alternatively the order of index.html could be used, i.e., the order of their dependency: expansion -> context processing -> iri expansion

Dave: I definitely think the order should not be what is in index.html.
However, if it makes more sense to juggle the order some more in
alternate2.html I'm all for that. Context processing should definitely
come first. We should get some consensus on the order afterwards.

Gregg: I think IRI/Value Compaction/Expansion should come after context processing. For me, EvaluationContext is a class which implements these algorithms as methods, so this is logical to me.

5.6 IRI Expansion

  • The defined map may be more efficient, but I believe it's easier to understand how cycles are detected if you pass an array of the terms currently being processed, e.g., [A, B, C]; if C then requires B, you can simply detect the recursion because B is already in the array. Don't know, maybe it's also just a personal preference. I don't have a strong opinion about this but wanted to mention it.

Dave: The map was also used because of the way the algorithm was rewritten --
it stores a tri-state (not defined / being defined / defined), not just
two states. This is used to avoid infinite recursion and is
simpler to understand, IMO. I thought showing that there were three
states was clearer than just having an array. Clearly we differ here;
I don't really mind which way it is explained in the algorithm so long
as the consensus is that it is easiest to understand (or maybe it just
doesn't matter that much).

Gregg: I sort of liked this too.

As said, I don't have a strong opinion about this. If both of you agree it's simpler, let's keep it.
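For reference, the tri-state bookkeeping Dave describes boils down to a skeleton like this (Python; illustrative, not spec text):

```python
def create_term_definition(local_context, term, active_context, defined):
    # `defined` is the tri-state map: key absent = not defined yet,
    # False = currently being defined, True = definition complete.
    if defined.get(term) is True:
        return                      # already fully defined
    if defined.get(term) is False:  # re-entered our own recursion
        raise ValueError("cyclic IRI mapping: " + term)
    defined[term] = False           # mark as "being defined"
    # ... resolve dependencies, which may recurse into this function
    #     via create_term_definition(local_context, dep, ...) ...
    # ... build the actual term definition in active_context ...
    defined[term] = True            # mark as "defined"
```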

  • Problem desc: The algorithm doesn't care if it's already an absolute IRI; it tries nevertheless. The given value might be an absolute IRI; we don't know whether it is at this point.
  • General solution: I would shorten it, maybe just keeping the first paragraph. The rest is just as complicated to read as the algorithm itself.

Dave: I remember wanting to shorten that particular general solution as well.

Algorithm

  • Step 1: Also keywords should be returned as is

Dave: This is handled in step 5 to better consolidate, after keyword aliasing
has been handled.

  • Steps 2 & 3: This should be made a bit clearer; it's quite difficult to understand what's going on. The point is that terms/prefixes in the local ctx that haven't been processed yet are processed first.

Dave: Agree. We could change the language a bit here to clarify.

  • Step 4: no need to say “explicitly ignore value”, this is confusing. Just return null.

Dave: I disagree. I wanted to know why null was being returned and I wanted to know it right there in the algorithm, without having to go find out what was going on via another document or statement somewhere. I think it's important to have notes within these algorithms about the special meaning of things like a "null mapping".

OK, but then the note should be understandable; I wouldn't understand it. We could change it to something like the following sentence (maybe someone has an idea how to make it a bit less clumsy).

"If value has a null mapping in active context return null as it has been explicitly marked to be ignored."

  • Step 6 onwards: isAbsoluteIri is confusing; it could be a bNode ID as well.

Dave: There are probably some other temporary variable names that could be
better, in addition to this one.

  • Step 7: Why not just return the mapping (given vocabRelative is true)?

Dave: It may be a blank node, which needs renaming. This is consolidated into
a single step below.

  • The whole branching around isAbsoluteIri is really confusing.
Maybe it's simpler to use the algorithm in index.html and invoke the Create Term Def. algorithm there instead of recursing into itself (even if I find the latter simpler since the algorithm is self-contained).

Dave: I definitely prefer keeping the algorithm self-contained. Maybe we can
clean up the branching in another way. I would prefer that we don't
repeat ourselves too, if we can avoid it. I'm ok with the general idea:
get the value associated with the active context, if it's a keyword or
null return, if you got something else don't look for a CURIE, otherwise
you got nothing, so you have to check for a CURIE. Once you've dealt
with the possible CURIE, do blank node renaming or apply vocab or base.
I think that flow is easy to follow and it's what we should go with, we
might just need to clean up the algorithm text a bit.

Gregg: Another thing, it should be clear what the document base is when running this. When expanding a context IRI, it should be the referencing context.

5.5 Expansion

  • Step 3.1: The @context member is not removed

Dave: Yeah, I see that now. Perhaps we want to instead skip that key further
down (rather than removing it, to keep with the notion of not having to
copy inputs).

Or we create an array of the keys, sort it and then iterate over that array as we do in the compaction algorithm.

  • Step 3.2: This wouldn’t be needed if @graph would be handled accordingly under 3.4.3 (see index.html 4.3.2.8) and would simplify 3.4.6, 3.9, and 4.1

Dave: We should look into this. From what I recall, the algorithm in
index.html was buggy. One of the test cases I added might be related to
this as well (it didn't work w/your playground). There may be another
that is required to further expose the problem.

  • Step 3.4.2: ... or a bnode ID

Dave: Yes, just another case related to bnode IDs not being IRIs. It would be
nice to have a catch-all identifier name that includes bnode IDs. Either
that or we can use "if the expanded property is a string that contains a
colon", but I find that somewhat ugly and doesn't do as good of a job
explaining what's going on.

That's all we really do: we check for a colon. We should say that, because saying "if it's an absolute IRI" is a bit of a stretch if we never validate IRIs.

Gregg: Yes, let's mint a "term" for this.

Would be fine with that as well. But what do we call it? A string containing a colon? :-P
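Whatever we end up calling it, the actual test is tiny; a sketch of what the algorithms really check (Python):

```python
def looks_like_iri_or_bnode(value):
    # "Absolute IRI" in the algorithms effectively means this check.
    # Blank node identifiers such as "_:b0" pass it too, since they
    # contain a colon; no real IRI validation is ever performed.
    return isinstance(value, str) and ":" in value
```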

  • Steps 3.4.3.x: IMHO it would be much simpler to expand the values directly here (just as in steps 4.3.2.x in index.html) instead of having to expand the property again in value expansion to find out that it is @id and then expand using the IRI expansion algorithm. All keywords would then be completely processed after this step.

Dave: The split was made to separate validation from the general processing
and to avoid repeating the same steps for array processing of @type,
etc. In this way @type isn't handled differently from @list or @graph,
both of which recurse. These array-handling steps are not present in the
index.html version (so it doesn't work if you implement it according to
the text because IRI expansion doesn't handle arrays). You can either
recurse to handle arrays or you can repeat yourself; I tried to keep
things consistent with recursing for @type ... which also meant it
seemed cleaner to keep validation consistent and separate from general
processing. I think this is all a matter of style. Of course, if we can
simplify it all without repeating ourselves somewhere I'm all for that.

It may be that it's just always a tradeoff with where we repeat
ourselves. In that case, I'd prefer to just recurse through the
expansion algorithm to let it handle arrays vs. other values and let all
values run through "value expansion". The difference in approach may
again arise from the somewhat amorphous nature of our terminology (eg:
"value").

I just find it easier to do it in one place. When you read the algorithms and come to a point where it says "If expanded property is @type..." it seems natural to handle it completely there so that I can then forget about that case. Otherwise I have to find my way through the algorithm to see how it's really handled. That's the reason why in index.html all keyword processing is consolidated in one place (steps 4.3.2.x). After that, you don't have to think about keywords anymore.

  • Step 3.4.4.2: “and value language value in value” is confusing

Dave: Agree. We should comb over the spec and read it without special markup
like italics, etc. -- for similar issues.

I would also argue that it would be much simpler to recognize variables by giving them names such as languageValue and formatting them in monospace (not orange though) instead of italics.

  • Step 3.4.6: might need some clarification. It might be easier to put it under 3.4.3

Dave: Agree, needs clarification.

  • Step 3.4.11: also this step would be simpler if everything keyword-related would be done under 3.4.3
  • Step 3.5.3: already verified in 3.4.3.5, but @type must be a string and not an array
  • Step 3.9: could be simplified if 3.2 is changed (see index.html 4.6)

Dave: Again, these simplifications come with tradeoffs. As usual, we should do
whatever the consensus is.

  • The insideList flag can be eliminated by instead looking into active property's container mapping or checking whether active property is @list

Dave: We need to check this. That may not be true -- unless a side-effect
somewhere else handles a particular case, perhaps through recursion. I
think we need more test cases in the test-suite for this.

5.7 Value Expansion

As already said above, I think it would be simpler to handle @id and @type expansion directly in the expansion algo. Fewer jumps, fewer branches, shorter algorithm(s) – not a deal breaker though.

  • Step 5: not needed, it’s done again in step 9.
  • Step 10: default language missing
  • Step 12: missing, return result

Dave: The result is returned in step 11, but we can break that out into a
separate step if necessary. The other two steps have been fixed.

5.8 Label Blank Nodes Subalgorithm

  • The isId flag isn’t needed, @type is relabeled already in IRI expansion
  • When will step 4 ever be reached (and do something other than reassigning the same bnodeId it already has)?

Dave: This has been removed. It used to be used when blank node relabeling
could be "triggered" to be turned on when encountering a property
generator instead of having to always be on.

5.9 Generate Blank Node Identifier

  • Proposal: Get rid of the configurable prefix, we never change it anyway. Just hardcode _:t or, even better because it’s easier to remember, _:b.
  • Briefly describe what the identifier map is used for, i.e., keeping track of blank node reassignments.

Dave: I agree with this. The reason why the prefix was configurable is because
this algorithm is reused by RDF graph normalization (no longer part of
this document).
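A sketch of the hardcoded-prefix version, with the identifier map tracking blank node reassignments (Python; illustrative):

```python
def make_bnode_id_generator(prefix="_:b"):
    counter = 0
    identifier_map = {}  # tracks reassignments: old label -> new label

    def generate(old_identifier=None):
        nonlocal counter
        # A blank node seen before keeps its new label, so that all
        # references to it remain connected.
        if old_identifier in identifier_map:
            return identifier_map[old_identifier]
        new_identifier = prefix + str(counter)
        counter += 1
        if old_identifier is not None:
            identifier_map[old_identifier] = new_identifier
        return new_identifier

    return generate

generate = make_bnode_id_generator()
assert generate("_:t0") == "_:b0"
assert generate("_:t0") == "_:b0"  # reassignment is remembered
assert generate() == "_:b1"        # unlabeled nodes get fresh labels
```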

5.10 Compaction Algorithm

  • I didn't mention this for the other algorithms, but I think it would be simpler to reorder the branches so that the longest comes last: in this case, first process scalars, then arrays, and finally objects (instead of array, object, scalar, which results in a lonely one-liner at the end). Another advantage is that the steps in object-processing would be a level higher, since the if-block is not needed anymore: there won't be any other case, and the two previous branches have already returned (see steps 2-3 in index.html).

Dave: I think it's easier to think of "scalar" as "not the other things". I'm
not sure if other developers will tend to agree with me or not. I have a
vague feeling (no hard data), based on my experience with reading other
people's code, that they prefer to check for arrays and objects first
... and then fall back to primitives. This is cleaner to do in certain
languages. However, I do prefer that shortest steps come first. I could
go either way here.
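A sketch of the proposed branch ordering (Python; compact_value is a stub standing in for the Value Compaction algorithm, and the object branch is reduced to a stub as well):

```python
def compact_value(active_ctx, active_prop, value):
    return value  # stand-in for the Value Compaction algorithm

def compact(active_ctx, active_prop, element):
    # Shortest branches first: scalars fall through immediately, arrays
    # recurse, and the long object branch sits one indentation level
    # higher because no enclosing "otherwise" block is needed.
    if not isinstance(element, (list, dict)):
        return compact_value(active_ctx, active_prop, element)
    if isinstance(element, list):
        return [compact(active_ctx, active_prop, item) for item in element]
    # element is a JSON object: all remaining steps go here, unnested,
    # since the two branches above have already returned.
    return {key: compact(active_ctx, key, value)
            for key, value in element.items()}
```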

  • The algorithm needs to be passed the inverse context otherwise it can’t pass it to the subalgorithms (or they have to re-create it constantly from the active context)

Dave: I may have been intentionally vague on how you get to the inverse
context. Perhaps that should be changed.

  • Step 1.3: The compactArrays option is not used

Dave: This needs to be added.

  • Step 2.2: I don't think we need to explain that a shallow copy is created, and we certainly don't need to distinguish between array values and non-array values; the point is that each value is copied but the references contained in that copy remain the same references.

Dave: The reason for the distinction was to highlight that the shallow copy is
actually two-levels deep. If we can clean this up without losing what's
going on, that would be great.

Does it really have to be two levels deep? I don't think so. You just need the copy because you remove properties or property values (references in the case of objects) from shallow. So you create a copy of the array but not of the elements it contains.
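In code, the copy being discussed amounts to this (Python sketch): the object and its array values are copied, but the objects inside those arrays are not.

```python
def shallow_copy(element):
    # Copy the JSON object and each array value, but keep references to
    # the contained objects: removing a property or a property value
    # from the copy then never mutates the input element.
    return {key: list(value) if isinstance(value, list) else value
            for key, value in element.items()}
```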

  • Step 2.4: Say keys from shallow and mention that this has to be done because keys can be removed from shallow in the following loop

Dave: I think it says from "element", not from "shallow".

I think I wasn't clear. I meant it would be clearer if we would say we "initialize keys to an array containing all of the keys in shallow, ordered lexicographically" and mention that we need to do this because keys from shallow can be removed within the loop - that's the reason why you create the array containing all of them but have the check in 2.5.1.

  • Step 2.5.3.1: The IRI compaction algorithm has no documentRelative flag; it is also not needed as vocabRelative can be used to determine if the value is being used in @type (true) or @id (false). All IRIs are document-relative.

Dave: This has been corrected.

  • Step 2.5.3.2.2: Drop documentRelative
  • Step 2.5.5.1: Is activeProperty the same as “active property” or not? It’s confusing to have two variables with the same name

Dave: Not the same. This has been fixed.

  • Step 2.5.5.1 (and in general): Passing parent to the IRI compaction algorithm, which then passes it to term selection just to eliminate some of the potential property generators because not all IRIs are in shallow, doesn't make sense to me: it doesn't optimize much but makes the algorithms much harder to understand. You can easily do that check in the Find and Remove Property Generator Duplicates Subalgorithm, and I think that's the place where it belongs.

Dave: I think it makes more sense to consolidate how term selection happens.
It's easier, IMO, to understand that you pass in some parameters to a
term selection algorithm -- which then gives you back the term you are
supposed to use. I think that's better than it maybe returning you a
list of terms to try out yourself.

But you don't check for property generator duplicates. So how do you know if it's the right property generator? Just because the properties exist doesn't mean the values exist as well.

  • Step 2.5.5 and substeps: There’s a bug that’s a bit difficult to explain, please see compact-0046 which I just added.

Dave: This is fixed.

  • Step 2.5.6.2: Active property could be an array of potential property generators, how do you choose the container then? Actually I went on and saw that property generators are not handled correctly at all. I added another test, compact-0047 which shows the problem. Given this bug, the rest of the algorithm works under the wrong assumptions. So it doesn’t make much sense to review the remaining steps.

Dave: This has been fixed.

5.12 Inverse Context Creation Subalgorithm

  • Overall: I find it confusing to "initialize [variable] to the value associated with the [key] key in [map]". What you do here is store a reference to that member; you don't assign its value to [variable].

Dave: I'm not sure what language we want to use to clarify what's going on
here. We could easily be more verbose to hide what is really an
implementation detail, but it is also just used to help people
understand the steps by breaking them down a bit. I'm not sure what we
want to do with these situations in general.

This is definitely not an implementation detail because the algorithm wouldn't work at all if it's not a reference.
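A tiny Python example of why the reference semantics matter (the real inverse context has more levels; this fragment is simplified):

```python
# Simplified inverse-context fragment.
inverse = {"http://example.com/term": {"@type": {}, "@language": {}}}

# "Initialize typeMap to the value associated with the @type key" must
# yield a reference, not a copy: writes through typeMap have to land in
# the inverse context being built.
type_map = inverse["http://example.com/term"]["@type"]
type_map["http://example.com/Type"] = "term"

assert inverse["http://example.com/term"]["@type"] == {"http://example.com/Type": "term"}
```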

  • Step 3.3.3: It would be more efficient to do this outside of the loop
  • Step 3.3.4: No need to initialize the defaultLanguage key here, this is done again in 3.3.6.3.1
  • Step 3.3.6: I would find it easier to understand if the checks would be “If there’s a type mapping…”, “Otherwise, if there’s a language mapping”, and finally just “Otherwise”. This would also eliminate some nestings.

These sound like good suggestions.

  • There’s a bug since the algorithm doesn’t distinguish between a term with no language (and no type) mapping and a term with a language mapping set to null. I’ve added compact-0048.

Dave: I think it does make that distinction, but I think it makes it in the opposite direction from the test case. I'm not sure that test case has the desired output. It seems like we'd prefer to select the null-language term over the no-language term because it is more specific.

5.11 IRI Compaction Algorithm

  • As already said in the Compaction algorithm, I would suggest dropping the "parent" parameter. Instead, I would pass the inverse context explicitly. Depending on the outcome of ISSUE-204, the base IRI is required as well.
  • Step 5.4: This is related to the bug described above; numbers, e.g., have no language mapping. This is different from strings with a language that equals null.

Dave: I'm not sure I agree with this approach. I'll defer to the group,
though. It's a very easy change if we decide to go the route you're
proposing.

  • Step 5.4.4 and substeps: At least @id objects are not handled properly; I’ve added compact-0049 for this bug. I won’t review the rest of these sub-steps but continue directly with compact IRIs.
  • Step 7.4 and 7.5: Why introduce the variable isUsableCurie here?

Dave: The bug has been fixed. That variable is meant to help make clear when a
CURIE can be used and when it can't. We could do the same in prose and
remove the variable.

  • I can’t see how the algorithm works if there are potential property generators that turn out to not match because the data doesn’t match. I’ve added compact-0050 as a minimal test. The point is that the fallback terms are never returned. As I understand the algorithm, in this case only “gen” would be returned.

Dave: This has been fixed.

  • Depending on the outcome of ISSUE-204, compaction to relative IRIs is missing.

5.13 Term Selection Subalgorithm

  • Drop parent parameter and the corresponding “optimization”
  • Sometimes @null is used, sometimes @none (also in the other algorithms). I find this confusing as it is very difficult to remember what’s in use where.

Dave: I thought it was very important to distinguish between the two. Saying
something is null in JSON-LD explicitly means you don't want to use it;
whereas "@none" is used when you haven't said anything at all.

I don't think that's the case here. If I said nothing, the key will be "@null". If I said it is null, the value of the corresponding entry will be "@null".
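For concreteness, here's the distinction being debated, as a Python sketch of the two-keyword convention Dave describes (shapes simplified):

```python
# Two terms mapping to the same IRI:
context = {
    "plain":  {"@id": "http://example.com/p"},                     # says nothing about language
    "nolang": {"@id": "http://example.com/p", "@language": None},  # language explicitly null
}

# Under the @none/@null convention, the two land under different keys
# in the inverse context's language map:
language_map = {
    "@none": "plain",   # term that says nothing at all
    "@null": "nolang",  # term whose language is explicitly null
}
```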

  • Step 3: Missing space “underwhich”
  • Step 4: Typo in typeOrLangageValue. In step 3 typeOrLanguageValue is modified, in step 4 then a new variable “preferred” is introduced. Why not reuse typeOrLanguageValue? Why not do those steps directly in IRI compaction? Same question for step 1

Dave: We could modify typeOrLanguageValue to be an array when the algorithm
begins if people think that's clearer than putting it into an array of
preferred values to iterate over. Perhaps renaming the variables, in
general (you also suggest this), would help resolve this issue.

  • Overall, I’m not sure if the introduction of so many variables with very similar names is helpful (e.g. typeOrLanguage, typeOrLanguage map, typeOrLanguageValue map)
  • Only one property generator is ever selected, but it is not checked whether the values really match so that the property generator is applicable. There's also no way to invoke the algorithm again, telling it to ignore that specific property generator in a second run. So the algorithm doesn't really work.

5.14 Value Compaction

  • Substeps of 1: The whole preserveIndex stuff is quite complicated. Why not just create a copy of value and then modify that copy as the algorithm in index.html does?

I didn't think it was that complicated. I'll have to compare the two
algorithms again.

  • Step 2: What if there's also @index? Why expand the value? Why IRI-expand an object? It's already expanded; we are compacting here. I think it should mean expand active property. Typo "passing passing"
  • PROPOSAL: Keep the algorithm in index.html which is half as long. Modify it to work on a copy of value if necessary or construct a copy step-by-step (this should require just minimal changes).

Dave: I think the case where @index is in a subject reference may not be
handled. Sounds like we need a test-case for that. Also, the algorithm
in index.html doesn't seem to handle compacting the IRI in that case.
The length of the algorithm may have more to do with
understandability/clarity than anything, but I'll have to go back and
look. Obviously, if the extra verbosity adds nothing, we should shorten it.

But the compaction algorithm recurses into that object and compacts the value of @id (the place where you just validate its value).

5.15 Find and Remove Property Generator Duplicates Subalgorithm

In contrast to the algorithm in index.html, the algorithm accepts just a single property generator. So they do something different.

The algorithm in index.html does the following:

  • Check for every potential property generator if all of its IRIs are included as properties in element
  • If so, check if they all contain the passed value
  • If so remove those duplicate values (and potentially the whole property if no other value is left)
  • Return the first matching property generator for which all duplicates have been found
  • If no matching property generator was found, return the fallback term, compact IRI or IRI.

The algorithm in alternate2.html does the following:

  • It checks, for a single property generator's IRIs, whether element contains the duplicates
  • If a duplicate is found, it is deleted straight away (without first verifying whether all the duplicates are really there) – this algorithm doesn't seem to work on a copy of element, so data is potentially lost
  • It doesn't return anything

Since the algorithm is just invoked in the compaction algorithm, the remaining work must be done there, but it isn't. Here's what's done there:

If there is a term definition for activeProperty in active
context that is a property generator, then invoke the Find
and Remove Property Generator Duplicates algorithm, passing
active context, shallow for element, expanded property,
expanded item for value, and activeProperty.
  • So I cannot see how this could possibly work. I would thus propose to keep the algorithm of index.html (sketched below) but include steps 2.3.1.X from alternate2.html to describe in detail what is a duplicate and what is not.
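A sketch of that index.html behavior, i.e. the five steps listed above (Python; names and data shapes are illustrative):

```python
def find_and_remove_property_generator_duplicates(element, candidates, value, fallback):
    # `candidates`: list of (term, [expanded IRIs]) pairs, one per
    # potential property generator; `fallback`: the term, compact IRI,
    # or IRI to use when no candidate matches.
    for term, iris in candidates:
        # every IRI must be present as a property of element ...
        if not all(iri in element for iri in iris):
            continue
        # ... and every one of those properties must contain the value
        if not all(value in element[iri] for iri in iris):
            continue
        # only now remove the duplicate values (and emptied properties)
        for iri in iris:
            element[iri].remove(value)
            if not element[iri]:
                del element[iri]
        return term      # first matching property generator wins
    return fallback      # no match: fall back
```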

lanthaler added a commit that referenced this issue Feb 17, 2013

…algorithm

Fix prefix to "_:b", which is easier to remember than "_:t". Improve the explanation slightly. Remove the note about keeping it in the active context, which probably caused more confusion than it helped.

This addresses #218.

lanthaler added a commit that referenced this issue Feb 17, 2013
This addresses #218.

lanthaler added a commit that referenced this issue Feb 18, 2013
This illustrates a bug in the current algorithms. See also #218.
@msporny (Member) commented Feb 18, 2013

@lanthaler This is a brilliant way of dealing with feedback on the spec, especially when it's complicated. Great job organizing the info in this way!

@gkellogg (Member)

The To RDF algorithm was updated in 6d8e825, using RDF Concepts (triples and datasets, not quads) and running after the Node Map generation.

@lanthaler (Member Author)

Here's a potential re-ordering of the algorithms:

  • Context Processing
    • Context Processing Algorithm
    • Create Term Definition Subalgorithm
    • IRI Expansion (perhaps renaming it to something like "IRI and Keyword Resolution" would make it clearer that it belongs to ctx proc.)
    • Generate Blank Node Identifier <=== where should we put this algo?
  • Expansion
    • Expansion Algorithm
    • Value Expansion
    • Label Blank Nodes Subalgorithm
  • Compaction
    • Inverse Context Creation Subalgorithm
    • Compaction Algorithm
    • IRI Compaction Algorithm
    • Term Selection Subalgorithm
    • Find Property Generator Duplicates Subalgorithm
    • Value Compaction
  • Flattening
    • Flattening Algorithm
    • Node Map Generation
  • RDF Conversion
    • Convert to RDF Algorithm
    • Node Map to RDF Subalgorithm
    • Object to RDF Conversion Subalgorithm
    • List to RDF Conversion Subalgorithm
    • Convert from RDF Algorithm
    • Data Round Tripping

@lanthaler (Member Author)

Something else: I think we should get rid of the "Purpose" subsections. Most of the time they just restate what has already been said in the introduction of the specific section. In some sections the general intro is missing.

E.g., Create Term Definition Subalgorithm: This algorithm is called from the Context Processing algorithm to create term definitions in a new active context.
Purpose: A term definition must be created for the given term.

I will spend a couple of hours editing the specs tomorrow. So if you have a couple of minutes, commenting on this and the previous comment (or anything else in this thread, for that matter) would be much appreciated. Thanks!

//cc @gkellogg @dlongley

@msporny (Member) commented Feb 26, 2013

-1 on removing the "Purpose" subsections. It's editorial, and we can do it later. I still find them helpful, if only for re-stating what's in the introduction sections in a more concise way. If the group agrees to remove them in the future, we can do so without risking another Last Call.

@dlongley (Member)

I guess leave "Purpose" in there for now (would be my view). I feel like it adds some necessary structure to help people follow what's going on and to tie into the general solution; I could possibly be persuaded otherwise, but let's skip removing it for now.

As for the reordering, I think it's good, other than moving the subalgorithms that are called from various places (not just context processing) into their own sort of "Basic Utility Algorithms" section (poor name, but I don't have the time to think of a better one at the moment). I agree that they should be mentioned early (I believe you made this point, Markus)... I would just separate them into their own group so people don't think they are only used during context processing.

lanthaler added a commit that referenced this issue Feb 27, 2013
This addresses #218
lanthaler added a commit that referenced this issue Feb 27, 2013
This is probably controversial, thus the separate commit so that we can easily revert it. I think almost all algorithms are "subalgorithms". Just marking a few as such doesn't improve clarity. The approach I'm taking here is to add the word "Algorithm" to the main algorithms and drop it (as well as "Subalgorithm") from all others.

//cc @dlongley @gkellogg

This addresses #218.
lanthaler added a commit that referenced this issue Feb 27, 2013
lanthaler added a commit that referenced this issue Feb 28, 2013
dlongley added a commit that referenced this issue Mar 5, 2013
lanthaler added a commit that referenced this issue Mar 13, 2013
Using a span with a class is a lot of work when typing. I think the i-tag does the job. Since it's not used for anything else, it is also easy to replace it with something else if needed.

For the time being I didn't change the style of variables - I couldn't find anything that looks good :-)

This addresses #218.
lanthaler added a commit that referenced this issue Mar 14, 2013
The only change was to disallow relative base IRIs. The test suite has been updated accordingly.

This addresses #218 and #223.
lanthaler added a commit that referenced this issue Mar 14, 2013
.. to "For each key-value pair language-language value in value". Same for index maps.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 14, 2013
@lanthaler (Member Author)

I've had a look at all the algorithms today. Here are some concrete proposals of what I would like to change (since I believe some parts may be controversial):

PROPOSAL 1: Stop processing as soon as an error is encountered. This might imply that we remove a number of error conditions (I haven't checked) and replace them with some automatic recovery. Everything that has an error constant should stop processing, IMO.

PROPOSAL 2: Do not handle keyword aliases separately. Currently the algorithms are written in a way that suggests that keyword aliases are treated differently from other terms. I would like to consolidate that (while ensuring that it is mentioned that the result might be a keyword alias).

PROPOSAL 3: Do not use value expansion to handle the value of the following keywords: @id, @type, @value, @language, @index. If you look at the expansion algorithm, you'll see that it just validates the value of those keywords. It is difficult to figure out how they are handled: the value (most of the time a string) is passed again to the expansion algorithm, which passes it to the value expansion algorithm, which then has to expand the property again to know which keyword the value belongs to. It would be much simpler to call IRI expansion directly for @id and @type; the other values don't have to be processed. This would also simplify value expansion.
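A sketch of what PROPOSAL 3 amounts to (Python; expand_iri is a stub standing in for the IRI Expansion algorithm, all names illustrative):

```python
def expand_iri(active_context, value, vocab=False, document_relative=False):
    # Stand-in for the IRI Expansion algorithm (5.6); a real
    # implementation resolves terms, compact IRIs, and relative IRIs.
    if vocab and value in active_context:
        return active_context[value]["@id"]
    return value

def expand_keyword_value(active_context, expanded_property, value):
    # Keyword values are handled where they are encountered instead of
    # being routed back through expansion and value expansion.
    if expanded_property == "@id":
        return expand_iri(active_context, value, document_relative=True)
    if expanded_property == "@type":
        values = value if isinstance(value, list) else [value]
        expanded = [expand_iri(active_context, v, vocab=True) for v in values]
        return expanded if isinstance(value, list) else expanded[0]
    # @value, @language, and @index only need validation, not expansion
    return value
```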

PROPOSAL 4: Do not use @none in the inverse context but just @null. Dave and I have had quite some discussions about this (see also the previous comments above). In the end, it is a personal preference, I would say. My argument is that @none and @null are difficult to distinguish and that it is not obvious which is used when. Dave's argument is that having just @null conflates things (no language vs. language set to null). Technically, both work. So I would like to hear the opinion of the rest of the group, and we'll just choose what the majority of the group thinks is the better choice.

PROPOSAL 5: Add a small section explaining which algorithms the various API calls invoke (and how). I'm not sure we really need this, but I thought I'd bring it up nevertheless.

@gkellogg (Member)

PROPOSAL 1: No opinion on this yet, as most other RDF processors attempt to continue to produce triples even after encountering a problem. Mine only stops if run in validation mode. I could support this if it resulted in some substantial simplification of the algorithms.

PROPOSAL 2: +1, I think keyword aliases should be handled like any other terms where possible.

PROPOSAL 3: +1

PROPOSAL 4: +0.5 In general, fewer keywords is better. If the results are equivalent and we can make do with just @null, then I'd say go for it.

PROPOSAL 5: +0 I'm not sure we need this.

@dlongley (Member)

PROPOSAL 1: +1
PROPOSAL 2: +1
PROPOSAL 3: +1 if the end result is actually cleaner (not too much repetition that replaces simpler recursion)
PROPOSAL 4: -1 @none more naturally expresses what the term definitions mean and it simplifies the implementation/number of lookups
PROPOSAL 5: +1

lanthaler added a commit that referenced this issue Mar 20, 2013
... by processing keywords directly.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 21, 2013
The difference is that Value Compaction doesn't create new objects but just tries to compact values to scalars. Keyword aliases are thus handled directly in the Compaction algorithm.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 21, 2013
Previously, the algorithm didn't have a dedicated parameter but used language like "if array compaction has been requested" without specifying how that might be done. I could have used a variable, but I chose to directly link to the corresponding option in JsonLdOptions instead.

@gkellogg, let me know if you think that couples the algorithms and the API too much.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 21, 2013
Both algorithms are short and there are no recursions. It's easier to read them when they are together instead of being in separate sections. I've also clarified Object to RDF Conversion and List to RDF Conversion a bit.

Perhaps also the List to RDF Conversion should be folded into the Convert to RDF Algorithm!?

/cc @gkellogg

This addresses #218.
lanthaler added a commit that referenced this issue Mar 22, 2013
lanthaler added a commit that referenced this issue Mar 22, 2013
There might be some cases where we want to switch to an automatic recovery instead. If so, please propose that directly in issue #218.

This addresses #218.
@lanthaler (Member Author)

I've started to implement the proposals above:

PROPOSAL 1: eebdf14
PROPOSAL 2: 1e3c3eb
PROPOSAL 3: 0383dcb
PROPOSAL 4: nothing changed as Dave -1'ed it; I still believe it would simplify things, but we need consensus
PROPOSAL 5: nothing added yet

@lanthaler (Member Author)

Here's what I think still needs to be done for the API spec:

Intro, Purpose, and General Solution are missing for

  • Generate Blank Node Identifier
  • Flattening Algorithm
  • Node Map Generation
  • RDF to Object Conversion

I'm happy with almost all the algorithms; the only thing that we should discuss is

  • term selection (see the discussion about @null vs. @none above) and perhaps moving the construction of preferred values to IRI Compaction so that term selection is really just the loop.

We should probably also

  • Fold "List to RDF Conversion" into "Convert to RDF Algorithm"? No recursion, just used there.. only 5 (7) steps
  • Add a section on how the API invokes the various algorithms (add it to methods description)
  • We should discuss whether to combine the Intro & Purpose instead of having a separate Purpose sub-section. Quite a few algorithms have no intro at all or one that is basically equivalent to the Purpose sub-section.
  • Replace camelCased variables

lanthaler added a commit that referenced this issue Mar 26, 2013
.. so that Term Selection is really just the loop.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 26, 2013
and remove "JSON-LD output" which is never used.

This addresses #218.
lanthaler added a commit that referenced this issue Mar 27, 2013
@lanthaler (Member Author)

RESOLUTION: Remove blank node re-labeling during expansion since it is no longer required. The flattening algorithm must still re-label blank nodes.

lanthaler added a commit that referenced this issue Mar 27, 2013
I hope this is clear enough and would appreciate feedback.

/cc @msporny, @dlongley, @gkellogg, @niklasl

This addresses #218
@lanthaler (Member Author)

I'm closing this issue now; I don't think we need it anymore. The only thing that hasn't been addressed yet is the "folksy language".

Feel free to reopen it if something needs to be discussed.
