Context item → Context value? #129
Comments
I like the idea of supporting the EnclosedExpr syntax on fat arrows for consistency and symmetry between the new syntax. I also like the idea of passing a context value (sequence) to a query. This needs to work with the current focus definition (https://qt4cg.org/branch/master/xquery-40/xquery-40-diff.html#dt-focus), where context item, context position, and context size in the dynamic context form the focus. We would therefore need something that binds to the context item -- such as To achieve this, we can define a context value as a sequence bound to the context item that when evaluated returns the content of its containing sequence. If we wanted a syntax for this, we could have something like |
I would hope that we can effectively replace the term context item by context value, and indicate in the Change Log that the context value was formerly restricted to a single item. As I see no general stumbling blocks, I’m trying to adapt the definitions and parts of the further context references. It could read as follows:
2.1.1 Static Context
[Definition: Context value static type. This component defines the static sequence type of the context value.]
2.1.2 Dynamic Context
[Definition: The first three components of the dynamic context (context value, context position, and context size) are called the focus of the expression.] The focus enables the processor to keep track of which items are being processed by the expression. If any component in the focus is defined, all components of the focus are defined.
[Definition: A singleton focus is a focus that refers to a single item; in a singleton focus, context value is set to the item, context position = 1 and context size = 1.]
2.4.4 Input Sources
An expression can access input data either by calling one of these input functions or by referencing some part of the dynamic context that is initialized by the external environment, such as a variable or context value.
4.3.4 Context Item Expression
A context value expression evaluates to the context value, which may be an arbitrary sequence of nodes, atomic values and functions. If the context value is absent, a context value expression raises a dynamic error [err:XPDY0002].
4.4.2.1 Evaluating Dynamic Function Calls
Example: Using the Context Value in an Anonymous Function
The following example will raise a dynamic error [err:XPDY0002]:
let $vat := function() { @vat + @price }
return shop/article/$vat()
Instead, the context value can be implicitly bound with a function item expression and the
declare context value := collection('items')/shop/article;
=> { ./(@vat + @price) }()
…or one by one with the
let $vat := -> { @vat + @price }
return shop/article/$vat(.)
5.17 Context Item Declaration
If a module contains more than one context item declaration and context value declaration altogether, a static error is raised [err:XQST0099]. Should be treated as special-case legacy version of the “Context Value Expression”.
5.xx Context Value Declaration (…parts)
A context value declaration allows a query to specify the static type, value, or default value for the initial context value. In every module that does not contain a context value declaration, the effect is as if the declaration declare context value as item()* external; appeared in that module. If a module contains more than one context value declaration and context item declaration altogether, a static error is raised [err:XQST0099]. During query evaluation, a focus is created in the dynamic context for the evaluation of the
“The context value is the value currently being processed” is still misleading. It’s already the original definition “The context item is the item currently being processed.” that I found confusing: It matches well for a singleton focus that’s temporarily created for the evaluation of a predicate or simple map expression, but it doesn’t really fit if the context item is declared in the prolog and globally available in the main module. |
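For readers who want to poke at the draft definitions, here is a minimal Python sketch of the focus and the context value expression. It is only a model under assumptions: the names `Focus`, `singleton_focus`, `context_value_expr` and `DynamicError` are hypothetical and do not come from the spec or from any processor; only the error code err:XPDY0002 is taken from the draft text above.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


class DynamicError(Exception):
    """Hypothetical stand-in for an XQuery dynamic error such as err:XPDY0002."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code


@dataclass(frozen=True)
class Focus:
    """The three focus components named in the draft 2.1.2 text."""
    context_value: Sequence[Any]
    context_position: int
    context_size: int


def singleton_focus(item: Any) -> Focus:
    """A focus that refers to a single item: position = 1 and size = 1."""
    return Focus(context_value=(item,), context_position=1, context_size=1)


def context_value_expr(focus: Optional[Focus]) -> Sequence[Any]:
    """Model of the context value expression; None models an absent focus."""
    if focus is None:
        raise DynamicError("err:XPDY0002", "context value is absent")
    return focus.context_value
```

In this model, calling `context_value_expr(None)` corresponds to the first `$vat()` example, where the function body has no focus, while `context_value_expr(singleton_focus(article))` corresponds to passing the article explicitly.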
For the prolog and main module, the "currently being processed" will be in the context of the QueryBody, so that should be fine. The ContextItemDecl documentation (and thus equivalently also the ContextValueDecl documentation) as you reference describes how that is determined in that case. |
That second sentence is defeating me. I think the first sentence is something like "there is one and only one focus at any point in time in an entire query, so the context value component of the dynamic context must have the same setting everywhere in the query." But there MUST be a focus, conceptually; there has to be a dynamic context if the query can be evaluated. So I think the second sentence might be "If the context value component has a defined value, it is referred to as the initial context value." |
Yes, I assume it's logically correct. Maybe something like “The context value is available for reference within the corresponding context.” would sound more intuitive to me, in accordance with the definition of in-scope variables.
The original definition is as follows:
I imagine that the set of definitions needs to be rephrased as a whole in order to become consistent again. |
I'm concerned about the sheer number of things in the spec that would need to change. I'm also concerned about performance. If we can't statically infer that "." is a singleton, there's a significant risk that existing code slows down because existing optimisations are no longer safe. At the same time, I recognise the need - for example, when iterating over an array. But I feel it's a bridge too far. |
I’m pretty optimistic that the performance concerns can be resolved. At least that's the impression I got when implementing it myself, including all corner cases I managed to find. |
In XQuery I think you can always statically determine what expression provides the focus for any occurrence of ".". That's not true in XSLT (and it's not true for XPath if the context item is supplied by the host language, as will often be the case). XSLT is much more heavily dependent on the context item than XQuery is. |
In #149 (comment), some more use cases are given for binding sequences to the context. |
To take an example of the problems this would cause in XSLT, consider named templates. The context item, position, and size are passed through a call-template instruction, so code in the called template has no idea what the context item might be. Which means that we wouldn't be able to determine statically that expressions such as Frankly, I think this is a non-starter. |
I think a better approach might be to introduce a separate concept called the context value, represented by the symbol We could certainly use this to refer to the implicit argument of an expression such as I would also like to do something similar in XSLT allowing a pipeline of instructions to feed into each other - rather like the arrow operator:
where each instruction in the pipe binds its output to the implicit variable This could also be useful for iterating over arrays and filtering arrays:
selects all the members of an array that are sequences of length 3. (I'm not sure what symbol one might use for a mapping operator in XPath; but we could certainly do xsl:for-each-member in XSLT, binding each member to In xsl:for-each-group, we could bind I'm not sure how position() and size() fit into this, or whether there is some relationship between |
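The array filtering use case can be modelled roughly as follows. This is a Python sketch under assumptions, not the proposed syntax: an array is represented as a list whose members are tuples (sequences), `array_filter` is a hypothetical helper name, and returning a list of the retained members is my assumption about the result.

```python
from typing import Any, Callable, List, Sequence

Member = Sequence[Any]  # an array member is itself a sequence


def array_filter(array: List[Member], predicate: Callable[[Member], bool]) -> List[Member]:
    """Keep the members for which the predicate holds.

    The predicate receives the whole member (the hypothetical "context value"),
    not its individual items.
    """
    return [member for member in array if predicate(member)]


# Select all the members that are sequences of length 3.
triples = array_filter(
    [(1, 2, 3), (), ("a", "b"), ("x", "y", "z")],
    lambda member: len(member) == 3,
)
```

The point of the sketch is that the predicate sees each member as a whole sequence, which is exactly what a context item restricted to single items cannot express.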
You mentioned named templates. Could it be an option to ensure that the input of named templates will always be singletons, similar to the context inside predicates? If performance considerations in XSLT are too troublesome, maybe we can restrict the proposed extension to XQuery, or provide support for syntactic extensions in all languages, but disallow bindings of sequences in XSLT? |
I guess it would be possible to say that if a named template has no context item declaration, then the default is |
Still being somewhat tenacious and enthusiastic about my initial proposal, I fully understand your concerns. I was positively surprised to see that it was close to a no-brainer to integrate the generalization in XQuery, and the result feels clean and conclusive to me. But it didn’t escape me that numerous sections in the specification would need to be revised, as you already indicated. It seems onerous indeed to restrict the generalization to XQuery; and I clearly lack the XSLT perspective. Sigh. Maybe it’s best to postpone it to a potential 4.1 or 5.0 release. Hope dies last… |
I think we should definitely file this under "too difficult". I've just been worrying about the semantics of simple expressions like |
I think having "context value" (or some other name...) as a separate concept from context item is much more workable. It can be represented by The following use cases would then be conceivable, among others:
let ~ := collection('flowers');
//flower[name = 'Psychotria']
(: some $p in petals satisfies $p gt 4 :)
$flowers [| ~/petals > 4 |]
$flowers ?! count(~)
|
I’ll give some more background on my proposal (sorry in advance for being verbose). I mentioned earlier that it was a “no-brainer to integrate the generalization in XQuery”; I still stand by that, but I was actually referring to the three specific syntax extensions in this proposal. The general idea has a longer history. If path expressions are run against databases, it’s often irrelevant whether data is located in a single document or spread across a set of documents. Users want to answer the same questions, no matter if one, thousands, or millions of documents are stored in a database. The officially legal way to run queries on multiple documents is to use
All APIs we offer (REST, programming language bindings, command line, GUI) work similarly: The selection of a document (i.e., the selection of the initial sequence of document nodes) is separated from the actual evaluation of the query. We certainly don’t aim to force the idiosyncratic behavior of our processor onto the official language just to make it legal (it has existed for too long, and no one has ever complained about it, so we’ll stick to it anyway). Over the years, we have simply found that it feels like the most natural choice to be allowed to run
I can well imagine, though, that this is not an issue in XSLT, which focuses on single documents. As the overall implications are too far-reaching, the idea of adding a syntax for binding sequences may be the only realistic choice (@michaelhkay thanks for further pursuing this). For the first use case – binding collections globally – I would press for a declaration to bind sequences (however named) …
declare context value := collection('flowers');
~//flower[name = 'Tigridia']
…and not restricting it to FLWOR expressions. If we wanted to additionally enhance the let clause, I think we should have both |
OUTLINE PROPOSAL
This proposal is in two parts. Part A introduces the notion of "context value" to the dynamic context -- except that I will call it the initial input value, abbreviated for the purpose of this proposal to IIV. Part B, which is dependent on part A, introduces the idea that some expressions might use the IIV implicitly.
PART A
The dynamic context is extended with a component called the initial input value. Its value is either a sequence, or absent. A new kind of primary expression is introduced, the initial input value expression, written
In XQuery, the initial input value for the main query may be set using a "declare initial input" clause in the prolog, whose syntax parallels the "declare context item" declaration. For the time being, we will allow the context item and the initial input to be set independently of each other. In XPath, the initial input value may be set by the calling application. In XSLT, the initial input value will be absent when the transformation is initiated.
The initial input value is set to absent on entry to a global function or variable declaration, and in XSLT, on entry to other callable constructs such as templates and attribute sets.
The initial input value is set to a non-absent value by the following constructs:
(a) An inline function declaration using the fat-arrow syntax with implicit signature, for example
(b) An enclosed expression on the RHS of the binary
(c) An instruction in an XSLT pipeline. A new XSLT instruction xsl:pipe is introduced; its content is a sequence of instructions called a pipeline. The value of each instruction other than the last becomes the initial input value for the following instruction; the value of the final instruction is delivered as the value of the xsl:pipe instruction.
(d) An array filter expression EXPR "[|" predicate "|]" is introduced. EXPR must evaluate to an array; within the predicate,
(e) A new operator is introduced to do array mapping. The expression takes the form
PART B
A new component called implicit mapping enabled is added to the static context for an expression. Its value is a boolean. If implicit mapping enabled is true for a path expression or context item expression E, then the result of the expression is effectively |
Great to see how the different requirements are coalescing. I do appreciate your efforts, and I’ll let it sink in. I assume that if…
…and not
…must be I like the example for arrays. We may also need to clarify what’s supposed to happen if both the context item and the initial input value are declared. |
Yes indeed. I chose |
VERSION II PROPOSAL
A revised version of my previous outline proposal.
The dynamic context is extended with a component called the context value. It is always present, and its value is a sequence. Its default value (for example on entry to a function) is an empty sequence. The context value can be accessed using the expression
The "context item" is no longer an independent quantity. If the context value is a single item, then we call this the "context item". If the context value is empty or contains multiple items, we say that the context item is absent, and throw an error if it is referenced, either explicitly using "." or implicitly.
In XQuery, the context value for the main query may be set using a "declare context [value|item]" clause in the prolog, whose syntax parallels the "declare context item" declaration. If the keyword "item" is used, rather than "value", then the supplied value must be a singleton. In XPath and XSLT (and XQuery in the absence of "declare context value"), the initial context value may be set by the calling application.
The context value is set to a non-absent value by the following constructs:
(0) All constructs that currently bind the context item are redefined so they now bind the context value to a singleton. This means that within a predicate, for example,
(a) It may be set explicitly using the syntax
(b) An enclosed expression on the RHS of the binary => operator, for example
(c) We change the abbreviated function syntax
(c) An instruction in an XSLT pipeline. A new XSLT instruction xsl:pipe is introduced; its content is a sequence of instructions called a pipeline. The value of each instruction other than the last becomes the context value for the following instruction; the value of the final instruction is delivered as the value of the xsl:pipe instruction.
(d) An array filter expression
(e) A new operator is introduced to do array mapping. The expression takes the form E1 !! E2. E1 must evaluate to an array; E2 is evaluated with
(f) In XSLT,
(g) Leading
(h) Relative path expressions such as |
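Two of the rules in this version can be sketched in a few lines of Python: the context item as a derived view of the context value, and the xsl:pipe chaining. This is only an illustrative model; `ContextItemAbsent`, `context_item` and `pipe` are hypothetical names, sequences are modelled as Python lists, and feeding the outer context value to the first step of the pipeline is my assumption, since the text above does not say what the first instruction sees.

```python
from typing import Any, Callable, Iterable, List, Sequence

Step = Callable[[List[Any]], Iterable[Any]]


class ContextItemAbsent(Exception):
    """Hypothetical error for referencing "." when the context item is absent."""


def context_item(context_value: Sequence[Any]) -> Any:
    """The context item exists only if the context value is a single item."""
    if len(context_value) != 1:
        raise ContextItemAbsent(
            f"context value has {len(context_value)} items, expected exactly 1"
        )
    return context_value[0]


def pipe(outer_context_value: Iterable[Any], steps: Sequence[Step]) -> List[Any]:
    """Model of a pipeline: each step's value is the next step's context value.

    Assumption: the first step is evaluated with the outer context value.
    """
    value = list(outer_context_value)
    for step in steps:
        value = list(step(value))
    return value


# Sort, scale each item, then sum: every step sees the previous step's result.
total = pipe([3, 1, 2], [sorted, lambda cv: [x * 10 for x in cv], lambda cv: [sum(cv)]])
```

With this reading, `context_item(total)` succeeds because the final value is a singleton, whereas `context_item([])` and `context_item([1, 2])` both fail, which matches the "empty or contains multiple items" rule above.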
Thanks again for the comprehensive proposal and the resulting constructs. I like the approach to make the context value the new default, and to treat the context item as a subordinate concept.
From today’s perspective, I think the fat arrow could be an even better choice for binding the context:
We would then have:
The fat arrow could generally be used for inline functions
It would be great if we could also relax the semantics for relative paths. I think it should make no difference if the path expression is absolute or relative. Otherwise, expressions such as the following ones would not work:
(: bound externally or in the prolog declaration :)
declare context value external := collection('store')//person;
name
If performance concerns prevent us from doing so, maybe we could let the implementation decide if sequences are allowed as input for path expressions. |
|
Yes, what is confusing is: where is the proposed new concept in this expression, and what it gives us? Without any explanation/description how would one even know that this has something to do with the proposed new concept? |
Sorry. The topic was discussed somewhere in this thread, but it's already too long. |
@liamquin Thanks for the proposal: I like |
I would definitely prefer having a separate, unambiguous representation, such as current-container(). If not a function, then something that would not be too easy to arrive at by an accidental misspelling. And, if possible, visually adequate for the intended meaning. This is why I also proposed (...) |
When it has to update two things, that is bad for performance. Perhaps worse than redefining
It looks like a snake. Longer than a
It is a multiplication
Another unused symbol is |
For what it's worth, I do as well. There isn't much left except the at sign if we're going for single characters, and the day has not quite come to start wandering through Unicode for operators.
I can't find either of those earlier in the thread. I'd guess that [Edited to add] Or, duh, .* is a sequence of zero or more items, .? is a sequence of zero or one items, and .+ is a sequence of one or more items, where . is a sequence of exactly one item. |
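The occurrence-indicator reading of `.`, `.?`, `.+` and `.*` amounts to a cardinality check on the context value. A small Python sketch, with a hypothetical `admits` helper and table name that are not part of any proposal text:

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# (minimum, maximum) number of items; None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    ".": (1, 1),      # exactly one item
    ".?": (0, 1),     # zero or one item
    ".+": (1, None),  # one or more items
    ".*": (0, None),  # zero or more items
}


def admits(symbol: str, value: Sequence[Any]) -> bool:
    """Return True if the sequence has a length the symbol allows."""
    low, high = CARDINALITY[symbol]
    return len(value) >= low and (high is None or len(value) <= high)
```

Under this reading, `.` is simply the special case of `.*` that insists on a singleton.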
|
Of course, it has many meanings in many different contexts. I guess the usage that made it feel natural to me was its use in Unix filenames, which a lot of XPath syntax is inspired by. It's not exactly the same meaning of course, but when I see |
I have reworked the initial comment of this issue by adding explanatory comments and aligning it with the features from the latest version of the specs. I’ll be happy to present it in the meeting; it may help us resolve QT4CG-026-01. |
Reopening this (for the pending edits). |
…closed. The major remaining challenge to solve is #755. |
This has already been discussed in various places, but I’d like to raise it again: What about generalizing the context item and allowing it to reference sequences? Are there definitive showstoppers?
The Context Item
As its name says, the context item is a container for a single item in the current context. A value that is bound to the context item is referenced with the Context Item Expression, the single dot:
.
The context item shares many similarities with variables. The main difference is that it currently cannot be used for sequences. I propose to generalize the semantics and introduce a “context value”:
… transform with expression, etc.) are now bound to the context value.
… item keyword – but we can treat it as a secondary concept.
Context Value Declaration
It has become a common pattern to use declare context item to bind a document to the context item and process queries on that item:
If data can be distributed across multiple documents (which is often, if not usually, the case in databases), this approach does not work. It would work if we could bind sequences:
External Bindings
Many processors allow users to bind external values to the context item. This approach is particularly restrictive for databases, in which data is often distributed across multiple documents. With the generalized concept, it would become possible to bind sequences and collections to the context. Paths like the following one could be used, no matter if the contents are stored in a single document or in a collection:
//flower[name = 'Iridaceae']
Focus Functions
The focus function provides a compact syntax for common arity-one functions. The single argument is bound to the context item:
With the generalization to values, we could easily enhance focus functions to accept arbitrary sequences:
Use Case: Arrow Expressions
The arrow expression provides an intuitive syntax for performing multiple subsequent operations on a given input. With the context value generalization, we could also process chained sequences: