Support Invisible XML #238
Comments
I'm hardly going to object, given I've got an implementation :-) However, I think we need a little bit more flexibility in the API. It should be possible to pass an XDM node to |
I would second Norm's request for an optional options map, both for compilation and at runtime. My own implementation could support such a function in the XSLT API stylesheet, albeit in a non-fn namespace.
Having built-in support for invisible XML appeals to me, so I am in favor. But I am not sure about the function signature
--- or, at least, I think that this is not the only function signature we are likely to want. This signature would work fine for processors which work by first compiling the input grammar into some usable form (whether an annotated grammar or a function, e.g. a recursive-descent parser) and then using that compiled form to handle the input string, and it is clearly desirable in cases where (a) grammar compilation takes a significant proportion of the necessary time and (b) the same grammar is to be used repeatedly. Where those conditions don't apply, other function signatures may be more appealing. Both Norm Tovey-Walsh's nineml processor and my Aparecium processor use the same parsing machinery to parse the input grammar and then the input string, and this interface would help in cases where the same grammar is used repeatedly. John Lumley's parser and Steven Pemberton's parser, on the other hand, use hand-tuned parsers for the input grammars, and grammar-preparation time is really not dominant. In the design of the user-facing function interface for Aparecium (my ixml processor, intended for use in XSLT and XQuery but currently implemented only in XQuery), I included several functions, some of which may be possibilities we may wish to consider:
I assume that an optimizing QT processor may be able to detect that the same grammar is used multiple times and avoid parsing the grammar repeatedly for repeated calls. In Aparecium I also included functions for compiling a grammar and for parsing an input string using a compiled grammar. I won't list them here because I think MK's suggestion of returning a function item is better, but I do agree with Norm that it would be convenient to be able to supply the grammar in any of several forms:
When I started working on ixml, I also thought it might be nice to have an invisible-XML function similar in its way to |
To clarify Michael SMQ’s remarks: my ixml parser does, I think, benefit from compile-once / use-many invocation, especially where the input sentences are short and the grammar long or complex. The EBNF-to-BNF rewrites are done once, and the compiled grammar is a tree of JS class instances ready to prime the Earley parser. Come to think of it, I could probably create a clonable Earley parser which has all the zeroth-step predictions already performed. Will need to investigate on return from holiday.
John Lumley
On 15 Nov 2022, at 23:08, C. M. Sperberg-McQueen ***@***.***> wrote:
Having built-in support for invisible XML appeals to me, so I am in favor.
But I am not sure about the function signature
fn:invisible-xml($grammar as xs:string) as (function($string) as document-node())
--- or, at least, I think that this is not the only function signature we are likely to want. This signature would work fine for processors which work by first compiling the input grammar into some usable form (whether an annotated grammar or a function, e.g. a recursive-descent parser) and then using that compiled form to handle the input string, and it is clearly desirable in cases where (a) grammar compilation takes a significant proportion of the necessary time and (b) the same grammar is to be used repeatedly. Where those conditions don't apply, other function signatures may be more appealing. Both Norm Tovey-Walsh's nineml processor and my Aparecium processor use the same parsing machinery to parse the input grammar and then the input string, and this interface would help in cases where the same grammar is used repeatedly. John Lumley's parser and Steven Pemberton's parser, on the other hand, use hand-tuned parsers for the input grammars, and grammar-preparation time is really not dominant.
In the design of the user-facing function interface for Aparecium (my ixml processor, intended for use in XSLT and XQuery but currently implemented only in XQuery), I included several functions, some of which may be possibilities we may wish to consider:
* aparecium:parse-string($input-string as xs:string, $input-grammar as xs:string) as element()
* aparecium:parse-resource($input-uri as xs:string, $grammar-uri as xs:string) as element()
I assume that an optimizing QT processor may be able to detect that the same grammar is used multiple times and avoid parsing the grammar repeatedly for repeated calls.
In Aparecium I also included functions for compiling a grammar and for parsing an input string using a compiled grammar. I won't list them here because I think MK's suggestion of returning a function item is better, but I do agree with Norm that it would be convenient to be able to supply the grammar in any of several forms:
* a string conforming to the ixml specification grammar
* an XML element / XDM node conforming to the ixml specification (a 'visible-XML' grammar, as we sometimes call this in the ixml community group to avoid confusion with other grammar forms)
* a URI pointing to a document (text/plain or other) with an ixml grammar in invisible-XML form
* a URI pointing to an XML document with a visible-XML grammar
When I started working on ixml, I also thought it might be nice to have an invisible-XML function similar in its way to doc(), which would accept a URI pointing to an input string, dereference it, use information in the HTTP header to find an appropriate grammar, fetch the grammar, and return the parse result. (Steven Pemberton's 2013 paper describes using the HTTP header to point to an ixml grammar; failing that, my idea was to get the MIME type and for the ixml implementation to have a library of grammars for often-used MIME types.) That currently seems like a bit of a reach to me, but I mention it here because I still think it would be a nice idea, and no group is better positioned than this one to make it happen.
I wonder if we should confront users with the fact that a grammar may be compiled only once. Isn’t it better and more elegant to treat this as a processor-specific optimization issue? There are many other use cases I can think of in which query compilers and optimizers may decide to do things only once during the runtime of a query without making this explicit (XSLT stylesheets; SQL statements; regular expression patterns; documents resulting from deterministic functions such as |
Caching always has the disadvantage that it involves guesswork; there's a substantial memory cost in caching a large grammar in the case where it isn't used again. I think that capturing the compiled grammar in a function is a much more elegant approach (which could well be used elsewhere, e.g. for fn:transform).
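To make the trade-off concrete, here is a rough Python sketch contrasting the two approaches. It is only an illustration under loud assumptions: a regular expression stands in for an ixml grammar, and the names `compile_grammar`, `parse_string`, and `make_parser` are hypothetical, not part of any proposal or processor.

```python
import re
from functools import lru_cache
from typing import Callable

compile_count = 0  # counts how often the stand-in "grammar" is compiled


def compile_grammar(grammar: str) -> re.Pattern:
    """ASSUMPTION: a regex stands in for an ixml grammar."""
    global compile_count
    compile_count += 1
    return re.compile(grammar)


# Approach 1: two-argument call; the processor caches behind the scenes.
@lru_cache(maxsize=8)  # the cache size is a guess the processor has to make
def _cached_compile(grammar: str) -> re.Pattern:
    return compile_grammar(grammar)


def parse_string(text: str, grammar: str) -> bool:
    return _cached_compile(grammar).fullmatch(text) is not None


# Approach 2: the caller holds the compiled grammar inside a function.
def make_parser(grammar: str) -> Callable[[str], bool]:
    compiled = compile_grammar(grammar)  # compiled once, captured by the closure
    return lambda text: compiled.fullmatch(text) is not None


DIGITS = r"\d+"
parse_string("123", DIGITS)
parse_string("456", DIGITS)      # cache hit, no recompilation
is_digits = make_parser(DIGITS)  # one explicit compilation
is_digits("789")
print(compile_count)  # 2: one compilation per approach
```

In the first approach the compiled grammar stays in the cache until it is evicted, whether or not it is ever used again. In the second it lives exactly as long as the caller keeps `is_digits` around.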
Accepted at meeting 052.
I propose that we support Invisible XML by means of a function
fn:invisible-xml($grammar as xs:string) as (function($string) as document-node())
The function takes as input a string defining an invisible XML grammar in ixml format, and returns as output a function that can be used to parse strings conforming to that grammar, converting them into XDM document nodes.
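The shape of this signature (grammar in, parsing function out) can be sketched in Python. This is a toy stand-in, not an ixml implementation: it assumes the "grammar" is a regular expression with named groups, the root element name `result` is made up, and it returns an element rather than a document node.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable


def invisible_xml(grammar: str) -> Callable[[str], ET.Element]:
    """Toy stand-in for the proposed function: compile once, return a parser.

    ASSUMPTION: the grammar is a regex with named groups, not ixml.
    """
    compiled = re.compile(grammar)  # the potentially expensive step, done once

    def parse(text: str) -> ET.Element:
        match = compiled.fullmatch(text)
        if match is None:
            raise ValueError(f"input does not match grammar: {text!r}")
        root = ET.Element("result")  # hypothetical root element name
        # One child element per named group, in pattern order.
        for name, value in match.groupdict().items():
            ET.SubElement(root, name).text = value
        return root

    return parse


# Compile the grammar once, then reuse the returned function for many inputs.
parse_date = invisible_xml(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
print(ET.tostring(parse_date("2022-11-15"), encoding="unicode"))
# <result><year>2022</year><month>11</month><day>15</day></result>
```

The returned `parse_date` can be applied to any number of input strings without touching the grammar again, which is the point of returning a function item.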
As a "dog-food" use case, we could use this for rendering function signatures in the F&O specification. Rather than using manual markup to define the signature of each function, we could define an ixml grammar for function signatures, and use this as the basis for formatting the representation in the spec. This would be particularly beneficial as we start to introduce more complex signatures involving record types.