Support Invisible XML #238

Closed
michaelhkay opened this issue Nov 12, 2022 · 7 comments
Labels
Feature A change that introduces a new feature XQFO An issue related to Functions and Operators

Comments

@michaelhkay
Contributor

I propose that we support Invisible XML by means of a function

fn:invisible-xml($grammar as xs:string) as (function($string) as document-node())

The function takes as input a string defining an invisible XML grammar in ixml format, and returns as output a function that can be used to parse strings conforming to that grammar, converting them into XDM document nodes.
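The shape of this proposal, a function that takes a grammar and returns a parsing function, can be illustrated outside XPath. What follows is a minimal Python sketch, not an ixml implementation: a regular expression with named groups stands in for the grammar, and `make_parser`, `parse_date`, and the element name `result` are hypothetical names chosen for the illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable


def make_parser(grammar: str) -> Callable[[str], ET.Element]:
    """Compile `grammar` once and return a function that parses input strings.

    Assumption: the "grammar" is a regex with named groups, used here only
    as a stand-in for a real ixml grammar.
    """
    compiled = re.compile(grammar)  # the potentially expensive step, done once

    def parse(text: str) -> ET.Element:
        match = compiled.fullmatch(text)
        if match is None:
            raise ValueError(f"input does not match grammar: {text!r}")
        root = ET.Element("result")
        # One child element per named group, ordered by group position.
        for name, _ in sorted(compiled.groupindex.items(), key=lambda kv: kv[1]):
            child = ET.SubElement(root, name)
            child.text = match.group(name)
        return root

    return parse


# The grammar is compiled when the parser is built, not on each call.
parse_date = make_parser(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
doc = parse_date("2022-11-12")
print(ET.tostring(doc, encoding="unicode"))
```

The returned `parse_date` can be applied to any number of input strings without repeating the compilation step, which is the property the proposed signature is meant to expose.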

As a "dog-food" use case, we could use this for rendering function signatures in the F&O specification. Rather than using manual markup to define the signature of each function, we could define an IXML grammar for function signatures, and use this as the basis for formatting the representation in the spec. This would be particularly beneficial as we start to introduce more complex signatures involving record types.

@ndw
Contributor

ndw commented Nov 12, 2022

I'm hardly going to object, given I've got an implementation :-)

However, I think we need a little bit more flexibility in the API. It should be possible to pass an XDM node to fn:invisible-xml because it's perfectly reasonable to construct a parser from the XML serialization of Invisible XML. I also think we want to allow an options map to be passed to the function because people will want the flexibility to, for example, suppress the state information about ambiguity or prefix parsing. And even if we thought that was unnecessary because they could transform the output to remove those things, implementations may want to expose additional options. My implementation, for example, has options to allow undefined/unreachable/unproductive symbols, multiply defined symbols, and options for debugging to show additional state information.

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature labels Nov 14, 2022
@johnlumley

I would second Norm's request for an optional options map, both when compiling the grammar and at run time. My own implementation could support such a function in the XSLT API stylesheet, albeit in a non-fn namespace.

@cmsmcq

cmsmcq commented Nov 15, 2022

Having built-in support for invisible XML appeals to me, so I am in favor.

But I am not sure about the function signature

fn:invisible-xml($grammar as xs:string) as (function($string) as document-node())

--- or, at least, I think that this is not the only function signature we are likely to want. This signature would work fine for processors which work by first compiling the input grammar into some usable form (whether an annotated grammar or a function, e.g. a recursive-descent parser) and then using that compiled form to handle the input string, and it is clearly desirable in cases where (a) grammar compilation takes a significant proportion of the necessary time and (b) the same grammar is to be used repeatedly. Where those conditions don't apply, other function signatures may be more appealing. Both Norm Tovey-Walsh's nineml processor and my Aparecium processor use the same parsing machinery to parse the input grammar and then the input string, and this interface would help in cases where the same grammar is used repeatedly. John Lumley's parser and Steven Pemberton's parser, on the other hand, use hand-tuned parsers for the input grammars, and grammar-preparation time is really not dominant.

In the design of the user-facing function interface for Aparecium (my ixml processor, intended for use in XSLT and XQuery but currently implemented only in XQuery), I included several functions, some of which we may wish to consider:

  • aparecium:parse-string($input-string as xs:string, $input-grammar as xs:string) as element()
  • aparecium:parse-resource($input-uri as xs:string, $grammar-uri as xs:string) as element()

I assume that an optimizing QT processor may be able to detect that the same grammar is used multiple times and avoid parsing the grammar repeatedly for repeated calls.

In Aparecium I also included functions for compiling a grammar and for parsing an input string using a compiled grammar. I won't list them here because I think MK's suggestion of returning a function item is better, but I do agree with Norm that it would be convenient to be able to supply the grammar in any of several forms:

  • a string conforming to the ixml specification grammar
  • an XML element / XDM node conforming to the ixml specification (a 'visible-XML' grammar, as we sometimes call this in the ixml community group to avoid confusion with other grammar forms)
  • a URI pointing to a document (text/plain or other) with an ixml grammar in invisible-XML form
  • a URI pointing to an XML document with a visible-XML grammar

When I started working on ixml, I also thought it might be nice to have an invisible-XML function similar in its way to doc(), which would accept a URI pointing to an input string, dereference it, use information in the HTTP header to find an appropriate grammar, fetch the grammar, and return the parse result. (Steven Pemberton's 2013 paper describes using the HTTP header to point to an ixml grammar; failing that, my idea was to get the MIME type and for the ixml implementation to have a library of grammars for often-used MIME types.) That currently seems like a bit of a reach to me, but I mention it here because I still think it would be a nice idea, and no group is better positioned than this one to make it happen.

@johnlumley

johnlumley commented Nov 16, 2022 via email

@ChristianGruen
Contributor

I wonder if we should confront users with the fact that a grammar may be compiled only once. Isn’t it better and more elegant to treat this as a processor-specific optimization issue?

There are many other use cases I can think of in which query compilers and optimizers may decide to do things only once during the runtime of a query without making this explicit (XSLT stylesheets; SQL statements; regular expression patterns; documents resulting from deterministic functions such as fn:doc; etc.), and I believe it should be fairly easy for implementations to cache compiled grammars for grammar strings if they are repeatedly used.

@michaelhkay
Contributor Author

Caching always has the disadvantage that it involves guesswork; there's a substantial memory cost in caching a large grammar in the case where it isn't used again. I think that capturing the compiled grammar in a function is a much more elegant approach (which could well be used elsewhere, e.g. for fn:transform).
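The trade-off between the two approaches can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a model of any ixml processor: a regular expression stands in for the grammar, and `compile_grammar`, `cached_compile`, `parse_cached`, `make_parser`, and the cache size of 8 are all hypothetical.

```python
import re
from functools import lru_cache
from typing import Callable

compile_count = 0  # counts how often the stand-in compilation step runs


def compile_grammar(grammar: str) -> re.Pattern:
    """Stand-in for grammar compilation (assumption: a regex, not ixml)."""
    global compile_count
    compile_count += 1
    return re.compile(grammar)


# Approach 1: a one-shot function backed by a processor-side cache.
# The cache has to guess which grammars are worth keeping, and how many.
@lru_cache(maxsize=8)
def cached_compile(grammar: str) -> re.Pattern:
    return compile_grammar(grammar)


def parse_cached(text: str, grammar: str) -> bool:
    return cached_compile(grammar).fullmatch(text) is not None


# Approach 2: the caller holds the compiled grammar inside a function.
# It can be released once the caller drops the last reference to it.
def make_parser(grammar: str) -> Callable[[str], bool]:
    compiled = compile_grammar(grammar)
    return lambda text: compiled.fullmatch(text) is not None


digits = r"\d+"
parse_cached("123", digits)
parse_cached("456", digits)      # cache hit: no recompilation
after_cached = compile_count     # 1

parse = make_parser(digits)
parse("123")
parse("456")                     # reuses the captured compiled grammar
after_closure = compile_count    # 2: one compilation per approach

print(after_cached, after_closure, cached_compile.cache_info().currsize)
```

In this sketch both approaches compile the grammar once for repeated use. The difference is in who controls the lifetime: the cached entry stays in memory until it is evicted or the cache is cleared, even if the grammar is never requested again, whereas the closure's compiled grammar becomes collectable as soon as `parse` goes out of scope.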

@ChristianGruen ChristianGruen added the Propose for V4.0 The WG should consider this item critical to 4.0 label Jun 20, 2023
@ChristianGruen ChristianGruen removed the Propose for V4.0 The WG should consider this item critical to 4.0 label Nov 8, 2023
@ChristianGruen
Contributor

Accepted at meeting 052.
