@odk/xpath: initial XPath implementation #3

eyelidlessness · 2023-10-17T00:26:09Z

What's in the box

An XPath implementation, covering almost all of the behavior in Enketo's current evaluator (openrosa-xpath-evaluator [ORXE]) with a few very specific exceptions:

Backwards-compatible support for custom functions. The interface provides a different mechanism which should support this use case if we decide it's important.
A small handful of test failures where the actual expected behavior isn't totally clear, namely:
- Date[time] strings lacking expected zero padding (i.e. date("1970-1-2")). I have a vague memory where I had thought I saw discussion of this, but I've since been unable to find it.
- Bare equality comparison of date and datetime strings outside of any casting context (i.e. "2018-06-25" = "2018-06-25T00:00:00.000-07:00" as a complete expression). I think a case could be made for this, but it creates a grammar ambiguiuty I'm not entirely comfortable with. And supporting it would require some compromises I'd prefer to justify before diving in for a fix.
- Formatting date[time] strings expressed as a colloquial/localized format (i.e. format-date('Mon Oct 16 2023 15:57:39 GMT-0700 (Pacific Daylight Time)', '%e | %a')). The string as provided is a serialization produced (and parseable) by JavaScript's Date object; my suspicion is that's why this is supported. If it is actually expected, it shouldn't be a huge lift, but my inclination is to avoid it otherwise.
- Exponent notation in numeric literals (i.e. int(7.922021953507237e-12)). This is already supported when casting from a string literal, but as a numeric literal it deviates from the XPath grammar. I was able to find JavaRosa test cases for the string-number cast, but I didn't see any for the numeric literal case. Again, my suspicion as to why this was supported in Enketo is the likely casting mechanism (Number(sourceLiteral) or similar). If we do want to support this, casting isn't so much an issue, but it would either require deviation in the tree-sitter grammar as it's currently a parse error, or some effort towards handling it as a parse error.
- Ambiguity between runtimes on support for time strings with greater than 60 seconds (i.e. decimal-time("06:00:60.000-07:00")). The tests brought in from Enketo expect this to be invalid. Chrome/Chromium/Node and Firefox all treat it as invalid, Safari/WebKit does not.
Possibly a lingering set of time-sensitive bugs. But I have a feeling that was actually a discrepancy between CI's time zone and the time zone specified at runtime.

What else is in the box:

As implied above: Node support (tested with jsdom, the default DOM compatibility layer used by Vitest; others are currently untested and may vary significantly depending on their divergence from native DOM implementations). As should be obvious, the implementation is currently explicitly tied to DOM APIs—but notably, it is not tied to an underlying XPath implementation.
Also as implied above: time zone is parameterized at runtime, rather than derived entirely from the environment (though that remains the default).
Conversion of all of Enketo/ORXE's integration tests to TypeScript, along with some significant improvements to test isolation (i.e. fixture setup is test/suite local, rather than implicitly test-run-global) and clarity of test parameterization. All of this, I believe, could pretty easily make its way back to ORXE if we so choose.
A pretty thorough exercise/validation of tree-sitter-xpath from a correctness/completion standpoint.

What's not in the box

Several functions specified in the ODK spec are not yet implemented, as they were not in ORXE:

current
indexed-repeat
instance
pulldata
version

These were all implemented in Enketo Core, for reasons either specified there in comments or for somewhat guessable reasons. Whatever the historical reasons, I believe these can all reasonably be implemented here. I chose to defer their implementation to limit the already quite sizeable scope of this current init, and to leave them open for design discussion as some probably warrant (and as they may dovetail with corresponding work on related XForms-spec functionality).

Why/goals

Normally I like to start PR notes like this on "why?" But I think the above "what?" fit better first in this case.

This was originally going to be a personal exercise, to validate tree-sitter-xpath (as in, "it's not a library if it doesn't have users"). This remained a goal as the implementation felt more worth pursuing for its own sake. (Conclusion: I believe this has accomplished that!)
Accomplish a real/tangible/usable/completeable thing in this early effort toward a potential rewrite. (Conclusion: here's the PR, complete-ish pending failures discussed above and likely cleanup/rework in review)
Accomplish a thing which could potentially also benefit Enketo. (Conclusion: apart from lacking support for custom functions, it should be a drop-in replacement. It would be more work to integrate support for the remaining functions.)
Establish some early precedent for certain principles of design and approach (details below).

Notes on design and approach

Domain focus

I wanted to take this as an early opportunity to establish focus on the problem domain as a key design principle. In this case, "the problem domain" is a set of technical specifications, which isn't as great an exercise of the principle as some other areas we'll get to. But it's a principle that's informed several aspects of the design.

Naming: as much as possible, where a name is used in the pertinent specs or where the spec language suggests obvious naming for particular concepts, that is the naming used in code. Where derivative types or values are used, their names are intentionally derivative of the underlying spec concepts. (There is at least one particular area of concern on this front, LocationPathEvaluation, which is further discussed in code comments.)

Domain logic: as much as possible, implementation logic is intended to follow the language and/or inferrable spirit of the respective aspects of the specs under implementation. While comments are presently sparse, should we desire more extensive commentary the expectation is that spec language should be able to reasonably describe these implementation details. There are cases where implementation is more platform specific, generally where the underlying APIs have important semantic differences. And in a few cases (which we can expect to grow as appropriate), there are implementation considerations specifically for their performance characteristics.

Compliance: specific effort has been made to ensure layers of implementation are compliant with the specs they’re concerned with; where specs differ (ie where the ODK XForms spec notes exceptions to the standards they reference), those divergences are achieved by explicit extension and composition mechanisms rather than baked into lower level spec-conforming aspects. A notable example is datetimes: where they have implications for grammar or function behavior, those are achieved at a higher level than the implementation of underlying XPath data types and built-in functions. And at least some preliminary effort has been made to position future extensibility as analogous to the mechanisms in underlying/related specs (for example some notion of support for function namespacing).

Scoped configurability/extensibility

To the extent the evaluator is sensitive to contextual state—and to the extent that its behavior is extensible—this is achieved in a local instantiation of the evaluation environment. There is still room for improvement on this approach! But the intent is to establish clear patterns of contextual customization from the outset, and to eschew hard coding non-spec assumptions into spec-focused code as a general rule. Exceptions to that rule should ideally have to meet a high bar to be justified.

Performance

The design as implemented is performance-minded, if not currently optimal. Which is to say that explicit care was taken to choose a general approach and specific aspects of implementation, which are some combination of:

Known to perform well in the general case (strong emphasis on monomorphization, using known-fast platform capabilities)
Likely to perform well by deferring or avoiding work where possible (eg lazy evaluation, obvious opportunities to short-circuit)
Likely to perform well by reducing redundant work (eg caching likely-expensive computations)

Early prototyping revealed several clear optimization opportunities, which informed further efforts, but for the most part I put that on hold as I focused on “make it work” and “make it right”. TL;DR, this is probably fast-ish, can probably be even faster, and was intentionally built with further optimization in mind.

To set/inspire initial expectations: in my early prototyping, I saw significantly better overall performance than the browsers' native XPath evaluators for the same expressions. Having been some time since I last measured, I expect performance has degraded from that high bar. But I also believe "faster than native" is a realistic potentiality, if we choose to priotirize that.

Regardless, I want to make benchmarking a part of our automated suite of quality control tools, but I think that's best saved for later.

Maintainability and beyond

Here I don't (only) mean the ability to reasonably deliver bug fixes, or add functionality. Although I do think this is a good first effort on that front.

I also wanted a design which lends itself to tooling to aid future development and integration. The tooling we already use, e.g. code navigation, runtime debugging. And tooling we may want to develop for our users or ourselves, e.g. potential to introduce static analysis to help understand expressions' behavior, or to document information flow and relationships between parts of a form. Or even to, say, help building something like a form builder 😉

…nning tests

Generally defer to `number` function (overridden in xforms function library) for date/datetime-like values in argument positions and numeric comparisons

Whatever cache it’s hitting is probably wrong

It’s not recommended, per https://playwright.dev/docs/ci#caching-browsers

…across browsers

…tation behavior

This seems like it was probably the cause of several time-of-day-specific test failures. Setting it (temporarily, I hope) in CI should probably mean we can keep testing WebKit (which isn’t otherwise affected by e.g. the `TZ` environment variable like the other browsers)

Also bumps Node versions

Annotations might be good for pointing to the remaining test failures for discussion

…onsibilities

I’ve missed formatting so many times without this. It should also be a check in CI.

A few names are suspect :/

Fix build p2

Tests are left as they’re still part of the original suite

lognaturel · 2023-10-19T19:20:32Z

Date[time] strings lacking expected zero padding (i.e. date("1970-1-2")).

This does work in JavaRosa so there are definitely forms using it. It would be good to support but seems fine to just file an issue.

Bare equality comparison of date and datetime strings outside of any casting context (i.e. "2018-06-25" = "2018-06-25T00:00:00.000-07:00" as a complete expression)

I think this is expected to be string comparison and evaluate to false().

my suspicion is that's why this is supported

Yes, I don't think that's expected to be supported. Or very useful, frankly!

exponent notation in numeric literals (i.e. int(7.922021953507237e-12)

I don't think it's expected to be supported.

decimal-time("06:00:60.000-07:00")

I'd expect it to be invalid. JavaRosa agrees.

lognaturel · 2023-10-19T17:53:38Z

packages/odk-xpath/CHANGELOG.md

@@ -0,0 +1 @@
+# @odk/xpath


I'd rather not have changelogs if we can help it. Seems git and Github are sufficient sources of history. I know some people really want the changes to be baked into the source but I am not one of them.

Also discussed in chat: I believe this is just a placeholder and will hopefully be updated by changeset when preparing a release (this is how it appears from other packages using the tool). I also don't want to keep manual changelogs, so if that turns out to be wrong I'd gladly remove it (and others)

lognaturel · 2023-10-19T17:54:25Z

packages/odk-xpath/README.md

+- Context-based extensibility of behavior beyond the XPath 1.0 function library
+- [ODK XForms](https://getodk.github.io/xforms-spec/) XPath extensions
+- Non-browser environments (provided a subset of standard DOM APIs)[^1]
+- Minor accommodations for certain [Enketo](https://github.com/enketo/openrosa-xpath-evaluator) function aliases[^2]


Link to monorepo? I think we're still planning on archiving that repo shortly.

lognaturel · 2023-10-19T18:04:32Z

packages/odk-xpath/README.md

+
+The DOM standard is very complex, and compatibility libraries sometimes differ in terms of standards compliance and compatibility with the major browser implementations. It is likely, perhaps inevitable, that there will be edge cases and minor compatiblity issues when using such a library. At time of writing, no such issues are known in our tested environments[^1], but a couple of minor issues have already been found and fixed during the initial test setup.
+
+[^1]: Currently tested in Node, with [jsdom](https://github.com/jsdom/jsdom). There may be issues with other runtimes and/or DOM compatibility APIs. Node compatibility **does not** require any native extensions. DOM compatibility **does not** require any underlying XPath evaluation functionality (though it does currently rely on global constants like `XPathResult`).


This is a stylistic nit but I feel like all this content should be inline rather than in footnotes ¹. ^1 could be a Environments tested in section.

Other than that, great readme. 👍

Footnotes

I love DFW ↩

Might be good to augment with some of the design considerations from the PR description and some things we discussed about deferring to native like the current Enketo evaluator does vs not

eyelidlessness · 2023-10-19T21:32:54Z

I think I've addressed everything here except design notes in the README. I'll merge for now and add an issue to expand on that.

eyelidlessness force-pushed the init/odk-xpath branch from 57ae811 to aa3ec29 Compare October 18, 2023 20:08

eyelidlessness mentioned this pull request Oct 18, 2023

Initial project scaffolding #1

Merged

eyelidlessness and others added 28 commits October 18, 2023 14:31

(WIP) init @odk/xpath evaluator

0d7c7bc

Temporarily skip lint and building types, just to hopefully get CI ru…

9c6ab45

…nning tests

Fix tooling, lint errors, finish setting up initial @odk/xpath tests

a80f139

Fix CI playwright install/caching, restore CI linting

13ccedb

Quite a few date/time fixes

6749fd2

Generally defer to `number` function (overridden in xforms function library) for date/datetime-like values in argument positions and numeric comparisons

Force build in CI

c49b89e

Whatever cache it’s hitting is probably wrong

Fix outstanding type errors, reapply formatting

20e8005

Install webkit deps

9024126

Don’t try to cache Playwright browsers

6cac1e9

It’s not recommended, per https://playwright.dev/docs/ci#caching-browsers

Test assertion was wrong. Oddly, fixing gets more consistent results …

c9df5b9

…across browsers

Fix sorting in jsdom

0e8bca1

Skip DOM tests… as in tests of non-evaluator, underlying DOM implemen…

f2a6799

…tation behavior

Skip customm function tests temporarily(?)

153d6b2

Out of curiosity, is GHA setting TZ?

a4f3c17

Compile out BigInt polyfill

7acdd2f

Don’t include Vite config in @odk/xpath build:types

35c767f

@odk/xpath CI matrix (Node versions, 3 major browser engines)

80a11c9

Also bumps Node versions

Vitest GHA reporter

7f29426

Annotations might be good for pointing to the remaining test failures for discussion

Playwright *with* deps

c8b9461

Commentary on the awkard LocationPathEvaluation name/overlapping resp…

860df9e

…onsibilities

Format on save in VSCode, autoformat again

df6a2be

I’ve missed formatting so many times without this. It should also be a check in CI.

Break up expression evaluators, improve naming consistency

9b4208b

A few names are suspect :/

Fix @odk/xpath build

15b15bd

Fix build p2

Test & finish specifying Evaluator root node, fix unprefixed name test

355000e

Remove ORXE custom functions interface

2f47ea4

Tests are left as they’re still part of the original suite

Add changesets

1ac513b

Add initial @odk/xpath README

682e858

eyelidlessness force-pushed the init/odk-xpath branch from 0f76d1e to 682e858 Compare October 18, 2023 21:31

lognaturel reviewed Oct 19, 2023

View reviewed changes

eyelidlessness added 4 commits October 19, 2023 13:45

Fix (@odk/xpath): support unpadded month/day in date expressions

99541c5

Fail/explain tests determined out of scope/unsupported in review

36b87a6

Fix: consistently handle time values with >60s across runtimes

db4fe7d

@odk/xpath: address most README feedback besides design notes

dca1909

eyelidlessness merged commit 575ff32 into main Oct 19, 2023
26 checks passed

eyelidlessness mentioned this pull request Oct 19, 2023

Add design notes to @odk/xpath README #4

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

@odk/xpath: initial XPath implementation #3

@odk/xpath: initial XPath implementation #3

eyelidlessness commented Oct 17, 2023 •

edited

Loading

lognaturel commented Oct 19, 2023

lognaturel Oct 19, 2023

eyelidlessness Oct 19, 2023

lognaturel Oct 19, 2023

lognaturel Oct 19, 2023

lognaturel Oct 19, 2023

eyelidlessness commented Oct 19, 2023


		The DOM standard is very complex, and compatibility libraries sometimes differ in terms of standards compliance and compatibility with the major browser implementations. It is likely, perhaps inevitable, that there will be edge cases and minor compatiblity issues when using such a library. At time of writing, no such issues are known in our tested environments[^1], but a couple of minor issues have already been found and fixed during the initial test setup.

		[^1]: Currently tested in Node, with [jsdom](https://github.com/jsdom/jsdom). There may be issues with other runtimes and/or DOM compatibility APIs. Node compatibility does not require any native extensions. DOM compatibility does not require any underlying XPath evaluation functionality (though it does currently rely on global constants like `XPathResult`).

@odk/xpath: initial XPath implementation #3

@odk/xpath: initial XPath implementation #3

Conversation

eyelidlessness commented Oct 17, 2023 • edited Loading

What's in the box

What's not in the box

Why/goals

Notes on design and approach

Domain focus

Scoped configurability/extensibility

Performance

Maintainability and beyond

lognaturel commented Oct 19, 2023

lognaturel Oct 19, 2023

Choose a reason for hiding this comment

eyelidlessness Oct 19, 2023

Choose a reason for hiding this comment

lognaturel Oct 19, 2023

Choose a reason for hiding this comment

lognaturel Oct 19, 2023

Choose a reason for hiding this comment

Footnotes

lognaturel Oct 19, 2023

Choose a reason for hiding this comment

eyelidlessness commented Oct 19, 2023

eyelidlessness commented Oct 17, 2023 •

edited

Loading