
Releases: mihaelamj/PureXML

PureXML 0.4.7

26 Jun 01:11
fab8008


XSD identity-constraint validation made linear. Backward compatible, validation results identical.

Fixed

  • xs:unique/xs:key/xs:keyref validation was quadratic over a wide list of targets: the duplicate check compared each value tuple against every tuple already seen (and each keyref tuple against every referenced key), and the error-location helper rescanned the parent's entire child list to position each target. Duplicate detection now keys tuples in a Set by their raw strings whenever field equality provably reduces to raw string comparison (whitespace-preserving lexical types, which covers most ids). Value-space types (numeric, boolean, date), whitespace-collapsing types, and QName-bearing values keep the exact pairwise comparison, so the set of documents that pass is unchanged. Each target is now positioned through a per-parent, per-name index built once. A 16000-item xs:unique document drops from ~4.0s to ~0.12s (~34x) and now scales linearly (#312).
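A minimal Swift sketch of the split described above. `FieldValue`, `rawKey`, and `duplicateIndices` are hypothetical names, not PureXML's API, and `Double` stands in for any value-space type. Tuples whose fields all compare by raw string go through a `Set`; every other tuple keeps the exact pairwise check.

```swift
// Hypothetical model, not PureXML's types: a field either compares by its
// raw lexical string or needs a value-space comparison.
enum FieldValue {
    case raw(String)          // e.g. a whitespace-preserving id
    case valueSpace(Double)   // stand-in for numeric/boolean/date/QName values
}

func equalValues(_ a: FieldValue, _ b: FieldValue) -> Bool {
    switch (a, b) {
    case let (.raw(x), .raw(y)): return x == y
    case let (.valueSpace(x), .valueSpace(y)): return x == y
    default: return false
    }
}

/// The tuple's raw strings, or nil if any field needs value-space equality.
func rawKey(_ tuple: [FieldValue]) -> [String]? {
    var key: [String] = []
    for field in tuple {
        guard case let .raw(s) = field else { return nil }
        key.append(s)
    }
    return key
}

/// Indices of tuples that duplicate an earlier tuple.
func duplicateIndices(_ tuples: [[FieldValue]]) -> [Int] {
    var seenRaw = Set<[String]>()        // expected O(1) lookup per tuple
    var seenOther: [[FieldValue]] = []   // exact pairwise fallback
    var duplicates: [Int] = []
    for (index, tuple) in tuples.enumerated() {
        if let key = rawKey(tuple) {
            // insert(_:).inserted is false when the key was already present.
            if !seenRaw.insert(key).inserted { duplicates.append(index) }
        } else if seenOther.contains(where: { other in
            other.count == tuple.count
                && zip(other, tuple).allSatisfy { equalValues($0.0, $0.1) }
        }) {
            duplicates.append(index)
        } else {
            seenOther.append(tuple)
        }
    }
    return duplicates
}
```

In this model a tuple of raw strings can never equal a tuple holding a value-space field, so the two partitions never need to be compared against each other.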

Found by scale-testing the identity-constraint feature at 2x/4x/8x sizes, which the default benchmark corpus does not exercise. Equivalence was verified by an adversarial closed-form proof and by the W3C XSTS datatype/identity conformance corpus (results unchanged).

PureXML 0.4.6

25 Jun 23:42
e080876


XPath string-function and xsl:number quadratic fixes. Backward compatible, results identical.

Fixed

  • translate() is linear (the from-to map is built once), and contains()/substring-before()/substring-after() use a linear-time Knuth-Morris-Pratt search instead of matching the needle at every haystack position — closing an O(n×m) denial-of-service vector on repetitive untrusted strings (#308).
  • xsl:number level="any" was cubic over a wide fan-out (each preceding sibling rescanned the child list); now quadratic by walking the child slice directly (#309).
  • xsl:number levels single/multiple were quadratic when numbering a wide list; now linear via a per-(parent, count pattern) rank cache. An 8000-item default xsl:number: ~3.5s -> ~0.04s (~90x) (#310).
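The linear-time substring search can be sketched with textbook Knuth-Morris-Pratt over Unicode scalars (XPath 1.0 strings are sequences of code points). The function names here are hypothetical; this illustrates the technique named in #308, not PureXML's code.

```swift
/// Offset (in Unicode scalars) of the first occurrence of `needle` in
/// `haystack`, or nil. O(n + m): the haystack is never rescanned.
func kmpFirstIndex(of needle: [Unicode.Scalar], in haystack: [Unicode.Scalar]) -> Int? {
    if needle.isEmpty { return 0 }
    // failure[i]: length of the longest proper prefix of needle[0...i]
    // that is also a suffix of it.
    var failure = [Int](repeating: 0, count: needle.count)
    var k = 0
    for i in 1..<needle.count {
        while k > 0 && needle[i] != needle[k] { k = failure[k - 1] }
        if needle[i] == needle[k] { k += 1 }
        failure[i] = k
    }
    var matched = 0
    for (i, scalar) in haystack.enumerated() {
        // On a mismatch, fall back within the needle instead of the haystack.
        while matched > 0 && scalar != needle[matched] { matched = failure[matched - 1] }
        if scalar == needle[matched] { matched += 1 }
        if matched == needle.count { return i - needle.count + 1 }
    }
    return nil
}

func xpathContains(_ s: String, _ t: String) -> Bool {
    kmpFirstIndex(of: Array(t.unicodeScalars), in: Array(s.unicodeScalars)) != nil
}

func substringBefore(_ s: String, _ t: String) -> String {
    let hay = Array(s.unicodeScalars)
    guard let i = kmpFirstIndex(of: Array(t.unicodeScalars), in: hay) else { return "" }
    var result = ""
    result.unicodeScalars.append(contentsOf: hay[..<i])
    return result
}

func substringAfter(_ s: String, _ t: String) -> String {
    let hay = Array(s.unicodeScalars)
    guard let i = kmpFirstIndex(of: Array(t.unicodeScalars), in: hay) else { return "" }
    var result = ""
    result.unicodeScalars.append(contentsOf: hay[(i + t.unicodeScalars.count)...])
    return result
}
```

An empty needle matches at offset 0, which gives the XPath results substring-before(s, '') = '' and substring-after(s, '') = s.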

Found by scale-testing the string functions and numbering at 2x/4x/8x — workload shapes the default corpus does not exercise.
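For the single/multiple numbering fix (#310), a per-(parent, count pattern) rank cache might look like this sketch. It assumes a hypothetical `Node` class and simplifies the count pattern to an element name; real level="single" numbering also locates the nearest ancestor-or-self that matches the count pattern, which is omitted here.

```swift
// Hypothetical node type, not PureXML's TreeNode.
final class Node {
    let name: String
    private(set) var children: [Node] = []
    weak var parent: Node?
    init(_ name: String, _ children: [Node] = []) {
        self.name = name
        for child in children { child.parent = self }
        self.children = children
    }
}

struct CacheKey: Hashable {
    let parent: ObjectIdentifier
    let pattern: String   // simplified: the count pattern is an element name
}

/// Valid for one transform, while the nodes it identifies stay alive.
final class RankCache {
    private var ranks: [CacheKey: [ObjectIdentifier: Int]] = [:]

    /// 1-based position of `node` among its siblings matching `pattern`.
    func rank(of node: Node, pattern: String) -> Int? {
        guard let parent = node.parent else { return nil }
        let key = CacheKey(parent: ObjectIdentifier(parent), pattern: pattern)
        if ranks[key] == nil {
            // One pass over the children, done once per (parent, pattern).
            var table: [ObjectIdentifier: Int] = [:]
            var count = 0
            for child in parent.children where child.name == pattern {
                count += 1
                table[ObjectIdentifier(child)] = count
            }
            ranks[key] = table
        }
        return ranks[key]?[ObjectIdentifier(node)]
    }
}
```

Numbering all N items of a list then costs one O(N) table build plus O(1) lookups, instead of rescanning the preceding siblings for every item.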

Verification

Full suite (1753 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.5

25 Jun 21:08
f6802c1


Two more XPath document-order quadratic fixes. Backward compatible, results identical.

Fixed

  • Sorting an element's attributes into document order was O(K²) in the attribute count (a linear scan per attribute); now O(K) via a cached per-owner name-to-index map. An 8000-attribute element: ~0.18s -> ~0.01s (~16x) (#305).
  • The following-sibling/preceding-sibling axes scanned a wide parent's child list to locate the context node per call, and materialized every sibling before filtering. They now use the shared sibling-index cache and fuse the node test into the walk (#306).
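One way to get the O(K) attribute ordering from #305 is sketched below, assuming a hypothetical `Element` that stores attribute names in document order and keys them by name string. The name-to-index map is built once per owner; each attribute is then dropped into its slot rather than located by a linear scan.

```swift
// Hypothetical element model, not PureXML's: attribute names in document order.
final class Element {
    let attributeNames: [String]

    // Built lazily, once per owner element: name -> document-order index.
    private lazy var attributeIndex: [String: Int] = {
        var map: [String: Int] = [:]
        for (i, name) in self.attributeNames.enumerated() { map[name] = i }
        return map
    }()

    init(attributeNames: [String]) { self.attributeNames = attributeNames }

    /// Puts a subset of this element's attribute names into document order
    /// in O(K): place each name at its index, then compact the slots.
    /// Names that are not attributes of this element are dropped.
    func documentOrder(_ names: [String]) -> [String] {
        var slots = [String?](repeating: nil, count: attributeNames.count)
        for name in names {
            if let i = attributeIndex[name] { slots[i] = name }
        }
        return slots.compactMap { $0 }
    }
}
```

Slot placement works because attribute names are unique within an element, so no two inputs compete for the same slot.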

These complete a sweep of the XPath document-order width-quadratics (alongside firstInDocumentOrder, key()/id()/EXSLT set functions, and the following/preceding axes in 0.4.3–0.4.4); every XPath-using subsystem (XSLT, Schematron, XPointer) benefits.

Verification

Full suite (1740 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.4

25 Jun 19:23
2abb130


following/preceding axis quadratic fix. Backward compatible, results identical.

Fixed

The following/preceding axes were quadratic over many context nodes: each call rebuilt the whole document node list and searched it linearly, then materialized the entire span before the node test filtered it (#302, #303). They now share a per-evaluation cache of the ordered node list and a node-to-index map; following:: skips the context's contiguous subtree by index arithmetic; and the node test is fused into the walk, so only matches are materialized. A 200-context following:: query over a 20k-element document drops ~16s -> ~6s (~2.6x).
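The index arithmetic relies on a pre-order (document-order) list in which every node's descendants occupy a contiguous run right after it. A minimal sketch, assuming a hypothetical `TNode` type and ignoring attribute and namespace nodes: following:: is everything from `index + subtreeSize` onward, since ancestors already sit before the context.

```swift
// Hypothetical tree node, not PureXML's TreeNode.
final class TNode {
    let name: String
    let children: [TNode]
    init(_ name: String, _ children: [TNode] = []) {
        self.name = name
        self.children = children
    }
}

/// Per-evaluation cache: pre-order list, node -> index, and subtree sizes.
struct OrderCache {
    private(set) var order: [TNode] = []
    private var indexOf: [ObjectIdentifier: Int] = [:]
    private var subtreeSize: [Int] = []   // the node itself plus all descendants

    init(root: TNode) { visit(root) }

    private mutating func visit(_ node: TNode) {
        let i = order.count
        order.append(node)
        indexOf[ObjectIdentifier(node)] = i
        subtreeSize.append(1)
        for child in node.children { visit(child) }
        subtreeSize[i] = order.count - i
    }

    /// following:: with the node test fused in: descendants are skipped by
    /// arithmetic, and only nodes passing `test` are collected.
    func following(_ context: TNode, where test: (TNode) -> Bool) -> [TNode] {
        guard let i = indexOf[ObjectIdentifier(context)] else { return [] }
        var result: [TNode] = []
        for node in order[(i + subtreeSize[i])...] where test(node) {
            result.append(node)
        }
        return result
    }
}
```

Building the cache is one O(n) walk per evaluation; each context then finds its start position with one dictionary lookup instead of a linear search.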

Verification

Full suite (1735 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.3

25 Jun 17:15
8abaf57


XSLT/XPath quadratic fixes. Backward compatible, output identical.

Fixed (quadratic -> linear over wide fan-outs)

  • string(@x) / value-of (single-node string extraction) no longer computes a document-order key that scanned the node's siblings (#299): a representative 20k-element transform drops ~2.3s -> ~0.65s (~3.5x).
  • key(), id(), and EXSLT set:leading/set:trailing sort through the cached sortedByDocumentOrder() instead of sorted(by: precedes), and key() deduplicates through a set (#300): a key()-driven transform drops ~1.15s -> ~0.43s (~2.7x).
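The key() change (#300) follows a common pattern: deduplicate by node identity through a hash set, then sort once by a precomputed document-order position. In this sketch the hypothetical `KNode.order` field stands in for whatever the cached sortedByDocumentOrder() uses internally.

```swift
// Hypothetical node carrying a precomputed document-order position.
final class KNode: Hashable {
    let order: Int
    init(order: Int) { self.order = order }
    // Node identity, not structural equality, defines node-set membership.
    static func == (lhs: KNode, rhs: KNode) -> Bool { lhs === rhs }
    func hash(into hasher: inout Hasher) { hasher.combine(ObjectIdentifier(self)) }
}

/// Merges the matches for several key values into one node-set:
/// O(1) expected identity check per node, then a single sort.
func keyResult(_ matchLists: [[KNode]]) -> [KNode] {
    var seen = Set<KNode>()
    var merged: [KNode] = []
    for list in matchLists {
        for node in list {
            if seen.insert(node).inserted { merged.append(node) }
        }
    }
    return merged.sorted { $0.order < $1.order }
}
```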

Both fixes are in shared XPath document-order helpers, so XSLT, Schematron, and XPointer all benefit.

Verification

Full suite (1731 tests), W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output (byte-for-byte), and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.2

25 Jun 14:15
1ae71c3


Post-0.4.1 parse and serialize performance. Backward compatible, identical results.

Performance (vs libxml2, 20k-item generated corpus)

  • parse ~9x -> ~5.8x slower than libxml2:
    • bulk-copy literal runs in the entity decoder (#291)
    • split the name prefix from a scanned colon offset, no second pass (#292)
    • adopt prepared children when materializing the tree (#293)
    • scan for entity-decoder markers at the byte level (#294)
    • build the TreeNode tree directly from the event stream, eliminating the value-tree intermediate (#295)
  • serialize ~3.0x -> ~2.3x slower than libxml2:
    • bulk-copy verbatim runs when escaping output (#296)
    • find the characters to escape by scanning bytes, not graphemes (#297)
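The two serialize items (#296, #297) combine naturally. Every character that needs escaping is ASCII, and UTF-8 multibyte sequences use only bytes >= 0x80, so a byte scan can never split or misread a multibyte character. This is a sketch with a hypothetical `escapeText` for text content only, not PureXML's serializer; verbatim runs are copied in bulk between escapes.

```swift
/// Escapes XML text content by scanning UTF-8 bytes rather than graphemes.
func escapeText(_ input: String) -> String {
    let bytes = Array(input.utf8)
    // Fast path: nothing to escape, return the input unchanged.
    guard bytes.contains(where: { $0 == 0x26 || $0 == 0x3C || $0 == 0x3E }) else {
        return input
    }
    var out: [UInt8] = []
    out.reserveCapacity(bytes.count + 16)
    var runStart = 0
    for (i, byte) in bytes.enumerated() {
        let replacement: String
        switch byte {
        case 0x26: replacement = "&amp;"   // &
        case 0x3C: replacement = "&lt;"    // <
        case 0x3E: replacement = "&gt;"    // >
        default: continue                  // part of a verbatim run
        }
        out.append(contentsOf: bytes[runStart..<i])   // bulk-copy the run
        out.append(contentsOf: replacement.utf8)
        runStart = i + 1
    }
    out.append(contentsOf: bytes[runStart...])
    return String(decoding: out, as: UTF8.self)
}
```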

Verification

Full local suite (1725 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N canonicalization corpora, and the WASM cross-platform build, all green across macOS, Linux, and Windows. Every change is covered by a dedicated differential test (decode, colon-placement, tree-structure, escaping, all with multibyte-boundary cases) and an adversarial equivalence review. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.1

25 Jun 08:46
c62e02f


Post-0.4.0 XPath performance work. Backward compatible, identical query results.

Performance (xpath vs libxml2, 20k-item generated corpus)

~5.5x -> ~1.3x slower than libxml2 across five passes:

  • single-accumulator descendant traversal, replacing per-node array concatenation (#285)
  • fuse the node test into the descendant walk, so a rejected node is never wrapped or reference-counted (#286)
  • the same node-test fusion for the attribute axis (#287)
  • share the focus-independent evaluation state (variables, functions, namespaces, budget) through one Environment reference instead of copying three dictionaries per node (#288)
  • compile the step node test once per query rather than per node, dropping a per-node namespace-dictionary retain on the dominant traversal; this was the largest single gain of the five passes (#289)
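The first two passes (#285, #286) can be pictured as a descendant walk that appends into one shared accumulator and applies the node test in place, so a rejected node is never collected. `DNode` is a hypothetical type; an explicit stack, with children pushed in reverse, keeps the output in document order without per-node array concatenation.

```swift
// Hypothetical tree node, not PureXML's TreeNode.
final class DNode {
    let name: String
    let children: [DNode]
    init(_ name: String, _ children: [DNode] = []) {
        self.name = name
        self.children = children
    }
}

/// descendant:: axis with the node test fused into the walk.
func descendants(of node: DNode, matching test: (DNode) -> Bool) -> [DNode] {
    var result: [DNode] = []                      // single accumulator
    var stack: [DNode] = node.children.reversed() // reversed: pop in document order
    while let current = stack.popLast() {
        if test(current) { result.append(current) }
        stack.append(contentsOf: current.children.reversed())
    }
    return result
}
```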

Verification

Full local suite (1707 tests), Apache Xalan XSLT gold-output corpus, and the WASM cross-platform build, all green across macOS, Linux, and Windows. Each correctness-sensitive change (the three node-test fusions and the compiled node test) is covered by a dedicated differential test asserting the fast path selects exactly what the unfused path does, plus an adversarial equivalence review. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.4.0

25 Jun 03:48
f561ddb


Performance across all three benchmarked operations plus one XSD conformance fix, all backward compatible.

Performance (vs libxml2, 20k-item generated corpus)

  • parse ~30x → ~9x slower than libxml2: byte-level scanning fast paths, byte-after-< markup dispatch, ampersand-free attribute decode skip (#275, #276, #278)
  • serialize ~5.5x → ~2.7x: escaping fast path for values that need no escaping (#279)
  • xpath ~13x → ~5.5x: node-set de-duplication skip on per-context-disjoint axes, // descendant fusion, and redundant sort/dedup elimination for path results (#280–#283)

Fixed

  • An xsi:type blocked by block/blockDefault is now rejected when it reaches a union declared type through a blocked member (cvc-elt.4.3 / cos-st-derived-OK 2.2.4). W3C XSTS invalid-instances-accepted 15 → 14, with valid-schemas-rejected and valid-instances-rejected held at 0 — the false-positive direction guarded hardest (#277).

Verification

Full local suite (1695 tests), Apache Xalan XSLT gold-output corpus, W3C XML conformance corpora (xmltest/OASIS/Sun/Eduni/IBM/Japanese), RELAX NG spec suite, the full W3C XSTS archive, and the WASM cross-platform build — all green. The correctness-sensitive changes (the xsi:type fix, the // fusion, and the single-step sort skip) each passed an adversarial differential review. swiftformat and swiftlint --strict clean.

Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md

PureXML 0.3.0

24 Jun 15:21
3aca6c3


A conformance-focused release: 37 Apache Xalan XSLT 1.0 cases closed since 0.2.0 (failing cases down from 61 to 24).

XSLT / XPath

  • HTML output method (16.2): < left literal in attribute values; doctype named HTML; non-null-namespace elements serialized as XML; non-ASCII percent-escaping in URI-valued attributes.
  • document() base-URI resolution (12.1): two-argument and node-set forms, string-form against the stylesheet element's base, and document('') in included/imported files.
  • xsl:key union match patterns now index every branch; key() returns all non-unique matches.
  • xsl:namespace-alias keeps the literal prefix and remaps only the URI (7.1.1).
  • xsl:number: format is an attribute value template; level="single" counts the from-boundary node when it also matches count.
  • xsl:strip-space/preserve-space match by namespace; xml:space="preserve" honored in the stylesheet.
  • disable-output-escaping is ignored when the text does not become a result-tree text node, e.g. inside an attribute or comment (16.4); default priority for axis-prefixed match patterns (5.5).
  • XPath: a lone / parses as the root before a terminator; large integer-valued numbers format without a decimal point.

Emitting

  • Processing-instruction data containing ?> is split with a space (7.3 recovery).
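A sketch of that recovery, under the assumption that it is applied to the PI data just before emitting (the function name is hypothetical). Working on Unicode scalars rather than Characters matters here: a combining mark after ">" would merge into a different Character, yet the raw "?>" sequence would still close the PI.

```swift
/// Inserts a space into every "?>" so the data cannot end the PI early.
func sanitizePIData(_ data: String) -> String {
    var out = String.UnicodeScalarView()
    var previous: Unicode.Scalar? = nil
    for scalar in data.unicodeScalars {
        if scalar == ">" && previous == "?" { out.append(" ") }
        out.append(scalar)
        previous = scalar
    }
    return String(out)
}
```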

Verification

Green across the full local suite (1655 tests) and all authoritative corpora: Apache Xalan XSLT, the W3C XML 1.0 suite (xmltest, OASIS, IBM, Sun, Eduni, Japanese), the W3C XSD Schema Test Suite (14k+ groups), and the RELAX NG spec suite. swiftformat/swiftlint clean; builds and tests under WASI. Pure Swift, no dependencies, no Foundation in the library.

See CHANGELOG.md for the full list.

PureXML 0.2.0

15 Jun 23:28


Pre-release

Second pre-release of PureXML — a pure-Swift XML parser/emitter with an XSD 1.0 schema engine, XPath/XSLT support, and a WASI-compatible build (no external dependencies; macOS, Linux, WASI).

Highlights

  • Schema-validity campaign: W3C XSTS settled baselines 1 / 266 / 171 / 155 (valid schema rejected / invalid schema accepted / valid instance rejected / invalid instance accepted). Invalid-schemas-accepted down from 394 at 0.1.0 to 266.
  • New rules: facet applicability, schema element order and import constraints, cross-container name uniqueness, notation validation, namespace-aware reference resolution, complexContent attribute inheritance/restriction, NameAndTypeOK type derivation (partial), keyref arity, and component child-content shape checks.
  • Robustness: generative schema/parser fuzz harness (no crashes or hangs over the current corpus).

XSTS baselines (settled expectations)

Bucket                        Count
Valid schemas rejected        1 (particlesZ001, spec-ambiguous)
Invalid schemas accepted      266
Valid instances rejected      171
Invalid instances accepted    155

Run opt-in: XSTS_ROOT=/path/to/xmlschema2006-11-06 swift test -c release --filter XSTS

Status

Pre-1.0. Remaining gates: schema differential oracle, proven bounds (not caps), located line/column diagnostics. See docs/production-readiness.md and CHANGELOG.md.