Releases: mihaelamj/PureXML
PureXML 0.4.7
XSD identity-constraint validation made linear. Backward compatible, validation results identical.
Fixed
`xs:unique`/`xs:key`/`xs:keyref` validation was quadratic over a wide list of targets: the duplicate check compared each value tuple against every tuple already seen (and a `keyref` against every referenced key), and the error-location helper rescanned the parent's whole child list to position each target. Duplicate detection now keys tuples in a `Set` by their raw string when field equality provably reduces to a raw `string == string` comparison (whitespace-preserving lexical types, which is most ids); value-space types (numeric, boolean, date), whitespace-collapsing types, and QName-bearing values keep the exact pairwise comparison, so which documents pass is unchanged. Each target is positioned through a per-parent, per-name index built once. A 16000-item `xs:unique` document drops from ~4.0s to ~0.12s (~34x), now linear (#312).
Found by scale-testing the identity-constraint feature at 2x/4x/8x sizes, a feature the default benchmark corpus does not exercise. Equivalence verified by an adversarial closed-form proof and the W3C XSTS datatype/identity conformance corpus (unchanged).
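
As a rough illustration of the fast-path/slow-path split described above, the sketch below keys tuples in a `Set` only when every tuple is marked as raw-comparable, and otherwise falls back to pairwise comparison. `FieldTuple`, `rawComparable`, and `duplicateIndices` are hypothetical names chosen for this example, not PureXML API, and the all-or-nothing choice of path is an assumption made to keep the sketch obviously correct.

```swift
/// Hypothetical value tuple for one identity-constraint target.
struct FieldTuple {
    let rawValues: [String]   // lexical field values exactly as written
    let rawComparable: Bool   // true when equality reduces to raw string equality
}

/// Returns the indices of tuples that duplicate an earlier tuple.
func duplicateIndices(
    in tuples: [FieldTuple],
    valueEquals: (FieldTuple, FieldTuple) -> Bool
) -> [Int] {
    var duplicates: [Int] = []
    // Fast path (assumption: taken only when every tuple qualifies):
    // one expected O(1) set lookup per tuple, so linear overall.
    if tuples.allSatisfy({ $0.rawComparable }) {
        var seen = Set<[String]>()
        for (index, tuple) in tuples.enumerated() {
            if !seen.insert(tuple.rawValues).inserted { duplicates.append(index) }
        }
        return duplicates
    }
    // Slow path: exact pairwise comparison in the value space, quadratic.
    var seen: [FieldTuple] = []
    for (index, tuple) in tuples.enumerated() {
        if seen.contains(where: { valueEquals($0, tuple) }) {
            duplicates.append(index)
        } else {
            seen.append(tuple)
        }
    }
    return duplicates
}
```

Keying on the whole `[String]` array rather than a joined string avoids having to pick a separator that cannot occur inside a field value.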
PureXML 0.4.6
XPath string-function and xsl:number quadratic fixes. Backward compatible, results identical.
Fixed
- `translate()` is linear (the from-to map is built once), and `contains()`/`substring-before()`/`substring-after()` use a linear-time Knuth-Morris-Pratt search instead of matching the needle at every haystack position, closing an O(n×m) denial-of-service vector on repetitive untrusted strings (#308).
- `xsl:number level="any"` was cubic over a wide fan-out (each preceding sibling rescanned the child list); now quadratic by walking the child slice directly (#309).
- `xsl:number` levels `single`/`multiple` were quadratic numbering a wide list; now linear via a per-(parent, count pattern) rank cache. An 8000-item default `xsl:number`: ~3.5s -> ~0.04s (~90x) (#310).
Found by scale-testing the string functions and numbering at 2x/4x/8x — workload shapes the default corpus does not exercise.
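
For readers unfamiliar with the search technique named in #308, here is a minimal Knuth-Morris-Pratt sketch over UTF-8 bytes. The names `failureTable` and `firstOffset(of:in:)` are made up for this example; how PureXML structures its own search is not shown in these notes.

```swift
/// Builds the KMP failure table: table[i] is the length of the longest
/// proper prefix of needle[0...i] that is also a suffix of it.
func failureTable(_ needle: [UInt8]) -> [Int] {
    var table = [Int](repeating: 0, count: needle.count)
    var matched = 0
    for i in needle.indices.dropFirst() {
        while matched > 0 && needle[i] != needle[matched] { matched = table[matched - 1] }
        if needle[i] == needle[matched] { matched += 1 }
        table[i] = matched
    }
    return table
}

/// Returns the UTF-8 byte offset of the first occurrence of `needle`
/// in `haystack`, or nil. Runs in O(n + m) instead of O(n × m).
func firstOffset(of needle: String, in haystack: String) -> Int? {
    let pattern = Array(needle.utf8)
    let text = Array(haystack.utf8)
    if pattern.isEmpty { return 0 }
    let table = failureTable(pattern)
    var matched = 0
    for (i, byte) in text.enumerated() {
        // On a mismatch, fall back through the table; never re-read the haystack.
        while matched > 0 && byte != pattern[matched] { matched = table[matched - 1] }
        if byte == pattern[matched] { matched += 1 }
        if matched == pattern.count { return i - matched + 1 }
    }
    return nil
}
```

A `contains()`-style check is then `firstOffset(of: needle, in: haystack) != nil`, and the offset gives the split point for the `substring-before()`/`substring-after()` style functions.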
Verification
Full suite (1753 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.5
Two more XPath document-order quadratic fixes. Backward compatible, results identical.
Fixed
- Sorting an element's attributes into document order was O(K²) in the attribute count (a linear scan per attribute); now O(K) via a cached per-owner name-to-index map. An 8000-attribute element: ~0.18s -> ~0.01s (~16x) (#305).
- The `following-sibling`/`preceding-sibling` axes scanned a wide parent's child list to locate the context node per call, and materialized every sibling before filtering. They now use the shared sibling-index cache and fuse the node test into the walk (#306).
These complete a sweep of the XPath document-order width-quadratics (alongside firstInDocumentOrder, key()/id()/EXSLT set functions, and the following/preceding axes in 0.4.3–0.4.4); every XPath-using subsystem (XSLT, Schematron, XPointer) benefits.
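
The attribute-ordering fix (#305) can be pictured with the sketch below: a name-to-index map turns "where does this attribute sit?" into a dictionary lookup. `sortedIntoDocumentOrder` and its parameters are hypothetical. The sketch rebuilds the map on every call for brevity, whereas the release notes describe it as cached per owner element.

```swift
/// Orders a selection of attribute names by their position on the owner element.
/// Assumes attribute names are unique per element, as XML well-formedness requires.
func sortedIntoDocumentOrder(_ selected: [String], ownerAttributes: [String]) -> [String] {
    // One pass builds name -> position (a real implementation would cache this).
    var position: [String: Int] = [:]
    for (index, name) in ownerAttributes.enumerated() { position[name] = index }

    // Drop each selected name into its document-order slot: O(K), no comparison sort.
    var slots = [String?](repeating: nil, count: ownerAttributes.count)
    for name in selected {
        if let index = position[name] { slots[index] = name }
    }
    // Names not present on the owner are ignored in this sketch.
    return slots.compactMap { $0 }
}
```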
Verification
Full suite (1740 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.4
following/preceding axis quadratic fix. Backward compatible, results identical.
Fixed
The following/preceding axes were quadratic over many context nodes: each call rebuilt the whole document node list and linearly searched it, then materialized the entire span before the node test filtered it (#302, #303). They now share a per-evaluation cache of the ordered node list and a node-to-index map, following:: skips the context's contiguous subtree by index arithmetic, and the node test is fused into the walk so only matches are materialized. A 200-context following:: query over a 20k-element document drops ~16s -> ~6s (~2.6x).
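
One way to picture the index arithmetic: if nodes are stored in document (pre-)order together with their subtree sizes, everything on the `following::` axis starts right after the context's subtree. `FlatDocument` below is a hypothetical model for illustration only. It takes the context as an index directly, where the real fix resolves it through the node-to-index map.

```swift
/// Hypothetical flattened document: element nodes in document order.
struct FlatDocument {
    let names: [String]      // element name of each node
    let subtreeSize: [Int]   // node count of each node's subtree, including itself

    /// `following::name` from the node at index `context`.
    func following(of context: Int, named name: String) -> [Int] {
        var matches: [Int] = []
        // Descendants are contiguous in pre-order, so one addition skips them all.
        var index = context + subtreeSize[context]
        while index < names.count {
            // Node test fused into the walk: only matches are materialized.
            if names[index] == name { matches.append(index) }
            index += 1
        }
        return matches
    }
}
```

For a document `root[a[b], b, c[b]]`, the arrays are `["root", "a", "b", "b", "c", "b"]` and `[6, 2, 1, 1, 2, 1]`; `following::b` from `a` (index 1) starts at index 3 and never visits `a`'s own child.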
Verification
Full suite (1735 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.3
XSLT/XPath quadratic fixes. Backward compatible, output identical.
Fixed (quadratic -> linear over wide fan-outs)
- `string(@x)`/`value-of` (single-node string extraction) no longer computes a document-order key that scanned the node's siblings (#299): a representative 20k-element transform drops ~2.3s -> ~0.65s (~3.5x).
- `key()`, `id()`, and EXSLT `set:leading`/`set:trailing` sort through the cached `sortedByDocumentOrder()` instead of `sorted(by: precedes)`, and `key()` deduplicates through a set (#300): a key()-driven transform drops ~1.15s -> ~0.43s (~2.7x).
Both fixes are in shared XPath document-order helpers, so XSLT, Schematron, and XPointer all benefit.
Verification
Full suite (1731 tests), W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output (byte-for-byte), and C14N corpora, plus the WASM build, all green across macOS, Linux, and Windows. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.2
Post-0.4.1 parse and serialize performance. Backward compatible, identical results.
Performance (vs libxml2, 20k-item generated corpus)
- parse ~9x -> ~5.8x slower than libxml2:
  - bulk-copy literal runs in the entity decoder (#291)
  - split the name prefix from a scanned colon offset, no second pass (#292)
  - adopt prepared children when materializing the tree (#293)
  - scan for entity-decoder markers at the byte level (#294)
  - build the `TreeNode` tree directly from the event stream, eliminating the value-tree intermediate (#295)
- serialize ~3.0x -> ~2.3x slower than libxml2:
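
The two entity-decoder items in the parse list above (#291, #294) can be sketched roughly as follows: scan the UTF-8 bytes for `&`, and copy everything between markers in one bulk append rather than character by character. This is an illustrative sketch under stated assumptions, not PureXML's decoder. `decodePredefinedEntities` is a hypothetical name, and only the five predefined entities are handled (no character references, no error reporting).

```swift
/// Decodes the five predefined XML entities; any other text is copied through unchanged.
func decodePredefinedEntities(_ input: String) -> String {
    let predefined: [(text: [UInt8], value: UInt8)] = [
        (text: Array("&amp;".utf8), value: UInt8(ascii: "&")),
        (text: Array("&lt;".utf8), value: UInt8(ascii: "<")),
        (text: Array("&gt;".utf8), value: UInt8(ascii: ">")),
        (text: Array("&quot;".utf8), value: UInt8(ascii: "\"")),
        (text: Array("&apos;".utf8), value: UInt8(ascii: "'")),
    ]
    let bytes = Array(input.utf8)
    let ampersand = UInt8(ascii: "&")
    var output: [UInt8] = []
    output.reserveCapacity(bytes.count)
    var runStart = 0   // start of the pending literal run
    var index = 0
    while index < bytes.count {
        if bytes[index] == ampersand,
           let entity = predefined.first(where: { bytes[index...].starts(with: $0.text) }) {
            output.append(contentsOf: bytes[runStart..<index])  // bulk-copy the literal run
            output.append(entity.value)
            index += entity.text.count
            runStart = index
        } else {
            index += 1   // byte-level scan; multibyte sequences never contain '&'
        }
    }
    output.append(contentsOf: bytes[runStart...])  // trailing literal run
    return String(decoding: output, as: UTF8.self)
}
```

Because every byte of a multibyte UTF-8 sequence has its high bit set, scanning for the ASCII `&` byte cannot split a character.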
Verification
Full local suite (1725 tests), the W3C XML conformance, RELAX NG, Apache Xalan XSLT gold-output, and C14N canonicalization corpora, and the WASM cross-platform build, all green across macOS, Linux, and Windows. Every change is covered by a dedicated differential test (decode, colon-placement, tree-structure, escaping, all with multibyte-boundary cases) and an adversarial equivalence review. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.1
Post-0.4.0 XPath performance work. Backward compatible, identical query results.
Performance (xpath vs libxml2, 20k-item generated corpus)
~5.5x -> ~1.3x slower than libxml2 across five passes:
- single-accumulator descendant traversal, replacing per-node array concatenation (#285)
- fuse the node test into the descendant walk, so a rejected node is never wrapped or reference-counted (#286)
- the same node-test fusion for the attribute axis (#287)
- share the focus-independent evaluation state (variables, functions, namespaces, budget) through one `Environment` reference instead of copying three dictionaries per node (#288)
- compile the step node test once per query rather than per node, dropping a per-node namespace-dictionary retain on the dominant traversal, the single largest pass (#289)
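
The first two passes (#285, #286) combine naturally, as the sketch below suggests: one `inout` accumulator is threaded through the recursion, and the node test runs during the walk so rejected nodes are never stored. `Node` and `collectDescendants` are hypothetical stand-ins, not PureXML types.

```swift
/// Minimal hypothetical tree node.
final class Node {
    let name: String
    let children: [Node]
    init(_ name: String, _ children: [Node] = []) {
        self.name = name
        self.children = children
    }
}

/// Appends the matching descendants of `node` to one shared accumulator,
/// in document order, instead of concatenating a fresh array per node.
func collectDescendants(of node: Node, where test: (Node) -> Bool, into result: inout [Node]) {
    for child in node.children {
        if test(child) { result.append(child) }   // fused node test
        collectDescendants(of: child, where: test, into: &result)
    }
}
```

A call such as `collectDescendants(of: root, where: { $0.name == "item" }, into: &found)` then performs one append per match, where per-node concatenation would copy partial results at every level.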
Verification
Full local suite (1707 tests), Apache Xalan XSLT gold-output corpus, and the WASM cross-platform build, all green across macOS, Linux, and Windows. Each correctness-sensitive change (the three node-test fusions and the compiled node test) is covered by a dedicated differential test asserting the fast path selects exactly what the unfused path does, plus an adversarial equivalence review. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.4.0
Performance across all three benchmarked operations plus one XSD conformance fix, all backward compatible.
Performance (vs libxml2, 20k-item generated corpus)
- parse ~30x → ~9x slower than libxml2: byte-level scanning fast paths, byte-after-`<` markup dispatch, ampersand-free attribute decode skip (#275, #276, #278)
- serialize ~5.5x → ~2.7x: escaping fast path for values that need no escaping (#279)
- xpath ~13x → ~5.5x: node-set de-duplication skip on per-context-disjoint axes, `//` descendant fusion, and redundant sort/dedup elimination for path results (#280–#283)
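
An escaping fast path of the kind mentioned for serialization (#279) might look like the sketch below: a single byte-level scan decides whether any work is needed, and the common case returns the input untouched. `escapeText` is a hypothetical helper that handles only `&`, `<`, and `>` in text content. The exact character set PureXML escapes is not stated here.

```swift
/// Escapes text content for XML output.
func escapeText(_ value: String) -> String {
    // Fast path: one pass over the bytes; most values contain nothing to escape.
    let needsEscaping = value.utf8.contains { byte in
        byte == UInt8(ascii: "&") || byte == UInt8(ascii: "<") || byte == UInt8(ascii: ">")
    }
    guard needsEscaping else { return value }   // no allocation, no copy

    // Slow path: rebuild the string, replacing the special characters.
    var escaped = ""
    escaped.reserveCapacity(value.utf8.count + 8)
    for scalar in value.unicodeScalars {
        switch scalar {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        default: escaped.unicodeScalars.append(scalar)
        }
    }
    return escaped
}
```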
Fixed
- An `xsi:type` blocked by `block`/`blockDefault` is now rejected when it reaches a union declared type through a blocked member (cvc-elt.4.3 / cos-st-derived-OK 2.2.4). W3C XSTS `invalid-instances-accepted` 15 → 14, with `valid-schemas-rejected` and `valid-instances-rejected` held at 0, the false-positive direction guarded hardest (#277).
Verification
Full local suite (1695 tests), Apache Xalan XSLT gold-output corpus, W3C XML conformance corpora (xmltest/OASIS/Sun/Eduni/IBM/Japanese), RELAX NG spec suite, the full W3C XSTS archive, and the WASM cross-platform build — all green. The correctness-sensitive changes (the xsi:type fix, the // fusion, and the single-step sort skip) each passed an adversarial differential review. swiftformat and swiftlint --strict clean.
Full changelog: https://github.com/mihaelamj/PureXML/blob/main/CHANGELOG.md
PureXML 0.3.0
A conformance-focused release: 37 Apache Xalan XSLT 1.0 cases closed since 0.2.0 (baseline 61 to 24).
XSLT / XPath
- HTML output method (16.2): `<` left literal in attribute values; doctype named `HTML`; non-null-namespace elements serialized as XML; non-ASCII percent-escaping in URI-valued attributes.
- `document()` base-URI resolution (12.1): two-argument and node-set forms, string-form against the stylesheet element's base, and `document('')` in included/imported files.
- `xsl:key` union `match` patterns now index every branch; `key()` returns all non-unique matches.
- `xsl:namespace-alias` keeps the literal prefix and remaps only the URI (7.1.1).
- `xsl:number`: `format` is an attribute value template; `level="single"` counts the `from`-boundary node when it also matches `count`.
- `xsl:strip-space`/`preserve-space` match by namespace; `xml:space="preserve"` honored in the stylesheet.
- `disable-output-escaping` ignored off text nodes (16.4); axis-prefixed match-pattern priority (5.5).
- XPath: a lone `/` parses as the root before a terminator; large integer-valued numbers format without a decimal point.
Emitting
- Processing-instruction data containing `?>` is split with a space (7.3 recovery).
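
A minimal sketch of that recovery, assuming "split with a space" means emitting `? >` wherever `?>` would otherwise end the instruction early. `sanitizedProcessingInstructionData` is a hypothetical name, not the emitter's actual entry point.

```swift
/// Inserts a space between `?` and `>` so the data cannot close the PI early.
func sanitizedProcessingInstructionData(_ data: String) -> String {
    var output = String.UnicodeScalarView()
    var previousWasQuestionMark = false
    // Work on Unicode scalars so a combining mark after '>' cannot hide the pair.
    for scalar in data.unicodeScalars {
        if previousWasQuestionMark && scalar == ">" { output.append(" ") }
        output.append(scalar)
        previousWasQuestionMark = (scalar == "?")
    }
    return String(output)
}
```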
Verification
Green across the full local suite (1655 tests) and all authoritative corpora: Apache Xalan XSLT, the W3C XML 1.0 suite (xmltest, OASIS, IBM, Sun, Eduni, Japanese), the W3C XSD Schema Test Suite (14k+ groups), and the RELAX NG spec suite. swiftformat/swiftlint clean; builds and tests under WASI. Pure Swift, no dependencies, no Foundation in the library.
See CHANGELOG.md for the full list.
PureXML 0.2.0
Second pre-release of PureXML — a pure-Swift XML parser/emitter with an XSD 1.0 schema engine, XPath/XSLT support, and a WASI-compatible build (no external dependencies; macOS, Linux, WASI).
Highlights
- Schema-validity campaign: W3C XSTS settled baselines 1 / 266 / 171 / 155 (valid schema rejected / invalid schema accepted / valid instance rejected / invalid instance accepted). Invalid-schemas-accepted down from 394 at 0.1.0 to 266.
- New rules: facet applicability, schema element order and import constraints, cross-container name uniqueness, notation validation, namespace-aware reference resolution, complexContent attribute inheritance/restriction, NameAndTypeOK type derivation (partial), keyref arity, and component child-content shape checks.
- Robustness: generative schema/parser fuzz harness (no crashes or hangs over the current corpus).
XSTS baselines (settled expectations)
| Bucket | Count |
|---|---|
| Valid schemas rejected | 1 (particlesZ001, spec-ambiguous) |
| Invalid schemas accepted | 266 |
| Valid instances rejected | 171 |
| Invalid instances accepted | 155 |
Run opt-in: XSTS_ROOT=/path/to/xmlschema2006-11-06 swift test -c release --filter XSTS
Status
Pre-1.0. Remaining gates: schema differential oracle, proven bounds (not caps), located line/column diagnostics. See docs/production-readiness.md and CHANGELOG.md.