SIMD-accelerated XML parsing with full XPath 1.0 support for Elixir.
SimdXml parses XML into a flat structural index (~16 bytes per tag) using SIMD instructions, then evaluates XPath expressions against it using array operations. No DOM tree, no atom creation from untrusted input, no XXE vulnerabilities.
Wraps the simdxml Rust crate via Rustler NIFs with precompiled binaries for all major platforms.
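
To make the "flat structural index" idea concrete, here is a minimal pure-Elixir sketch. It is not the layout used by the simdxml crate: the module `FlatIndexSketch`, its fields, and the hand-built example index are all hypothetical. It only illustrates how an XPath step such as `//title` can become a linear pass over arrays instead of a tree walk.

```elixir
defmodule FlatIndexSketch do
  # Illustrative only: one entry per element, in document order,
  # stored in parallel tuples. Not the real simdxml index format.
  defstruct tags: {}, parents: {}, texts: {}

  # Hand-built index for:
  # <library><book lang='en'><title>Elixir</title></book></library>
  def example do
    %__MODULE__{
      tags: {"library", "book", "title"},
      # Index of each element's parent; -1 marks the root.
      parents: {-1, 0, 1},
      texts: {nil, nil, "Elixir"}
    }
  end

  # //name: one pass over the tag array, no tree traversal.
  def descendants(%__MODULE__{tags: tags}, name) do
    for i <- 0..(tuple_size(tags) - 1)//1, elem(tags, i) == name, do: i
  end

  # Children of an element: entries whose parent index matches.
  def children(%__MODULE__{parents: parents}, parent) do
    for i <- 0..(tuple_size(parents) - 1)//1, elem(parents, i) == parent, do: i
  end

  def text(%__MODULE__{texts: texts}, i), do: elem(texts, i)
end

index = FlatIndexSketch.example()

index
|> FlatIndexSketch.descendants("title")
|> Enum.map(&FlatIndexSketch.text(index, &1))
#=> ["Elixir"]
```

Because every element is addressed by an integer position, query results are lists of indices rather than nested nodes, which is what keeps the per-tag footprint small.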
def deps do
  [{:simdxml, "~> 0.1.0"}]
end

Precompiled NIF binaries are provided for macOS (Apple Silicon, Intel), Linux
(x86_64, aarch64, musl), and Windows. Set SIMDXML_BUILD=1 to compile from
source if needed.
# Parse
doc = SimdXml.parse!("<library><book lang='en'><title>Elixir</title></book></library>")
# Query with XPath
SimdXml.xpath_text!(doc, "//title")
#=> ["Elixir"]
# Navigate elements (Enumerable)
root = SimdXml.Document.root(doc)
Enum.map(root, & &1.tag)
#=> ["book"]
# Attributes
[book] = SimdXml.Element.children(root)
SimdXml.Element.get(book, "lang")
#=> "en"

Build XPath queries with Elixir pipes instead of strings:
import SimdXml.Query
query = descendant("book") |> where_attr("lang", "en") |> child("title") |> text()
SimdXml.query!(doc, query)
#=> ["Elixir"]
# Inspect the generated XPath
SimdXml.Query.to_xpath(query)
#=> "//book[@lang='en']/title/text()"

Queries are composable data structures, so common fragments can be extracted and reused:
books = descendant("book")
english = books |> where_attr("lang", "en")
titles = english |> child("title") |> text()
authors = english |> child("author") |> text()

Compile once, evaluate against many documents:
query = SimdXml.compile!("//title")
SimdXml.eval_text!(doc1, query)
SimdXml.eval_text!(doc2, query)
# Optimized short-circuit operations
SimdXml.eval_count!(doc, query) #=> 1
SimdXml.eval_exists?(doc, query) #=> {:ok, true}

Compiled queries are NIF resources, so they are safe to share across processes, store in ETS, or hold in module attributes.
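
As a rough sketch of what sharing a compiled query across processes could look like, the helper below fans evaluation out with `Task.async_stream/3`. `SharedQuerySketch` and `eval_all/2` are hypothetical names, not part of SimdXml, and the sketch assumes parsed documents can be handed to other processes in the same way as compiled queries. The evaluator is passed in as a function so the block runs without the dependency.

```elixir
defmodule SharedQuerySketch do
  # Hypothetical helper, not part of SimdXml. `eval_fun` is any 1-arity
  # function; in real use it would close over a compiled query, so the
  # same query reference is used by every task process.
  def eval_all(docs, eval_fun) when is_function(eval_fun, 1) do
    docs
    |> Task.async_stream(eval_fun, ordered: true)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Intended use with the calls shown above (requires the :simdxml dependency):
#
#   query = SimdXml.compile!("//title")
#   SharedQuerySketch.eval_all([doc1, doc2], &SimdXml.eval_text!(&1, query))

# Stand-in evaluator so the sketch runs on its own.
SharedQuerySketch.eval_all(["a", "b"], &String.upcase/1)
#=> ["A", "B"]
```

`Task.async_stream/3` caps concurrency at the number of online schedulers by default and returns results in input order.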
Process thousands of documents with bloom filter prescanning:
query = SimdXml.compile!("//claim")
{:ok, results} = SimdXml.Batch.eval_text_bloom(xml_binaries, query)

Documents that cannot contain the target tags are skipped without parsing.
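
The prescanning idea can be sketched in pure Elixir. This is an illustration of the general bloom filter technique, not SimdXml's implementation: `BloomPrescanSketch`, the 1024-bit filter size, the three hash seeds, and the regex-based tag scan are all assumptions made for the example.

```elixir
defmodule BloomPrescanSketch do
  import Bitwise

  # Assumed parameters for the sketch: filter width and hash seeds.
  @bits 1024
  @seeds [0, 1, 2]

  # Bit positions for a tag name, one per hash seed.
  defp positions(name) do
    for seed <- @seeds, do: :erlang.phash2({seed, name}, @bits)
  end

  # Build a bloom filter (an integer used as a bitset) from every
  # opening-tag name found by a cheap scan, without building a tree.
  def filter(xml) do
    ~r/<([A-Za-z_][\w.\-]*)/
    |> Regex.scan(xml, capture: :all_but_first)
    |> List.flatten()
    |> Enum.flat_map(&positions/1)
    |> Enum.reduce(0, fn pos, acc -> acc ||| (1 <<< pos) end)
  end

  # false: the tag is definitely absent. true: it is possibly present.
  def may_contain?(filter, name) do
    Enum.all?(positions(name), fn pos -> (filter &&& (1 <<< pos)) != 0 end)
  end

  # Keep only the documents worth handing to the full parser.
  def candidates(xml_binaries, name) do
    Enum.filter(xml_binaries, &may_contain?(filter(&1), name))
  end
end

docs = ["<patent><claim>First</claim></patent>", "<note><body>Hi</body></note>"]
BloomPrescanSketch.candidates(docs, "claim")
```

A bloom filter never produces false negatives, so a document that does contain `<claim>` is always kept. It can produce rare false positives, in which case a document is parsed unnecessarily but the query result is still correct.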
For simple //tagname extraction at memory bandwidth — no structural index:
scanner = SimdXml.Quick.new("claim")
SimdXml.Quick.extract_first(scanner, xml) #=> "First claim text"
SimdXml.Quick.exists?(scanner, xml) #=> true
SimdXml.Quick.count(scanner, xml) #=> 42

SimdXml.Result.one(doc, "//title") #=> "Elixir"
SimdXml.Result.fetch(doc, "//title") #=> {:ok, "Elixir"}
SimdXml.Result.all(doc, "//title") #=> ["Elixir"]

| | SimdXml | SweetXml | Saxy |
|---|---|---|---|
| Parser | SIMD Rust NIF | xmerl (Erlang) | Pure Elixir SAX |
| XPath | Full 1.0 | Full 1.0 (via xmerl) | None |
| Memory | ~16 bytes/tag | ~350 bytes/node | Streaming |
| Atom safety | Strings only | Creates atoms | Strings only |
| XXE safe | No DTD processing | Vulnerable by default | No DTD processing |
| API | Combinators + XPath | ~x sigil | SAX handlers |
| Batch | Bloom-filtered | No | No |
Full API docs and interactive Livebook guides:
MIT