SimdXml

SIMD-accelerated XML parsing with full XPath 1.0 support for Elixir.

SimdXml parses XML into a flat structural index (~16 bytes per tag) using SIMD instructions, then evaluates XPath expressions against it using array operations. No DOM tree, no atom creation from untrusted input, no XXE vulnerabilities.

Wraps the simdxml Rust crate via Rustler NIFs with precompiled binaries for all major platforms.

Installation

def deps do
  [{:simdxml, "~> 0.1.0"}]
end

Precompiled NIF binaries are provided for macOS (Apple Silicon, Intel), Linux (x86_64, aarch64, musl), and Windows. Set SIMDXML_BUILD=1 to compile from source if needed.

Quick start

# Parse
doc = SimdXml.parse!("<library><book lang='en'><title>Elixir</title></book></library>")

# Query with XPath
SimdXml.xpath_text!(doc, "//title")
#=> ["Elixir"]

# Navigate elements (Enumerable)
root = SimdXml.Document.root(doc)
Enum.map(root, & &1.tag)
#=> ["book"]

# Attributes
[book] = SimdXml.Element.children(root)
SimdXml.Element.get(book, "lang")
#=> "en"

Query combinators

Build XPath queries with Elixir pipes instead of strings:

import SimdXml.Query

query = descendant("book") |> where_attr("lang", "en") |> child("title") |> text()

SimdXml.query!(doc, query)
#=> ["Elixir"]

# Inspect the generated XPath
SimdXml.Query.to_xpath(query)
#=> "//book[@lang='en']/title/text()"

Queries are composable data structures, so you can extract common fragments and reuse them:

books = descendant("book")
english = books |> where_attr("lang", "en")
titles = english |> child("title") |> text()
authors = english |> child("author") |> text()
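
The fragments above are ordinary values, so they can be inspected and run like any other query. The script below is a minimal sketch that uses only functions already shown in this README. It assumes the `:simdxml` package from the Installation section resolves from Hex, and the sample document is the one from Quick start.

```elixir
# Assumption: the :simdxml dependency from the Installation section is available.
Mix.install([{:simdxml, "~> 0.1.0"}])

import SimdXml.Query

doc = SimdXml.parse!("<library><book lang='en'><title>Elixir</title></book></library>")

# Shared prefix, built once and reused by both queries.
english = descendant("book") |> where_attr("lang", "en")
titles = english |> child("title") |> text()
authors = english |> child("author") |> text()

# Print the XPath each fragment expands to.
IO.puts(SimdXml.Query.to_xpath(titles))
IO.puts(SimdXml.Query.to_xpath(authors))

# Run one of the fragments against the parsed document.
IO.inspect(SimdXml.query!(doc, titles))
```

The `titles` fragment is the same chain as in the earlier combinator example, so it should print the XPath and result shown there.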

Compiled queries

Compile once, evaluate against many documents:

query = SimdXml.compile!("//title")

SimdXml.eval_text!(doc1, query)
SimdXml.eval_text!(doc2, query)

# Optimized short-circuit operations
SimdXml.eval_count!(doc, query)     #=> 1
SimdXml.eval_exists?(doc, query)    #=> {:ok, true}

Compiled queries are NIF resources, so they are safe to share across processes, store in ETS, or hold in module attributes.
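
As a rough illustration of sharing one compiled query across processes, the sketch below fans documents out with `Task.async_stream/2` from the Elixir standard library. The sample documents are made up for the example, and the script assumes the `:simdxml` dependency from the Installation section is available.

```elixir
# Assumption: the :simdxml dependency from the Installation section is available.
Mix.install([{:simdxml, "~> 0.1.0"}])

# Compile once in the parent process.
query = SimdXml.compile!("//title")

# Hypothetical input documents, for illustration only.
xml_binaries = [
  "<library><book><title>Elixir</title></book></library>",
  "<library><book><title>Erlang</title></book></library>"
]

# Each document is parsed and evaluated in its own task;
# every task reuses the same compiled query.
results =
  xml_binaries
  |> Task.async_stream(fn xml ->
    xml |> SimdXml.parse!() |> SimdXml.eval_text!(query)
  end)
  |> Enum.map(fn {:ok, titles} -> titles end)

IO.inspect(results)
```

`Task.async_stream/2` returns results in input order by default, so `results` lines up with `xml_binaries`.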

Batch processing

Process thousands of documents with bloom filter prescanning:

query = SimdXml.compile!("//claim")
{:ok, results} = SimdXml.Batch.eval_text_bloom(xml_binaries, query)

Documents that cannot contain the target tags are skipped without parsing.
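
The prescanning idea can be sketched in plain Elixir. The module below is not SimdXml's implementation and does not use a bloom filter. It swaps in a simple substring check with `:binary.match/2` to show the same principle: a cheap test rules out documents before any parsing happens. `PrescanSketch` and `candidates/2` are hypothetical names used only for this illustration.

```elixir
defmodule PrescanSketch do
  # Illustrative stand-in for a prescan: keeps only the documents that
  # contain the bytes "<tag". False positives are acceptable (for example,
  # "<claims>" also passes when scanning for "claim"); they just cost a parse.
  def candidates(xml_binaries, tag) when is_binary(tag) do
    needle = "<" <> tag

    Enum.filter(xml_binaries, fn xml ->
      :binary.match(xml, needle) != :nomatch
    end)
  end
end

docs = [
  "<patent><claim>A widget.</claim></patent>",
  "<patent><abstract>No claims here.</abstract></patent>"
]

# Only the first document contains "<claim", so only it would go on to parsing.
IO.inspect(PrescanSketch.candidates(docs, "claim"))
```

Only the survivors of such a filter would then be handed to the full parse and XPath evaluation.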

Quick grep mode

For simple //tagname extraction at memory-bandwidth speed, with no structural index built:

scanner = SimdXml.Quick.new("claim")
SimdXml.Quick.extract_first(scanner, xml)    #=> "First claim text"
SimdXml.Quick.exists?(scanner, xml)          #=> true
SimdXml.Quick.count(scanner, xml)            #=> 42

Result helpers

SimdXml.Result.one(doc, "//title")           #=> "Elixir"
SimdXml.Result.fetch(doc, "//title")         #=> {:ok, "Elixir"}
SimdXml.Result.all(doc, "//title")           #=> ["Elixir"]

Why SimdXml?

| Feature     | SimdXml             | SweetXml              | Saxy              |
|-------------|---------------------|-----------------------|-------------------|
| Parser      | SIMD Rust NIF       | xmerl (Erlang)        | Pure Elixir SAX   |
| XPath       | Full 1.0            | Full 1.0 (via xmerl)  | None              |
| Memory      | ~16 bytes/tag       | ~350 bytes/node       | Streaming         |
| Atom safety | Strings only        | Creates atoms         | Strings only      |
| XXE safe    | No DTD processing   | Vulnerable by default | No DTD processing |
| API         | Combinators + XPath | ~x sigil              | SAX handlers      |
| Batch       | Bloom-filtered      | No                    | No                |

Documentation

Full API docs and interactive Livebook guides:

License

MIT
