samatild/evtxparser

evtxparser

evtxparser is a focused Python CLI that turns .evtx files into CSV with a streaming, low-overhead pipeline. No GUI, no schema discovery pass, no giant in-memory objects: open the log, walk the records once, write rows.

Goals

  • Fast by default: stream records directly from the EVTX file to CSV.
  • Simple output: stable columns for common system fields plus compact payload columns.
  • Publishable package: pyproject.toml, console entry point, tests, CI, and PyPI-ready metadata.

Installation

pip install evtxparser

For optional parser and JSON speedups:

pip install "evtxparser[speedups]"

Usage

Export one file to stdout:

evtxparser Security.evtx

Export one file to disk:

evtxparser Security.evtx --output security.csv

Export a directory of logs:

evtxparser /mnt/logs --recursive --output logs.csv

Force a live progress bar on stderr while exporting:

evtxparser Security.evtx --output security.csv --progress

Use multiple CPU cores for a big export:

evtxparser Security.evtx --output security.csv --workers 8

Keep the raw XML in the final CSV column for maximum fidelity:

evtxparser Security.evtx --include-xml --output security.csv

Output schema

Column               Description
source_file          Input EVTX path used for the row
record_number        EVTX record number
timestamp            Record FILETIME converted to ISO-8601
event_id             Windows event ID
event_qualifiers     Optional event qualifiers from <EventID Qualifiers="...">
event_version        Event version
event_level          Event level
event_task           Event task
event_opcode         Event opcode
event_keywords       Event keywords
channel              Event log channel
provider_name        Provider name
provider_guid        Provider GUID
computer             Computer name
user_id              SID from the Security node
process_id           Process ID from Execution
thread_id            Thread ID from Execution
activity_id          Correlation activity ID
related_activity_id  Correlation related activity ID
event_data           Compact JSON array for ordered EventData items
user_data            Compact XML for UserData payloads
raw_xml              Included only when --include-xml is set
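
The timestamp column is described as a FILETIME converted to ISO-8601. A FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC, so the conversion can be sketched as below. The function name filetime_to_iso is hypothetical, and the exact string format evtxparser emits (precision, UTC suffix) is not stated here, so treat the output shape as an assumption.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC, counted in 100-nanosecond ticks.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_iso(filetime: int) -> str:
    """Convert a Windows FILETIME tick count to an ISO-8601 UTC string.

    Hypothetical helper for illustration; not evtxparser's actual code.
    """
    # datetime only holds microseconds, so the last 100 ns digit is dropped.
    microseconds = filetime // 10
    return (FILETIME_EPOCH + timedelta(microseconds=microseconds)).isoformat()
```

Integer arithmetic avoids the rounding drift that a float division by 1e7 could introduce for large tick counts.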

event_data is intentionally stored as an ordered JSON array instead of a flattened CSV explosion. That preserves duplicate keys, unnamed fields, and original ordering without a costly pre-scan.
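
To make the trade-off concrete, here is a minimal sketch of packing ordered EventData items into one compact JSON cell and reading them back. The item shape (a list of [name, value] pairs, with null for an unnamed field) and both function names are assumptions for illustration; the actual array layout evtxparser writes may differ.

```python
import json


def encode_event_data(items):
    """Serialize ordered (name, value) pairs as a compact JSON array.

    Assumed shape for illustration: [[name, value], ...], name may be None.
    """
    pairs = [[name, value] for name, value in items]
    # separators=(",", ":") drops the default spaces after commas and colons.
    return json.dumps(pairs, separators=(",", ":"), ensure_ascii=False)


def decode_event_data(cell):
    """Parse a cell produced by encode_event_data back into ordered pairs."""
    return [tuple(pair) for pair in json.loads(cell)]


# Duplicate keys and an unnamed field survive, in their original order.
items = [("TargetUserName", "alice"), (None, "unnamed"), ("TargetUserName", "bob")]
cell = encode_event_data(items)
```

A dict-based encoding would silently collapse the two TargetUserName entries, which is why an array of pairs is used in this sketch.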

Why this layout is fast

  1. python-evtx memory-maps the source file.
  2. Records are processed one at a time and written immediately.
  3. The CSV header is fixed, so there is no schema inference pass.
  4. The hot path extracts fields directly from the rendered record XML instead of building a second XML tree.
  5. Event-specific payloads stay compact in two columns instead of forcing dynamic columns.
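
The points above can be sketched with the standard library alone: a fixed header written once, fields pulled from the rendered XML string with precompiled patterns, and each row written as soon as its record is seen. This is an illustration under assumptions, not evtxparser's implementation: the names COLUMNS, PATTERNS, extract_row, and export are hypothetical, only three of the documented columns are shown, and XML entity unescaping is left out.

```python
import csv
import re

# Fixed header (a subset of the documented columns, for brevity).
COLUMNS = ["event_id", "channel", "computer"]

# Precompiled patterns read fields straight from the XML string,
# so no XML tree is built per record.
PATTERNS = {
    "event_id": re.compile(r"<EventID[^>]*>(\d+)</EventID>"),
    "channel": re.compile(r"<Channel>([^<]*)</Channel>"),
    "computer": re.compile(r"<Computer>([^<]*)</Computer>"),
}


def extract_row(xml):
    """Return one CSV row for a rendered record; missing fields become ''."""
    row = []
    for column in COLUMNS:
        match = PATTERNS[column].search(xml)
        row.append(match.group(1) if match else "")
    return row


def export(xml_records, out):
    """Stream records to CSV: header first, then one row per record."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    count = 0
    for xml in xml_records:  # any iterable, including a lazy generator
        writer.writerow(extract_row(xml))
        count += 1
    return count
```

Because export only iterates, memory use stays flat when xml_records is a generator that yields each record's rendered XML on demand.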

Progress display

When stderr is interactive, evtxparser shows a live progress bar automatically. It writes progress to stderr, so CSV output stays clean on stdout.

  • --progress: always show the progress bar
  • --no-progress: disable the progress bar
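
The decision described above is small enough to sketch. The function name should_show_progress and the three-state flag (True for --progress, False for --no-progress, None when neither is given) are assumptions about how such a CLI might model the options, not evtxparser's actual code.

```python
import sys
from typing import Optional


def should_show_progress(flag: Optional[bool], stream=None) -> bool:
    """Decide whether to draw a progress bar.

    flag: True for --progress, False for --no-progress, None for auto.
    stream: the stream progress would be written to (defaults to stderr).
    """
    if flag is not None:
        return flag  # an explicit flag always wins
    stream = sys.stderr if stream is None else stream
    # Some stream objects lack isatty(); treat those as non-interactive.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())
```

Checking stderr rather than stdout matters here: the bar should still appear when the CSV itself is piped or redirected from stdout.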

Parallel export

evtxparser can spread a large export across multiple worker processes.

  • --workers 0: auto-select based on CPU count and available EVTX chunks
  • --workers 1: single-process mode
  • --workers N: use exactly N worker processes, capped by available work

The parallel path preserves CSV row order. It uses worker processes rather than threads because the EVTX parsing hot path is CPU-bound pure Python: threads would serialize on the GIL, while separate processes can parse in parallel.
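
A minimal sketch of the worker-count rules and an order-preserving process pool follows. Everything here is an assumption for illustration: resolve_workers, parse_chunk, and export_rows are hypothetical names, the auto-select rule (CPU count capped by chunk count) is one plausible reading of the option list, and parse_chunk is a stand-in that does no real EVTX parsing.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_workers(requested, chunk_count, cpu_count=None):
    """Map a --workers value to a process count, capped by available work."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    wanted = cpu_count if requested == 0 else requested  # 0 means auto
    return max(1, min(wanted, chunk_count))


def parse_chunk(chunk_index):
    """Hypothetical stand-in for parsing one EVTX chunk into CSV rows."""
    return [f"row-from-chunk-{chunk_index}"]


def export_rows(chunk_indexes, workers):
    """Return rows in chunk order, using a process pool when workers > 1."""
    if workers <= 1:
        results = map(parse_chunk, chunk_indexes)
        return [row for rows in results for row in rows]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, not completion
        # order, which is what keeps the output rows in sequence.
        return [row for rows in pool.map(parse_chunk, chunk_indexes) for row in rows]
```

On platforms that start workers with spawn, the call to export_rows needs to sit under an `if __name__ == "__main__":` guard so child processes can import the module safely.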

Development

Create a virtual environment, then install the package with dev tools:

python -m pip install -e ".[dev,speedups]"

Run tests:

pytest

Build distributions:

python -m build

Publishing

The package metadata is defined in pyproject.toml, and the repository includes GitHub Actions workflows for CI and PyPI publishing. Once the repository exists and trusted publishing is configured on PyPI, a GitHub release can publish the package without changing the project layout.
