v0.8.0

harumiWeb released this 22 Apr 13:36
· 12 commits to main since this release
91777b0

v0.8.0 Release Notes

This release publishes the April 2026 extraction work: stronger pure-Python rich
extraction in light mode, corrected print-area defaults across public
entrypoints, and LibreOffice / OOXML resilience hardening.

Highlights

  • light now acts as the pure-Python OOXML-rich baseline for .xlsx /
    .xlsm, so non-COM environments can emit best-effort:
    • shapes
    • connectors / arrows
    • charts
  • light now keeps print_areas by default across:
    • extract(...)
    • process_excel(...)
    • ExStructEngine
    • CLI extraction and --print-areas-dir
  • libreoffice now seeds the same OOXML baseline first and then applies UNO
    enrichment when available, so fallback paths preserve already recovered rich
    artifacts where safe.
  • LibreOffice workbook lifecycle handling is more robust for custom
    session_factory integrations via typed workbook handles and session-owned
    close semantics.
  • OOXML drawing parsing is more resilient and more efficient:
    • malformed or corrupt drawing parts now fail only for the affected sheet
      instead of dropping data from healthy sheets in the same workbook
    • worksheet metrics are read with streaming XML parsing
    • row/column offset lookups now use cached cumulative offsets
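
The last two items can be illustrated with a minimal, standard-library sketch.
It is not ExStruct's actual implementation: the function names
(read_row_heights, build_offsets, row_at) and the sample sheet are hypothetical,
and only row heights are handled. It streams a worksheet part with iterparse,
builds cumulative row offsets once, and then answers "which row contains this
y position?" with a binary search instead of re-summing heights on every
lookup.

```python
import io
import xml.etree.ElementTree as ET
from bisect import bisect_right
from itertools import accumulate

# SpreadsheetML main namespace used by worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Hypothetical sample sheet: rows 1 and 3 have explicit heights (points).
SAMPLE = (
    b'<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<sheetFormatPr defaultRowHeight="15"/>'
    b'<sheetData>'
    b'<row r="1" ht="20"><c r="A1"><v>1</v></c></row>'
    b'<row r="2"/>'
    b'<row r="3" ht="30"/>'
    b'</sheetData></worksheet>'
)


def read_row_heights(xml_stream, default_height=15.0):
    """Stream a worksheet part and collect explicit row heights."""
    heights = {}
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == NS + "sheetFormatPr":
            default_height = float(elem.get("defaultRowHeight", default_height))
        elif elem.tag == NS + "row":
            if elem.get("ht") is not None:
                heights[int(elem.get("r"))] = float(elem.get("ht"))
            elem.clear()  # drop the row's cell children once it is processed
    return heights, default_height


def build_offsets(heights, default_height, row_count):
    """Return offsets where offsets[i] is the top edge of 1-based row i + 1."""
    sizes = (heights.get(r, default_height) for r in range(1, row_count + 1))
    return [0.0, *accumulate(sizes)]


def row_at(offsets, y):
    """Return the 1-based row containing y, clamped to the last row."""
    return min(bisect_right(offsets, y), len(offsets) - 1)


heights, default_height = read_row_heights(io.BytesIO(SAMPLE))
offsets = build_offsets(heights, default_height, row_count=4)
```

Building the offsets costs one pass over the rows; each later lookup is
O(log n), which matters when many drawing anchors on one sheet need to be
resolved.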

Compatibility Notes

  • No new extraction CLI commands were added in v0.8.0.
  • light mode behavior changed intentionally:
    • previous releases treated light as cells + table candidates only
    • v0.8.0 adds best-effort OOXML shapes / connectors / charts for OOXML
      workbooks and keeps print areas by default
  • .xls remains outside the new OOXML-rich baseline; the new non-COM rich path
    applies to .xlsx / .xlsm.
  • Serialized backend metadata may now report python_ooxml provenance when
    backend metadata output is enabled.
  • MCP tool names and payload shapes remain compatible; the release changes the
    extraction content available behind existing interfaces rather than adding a
    new transport contract.

Notes

  • The repository docs build still fails in mkdocstrings on docs/api.md; this
    failure was reproducible before the v0.8.0 extraction work and was not
    introduced by this release.
  • Review-driven hardening after the initial implementation also restored
    process_excel() auto-filter behavior, corrected stale README / architecture
    wording, and prevented OOXML baseline seeding failures from crashing the
    LibreOffice pipeline.
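
The seed-then-enrich behavior described for the libreoffice mode can be
sketched as below. This is an illustration under stated assumptions, not
ExStruct's code: Artifacts, run_pipeline, and the demo callables are
hypothetical names, and the "enriched" provenance label is invented for the
example (only python_ooxml appears in these notes). The point is that a failure
in either stage degrades the result instead of crashing the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Artifacts:
    """Hypothetical container for rich artifacts recovered from one workbook."""
    shapes: list = field(default_factory=list)
    charts: list = field(default_factory=list)
    provenance: str = "none"


def run_pipeline(seed_baseline, enrich):
    """Seed a baseline first, then enrich; never let either stage crash."""
    try:
        artifacts = seed_baseline()
    except Exception:
        # A seeding failure falls back to an empty baseline.
        artifacts = Artifacts()
    try:
        return enrich(artifacts)
    except Exception:
        # An enrichment failure keeps whatever the baseline recovered.
        return artifacts


# Demo stand-ins for the two stages (all hypothetical).
def ok_seed():
    return Artifacts(shapes=["rect"], provenance="python_ooxml")


def bad_seed():
    raise ValueError("corrupt drawing part")


def ok_enrich(base):
    return Artifacts(shapes=base.shapes + ["connector"], charts=base.charts,
                     provenance="enriched")


def bad_enrich(base):
    raise RuntimeError("enrichment backend unavailable")
```

For example, run_pipeline(ok_seed, bad_enrich) returns the baseline artifacts
unchanged, while run_pipeline(bad_seed, ok_enrich) still returns the enriched
result built on an empty baseline.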