v0.6.0

@github-actions github-actions released this 07 May 11:47
· 5 commits to refs/heads/main since this release

BabelDOC v0.6.0

Highlights

  • BabelDOC's main translation pipeline now uses a substantially rewritten PDF
    parser. This is a large internal change aimed at making translation more
    reliable on complex real-world PDFs.
  • PDF reconstruction is more robust for documents that rely on advanced PDF
    drawing features such as graphics state, Form XObjects, clipping paths, soft
    masks, shadings, and inline images.
  • This release fixes several issues that produced invalid PDF streams or
    missing resource references in intermediate and final outputs.

PDF Reliability

  • The main translation flow now uses the rewritten parser implementation in
    place of the previous parser.
  • Made serialization of PDF tokens more robust when BabelDOC regenerates
    content streams, including PDF names, keywords, strings, arrays,
    dictionaries, indirect references, booleans, and inline-image parameters.
  • Made resource handling more complete when rewriting page and Form XObject
    content streams, especially for ExtGState and Shading resources.
  • Improved preservation of clipping and graphics-state behavior around images,
    Form XObjects, shading paints, and soft masks.
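
To illustrate what token serialization involves, here is a minimal Python
sketch of writing PDF values back into content-stream syntax. It is not
BabelDOC's implementation: the Name, Ref, and Keyword types and the serialize
function are hypothetical names introduced for this example, and
inline-image parameters are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name such as /ExtGState (hypothetical helper type)."""
    value: str


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as 12 0 R (hypothetical helper type)."""
    num: int
    gen: int = 0


@dataclass(frozen=True)
class Keyword:
    """A content-stream operator such as q, Q, or Do (hypothetical helper type)."""
    value: str


# Delimiter characters that must be written as #XX escapes inside a name.
_NAME_SPECIAL = set(b"()<>[]{}/%#")


def serialize(obj) -> bytes:
    """Serialize a Python value into PDF token syntax (illustrative only)."""
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return b"true" if obj else b"false"
    if obj is None:
        return b"null"
    if isinstance(obj, int):
        return str(obj).encode("ascii")
    if isinstance(obj, float):
        # PDF numbers have no exponent form, so emit fixed-point and trim zeros.
        return f"{obj:.6f}".rstrip("0").rstrip(".").encode("ascii")
    if isinstance(obj, Name):
        out = bytearray(b"/")
        for byte in obj.value.encode("utf-8"):
            if byte < 0x21 or byte > 0x7E or byte in _NAME_SPECIAL:
                out += b"#%02X" % byte  # escape whitespace, delimiters, non-ASCII
            else:
                out.append(byte)
        return bytes(out)
    if isinstance(obj, Ref):
        return f"{obj.num} {obj.gen} R".encode("ascii")
    if isinstance(obj, Keyword):
        return obj.value.encode("ascii")
    if isinstance(obj, bytes):
        # Literal string: escape the backslash first, then parentheses and CR.
        escaped = (
            obj.replace(b"\\", b"\\\\")
            .replace(b"(", b"\\(")
            .replace(b")", b"\\)")
            .replace(b"\r", b"\\r")
        )
        return b"(" + escaped + b")"
    if isinstance(obj, list):
        return b"[" + b" ".join(serialize(item) for item in obj) + b"]"
    if isinstance(obj, dict):
        pairs = (serialize(k) + b" " + serialize(v) for k, v in obj.items())
        return b"<<" + b" ".join(pairs) + b">>"
    raise TypeError(f"cannot serialize {type(obj).__name__} as a PDF token")
```

The escaping branches are where invalid streams typically come from: a name
containing a space or a string containing an unbalanced parenthesis would
otherwise change how the rest of the stream is tokenized.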

Compatibility Notes

  • The previous parser remains available for explicit compatibility tooling, but
    normal translation now uses the rewritten parser.
  • table_model is deprecated: if passed, it is ignored with a warning. BabelDOC
    no longer loads RapidOCR table-text detection resources at runtime.
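
As a rough illustration of the "deprecated and ignored with a warning"
behavior, the following Python sketch shows one common way to handle such an
option. The build_config function and the debug option are hypothetical and
are not part of BabelDOC's API; only the option name table_model comes from
these notes.

```python
import warnings


def build_config(**options):
    """Hypothetical config builder that drops a deprecated option with a warning."""
    # Remove the option so that nothing downstream can act on it.
    if options.pop("table_model", None) is not None:
        warnings.warn(
            "table_model is deprecated and ignored",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
    return options
```

Callers that still pass table_model keep working under this pattern, but the
value has no effect and can be removed from their code.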

Changes

  • No changes

Contributors

@awwaawwa