v7.2.0 #96
harshankur
announced in
Announcements
v7.2.0: 🏗️ Parser Enhancements, Granular HTML Generator Controls, and Strict AST Typings
I am thrilled to announce the release of officeParser v7.2.0! This major update brings a massive architectural upgrade to the AST, empowering developers with deeper insight into document layout, embedded metadata, and bulletproof TypeScript integrations.
As we pave the way for building advanced RAG architectures, deep-document search systems, and robust AI parsing pipelines on top of officeParser, v7.2.0 guarantees that every piece of document intelligence—from slide masters to hidden footnotes—is logically structured and heavily typed.
Warning
Soft Breaking Change: Notes Placement
If your application iterates over ast.content to manually extract footnotes, endnotes, or slide speaker notes, you will need to update your logic. These nodes are no longer appended to the main content array. They are now structurally nested inside the notes[] array of their logical parent or preceding text node.
🌟 Key Pillars of the v7.2.0 Update
1. Structural Notes Attachment
Previously, footnotes, endnotes, and slide speaker notes were flattened and appended to the end of the document content. In v7.2.0, these notes are now strictly attached to their logical parent or preceding sibling nodes via a new node.notes[] array.
Note: The legacy putNotesAtLast config flag is now deprecated.
2. Auxiliary Content (Headers, Footers, Slide Masters)
The new ast.auxiliary property unlocks out-of-band document templates! officeParser now automatically extracts headers and footers from Word documents (ast.auxiliary.headers/footers), and Slide Masters from PowerPoint presentations (ast.auxiliary.slideMasters). These are neatly separated from the main sequential document flow.
3. Native & Custom Document Properties
The OfficeMetadata interface has been radically upgraded. Alongside canonical metadata fields (title, author, dates), officeParser now exposes format-specific verbatim metadata via ast.metadata.nativeProperties (e.g., <meta> tags in HTML, app.xml stats in DOCX, XMP dicts in PDF) and user-defined variables via ast.metadata.customProperties.
4. Discriminated Unions & Strict AST Typings
The generic OfficeContentNode interface has been completely refactored into a strict TypeScript Discriminated Union. This unlocks precise, compile-time type narrowing per node.type (e.g., safely accessing SlideMetadata only when type === 'slide'), eliminating the need for generic fallback assertions across your application.
5. Interactive HTML Spreadsheet Layouts & DOM Injections
The HTML Generator just got significantly smarter:
… (.col-resizer) to dynamically resize rows and columns in the browser.
… HtmlGeneratorConfig with containerWidth, customCss, and DOM injections (head/body hook insertions).
🛠 Getting Started
npm install officeparser@7.2.0
Example of using the new Discriminated Unions, Auxiliary nodes, and Structural Notes:
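A minimal sketch of how these three features might fit together. It does not import officeParser: the types below are simplified local stand-ins, and every name other than content, auxiliary, headers, footers, slideMasters, notes, and the 'slide' type value (for example ParagraphNode, NoteNode, slideNumber, collectNotes) is a hypothetical assumption made for illustration, not the library's actual declarations.

```typescript
// Stand-in types (assumptions for illustration, NOT officeParser's real typings).
type NoteNode = { type: "note"; text: string };

type ParagraphNode = {
  type: "paragraph"; // hypothetical type value
  text: string;
  notes?: NoteNode[];
};

type SlideNode = {
  type: "slide";
  metadata: { slideNumber: number }; // hypothetical stand-in for SlideMetadata
  children: ContentNode[];
  notes?: NoteNode[];
};

// Discriminated union: the `type` field tells the compiler which shape applies.
type ContentNode = ParagraphNode | SlideNode;

interface Ast {
  content: ContentNode[];
  auxiliary?: {
    headers?: ContentNode[];
    footers?: ContentNode[];
    slideMasters?: ContentNode[];
  };
}

// Structural notes: read each node's own notes[] instead of scanning the
// tail of ast.content for appended note nodes.
function collectNotes(nodes: ContentNode[]): string[] {
  const out: string[] = [];
  for (const node of nodes) {
    for (const note of node.notes ?? []) out.push(note.text);
    // Narrowing: `children` is only reachable once type === "slide".
    if (node.type === "slide") out.push(...collectNotes(node.children));
  }
  return out;
}

// Exhaustive switch over the discriminant; no type assertions needed.
function describe(node: ContentNode): string {
  switch (node.type) {
    case "slide":
      return `slide ${node.metadata.slideNumber}`;
    case "paragraph":
      return node.text;
  }
}

// Hand-built sample tree (a real AST would come from the parser).
const ast: Ast = {
  content: [
    { type: "paragraph", text: "Intro", notes: [{ type: "note", text: "Footnote 1" }] },
    {
      type: "slide",
      metadata: { slideNumber: 1 },
      children: [{ type: "paragraph", text: "Bullet" }],
      notes: [{ type: "note", text: "Speaker note" }],
    },
  ],
  auxiliary: { headers: [{ type: "paragraph", text: "Confidential" }] },
};

const notes = collectNotes(ast.content);
// Auxiliary content lives outside the main flow, so it is read separately.
const headerTexts = (ast.auxiliary?.headers ?? []).map(describe);
const mainFlow = ast.content.map(describe);
```

In this sketch, header text never shows up in mainFlow, and notes are found by walking each node rather than by position in the content array. Check the real property names and node types against the published typings before relying on them.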
🔗 Full Changelog: View v7.2.0 Details
🔗 Documentation & Visualizer: officeparser.harshankur.com
❤️ Supporting the Future of Document Infrastructure
Since 2019, officeParser has been maintained as a voluntary project, growing to support over 10 million downloads and 300,000+ weekly installations.
As I build the ultimate document-to-AI pipeline, I seek professional sustainability to fund officeParser's next milestones:
If officeParser powers your production workflows or AI pipelines, please consider supporting its development:
👉 GitHub Sponsors
👉 Buy Me A Coffee
Changes: v7.1.0..v7.2.0
This discussion was created from the release v7.2.0.