A Node.js library for parsing PowerPoint (PPTX) files and extracting text content. This library maintains text formatting, line breaks, and paragraph structures from the original presentation.
- Extract text content from PPTX files with preserved formatting
- Parse PPTX structure into manageable JavaScript objects
- Access raw XML content of presentation components
- Written in TypeScript for type safety
- Promise-based API
- Preserves line breaks and paragraph formatting
- Minimal dependencies
npm install node-pptx-parser

Once the package is installed, you can use it with import or require statements like this:

// ESM import:
import PptxParser from "node-pptx-parser";

// CommonJS require:
const PptxParser = require("node-pptx-parser").default;

To extract the text of every slide:

import PptxParser from "node-pptx-parser";
async function main() {
  const parser = new PptxParser("presentation.pptx");
  try {
    // Extract text from all slides
    const textContent = await parser.extractText();
    // Print the text of each slide
    textContent.forEach((slide) => {
      console.log(`\nSlide ${slide.id}:`);
      console.log(slide.text.join("\n"));
    });
  } catch (error) {
    console.error("Error:", error.message);
  }
}

main();

To parse the complete presentation structure, including the raw XML:

import PptxParser from "node-pptx-parser";
async function main() {
  const parser = new PptxParser("presentation.pptx");
  try {
    // Get complete parsed presentation content
    const parsedContent = await parser.parse();
    // Access presentation structure
    console.log(parsedContent.presentation.parsed);
    // Access individual slides
    parsedContent.slides.forEach((slide) => {
      console.log(`Slide ${slide.id}:`, slide.parsed);
    });
    // Access raw XML if needed
    console.log(parsedContent.presentation.xml);
  } catch (error) {
    console.error("Error:", error.message);
  }
}
main();

PptxParser is the main class for parsing PPTX files.

constructor(filePath: string)
Creates a new instance of PptxParser.
- filePath: Path to the PPTX file to be parsed

async parse(): Promise<ParsedPresentation>
Parses the entire PPTX file and returns its content.
- Returns: Promise resolving to a ParsedPresentation object containing the complete presentation structure
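parse() also exposes the raw XML of each slide. If you ever need paragraph text straight from that XML, the sketch below shows one possible approach. It assumes standard DrawingML markup (paragraphs as `<a:p>`, text runs as `<a:t>`, line breaks as `<a:br>`); `paragraphsFromSlideXml` and `decodeXmlEntities` are hypothetical helpers, not part of node-pptx-parser, and a regex scan only approximates a real XML parser.

```typescript
// Hypothetical helpers (not part of node-pptx-parser).
// Only the five predefined XML entities are decoded; numeric references are left as-is.
const XML_ENTITIES: Record<string, string> = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&apos;": "'",
  "&amp;": "&",
};

// Single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeXmlEntities(text: string): string {
  return text.replace(/&(?:lt|gt|quot|apos|amp);/g, (m) => XML_ENTITIES[m]);
}

// Approximation: returns one string per <a:p> element, joining its <a:t> runs
// and turning <a:br> into "\n". Assumes paragraphs are not nested.
function paragraphsFromSlideXml(xml: string): string[] {
  const paragraphs: string[] = [];
  // Matches <a:p> or <a:p ...> but not <a:pPr>.
  const paragraphPattern = /<a:p(?:\s[^>]*)?>([\s\S]*?)<\/a:p>/g;
  // Matches a text run <a:t>...</a:t> or a line break <a:br/> / <a:br>.
  const piecePattern = /<a:t(?:\s[^>]*)?>([\s\S]*?)<\/a:t>|<a:br\b[^>]*>/g;

  let paragraph: RegExpExecArray | null;
  while ((paragraph = paragraphPattern.exec(xml)) !== null) {
    let text = "";
    let piece: RegExpExecArray | null;
    piecePattern.lastIndex = 0; // restart the inner scan for each paragraph
    while ((piece = piecePattern.exec(paragraph[1])) !== null) {
      // Group 1 is only set when the match was a text run.
      text += piece[1] !== undefined ? decodeXmlEntities(piece[1]) : "\n";
    }
    paragraphs.push(text);
  }
  return paragraphs;
}
```

For example, `paragraphsFromSlideXml(slide.xml)` could be called on each entry of `parsedContent.slides`.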

async extractText(): Promise<SlideTextContent[]>
Extracts formatted text content from all slides.
- Returns: Promise resolving to an array of SlideTextContent objects
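Each parser instance is bound to a single file path and both methods return promises, so several presentations can be processed concurrently by creating one parser per file. The following is a minimal sketch of that pattern; `extractTextFromFiles`, `TextExtractor` and `SlideText` are hypothetical names, and the parser is injected through a factory so the sketch compiles without the package installed.

```typescript
// Local structural types mirroring the documented extractText() result,
// so this sketch does not depend on node-pptx-parser's own type exports.
interface SlideText {
  id: string;
  text: string[];
}

interface TextExtractor {
  extractText(): Promise<SlideText[]>;
}

// Hypothetical helper: one parser per file, all files processed concurrently.
// Each file's slides are joined with a blank line between them.
async function extractTextFromFiles(
  filePaths: string[],
  createParser: (filePath: string) => TextExtractor,
): Promise<Map<string, string>> {
  const entries = await Promise.all(
    filePaths.map(async (filePath) => {
      const slides = await createParser(filePath).extractText();
      const body = slides.map((slide) => slide.text.join("\n")).join("\n\n");
      return [filePath, body] as const;
    }),
  );
  return new Map(entries);
}
```

With the real library this might be called as `extractTextFromFiles(["a.pptx", "b.pptx"], (path) => new PptxParser(path))`. Note that `Promise.all` rejects as soon as any file fails; `Promise.allSettled` is an alternative if partial results are acceptable.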
The return types are defined as follows:

interface ParsedPresentation {
  presentation: {
    path: string;
    xml: string;
    parsed: any;
  };
  relationships: {
    path: string;
    xml: string;
    parsed: any;
  };
  slides: ParsedSlide[];
}

interface ParsedSlide {
  id: string;
  path: string;
  xml: string;
  parsed: any;
}

interface SlideTextContent extends ParsedSlide {
  text: string[];
}

The library throws errors in the following cases:
- Invalid PPTX file structure
- File reading errors
- XML parsing errors
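Because a PPTX file is a ZIP package, one cheap way to catch obviously wrong inputs (such as a legacy binary `.ppt` file) before parsing is to inspect the file's leading bytes. This is only a sketch of a pre-check you could run yourself; nothing in this README says the library does it, and `sniffPackageKind` and `classifyHeader` are hypothetical helpers. A ZIP signature does not guarantee a valid PPTX, so the parser may still throw.

```typescript
import { open } from "node:fs/promises";

type PackageKind = "zip" | "ole2" | "unknown";

// "PK\x03\x04": the local file header that starts a typical ZIP archive.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];
// OLE2 compound file header, used by legacy binary Office files such as .ppt.
const OLE2_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

function hasSignature(bytes: Uint8Array, signature: number[]): boolean {
  // Missing bytes read as undefined and fail the comparison.
  return signature.every((value, index) => bytes[index] === value);
}

function classifyHeader(bytes: Uint8Array): PackageKind {
  if (hasSignature(bytes, ZIP_SIGNATURE)) return "zip";
  if (hasSignature(bytes, OLE2_SIGNATURE)) return "ole2";
  return "unknown";
}

// Reads at most the first 8 bytes of the file and classifies them.
async function sniffPackageKind(filePath: string): Promise<PackageKind> {
  const handle = await open(filePath, "r");
  try {
    const header = new Uint8Array(8);
    const { bytesRead } = await handle.read(header, 0, header.length, 0);
    return classifyHeader(header.subarray(0, bytesRead));
  } finally {
    await handle.close();
  }
}
```

For example, you might skip constructing a PptxParser when `sniffPackageKind(path)` resolves to anything other than `"zip"`.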
Example error handling:
try {
  const parser = new PptxParser("presentation.pptx");
  const content = await parser.extractText();
} catch (error) {
  if (error.message.includes("Invalid PPTX file structure")) {
    console.error("The PPTX file is corrupted or invalid");
  } else {
    console.error("An error occurred:", error.message);
  }
}

Dependencies:
- unzipper: For extracting PPTX files
- xml2js: For parsing XML content
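The `parsed` fields are typed `any` and are presumably produced by xml2js. Assuming the library uses xml2js's default options (an assumption this README does not confirm), child elements appear inside arrays, attributes sit under `$`, and the text of an element that also has attributes sits under `_`. Under that assumption, a small recursive walker such as the hypothetical `collectElementText` below can gather all `a:t` run text from a slide's `parsed` tree.

```typescript
// Hypothetical helper: gathers the text of every element named `key` from an
// object tree shaped like xml2js output with its default options.
function collectElementText(node: unknown, key = "a:t", out: string[] = []): string[] {
  if (Array.isArray(node)) {
    // explicitArray (default true) wraps child elements in arrays.
    for (const item of node) collectElementText(item, key, out);
  } else if (node !== null && typeof node === "object") {
    const record = node as Record<string, unknown>;
    for (const name of Object.keys(record)) {
      if (name === "$") continue; // attributes (default attrkey) hold no element text
      const value = record[name];
      if (name === key) {
        const items = Array.isArray(value) ? value : [value];
        for (const item of items) {
          if (typeof item === "string") {
            out.push(item); // element with text only
          } else if (item !== null && typeof item === "object") {
            const text = (item as Record<string, unknown>)["_"]; // default charkey
            if (typeof text === "string") out.push(text);
          }
        }
      } else {
        collectElementText(value, key, out);
      }
    }
  }
  return out;
}
```

Because xml2js groups same-named children by default, the relative position of different sibling elements (for example an `a:br` between two runs) is not preserved in this tree, so the walker returns run text only.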
License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.