An elephant never forgets your document structure — convert
.docx to semantic HTML and Markdown in pure PHP. Inspired by mammoth.js.
Attribution. This library is an independent PHP port inspired by mammoth.js by Michael Williamson. It adopts mammoth's core philosophy (extract semantics, not formatting) and style-mapping DSL. It is not affiliated with or endorsed by the mammoth.js project.
- Pure PHP, no LibreOffice / Pandoc / shell-out; only the standard dom, libxml, mbstring, xmlreader, zip extensions.
- HTML and Markdown output from the same intermediate model.
- Headings (h1–h6), paragraphs, runs with bold / italic / strike / superscript / subscript, hyperlinks, nested lists (mixed ul/ol), tables (colspan + rowspan from gridSpan and vMerge), images (embedded as data URIs by default, customisable handler), footnotes and endnotes (with backlinks), comments, content controls (w:sdt), symbol fonts (Wingdings / Webdings / Symbol via dingbat-to-unicode), legacy hyperlink complex fields (w:fldChar).
- Mammoth-compatible style-mapping DSL: p[style-name='Heading 1'] => h1:fresh, b => strong, r.Code => code, comment-reference => sup, highlight[color='yellow'] => mark, br[type='page'] => hr, table.Grid => table.fancy, => ! for ignore.
- extractRawText for plain-text indexing pipelines.
- embedStyleMap / readEmbeddedStyleMap for mammoth-compatible in-document style maps.
- CLI bin/elephant-php for .docx → HTML/Markdown from the terminal.
See ROADMAP.md for the gap with mammoth.js (mostly DSL
edge cases, checkbox form fields, and OMML equations — all areas where
mammoth itself has limitations).
composer require endless-creativity/elephant-php

Requires PHP 8.2+.
use EndlessCreativity\ElephantPhp\Converter;

$result = (new Converter())->convertToHtml('/path/to/file.docx');
echo $result->value; // semantic HTML
foreach ($result->messages as $message) {
    fwrite(STDERR, "[{$message->type->value}] {$message->message}\n");
}

$result = (new Converter())->convertToMarkdown('/path/to/file.docx');
file_put_contents('article.md', $result->value);

$result = (new Converter())->extractRawText('/path/to/file.docx');
// Paragraphs are separated by "\n\n", everything else just contributes
// its descendant text. Useful for indexing/search pipelines.
echo $result->value;

Mammoth's DSL is supported as a list of rule strings. Rules are tried in order; the first match wins. The default heading map (Heading 1..6 → h1..h6) is appended after your rules.
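The ordering described above (user rules first, default heading map appended, first match wins) can be illustrated with a small standalone sketch. It is not the library's implementation: resolveParagraphStyle is a hypothetical helper, and rules are modelled as plain [style name, output] pairs rather than parsed DSL strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the library: resolve a paragraph style
 * name against user rules first, then against the default heading map.
 *
 * @param list<array{0: string, 1: string}> $userRules [styleName, output] pairs
 */
function resolveParagraphStyle(string $styleName, array $userRules): ?string
{
    // Default heading map: Heading 1..6 → h1..h6.
    $defaults = [];
    for ($level = 1; $level <= 6; $level++) {
        $defaults[] = ["Heading {$level}", "h{$level}"];
    }

    // Defaults are appended, so a user rule for the same style is tried first.
    foreach ([...$userRules, ...$defaults] as [$name, $output]) {
        if ($name === $styleName) {
            return $output; // first match wins
        }
    }

    return null; // no rule matched
}

$rules = [
    ['Heading 1', 'h2.title'], // overrides the default Heading 1 → h1
    ['Heading 1', 'h3'],       // never reached: an earlier rule already matches
    ['Aside', 'aside.callout'],
];

echo resolveParagraphStyle('Heading 1', $rules), "\n"; // h2.title
echo resolveParagraphStyle('Heading 2', $rules), "\n"; // h2 (from the defaults)
```

Because the defaults come last, overriding a heading only requires listing your own rule for that style; no default has to be removed.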
$converter = new Converter(styleMap: [
    "p[style-name='Aside'] => aside.callout",
    "p[style-name='Quote'] => blockquote > p:fresh",
    "r[style-name='Code'] => code",
    "comment-reference => sup", // opt in to comments
    "highlight[color='yellow'] => mark.yellow",
    "br[type='page'] => hr",
    "p[style-name='List Paragraph'] =>", // silence common warning
]);
$html = $converter->convertToHtml('/path/to/file.docx')->value;

By default, images are embedded as data: URIs. Plug in your own ImageHandler to write to disk, S3, a CDN, or anywhere else, returning the final <img> attributes.
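For orientation, a data: URI is just the MIME type plus the base64-encoded bytes. The sketch below shows that encoding in plain PHP; toDataUri is a hypothetical helper, and the library's own default handler may differ in detail.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a data: URI from raw bytes and a MIME type.
 */
function toDataUri(string $bytes, string $contentType): string
{
    return 'data:' . $contentType . ';base64,' . base64_encode($bytes);
}

// The first four bytes of the PNG signature, used as stand-in image data.
$src = toDataUri("\x89PNG", 'image/png');

// An attribute array of the same shape a custom handler returns.
$attributes = ['src' => $src];
echo $attributes['src'], "\n";
```

Data URIs keep the HTML self-contained but grow with every image, which is the usual reason to swap in a handler that stores files elsewhere.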
use EndlessCreativity\ElephantPhp\Document\Image;
use EndlessCreativity\ElephantPhp\Image\ImageHandler;

$handler = new class implements ImageHandler {
    public function attributes(Image $image): array {
        // readBytes is a callable property, hence the extra parentheses.
        $bytes = ($image->readBytes)();
        // Content-addressed file name: identical images are written once.
        $hash = hash('sha256', $bytes);
        $ext = ['image/png' => 'png', 'image/jpeg' => 'jpg'][$image->contentType] ?? 'bin';
        $path = "uploads/{$hash}.{$ext}";
        // The public/uploads directory must already exist.
        file_put_contents(__DIR__ . "/public/{$path}", $bytes);
        return ['src' => "/{$path}"];
    }
};
$converter = new Converter(imageHandler: $handler);

Mammoth supports embedding the style map in the .docx itself, as a part stored under mammoth/style-map. The library can both write and read this part.
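Since a .docx file is a ZIP archive, the embedded part can also be inspected directly with PHP's ZipArchive. The following is a minimal sketch under that assumption: readStyleMapEntry is a hypothetical helper, it only returns the raw entry text, and the library's own methods may do more (for example, parse the text into rules).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read the raw mammoth/style-map entry from a .docx.
 * Returns null when the archive has no such entry.
 */
function readStyleMapEntry(string $docxPath): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open {$docxPath} as a ZIP archive");
    }
    $contents = $zip->getFromName('mammoth/style-map');
    $zip->close();

    return $contents === false ? null : $contents;
}

// Build a throwaway archive so the sketch runs without a real document.
$path = tempnam(sys_get_temp_dir(), 'docx');
$zip = new ZipArchive();
$zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString('mammoth/style-map', "p[style-name='Aside'] => p.aside");
$zip->close();

$embedded = readStyleMapEntry($path);
echo $embedded, "\n";

unlink($path); // clean up the temporary archive
```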
// Write
$bytes = Converter::embedStyleMap('/in.docx', "p[style-name='Aside'] => p.aside");
file_put_contents('/out.docx', $bytes);
// Read
$rules = Converter::readEmbeddedStyleMap('/out.docx');

vendor/bin/elephant-php /path/to/file.docx # → HTML to stdout
vendor/bin/elephant-php --markdown /path/to/file.docx # → Markdown to stdout
vendor/bin/elephant-php /path/to/file.docx out.html # → HTML to file

Conversion warnings are written to stderr regardless of the output destination.
The following messages are expected and don't indicate malfunctions:
- "Unrecognised paragraph style: 'List Paragraph' (Style ID: ListParagraph)": Word applies this to list items; the default style map only covers headings. Add p[style-name='List Paragraph'] => to silence.
- "Unrecognised run style: 'FootnoteReference'": same mechanism inside footnotes.xml.
- "Image of type image/bmp is unlikely to display in web browsers": informational; the <img> is still emitted.
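If these expected messages clutter your logs, one option is to filter them by prefix before reporting. The sketch below works on plain strings so it runs on its own; withoutExpected is a hypothetical helper, and the 'Aside' warning is made-up sample data. With the library, the strings would come from each $message->message.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: drop message texts that start with an expected prefix.
 *
 * @param list<string> $messages
 * @param list<string> $expectedPrefixes
 * @return list<string>
 */
function withoutExpected(array $messages, array $expectedPrefixes): array
{
    return array_values(array_filter(
        $messages,
        static function (string $text) use ($expectedPrefixes): bool {
            foreach ($expectedPrefixes as $prefix) {
                if (str_starts_with($text, $prefix)) {
                    return false; // expected, so filter it out
                }
            }
            return true; // anything else is kept for reporting
        }
    ));
}

$messages = [
    "Unrecognised paragraph style: 'List Paragraph' (Style ID: ListParagraph)",
    "Unrecognised run style: 'FootnoteReference'",
    "Unrecognised paragraph style: 'Aside' (Style ID: Aside)", // sample data
];
$expected = [
    "Unrecognised paragraph style: 'List Paragraph'",
    "Unrecognised run style: 'FootnoteReference'",
    'Image of type image/bmp',
];

$remaining = withoutExpected($messages, $expected);
print_r($remaining);
```

Matching on prefixes rather than full strings keeps the filter stable if the trailing detail of a message changes; adding a style-map rule remains the cleaner fix where one exists.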
composer install
composer test # Pest
composer stan # PHPStan level 8
composer format # Laravel Pint

Project guidance and porting conventions live in CLAUDE.md. Limitations and roadmap items vs mammoth are in ROADMAP.md. Contributions welcome; see CONTRIBUTING.md.
BSD-2-Clause. Copyright © 2026 Endless Creativity (PHP port) and © 2013 Michael Williamson (mammoth.js, from which this work derives its algorithmic structure and test fixtures).