elephant-php-neuron

A Neuron AI FileDataLoader reader for .docx documents, powered by elephant-php.

Drop it into a Neuron RAG pipeline and .docx files become embeddable documents alongside the bundled PDF, HTML and plain-text readers.

Installation

composer require endless-creativity/elephant-php-neuron

Requires PHP 8.2+. No external binaries are needed (unlike Neuron's bundled PdfReader).

Usage

use NeuronAI\RAG\DataLoader\FileDataLoader;
use EndlessCreativity\ElephantPhpNeuron\DocxReader;

$documents = FileDataLoader::for(__DIR__.'/knowledge')
    ->addReader('docx', new DocxReader())
    ->getDocuments();

MyRAG::make()->addDocuments($documents);

Pass a directory and Neuron walks it, picking the right reader per extension; pass a single file to ingest just that one.
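To make the per-extension dispatch concrete, here is a minimal illustrative sketch in plain PHP. It is not Neuron AI's implementation. The function name `routeByExtension` and the idea of a flat list of registered extensions are hypothetical, and lowercasing the extension is an assumption made for the sketch.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative sketch only, NOT Neuron AI's code (hypothetical helper).
 * Maps each file path to the registered extension that would select its
 * reader, or null when no reader is registered for that extension.
 *
 * @param list<string> $paths
 * @param list<string> $registered  extensions passed to addReader(), e.g. ['docx']
 * @return array<string, ?string>   path => matching extension or null
 */
function routeByExtension(array $paths, array $registered): array
{
    $routes = [];
    foreach ($paths as $path) {
        // pathinfo() returns the text after the last dot, without the dot.
        // Lowercasing is an assumption of this sketch.
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        $routes[$path] = in_array($ext, $registered, true) ? $ext : null;
    }
    return $routes;
}

$routes = routeByExtension(
    ['knowledge/handbook.DOCX', 'knowledge/notes.txt', 'knowledge/legacy.doc'],
    ['docx', 'txt'],
);
print_r($routes);
```

Note that `legacy.doc` gets no reader in this sketch: registering `docx` does not cover the legacy `.doc` extension.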

Output format

By default the reader returns plain text via Converter::extractRawText() — paragraphs separated by "\n\n", no markup. This is usually what you want for embeddings: less syntactic noise, more semantic signal per token.
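Because paragraphs in the plain-text output are separated by "\n\n", a downstream step can recover paragraph boundaries with a single regular expression. The sketch below assumes you want one chunk per paragraph; `splitParagraphs` is a hypothetical helper, not part of this package or of Neuron AI, which ships its own splitters.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: split plain-text reader output into paragraph chunks,
 * relying on the "\n\n" separator. Longer runs of newlines (and \r\n line
 * endings) are also treated as a single paragraph break.
 *
 * @return list<string>
 */
function splitParagraphs(string $text): array
{
    // Two or more consecutive line breaks mark a paragraph boundary.
    $parts = preg_split('/(?:\r?\n){2,}/', $text);
    $parts = array_map('trim', $parts === false ? [] : $parts);

    // Drop empty strings left over from leading or trailing separators.
    return array_values(array_filter($parts, static fn (string $p): bool => $p !== ''));
}

$text = "Refund policy\n\nRefunds are issued within 14 days.\n\n\nContact support for exceptions.\n";
$chunks = splitParagraphs($text);
print_r($chunks);
```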

If you'd rather preserve headings, lists and links — for example because your splitter or post-processor relies on Markdown structure — request Markdown explicitly through the reader options:

FileDataLoader::for($path)
    ->addReader('docx', new DocxReader())
    ->getDocuments(['format' => DocxReader::FORMAT_MARKDOWN]);

The $options array is forwarded by FileDataLoader to every reader, so the same flag is in effect for the whole loading pass.
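Since every reader receives the same array, a reader should read only the keys it understands and fall back to a default when its key is absent. The sketch below shows one way to do that for a `format` key. The constant values `'text'` and `'markdown'` and the function `resolveFormat` are hypothetical; the real `DocxReader::FORMAT_*` values may differ, and rejecting unknown values is a design choice of this sketch, not documented behavior.

```php
<?php

declare(strict_types=1);

// Hypothetical values; the real DocxReader::FORMAT_* constants may differ.
const FORMAT_TEXT = 'text';
const FORMAT_MARKDOWN = 'markdown';

/**
 * Sketch: pick the output format from the shared $options array.
 * Keys meant for other readers are simply ignored.
 */
function resolveFormat(array $options): string
{
    // Default to plain text when no format was requested.
    $format = $options['format'] ?? FORMAT_TEXT;

    if (!in_array($format, [FORMAT_TEXT, FORMAT_MARKDOWN], true)) {
        throw new InvalidArgumentException('Unsupported format: ' . var_export($format, true));
    }

    return $format;
}

echo resolveFormat(['format' => FORMAT_MARKDOWN, 'unrelated_key' => 42]), PHP_EOL;
```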

Limitations

Only OOXML .docx is supported. Legacy binary .doc (Word 97–2003) is not handled by elephant-php and therefore not by this reader either.
Images embedded in the document are dropped during text extraction. This is intentional for RAG — embeddings are text-only.
Conversion warnings emitted by elephant-php (Result::messages) are currently silenced. If you need them, open an issue.
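If a folder mixes .docx and legacy .doc files (or files with misleading extensions), you can check the container type before handing a file to the loader. A .docx is a ZIP archive starting with the bytes `PK\x03\x04`, while a Word 97–2003 .doc is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`. The helpers below are a hypothetical pre-filter, not part of this package; a ZIP signature alone does not prove the file is a Word document.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: classify a file header by its magic bytes.
 * Returns 'zip' (OOXML container, as used by .docx), 'ole2' (legacy .doc)
 * or 'unknown'.
 */
function detectWordContainer(string $header): string
{
    if (str_starts_with($header, "PK\x03\x04")) {
        return 'zip';
    }
    if (str_starts_with($header, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")) {
        return 'ole2';
    }
    return 'unknown';
}

/** Read the first 8 bytes of a file and classify them. */
function sniffFile(string $path): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    $header = fread($handle, 8);
    fclose($handle);

    return detectWordContainer($header === false ? '' : $header);
}
```

Files classified as `ole2` could be skipped or converted to .docx with another tool before ingestion.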

License

BSD-2-Clause. See LICENSE.
