word2md-cli

Convert .docx files to Markdown from the command line. Optional image OCR via PaddleX.

Web version: word2md.net — drag & drop in browser, no install needed.

Install

pnpm add -g word2md-cli
# or run without installing
npx word2md-cli input.docx

Usage

word2md input.docx                         # → input.md next to source
word2md input.docx -o out.md               # custom output
word2md input.docx --stdout                # to stdout
word2md a.docx b.docx c.docx -d out/       # batch mode
word2md input.docx --format text           # plain text (strip markdown)
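
For a whole directory tree rather than a hand-typed file list, a small wrapper around the `-o` form shown above may be convenient. The following is only a sketch: `md_path_for` and `convert_tree` are hypothetical helper names, not part of word2md-cli, and the script assumes bash and a `word2md` binary on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: md_path_for and convert_tree are illustrative helpers,
# not something word2md-cli ships.

# Map "some/dir/report.docx" plus "out" to "out/report.md".
md_path_for() {
  local src="$1" out_dir="$2"
  local base
  base="$(basename "$src" .docx)"
  printf '%s/%s.md\n' "$out_dir" "$base"
}

# Find every .docx under src_dir and convert each one with `word2md ... -o`.
convert_tree() {
  local src_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  find "$src_dir" -type f -name '*.docx' -print0 |
    while IFS= read -r -d '' f; do
      # </dev/null keeps word2md from consuming the file list on stdin.
      word2md "$f" -o "$(md_path_for "$f" "$out_dir")" </dev/null
    done
}
```

Calling `convert_tree docs out` would then write one `.md` file per input into `out/`. Because the output name is built from the base name alone, two inputs with the same file name in different subdirectories would overwrite each other; the sketch does not guard against that.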

Image OCR

Pass --ocr together with PaddleX credentials to extract text from images embedded in the .docx file. The credentials can be set as environment variables:

export PADDLEX_OCR_URL="https://..."
export PADDLEX_OCR_TOKEN="..."
word2md input.docx --ocr

Or pass flags directly:

word2md input.docx --ocr \
  --paddlex-url https://... \
  --paddlex-token xxx \
  --ocr-concurrency 4

Without --ocr, images are stripped.
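
When the credentials come from the environment, it can help to check them before starting a long run. The sketch below assumes the two variable names shown above; `ocr_env_ready` and `convert_with_ocr` are hypothetical helper names, and how word2md itself reports missing credentials is not described here, so the check is purely a local convenience.

```shell
#!/usr/bin/env bash
# Sketch only: ocr_env_ready and convert_with_ocr are illustrative helpers,
# not part of word2md-cli.

# Succeeds only when both PaddleX variables are set and non-empty.
ocr_env_ready() {
  [ -n "${PADDLEX_OCR_URL:-}" ] && [ -n "${PADDLEX_OCR_TOKEN:-}" ]
}

# Refuse to start if credentials are missing, otherwise run word2md with --ocr.
convert_with_ocr() {
  if ! ocr_env_ready; then
    echo "PADDLEX_OCR_URL and PADDLEX_OCR_TOKEN must both be set" >&2
    return 1
  fi
  word2md "$@" --ocr
}
```

With this in place, `convert_with_ocr input.docx` expands to `word2md input.docx --ocr` once both variables are present, and exits with an error message otherwise.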

Develop

pnpm install
pnpm dev -- sample.docx --stdout
pnpm build
