Free and open-source word counting software for translators.
WordCounter is built to rival paid counting tools while staying simple, transparent, and free forever.
Created by Michael Beijer for real translation workflows.
Translators often need reliable counts across multiple file types, but many tools are locked behind subscriptions or expensive licenses. WordCounter is an alternative:
- Free to use
- Open source
- Focused on practical translator needs
- Easy to inspect, adapt, and improve
0.6.0
- Batch counts supported files:
  - .docx, .pptx, .xlsx (core)
  - .pdf (optional, requires pdfminer.six)
  - With optional Apache Tika: 50+ additional formats, including .doc, .xls, .ppt, .rtf, .odt, .odp, .ods, .html, .xml, .txt, .epub, .srt, .xliff, .tmx, .po, images (OCR), and many more
- Calculates per-file metrics:
  - Words
  - Characters
  - Characters (no spaces)
  - Numbers
  - Number percentage
  - Sentences
  - Paragraphs
  - Estimated pages
- Cross-document repetition analysis:
  - Detects identical segments repeated across all files in a batch
  - First occurrence = unique (full rate), subsequent = repetitions (reduced/zero rate)
  - Per-file and total breakdown of unique vs repeated words
  - Translation formats use native segments; other formats use sentence splitting
- Includes billing panel:
  - Bill by words, characters, or estimated pages
  - Separate rate for unique content and repetitions (rep. rate = 0 to exclude)
  - Rate, currency, discount, tax
  - Running total amount
- Exports results:
  - CSV export
  - Markdown export (with full document text included)
  - Fixed-width clipboard report (great for Gmail with a monospace font)
- Reports include extracted document text below the count data
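The repetition and billing rules above can be sketched in a few lines of Python. This is a minimal illustration, not WordCounter's actual implementation: the function names `split_segments`, `analyse_repetitions`, and `bill`, the naive sentence splitter, the whitespace normalization, and the order in which discount and tax are applied are all assumptions.

```python
import re


def split_segments(text):
    """Naive sentence splitter (assumption): break after . ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def analyse_repetitions(files):
    """Return {file name: {"unique": words, "repeated": words}}.

    `files` maps file name -> extracted text, in batch order. The first time a
    segment is seen anywhere in the batch it counts as unique; every later
    identical segment (in the same or another file) counts as a repetition.
    """
    seen = set()
    report = {}
    for name, text in files.items():
        unique = repeated = 0
        for segment in split_segments(text):
            key = " ".join(segment.split())  # collapse whitespace (assumption)
            words = len(key.split())
            if key in seen:
                repeated += words
            else:
                seen.add(key)
                unique += words
        report[name] = {"unique": unique, "repeated": repeated}
    return report


def bill(unique_words, repeated_words, rate, rep_rate=0.0,
         discount_pct=0.0, tax_pct=0.0):
    """Total amount: discount first, then tax (this ordering is an assumption)."""
    subtotal = unique_words * rate + repeated_words * rep_rate
    discounted = subtotal * (1 - discount_pct / 100)
    return round(discounted * (1 + tax_pct / 100), 2)


batch = {
    "a.txt": "Hello world. This is a test.",
    "b.txt": "This is a test. Something new here.",
}
report = analyse_repetitions(batch)
total_unique = sum(r["unique"] for r in report.values())
total_repeated = sum(r["repeated"] for r in report.values())
print(report, bill(total_unique, total_repeated, rate=0.5))
```

With `rep_rate=0.0` (the default here), repeated words are excluded from the total, which mirrors the "rep. rate = 0 to exclude" option.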
- Browse… lets you choose either individual files or a folder.
- Count runs counts directly from selected files or the selected folder.
- Add files…, Remove selected, and Remove all support quick list refinement.
Windows 64-bit: Download the latest release from the Releases page. Extract the ZIP anywhere and run WordCounter.exe. No Java, Python, or other dependencies required — the JRE and Apache Tika are bundled.
Requires Python 3.10+ (3.12 tested).
Install dependencies:

    pip install python-docx python-pptx openpyxl pdfminer.six

pdfminer.six is optional if you do not need PDF support.
Tika unlocks support for legacy Office (.doc, .xls, .ppt), OpenDocument, RTF, HTML, EPUB, subtitles, translation formats (XLIFF, TMX, PO), and more — including OCR for images if Tesseract is installed.
Requires Java (JRE 8+). On first run, tika-python downloads the Tika server JAR (~70 MB).

    pip install tika

Without Tika, WordCounter still works for .docx, .pptx, .xlsx, and .pdf.
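To see which of these packages are importable in the current environment, a small check like the following may help. It is only a sketch and says nothing about how WordCounter itself detects its dependencies; the names `MODULES` and `available_packages` are made up for illustration. Note that the import names differ from the pip package names for python-docx, python-pptx, and pdfminer.six.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used for importing
MODULES = {
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
    "pdfminer.six": "pdfminer",
    "tika": "tika",
}


def available_packages(modules=MODULES):
    """Return {package name: True/False} without importing the packages."""
    return {pkg: find_spec(mod) is not None for pkg, mod in modules.items()}


for package, present in available_packages().items():
    print(f"{package:14} {'installed' if present else 'missing'}")
```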
From the project root:

    python WordCounter.py

A starter test set is included in:
test_assets/
It contains sample Word documents, a PowerPoint file, and a PDF for quick verification.
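For an independent sanity check, the per-file metrics listed earlier can be approximated on text that has already been extracted from a file. The sketch below is not WordCounter's counting logic: the function name `text_metrics`, the whitespace tokenization, the definition of a "number", the sentence splitter, and the figure of 250 words per page are all assumptions, so small differences from the application's results are to be expected.

```python
import re
from math import ceil

WORDS_PER_PAGE = 250  # assumption; the README does not say how pages are estimated


def text_metrics(text, words_per_page=WORDS_PER_PAGE):
    """Approximate per-file metrics for a plain-text string."""
    words = text.split()
    # A "number" here is a token made of digits with optional , or . separators.
    numbers = [
        w for w in words
        if re.fullmatch(r"[+-]?\d+(?:[.,]\d+)*", w.strip(".,;:!?()"))
    ]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "words": len(words),
        "characters": len(text),
        "characters_no_spaces": sum(1 for c in text if not c.isspace()),
        "numbers": len(numbers),
        "number_pct": round(100 * len(numbers) / len(words), 1) if words else 0.0,
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "estimated_pages": ceil(len(words) / words_per_page) if words else 0,
    }


sample = "Invoice 42 is due.\n\nPay 1,500 euros by May 3. Thanks!"
metrics = text_metrics(sample)
print(metrics)
```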
WordCounter aims to be a free, open-source replacement for commercial word counting tools used by translators. Here is how it compares:
| Tool | Price | Platform | Formats | Invoicing | Status | Notes |
|---|---|---|---|---|---|---|
| WordCounter | Free (MIT) | Windows, macOS, Linux | 4 core + 50+ via Tika | Billing panel (rate/tax/discount) | Active | Open source, lightweight, cross-platform via Python |
| AnyCount | EUR 89-399/yr or EUR 199-399 perpetual | Windows only | 70+ formats incl. OCR, CAT files, URLs | No (separate via TO3000) | Active | Most feature-rich; expensive; heavy (6 GB RAM) |
| PractiCount | ~USD 60 one-time | Windows only | 20+ formats | Yes, built-in with client DB | Low activity | Good value; dated UI; last major update references Office 2016 |
| FineCount | EUR 39/yr subscription | Windows only | ~15 formats | Basic quoting/invoicing | Maintenance mode | No major version update since 2018; subscription-only |
| CountAnything | Free | Windows only | ~12 formats | No | Dormant | Freeware but not open source; bare-bones; tiny user base |
Key differences:
- All four commercial/freeware alternatives are Windows-only. WordCounter runs anywhere Python does.
- None of the alternatives are open source. WordCounter can be inspected, modified, and extended by anyone.
- AnyCount is the most powerful but also the most expensive, targeting agencies and high-volume translators who need 70+ format support, OCR, and CAT file counting.
- PractiCount offers the best value for translators who want integrated invoicing with a one-time purchase, but development has slowed and the UI feels dated.
- FineCount was a popular mid-range option, but development appears stalled since 2018 and it requires an ongoing subscription with no perpetual license option.
- CountAnything is the only other free option, but it is closed-source, minimally maintained, and lacks billing features.
- Fuzzy match bands (e.g. 75-99% similarity) for near-repetitions
- Better PDF structure extraction and cleanup
- Persist user profiles and presets
- Cross-platform packaged binaries
- Plugin architecture for custom counting rules
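As a rough idea of what the planned fuzzy match bands could look like, the sketch below uses `difflib.SequenceMatcher` from the Python standard library to score a segment against earlier segments and sort it into a band. Nothing here reflects a committed design: the band boundaries, the `match_band` function, and the use of character-level similarity (rather than word-level matching) are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical bands, checked from highest to lowest threshold (in percent).
BANDS = [(100, "100%"), (95, "95-99%"), (85, "85-94%"), (75, "75-84%")]


def match_band(segment, previous_segments):
    """Return the band label for the best match among earlier segments."""
    best = max(
        (SequenceMatcher(None, segment, prev).ratio() for prev in previous_segments),
        default=0.0,
    )
    percent = int(best * 100)  # round down so 99.9% never counts as 100%
    for threshold, label in BANDS:
        if percent >= threshold:
            return label
    return "no match"


earlier = ["Click the Open button."]
print(match_band("Click the Open button.", earlier))
print(match_band("Click the Save button.", earlier))
print(match_band("Completely unrelated text", earlier))
```

Each band could then be given its own rate, in the same way the billing panel already separates unique content from exact repetitions.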
Issues, suggestions, and pull requests are welcome.
If you are a translator, your real-world feedback is especially valuable.
This project is licensed under the MIT License. See the LICENSE file for details.