texwc

A simple command line script for getting a more accurate word count on LaTeX projects. It is essentially a wrapper for detex | wc with support for per-project configuration files, so that a word count can be obtained from the terminal simply by entering texwc.
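
The detex | wc idea can be sketched in Python roughly as follows. This is a minimal illustration, not the script's actual code: the helper names wc_counts and detex_counts are hypothetical, and detex_counts assumes the detex executable is installed and on your PATH.

```python
import subprocess


def wc_counts(text):
    """Return (lines, words, chars) for a string, in the spirit of wc."""
    lines = text.count("\n")   # wc counts newline characters
    words = len(text.split())  # whitespace-separated tokens
    chars = len(text)          # characters, like wc -m
    return lines, words, chars


def detex_counts(tex_path):
    """Strip LaTeX markup with detex, then count what is left.

    Assumes the detex executable is available on the PATH.
    """
    result = subprocess.run(
        ["detex", tex_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if detex fails
    )
    return wc_counts(result.stdout)
```

Keeping the counting separate from the subprocess call means the counting logic can be checked without detex installed.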

Installation

Put texwc in a directory on your PATH (e.g. ~/bin).

Dependencies

detex (sudo apt install texlive-extra-utils, or see OpenDetex for a more recent version)

Usage

  • From the terminal, run texwc [path], where path is the path of a .tex file.
  • To get a word count from multiple files, specify the path of a .texwc config file for path, or the path of a directory containing a config file. If no value is specified for path, the current working directory is used.
  • Config files can be generated with the -i option (see below for details).
  • By default, the \input and \include LaTeX commands are ignored. This is to allow control over which included files should be counted (e.g., appendices and title pages are usually not included in a word count). To expand these commands, use the --with-includes option.
  • The output will show line, word and character counts for each specified file as well as a total:
    $ texwc
    LINES WORDS CHARS FILE
       35   595  3965 chapters/01_introduction
      285  5370 33619 chapters/02_background
      220  3002 18913 chapters/03_methodology
      339  4106 25191 chapters/04_implementation
      305  1669 10659 chapters/05_results
       25   814  5156 chapters/06_conclusion
     1209 15556 97503 TOTAL
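
A table like the one above could be assembled from per-file counts along these lines. This is only a sketch under assumptions: format_table is a hypothetical helper, and the column widths are derived from the data rather than taken from the real script.

```python
def format_table(rows):
    """Format (lines, words, chars, name) rows and append a TOTAL row."""
    totals = [sum(row[i] for row in rows) for i in range(3)]
    all_rows = list(rows) + [(totals[0], totals[1], totals[2], "TOTAL")]
    header = ("LINES", "WORDS", "CHARS", "FILE")

    # Each numeric column is as wide as its header or its widest value.
    widths = [
        max(len(header[i]), *(len(str(row[i])) for row in all_rows))
        for i in range(3)
    ]

    def format_row(cells):
        # Right-align the three count columns, then append the file name.
        nums = " ".join(str(c).rjust(w) for c, w in zip(cells[:3], widths))
        return f"{nums} {cells[3]}"

    return "\n".join(format_row(r) for r in [header] + all_rows)
```

For example, format_table([(35, 595, 3965, "chapters/01_introduction")]) returns a header line, one file line and a TOTAL line.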
    

Options

The following options can be specified to modify the behaviour of the script:

  • -h/--help: Print help message with usage information.
  • -i/--init: Initialise the directory with a default config file listing all .tex files in that directory.
  • -r/--recursive: Recursively include .tex files in subdirectories when initialising (only with -i).
  • -w/--with-includes: Expand \input and \include commands. (This takes precedence over detex-options.)
  • -p/--print-text: Print output of detex instead of word count. This can be useful to ensure that the correct text is included in the word count, e.g. that the right environments are being ignored.
  • --plain: Print in plain text, without formatting by ANSI escape sequences.
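
For orientation, the options listed above map naturally onto Python's argparse. The following is a sketch of how such an interface might be declared, not the script's actual parser: build_parser is a hypothetical name and the help strings are paraphrased from this README.

```python
import argparse
import os


def build_parser():
    """Declare a command line interface matching the documented options."""
    parser = argparse.ArgumentParser(
        prog="texwc",
        description="Word count for LaTeX projects (wrapper for detex | wc).",
    )
    # Optional positional argument: falls back to the working directory.
    parser.add_argument(
        "path", nargs="?", default=os.getcwd(),
        help="a .tex file, a .texwc config file, or a directory with a config",
    )
    parser.add_argument("-i", "--init", action="store_true",
                        help="create a default config file")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="include subdirectories when initialising")
    parser.add_argument("-w", "--with-includes", action="store_true",
                        help="expand input and include commands")
    parser.add_argument("-p", "--print-text", action="store_true",
                        help="print the detex output instead of counts")
    parser.add_argument("--plain", action="store_true",
                        help="print without ANSI escape sequences")
    return parser
```

argparse adds -h/--help automatically, so it does not need to be declared.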

Config file

A .texwc config file contains a JSON object representing configuration options. The fields of this file are:

  • "files" (required): A list of relative .tex file paths to be included in the word count.
  • "detex-options": A list of options to be passed to detex. See the detex documentation for details.
  • "ignore-envs": A list of environments to exclude from the word count. (Please note that detex only allows 10 environments to be included in this list, so you may need to remove environments you don't use.)
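
The field rules above can be expressed as a small validation step. This is a sketch under assumptions: parse_config and MAX_IGNORED_ENVS are hypothetical names, and rejecting a long "ignore-envs" list with an error is one possible way to handle the limit mentioned above, not necessarily what texwc does.

```python
import json

# Limit on ignored environments, as stated in this README.
MAX_IGNORED_ENVS = 10


def parse_config(text):
    """Parse the JSON text of a .texwc file and check the documented fields."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("a .texwc file must contain a JSON object")

    # "files" is the only required field.
    files = config.get("files")
    if not isinstance(files, list):
        raise ValueError('the "files" field is required and must be a list')

    # The other two fields are optional and default to empty lists.
    envs = list(config.get("ignore-envs", []))
    if len(envs) > MAX_IGNORED_ENVS:
        raise ValueError("too many environments in ignore-envs")

    return {
        "files": files,
        "detex-options": list(config.get("detex-options", [])),
        "ignore-envs": envs,
    }
```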

Here is an example of a typical .texwc file:

{
    "detex-options": [
        "-l",
        "-n"
    ],
    "ignore-envs": [
        "array",
        "eqnarray",
        "equation",
        "figure",
        "table",
        "verbatim",
        "lstlisting",
        "sidewaystable"
    ],
    "files": [
        "chapters/01_introduction",
        "chapters/02_background",
        "chapters/03_methodology",
        "chapters/04_implementation",
        "chapters/05_results",
        "chapters/06_conclusion"
    ]
}
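
To show how a config like this might translate into a detex invocation, here is a hedged sketch. The function detex_command is hypothetical. It assumes that "ignore-envs" is passed through detex's -e option as a comma-separated list, and that --with-includes works by dropping detex's -n flag; neither detail is confirmed by this README.

```python
def detex_command(config, tex_file, with_includes=False):
    """Build an argument list for running detex on one file."""
    options = list(config.get("detex-options", []))
    if with_includes:
        # Assumption: "takes precedence" means removing -n, the detex
        # flag that stops it following \input and \include commands.
        options = [opt for opt in options if opt != "-n"]

    command = ["detex"] + options
    envs = config.get("ignore-envs", [])
    if envs:
        # Assumption: environments go to detex as a comma-separated -e list.
        command += ["-e", ",".join(envs)]
    command.append(tex_file)
    return command
```

The resulting list can be handed to subprocess.run without shell quoting, which avoids problems with spaces in file names.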