LaTeX file checking tool.
This Perl script reads your .tex files and looks for potential problems, such as doubled words ("the the") and many other bugs. Put it in your directory of .tex files and run it to look for common mistakes. You can also use it on raw text files; just use the -r option to disable LaTeX-specific warnings.
As an example, here is a snippet of a .tex file; first, look it over yourself:
Here's what chex_latex.pl finds in a fraction of a second:
You might disagree with some of the problems flagged, but with chex_latex.pl you are at least aware of all of them. A few minutes wading through these can catch errors that are otherwise hard to notice.
The chex_latex.pl script tests for:
- Doubled words, such as "the the."
- Grammatical goofs and clunky phrasing, as well as violations of formal-writing rules, such as using contractions.
- Potential inter-word vs. inter-sentence spacing problems.
- Any figure \label that has no \ref, and vice versa.
- \index markers that get opened but not closed, or vice versa.
- Misspellings common in computer graphics, e.g., "tesselation" and "frustrum" (and it's easy to add your own).
- Any \bibitem that has no \cite, and vice versa.
- And much else: more than 300 tests in all.
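To give a feel for how such tests work, here is a minimal Python sketch of two of them: the doubled-word check and one kind of spacing check. In LaTeX, a plain space after a period that follows a lowercase letter is set as inter-sentence space, so "e.g. this" gets too wide a gap unless written "e.g.\ this" or "e.g.~this". This is not the code in chex_latex.pl, which is Perl with its own, more extensive rules; the function names and the short abbreviation list are hypothetical.

```python
import re

# Hypothetical re-creations of two checks; chex_latex.pl's own Perl rules differ.

# A word followed by whitespace and the same word again ("the the", "The the").
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# An abbreviation ending in a period followed by an ordinary space, which LaTeX
# would typeset as inter-sentence space. The abbreviation list is illustrative only.
ABBREV_SPACE = re.compile(r"\b(?:e\.g|i\.e|vs|cf)\.\s")


def doubled_words(line):
    """Return the first word of each doubled pair found in the line."""
    return [m.group(1) for m in DOUBLED.finditer(line)]


def abbrev_spacing(line):
    """Return abbreviations followed by a plain space instead of '\\ ' or '~'."""
    return [m.group(0).rstrip() for m in ABBREV_SPACE.finditer(line)]


print(doubled_words("We render the the frustum."))   # ['the']
print(abbrev_spacing("Use a cube, e.g. a box."))     # ['e.g.']
```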
This script is in no way foolproof, and will natter about all sorts of things you may not care about. Since it's a Perl script, it's easy for you to delete or modify any tests that you don't like.
Installation and Use
Install Perl (from, say, https://www.activestate.com/activeperl) and put chex_latex.pl somewhere; the easiest place is the directory with your .tex files, otherwise you need to specify the path to the script. Then go to the directory where your .tex files are and run:
perl chex_latex.pl
All .tex files in your directory and its subdirectories will be read and checked for this and that. If you run this command in your downloaded repository, you should get the error list shown at the top of this page for the included testfile.tex file.
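Collecting every .tex file in a directory tree, including subdirectories, takes only a few lines of Python with pathlib. This sketch just illustrates the recursive search; it is not how chex_latex.pl is written, and find_tex_files is a hypothetical name.

```python
from pathlib import Path


def find_tex_files(root="."):
    """Return every .tex file under root, including subdirectories, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.tex") if p.is_file())


# List what a run from the current directory would pick up.
for path in find_tex_files():
    print(path)
```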
To run on a single file, here shown on Windows with an absolute path:
perl chex_latex.pl C:\Users\you\your_thesis\chapter1.tex
For all files in a directory, here shown with a relative path:
perl chex_latex.pl work_files/my-thesis-master
This script is the one used for the book "Real-Time Rendering," so it has a bunch of book-specific rules. Blithely ignore our opinions or, better yet, comment out the warning lines you don't like in the script (the program is just a text file, nothing complex). You can also add "% chex_latex" to the end of any line in your .tex file to have this script skip some error tests on it, e.g.:
This method of using data is reasonable. % don't flag "data is" - chex_latex
The "chex_latex" keyword marks the line as OK, so it won't be tested. Beware, though: if you later introduce other errors on this line, they won't be caught either.
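The skip rule can be sketched as a small Python function that looks for the keyword in a line's comment, taken here to mean the text after the first % that is not escaped as \%. This is an illustration under that assumption, not the script's actual logic; is_exempt is a hypothetical name, and its keyword parameter mirrors the -O option described further down. A line ending in \\% (a forced line break followed by a comment) would be misread, which is acceptable for a sketch.

```python
def is_exempt(line, keyword="chex_latex"):
    """True if the line's LaTeX comment contains keyword, so checks should skip it."""
    start = 0
    while True:
        i = line.find("%", start)
        if i == -1:
            return False                  # no comment on this line
        if i > 0 and line[i - 1] == "\\":
            start = i + 1                 # \% is a literal percent sign; keep looking
            continue
        return keyword in line[i + 1:]    # search only the comment text


print(is_exempt("The data is fine. % chex_latex"))   # True
print(is_exempt("Costs rose 5\\% this year."))       # False
```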
The options are:
-d - turn off dash tests for '-' or '--' flagged as needing to be '---'.
-f - turn off formal writing check; allows contractions and other informal usage.
-p - turn ON picky style check, which looks for more style problems but is not so reliable.
-s - turn off style check (this check looks for poor usage, punctuation, and consistency).
-u - turn off U.S. style tests for putting commas and periods inside quotes.
So if you want all the tests, do:
perl chex_latex.pl -p [directory or files]
If you want the bare minimum, do:
perl chex_latex.pl -dfsu [directory or files]
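Single-letter flags can be bundled, so -dfsu means -d -f -s -u. As a hedged illustration of that convention, here is how the same five switches could be declared with Python's argparse, which also accepts bundled short flags. The destination names and the default of the current directory are assumptions, not taken from chex_latex.pl.

```python
import argparse


def build_parser():
    """Declare the five switches described above (destination names are hypothetical)."""
    p = argparse.ArgumentParser(prog="chex_latex")
    p.add_argument("-d", dest="no_dash", action="store_true",
                   help="turn off dash tests")
    p.add_argument("-f", dest="no_formal", action="store_true",
                   help="turn off formal writing check")
    p.add_argument("-p", dest="picky", action="store_true",
                   help="turn ON picky style check")
    p.add_argument("-s", dest="no_style", action="store_true",
                   help="turn off style check")
    p.add_argument("-u", dest="no_us_style", action="store_true",
                   help="turn off U.S. quote-punctuation tests")
    p.add_argument("paths", nargs="*", default=["."],
                   help="directory or files (assumed default: current directory)")
    return p


args = build_parser().parse_args(["-dfsu", "work_files/my-thesis-master"])
print(args.no_dash, args.no_formal, args.picky, args.no_style, args.no_us_style)
# True True False True True
```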
If a message confuses you, look in the Perl script itself, as there are comments about some of the issues.
To run this checker against plain text files, just specify the files, as normal:
perl chex_latex.pl my_text_file.txt another_text_file.txt
If any file is found that does not end in ".tex," the LaTeX-specific tests will be disabled (for all files, so don't mix .tex with .txt).
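The rule amounts to a single check over the whole file list. A Python sketch, assuming directories have already been expanded into individual file names (latex_tests_enabled is a hypothetical name):

```python
def latex_tests_enabled(files):
    """LaTeX-specific tests run only if every file named ends in .tex."""
    return all(str(f).endswith(".tex") for f in files)


print(latex_tests_enabled(["ch1.tex", "ch2.tex"]))    # True
print(latex_tests_enabled(["ch1.tex", "notes.txt"]))  # False
```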
Two other, more obscure options:
Instead of adding a comment "% chex_latex" to lines you want the script to ignore, you can change the keyword to something else; e.g., "-O ignore_lint" would ignore all lines where you put "% ignore_lint" in a comment.
By default, the file "refs.tex" is assumed to contain the \bibitem entries. Our book uses these; just about no one else does. If you actually do use \bibitem, this option is worth setting to your references file. It will tell you whether you cite something that doesn't exist in the references file, and whether any references in the file are not used in the text.
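The cross-check works the same way for \bibitem/\cite as for the \label/\ref test listed earlier: collect the keys one command defines and the keys the other command uses, then compare the two sets. A minimal Python sketch of that idea, assuming simple one-line commands with an optional bracketed argument; unmatched_keys is a hypothetical helper, not part of chex_latex.pl.

```python
import re


def unmatched_keys(text, define_cmd="bibitem", use_cmd="cite"):
    """Return (used_but_undefined, defined_but_unused) as sorted lists of keys."""
    def args(cmd):
        # \cmd{key} or \cmd[optional]{key}; captures the text inside the braces.
        pattern = r"\\" + cmd + r"(?:\[[^\]]*\])?\{([^}]*)\}"
        return re.findall(pattern, text)

    defined = set(args(define_cmd))
    used = set()
    for group in args(use_cmd):
        # A single \cite{a,b} can name several keys.
        used.update(k.strip() for k in group.split(",") if k.strip())
    return sorted(used - defined), sorted(defined - used)


text = r"See \cite{smith, jones} and \cite[p.~3]{lee}. \bibitem{smith} \bibitem[J]{jones} \bibitem{old}"
print(unmatched_keys(text))                                       # (['lee'], ['old'])
print(unmatched_keys(r"\label{a} \ref{a} \ref{b}", "label", "ref"))  # (['b'], [])
```

In practice you would pass it the text of your chapters together with the contents of the references file.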
Thanks to John Owens for providing a bunch of theses and technical articles for testing.
Bonus Tool: Aspell Sorter for Batch Spell Checking
Interactive spell checkers are fine for small documents, but for long ones I find it tedious to step through every word flagged as not being in the dictionary. Most of the time these are names, and for each hit I have to choose "ignore/add/fix" or whatever. I just want to toss in *.tex files and get back a long list of the words that failed. Here's how I do it. My contribution is a little Perl script at the end that consolidates the results.
After installing Aspell, I first put all .tex files into one text file. For example, on Windows:
type *.tex > alltext.txt
On linuxy systems:
cat *.tex > alltext.txt
Say that file is now in C:\temp. I then run Aspell on this file by going to the Aspell directory and doing this:
bin\aspell list -t < C:\temp\alltext.txt > C:\temp\alltypos.txt
This gives a long file of misspelled words (or, more likely, words not in the dictionary, such as names), in the order encountered. The same author's name will show up a bunch of times, code bits will get listed again and again, and other spurious problems will be flagged. I find it much faster to look at a sorted list of typos, showing each word just once. This can cut down the number of words you need to examine by a factor of five.
To make such a list, use the script aspell_sorter.pl:
perl aspell_sorter.pl alltypos.txt > spell_check.txt
which simply sorts the words in the alltypos.txt file, removing duplicates and giving a count for each. The resulting file first lists all capitalized words (which makes it easy to skim past authors' names), then all lowercase words. Sometimes the Aspell dictionaries leave words out, flagging false positives. You can avoid many of these by pasting the contents of spell_check.txt into, for example, MS Word, which will give a red underline only to words it thinks are misspelled.
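The consolidation step is easy to sketch. Here is a minimal Python version of the idea behind aspell_sorter.pl: count each flagged word, then list capitalized words first and lowercase words after, each group sorted. The real tool is the Perl script named above, and its exact output format may differ; sort_typos is a hypothetical name.

```python
from collections import Counter


def sort_typos(words):
    """Count each flagged word once; capitalized words first, then lowercase."""
    counts = Counter(w.strip() for w in words if w.strip())
    capitalized = sorted(w for w in counts if w[0].isupper())
    lowercase = sorted(w for w in counts if not w[0].isupper())
    return [(w, counts[w]) for w in capitalized + lowercase]


for word, n in sort_typos(["Haines", "frustrum", "Haines", "Akenine", "tesselation"]):
    print(word, n)
```

To use it on Aspell's output, pass it the lines of the file, e.g. sort_typos(open("alltypos.txt", encoding="utf-8")); trailing newlines are stripped before counting.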
There are lots of false positives, such as authors' names, so I usually start by looking at the end of the spell_check.txt file, where the lowercase words hang out. You can also modify the script itself by setting $spellcount to the maximum number of repeats you want listed; for example, with $spellcount = 1, only words "misspelled" exactly once will be listed. You risk missing a word that is consistently misspelled, but the list is often considerably shorter (I've found it 2-3 times shorter), as false positives found more than once are culled out.
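The $spellcount setting amounts to keeping only words whose count is at or below a threshold. A Python sketch, assuming that reading of the setting (rare_typos is a hypothetical name):

```python
from collections import Counter


def rare_typos(words, spellcount=1):
    """Keep only words flagged at most spellcount times, sorted."""
    counts = Counter(w.strip() for w in words if w.strip())
    return sorted(w for w, n in counts.items() if n <= spellcount)


flagged = ["Haines", "Haines", "teh", "frustrum", "frustrum"]
print(rare_typos(flagged))      # ['teh']
print(rare_typos(flagged, 2))   # ['Haines', 'frustrum', 'teh']
```

Note how the consistently misspelled "frustrum" drops out at spellcount = 1; that is the trade-off of a shorter list.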
That's it: nothing fancy, but it has saved me a considerable amount of time and turned up some typos I would probably not have found otherwise. I can also save the results file, so if I later change .tex files, I can make a new spell_check.txt and do a "diff" to see if I've introduced any new errors.
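If you prefer a word-level comparison to a line-by-line "diff", a short Python sketch can report words that appear in the new spell_check.txt but not in the saved copy. It assumes each line starts with the flagged word, possibly followed by a count; that format is an assumption. Unlike diff, this ignores changes in counts, so a new instance of an already-known typo will not show up.

```python
def new_typos(old_lines, new_lines):
    """Return words present in the new results but absent from the saved ones."""
    def words(lines):
        # Assumed format: the flagged word is the first token on each line.
        return {line.split()[0] for line in lines if line.strip()}

    return sorted(words(new_lines) - words(old_lines))


old = ["Haines 3\n", "frustrum 2\n"]
new = ["Haines 4\n", "frustrum 2\n", "teh 1\n"]
print(new_typos(old, new))   # ['teh']
```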
Aspell also works on plaintext files, so if you can extract your text into a simple text file you can use this process to perform batch spell checking on anything. For other free tools, see my blog post.