implement a TeX lexer (closes #10)
jneen committed Sep 12, 2012
1 parent c9ce486 commit 65c40d6
Showing 4 changed files with 231 additions and 0 deletions.
2 changes: 2 additions & 0 deletions lib/rouge.rb
@@ -18,8 +18,10 @@ def highlight(text, lexer, formatter)
load load_dir.join('rouge/text_analyzer.rb')
load load_dir.join('rouge/token.rb')
load load_dir.join('rouge/lexer.rb')

load load_dir.join('rouge/lexers/text.rb')
load load_dir.join('rouge/lexers/diff.rb')
load load_dir.join('rouge/lexers/tex.rb')

load load_dir.join('rouge/lexers/shell.rb')

75 changes: 75 additions & 0 deletions lib/rouge/lexers/tex.rb
@@ -0,0 +1,75 @@
module Rouge
  module Lexers
    class TeX < RegexLexer
      tag 'tex'
      aliases 'TeX', 'LaTeX', 'latex'

      filenames '*.tex', '*.aux', '*.toc'
      mimetypes 'text/x-tex', 'text/x-latex'

      def self.analyze_text(text)
        return 1 if text =~ /\A\s*\\documentclass/
        return 1 if text =~ /\A\s*\\input/
        return 1 if text =~ /\A\s*\\documentstyle/
        return 1 if text =~ /\A\s*\\relax/
      end

      command = /\\([a-z]+|\s+|.)/i

      state :general do
        rule /%.*$/, 'Comment'
        rule /[{}&_^]/, 'Name.Builtin'
      end

      state :root do
        rule /\\\[/, 'Literal.String.Backtick', :displaymath
        rule /\\\(/, 'Literal.String', :inlinemath
        rule /\$\$/, 'Literal.String.Backtick', :displaymath
        rule /\$/, 'Literal.String', :inlinemath
        rule /\\(begin|end)\{.*?\}/, 'Name.Tag'

        rule /(\\verb)\b(.)/ do |m|
          group 'Name.Builtin'
          group 'Name.Constant'
          delim = Regexp.escape(m[2])

          push do
            rule /#{delim}/, 'Name.Constant', :pop!
            rule /[^#{delim}]+/m, 'Literal.String.Other'
          end
        end

        rule command, 'Keyword', :command
        mixin :general
        rule /[^\\$%&_^{}]+/, 'Text'
      end

      state :math do
        rule command, 'Name.Variable'
        mixin :general
        rule /[0-9]+/, 'Literal.Number'
        rule /[-=!+*\/()\[\]]/, 'Operator'
        rule /[^=!+*\/()\[\]\\$%&_^{}0-9-]+/, 'Name.Builtin'
      end

      state :inlinemath do
        rule /\\\)/, 'Literal.String', :pop!
        rule /\$/, 'Literal.String', :pop!
        mixin :math
      end

      state :displaymath do
        rule /\\\]/, 'Literal.String.Backtick', :pop!
        rule /\$\$/, 'Literal.String.Backtick', :pop!
        rule /\$/, 'Name.Builtin'
        mixin :math
      end

      state :command do
        rule /\[.*?\]/, 'Name.Attribute'
        rule /\*/, 'Keyword'
        rule(//) { pop! }
      end
    end
  end
end
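
Note on the \verb rule above: it captures whatever character follows \verb and pushes an anonymous state built around that delimiter. The sketch below illustrates the same idea outside of Rouge, using only Ruby's standard StringScanner. The helper name scan_verb and its return shape are hypothetical and exist only for this illustration; it is not how Rouge itself emits tokens.

```ruby
require 'strscan'

# Hypothetical helper, not part of Rouge: finds the first \verb span and
# returns [opening delimiter, verbatim body, closing delimiter], or nil.
def scan_verb(source)
  scanner = StringScanner.new(source)
  # Advance past the first "\verb" that is followed by a delimiter character.
  return nil unless scanner.skip_until(/\\verb\b(?=.)/)

  open_delim = scanner.getch
  delim = Regexp.escape(open_delim)

  # Mirrors the two rules of the pushed state: consume everything that is
  # not the delimiter, then the delimiter itself (nil if it never closes).
  body = scanner.scan(/[^#{delim}]*/m)
  close_delim = scanner.scan(/#{delim}/)
  [open_delim, body, close_delim]
end

p scan_verb('by typing \verb|\vspace{1in}|, like this:')
```

Because the delimiter is passed through Regexp.escape before being interpolated, characters such as | or + are matched literally rather than as regex operators, which is the same precaution the lexer rule takes.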
22 changes: 22 additions & 0 deletions spec/lexers/tex_spec.rb
@@ -0,0 +1,22 @@
describe Rouge::Lexers::TeX do
  let(:subject) { Rouge::Lexers::TeX.new }

  describe 'guessing' do
    include Support::Guessing

    it 'guesses by filename' do
      assert_guess :filename => 'foo.tex'
      assert_guess :filename => 'foo.toc'
      assert_guess :filename => 'foo.aux'
    end

    it 'guesses by mimetype' do
      assert_guess :mimetype => 'text/x-tex'
      assert_guess :mimetype => 'text/x-latex'
    end

    it 'guesses by source' do
      assert_guess :source => '\\documentclass{article}'
    end
  end
end
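
Note on source guessing: the spec above exercises only the \documentclass case, while analyze_text also recognizes \input, \documentstyle and \relax. The standalone sketch below folds the four checks into one pattern so they can be tried without loading Rouge. The names TEX_PREAMBLE and tex_score are hypothetical, and returning 0 when nothing matches (where the method in this commit falls through to nil) is an assumption made for the illustration, not the committed behavior.

```ruby
# Standalone sketch, not Rouge's API: one pattern covering the four
# preamble commands checked by TeX.analyze_text in this commit.
TEX_PREAMBLE = /\A\s*\\(documentclass|input|documentstyle|relax)/

# Hypothetical scoring helper: 1 when the text opens with a known TeX
# preamble command (leading whitespace allowed), otherwise 0.
def tex_score(text)
  text =~ TEX_PREAMBLE ? 1 : 0
end

puts tex_score('\documentclass{article}')
puts tex_score("\n  \\relax")
puts tex_score('Some prose that mentions \input{chapter} later on')
```

The \A anchor is what keeps the last example from matching: the command has to appear at the very start of the source, after optional whitespace.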
132 changes: 132 additions & 0 deletions spec/visual/samples/tex
@@ -0,0 +1,132 @@
\documentclass{article}

\begin{document}



\centerline{\sc \large A Simple Sample \LaTeX\ File}
\vspace{.5pc}
\centerline{\sc Stupid Stuff I Wish Someone Had Told Me Four Years Ago}
\centerline{\it (Read the .tex file along with this or it won't
make much sense)}
\vspace{2pc}

The first thing to realize about \LaTeX\ is that it is not ``WYSIWYG''.
In other words, it isn't a word processor; what you type into your
.tex file is not what you'll see in your .dvi file. For example,
\LaTeX\ will completely ignore extra
spaces within a line of your .tex file.
Pressing return
in
the
middle
of
a
line
will not register in your .dvi file. However, a double carriage-return
is read as a paragraph break.

Like this. But any carriage-returns after the first two will be
completely ignored; in other words, you


can't

add






more




space


between




lines, no matter how many times you press return in your .tex file.

In order to add vertical space you have to use ``vspace''; for example,
you could add an inch of space by typing \verb|\vspace{1in}|, like this:
\vspace{1in}

To get three lines of space you would type \verb|\vspace{3pc}|
(``pc'' stands for ``pica'', a font-relative size), like this:
\vspace{3pc}

Notice that \LaTeX\ commands are always preceded by a backslash.
Some commands, like \verb|\vspace|, take arguments (here, a length) in
curly brackets.

The second important thing to notice about \LaTeX\ is that you type
in various ``environments''...so far we've just been typing regular
text (except for a few inescapable usages of \verb|\verb| and the
centered, smallcaps, large title). There are basically two ways that
you can enter and/or exit an environment;
\vspace{1pc}

\centerline{this is the first way...}

\begin{center}
this is the second way.
\end{center}

\noindent Actually there is one more way, used above; for example,
{\sc this way}. The way that you get in and out of environment varies
depending on which kind of environment you want; for example, you use
\verb|\underline| ``outside'', but \verb|\it| ``inside'';
notice \underline{this} versus {\it this}.

The real power of \LaTeX\ (for us) is in the math environment. You
push and pop out of the math environment by typing \verb|$|. For
example, $2x^3 - 1 = 5$ is typed between dollar signs as
\verb|$2x^3 - 1 = 5$|. Perhaps a more interesting example is
$\lim_{N \to \infty} \sum_{k=1}^N f(t_k) \Delta t$.

You can get a fancier, display-style math
environment by enclosing your equation with double dollar signs.
This will center your equation, and display sub- and super-scripts in
a more readable fashion:

$$\lim_{N \to \infty} \sum_{k=1}^N f(t_k) \Delta t.$$

If you don't want your equation to be centered, but you want the nice
indices and all that, you can use \verb|\displaystyle| and get your
formula ``in-line''; using our example this is
$\displaystyle \lim_{N \to \infty} \sum_{k=1}^N f(t_k) \Delta t.$ Of
course this can screw up your line spacing a little bit.

There are many more things to know about \LaTeX\ and we can't
possibly talk about them all here.
You can use \LaTeX\ to get tables, commutative diagrams, figures,
aligned equations, cross-references, labels, matrices, and all manner
of strange things into your documents. You can control margins,
spacing, alignment, {\it et cetera} to higher degrees of accuracy than
the human eye can perceive. You can waste entire days typesetting
documents to be ``just so''. In short, \LaTeX\ rules.

The best way to learn \LaTeX\ is by example. Get yourself a bunch
of .tex files, see what kind of output they produce, and figure out how
to modify them to do what you want. There are many template and
sample files on the department \LaTeX\ page and in real life in the
big binder that should be in the computer lab somewhere. Good luck!











\end{document}
