Frédéric Wang edited this page Oct 3, 2014 · 5 revisions

General description

TeXZilla is a JavaScript LaTeX-to-MathML converter compatible with Unicode. MathML is the standard markup for rendering mathematical formulas on Web pages (and related media such as ebooks, mobile apps and mail clients), JavaScript is the programming language natively supported by Web rendering engines, and Unicode is the standard for character encoding.

LaTeX is a typesetting system that is particularly popular for producing technical and scientific documentation. In particular, it defines a syntax to easily write mathematical formulas on the keyboard. This syntax is currently the de facto standard for that task, although other simple syntaxes such as ASCIIMathML or Mathematica are also commonly used.

However, note that the LaTeX language is not really a standard in the strict sense. There are many LaTeX commands, macros and packages, and new ones are always being invented to cover the needs of different communities. TeXZilla is intended to remain small and will not support every imaginable LaTeX command, even less those used in non-math mode. For a LaTeX-to-XML converter with very good coverage, we recommend LaTeXML instead.

The reference LaTeX support used by TeXZilla is the one provided by itex2MML, which should cover what the majority of people need to write mathematical formulas.

Basic syntax and Unicode characters

The idea of LaTeX is to use special commands (with a backslash prefix) followed by arguments to produce special layouts. Braces can be used for grouping. For example, \frac{a^2+b}{13} writes a fraction and \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} writes a matrix. Note that reserved characters can be obtained with special commands, for example \backslash \{ \} for backslashes and braces. You will easily find LaTeX tutorials on the Web but again, note that the reference syntax is that of itex2MML, which might differ from that of the usual LaTeX distributions.

LaTeX tries to interpret the mathematical tokens with basic semantics, and this affects the mathematical rendering. For example, + is an operator and gets special spacing, the two digits 13 form a single number, and the letter a is a mathematical variable displayed in italic. TeXZilla tries to generalize that to any Unicode character. For example, you can write ∑_{n=1}^{+∞} \frac{1}{n^2} = \frac{π^2}{6} or س = \frac{-ب\pm\sqrt{ب^٢-٤اج}}{٢ا}, and the summation symbol will be interpreted as an operator while the Arabic letter س is interpreted as a mathematical variable.

Numbers

The following commands or Unicode characters are interpreted as numbers (<mn> in MathML):

  • \infty, \infinity and ∞, all of them producing the infinity sign.

  • A nonempty sequence of roman digits 0123456789, optionally followed by a dot . plus another nonempty sequence of roman digits. For example, 121 or 0.2031.

  • A nonempty sequence of Arabic digits ٠١٢٣٤٥٦٧٨٩, optionally followed by an Arabic decimal separator ٫ plus another nonempty sequence of Arabic digits. For example, ١٢١ or ٠٫٢٠٣١.

  • A nonempty sequence of bold 𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗, double-struck 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡, sans-serif 𝟢𝟣𝟤𝟥𝟦𝟧𝟨𝟩𝟪𝟫, sans-serif bold 𝟬𝟭𝟮𝟯𝟰𝟱𝟲𝟳𝟴𝟵 or monospace 𝟶𝟷𝟸𝟹𝟺𝟻𝟼𝟽𝟾𝟿 digits.

As indicated on the itex2MML page, numbers are very culture-dependent. You can use the \itexnum{...} command or the \mn{...} command to force TeXZilla to interpret the argument as a number.
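The number rules listed above can be approximated with a few regular expressions. The following is only a sketch for experimenting with the rules, not TeXZilla's actual grammar: the helper name isNumberToken is hypothetical, and it assumes that digits of different styles are not mixed within one number.

```javascript
// Illustrative patterns only; TeXZilla's real grammar may differ.
// Assumption: one number token uses digits from a single family.
const NUMBER_PATTERNS = [
  /^(?:\\infty|\\infinity|\u221E)$/u,                // \infty, \infinity, ∞
  /^[0-9]+(?:\.[0-9]+)?$/u,                          // 121 or 0.2031
  /^[\u0660-\u0669]+(?:\u066B[\u0660-\u0669]+)?$/u,  // Arabic digits with ٫
  /^[\u{1D7CE}-\u{1D7D7}]+$/u,                       // bold digits
  /^[\u{1D7D8}-\u{1D7E1}]+$/u,                       // double-struck digits
  /^[\u{1D7E2}-\u{1D7EB}]+$/u,                       // sans-serif digits
  /^[\u{1D7EC}-\u{1D7F5}]+$/u,                       // sans-serif bold digits
  /^[\u{1D7F6}-\u{1D7FF}]+$/u,                       // monospace digits
];

// Returns true when the whole token matches one of the patterns above.
function isNumberToken(token) {
  return NUMBER_PATTERNS.some((pattern) => pattern.test(token));
}
```

With this sketch, isNumberToken("0.2031") and isNumberToken("٠٫٢٠٣١") are accepted, while "12." is rejected because the dot must be followed by at least one digit.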

Identifiers

The following commands or Unicode characters are interpreted as single-char identifiers (<mi> in MathML):

  • In general, all the Unicode characters with mathclass "Alphabetic" in the W3C's unicode.xml file (modulo some changes to align with itex2MML commands below) and their associated AMS LaTeX commands to produce them, if any. This includes in particular the Arabic letters غظضذخثتشرقصفعسنملكيطحزوهدجب as well as Mathematical Alphanumeric Symbols.

  • LaTeX commands producing a single Greek letter: \alpha (= α), \beta (= β), \gamma (= γ), \delta (= δ), \epsilon (= ϵ), \backepsilon (= ϶), \varepsilon (= ε), \zeta (= ζ), \eta (= η), \theta (= θ), \vartheta (= ϑ), \iota (= ι), \kappa (= κ), \varkappa (= ϰ), \lambda (= λ), \mu (= μ), \nu (= ν), \xi (= ξ), \omicron (= ο), \pi (= π), \varpi (= ϖ), \rho (= ρ), \varrho (= ϱ), \sigma (= σ), \varsigma (= ς), \tau (= τ), \upsilon (= υ), \phi (= ϕ), \varphi (= φ), \chi (= χ), \psi (= ψ), \omega (= ω), \Alpha (= Α), \Beta (= Β), \Gamma (= Γ), \Delta (= Δ), \Zeta (= Ζ), \Eta (= Η), \Theta (= Θ), \Iota (= Ι), \Kappa (= Κ), \Lambda (= Λ), \Mu (= Μ), \Nu (= Ν), \Xi (= Ξ), \Pi (= Π), \Rho (= Ρ), \Sigma (= Σ), \Tau (= Τ), \Upsilon (= ϒ), \Phi (= Φ), \Psi (= Ψ), \Omega (= Ω), \digamma (= ϝ), \mho (= ℧).

  • LaTeX commands producing a single symbol: \aleph (= ℵ), \beth (= ℶ), \ell (= ℓ), \hbar (= ℏ), \Im (= ℑ), \imath (= ı), \jmath (= ȷ), \eth (= ð), \Re (= ℜ), \wp (= ℘), \emptyset (= \varnothing = ∅).

  • LaTeX commands \$, \%, \& producing the ASCII characters $, % and &.

You can use the \mi{...} command to force TeXZilla to interpret the argument as an identifier.

Operators

The following commands or Unicode characters are interpreted as operators (<mo> in MathML):

  • In general, all the sequences of characters from the MathML operator dictionary (modulo some changes to align with itex2MML commands below) and their associated AMS LaTeX commands to produce them, if any. This includes in particular some ASCII characters like +, -, !, = etc. Note however that the special backslash and brace operators should be written with the commands \backslash \{ \}.

  • Delimiters: (, ), [, ], /, \lbrace (= \{), \rbrace (= \}), \langle (= \lang = ), \rangle (= \rang = ), \llangle (= ), \rrangle (= ), \lceil (= ), \rceil (= ), \lmoustache (= ), \rmoustache (= ), \lfloor (= ), \rfloor (= ), \uparrow (= ), \downarrow (= ), \updownarrow (= ), \vert (= |), \Vert (= \| = ).

  • Arrows: \rightarrow (= \to = ), \longrightarrow (= ), \Rightarrow (= \implies = ), \hookrightarrow (= \embedsin = ), \mapsto (= \map = ), \leftarrow (= ), \longleftarrow (= ), \Leftarrow (= \impliedby = ), \hookleftarrow (= ), \leftrightarrow (= ), \Leftrightarrow (= ), \Longleftrightarrow (= \iff = ), \nearrow (= \nearr = ), \nwarrow (= \nwarr = ), \searrow (= \searr = ), \swarrow (= \swarr = ), \neArrow (= \neArr = ), \nwArrow (= \nwArr = ), \seArrow (= \seArr = ), \swArrow (= \swArr = ), \darr (= ), \Downarrow (= ), \uparr (= ), \Uparrow (= ), \downuparrow (= \duparr = \updarr = ), \Updownarrow (= ), \leftsquigarrow (= ), \rightsquigarrow (= ), \leftrightsquigarrow (= ), \upuparrows (= ), \rightleftarrows (= ), \rightrightarrows (= ), \dashleftarrow (= ), \dashrightarrow (= ), \curvearrowleft (= ), \curvearrowbotright (= ), \downdownarrows (= ), \leftleftarrows (= ), \leftrightarrows (= ), \righttoleftarrow (= ), \lefttorightarrow (= ), \circlearrowleft (= ), \circlearrowright (= ), \curvearrowright (= ), \leftarrowtail (= ), \rightarrowtail (= ), \leftrightsquigarrow (= ), \Lleftarrow (= ), \Rrightarrow (= ), \looparrowleft (= ), \looparrowright (= ), \Lsh (= ), \Rsh (= ), \twoheadleftarrow (= ), \twoheadrightarrow (= ), \nLeftarrow (= ), \nleftarrow (= ), \nLeftrightarrow (= ), \nleftrightarrow (= ), \nRightarrow (= ), \nrightarrow (= ), \leftharpoonup (= ), \leftharpoondown (= ), \rightharpoonup (= ), \rightharpoondown (= ), \downharpoonleft (= ), \downharpoonright (= ), \leftrightharpoons (= ), \rightleftharpoons (= ), \upharpoonleft (= ), \upharpoonright (= ).

  • Miscellaneous operators: \amalg (= ⨿), \angle (= ), \measuredangle (= ), \sphericalangle (= ), \approx (= ), \approxeq (= ), \thickapprox (= ), \ast (= ), \asymp (= ), \backslash, \because (= ), \between (= ), \bottom (= \bot = ), \boxminus (= \minusb = ), \boxplus (= \plusb = ), \boxtimes (= \timesb = ), \boxdot (= ), \bowtie (= ), \bullet (= ), \cap (= \intersection = ), \cup (= \union = ), \Cap (= ), \Cup (= ), \cdot (= ), \circledast (= ), \circledcirc (= ), \clubsuit (= ), \curlyvee (= ), \curlywedge (= ), \diamondsuit (= ), \divideontimes (= ), \dotplus (= ), \heartsuit (= ), \spadesuit (= ), \circ (= ), \bigcirc (= ), \cong (= ), \ncong (= ), \dagger (= ), \ddagger (= ), \dashv (= ), \Vdash (= ), \vDash (= ), \nvDash (= ), \VDash (= ), \nVDash (= ), \vdash (= ), \nvdash (= ), \Vvdash (= ), \Diamond (= ), \diamond (= ), \div (= ÷), \equiv (= ), \nequiv (= ), \eqcirc (= ), \neq (= \ne = ), \Bumpeq (= ), \bumpeq (= ), \circeq (= ), \doteq (= ), \doteqdot (= ), \fallingdotseq (= ), \risingdotseq (= ), \exists (= ), \nexists (= ), \flat (= ), \forall (= ), \frown (= ), \smallfrown (= ), \gt (= >), \ngtr (= ), \gg (= ), \ggg (= ), \geq (= \ge = ), \ngeq (= ), \geqq (= ), \ngeqq (= ⩾̸), \geqslant (= ), \ngeqslant (= ⩾̸), \eqslantgtr (= ), \gneq (= ), \gneqq (= ), \gnapprox (= ), \gnsim (= ), \gtrapprox (= ), \gtrsim (= ), \gtrdot (= ), \gtreqless (= ), \gtreqqless (= ), \gtrless (= ), \gvertneqq (= ≩︀), \in (= ), \notin (= ), \ni (= ), \notni (= ), \intercal (= ), \invamp ( = \parr = ), \lhd (= ), \unlhd (= ), \leftthreetimes (= ), \rightthreetimes (= ), \lt (= <), \nless (= ), \ll (= ), \lll (= ), \leq ( = \le = ), \nleq (= ), \leqq (= ), \nleqq (= ⩽̸), \leqslant (= ), \nleqslant (= ⩽̸), \eqslantless (= ), \lessapprox (= ), \lessdot (= ), \lesseqgtr (= ), \lesseqqgtr (= ), \lessgtr (= ), \lesssim (= ), \lnapprox (= ), \lneq (= ), \lneqq (= ), \lnsim (= ), \ltimes (= ), \lvertneqq (= ≨︀), \lozenge (= ), \blacklozenge (= ), \mid ( = \shortmid = ), \nmid (= ), 
\nshortmid (= ), \models (= ), \multimap (= ), \nabla ( = \Del = ), \natural (= ), \not ( = \neg = ¬), \odot (= ), \odash ( = \circleddash = ), \otimes (= ), \oplus (= ), \ominus (= ), \oslash (= ), \parallel (= ), \nparallel (= ), \shortparallel (= ), \nshortparallel (= ), \partial (= ), \Perp ( = \Vbar = ), \perp (= ), \pitchfork (= ), \pm (= ±), \mp (= ), \prec (= ), \nprec (= ), \precapprox (= ), \precnapprox (= ), \preceq (= ), \npreceq (= ⪯̸), \preccurlyeq (= ), \curlyeqprec (= ), \precsim (= ), \precnsim (= ), \propto (= ), \varpropto (= ), \rhd (= ), \unrhd (= ), \rtimes (= ), \setminus (= ), \smallsetminus (= ), \sharp (= ), \sim (= ), \nsim (= ), \backsim (= ), \simeq (= ), \backsimeq (= ), \thicksim (= ), \smile (= ), \smallsmile (= ), \sslash (= ), \subset (= ), \nsubset (= ), \subseteq (= ), \nsubseteq (= ), \subseteqq (= ), \nsubseteqq (= ), \subsetneq (= ), \subsetneqq (= ), \varsubsetneq (= ⊊︀), \varsubsetneqq (= ⫋︀), \Subset (= ), \succ (= ), \nsucc (= ), \succeq (= ), \nsucceq (= ⪰̸), \succapprox (= ), \succnapprox (= ), \succcurlyeq (= ), \curlyeqsucc (= ), \succsim (= ), \succnsim (= ), \supset (= ), \nsupset (= ), \supseteq (= ), \nsupseteq (= ), \supseteqq (= ), \supsetneq (= ), \supsetneqq (= ), \varsupsetneq (= ⊋︀), \varsupsetneqq (= ⫌︀), \Supset (= ), \square ( = \Box = □)), \blacksquare (= \qed = ), \sqcup (= ), \sqcap (= ), \sqsubset (= ), \sqsubseteq (= ), \sqsupset (= ), \sqsupseteq (= ), \star (= ), \bigstar (= ), \therefore (= ), \times (= ×), \top (= ), \triangle (= ), \triangledown (= ), \triangleleft (= ), \triangleright (= ), \blacktriangle (= ), \blacktriangledown (= ), \bigtriangleup (= ), \bigtriangledown (= ), \blacktriangleleft (= ), \blacktriangleright (= ), \ntriangleleft (= ), \ntriangleright (= ), \ntrianglelefteq (= ), \ntrianglerighteq (= ), \trianglelefteq (= ), \trianglerighteq (= ), \triangleq (= ), \vartriangleleft (= ), \vartriangleright (= ), \uplus (= ), \vee (= ), \veebar (= ), \wedge (= ), \barwedge (= ), 
\doublebarwedge (= ), \wr (= ), \coloneqq (= ), \Coloneqq (= ), \coloneq (= ), \Coloneq (= ∷−), \eqqcolon (= ), \Eqqcolon (= =∷), \eqcolon (= ), \Eqcolon (= −∷), \colonapprox (= ∶≈), \Colonapprox (= ∷≈), \colonsim (= ∶∼), \Colonsim (= ∷∼), \dblcolon (= ).

  • Dots: \dots (= \ldots = …), \cdots (= ⋯), \ddots (= ⋱), \udots (= ⋰), \vdots (= ⋮), \colon (= :).

  • Large Math Operators and Integrals: \bigcup (= \Union = ⋃), \bigcap (= \Intersection = ⋂), \bigodot (= ⨀), \bigoplus (= \Oplus = ⨁), \bigotimes (= \Otimes = ⨂), \bigsqcup (= ⨆), \bigsqcap (= ⨅), \biginterleave (= ⫼), \biguplus (= ⨄), \bigwedge (= \Wedge = ⋀), \bigvee (= \Vee = ⋁), \coprod (= \coproduct = ∐), \prod (= \product = ∏), \sum (= ∑), \int (= \integral = ∫), \iint (= \doubleintegral = ∬), \iiint (= \tripleintegral = ∭), \iiiint (= \quadrupleintegral = ⨌), \oint (= \conint = \contourintegral = ∮).

You can use the \mo{...} command to force TeXZilla to interpret the argument as an operator. Note that the spacing of operators is taken from the MathML operator dictionary, and the default might not be good for custom operators. Hence you might instead want to try the commands \operatorname{...}, \mathop{...}, \mathbin{...} and \mathrel{...} to define operators with spacing.

Main differences with itex2MML

You are invited to take a look at the itex2MML commands to get a good summary of what is supported. The main differences are the following:

  • By default, TeXZilla follows the LaTeX convention that xy is interpreted as two variable names, whereas itex2MML instead focuses on making sin a single function name. However, you can use setItexIdentifierMode to change TeXZilla's behavior and align it with itex2MML's. See also what is indicated on the itex2MML page.

  • TeXZilla was first designed to be used in JavaScript programs, while itex2MML was designed to be used as a stream filter for HTML markup. As a consequence, some special characters used in HTML markup might be interpreted differently. For example, the itex2MML page indicates that a < b is not supported, while it is accepted by TeXZilla. Similarly, the \begin{svg} ... \end{svg} environment (and its associated \includegraphics command), which allows embedding SVG markup, is not supported by TeXZilla.

  • The space commands \rlap, \llap, \ulap and \dlap are not supported by TeXZilla. Note that they are not mentioned on the itex2MML page, where the \mathrlap, \mathllap and \mathclap commands are preferred.

  • The <maction> commands \fghilight, \fghighlight, \bghilight, and \bghighlight are not supported by TeXZilla. Note that they generate actiontype attributes that are not mentioned in the MathML3 specification and a fortiori not supported by MathML rendering engines.

  • TeXZilla supports arbitrary Unicode characters and tries to interpret their semantics using the information from the W3C's unicode.xml file and other custom rules (falling back to <mtext> otherwise). You can always override these semantics using the commands \mi (mathematical identifier), \mn (number), \mo (operator), \ms (string) and \mtext (text). These are not supported by itex2MML, which operates on ASCII input instead.
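
To make the first difference above more concrete, here is a toy model of the two identifier modes. It is only a sketch: the functions splitIdentifiers and toMi are hypothetical and do not call TeXZilla. The call signature of setItexIdentifierMode is not described on this page, so the mode is modeled here as a plain boolean argument.

```javascript
// Toy model of the two identifier modes; not TeXZilla's implementation.
// itexIdentifierMode = false: each letter is its own variable (LaTeX style).
// itexIdentifierMode = true: a run of letters is one identifier (itex2MML style).
function splitIdentifiers(letters, itexIdentifierMode) {
  if (!/^[A-Za-z]+$/.test(letters)) {
    throw new TypeError("expected a nonempty run of ASCII letters");
  }
  return itexIdentifierMode ? [letters] : Array.from(letters);
}

// Wraps each identifier in an <mi> element, as a MathML serializer would.
function toMi(letters, itexIdentifierMode) {
  return splitIdentifiers(letters, itexIdentifierMode)
    .map((identifier) => `<mi>${identifier}</mi>`)
    .join("");
}

console.log(toMi("xy", false)); // <mi>x</mi><mi>y</mi>
console.log(toMi("sin", true)); // <mi>sin</mi>
```

In the default mode, a function name such as sin would be split into three variables, which is why LaTeX input normally uses a dedicated command such as \sin or \operatorname{...} for it.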