<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Unicode and Encodings — Pygments</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style type="text/css">
body {
    background-color: #f2f2f2;
    margin: 0;
    padding: 0;
    font-family: 'Georgia', serif;
    color: #111;
}
#content {
    background-color: white;
    padding: 20px;
    margin: 20px auto 20px auto;
    max-width: 800px;
    border: 4px solid #ddd;
}
h1 {
    font-weight: normal;
    font-size: 40px;
    color: #09839A;
}
h2 {
    font-weight: normal;
    font-size: 30px;
    color: #C73F00;
}
h1.heading {
    margin: 0 0 30px 0;
}
h2.subheading {
    margin: -30px 0 0 45px;
}
h3 {
    margin-top: 30px;
}
table.docutils {
    border-collapse: collapse;
    border: 2px solid #aaa;
    margin: 0.5em 1.5em 0.5em 1.5em;
}
table.docutils td {
    padding: 2px;
    border: 1px solid #ddd;
}
p, li, dd, dt, blockquote {
    font-size: 15px;
    color: #333;
}
p {
    line-height: 150%;
    margin-bottom: 0;
    margin-top: 10px;
}
hr {
    border-top: 1px solid #ccc;
    border-bottom: 0;
    border-right: 0;
    border-left: 0;
    margin-bottom: 10px;
    margin-top: 20px;
}
dl {
    margin-left: 10px;
}
li, dt {
    margin-top: 5px;
}
dt {
    font-weight: bold;
}
th {
    text-align: left;
}
a {
    color: #990000;
}
a:hover {
    color: #c73f00;
}
pre {
    background-color: #f9f9f9;
    border-top: 1px solid #ccc;
    border-bottom: 1px solid #ccc;
    padding: 5px;
    font-size: 13px;
    font-family: Bitstream Vera Sans Mono,monospace;
}
tt {
    font-size: 13px;
    font-family: Bitstream Vera Sans Mono,monospace;
    color: black;
    padding: 1px 2px 1px 2px;
    background-color: #f0f0f0;
}
cite {
    /* abusing <cite>, it's generated by ReST for `x` */
    font-size: 13px;
    font-family: Bitstream Vera Sans Mono,monospace;
    font-weight: bold;
    font-style: normal;
}
#backlink {
    float: right;
    font-size: 11px;
    color: #888;
}
div.toc {
    margin: 0 0 10px 0;
}
div.toc h2 {
    font-size: 20px;
}
.syntax .hll { background-color: #ffffcc }
.syntax  { background: #ffffff; }
.syntax .c { color: #888888 } /* Comment */
.syntax .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.syntax .k { color: #008800; font-weight: bold } /* Keyword */
.syntax .cm { color: #888888 } /* Comment.Multiline */
.syntax .cp { color: #cc0000; font-weight: bold } /* Comment.Preproc */
.syntax .c1 { color: #888888 } /* Comment.Single */
.syntax .cs { color: #cc0000; font-weight: bold; background-color: #fff0f0 } /* Comment.Special */
.syntax .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.syntax .ge { font-style: italic } /* Generic.Emph */
.syntax .gr { color: #aa0000 } /* Generic.Error */
.syntax .gh { color: #333333 } /* Generic.Heading */
.syntax .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.syntax .go { color: #888888 } /* Generic.Output */
.syntax .gp { color: #555555 } /* Generic.Prompt */
.syntax .gs { font-weight: bold } /* Generic.Strong */
.syntax .gu { color: #666666 } /* Generic.Subheading */
.syntax .gt { color: #aa0000 } /* Generic.Traceback */
.syntax .kc { color: #008800; font-weight: bold } /* Keyword.Constant */
.syntax .kd { color: #008800; font-weight: bold } /* Keyword.Declaration */
.syntax .kn { color: #008800; font-weight: bold } /* Keyword.Namespace */
.syntax .kp { color: #008800 } /* Keyword.Pseudo */
.syntax .kr { color: #008800; font-weight: bold } /* Keyword.Reserved */
.syntax .kt { color: #888888; font-weight: bold } /* Keyword.Type */
.syntax .m { color: #0000DD; font-weight: bold } /* Literal.Number */
.syntax .s { color: #dd2200; background-color: #fff0f0 } /* Literal.String */
.syntax .na { color: #336699 } /* Name.Attribute */
.syntax .nb { color: #003388 } /* Name.Builtin */
.syntax .nc { color: #bb0066; font-weight: bold } /* Name.Class */
.syntax .no { color: #003366; font-weight: bold } /* Name.Constant */
.syntax .nd { color: #555555 } /* Name.Decorator */
.syntax .ne { color: #bb0066; font-weight: bold } /* Name.Exception */
.syntax .nf { color: #0066bb; font-weight: bold } /* Name.Function */
.syntax .nl { color: #336699; font-style: italic } /* Name.Label */
.syntax .nn { color: #bb0066; font-weight: bold } /* Name.Namespace */
.syntax .py { color: #336699; font-weight: bold } /* Name.Property */
.syntax .nt { color: #bb0066; font-weight: bold } /* Name.Tag */
.syntax .nv { color: #336699 } /* Name.Variable */
.syntax .ow { color: #008800 } /* Operator.Word */
.syntax .w { color: #bbbbbb } /* Text.Whitespace */
.syntax .mf { color: #0000DD; font-weight: bold } /* Literal.Number.Float */
.syntax .mh { color: #0000DD; font-weight: bold } /* Literal.Number.Hex */
.syntax .mi { color: #0000DD; font-weight: bold } /* Literal.Number.Integer */
.syntax .mo { color: #0000DD; font-weight: bold } /* Literal.Number.Oct */
.syntax .sb { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Backtick */
.syntax .sc { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Char */
.syntax .sd { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Doc */
.syntax .s2 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Double */
.syntax .se { color: #0044dd; background-color: #fff0f0 } /* Literal.String.Escape */
.syntax .sh { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Heredoc */
.syntax .si { color: #3333bb; background-color: #fff0f0 } /* Literal.String.Interpol */
.syntax .sx { color: #22bb22; background-color: #f0fff0 } /* Literal.String.Other */
.syntax .sr { color: #008800; background-color: #fff0ff } /* Literal.String.Regex */
.syntax .s1 { color: #dd2200; background-color: #fff0f0 } /* Literal.String.Single */
.syntax .ss { color: #aa6600; background-color: #fff0f0 } /* Literal.String.Symbol */
.syntax .bp { color: #003388 } /* Name.Builtin.Pseudo */
.syntax .vc { color: #336699 } /* Name.Variable.Class */
.syntax .vg { color: #dd7700 } /* Name.Variable.Global */
.syntax .vi { color: #3333bb } /* Name.Variable.Instance */
.syntax .il { color: #0000DD; font-weight: bold } /* Literal.Number.Integer.Long */
</style>
</head>
<body>
<div id="content">
<h1 class="heading">Pygments</h1>
<h2 class="subheading">Unicode and Encodings</h2>
<a id="backlink" href="index.html">« Back To Index</a>
<p>Since Pygments 0.6, all lexers use Unicode strings internally. Because of that,
you might encounter the occasional <cite>UnicodeDecodeError</cite> if you pass strings in the
wrong encoding.</p>
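<p>The following Python 3 sketch, which uses only the standard library and not Pygments, shows where such an error comes from: bytes written in one encoding (here Latin-1) are decoded as another (here UTF-8). In Python 3 terms, byte strings are <cite>bytes</cite> objects and Unicode strings are <cite>str</cite> objects. The helper <cite>try_decode</cite> is hypothetical and only wraps the standard <cite>bytes.decode</cite> method.</p>
<div class="syntax"><pre>
# Python 3 sketch of the failure mode, using only the standard library.

def try_decode(data, encoding):
    """Hypothetical helper: return decoded text, or None if `data` is invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        print('cannot decode as %s: %s' % (encoding, exc.reason))
        return None

latin1_bytes = 'café'.encode('latin-1')   # the single byte 0xE9 encodes 'é'

# 0xE9 opens a multi-byte UTF-8 sequence that never completes, so this fails.
try_decode(latin1_bytes, 'utf-8')

# Decoding with the codec the bytes were written in round-trips cleanly.
print(try_decode(latin1_bytes, 'latin-1'))
</pre></div>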
<p>By default, all lexers have their input encoding set to <cite>latin1</cite>.
If you pass a lexer a byte string (a <cite>str</cite> object, not a <cite>unicode</cite> object), it tries to decode the data
using this encoding.
You can override the encoding with the <cite>encoding</cite> lexer option. If you have the
<a class="reference external" href="http://chardet.feedparser.org/">chardet</a> library installed and set the encoding to <tt class="docutils literal">chardet</tt>, the lexer will analyse
the text and automatically use the encoding it thinks is the right one:</p>
<div class="syntax"><pre><span class="kn">from</span> <span class="nn">pygments.lexers</span> <span class="kn">import</span> <span class="n">PythonLexer</span>
<span class="n">lexer</span> <span class="o">=</span> <span class="n">PythonLexer</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s">'chardet'</span><span class="p">)</span>
</pre></div>
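<p>The input side can be pictured with a small hypothetical helper, <cite>decode_input</cite>, which mimics the behavior described above under the assumption that Unicode input passes through unchanged and byte input is decoded with the lexer's <cite>encoding</cite> option. It is a Python 3 sketch, not Pygments' implementation, and it does not reproduce the <tt class="docutils literal">chardet</tt> detection. It does show one consequence of the <cite>latin1</cite> default: Latin-1 maps every byte to some character, so UTF-8 input decoded with it produces garbled text rather than an error.</p>
<div class="syntax"><pre>
# Hypothetical sketch of the input step described above; this is not
# Pygments' actual implementation.

def decode_input(data, encoding='latin1'):
    if isinstance(data, str):
        return data                 # already Unicode: nothing to decode
    return data.decode(encoding)    # bytes: decode with the configured encoding

utf8_bytes = 'café'.encode('utf-8')         # b'caf\xc3\xa9'

# latin1 accepts any byte, so the wrong guess does not raise;
# it silently yields mojibake instead.
print(decode_input(utf8_bytes))             # cafÃ©
print(decode_input(utf8_bytes, 'utf-8'))    # café
print(decode_input('café'))                 # café
</pre></div>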
<p>The best way is to pass Pygments Unicode objects. In that case no decoding step is
needed, so a wrong input encoding cannot produce unexpected output.</p>
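<p>In practice this usually means decoding once, at the point where you know the file's encoding, and passing the resulting text on. A minimal Python 3 sketch, assuming a UTF-8 source file; <cite>read_source</cite> is a hypothetical helper, and the temporary file only exists to make the example self-contained.</p>
<div class="syntax"><pre>
# Sketch: decode at the boundary where the encoding is known, then hand
# Pygments a Unicode (str) object. 'utf-8' is an assumed file encoding.
import os
import tempfile

def read_source(path, encoding='utf-8'):
    """Hypothetical helper: read a file as text using a known encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()             # str: no further decoding needed

# Demo: write a small UTF-8 file, then read it back as text.
fd, path = tempfile.mkstemp(suffix='.py')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write("name = 'café'\n")

code = read_source(path)
os.remove(path)
print(type(code).__name__, repr(code))
# `code` could now be passed on, e.g. pygments.highlight(code, lexer, formatter)
</pre></div>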
<p>If you don't set an output encoding, the formatters write Unicode objects to the
output stream. To set one, pass the formatter an <cite>encoding</cite> option:</p>
<div class="syntax"><pre><span class="kn">from</span> <span class="nn">pygments.formatters</span> <span class="kn">import</span> <span class="n">HtmlFormatter</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">HtmlFormatter</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>
</pre></div>
<p><strong>You have to set this option if the source contains non-ASCII characters
and the output stream does not accept Unicode written to it!</strong>
This is the case for all regular files and for terminals.</p>
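<p>In Python 3 terms, this situation corresponds to a stream opened in binary mode, which accepts <cite>bytes</cite> only. The sketch below uses <cite>io.BytesIO</cite> as a stand-in for such a file; <cite>write_highlighted</cite> is a hypothetical helper that encodes the formatted text first when an encoding is given, which is roughly the effect of setting the formatter's <cite>encoding</cite> option.</p>
<div class="syntax"><pre>
# Python 3 illustration: io.BytesIO stands in for a file opened with 'wb'.
import io

def write_highlighted(stream, text, encoding=None):
    """Hypothetical helper: encode `text` first if an encoding is given."""
    data = text if encoding is None else text.encode(encoding)
    stream.write(data)

out = io.BytesIO()
try:
    write_highlighted(out, 'café')                  # str into a byte stream
except TypeError as exc:
    print('rejected:', exc)

write_highlighted(out, 'café', encoding='utf-8')    # encoded first: accepted
print(out.getvalue())                               # b'caf\xc3\xa9'
</pre></div>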
<p>Note: The Terminal formatter tries to be smart: if its output stream has an
<cite>encoding</cite> attribute and you haven't set the option, it encodes any
Unicode string with that encoding before writing it. This is the case for
<cite>sys.stdout</cite>, for example. The other formatters don't have this behavior.</p>
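<p>That choice can be summarized as a small decision rule. The following is a hypothetical Python 3 sketch of the rule, not the formatter's actual code; <cite>io.TextIOWrapper</cite> stands in for a text stream such as <cite>sys.stdout</cite>, and <cite>io.BytesIO</cite> for a stream without an <cite>encoding</cite> attribute.</p>
<div class="syntax"><pre>
# Hypothetical sketch of the decision described above (not Pygments' code):
# an explicitly set option wins; otherwise fall back to the stream's own
# `encoding` attribute, if it has one.
import io

def pick_output_encoding(stream, option=None):
    if option is not None:
        return option
    return getattr(stream, 'encoding', None)    # None: write Unicode as-is

text_stream = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')
byte_stream = io.BytesIO()                       # has no `encoding` attribute

print(pick_output_encoding(text_stream))             # utf-8
print(pick_output_encoding(byte_stream))             # None
print(pick_output_encoding(text_stream, 'latin1'))   # latin1
</pre></div>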
<p>Another note: if you call Pygments from the command line (<cite>pygmentize</cite>),
encoding is handled differently; see <a class="reference external" href="./cmdline.html">the command line docs</a>.</p>
<p><em>New in Pygments 0.7</em>: the formatters also accept an <cite>outencoding</cite> option,
which overrides the <cite>encoding</cite> option if given. This makes it possible to
use a single options dict for both lexers and formatters while still having different
input and output encodings.</p>
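<p>A hypothetical sketch of that precedence rule, assuming the lexer reads only <cite>encoding</cite> (with the <cite>latin1</cite> default described above) and the formatter prefers <cite>outencoding</cite> over <cite>encoding</cite>. The function names are illustrative and not part of Pygments.</p>
<div class="syntax"><pre>
# Hypothetical sketch (not Pygments' code): one shared options dict,
# two different encodings depending on which side reads it.

def lexer_encoding(options):
    return options.get('encoding', 'latin1')     # lexer default per this page

def formatter_encoding(options):
    if options.get('outencoding') is not None:
        return options['outencoding']            # outencoding wins if given
    return options.get('encoding')               # None: emit Unicode

opts = {'encoding': 'latin1', 'outencoding': 'utf-8'}
print(lexer_encoding(opts), formatter_encoding(opts))   # latin1 utf-8
print(formatter_encoding({'encoding': 'utf-8'}))        # utf-8
</pre></div>
<p>With Pygments itself, the same dict would simply be passed to both objects, for example <tt class="docutils literal">PythonLexer(**opts)</tt> and <tt class="docutils literal">HtmlFormatter(**opts)</tt>.</p>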
</div>
</body>
<!-- generated on: 2013-01-09 17:48:43.988472
file id: unicode -->
</html>