Unicode in Flask
================

Flask, like Jinja2 and Werkzeug, is totally Unicode-based when it comes to
text. The same is true of most other web-related Python libraries that deal
with text. If you don't know Unicode yet, you should probably read `The
Absolute Minimum Every Software Developer Absolutely, Positively Must Know
About Unicode and Character Sets
<http://www.joelonsoftware.com/articles/Unicode.html>`_. This part of the
documentation just tries to cover the very basics so that you have a
pleasant experience with Unicode-related things.

Automatic Conversion
--------------------

Flask makes a few assumptions about your application (which you can change,
of course) that give you basic and painless Unicode support:

- the encoding for text on your website is UTF-8
- internally you will always use Unicode exclusively for text, except
  for literal strings with only ASCII code points
- encoding and decoding happen whenever you are talking over a protocol
  that requires bytes to be transmitted

So what does this mean to you?

HTTP is based on bytes: not only the protocol itself, but also the system
used to address documents on servers (so-called URIs or URLs). However,
HTML, which is usually transmitted on top of HTTP, supports a large variety
of character sets, and the one in use is announced in an HTTP header. To
keep things simple, Flask assumes that if you are sending Unicode out, you
want it to be UTF-8 encoded. Flask will do the encoding and set the
appropriate headers for you.
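
As a rough illustration, here is a standard-library sketch of the two steps
involved: encoding the text to UTF-8 bytes and announcing that charset in
the ``Content-Type`` header. The helper ``to_http_body`` is hypothetical and
is not Flask's implementation. The ``text/html`` mimetype is an assumption
made for the example::

    # -*- coding: utf-8 -*-
    # Hypothetical helper, NOT part of Flask: it only mimics the two steps
    # Flask performs for you when a view returns Unicode text.
    def to_http_body(text, charset='utf-8', mimetype='text/html'):
        body = text.encode(charset)                      # Unicode -> bytes
        content_type = '%s; charset=%s' % (mimetype, charset)
        return body, content_type

    body, content_type = to_http_body(u'Hänsel und Gretel')
    # body == b'H\xc3\xa4nsel und Gretel'
    # content_type == 'text/html; charset=utf-8'

In a real view you just return the Unicode string and let Flask's response
object do this work.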

The same is true if you are talking to databases with the help of
SQLAlchemy or a similar ORM system. Some databases have a protocol that
already transmits Unicode, and if they do not, SQLAlchemy or your other ORM
should take care of that.
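
The standard library's :mod:`sqlite3` module is one example of a driver that
already deals in Unicode. It accepts Unicode strings as query parameters
and returns text columns as Unicode. The following is a minimal sketch that
uses an in-memory database. The ``books`` table is made up for
illustration::

    # -*- coding: utf-8 -*-
    import sqlite3

    conn = sqlite3.connect(':memory:')   # throwaway in-memory database
    conn.execute('CREATE TABLE books (title TEXT)')
    # Pass Unicode in; no manual .encode() call is needed.
    conn.execute('INSERT INTO books (title) VALUES (?)',
                 (u'Hänsel und Gretel',))
    title = conn.execute('SELECT title FROM books').fetchone()[0]
    conn.close()
    # title comes back as Unicode text, not as bytes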

The Golden Rule
---------------

So the rule of thumb is: if you are not dealing with binary data, work with
Unicode. What does working with Unicode in Python 2.x mean?

- as long as you are using only ASCII code points (basically digits,
  some special characters, and Latin letters without umlauts or anything
  fancy), you can use regular string literals (``'Hello World'``).
- if you need anything other than ASCII in a string, you have to mark
  the string as a Unicode string by prefixing it with a lowercase ``u``
  (like ``u'Hänsel und Gretel'``).
- if you are using non-ASCII characters in your Python files, you have
  to tell Python which encoding your file uses. Again, I recommend
  UTF-8 for this purpose. To tell the interpreter your encoding, you can
  put ``# -*- coding: utf-8 -*-`` on the first or second line of
  your Python source file.
- Jinja is configured to decode template files from UTF-8, so make
  sure to tell your editor to save your templates as UTF-8 as well.
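
The rules above can be seen together in one small module. Python 3 treats
every string literal as Unicode and accepts the ``u`` prefix again from
version 3.3 on, so the snippet also runs there. The variable names are
purely illustrative::

    # -*- coding: utf-8 -*-
    # The coding cookie above must be on the first or second line.

    greeting = 'Hello World'         # ASCII only: a plain literal is fine
    title = u'Hänsel und Gretel'     # non-ASCII: mark it with the u prefix

    # Keep text as Unicode internally; encode only at a byte boundary.
    encoded = title.encode('utf-8')
    # len(title) == 17 characters, len(encoded) == 18 bytes ('ä' takes two)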

Encoding and Decoding Yourself
------------------------------

If you are talking to a filesystem or something else that is not really
based on Unicode, you have to make sure that you decode properly when
working with a Unicode interface. For example, if you want to load a file
from the filesystem and embed it into a Jinja2 template, you have to decode
it from the encoding of that file. This is where the old problem comes into
play: text files do not specify their encoding. So do yourself a favour and
limit yourself to UTF-8 for text files as well.
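
Guessing the encoding is a problem, and decoding the same bytes with the
right and with the wrong charset shows why. Nothing raises an error: the
wrong guess silently produces garbled text. Latin-1 is just an assumed
example of a wrong guess::

    # -*- coding: utf-8 -*-
    raw = u'Hänsel'.encode('utf-8')    # the bytes as they sit on disk

    right = raw.decode('utf-8')        # u'Hänsel'
    wrong = raw.decode('latin-1')      # u'HÃ¤nsel': no error, just garbage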

To load such a file as Unicode, you can use the built-in
:meth:`str.decode` method::

    def read_file(filename, charset='utf-8'):
        # Open in binary mode to get the raw bytes, then decode them.
        with open(filename, 'rb') as f:
            return f.read().decode(charset)

To go from Unicode into a specific charset such as UTF-8, you can use the
:meth:`unicode.encode` method::

    def write_file(filename, contents, charset='utf-8'):
        # Encode the Unicode text to bytes and write them in binary mode.
        with open(filename, 'wb') as f:
            f.write(contents.encode(charset))
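
You do not have to decode and encode by hand. :func:`io.open` is available
since Python 2.6 and is the same function as the built-in :func:`open` on
Python 3. It takes an ``encoding`` argument and performs both steps for you.
The following sketch implements the same pair of helpers under different,
hypothetical names and runs a round trip through a temporary file::

    import io
    import os
    import tempfile

    def read_text(filename, charset='utf-8'):
        # io.open decodes the bytes while reading and returns Unicode.
        with io.open(filename, 'r', encoding=charset) as f:
            return f.read()

    def write_text(filename, contents, charset='utf-8'):
        # contents must be Unicode; io.open encodes it while writing.
        with io.open(filename, 'w', encoding=charset) as f:
            f.write(contents)

    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        write_text(path, u'H\xe4nsel und Gretel')
        text = read_text(path)
        with open(path, 'rb') as f:
            raw = f.read()             # the UTF-8 bytes actually on disk
    finally:
        os.remove(path)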

Configuring Editors
-------------------

Most editors save as UTF-8 by default nowadays, but in case your editor is
not configured to do this, you have to change it. Here are some common ways
to set your editor to store files as UTF-8:

- Vim: put ``set enc=utf-8`` into your ``.vimrc`` file.

- Emacs: either use an encoding cookie or put this into your ``.emacs``
  file::

      (prefer-coding-system 'utf-8)
      (setq default-buffer-file-coding-system 'utf-8)

- Notepad++:

  1. Go to *Settings -> Preferences ...*
  2. Select the "New Document/Default Directory" tab
  3. Select "UTF-8 without BOM" as encoding

  It is also recommended to use the Unix newline format. You can select
  it in the same panel, but this is not a requirement.