====================
Chapter 19: Security
====================

The Internet can be a scary place.

These days, high-profile security gaffes seem to crop up on a daily basis. We've
seen viruses spread with amazing speed, swarms of compromised computers wielded as
weapons, a never-ending arms race against spammers, and many, many reports of
identity theft from hacked Web sites.

As Web developers, we have a duty to do what we can to combat these forces
of darkness. Every Web developer needs to treat security as a fundamental
aspect of Web programming. Unfortunately, it turns out that implementing
security is *hard* -- attackers need to find only a single vulnerability, but
defenders have to guard against every single one.

Django attempts to mitigate this difficulty. It's designed to automatically
protect you from many of the common security mistakes that new (and even
experienced) Web developers make. Still, it's important to understand what
these problems are, how Django protects you, and -- most important -- the
steps you can take to make your code even more secure.

First, though, an important disclaimer: We do not intend to present a
definitive guide to every known Web security exploit, and so we won't try to
explain each vulnerability in a comprehensive manner. Instead, we'll give a
short synopsis of security problems as they apply to Django.

The Theme of Web Security
=========================

If you learn only one thing from this chapter, let it be this:

    Never -- under any circumstances -- trust data from the browser.

You *never* know who's on the other side of that HTTP connection. It might be
one of your users, but it just as easily could be a nefarious cracker looking
for an opening.

Any data of any nature that comes from the browser needs to be treated with a
healthy dose of paranoia. This includes data that's both "in band" (i.e.,
submitted from Web forms) and "out of band" (i.e., HTTP headers, cookies,
and other request information). It's trivial to spoof the request metadata that
browsers usually add automatically.

Every one of the vulnerabilities discussed in this chapter stems directly from
trusting data that comes over the wire and then failing to sanitize that data
before using it. You should make it a general practice to continuously ask,
"Where does this data come from?"

SQL Injection
=============

*SQL injection* is a common exploit in which an attacker alters Web page
parameters (such as ``GET``/``POST`` data or URLs) to insert arbitrary SQL
snippets that a naive Web application executes in its database directly. It's
probably the most dangerous -- and, unfortunately, one of the most common --
vulnerabilities out there.

This vulnerability most commonly crops up when constructing SQL "by hand" from
user input. For example, imagine writing a function to gather a list of
contact information from a contact search page. To prevent spammers from reading
every single email in our system, we'll force the user to type in someone's
username before providing her email address::

    def user_contacts(request):
        username = request.GET['username']
        sql = "SELECT * FROM user_contacts WHERE username = '%s';" % username
        # execute the SQL here...

.. note::

    In this example, and all similar "don't do this" examples that follow,
    we've deliberately left out most of the code needed to make the functions
    actually work. We don't want this code to work if someone accidentally
    takes it out of context.

Though at first this doesn't look dangerous, it really is.

First, our attempt at protecting our entire email list will fail with a
cleverly constructed query. Think about what happens if an attacker types
``"' OR 'a'='a"`` into the query box. In that case, string interpolation will
construct this query::

    SELECT * FROM user_contacts WHERE username = '' OR 'a'='a';

Because we allowed unsecured SQL into the string, the attacker's added ``OR``
clause ensures that every single row is returned.

However, that's the *least* scary attack. Imagine what will happen if the
attacker submits ``"'; DELETE FROM user_contacts WHERE 'a' = 'a"``. We'll end
up with this complete query::

    SELECT * FROM user_contacts WHERE username = ''; DELETE FROM user_contacts WHERE 'a' = 'a';

Yikes! Where'd our contact list go?

The Solution
------------

Although this problem is insidious and sometimes hard to spot, the solution is
simple: *never* trust user-submitted data, and *always* escape it when passing
it into SQL.

The Django database API does this for you. It automatically escapes all
special SQL parameters, according to the quoting conventions of the database
server you're using (e.g., PostgreSQL or MySQL).

For example, in this API call::

    Foo.objects.filter(bar__exact="' OR 1=1")

Django will escape the input accordingly, resulting in a statement like this::

    SELECT * FROM foos WHERE bar = '\' OR 1=1'

Completely harmless.

This applies to the entire Django database API, with a couple of exceptions:

* The ``where`` argument to the ``extra()`` method (see Appendix C).
  That parameter accepts raw SQL by design.

* Queries done "by hand" using the lower-level database API.

In both of these cases, it's easy to keep yourself protected: avoid string
interpolation in favor of passing in *bind parameters*. That is, the example
we started with should be written as follows::

    from django.db import connection

    def user_contacts(request):
        username = request.GET['username']
        sql = "SELECT * FROM user_contacts WHERE username = %s;"
        cursor = connection.cursor()
        cursor.execute(sql, [username])
        # ... do something with the results

The low-level ``execute`` method takes a SQL string with ``%s`` placeholders
and automatically escapes and inserts parameters from the list passed as the
second argument. You should *always* construct custom SQL this way.
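
To see the difference concretely, here is a minimal sketch that uses Python 3's
built-in ``sqlite3`` module instead of Django's ``connection``, so it runs
without a Django project. The ``make_db``, ``lookup_unsafe``, and
``lookup_safe`` functions and the sample rows are hypothetical. Note that
``sqlite3`` spells its placeholder ``?`` rather than ``%s``; the placeholder
style depends on the database driver::

    import sqlite3


    def make_db():
        # In-memory database with a hypothetical user_contacts table.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE user_contacts (username TEXT, email TEXT)")
        conn.executemany(
            "INSERT INTO user_contacts VALUES (?, ?)",
            [("jacob", "jacob@example.com"), ("adrian", "adrian@example.com")],
        )
        return conn


    def lookup_unsafe(conn, username):
        # DON'T DO THIS: the user's input becomes part of the SQL text itself.
        sql = "SELECT email FROM user_contacts WHERE username = '%s'" % username
        return [row[0] for row in conn.execute(sql)]


    def lookup_safe(conn, username):
        # Bind parameter: sqlite3 binds the value separately from the SQL,
        # so quotes inside it can't change the structure of the query.
        sql = "SELECT email FROM user_contacts WHERE username = ?"
        return [row[0] for row in conn.execute(sql, [username])]


    conn = make_db()
    attack = "' OR 'a'='a"
    print(lookup_unsafe(conn, attack))  # both rows: the injected OR matches everything
    print(lookup_safe(conn, attack))    # []
    print(lookup_safe(conn, "jacob"))   # ['jacob@example.com']

The bind-parameter version treats the whole attack string as an ordinary
(nonexistent) username, so it matches nothing.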

Unfortunately, you can't use bind parameters everywhere in SQL; they're not
allowed as identifiers (i.e., table or column names). Thus, if you need to,
say, dynamically construct a list of tables from a ``POST`` variable, you'll
need to escape that name in your code. Django provides a function,
``django.db.backend.quote_name``, which will escape the identifier according
to the current database's quoting scheme.
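
When an identifier really must come from the request, a cautious approach is
to check it against an explicit list of allowed names *and* quote it. The
sketch below uses a hypothetical ``safe_table_name`` helper and
``ALLOWED_TABLES`` set; it is not Django's ``quote_name``, and it assumes
standard SQL double-quote identifier quoting (as in PostgreSQL and SQLite;
MySQL uses backticks by default)::

    ALLOWED_TABLES = {"user_contacts", "user_notes"}  # hypothetical allowlist


    def safe_table_name(name):
        # Reject anything that isn't explicitly allowed...
        if name not in ALLOWED_TABLES:
            raise ValueError("unknown table: %r" % name)
        # ...and quote it anyway, doubling any embedded double quotes
        # (standard SQL identifier quoting).
        return '"%s"' % name.replace('"', '""')


    sql = "SELECT COUNT(*) FROM %s" % safe_table_name("user_contacts")
    print(sql)  # SELECT COUNT(*) FROM "user_contacts"

The allowlist does the real work here; quoting is a second line of defense in
case the list ever contains an unusual name.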

Cross-Site Scripting (XSS)
==========================

*Cross-site scripting* (XSS) is found in Web applications that fail to
escape user-submitted content properly before rendering it into HTML. This
allows an attacker to insert arbitrary HTML into your Web page, usually in the
form of ``<script>`` tags.

Attackers often use XSS attacks to steal cookie and session information, or to trick
users into giving private information to the wrong person (aka *phishing*).

This type of attack can take a number of different forms and has almost
infinite permutations, so we'll just look at a typical example. Consider this
extremely simple "Hello, World" view::

    from django.shortcuts import render_to_response

    def say_hello(request):
        name = request.GET.get('name', 'world')
        return render_to_response("hello.html", {"name": name})

This view simply reads a name from a ``GET`` parameter and passes that name to
the ``hello.html`` template. We might write a template for this view as follows::

    <h1>Hello, {{ name }}!</h1>

So if we accessed ``http://example.com/hello/?name=Jacob``, the rendered page
would contain this::

    <h1>Hello, Jacob!</h1>

But wait -- what happens if we access
``http://example.com/hello/?name=<i>Jacob</i>``? Then we get this::

    <h1>Hello, <i>Jacob</i>!</h1>

Of course, an attacker wouldn't use something as benign as ``<i>`` tags; he
could include a whole set of HTML that hijacked your page with arbitrary
content. This type of attack has been used to trick users into entering data
into what looks like their bank's Web site, but in fact is an XSS-hijacked form
that submits their bank account information to an attacker.

The problem gets worse if you store this data in the database and later display
it on your site. For example, MySpace was once found to be vulnerable to an XSS
attack of this nature. A user inserted JavaScript into his profile that automatically
added him as your friend when you visited his profile page. Within a few days, he had
millions of friends.

Now, this may sound relatively benign, but keep in mind that this attacker
managed to get *his* code -- not MySpace's -- running on *your* computer. This
violates the assumed trust that all the code on MySpace is actually written
by MySpace.

MySpace was extremely lucky that this malicious code didn't automatically
delete viewers' accounts, change their passwords, flood the site with spam, or
any of the other nightmare scenarios this vulnerability unleashes.

The Solution
------------

The solution is simple: *always* escape *any* content that might have come
from a user. If we simply rewrite our template as follows::

    <h1>Hello, {{ name|escape }}!</h1>

then we're no longer vulnerable. You should *always* use the ``escape`` filter
(or something equivalent) when displaying user-submitted content on your site.
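
Outside a template, the same idea looks like the following sketch. It uses
Python 3's standard ``html.escape`` function as a stand-in for Django's
``escape`` filter; both replace ``<``, ``>``, ``&`` and quote characters with
HTML entities, though the exact entity spellings may differ. The
``render_hello`` function is hypothetical::

    import html


    def render_hello(name):
        # Hypothetical stand-in for the hello.html template: escape the
        # user-supplied value before it is placed into the markup.
        return "<h1>Hello, %s!</h1>" % html.escape(name)


    print(render_hello("Jacob"))
    # <h1>Hello, Jacob!</h1>
    print(render_hello("<script>alert('hi')</script>"))
    # <h1>Hello, &lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;!</h1>

The browser displays the escaped input as literal text instead of running it.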

.. admonition:: Why Doesn't Django Just Do This for You?

    Modifying Django to automatically escape all variables displayed in
    templates is a frequent topic of discussion on the Django developer
    mailing list.

    So far, Django's templates have avoided this behavior because it subtly
    changes what should be relatively straightforward behavior
    (displaying variables). It's a tricky issue and a difficult tradeoff to
    evaluate. Adding hidden implicit behavior is against Django's core ideals
    (and Python's, for that matter), but security is equally important.

    All this is to say, then, that there's a fair chance Django will grow
    some form of auto-escaping (or nearly auto-escaping) behavior in the
    future. It's a good idea to check the official Django documentation for the
    latest in Django features; it will always be more up to date than this book,
    especially the print edition.

    Even if Django does add this feature, however, you should *still* be in
    the habit of asking yourself, at all times, "Where does this data come from?" No
    automatic solution will ever protect your site from XSS attacks 100% of
    the time.

Cross-Site Request Forgery
==========================

Cross-site request forgery (CSRF) happens when a malicious Web site tricks users
into unknowingly loading a URL from a site at which they're already authenticated --
hence taking advantage of their authenticated status.

Django has built-in tools to protect from this kind of attack. Both the attack
itself and those tools are covered in great detail in `Chapter 14`_.

.. _Chapter 14: ../chapter14/

Session Forging/Hijacking
=========================

This isn't a specific attack, but rather a general class of attacks on a
user's session data. It can take a number of different forms:

* A *man-in-the-middle* attack, where an attacker snoops on session data
  as it travels over the wired (or wireless) network.

* *Session forging*, where an attacker uses a session ID
  (perhaps obtained through a man-in-the-middle attack) to pretend to be
  another user.

  An example of these first two would be an attacker in a coffee shop using
  the shop's wireless network to capture a session cookie. She could then use that
  cookie to impersonate the original user.

* A *cookie-forging* attack, where an attacker overrides the supposedly
  read-only data stored in a cookie. `Chapter 12`_ explains in detail how
  cookies work, and one of the salient points is that it's trivial for
  browsers and malicious users to change cookies without your knowledge.

  There's a long history of Web sites that have stored a cookie like
  ``IsLoggedIn=1`` or even ``LoggedInAsUser=jacob``. It's dead simple to
  exploit these types of cookies.

  On a more subtle level, though, it's never a good idea to trust anything
  stored in cookies; you never know who's been poking at them.

* *Session fixation*, where an attacker tricks a user into setting or
  resetting the user's session ID.

  For example, PHP allows session identifiers to be passed in the URL
  (e.g.,
  ``http://example.com/?PHPSESSID=fa90197ca25f6ab40bb1374c510d7a32``). An
  attacker who tricks a user into clicking a link with a hard-coded
  session ID will cause the user to pick up that session.

  Session fixation has been used in phishing attacks to trick users into entering
  personal information into an account the attacker owns. He can
  later log into that account and retrieve the data.

* *Session poisoning*, where an attacker injects potentially dangerous
  data into a user's session -- usually through a Web form that the user
  submits to set session data.

  A canonical example is a site that stores a simple user preference (like
  a page's background color) in a cookie. An attacker could trick a user
  into clicking a link to submit a "color" that actually contains an
  XSS attack; if that color isn't escaped, the attacker could again
  inject malicious code into the user's environment.

.. _Chapter 12: ../chapter12/

The Solution
------------

There are a number of general principles that can protect you from these attacks:

* Never allow session information to be contained in the URL.

  Django's session framework (see `Chapter 12`_) simply doesn't allow
  sessions to be contained in the URL.

* Don't store data in cookies directly; instead, store a session ID
  that maps to session data stored on the back-end.

  If you use Django's built-in session framework (i.e.,
  ``request.session``), this is handled automatically for you. The only
  cookie that the session framework uses is a single session ID; all the
  session data is stored in the database.

* Remember to escape session data if you display it in the template. See
  the earlier XSS section, and remember that it applies to any user-created
  content as well as any data from the browser. You should treat session
  information as being user created.

* Prevent attackers from spoofing session IDs whenever possible.

  Although it's nearly impossible to detect someone who's hijacked a
  session ID, Django does have built-in protection against a brute-force
  session attack. Session IDs are stored as hashes (instead of sequential
  numbers), which prevents a brute-force attack, and a user will always get
  a new session ID if she tries a nonexistent one, which prevents session
  fixation.
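
The second and fourth principles can be sketched in a few lines. This is a
simplified, hypothetical in-memory store, not Django's session framework: it
hands out unguessable IDs from Python 3's ``secrets`` module, keeps the data
on the server, and issues a fresh ID whenever a client presents one it doesn't
recognize::

    import secrets


    class SessionStore:
        """Hypothetical in-memory, server-side session store."""

        def __init__(self):
            self._data = {}  # session ID -> dict of session data

        def _new_session(self):
            # 32 random bytes, hex-encoded: neither sequential nor guessable.
            session_id = secrets.token_hex(32)
            self._data[session_id] = {}
            return session_id

        def load(self, session_id):
            # An unknown ID (guessed, expired, or planted by an attacker) is
            # never adopted; the client is given a brand-new session instead.
            if session_id not in self._data:
                session_id = self._new_session()
            return session_id, self._data[session_id]


    store = SessionStore()
    sid, data = store.load(None)  # no cookie yet, so a new session is created
    data["user"] = "jacob"        # stays on the server; only sid goes in the cookie

    planted = "fa90197ca25f6ab40bb1374c510d7a32"
    new_sid, new_data = store.load(planted)
    print(new_sid == planted)  # False
    print(new_data)            # {}

Because the planted ID is never accepted, a fixation link like the ``PHPSESSID``
example earlier gains the attacker nothing.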

Notice that none of those principles and tools prevents man-in-the-middle
attacks. These types of attacks are nearly impossible to detect. If your site
allows logged-in users to see any sort of sensitive data, you should *always*
serve that site over HTTPS. Additionally, if you have an SSL-enabled site,
you should set the ``SESSION_COOKIE_SECURE`` setting to ``True``; this will
make Django send session cookies only over HTTPS.

Email Header Injection
======================

SQL injection's less well-known sibling, *email header injection*, hijacks
Web forms that send email. An attacker can use this technique to send spam via
your mail server. Any form that constructs email headers from Web form data is
vulnerable to this kind of attack.

Let's look at the canonical contact form found on many sites. Usually this
sends a message to a hard-coded email address and, hence, doesn't appear
vulnerable to spam abuse at first glance.

However, most of these forms also allow the user to type in his own subject
for the email (along with a "from" address, body, and sometimes a few other
fields). This subject field is used to construct the "subject" header of the
email message.

If that header is unescaped when building the email message, an attacker could
submit something like ``"hello\ncc:spamvictim@example.com"`` (where ``\n`` is
a newline character). The constructed email headers would then become::

    To: hardcoded@example.com
    Subject: hello
    cc: spamvictim@example.com

As with SQL injection, if we trust the subject line given by the user, we'll
allow him to construct a malicious set of headers, and he can use our
contact form to send spam.

The Solution
------------

We can prevent this attack in the same way we prevent SQL injection: always
escape or validate user-submitted content.

Django's built-in mail functions (in ``django.core.mail``) simply do not allow
newlines in any fields used to construct headers (the from and to addresses,
plus the subject). If you try to use ``django.core.mail.send_mail`` with a
subject that contains newlines, Django will raise a ``BadHeaderError``
exception.

If you do not use Django's built-in mail functions to send email, you'll need
to make sure that newlines in headers either cause an error or are stripped.
You may want to examine the ``SafeMIMEText`` class in ``django.core.mail`` to
see how Django does this.
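
If you build headers yourself, the check can be as simple as refusing any value
that contains a carriage return or line feed. The sketch below uses hypothetical
``check_header`` and ``build_headers`` helpers and raises ``ValueError``; it is
not Django's implementation (Django raises ``BadHeaderError``, as described
above)::

    def check_header(name, value):
        # A CR or LF would let the value start a new header line (e.g. "cc:").
        if "\n" in value or "\r" in value:
            raise ValueError("newline in %s header: %r" % (name, value))
        return value


    def build_headers(subject):
        # Hypothetical contact form: the recipient is hard-coded and
        # only the subject comes from the user.
        subject = check_header("Subject", subject)
        return "To: hardcoded@example.com\nSubject: %s\n" % subject


    print(build_headers("hello"))
    try:
        build_headers("hello\ncc:spamvictim@example.com")
    except ValueError as exc:
        print("rejected:", exc)

Rejecting the whole message, rather than silently stripping the newline, also
makes attempted abuse visible in your logs.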

Directory Traversal
===================

*Directory traversal* is another injection-style attack, wherein a malicious
user tricks filesystem code into reading and/or writing files that the Web
server shouldn't have access to.

An example might be a view that reads files from the disk without carefully
sanitizing the file name::

    import os

    def dump_file(request):
        filename = request.GET["filename"]
        filename = os.path.join(BASE_PATH, filename)
        content = open(filename).read()

        # ...

Though it looks like that view restricts file access to files beneath
``BASE_PATH`` (by using ``os.path.join``), if the attacker passes in a
``filename`` containing ``..`` (that's two periods, a shorthand for
"the parent directory"), she can access files "above" ``BASE_PATH``. It's only
a matter of time before she can discover the correct number of dots to
successfully access, say, ``../../../../../etc/passwd``.

Anything that reads files without proper escaping is vulnerable to this
problem. Views that *write* files are just as vulnerable, but the consequences
are doubly dire.

Another permutation of this problem lies in code that dynamically loads
modules based on the URL or other request information. A well-publicized
example came from the world of Ruby on Rails. Prior to mid-2006,
Rails used URLs like ``http://example.com/person/poke/1`` directly to
load modules and call methods. The result was that a
carefully constructed URL could automatically load arbitrary code,
including a database reset script!

The Solution
------------

If your code ever needs to read or write files based on user input, you need
to sanitize the requested path very carefully to ensure that an attacker isn't
able to escape from the base directory you're restricting access to.

.. note::

    Needless to say, you should *never* write code that can read from any
    area of the disk!

A good example of how to do this escaping lies in Django's built-in static
content-serving view (in ``django.views.static``). Here's the relevant code::

    import os
    import posixpath
    import urllib  # Python 2; in Python 3 this is urllib.parse.unquote

    # ...

    path = posixpath.normpath(urllib.unquote(path))
    newpath = ''
    for part in path.split('/'):
        if not part:
            # strip empty path components
            continue

        drive, part = os.path.splitdrive(part)
        head, part = os.path.split(part)
        if part in (os.curdir, os.pardir):
            # strip '.' and '..' in path
            continue

        newpath = os.path.join(newpath, part).replace('\\', '/')

Django doesn't read files (unless you use the ``static.serve``
function, but that's protected with the code just shown), so this
vulnerability doesn't affect the core code much.

In addition, the use of the URLconf abstraction means that Django will *never*
load code you've not explicitly told it to load. There's no way to create a
URL that causes Django to load something not mentioned in a URLconf.
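
A different approach from the component-stripping loop in
``django.views.static`` is to resolve the final path and then check that it
still lies inside the base directory. This is a standalone Python 3 sketch with
a hypothetical ``safe_join`` helper and base directory. It also catches a second
pitfall of ``os.path.join``: it discards everything before an absolute
component, so ``os.path.join(BASE_PATH, "/etc/passwd")`` is simply
``/etc/passwd``::

    import os


    def safe_join(base, user_path):
        # Resolve symlinks and "." / ".." in both paths, then require the
        # requested path to sit at or below the base directory.
        base = os.path.realpath(base)
        target = os.path.realpath(os.path.join(base, user_path))
        if os.path.commonpath([base, target]) != base:
            raise ValueError("path escapes base directory: %r" % user_path)
        return target


    BASE_PATH = "/srv/files"  # hypothetical base directory
    print(safe_join(BASE_PATH, "reports/2007.txt"))
    for bad in ["../../../../../etc/passwd", "/etc/passwd"]:
        try:
            safe_join(BASE_PATH, bad)
        except ValueError as exc:
            print("rejected:", exc)

Checking the resolved result, rather than filtering the input string, means the
check doesn't depend on anticipating every way of spelling "go up a directory."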

Exposed Error Messages
======================

During development, being able to see tracebacks and errors live in your
browser is extremely useful. Django has "pretty" and informative debug
messages specifically to make debugging easier.

However, if these errors get displayed once the site goes live, they can
reveal aspects of your code or configuration that could aid an attacker.

Furthermore, errors and tracebacks aren't at all useful to end users. Django's
philosophy is that site visitors should never see application-related error
messages. If your code raises an unhandled exception, a site visitor should
not see the full traceback -- or *any* hint of code snippets or Python
(programmer-oriented) error messages. Instead, the visitor should see a
friendly "This page is unavailable" message.

Naturally, developers need to see tracebacks to debug problems in
their code. So the framework should hide all error messages from the public,
but it should display them to the trusted site developers.

The Solution
------------

Django has a simple flag that controls the display of these error messages. If
the ``DEBUG`` setting is set to ``True``, error messages will be displayed in
the browser. If not, Django will return an HTTP 500 ("Internal server
error") response and render an error template that you provide. This error
template is called ``500.html`` and should live in the root of one of your
template directories.

Because developers still need to see errors generated on a live site, any
errors handled this way will send an email with the full traceback to any
addresses given in the ``ADMINS`` setting.

Users deploying under Apache and mod_python should also make sure they have
``PythonDebug Off`` in their Apache conf files; this will suppress any errors
that occur before Django has had a chance to load.
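
Put together, the production-related settings mentioned in this chapter might
look like the following ``settings.py`` fragment. The name and email address
are placeholders, and the error emails assume your project's outgoing-mail
settings are already configured::

    # settings.py for a live site (placeholder values)

    # Show the 500.html template instead of tracebacks.
    DEBUG = False

    # Unhandled exceptions on the live site are emailed to these people.
    ADMINS = (
        ("Your Name", "you@example.com"),
    )

    # Send the session cookie only over HTTPS (for SSL-enabled sites).
    SESSION_COOKIE_SECURE = True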

A Final Word on Security
========================

We hope all this talk of security problems isn't too intimidating. It's true
that the Web can be a wild and woolly world, but with a little bit of foresight,
you can have a secure Web site.

Keep in mind that Web security is a constantly changing field; if you're
reading the dead-tree version of this book, be sure to check more up-to-date
security resources for any new vulnerabilities that have been discovered. In
fact, it's always a good idea to spend some time each week or month
researching and keeping current on the state of Web application security. It's
a small investment to make, but the protection you'll get for your site and
your users is priceless.

What's Next
===========

In the `next chapter`_, we'll finally cover the subtleties of deploying Django:
how to launch a production site and how to set it up for scalability.

.. _next chapter: ../chapter20/