=======================================
Chapter 11: Generating Non-HTML Content
=======================================

Usually when we talk about developing Web sites, we're talking about producing
HTML. Of course, there's a lot more to the Web than HTML; we use the Web
to distribute data in all sorts of formats: RSS, PDFs, images, and so forth.

So far we've focused on the common case of HTML production, but in this chapter
we'll take a detour and look at using Django to produce other types of content.

Django has convenient built-in tools that you can use to produce some common
non-HTML content:

    * RSS/Atom syndication feeds

    * Sitemaps (an XML format originally developed by Google that gives hints to
      search engines)

We'll examine each of those tools a little later on, but first we'll cover the
basic principles.

The Basics: Views and MIME Types
================================

Remember this from Chapter 3?

    A view function, or *view* for short, is simply a Python function that takes
    a Web request and returns a Web response. This response can be the HTML
    contents of a Web page, or a redirect, or a 404 error, or an XML document,
    or an image...or anything, really.

More formally, a Django view function *must*:

    * Accept an ``HttpRequest`` instance as its first argument

    * Return an ``HttpResponse`` instance

The key to returning non-HTML content from a view lies in the ``HttpResponse``
class, specifically the ``mimetype`` constructor argument. By tweaking the MIME
type, we can indicate to the browser that we've returned a response of a
different format.

For example, let's look at a view that returns a PNG image. To
keep things simple, we'll just read the file off the disk::

    from django.http import HttpResponse

    def my_image(request):
        image_data = open("/path/to/my/image.png", "rb").read()
        return HttpResponse(image_data, mimetype="image/png")

That's it! If you replace the image path in the ``open()`` call with a path to
a real image, you can use this very simple view to serve an image, and the
browser will display it correctly.
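
The view above hard-codes ``image/png``. If a view serves files of several types, Python's standard ``mimetypes`` module can guess a MIME type from the file name. Here's a minimal sketch. The helper name ``content_type_for()`` and the ``application/octet-stream`` fallback are our own choices and are not part of Django:

```python
import mimetypes

def content_type_for(path):
    """Guess a MIME type from a file name, falling back to a generic binary type."""
    # guess_type() returns a (type, encoding) tuple; type is None if unknown.
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or "application/octet-stream"
```

You would then pass the result along, as in ``HttpResponse(image_data, mimetype=content_type_for(path))``.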

The other important thing to keep in mind is that ``HttpResponse`` objects
implement enough of Python's standard file API (most importantly a ``write()``
method) that you can use an ``HttpResponse`` instance in any place Python (or
a third-party library) expects a file to write to.
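
This works because code such as the ``csv`` module only calls ``write()`` on the object it is given. The sketch below uses a hypothetical ``ResponseLike`` class, which is not Django's ``HttpResponse``. It collects the written chunks, and that is enough for ``csv.writer`` to treat it as a file:

```python
import csv

class ResponseLike:
    """Hypothetical stand-in for HttpResponse: it only needs a write() method."""

    def __init__(self):
        self._chunks = []

    def write(self, content):
        # Accumulate whatever the caller writes, like a response body.
        self._chunks.append(content)

    @property
    def content(self):
        return "".join(self._chunks)

response = ResponseLike()
writer = csv.writer(response)  # any object with write() is accepted
writer.writerow(["Year", "Unruly Airline Passengers"])
writer.writerow([1995, 146])
```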

For an example of how that works, let's take a look at producing CSV with
Django.

Producing CSV
=============

CSV is a simple data format usually used by spreadsheet software. It's basically
a series of table rows, with each cell in the row separated by a comma (CSV
stands for *comma-separated values*). For example, here's some data on "unruly"
airline passengers in CSV format::

    Year,Unruly Airline Passengers
    1995,146
    1996,184
    1997,235
    1998,200
    1999,226
    2000,251
    2001,299
    2002,273
    2003,281
    2004,304
    2005,203

.. note::

    The preceding listing contains real numbers; they come courtesy of the US Federal
    Aviation Administration. See
    http://www.faa.gov/data_statistics/passengers_cargo/unruly_passengers/.

Though CSV looks simple, it's not a format that's ever been formally defined.
Different pieces of software produce and consume different variants of CSV,
making it a bit tricky to use. Luckily, Python comes with a standard CSV
library, ``csv``, that is pretty much bulletproof.

Because the ``csv`` module operates on file-like objects, it's a snap to use
an ``HttpResponse`` in place of a real file::

    import csv
    from django.http import HttpResponse

    # Number of unruly passengers each year 1995 - 2005. In a real application
    # this would likely come from a database or some other back-end data store.
    UNRULY_PASSENGERS = [146, 184, 235, 200, 226, 251, 299, 273, 281, 304, 203]

    def unruly_passengers_csv(request):
        # Create the HttpResponse object with the appropriate CSV header.
        response = HttpResponse(mimetype='text/csv')
        response['Content-Disposition'] = 'attachment; filename=unruly.csv'

        # Create the CSV writer using the HttpResponse as the "file."
        writer = csv.writer(response)
        writer.writerow(['Year', 'Unruly Airline Passengers'])
        for (year, num) in zip(range(1995, 2006), UNRULY_PASSENGERS):
            writer.writerow([year, num])

        return response

The code and comments should be pretty clear, but a few things deserve special
mention:

    * The response is given the ``text/csv`` MIME type (instead of the default
      ``text/html``). This tells browsers that the document is a CSV file.

    * The response gets an additional ``Content-Disposition`` header, which
      contains the name of the CSV file. This header (well, the "attachment"
      part) will instruct the browser to prompt for a location to save the
      file (instead of just displaying it). This file name is arbitrary; call
      it whatever you want. It will be used by browsers in the Save As
      dialog.

    * Hooking into the CSV-generation API is easy: just pass ``response`` as
      the first argument to ``csv.writer``. The ``csv.writer`` function
      expects a file-like object, and ``HttpResponse`` objects fit the bill.

    * For each row in your CSV file, call ``writer.writerow``, passing it an
      iterable object such as a list or tuple.

    * The ``csv`` module takes care of quoting for you, so you don't have to
      worry about escaping strings with quotes or commas in them. Just pass
      information to ``writerow()``, and it will do the right thing.
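
You can check the quoting rules without Django at all. This sketch writes to an in-memory ``io.StringIO`` buffer instead of a response. It shows that ``csv.writer``'s default dialect quotes a cell that contains a comma and doubles any embedded quote characters. The ``rows_to_csv()`` helper is an illustrative name:

```python
import csv
import io

def rows_to_csv(rows):
    """Render an iterable of rows to a CSV string using the default 'excel' dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()

output = rows_to_csv([
    ["Name", "Comment"],
    ["Smith, John", 'He said "hi"'],
])
```

Here ``output`` is ``'Name,Comment\r\n"Smith, John","He said ""hi"""\r\n'``. The default dialect ends rows with ``\r\n``, which is the line ending RFC 4180 describes.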

This is the general pattern you'll use any time you need to return non-HTML
content: create an ``HttpResponse`` object (with a special MIME type),
pass it to something expecting a file, and then return the response.

Let's look at a few more examples.

Generating PDFs
===============

Portable Document Format (PDF) is a format developed by Adobe that's used to
represent printable documents, complete with pixel-perfect formatting,
embedded fonts, and 2D vector graphics. You can think of a PDF document as the
digital equivalent of a printed document; indeed, PDFs are usually used when
you need to give a document to someone else to print.

You can easily generate PDFs with Python and Django thanks to the excellent
open source ReportLab library (http://www.reportlab.org/rl_toolkit.html).
The advantage of generating PDF files dynamically is that you can create
customized PDFs for different purposes -- say, for different users or
different pieces of content.

For example, we used Django and ReportLab at KUSports.com to generate
customized, printer-ready NCAA tournament brackets.

Installing ReportLab
--------------------

Before you do any PDF generation, however, you'll need to install ReportLab.
It's usually pretty simple: just download and install the library from
http://www.reportlab.org/downloads.html.

The user guide (naturally available only as a PDF file) at
http://www.reportlab.org/rsrc/userguide.pdf has additional installation
instructions.

.. note::

    If you're using a modern Linux distribution, you might want to check your
    package management utility before installing ReportLab. Most
    package repositories have added ReportLab.

    For example, if you're using the (excellent) Ubuntu distribution, a simple
    ``apt-get install python-reportlab`` will do the trick nicely.

Test your installation by importing it in the Python interactive interpreter::

    >>> import reportlab

If that command doesn't raise any errors, the installation worked.
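
You might prefer to run this check from a script, for example as part of a deployment check. The standard library's ``importlib.util.find_spec()`` reports whether a top-level module can be found without importing it. This is a small sketch, and the ``is_installed()`` helper is our own name:

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located on sys.path."""
    # find_spec() returns None when no top-level module by that name is found.
    return importlib.util.find_spec(module_name) is not None

if not is_installed("reportlab"):
    print("ReportLab is missing; PDF views will fail to import.")
```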

Writing Your View
-----------------

Like CSV, generating PDFs dynamically with Django is easy because the ReportLab
API acts on file-like objects.

Here's a "Hello World" example::

    from reportlab.pdfgen import canvas
    from django.http import HttpResponse

    def hello_pdf(request):
        # Create the HttpResponse object with the appropriate PDF headers.
        response = HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=hello.pdf'

        # Create the PDF object, using the response object as its "file."
        p = canvas.Canvas(response)

        # Draw things on the PDF. Here's where the PDF generation happens.
        # See the ReportLab documentation for the full list of functionality.
        # Coordinates are in points, measured from the page's lower-left corner.
        p.drawString(100, 100, "Hello world.")

        # Close the PDF object cleanly, and we're done.
        p.showPage()
        p.save()
        return response

A few notes are in order:

    * Here we use the ``application/pdf`` MIME type. This tells browsers that
      the document is a PDF file, rather than an HTML file. If you leave off
      this information, browsers will probably interpret the response as HTML,
      which will result in scary gobbledygook in the browser window.

    * Hooking into the ReportLab API is easy: just pass ``response`` as the
      first argument to ``canvas.Canvas``. The ``Canvas`` class expects a
      file-like object, and ``HttpResponse`` objects fit the bill.

    * All subsequent PDF-generation methods are called on the PDF
      object (in this case, ``p``), not on ``response``.

    * Finally, it's important to call ``showPage()`` and ``save()`` on the PDF
      object (or else you'll end up with a corrupted PDF file).

Complex PDFs
------------

If you're creating a complex PDF document (or any large data blob), consider
using the ``cStringIO`` library as a temporary holding place for your PDF
file. The ``cStringIO`` library provides a file-like object interface that is
written in C for maximum efficiency.

Here's the previous "Hello World" example rewritten to use ``cStringIO``::

    from cStringIO import StringIO
    from reportlab.pdfgen import canvas
    from django.http import HttpResponse

    def hello_pdf(request):
        # Create the HttpResponse object with the appropriate PDF headers.
        response = HttpResponse(mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=hello.pdf'

        temp = StringIO()

        # Create the PDF object, using the StringIO object as its "file."
        p = canvas.Canvas(temp)

        # Draw things on the PDF. Here's where the PDF generation happens.
        # See the ReportLab documentation for the full list of functionality.
        p.drawString(100, 100, "Hello world.")

        # Close the PDF object cleanly.
        p.showPage()
        p.save()

        # Get the value of the StringIO buffer and write it to the response.
        response.write(temp.getvalue())
        return response
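
Python 3 has no ``cStringIO``. There, ``io.BytesIO`` plays the same role for binary data such as PDFs. The sketch below separates the buffering pattern from ReportLab. In it, ``render`` stands in for any function that writes a document to a file-like object, and ``ResponseLike`` stands in for a response. ``ResponseLike``, ``render_buffered()``, and ``fake_pdf()`` are all illustrative assumptions:

```python
import io

class ResponseLike:
    """Hypothetical response stand-in that records each write() call."""

    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(data)

def render_buffered(render, response):
    """Render into an in-memory buffer, then copy the result to the response in one write."""
    buffer = io.BytesIO()
    render(buffer)  # e.g. code that draws a PDF onto the buffer
    response.write(buffer.getvalue())
    return response

def fake_pdf(fileobj):
    # Several small writes, as a document generator might make.
    fileobj.write(b"%PDF-1.4\n")
    fileobj.write(b"...page data...\n")
    fileobj.write(b"%%EOF\n")

response = render_buffered(fake_pdf, ResponseLike())
```

The response receives one ``write()`` call carrying the whole document, however many small writes the renderer made.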

Other Possibilities
===================

There's a whole host of other types of content you can generate in Python.
Here are a few more ideas and some pointers to libraries you could use to
implement them:

    * *ZIP files*: Python's standard library ships with the
      ``zipfile`` module, which can both read and write compressed ZIP files.
      You could use it to provide on-demand archives of a bunch of files, or
      perhaps compress large documents when requested. You could similarly
      produce TAR files using the standard library ``tarfile`` module.

    * *Dynamic images*: The Python Imaging Library
      (PIL; http://www.pythonware.com/products/pil/) is a fantastic toolkit for
      producing images (PNG, JPEG, GIF, and a whole lot more). You could use
      it to automatically scale down images into thumbnails, composite
      multiple images into a single frame, or even do Web-based image
      processing.

    * *Plots and charts*: There are a number of incredibly powerful Python
      plotting and charting libraries you could use to produce on-demand maps,
      charts, plots, and graphs. We can't possibly list them all, so here are
      a couple of the highlights:

          * ``matplotlib`` (http://matplotlib.sourceforge.net/) can be
            used to produce the type of high-quality plots usually generated
            with MATLAB or Mathematica.

          * ``pygraphviz`` (https://networkx.lanl.gov/wiki/pygraphviz), an
            interface to the Graphviz graph layout toolkit
            (http://graphviz.org/), can be used for generating structured
            diagrams of graphs and networks.

In general, any Python library capable of writing to a file can be hooked into
Django. The possibilities really are endless.
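
Here is one concrete case from the list above: ZIP files. The sketch builds an archive entirely in memory with ``zipfile`` and ``io.BytesIO``, so the resulting bytes could be handed to an ``HttpResponse`` with the ``application/zip`` MIME type. The ``build_zip()`` helper and the sample file names are illustrative:

```python
import io
import zipfile

def build_zip(files):
    """Return the bytes of a ZIP archive containing {name: text} entries."""
    buffer = io.BytesIO()
    # ZIP_DEFLATED compresses entries (it relies on the zlib module).
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)  # str data is stored as UTF-8
    return buffer.getvalue()

data = build_zip({"readme.txt": "Hello", "data/unruly.csv": "Year,Count\n1995,146\n"})
```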

Now that we've looked at the basics of generating non-HTML content, let's step
up a level of abstraction. Django ships with some pretty nifty built-in tools
for generating some common types of non-HTML content.

The Syndication Feed Framework
==============================

Django comes with a high-level syndication-feed-generating framework that
makes creating RSS and Atom feeds easy.

.. admonition:: What's RSS? What's Atom?

    RSS and Atom are both XML-based formats you can use to provide
    automatically updating "feeds" of your site's content. Read more about RSS
    at http://www.whatisrss.com/, and get information on Atom at
    http://www.atomenabled.org/.

To create any syndication feed, all you have to do is write a short Python
class. You can create as many feeds as you want.

The high-level feed-generating framework is a view that's hooked to ``/feeds/``
by convention. Django uses the remainder of the URL (everything after
``/feeds/``) to determine which feed to return.

To create a feed, you'll write a ``Feed`` class and point to it in your URLconf
(see Chapters 3 and 8 for more about URLconfs).

Initialization
--------------

To activate syndication feeds on your Django site, add this line to your
URLconf::

    (r'^feeds/(?P<url>.*)/$',
        'django.contrib.syndication.views.feed',
        {'feed_dict': feeds}
    ),

This line tells Django to use the syndication framework to handle all URLs
starting with ``"feeds/"``. (You can change that ``"feeds/"`` prefix to fit
your own needs.)

This URLconf line has an extra argument: ``{'feed_dict': feeds}``. Use this
extra argument to pass the syndication framework the feeds that should be
published under that URL.

Specifically, ``feed_dict`` should be a dictionary that maps a feed's slug
(short URL label) to its ``Feed`` class. You can define the ``feed_dict``
in the URLconf itself. Here's a full example URLconf::

    from django.conf.urls.defaults import *
    from myproject.feeds import LatestEntries, LatestEntriesByCategory

    feeds = {
        'latest': LatestEntries,
        'categories': LatestEntriesByCategory,
    }

    urlpatterns = patterns('',
        # ...
        (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed',
            {'feed_dict': feeds}),
        # ...
    )

The preceding example registers two feeds:

    * The feed represented by ``LatestEntries`` will live at
      ``feeds/latest/``.

    * The feed represented by ``LatestEntriesByCategory`` will live at
      ``feeds/categories/``.
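
Conceptually, the view splits the part of the URL captured after ``feeds/`` into two pieces. The first is a slug, which it looks up in ``feed_dict``, and the rest are any remaining bits. The following is a simplified sketch of that lookup, not Django's actual code. The ``resolve_feed()`` name, the placeholder classes, and the use of ``LookupError`` where Django would respond with a 404 are all assumptions:

```python
def resolve_feed(url, feed_dict):
    """Split a captured feed URL into (feed class, extra bits) using feed_dict."""
    slug, _, remainder = url.strip("/").partition("/")
    try:
        feed_class = feed_dict[slug]
    except KeyError:
        # Django would answer an unregistered slug with a 404 instead.
        raise LookupError("No feed registered for slug %r" % slug)
    bits = remainder.split("/") if remainder else []
    return feed_class, bits

# Placeholder classes standing in for real Feed subclasses.
class LatestEntries:
    pass

class LatestEntriesByCategory:
    pass

feeds = {"latest": LatestEntries, "categories": LatestEntriesByCategory}
```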

Once that's set up, you'll need to define the ``Feed`` classes themselves.

A ``Feed`` class is a simple Python class that represents a syndication feed.
A feed can be simple (e.g., a "site news" feed, or a basic feed displaying the
latest entries of a blog) or more complex (e.g., a feed displaying all the
blog entries in a particular category, where the category is variable).

``Feed`` classes must subclass ``django.contrib.syndication.feeds.Feed``. They
can live anywhere in your code tree.

A Simple Feed
-------------

This simple example, taken from chicagocrime.org, describes a feed of the
latest five news items::

    from django.contrib.syndication.feeds import Feed
    from chicagocrime.models import NewsItem

    class LatestEntries(Feed):
        title = "Chicagocrime.org site news"
        link = "/sitenews/"
        description = "Updates on changes and additions to chicagocrime.org."

        def items(self):
            return NewsItem.objects.order_by('-pub_date')[:5]

The important things to notice here are as follows:

    * The class subclasses ``django.contrib.syndication.feeds.Feed``.

    * ``title``, ``link``, and ``description`` correspond to the standard RSS
      ``<title>``, ``<link>``, and ``<description>`` elements, respectively.

    * ``items()`` is simply a method that returns a list of objects that
      should be included in the feed as ``<item>`` elements. Although this
      example returns ``NewsItem`` objects using Django's database API,
      ``items()`` doesn't have to return model instances.

      You do get a few bits of functionality "for free" by using Django
      models, but ``items()`` can return any type of object you want.

There's just one more step. In an RSS feed, each ``<item>`` has a ``<title>``,
``<link>``, and ``<description>``. We need to tell the framework what data to
put into those elements.

    * To specify the contents of ``<title>`` and ``<description>``, create
      Django templates (see Chapter 4) called ``feeds/latest_title.html`` and
      ``feeds/latest_description.html``, where ``latest`` is the ``slug``
      specified in the URLconf for the given feed. Note that the ``.html``
      extension is required.

      The syndication framework renders that template for each item, passing
      it two template context variables:

          * ``obj``: The current object (one of whichever objects you
            returned in ``items()``).

          * ``site``: A ``django.contrib.sites.models.Site`` object
            representing the current site. This is useful for
            ``{{ site.domain }}`` or ``{{ site.name }}``.

      If you don't create a template for either the title or description, the
      framework will use the template ``"{{ obj }}"`` by default -- that is,
      the normal string representation of the object.

      You can also change the names of these two templates by specifying
      ``title_template`` and ``description_template`` as attributes of your
      ``Feed`` class.

    * To specify the contents of ``<link>``, you have two options. For each
      item in ``items()``, Django first tries executing a
      ``get_absolute_url()`` method on that object. If that method doesn't
      exist, it tries calling a method ``item_link()`` in the ``Feed`` class,
      passing it a single parameter, ``item``, which is the object itself.

      Both ``get_absolute_url()`` and ``item_link()`` should return the item's
      URL as a normal Python string.

    * For the previous ``LatestEntries`` example, we could have very simple feed
      templates. ``latest_title.html`` contains::

          {{ obj.title }}

      and ``latest_description.html`` contains::

          {{ obj.description }}

It's almost *too* easy . . .
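
The ``<link>`` lookup order described above can be sketched in plain Python: ``get_absolute_url()`` on the item comes first, then ``item_link()`` on the feed. This illustrates the rule and is not Django's source. ``item_link_for()``, ``Story``, and ``StoryFeed`` are hypothetical names:

```python
def item_link_for(feed, item):
    """Return an item's URL, preferring item.get_absolute_url() over feed.item_link(item)."""
    get_url = getattr(item, "get_absolute_url", None)
    if get_url is not None:
        return get_url()
    return feed.item_link(item)

class Story:
    def __init__(self, slug):
        self.slug = slug

    def get_absolute_url(self):
        return "/stories/%s/" % self.slug

class StoryFeed:
    def item_link(self, item):
        # Only reached for items that lack get_absolute_url().
        return "/archive/%s/" % item["id"]

feed = StoryFeed()
```

A ``Story`` gets its own URL, while a plain dictionary such as ``{"id": 7}`` falls back to ``StoryFeed.item_link()``.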

A More Complex Feed
-------------------

The framework also supports more complex feeds, via parameters.

For example, chicagocrime.org offers an RSS feed of recent crimes for every
police beat in Chicago. It would be silly to create a separate ``Feed`` class for
each police beat; that would violate the Don't Repeat Yourself (DRY) principle
and would couple data to programming logic.

Instead, the syndication framework lets you make generic
feeds that return items based on information in the feed's URL.

On chicagocrime.org, the police-beat feeds are accessible via URLs like this:

    * ``http://www.chicagocrime.org/rss/beats/0613/``: Returns recent crimes
      for beat 0613

    * ``http://www.chicagocrime.org/rss/beats/1424/``: Returns recent crimes
      for beat 1424

The slug here is ``"beats"``. The syndication framework sees the extra URL
bits after the slug -- ``0613`` and ``1424`` -- and gives you a hook to tell
it what those URL bits mean and how they should influence which items get
published in the feed.

An example makes this clear. Here's the code for these beat-specific feeds::

    from django.contrib.syndication.feeds import Feed
    from django.core.exceptions import ObjectDoesNotExist
    # Assumes the Beat and Crime models live alongside NewsItem.
    from chicagocrime.models import Beat, Crime

    class BeatFeed(Feed):
        def get_object(self, bits):
            # In case of "/rss/beats/0613/foo/bar/baz/", or other such
            # clutter, check that bits has only one member.
            if len(bits) != 1:
                raise ObjectDoesNotExist
            return Beat.objects.get(beat__exact=bits[0])

        def title(self, obj):
            return "Chicagocrime.org: Crimes for beat %s" % obj.beat

        def link(self, obj):
            return obj.get_absolute_url()

        def description(self, obj):
            return "Crimes recently reported in police beat %s" % obj.beat

        def items(self, obj):
            crimes = Crime.objects.filter(beat__id__exact=obj.id)
            return crimes.order_by('-crime_date')[:30]

Here's the basic algorithm the syndication framework follows, given this class
and a request to the URL ``/rss/beats/0613/``:

    #. The framework gets the URL ``/rss/beats/0613/`` and notices there's an
       extra bit of URL after the slug. It splits that remaining string by the
       slash character (``"/"``) and calls the ``Feed`` class's
       ``get_object()`` method, passing it the bits.

       In this case, ``bits`` is ``['0613']``. For a request to
       ``/rss/beats/0613/foo/bar/``, ``bits`` would be
       ``['0613', 'foo', 'bar']``.

    #. ``get_object()`` is responsible for retrieving the given beat from the
       given ``bits``.

       In this case, it uses the Django database API to
       retrieve the beat. Note that ``get_object()`` should raise
       ``django.core.exceptions.ObjectDoesNotExist`` if given invalid
       parameters. There's no ``try``/``except`` around the
       ``Beat.objects.get()`` call, because it's not necessary. That function
       raises ``Beat.DoesNotExist`` on failure, and ``Beat.DoesNotExist`` is a
       subclass of ``ObjectDoesNotExist``. Raising ``ObjectDoesNotExist`` in
       ``get_object()`` tells Django to produce a 404 error for that request.

    #. To generate the feed's ``<title>``, ``<link>``, and ``<description>``,
       Django uses the ``title()``, ``link()``, and ``description()`` methods.
       In the previous example, they were simple string class attributes, but
       this example illustrates that they can be either strings *or* methods.
       For each of ``title``, ``link``, and ``description``, Django follows
       this algorithm:

           #. It tries to call a method, passing the ``obj`` argument,
              where ``obj`` is the object returned by ``get_object()``.

           #. Failing that, it tries to call a method with no arguments.

           #. Failing that, it uses the class attribute.

    #. Finally, note that ``items()`` in this example also takes the ``obj``
       argument. The algorithm for ``items`` is the same as described in the
       previous step -- first, it tries ``items(obj)``, then ``items()``, and
       then finally an ``items`` class attribute (which should be a list).
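
You can sketch the three-step lookup for ``title``, ``link``, ``description``, and ``items`` with ``inspect.signature()``. This is a simplified illustration, not Django's implementation, and ``resolve_feed_attr()`` and ``ExampleFeed`` are hypothetical names. The sketch chooses between ``method(obj)`` and ``method()`` by counting parameters. It deliberately avoids trying a call and catching ``TypeError``, because that approach would mistake a ``TypeError`` raised inside the method for a signature mismatch:

```python
import inspect

def resolve_feed_attr(feed, name, obj, default=None):
    """Resolve a feed attribute: method(obj), then method(), then plain attribute."""
    attr = getattr(feed, name, default)
    if not callable(attr):
        return attr  # a plain class attribute (or the default)
    # Bound methods don't list ``self``, so one parameter means method(obj).
    if len(inspect.signature(attr).parameters) == 1:
        return attr(obj)
    return attr()

class ExampleFeed:
    title = "Site news"

    def link(self, obj):
        return "/rss/beats/%s/" % obj

    def description(self):
        return "Recent crimes"

feed = ExampleFeed()
```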

Full documentation of all the methods and attributes of the ``Feed`` classes is
always available from the official Django documentation
(http://www.djangoproject.com/documentation/0.96/syndication_feeds/).

Specifying the Type of Feed
---------------------------

By default, the syndication framework produces RSS 2.0. To change that,
add a ``feed_type`` attribute to your ``Feed`` class::

    from django.utils.feedgenerator import Atom1Feed

    class MyFeed(Feed):
        feed_type = Atom1Feed

Note that you set ``feed_type`` to a class object, not an instance. Currently
available feed types are shown in Table 11-1.

.. table:: Table 11-1. Feed Types

    =================================================== =====================
    Feed Class                                          Format
    =================================================== =====================
    ``django.utils.feedgenerator.Rss201rev2Feed``       RSS 2.01 (default)

    ``django.utils.feedgenerator.RssUserland091Feed``   RSS 0.91

    ``django.utils.feedgenerator.Atom1Feed``            Atom 1.0
    =================================================== =====================

Enclosures
----------

To specify enclosures (i.e., media resources associated with a feed item, such
as the MP3 files in a podcast feed), use the ``item_enclosure_url``,
``item_enclosure_length``, and ``item_enclosure_mime_type`` hooks, for example::

    from django.contrib.syndication.feeds import Feed
    from myproject.models import Song

    class MyFeedWithEnclosures(Feed):
        title = "Example feed with enclosures"
        link = "/feeds/example-with-enclosures/"

        def items(self):
            return Song.objects.all()[:30]

        def item_enclosure_url(self, item):
            return item.song_url

        def item_enclosure_length(self, item):
            return item.song_length

        item_enclosure_mime_type = "audio/mpeg"

This assumes, of course, that you've created a ``Song`` model with ``song_url``
and ``song_length`` (i.e., the size in bytes) fields.
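
In an RSS 2.0 feed, these three hooks supply the ``url``, ``length``, and ``type`` attributes of an item's ``<enclosure>`` element. The following sketch builds such an element with ``xml.etree.ElementTree`` to show the shape of the output. It isn't how Django's feed generator is written, and the sample URL and size are made up:

```python
import xml.etree.ElementTree as ET

def enclosure_element(url, length, mime_type):
    """Build an RSS 2.0 <enclosure> element from the three enclosure hooks' values."""
    # RSS requires all three attributes; length is the size in bytes, as text.
    return ET.Element("enclosure", url=url, length=str(length), type=mime_type)

element = enclosure_element("http://example.com/songs/1.mp3", 3145728, "audio/mpeg")
markup = ET.tostring(element, encoding="unicode")
```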

Language
--------

Feeds created by the syndication framework automatically include the
appropriate ``<language>`` tag (RSS 2.0) or ``xml:lang`` attribute (Atom).
This comes directly from your ``LANGUAGE_CODE`` setting.

URLs
----

The ``link`` method/attribute can return either an absolute URL (e.g.,
``"/blog/"``) or a URL with the fully qualified domain and protocol (e.g.,
``"http://www.example.com/blog/"``). If ``link`` doesn't return the domain,
the syndication framework will insert the domain of the current site,
according to your ``SITE_ID`` setting.
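
The domain-insertion rule fits in a small function. ``add_domain()`` here is a hypothetical helper written for illustration. It assumes the site's domain is already available as a plain string, whereas in Django it would come from the ``Site`` object selected by ``SITE_ID``:

```python
def add_domain(domain, url):
    """Prefix a site-relative URL with http:// and the given domain; leave full URLs alone."""
    if url.startswith(("http://", "https://")):
        return url
    return "http://%s%s" % (domain, url)
```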

Atom feeds require a ``<link rel="self">`` that defines the feed's current
location. The syndication framework populates this automatically, using the
domain of the current site according to the ``SITE_ID`` setting.

Publishing Atom and RSS Feeds in Tandem
---------------------------------------

Some developers like to make available both Atom *and* RSS versions of their
feeds. That's easy to do with Django: just create a subclass of your ``Feed``
class and set the ``feed_type`` to something different. Then update your
URLconf to add the extra versions. Here's a full example::

    from django.contrib.syndication.feeds import Feed
    from chicagocrime.models import NewsItem
    from django.utils.feedgenerator import Atom1Feed

    class RssSiteNewsFeed(Feed):
        title = "Chicagocrime.org site news"
        link = "/sitenews/"
        description = "Updates on changes and additions to chicagocrime.org."

        def items(self):
            return NewsItem.objects.order_by('-pub_date')[:5]

    class AtomSiteNewsFeed(RssSiteNewsFeed):
        feed_type = Atom1Feed

And here's the accompanying URLconf::

    from django.conf.urls.defaults import *
    from myproject.feeds import RssSiteNewsFeed, AtomSiteNewsFeed

    feeds = {
        'rss': RssSiteNewsFeed,
        'atom': AtomSiteNewsFeed,
    }

    urlpatterns = patterns('',
        # ...
        (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed',
            {'feed_dict': feeds}),
        # ...
    )

The Sitemap Framework
=====================

A *sitemap* is an XML file on your Web site that tells search engine indexers
how frequently your pages change and how "important" certain pages are in
relation to other pages on your site. This information helps search engines
index your site.

For example, here's a piece of the sitemap for Django's Web site
(http://www.djangoproject.com/sitemap.xml)::

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.djangoproject.com/documentation/</loc>
        <changefreq>weekly</changefreq>
        <priority>0.5</priority>
      </url>
      <url>
        <loc>http://www.djangoproject.com/documentation/0_90/</loc>
        <changefreq>never</changefreq>
        <priority>0.1</priority>
      </url>
      ...
    </urlset>

For more on sitemaps, see http://www.sitemaps.org/.

The Django sitemap framework automates the creation of this XML file by
letting you express this information in Python code. To create a sitemap,
you just need to write a ``Sitemap`` class and point to it in your URLconf.
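
To make the target concrete, here's a sketch that builds the same kind of ``<urlset>`` document shown above from a list of entries, using ``xml.etree.ElementTree``. It's an illustrative stand-in, not the framework's template-based implementation. The entry format (a dict with ``loc`` and optional ``lastmod``, ``changefreq``, and ``priority`` keys) and the ``build_sitemap()`` name are assumptions:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Serialize entries (dicts with 'loc' plus optional fields) to sitemap XML."""
    # The XML declaration is omitted here for brevity.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit only the fields that were supplied, in the usual sitemap order.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")

xml_text = build_sitemap([
    {"loc": "http://www.djangoproject.com/documentation/",
     "changefreq": "weekly", "priority": 0.5},
])
```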

Installation
------------

To install the sitemap application, follow these steps:

    #. Add ``'django.contrib.sitemaps'`` to your ``INSTALLED_APPS`` setting.

    #. Make sure
       ``'django.template.loaders.app_directories.load_template_source'`` is
       in your ``TEMPLATE_LOADERS`` setting. It's in there by default, so
       you'll need to change this only if you've changed that setting.

    #. Make sure you've installed the sites framework (see Chapter 14).

.. note::

    The sitemap application doesn't install any database tables. The only
    reason it needs to go into ``INSTALLED_APPS`` is so the
    ``load_template_source`` template loader can find the default templates.

Initialization
--------------

To activate sitemap generation on your Django site, add this line to your
URLconf::

    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})

This line tells Django to build a sitemap when a client accesses ``/sitemap.xml``.

The name of the sitemap file is not important, but the location is. Search
engines will only index links in your sitemap for the current URL level and
below. For instance, if ``sitemap.xml`` lives in your root directory, it may
reference any URL in your site. However, if your sitemap lives at
``/content/sitemap.xml``, it may only reference URLs that begin with
``/content/``.
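
The location rule amounts to a prefix check on paths. Here is a minimal sketch that assumes both arguments are site-relative paths. ``sitemap_may_reference()`` is a hypothetical helper. Search engines apply this rule themselves, so the function is useful only as a sanity check in your own tests:

```python
import posixpath

def sitemap_may_reference(sitemap_path, url_path):
    """Return True if url_path is at or below the directory containing the sitemap."""
    # e.g. "/content/sitemap.xml" -> "/content/", and "/sitemap.xml" -> "/"
    directory = posixpath.dirname(sitemap_path).rstrip("/") + "/"
    return url_path.startswith(directory)
```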
750
751 The sitemap view takes an extra, required argument: ``{'sitemaps':
752 sitemaps}``. ``sitemaps`` should be a dictionary that maps a short section
753 label (e.g., ``blog`` or ``news``) to its ``Sitemap`` class (e.g.,
754 ``BlogSitemap`` or ``NewsSitemap``). It may also map to an *instance* of a
755 ``Sitemap`` class (e.g., ``BlogSitemap(some_var)``).
756
757 Sitemap Classes
758 ---------------
759
760 A ``Sitemap`` class is a simple Python class that represents a "section" of
761 entries in your sitemap. For example, one ``Sitemap`` class could represent
762 all the entries of your Weblog, while another could represent all of the
763 events in your events calendar.
764
765 In the simplest case, all these sections get lumped together into one
766 ``sitemap.xml``, but it's also possible to use the framework to generate a
767 sitemap index that references individual sitemap files, one per section
768 (as described shortly).
769
770 ``Sitemap`` classes must subclass ``django.contrib.sitemaps.Sitemap``. They
771 can live anywhere in your code tree.
772
773 For example, let's assume you have a blog system, with an ``Entry`` model, and
774 you want your sitemap to include all the links to your individual blog
775 entries. Here's how your ``Sitemap`` class might look::
776
777 from django.contrib.sitemaps import Sitemap
778 from mysite.blog.models import Entry
779
780 class BlogSitemap(Sitemap):
781 changefreq = "never"
782 priority = 0.5
783
784 def items(self):
785 return Entry.objects.filter(is_draft=False)
786
787 def lastmod(self, obj):
788 return obj.pub_date
789
790 Declaring a ``Sitemap`` should look very similar to declaring a ``Feed``;
791 that's by design.
792
793 Like ``Feed`` classes, ``Sitemap`` members can be either methods or
794 attributes. See the steps in the earlier "A Complex Example" section for more
795 about how this works.

A ``Sitemap`` class can define the following methods/attributes:

    * ``items`` (**required**): Provides a list of objects. The framework
      doesn't care what *type* of objects they are; all that matters is that
      these objects get passed to the ``location()``, ``lastmod()``,
      ``changefreq()``, and ``priority()`` methods.

    * ``location`` (optional): Gives the absolute URL for a given object.
      Here, "absolute URL" means a URL that doesn't include the protocol or
      domain. Here are some examples:

        * Good: ``'/foo/bar/'``
        * Bad: ``'example.com/foo/bar/'``
        * Bad: ``'http://example.com/foo/bar/'``

      If ``location`` isn't provided, the framework will call the
      ``get_absolute_url()`` method on each object as returned by
      ``items()``.

    * ``lastmod`` (optional): The object's "last modification" date, as a
      Python ``datetime`` object.

    * ``changefreq`` (optional): How often the object changes. Possible values
      (as given by the Sitemaps specification) are as follows:

        * ``'always'``
        * ``'hourly'``
        * ``'daily'``
        * ``'weekly'``
        * ``'monthly'``
        * ``'yearly'``
        * ``'never'``

    * ``priority`` (optional): A suggested indexing priority between ``0.0``
      and ``1.0``. The default priority of a page is ``0.5``; see the
      http://sitemaps.org documentation for more about how ``priority`` works.
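
To show how these members fit together, here is a minimal pure-Python sketch that turns one section's items into ``<url>`` elements. It is not how Django itself produces the XML. The ``Entry`` stand-in, the ``EntrySitemap`` section, the ``render_urls`` helper, and the ``example.com`` domain are all hypothetical. For brevity, the sketch treats ``changefreq`` and ``priority`` as plain attributes, writes ``lastmod`` as a date only, and omits any optional element a section doesn't supply.

```python
import datetime
from xml.sax.saxutils import escape

# Allowed values for <changefreq>, as listed above.
CHANGEFREQS = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


class Entry:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, slug, pub_date):
        self.slug = slug
        self.pub_date = pub_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class EntrySitemap:
    """Hypothetical section: no location(), so get_absolute_url() is used."""
    changefreq = "never"
    priority = 0.5

    def items(self):
        return [Entry("hello", datetime.datetime(2007, 5, 1, 12, 0))]

    def lastmod(self, obj):
        return obj.pub_date


def render_urls(sitemap, domain):
    lines = []
    for obj in sitemap.items():
        # Use location() if the section defines it, else get_absolute_url().
        if hasattr(sitemap, "location"):
            path = sitemap.location(obj)
        else:
            path = obj.get_absolute_url()
        lines.append("<url>")
        lines.append("  <loc>%s</loc>" % escape("http://%s%s" % (domain, path)))
        if hasattr(sitemap, "lastmod"):
            # YYYY-MM-DD is the simplest date form the protocol accepts.
            stamp = sitemap.lastmod(obj).strftime("%Y-%m-%d")
            lines.append("  <lastmod>%s</lastmod>" % stamp)
        freq = getattr(sitemap, "changefreq", None)
        if freq is not None:
            if freq not in CHANGEFREQS:
                raise ValueError("invalid changefreq: %r" % freq)
            lines.append("  <changefreq>%s</changefreq>" % freq)
        prio = getattr(sitemap, "priority", None)
        if prio is not None:
            if not 0.0 <= prio <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            lines.append("  <priority>%s</priority>" % prio)
        lines.append("</url>")
    return "\n".join(lines)


print(render_urls(EntrySitemap(), "example.com"))
```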

Shortcuts
---------

The sitemap framework provides a couple of convenience classes for common
cases. These are described in the sections that follow.

FlatPageSitemap
```````````````

The ``django.contrib.sitemaps.FlatPageSitemap`` class looks at all flat pages
defined for the current site and creates a sitemap entry for each one. These
entries include only the ``location`` attribute, not ``lastmod``,
``changefreq``, or ``priority``.

See Chapter 14 for more about flat pages.

GenericSitemap
``````````````

The ``GenericSitemap`` class works with any generic views (see Chapter 9) you
already have.

To use it, create an instance, passing in the same ``info_dict`` you pass to
the generic views. The only requirement is that the dictionary have a
``queryset`` entry. It may also have a ``date_field`` entry that specifies a
date field for objects retrieved from the ``queryset``. This will be used for
the ``lastmod`` attribute in the generated sitemap. You may also pass
``priority`` and ``changefreq`` keyword arguments to the ``GenericSitemap``
constructor to specify these attributes for all URLs.
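
The behavior just described can be approximated in plain Python. The following ``GenericSitemapSketch`` class is hypothetical and is not Django's source. A plain list of stand-in ``Post`` objects plays the role of the ``queryset``, and the ``Post`` class and its fields are assumptions for illustration.

```python
import datetime


class GenericSitemapSketch:
    """Hypothetical approximation of a GenericSitemap-style class."""

    def __init__(self, info_dict, priority=None, changefreq=None):
        self.queryset = info_dict["queryset"]          # required entry
        self.date_field = info_dict.get("date_field")  # optional entry
        # The same priority/changefreq applies to every URL in the section.
        self.priority = priority
        self.changefreq = changefreq

    def items(self):
        return list(self.queryset)

    def lastmod(self, obj):
        # Read the configured date field, if any, from each object.
        if self.date_field is None:
            return None
        return getattr(obj, self.date_field)


class Post:
    """Hypothetical stand-in for a model instance with a pub_date field."""

    def __init__(self, title, pub_date):
        self.title = title
        self.pub_date = pub_date


info_dict = {
    "queryset": [Post("First", datetime.date(2007, 1, 2))],
    "date_field": "pub_date",
}
section = GenericSitemapSketch(info_dict, priority=0.6)
for post in section.items():
    print(post.title, section.lastmod(post), section.priority)
# First 2007-01-02 0.6
```

Reusing ``info_dict`` this way keeps the sitemap and the generic views pointed at the same set of objects.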

Here's an example of a URLconf using both ``FlatPageSitemap`` and
``GenericSitemap`` (with the hypothetical ``Entry`` model from earlier)::

    from django.conf.urls.defaults import *
    from django.contrib.sitemaps import FlatPageSitemap, GenericSitemap
    from mysite.blog.models import Entry

    info_dict = {
        'queryset': Entry.objects.all(),
        'date_field': 'pub_date',
    }

    sitemaps = {
        'flatpages': FlatPageSitemap,
        'blog': GenericSitemap(info_dict, priority=0.6),
    }

    urlpatterns = patterns('',
        # some generic view using info_dict
        # ...

        # the sitemap
        (r'^sitemap.xml$',
         'django.contrib.sitemaps.views.sitemap',
         {'sitemaps': sitemaps})
    )

Creating a Sitemap Index
------------------------

The sitemap framework can also create a sitemap index that references
individual sitemap files, one per section defined in your ``sitemaps``
dictionary. The only differences in usage are as follows:

    * You use two views in your URLconf:
      ``django.contrib.sitemaps.views.index`` and
      ``django.contrib.sitemaps.views.sitemap``.

    * The ``django.contrib.sitemaps.views.sitemap`` view should take a
      ``section`` keyword argument.

Here is what the relevant URLconf lines would look like for the previous
example::

    (r'^sitemap.xml$',
     'django.contrib.sitemaps.views.index',
     {'sitemaps': sitemaps}),

    (r'^sitemap-(?P<section>.+).xml$',
     'django.contrib.sitemaps.views.sitemap',
     {'sitemaps': sitemaps})

This will automatically generate a ``sitemap.xml`` file that references both
``sitemap-flatpages.xml`` and ``sitemap-blog.xml``. The ``Sitemap`` classes
and the ``sitemaps`` dictionary don't change at all.
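
To make the relationship between the index and its sections concrete, here is a small pure-Python sketch that is not the framework's code. It builds an index from the keys of a ``sitemaps`` dictionary, then uses the same pattern as the per-section URLconf entry to recover the ``section`` label from a requested path. The ``example.com`` domain and the ``build_index`` and ``section_for`` helpers are hypothetical. The dictionary values are placeholders, because only the labels matter here.

```python
import re

# Same pattern as the per-section URLconf entry (Django matches URL
# patterns against the path without its leading slash).
SECTION_PATTERN = re.compile(r'^sitemap-(?P<section>.+).xml$')


def build_index(sitemaps, domain):
    """Return sitemap-index XML with one <sitemap> entry per section label."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for label in sorted(sitemaps):
        lines.append('  <sitemap><loc>http://%s/sitemap-%s.xml</loc></sitemap>'
                     % (domain, label))
    lines.append('</sitemapindex>')
    return '\n'.join(lines)


def section_for(path):
    """Return the section label encoded in a path like 'sitemap-blog.xml'."""
    match = SECTION_PATTERN.match(path)
    return match.group('section') if match else None


sitemaps = {'flatpages': None, 'blog': None}  # placeholder values
print(build_index(sitemaps, 'example.com'))
print(section_for('sitemap-blog.xml'))   # blog
print(section_for('sitemap.xml'))        # None
```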

Pinging Google
--------------

You may want to "ping" Google when your sitemap changes, to let it know to
reindex your site. The framework provides a function to do just that:
``django.contrib.sitemaps.ping_google()``.

.. note::

    At the time this book was written, only Google responded to sitemap pings.
    However, it's quite likely that Yahoo and/or MSN will soon support
    these pings as well.

    At that time, we'll likely change the name of ``ping_google()`` to
    something like ``ping_search_engines()``, so make sure to check the latest
    sitemap documentation at
    http://www.djangoproject.com/documentation/0.96/sitemaps/.

``ping_google()`` takes an optional argument, ``sitemap_url``, which should be
the absolute URL of your site's sitemap (e.g., ``'/sitemap.xml'``). If this
argument isn't provided, ``ping_google()`` will attempt to figure out your
sitemap by performing a reverse lookup on your URLconf.

``ping_google()`` raises the exception
``django.contrib.sitemaps.SitemapNotFound`` if it cannot determine your
sitemap URL.
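
A ping is essentially an HTTP request whose query string carries the full URL of your sitemap. The sketch below only *builds* such a request URL with the standard library and sends nothing. ``PING_ENDPOINT`` is a placeholder, not a real search-engine address; check the search engine's own documentation for the actual endpoint. ``build_ping_url`` and ``SitemapNotFoundSketch`` are hypothetical stand-ins for the behavior described above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; not a real search-engine URL.
PING_ENDPOINT = "https://search-engine.example/ping"


class SitemapNotFoundSketch(Exception):
    """Hypothetical stand-in for the framework's SitemapNotFound."""


def build_ping_url(domain, sitemap_url=None):
    """Return the URL a ping would request for the given sitemap path."""
    if sitemap_url is None:
        # The real function falls back to a reverse URLconf lookup;
        # this sketch simply gives up instead.
        raise SitemapNotFoundSketch("no sitemap URL supplied")
    full_url = "http://%s%s" % (domain, sitemap_url)
    # urlencode percent-escapes ':' and '/' inside the query value.
    return "%s?%s" % (PING_ENDPOINT, urlencode({"sitemap": full_url}))


print(build_ping_url("example.com", "/sitemap.xml"))
# https://search-engine.example/ping?sitemap=http%3A%2F%2Fexample.com%2Fsitemap.xml
```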

One useful way to call ``ping_google()`` is from a model's ``save()`` method::

    from django.db import models
    from django.contrib.sitemaps import ping_google

    class Entry(models.Model):
        # ...
        def save(self, *args, **kwargs):
            super(Entry, self).save(*args, **kwargs)
            try:
                ping_google()
            except Exception:
                # Catch broadly, because we could get a variety
                # of HTTP-related exceptions.
                pass

A more efficient solution, however, would be to call ``ping_google()`` from a
``cron`` script or some other scheduled task. The function makes an HTTP
request to Google's servers, so you may not want to introduce that network
overhead each time you call ``save()``.
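
One way to get that separation is to have ``save()`` merely record that the sitemap is stale, and to let the scheduled job send at most one ping per run. The following pure-Python sketch illustrates the idea. ``SitemapPinger`` and the injected ``ping`` callable are hypothetical. In a real project, ``ping`` might wrap ``ping_google()``, and the "changed" flag would need to live somewhere shared (such as the database or cache) rather than in memory.

```python
class SitemapPinger:
    """Hypothetical helper: remember changes, ping at most once per run."""

    def __init__(self, ping):
        self.ping = ping      # callable that performs the actual ping
        self.dirty = False

    def mark_changed(self):
        # Cheap enough to call from save(): no network traffic here.
        self.dirty = True

    def run_scheduled(self):
        # Called from cron (or similar); pings only if something changed.
        if not self.dirty:
            return False
        try:
            self.ping()
        except Exception:
            # Leave the flag set so the next run retries.
            return False
        self.dirty = False
        return True


calls = []
pinger = SitemapPinger(lambda: calls.append("ping"))
pinger.mark_changed()
pinger.mark_changed()
print(pinger.run_scheduled())  # True
print(pinger.run_scheduled())  # False
print(calls)                   # ['ping']
```

However many entries are saved between runs, the search engine is contacted only once.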

What's Next?
============

Next, we'll continue to dig deeper into all the nifty built-in tools Django
gives you. `Chapter 12`_ looks at all the tools you need to provide
user-customized sites: sessions, users, and authentication.

Onward!

.. _Chapter 12: ../chapter12/