jkeyes/diveintopython3 forked from diveintomark/diveintopython3

added a few asides

• Loading branch information...
Mark Pilgrim committed Sep 28, 2009
1 parent 3e0cb2a commit 727c1494afdbc52c622c0c9667c86940988635f0
Showing with 46 additions and 3 deletions.
1. +7 −1 advanced-iterators.html
2. +6 −0 comprehensions.html
3. +3 −0 dip3.css
4. +13 −1 files.html
5. +17 −1 http-web-services.html
 @@ -38,7 +38,7 @@

Diving In

Puzzles like this are called cryptarithms or alphametics. The letters spell out actual words, but if you replace each letter with a digit from 0–9, it also “spells” an arithmetic equation. The trick is to figure out which letter maps to each digit. All the occurrences of each letter must map to the same digit, no digit can be repeated, and no “word” can start with the digit 0. -

The most well-known alphametic puzzle is SEND + MORE = MONEY. +

In this chapter, we’ll dive into an incredible Python program originally written by Raymond Hettinger. This program solves alphametic puzzles in just 14 lines of code. @@ -107,6 +107,8 @@

Finding all occurrences of a pattern

>>> re.findall(' s.*? s', "The sixth sick sheikh's sixth sheep's sick.") [' sixth s', " sheikh's s", " sheep's s"] + +

Surprised? The regular expression looks for a space, an s, and then the shortest possible series of any character (.*?), then a space, then another s. Well, looking at that input string, I see five matches:

@@ -258,6 +260,8 @@

Calculating Permutations… The Lazy Way!

1. That’s it! Those are all the permutations of [1, 2, 3] taken 2 at a time. Pairs like (1, 1) and (2, 2) never show up, because they contain repeats so they aren’t valid permutations. When there are no more permutations, the iterator raises a StopIteration exception.
+ +

The permutations() function doesn’t have to take a list. It can take any sequence — even a string.

@@ -438,6 +442,8 @@

A New Kind Of String Manipulation

• The translate() method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, “translating” MARK to MORK. + +

What does this have to do with solving alphametic puzzles? As it turns out, everything.

•  @@ -34,6 +34,8 @@

The Current Working Directory

• Explain the result + +

If you don’t know about the current working directory, step 1 will probably fail with an ImportError. Why? Because Python will look for the example module in the import search path, but it won’t find it because the examples folder isn’t one of the directories in the search path. To get past this, you can do one of two things:

@@ -108,6 +110,8 @@

Listing Directories

The glob module is another tool in the Python standard library. It’s an easy way to get the contents of a directory programmatically, and it uses the sort of wildcards that you may already be familiar with from working on the command line. +

+
>>> os.chdir('/Users/pilgrim/diveintopython3/')
>>> import glob
@@ -189,6 +193,8 @@

Constructing Absolute Pathnames

List Comprehensions

+ +

A list comprehension provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.

•  @@ -345,6 +345,9 @@ aside.ots { aside a { color: #fff !important; } +aside code { + font-size: inherit; +} /* previous/next navigation links */
 @@ -55,6 +55,8 @@

Character Encoding Rears Its Ugly Head

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 28: character maps to <undefined> >>> + +

What just happened? You didn’t specify a character encoding, so Python is forced to use the default encoding. What’s the default encoding? If you look closely at the traceback, you can see that it’s dying in cp1252.py, meaning that Python is using CP-1252 as the default encoding here. (CP-1252 is a common encoding on computers running Microsoft Windows.) The CP-1252 character set doesn’t support the characters that are in this file, so the read fails with an ugly UnicodeDecodeError.

But wait, it’s worse than that! The default encoding is platform-dependent, so this code might work on your computer (if your default encoding is UTF-8), but then it will fail when you distribute it to someone else (whose default encoding is different, like CP-1252). @@ -101,6 +103,8 @@

Reading Data From A Text File

• Perhaps somewhat surprisingly, reading the file again does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string. + +

What if you want to re-read a file?

@@ -202,6 +206,8 @@

Closing Files

Closing Files Automatically

+ +

Stream objects have an explicit close() method, but what happens if your code has a bug and crashes before you call close()? That file could theoretically stay open for much longer than necessary. While you’re debugging on your local computer, that’s not a big deal. On a production server, maybe it is.

Python 2 had a solution for this: the try..finally block. That still works in Python 3, and you may see it in other people’s code or in older code that was ported to Python 3. But Python 2.5 introduced a cleaner solution, which is now the preferred solution in Python 3: the with statement. @@ -275,6 +281,8 @@

Reading Data One Line At A Time

Writing to Text Files

+ +

You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, then you close the file.

To open a file for writing, use the open() function and specify the write mode. There are two file modes for writing: @@ -361,7 +369,9 @@

Binary Files

⁂ -

Streams Objects From Non-File Sources

+

Stream Objects From Non-File Sources

+ +

Imagine you’re writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it, and close it before exiting. But you shouldn’t do that. Instead, your API should take an arbitrary stream object. @@ -433,6 +443,8 @@

Handling Compressed Files

Standard Input, Output, and Error

+ +

Command-line gurus are already familiar with the concept of standard input, standard output, and standard error. This section is for the rest of you.

Standard output and standard error (commonly abbreviated stdout and stderr) are pipes that are built into every UNIX-like system, including Mac OS X and Linux. When you call the print() function, the thing you’re printing is sent to the stdout pipe. When your program crashes and prints out a traceback, it goes to the stderr pipe. By default, both of these pipes are just connected to the terminal window where you are working; when your program prints something, you see the output in your terminal window, and when a program crashes, you see the traceback in your terminal window too. In the graphical Python Shell, the stdout and stderr pipes default to your “Interactive Window”.

•  @@ -52,6 +52,8 @@

Caching

The most important thing to understand about any type of web service is that network access is incredibly expensive. I don’t mean “dollars and cents” expensive (although bandwidth ain’t free). I mean that it takes an extraordinary long time to open a connection, send a request, and retrieve a response from a remote server. Even on the fastest broadband connection, latency (the time it takes to send a request and start retrieving data in a response) can still be higher than you anticipated. A router misbehaves, a packet is dropped, an intermediate proxy is under attack — there’s never a dull moment on the public internet, and there may be nothing you can do about it. +

+

HTTP is designed with caching in mind. There is an entire class of devices (called “caching proxies”) whose only job is to sit between you and the rest of the world and minimize network access. Your company or ISP almost certainly maintains caching proxies, even if you’re unaware of them. They work because caching built into the HTTP protocol.

Here’s a concrete example of how caching works. You visit diveintomark.org in your browser. That page includes a background image, wearehugh.com/m.jpg. When your browser downloads that image, the server includes the following HTTP headers: @@ -82,6 +84,8 @@

Last-Modified Checking

Some data never changes, while other data changes all the time. In between, there is a vast field of data that might have changed, but hasn’t. CNN.com’s feed is updated every few minutes, but my weblog’s feed may not change for days or weeks at a time. In the latter case, I don’t want to tell clients to cache my feed for weeks at a time, because then when I do actually post something, people may not read it for weeks (because they’re respecting my cache headers which said “don’t bother checking this feed for weeks”). On the other hand, I don’t want clients downloading my entire feed once an hour if it hasn’t changed! +

+

HTTP has a solution to this, too. When you request data for the first time, the server can send back a Last-Modified header. This is exactly what it sounds like: the date that the data was changed. That background image referenced from diveintomark.org included a Last-Modified header.

HTTP/1.1 200 OK
@@ -130,7 +134,9 @@

ETags

Content-Type: image/jpeg
-The second time you request the same data, you include the ETag hash in an If-None-Match header of your request. If the data hasn’t changed, the server will send you back a 304 status code. As with the last-modified date checking, the server sends back only the 304 status code; it doesn’t send you the same data a second time. By including the ETag hash in your second request, you’re telling the server that there’s no need to re-send the same data if it still matches this hash, since you still have the data from the last time. + + +

The second time you request the same data, you include the ETag hash in an If-None-Match header of your request. If the data hasn’t changed, the server will send you back a 304 status code. As with the last-modified date checking, the server sends back only the 304 status code; it doesn’t send you the same data a second time. By including the ETag hash in your second request, you’re telling the server that there’s no need to re-send the same data if it still matches this hash, since you still have the data from the last time.

Again with the curl: @@ -161,6 +167,8 @@

Redirects

Cool URIs don’t change, but many URIs are seriously uncool. Web sites get reorganized, pages move to new addresses. Even web services can reorganize. A syndicated feed at http://example.com/index.xml might be moved to http://example.com/xml/atom.xml. Or an entire domain might move, as an organization expands and reorganizes; http://www.example.com/index.xml becomes http://server-farm-1.example.com/index.xml. +

+

Every time you request any kind of resource from an HTTP server, the server includes a status code in its response. Status code 200 means “everything’s normal, here’s the page you asked for”. Status code 404 means “page not found”. (You’ve probably seen 404 errors while browsing the web.) Status codes in the 300’s indicate some form of redirection.

HTTP has several different ways of signifying that a resource has moved. The two most common techiques are status codes 302 and 301. Status code 302 is a temporary redirect; it means “oops, that got moved over here temporarily” (and then gives the temporary address in a Location header). Status code 301 is a permanent redirect; it means “oops, that got moved permanently” (and then gives the new address in a Location header). If you get a 302 status code and a new address, the HTTP specification says you should use the new address to get what you asked for, but the next time you want to access the same resource, you should retry the old address. But if you get a 301 status code and a new address, you’re supposed to use the new address from then on. @@ -224,6 +232,8 @@

What’s On The Wire?

• The fourth line specifies the name of the library that is making the request. By default, this is Python-urllib plus a version number. Both urllib.request and httplib2 support changing the user agent, simply by adding a User-Agent header to the request (which will override the default value). + +

Now let’s look at what the server sent back in its response.

@@ -393,6 +403,8 @@

A Short Digression To Explain Why httplib2 Returns

If you know what sort of resource you’re expecting (an XML document in this case), perhaps you could “just” pass the returned bytes object to the xml.etree.ElementTree.parse() function. That’ll work as long as the XML document includes information on its own character encoding (as this one does), but that’s an optional feature and not all XML documents do that. If an XML document doesn’t include encoding information, the client is supposed to look at the enclosing transport — i.e. the Content-Type HTTP header, which can include a charset parameter. +

+

But it’s worse than that. Now character encoding information can be in two places: within the XML document itself, and within the Content-Type HTTP header. If the information is in both places, which one wins? According to RFC 3023 (I swear I am not making this up), if the media type given in the Content-Type HTTP header is application/xml, application/xml-dtd, application/xml-external-parsed-entity, or any one of the subtypes of application/xml such as application/atom+xml or application/rss+xml or even application/rdf+xml, then the encoding is

@@ -456,6 +468,8 @@

How httplib2 Handles Caching

1. Here’s the rub: this “response” was generated from httplib2’s local cache. That directory name you passed in when you created the httplib2.Http object — that directory holds httplib2’s cache of all the operations it’s ever performed.
+ +

If you want to turn on httplib2 debugging, you need to set a module-level constant (httplib2.debuglevel), then create a new httplib2.Http object. If you want to turn off debugging, you need to change the same module-level constant, then create a new httplib2.Http object.

@@ -576,6 +590,8 @@

How httplib2 Handles Last-ModifiedHow http2lib Handles Compression

+ +

HTTP supports two types of compression. httplib2 supports both of them.

• 0 comments on commit `727c149`

Please sign in to comment.