http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

# Code Like a Pythonista: Idiomatic Python

David Goodger

goodger@python.org

http://python.net/~goodger

In this interactive tutorial, we'll cover many essential Python idioms and techniques in depth, adding immediately useful tools to your belt.

There are 3 versions of this presentation:

- S5 presentation

- Plain HTML handout

- reStructuredText source

©2006-2008, licensed under a Creative Commons Attribution/Share-Alike (BY-SA) license.

My credentials: I am

- a resident of Montreal,

- father of two great kids, husband of one special woman,

- a full-time Python programmer,

- author of the Docutils project and reStructuredText,

- an editor of the Python Enhancement Proposals (or PEPs),

- an organizer of PyCon 2007, and chair of PyCon 2008,

- a member of the Python Software Foundation,

- a Director of the Foundation for the past year, and its Secretary.

In the tutorial I presented at PyCon 2006 (called Text & Data Processing), I was surprised at the reaction to some techniques I used that I had thought were common knowledge. But many of the attendees were unaware of these tools that experienced Python programmers use without thinking.

Many of you will have seen some of these techniques and idioms before. Hopefully you'll learn a few techniques that you haven't seen before and maybe something new about the ones you have already seen.

# The Zen of Python (1)

These are the guiding principles of Python, but are open to interpretation. A sense of humor is required for their proper interpretation.

If you're using a programming language named after a sketch comedy troupe, you had better have a sense of humor.

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.

# The Zen of Python (2)

    In the face of ambiguity, refuse the temptation to guess.
    There should be one—and preferably only one—obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than right now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea—let's do more of those!
    —Tim Peters

This particular "poem" began as a kind of a joke, but it really embeds a lot of truth about the philosophy behind Python. The Zen of Python has been formalized in PEP 20, where the abstract reads:

    Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.
    —http://www.python.org/dev/peps/pep-0020/

You can decide for yourself if you're a "Pythoneer" or a "Pythonista". The terms have somewhat different connotations.

When in doubt:

    import this

Try it in a Python interactive interpreter:

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Here's another easter egg:

In [2]:
from __future__ import braces

SyntaxError: not a chance (<ipython-input-2-2aebb3fc8ecf>, line 1)

What a bunch of comedians! :-)

# Coding Style: Readability Counts

    Programs must be written for people to read, and only incidentally for machines to execute.
    — Abelson & Sussman, *Structure and Interpretation of Computer Programs*

Try to make your programs easy to read and obvious.

# PEP 8: Style Guide for Python Code

Worthwhile reading: http://www.python.org/dev/peps/pep-0008/

PEP = Python Enhancement Proposal

A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment.

The Python community has its own standards for what source code should look like, codified in PEP 8. These standards are different from those of other communities, like C, C++, C#, Java, Visual Basic, etc.

Because indentation and whitespace are so important in Python, the Style Guide for Python Code approaches a standard. It would be wise to adhere to the guide! Most open-source projects and (hopefully) in-house projects follow the style guide quite closely.

# Whitespace 1

- 4 spaces per indentation level.

- No hard tabs.

- Never mix tabs and spaces.

    This is exactly what IDLE and the Emacs Python mode support. Other editors may also provide this support.

- Two blank lines around top-level functions and classes.

- One blank line between methods in a class.

# Whitespace 2

- Add a space after "," in dicts, lists, tuples, & argument lists, and after ":" in dicts, but not before.

- Put spaces around assignments & comparisons (except in argument lists).

- No spaces just inside parentheses or just before argument lists.

- No spaces just inside docstrings.

In [3]:
def make_squares(key, value=0):
    """Return a dictionary and a list..."""
    d = {key: value}
    l = [key, value]
    return d, l

# Naming

- joined_lower for functions, methods, attributes

- joined_lower or ALL_CAPS for constants

- StudlyCaps for classes

- camelCase only to conform to pre-existing conventions

- Attributes: interface, _internal, __private

But try to avoid the __private form. I never use it. Trust me. If you use it, you WILL regret it later.

Explanation:

People coming from a C++/Java background are especially prone to overusing/misusing this "feature". But __private names don't work the same way as in Java or C++. They just trigger a name mangling whose purpose is to prevent accidental namespace collisions in subclasses: MyClass.__private just becomes MyClass._MyClass__private. (Note that even this breaks down for subclasses with the same name as the superclass, e.g. subclasses in different modules.) It is possible to access __private names from outside their class, just inconvenient and fragile (it adds a dependency on the exact name of the superclass).

The problem is that the author of a class may legitimately think "this attribute/method name should be private, only accessible from within this class definition" and use the __private convention. But later on, a user of that class may make a subclass that legitimately needs access to that name. So either the superclass has to be modified (which may be difficult or impossible), or the subclass code has to use manually mangled names (which is ugly and fragile at best).

There's a concept in Python: "we're all consenting adults here". If you use the __private form, who are you protecting the attribute from? It's the responsibility of subclasses to use attributes from superclasses properly, and it's the responsibility of superclasses to document their attributes properly.

It's better to use the single-leading-underscore convention, _internal. This isn't name mangled at all; it just indicates to others to "be careful with this, it's an internal implementation detail; don't touch it if you don't fully understand it". It's only a convention though.
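The mangling described above can be sketched as follows. The class names Base and Child and their attributes are hypothetical, chosen only for illustration:

```python
class Base(object):
    def __init__(self):
        self._internal = 'by convention only'   # single underscore: not mangled
        self.__private = 'mangled'              # stored as _Base__private


class Child(Base):
    def peek(self):
        # Writing self.__private here would be mangled to
        # self._Child__private, which does not exist, so the subclass
        # has to spell out the superclass name by hand.
        return self._Base__private


obj = Child()
names = sorted(vars(obj))   # attribute names actually stored on the instance
```

Here names holds '_Base__private' and '_internal': the mangled name is still reachable from anywhere, while the subclass is now tied to the exact name of its superclass.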

There are some good explanations in the answers here:

http://stackoverflow.com/questions/70528/why-are-pythons-private-methods-not-actually-private

http://stackoverflow.com/questions/1641219/does-python-have-private-variables-in-classes

# Long Lines & Continuations

Keep lines below 80 characters in length.

Use implied line continuation inside parentheses/brackets/braces:

In [4]:
def __init__(self, first, second, third,
             fourth, fifth, sixth):
    output = (first + second + third
              + fourth + fifth + sixth)

Use backslashes as a last resort:

In [5]:
VeryLong.left_hand_side \
    = even_longer.right_hand_side()

Backslashes are fragile; they must end the line they're on. If you add a space after the backslash, it won't work any more. Also, they're ugly.

# Long Strings

Adjacent literal strings are concatenated by the parser:

In [6]:
print 'o' 'n' "e"

one


The spaces between literals are not required, but help with readability. Any type of quoting can be used:

In [7]:
print 't' r'\/\/' """o"""

t\/\/o


The string prefixed with an "r" is a "raw" string. Backslashes are not evaluated as escapes in raw strings. They're useful for regular expressions and Windows filesystem paths.

Note that named string objects are not concatenated:

In [8]:
a = 'three'
b = 'four'
a b

SyntaxError: invalid syntax (<ipython-input-8-11c003d22034>, line 3)

That's because this automatic concatenation is a feature of the Python parser/compiler, not the interpreter. You must use the "+" operator to concatenate strings at run time.
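A small sketch of the difference, reusing the names a and b from the example above (literal and joined are illustrative names):

```python
a = 'three'
b = 'four'

# Adjacent literals are joined when the source is compiled...
literal = 'three' 'four'

# ...but names need an explicit operator, evaluated at run time.
joined = a + b
```

Both literal and joined end up as 'threefour'; only the moment at which the concatenation happens differs.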

In [10]:
text = ('Long strings can be made up '
        'of several shorter strings.')

The parentheses allow implicit line continuation.

Multiline strings use triple quotes:

In [11]:
"""Triple
double
quotes"""

'Triple\ndouble\nquotes'

In a triple-quoted string, a backslash at the very end of a line escapes the newline that follows it. Placing one immediately after the opening quotes and one just before the closing quotes eliminates the extra newlines, while keeping the text and quotes nicely left-justified. The backslashes must be at the end of their lines.
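A sketch of the triple-single-quote form with backslash-escaped newlines, next to the same text without the backslashes. The variable names are illustrative only:

```python
# Without backslashes, the newlines right after the opening quotes and
# right before the closing quotes become part of the string.
unescaped = '''
Triple
single
quotes
'''

# A backslash at the end of a line escapes the newline that follows it.
escaped = '''\
Triple
single
quotes\
'''
```

Here unescaped is '\nTriple\nsingle\nquotes\n', while escaped is 'Triple\nsingle\nquotes'.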

# Compound Statements

Good:

In [12]:
if foo == 'blah':
    do_something()
do_one()
do_two()
do_three()

Bad:

In [13]:
if foo == 'blah': do_something()
do_one(); do_two(); do_three()

Whitespace & indentations are useful visual indicators of the program flow. The indentation of the second "Good" line above shows the reader that something's going on, whereas the lack of indentation in "Bad" hides the "if" statement.

Multiple statements on one line are a cardinal sin. In Python, *readability counts*.

# Docstrings & Comments

Docstrings = How to use code

Comments = Why (rationale) & how code works

Docstrings explain how to use code, and are for the users of your code. Uses of docstrings:

- Explain the purpose of the function even if it seems obvious to you, because it might not be obvious to someone else later on.

- Describe the parameters expected, the return values, and any exceptions raised.

- If the method is tightly coupled with a single caller, make some mention of the caller (though be careful as the caller might change later).
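As one possible illustration of the points above, here is a docstring that states the purpose, the parameters, the return value, and the exception raised. The function safe_divide is hypothetical, and the layout is just one reasonable choice rather than a required format:

```python
def safe_divide(numerator, denominator):
    """Return numerator divided by denominator as a float.

    Parameters:
        numerator -- the number to be divided
        denominator -- the number to divide by; must not be zero

    Returns the quotient as a float.

    Raises ValueError if denominator is zero.
    """
    if denominator == 0:
        raise ValueError('denominator must not be zero')
    # Convert to float first so that two integers are not floor-divided
    # under Python 2.
    return float(numerator) / denominator
```

Calling help(safe_divide) in the interactive interpreter displays this text.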

Comments explain why, and are for the maintainers of your code. Examples include notes to yourself, like:

In [14]:
# !!! BUG: ...

# !!! FIX: This is a hack

# ??? Why is this here?

Both of these groups include you, so write good docstrings and comments!

Docstrings are useful in interactive use (help()) and for auto-documentation systems.

False comments & docstrings are worse than none at all. So keep them up to date! When you make changes, make sure the comments & docstrings are consistent with the code, and don't contradict it.

There's an entire PEP about docstrings, PEP 257, "Docstring Conventions":

http://www.python.org/dev/peps/pep-0257/

# Practicality Beats Purity

*A foolish consistency is the hobgoblin of little minds.* —Ralph Waldo Emerson

(hobgoblin: Something causing superstitious fear; a bogy.)

There are always exceptions. From PEP 8:

    But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

Two good reasons to break a particular rule:

1. When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.

2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).

... but practicality shouldn't beat purity to a pulp!

# Idiom Potpourri

A selection of small, useful idioms.

Now we move on to the meat of the tutorial: lots of idioms.

We'll start with some easy ones and work our way up.

# Swap Values

In other languages:
    
    temp = a
    a = b
    b = temp

In Python:

    b, a = a, b

Perhaps you've seen this before. But do you know how it works?

- The comma is the tuple constructor syntax.

- A tuple is created on the right (tuple packing).

- A tuple is the target on the left (tuple unpacking).

The right-hand side is unpacked into the names in the tuple on the left-hand side.
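The two steps can be made visible by naming the intermediate tuple. This is only a sketch; packed and the sample values are illustrative:

```python
a, b = 'first', 'second'

# The right-hand side is packed into a tuple before any name is rebound...
packed = a, b

# ...and then unpacked into the names on the left.
b, a = packed          # same effect as: b, a = a, b

# Because the whole right-hand side is evaluated first, the same idea
# rotates three values without a temporary variable.
x, y, z = 1, 2, 3
x, y, z = y, z, x
```

After this runs, a is 'second', b is 'first', and (x, y, z) is (2, 3, 1).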

Further examples of unpacking:

In [19]:
l = ['David', 'Pythonista', '+1-514-555-1234']
name, title, phone = l
name

'David'

In [20]:
title

'Pythonista'

In [21]:
phone

'+1-514-555-1234'

Useful in loops over structured data:

In the code below, l (lowercase L) is the list we just made (David's info), so people is a list containing two items, each a 3-item list.

In [3]:
people = [l, ['Guido', 'BDFL', 'unlisted']]
for (name, title, phone) in people:
    print name, phone

David +1-514-555-1234
Guido unlisted


Each item in people is being unpacked into the (name, title, phone) tuple.

Arbitrarily nestable (just be sure to match the structure on the left & right!):

In [22]:
david, (gname, gtitle, gphone) = people
gname

'Guido'

In [23]:
gtitle

'BDFL'

In [24]:
gphone

'unlisted'

In [25]:
david

['David', 'Pythonista', '+1-514-555-1234']

# More About Tuples

We saw that the comma is the tuple constructor, not the parentheses. Example:

In [6]:
1,

(1,)

The Python interpreter shows the parentheses for clarity, and I recommend you use parentheses too:

In [7]:
(1, )

(1,)

Don't forget the comma!

In [8]:
(1)

1

In a one-tuple, the trailing comma is required; in 2+-tuples, the trailing comma is optional. In 0-tuples, or empty tuples, a pair of parentheses is the shortcut syntax:

In [9]:
()

()

In [10]:
tuple()

()

A common typo is to leave a comma even though you don't want a tuple. It can be easy to miss in your code:

In [11]:
value = 1,
value

(1,)

So if you see a tuple where you don't expect one, look for a comma!

# Interactive "_"

This is a really useful feature that surprisingly few people know.

In the interactive interpreter, whenever you evaluate an expression or call a function, the result is bound to a temporary name, "_" (an underscore):

In [13]:
1 + 1

2

In [14]:
_

2

"_" stores the last printed expression.

When a result is None, nothing is printed, so "_" doesn't change. That's convenient!

This only works in the interactive interpreter, not within a module.

It is especially useful when you're working out a problem interactively, and you want to store the result for a later step:

In [16]:
import math
math.pi / 3

1.0471975511965976

In [17]:
angle = _
math.cos(angle)

0.5000000000000001

In [18]:
_

0.5000000000000001

# Building Strings from Substrings

Start with a list of strings:

In [26]:
colors = ['red', 'blue', 'green', 'yellow']

We want to join all the strings together into one large string. Especially when the number of substrings is large...

Don't do this:

In [29]:
result = ''
for s in colors:
    result += s
print result

redbluegreenyellow


This is very inefficient.

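It has terrible memory usage and performance patterns. Strings are immutable, so each "+=" can build a new string by copying everything accumulated so far; the "summation" will compute, store, and then throw away each intermediate step.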

Instead, do this:

In [30]:
result = ''.join(colors)
print result

redbluegreenyellow


The join() string method does all the copying in one pass.

When you're only dealing with a few dozen or hundred strings, it won't make much difference. But get in the habit of building strings efficiently, because with thousands of strings, or inside loops, it will make a difference.
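To measure the difference yourself, something along these lines should work. The sample data and function names are hypothetical, and no timings are quoted because they depend on the interpreter and machine (CPython, for instance, can sometimes optimize "+=" on strings):

```python
import timeit

words = ['word%d' % i for i in range(10000)]   # hypothetical sample data


def concat_with_plus(strings):
    result = ''
    for s in strings:
        result += s        # may copy everything accumulated so far
    return result


def concat_with_join(strings):
    return ''.join(strings)    # a single call builds the whole result


# Both approaches must produce the same string.
same = concat_with_plus(words) == concat_with_join(words)

# Total seconds for 20 runs of each approach.
plus_seconds = timeit.timeit(lambda: concat_with_plus(words), number=20)
join_seconds = timeit.timeit(lambda: concat_with_join(words), number=20)
```

Compare plus_seconds and join_seconds on your own setup, and try larger inputs to see how each approach scales.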

# Building Strings, Variations 1

Here are some techniques to use the join() string method.

If you want spaces between your substrings:

In [31]:
result = ' '.join(colors)
print result

red blue green yellow


Or commas and spaces:

In [32]:
result = ', '.join(colors)
print result

red, blue, green, yellow


Here's a common case:

In [33]:
colors = ['red', 'blue', 'green', 'yellow']
print 'Choose', ', '.join(colors[:-1]), \
      'or', colors[-1]

Choose red, blue, green or yellow


To make a nicely grammatical sentence, we want commas between all but the last pair of values, where we want the word "or". The slice syntax does the job. The "slice until -1" ([:-1]) gives all but the last value, which we join with comma-space.

Of course, this code wouldn't work with corner cases, lists of length 0 or 1.
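One way to cover those corner cases is to check the length first. This is a sketch only: the function name choose_from and the wording used for an empty list are assumptions, not part of the original example:

```python
def choose_from(options):
    """Return a 'Choose ...' sentence for a list of option strings."""
    if not options:
        return 'Nothing to choose from'          # length 0 (hypothetical wording)
    if len(options) == 1:
        return 'Choose ' + options[0]            # length 1: no "or" needed
    # Two or more: commas between all but the last pair, then "or".
    return 'Choose %s or %s' % (', '.join(options[:-1]), options[-1])
```

With the four colors above this returns 'Choose red, blue, green or yellow', and with ['red', 'blue'] it returns 'Choose red or blue'.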

# Building Strings, Variations 2

If you need to apply a function to generate the substrings:

    result = ''.join(fn(i) for i in items)

This involves a generator expression, which we'll cover later.

If you need to compute the substrings incrementally, accumulate them in a list first:

    items = []
    ...
    items.append(item)  # many times
    ...
    # items is now complete
    result = ''.join(fn(i) for i in items)

We accumulate the parts in a list so that we can apply the join string method, for efficiency.
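A concrete version of the accumulate-then-join pattern might look like this; numbered_lines is a hypothetical helper used only for illustration:

```python
def numbered_lines(lines):
    """Return the lines joined into one string, each prefixed by its number."""
    parts = []
    for number, line in enumerate(lines, 1):
        parts.append('%d: %s\n' % (number, line))   # cheap list append
    return ''.join(parts)                           # one join at the end
```

For example, numbered_lines(['a', 'b']) returns '1: a\n2: b\n'.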

# Use 'in' where possible (1)

Good:

    for key in d:
        print key

- 'in' is generally faster: iterating over d directly avoids building the list of keys that d.keys() creates in Python 2.

- This pattern also works for items in arbitrary containers (such as lists, tuples, and sets).

- 'in' is also an operator (as we'll see).

Bad:

    for key in d.keys():
        print key

This is limited to objects with a keys() method.

# Use 'in' where possible (2)

But .keys() is necessary when mutating the dictionary:

    for key in d.keys():
        d[str(key)] = d[key]

d.keys() creates a static list of the dictionary keys. Otherwise, you'll get an exception "RuntimeError: dictionary changed size during iteration".
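The sketch below shows both the failure and a snapshot-based fix. It uses list(d) rather than d.keys() for the snapshot, because in Python 3 keys() returns a live view rather than a list, whereas list(d) copies the keys in either version. The variable names are illustrative:

```python
original = {1: 'one', 2: 'two'}

# Iterating over the dictionary itself while inserting keys fails.
unsafe = dict(original)
try:
    for key in unsafe:
        unsafe[str(key)] = unsafe[key]   # a new key changes the size
    raised = False
except RuntimeError:
    raised = True

# Iterating over a snapshot of the keys is safe.
safe = dict(original)
for key in list(safe):
    safe[str(key)] = safe[key]
```

Afterwards raised is True, and safe contains both the original integer keys and their string counterparts.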

For consistency, use key in dict, not dict.has_key():

    # do this:
    if key in d:
        ...do something with d[key]

    # not this:
    if d.has_key(key):
        ...do something with d[key]

Here 'in' is used as an operator, not as part of a for loop.
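A few membership tests as a sketch; the dictionary contents and variable names are made up for illustration:

```python
d = {'red': 1, 'blue': 2}

has_red = 'red' in d            # tests keys, not values
has_one = 1 in d                # False: 1 is a value, not a key
missing = 'green' not in d      # negated form

# The same operator works on other containers.
in_list = 'blue' in ['red', 'blue']
in_set = 'blue' in set(['red', 'blue'])
in_string = 'lu' in 'blue'      # substring test for strings
```

Every result here is True except has_one.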