<img src='https://s3.amazonaws.com/drivendata.org/kif-example/img/dd.png' style="width:100%"/>

# Python syntax refresher

## Introduction

This module is adapted from Google's excellent [Python class](https://developers.google.com/edu/python/), which is distributed under the Creative Commons Attribution [license](http://creativecommons.org/licenses/by/2.5/).

Python is a dynamic, interpreted (bytecode-compiled) language. There are no type declarations of variables, parameters, functions, or methods in source code. This makes the code short and flexible, but you lose the compile-time type checking of the source code. Python tracks the types of all values at runtime and flags code that does not make sense as it runs.

### The interactive interpreter - AKA the read-eval-print-loop ("REPL")

An excellent way to see how Python code works is to run the Python interpreter and type code right into it. If you ever have a question like, "What happens if I add an int to a list?", typing it into the Python interpreter is a fast, and likely the best, way to see what happens. (See below to see what really happens!)

You can use the interactive Python interpreter to experiment:

```python
$ python        ## Run the Python interpreter
Python 2.7.9 (default, Dec 30 2014, 03:41:42) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-55)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 6       ## set a variable in this interpreter session
>>> a           ## entering an expression prints its value
6
>>> a + 2
8
>>> a = 'hi'    ## 'a' can hold a string just as well
>>> a
'hi'
>>> len(a)      ## call the len() function on a string
2
>>> a + len(a)  ## try something that doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> a + str(len(a))  ## probably what you really wanted
'hi2'
>>> foo         ## try something else that doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
>>> ^D          ## type CTRL-d to exit (CTRL-z in Windows/DOS terminal)
```

### Source code files

Here's a very simple hello.py program (notice that blocks of code are delimited strictly using indentation rather than curly braces — more on this later!):

```python
import sys

# Gather our code in a main() function
def main():
    print 'Hello there', sys.argv[1]
    # Command line args are in sys.argv[1], sys.argv[2] ...
    # sys.argv[0] is the script name itself and can be ignored

# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == '__main__':
    main()
```

Running this program from the command line looks like:

```
$ python hello.py Peter
Hello there Peter
```

### Notebooks

We're running an even more awesome interactive version of Python here in a **Jupyter notebook**. Jupyter notebooks allow us to alternate text cells like this one (written in [Markdown](https://en.wikipedia.org/wiki/Markdown)), mathematical equations (written in [LaTeX](https://en.wikipedia.org/wiki/LaTeX)),

$$P(\theta | \mathcal{D}) = \frac{P(\mathcal{D} | \theta)P(\theta)}{P(\mathcal{D})}$$


and code which runs and outputs right here in the notebook:

In [2]:
person = 'Krista'
print('Hello ' + person)

Hello Krista


### User defined functions

Functions in Python are defined like this:

In [4]:
# Defines a "repeat" function that takes 2 arguments.
def repeat(s, exclaim):
    """
    Returns the string 's' repeated 3 times.
    If exclaim is true, add exclamation marks.
    """

    result = s + s + s # can also use "s * 3" which is faster (Why?)
    if exclaim:
        result = result + '!!!'
    return result

In [6]:
repeat('Woohoo', True)

'WoohooWoohooWoohoo!!!'

In [None]:
repeat('Yay', True)

What if we don't specify one of the arguments?

In [12]:
repeat('Yay')

TypeError: repeat() takes exactly 2 arguments (1 given)

We can give an argument a default value, which makes it optional when the function is called. Arguments with defaults are called **keyword arguments**:

In [13]:
# Defines a "repeat" function that takes 2 arguments.
def repeat(s, exclaim=False):
    """
    Returns the string 's' repeated 3 times.
    If exclaim is true, add exclamation marks.
    """

    result = s + s + s # can also use "s * 3" which is faster (Why?)
    if exclaim:
        result = result + '!!!'
    return result

In [11]:
repeat('Yay')

'YayYayYay'

And the following two usages are equivalent:

In [14]:
repeat('Yay', True)

'YayYayYay!!!'

In [15]:
repeat('Yay', exclaim=True)

'YayYayYay!!!'

### Code Checked at Runtime

Python does very little checking at compile time, deferring almost all type, name, etc. checks on each line until that line runs. Suppose a function calls repeat() like this:

In [16]:
def try_stuff():
    if name == 'Guido':
        print repeeeet(name) + '!!!'  # not an actual function
    else:
        print repeat(name)

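We can run this cell and nothing bad happens, at least until the function is called. Even then, Python only reports the first problem it reaches: the undefined variable `name`, not the misspelled `repeeeet` further down.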

In [17]:
try_stuff()

NameError: global name 'name' is not defined
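To make the deferred checking concrete, here is a minimal sketch. The helper `greet` is a hypothetical name invented for this illustration, and `repeat` is redefined so the block stands alone. Defining `greet` succeeds even though one branch calls a misspelled function; the error only appears when that branch actually runs. The sketch uses the function-call form of `print` so that it should run under both Python 2 and Python 3.

```python
def repeat(s, exclaim=False):
    """Return s repeated 3 times, optionally followed by '!!!'."""
    result = s * 3
    if exclaim:
        result = result + '!!!'
    return result


def greet(name):
    """Hypothetical helper: one branch contains a misspelled function name."""
    if name == 'Guido':
        return repeeeet(name) + '!!!'  # typo: only noticed if this line runs
    else:
        return repeat(name)


# The 'else' branch runs fine, so the typo goes unnoticed.
ok_result = greet('Yay')
print(ok_result)

# The 'Guido' branch reaches the undefined name and raises NameError.
try:
    greet('Guido')
    error_seen = False
except NameError:
    error_seen = True
print(error_seen)
```

This is why it pays to exercise every branch of your code at least once, rather than trusting that a cell "ran without errors".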

### Variable names

Since Python variables don't have any type spelled out in the source code, it's extra helpful to give meaningful names to your variables to remind yourself of what's going on. So use "name" if it's a single name, and "names" if it's a list of names, and "tuples" if it's a list of tuples. Many basic Python errors result from forgetting what type of value is in each variable, so use your variable names (all you have really) to help keep things straight.

As far as actual naming goes, some languages prefer underscored_parts for variable names made up of "more than one word," but other languages prefer camelCasing. In general, Python prefers the underscore method but guides developers to defer to camelCasing if integrating into existing Python code that already uses that style. Readability counts. Read more in the section on naming conventions in PEP 8.

As you can guess, keywords like 'print' and 'while' cannot be used as variable names; you'll get a syntax error if you do. Built-ins are a different story: they are not keywords, so Python will let you reuse their names, and new Python developers often do so by accident. For example, while 'str' and 'list' may seem like good names, assigning to them hides the built-in types of the same name in that scope.

In [None]:
# BAD
zz = ['apple', 'banana', 'orange']

# ANNOYING BUT STILL ACCEPTABLE IF YOU MUST
fruitNames = ['apple', 'banana', 'orange']

# GOOD
fruit_names = ['apple', 'banana', 'orange']
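The warning about built-in names can be seen in action with a small sketch. The function `shadowing_demo` is a hypothetical name used only for this illustration; the shadowing is kept inside a function so that the built-in `list` stays intact everywhere else.

```python
def shadowing_demo():
    """Illustrative only: shadow the built-in 'list' inside this function."""
    list = ['apple', 'banana', 'orange']   # 'list' now names our data
    try:
        list('abc')                        # the built-in is hidden here
    except TypeError:
        return 'list is no longer callable'
    return 'no error'


outcome = shadowing_demo()
print(outcome)

# Outside the function the built-in is untouched and works as usual.
letters = list('abc')
print(letters)
```

Choosing a descriptive name such as `fruit_names` avoids the problem entirely.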

### Online help, `help()`, and `dir()`

There are a variety of ways to get help for Python.

* Do a Google search, starting with the word "python", like "python list" or "python string lowercase". The first hit is often the answer. This technique seems to work better for Python than it does for other languages for some reason.
* The official Python docs site — docs.python.org — has high quality docs. Nonetheless, I often find a Google search of a couple words to be quicker.
* There is also an official Tutor mailing list specifically designed for those who are new to Python and/or programming!
* Many questions (and answers) can be found on StackOverflow and Quora.
* Use the `help()` and `dir()` functions (see below).

Inside the Python interpreter, the `help()` function pulls up documentation strings for various modules, functions, and methods. These doc strings are similar to Java's javadoc. The `dir()` function tells you what the attributes of an object are. Below are some ways to call `help()` and `dir()` from the interpreter:

* `help(len)` — help string for the built-in `len()` function; note that it's "len" not "len()", which is a call to the function, which we don't want
* `help(sys)` — help string for the `sys` module (must do an `import sys` first)
* `dir(sys)` — `dir()` is like `help()` but just gives a quick list of its defined symbols, or "attributes"
* `help(sys.exit)` — help string for the `exit()` function in the sys module
* `help('xyz'.split)` — help string for the `split()` method for string objects. You can call `help()` with that object itself or an example of that object, plus its attribute. For example, calling `help('xyz'.split)` is the same as calling `help(str.split)`.
* `help(list)` — help string for `list` objects
* `dir(list)` — displays `list` object attributes, including its methods
* `help(list.append)` — help string for the `append()` method for `list` objects
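Since `help()` prints to the screen, it is mostly useful interactively. As a rough sketch of what the calls above give you programmatically: `dir()` returns a plain list of attribute names, and the text that `help()` formats comes from the object's `__doc__` attribute. The variable names below are made up for the example.

```python
import sys

# dir() returns a list of attribute names as strings.
list_attrs = dir(list)
print('append' in list_attrs)

# Names that do not start with an underscore are the everyday methods.
public_list_attrs = []
for name in list_attrs:
    if not name.startswith('_'):
        public_list_attrs.append(name)
print(public_list_attrs)

# help(str.split) displays (a formatted version of) this docstring.
split_doc = str.split.__doc__
print(split_doc)

# Modules work the same way once they are imported.
print('exit' in dir(sys))
```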

## Strings

Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use). String literals can be enclosed by either double or single quotes, although single quotes are more commonly used.

In [18]:
a = 'something'  # single quotes
b = "something"  # double quotes
a == b           # they're the same

True

In [19]:
sentence_with_quotes = '"No," he replied coldly'
print sentence_with_quotes

"No," he replied coldly


In [20]:
sentence_with_apostrophe = "Guess I won't see you there."
print sentence_with_apostrophe

Guess I won't see you there.


Backslash escapes work the usual way within both single and double quoted literals -- e.g. \n \' \". A double quoted string literal can contain single quotes without any fuss (e.g. "I didn't do it") and likewise a single quoted string can contain double quotes. A string literal can span multiple lines, but there must be a backslash \ at the end of each line to escape the newline. String literals inside triple quotes, """ or ''', can span multiple lines of text.

In [21]:
sentence_with_newlines = "guido van rossum\nreally liked monty python\nhence the language name"
print sentence_with_newlines

guido van rossum
really liked monty python
hence the language name


In [22]:
sentence_with_multiple_lines = """using triple quotes
is pleasantly python
for multiple lines"""
print sentence_with_multiple_lines

using triple quotes
is pleasantly python
for multiple lines


In [23]:
another_sentence = 'strings that are spread across multiple ' \
    'lines can be split with the "\\" character'
print another_sentence

strings that are spread across multiple lines can be split with the "\" character


Python strings are "immutable" which means they cannot be changed after they are created (Java strings also use this immutable style). Since strings can't be changed, we construct *new* strings as we go to represent computed values. So for example the expression ('hello' + 'there') takes in the 2 strings 'hello' and 'there' and builds a new string 'hellothere'.
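Here is a small sketch of what immutability means in practice. The variable names are illustrative, and the second half borrows the slice syntax that is covered a little later.

```python
greeting = 'hello'

# In-place modification is not allowed on a string.
try:
    greeting[0] = 'J'
    modified = True
except TypeError:
    modified = False
print(modified)

# Instead, build new strings out of existing ones.
combined = greeting + 'there'
new_greeting = 'J' + greeting[1:]
print(combined)
print(new_greeting)
print(greeting)   # the original string is unchanged
```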

Characters in a string can be accessed using the standard [ ] syntax, and like Java and C++, Python uses zero-based indexing, so if s is 'hello' then s[1] is 'e'. If the index is out of bounds for the string, Python raises an error. The Python style (unlike Perl) is to halt if it can't tell what to do, rather than just make up a default value. The handy "slice" syntax (below) also works to extract any substring from a string. The len(string) function returns the length of a string. The [ ] syntax and the len() function actually work on any sequence type -- strings, lists, etc. Python tries to make its operations work consistently across different types. Python newbie gotcha: don't use "len" as a variable name, or you will block out the len() function. The '+' operator can concatenate two strings. Notice in the code below that variables are not pre-declared -- just assign to them and go.

In [24]:
s = 'hi'

In [25]:
print s[1]

i


In [26]:
print len(s)

2


In [27]:
print s + ' there'

hi there


Unlike Java, the '+' does not automatically convert numbers or other types to string form. The str() function converts values to a string form so they can be combined with other strings.

In [28]:
pi = 3.14

In [29]:
text = 'The value of pi is ' + pi      ## NO, does not work

TypeError: cannot concatenate 'str' and 'float' objects

In [30]:
text = 'The value of pi is '  + str(pi)  ## yes
print text

The value of pi is 3.14


For numbers, the standard operators, +, /, * work in the usual way. There is no ++ operator, but +=, -=, etc. work. If you want integer division, it is most correct to use 2 slashes -- e.g. 6 // 5 is 1 (prior to Python 3, a single / does int division with ints anyway, but moving forward // is the preferred way to indicate that you want int division.)
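A short sketch of the arithmetic described above. It sticks to `//` and `+=`, which behave the same way in Python 2 and Python 3; the variable names are just for illustration.

```python
count = 6
count += 1               # there is no count++, use += instead
print(count)

whole = 6 // 5           # integer (floor) division
print(whole)

floored = -7 // 2        # // rounds down, toward negative infinity
print(floored)

float_floor = 7.0 // 2   # with a float operand the result is a float
print(float_floor)
```

Note that `//` rounds down rather than toward zero, which matters for negative numbers.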

The "print" statement prints out one or more Python items followed by a newline (leave a trailing comma at the end of the items to inhibit the newline). A "raw" string literal is prefixed by an 'r' and passes all the chars through without special treatment of backslashes, so r'x\nx' evaluates to the length-4 string 'x\nx'. A 'u' prefix allows you to write a unicode string literal (Python has lots of other unicode support features -- see the docs below).

In [31]:
raw = r'this\t\n and that'
print raw     ## this\t\n and that

multi = """It was the best of times.
It was the worst of times."""

this\t\n and that


### String methods

A method is like a function, but it runs "on" an object. If the variable s is a string, then the code s.lower() runs the lower() method on that string object and returns the result (this idea of a method running on an object is one of the basic ideas that make up Object Oriented Programming, OOP). Here are some of the most common string methods:

* s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
* s.strip() -- returns a string with whitespace removed from the start and end
* s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
* s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
* s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found
* s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
* s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.
* s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> aaa---bbb---ccc

A google search for "python str" should lead you to the official [python.org string methods](http://docs.python.org/library/stdtypes.html#string-methods) which lists all the str methods.

Python does not have a separate character type. Instead an expression like s[8] returns a string-length-1 containing the character. With that string-length-1, the operators ==, <=, ... all work as you would expect, so mostly you don't need to know that Python does not have a separate scalar "char" type.
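The methods listed above can be tried out in a few lines. This is just a sketch with made-up sample strings; each method returns a new value and leaves the original string alone.

```python
s = '  Hello, World  '

stripped = s.strip()                     # whitespace removed from both ends
lowered = stripped.lower()
starts = stripped.startswith('Hello')
position = stripped.find('World')        # index where 'World' begins
missing = stripped.find('xyz')           # -1 because 'xyz' is absent
replaced = stripped.replace('l', 'L')    # every 'l' is replaced

parts = 'aaa,bbb,ccc'.split(',')
joined = '---'.join(parts)
words = 'one  two\tthree\n'.split()      # no argument: split on whitespace

print(stripped)
print(position)
print(parts)
print(joined)
print(words)
```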

### String slices

The "slice" syntax is a handy way to refer to sub-parts of sequences -- typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end.

Suppose we have s = "Hello"

![](figures/0.2-hello-string.png)

* s[1:4] is 'ell' -- chars starting at index 1 and extending up to but not including index 4
* s[1:] is 'ello' -- omitting either index defaults to the start or end of the string
* s[:] is 'Hello' -- omitting both always gives us a copy of the whole thing (this is the pythonic way to copy a sequence like a string or list)
* s[1:100] is 'ello' -- an index that is too big is truncated down to the string length


The standard zero-based index numbers give easy access to chars near the start of the string. As an alternative, Python uses negative numbers to give easy access to the chars at the end of the string: s[-1] is the last char 'o', s[-2] is 'l' the next-to-last char, and so on. Negative index numbers count back from the end of the string:

* s[-1] is 'o' -- last char (1st from the end)
* s[-4] is 'e' -- 4th from the end
* s[:-3] is 'He' -- going up to but not including the last 3 chars.
* s[-3:] is 'llo' -- starting with the 3rd char from the end and extending to the end of the string.

It is a neat truism of slices that for any index n, `s[:n] + s[n:] == s`. This works even for n negative or out of bounds. Or put another way s[:n] and s[n:] always partition the string into two string parts, conserving all the characters. As we'll see in the list section later, slices work with lists too.
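The partition property is easy to check for yourself. The sketch below tries a handful of split points, including negative and out-of-bounds ones; the list of test values is arbitrary.

```python
s = 'Hello'

all_ok = True
for n in [0, 2, 5, -1, -3, 100, -100]:
    left = s[:n]
    right = s[n:]
    print(repr(left) + ' + ' + repr(right))
    if left + right != s:
        all_ok = False

print(all_ok)
```

For an index that is too large, the left part is the whole string and the right part is empty; for one that is too negative, it is the other way around.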

### String %

Python has a printf()-like facility to put together a string. The % operator takes a printf-type format string on the left (%d int, %s string, %f/%g floating point), and the matching values in a tuple on the right (a tuple is made of values separated by commas, typically grouped inside parentheses):

In [32]:
# % operator
text = "%d little pigs come out or I'll %s and %s and %s" % (3, 'huff', 'puff', 'blow down')
text

"3 little pigs come out or I'll huff and puff and blow down"

The above line is kind of long -- suppose you want to break it into separate lines. You cannot just split the line after the '%' as you might in other languages, since by default Python treats each line as a separate statement (on the plus side, this is why we don't need to type semi-colons on each line). To fix this, enclose the whole expression in an outer set of parentheses -- then the expression is allowed to span multiple lines. This code-across-lines technique works with the various grouping constructs detailed below: ( ), [ ], { }.

In [33]:
# add parens to make the long-line work:
text = ("%d little pigs come out or I'll %s and %s and %s" %
    (3, 'huff', 'puff', 'blow down'))
text

"3 little pigs come out or I'll huff and puff and blow down"

### Unicode

Regular Python strings are *not* unicode, they are just plain bytes. To create a unicode string, use the 'u' prefix on the string literal:

In [None]:
ustring = u'A unicode \u018e string \xf1'
ustring

A unicode string is a different type of object from regular "str" string, but the unicode string is compatible (they share the common superclass "basestring"), and the various libraries such as regular expressions work correctly if passed a unicode string instead of a regular string.

To convert a unicode string to bytes with an encoding such as 'utf-8', call the ustring.encode('utf-8') method on the unicode string. Going the other direction, the unicode(s, encoding) function converts encoded plain bytes to a unicode string:

In [None]:
## (ustring from above contains a unicode string)
s = ustring.encode('utf-8')
s ## bytes of utf-8 encoding

In [None]:
t = unicode(s, 'utf-8')  ## Convert bytes back to a unicode string
t == ustring             ## It's the same as the original, yay!
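The round trip between text and bytes can be sketched in a way that should run on both Python 2 and Python 3. One swap to be aware of: this sketch uses the `.decode('utf-8')` method on the encoded bytes instead of the `unicode()` function shown above, because `unicode()` does not exist in Python 3. That substitution is a choice made for this example.

```python
ustring = u'A unicode \u018e string \xf1'

encoded = ustring.encode('utf-8')    # text -> bytes
decoded = encoded.decode('utf-8')    # bytes -> text

char_count = len(ustring)            # number of characters
byte_count = len(encoded)            # number of bytes in the UTF-8 encoding

print(char_count)
print(byte_count)
print(decoded == ustring)
```

The two non-ASCII characters each take two bytes in UTF-8, which is why the byte count is larger than the character count.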

### If statement

Python does not use `{` and `}` to enclose blocks of code for if/loops/function etc. Instead, Python uses the colon (`:`) and indentation/whitespace to group statements. The boolean test for an `if` does not need to be in parentheses (big difference from C++/Java), and it can have `elif` and `else` clauses (mnemonic: the word "elif" is the same length as the word "else").

Any value can be used as an if-test. The "zero" values all count as false: None, 0, empty string, empty list, empty dictionary. There is also a Boolean type with two values: True and False (converted to an int, these are 1 and 0). Python has the usual comparison operations: `==`, `!=`, `<`, `<=`, `>`, `>=`. Unlike Java and C, == is overloaded to work correctly with strings. The boolean operators are the spelled out words `and`, `or`, `not` (Python does not use the C-style `&&` `||` `!`). Here's what the code might look like for a policeman pulling over a speeder -- notice how each block of then/else statements starts with a `:` and the statements are grouped by their indentation:

In [37]:
def check_speed(speed, mood):
    """ take appropriate action based on somebody's speed """
    if speed >= 80:
        print 'License and registration please'    
        
        if mood == 'terrible' or speed >= 100:
            print 'You have the right to remain silent.'
            
        elif mood == 'bad' or speed >= 90:
            print "I'm going to have to write you a ticket."
            
        else:
            print "Let's try to keep it under 80 ok?"
            
check_speed(90, 'good')

License and registration please
I'm going to have to write you a ticket.
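The rules about which values count as false in an if-test can be checked with a small sketch. The helper `describe` and the sample values are hypothetical and exist only for this illustration.

```python
def describe(value):
    """Hypothetical helper: report how a value behaves in an if-test."""
    if value:
        return 'truthy'
    return 'falsy'


falsy_results = []
for v in [None, 0, '', [], {}]:          # the "zero" values
    falsy_results.append(describe(v))

truthy_results = []
for v in [1, 'hi', [0], {'a': 1}]:       # anything non-zero or non-empty
    truthy_results.append(describe(v))

print(falsy_results)
print(truthy_results)

# Booleans convert to the ints 1 and 0.
int_true = int(True)
int_false = int(False)

# The boolean operators are spelled out as words.
both = (3 > 2) and ('a' == 'a')
neither = not (3 > 2 or 2 > 3)
```

Note that `[0]` counts as true: the list is not empty, even though the element inside it is zero.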


### `assert` statement (useful for the lab!)

Like many languages, Python has an `assert` statement. The `assert` statement is meant to test for conditions in a piece of software that **will always be `True`.**

An assert statement is a simple way to make sure that code does what you expect. Generally, `assert` statements are not used for error conditions in production code, but they can make quick-and-dirty unit tests and debugging simpler while you are developing.

We use `assert` statements in the lab for this section. Your goal is to update the code in the lab notebook until it executes without complaining.

In [38]:
a = 1

assert a == 1
assert a * 10 == 10

In [39]:
assert a == 0 # Definitely not true....

AssertionError: 

In [40]:
assert repeat('Wow', True) == 'WowWowWow!!!'

assert repeat('Wow', False) == 'WowWowWow!!!'

AssertionError: 
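An `assert` can also carry a message after a comma, which makes a failure easier to understand than the bare `AssertionError:` above. The sketch below redefines `repeat` so that it stands alone; the message text is made up for the example.

```python
def repeat(s, exclaim=False):
    """Return s repeated 3 times, optionally followed by '!!!'."""
    result = s * 3
    if exclaim:
        result = result + '!!!'
    return result


# A passing assert does nothing at all.
assert repeat('Wow') == 'WowWowWow', 'plain repeat failed'

# A failing assert raises AssertionError carrying the message.
try:
    assert repeat('Wow', False) == 'WowWowWow!!!', 'expected no exclamation marks'
    message = None
except AssertionError as err:
    message = str(err)
print(message)
```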

## Lists

Python has a great built-in list type named "list". List literals are written within square brackets [ ]. Lists work similarly to strings -- use the len() function and square brackets [ ] to access data, with the first element at index 0. (See the official python.org list docs.)

In [41]:
colors = ['red', 'blue', 'green']
print colors[0]    ## red
print colors[2]    ## green
print len(colors)  ## 3

red
green
3


![](figures/0.2-list1.png)

Assignment with an = on lists does not make a copy. Instead, assignment makes the two variables point to the one list in memory.

In [None]:
b = colors   ## Does not copy the list
b

In [42]:
b[0] = 'pink'
colors

['pink', 'blue', 'green']

![](figures/0.2-list2.png)

The "empty list" is just an empty pair of brackets [ ]. The '+' works to append two lists, so [1, 2] + [3, 4] yields [1, 2, 3, 4] (this is just like + with strings).
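The difference between sharing a list and copying it is worth seeing side by side. In this sketch the names `alias` and `colors_copy` are illustrative; the copy is made with the whole-sequence slice `[:]` mentioned in the string slices section.

```python
colors = ['red', 'blue', 'green']

alias = colors             # same list in memory, two names
colors_copy = colors[:]    # a new list with the same elements

alias[0] = 'pink'          # visible through 'colors' as well
colors_copy[1] = 'black'   # only affects the copy

print(colors)
print(colors_copy)

combined = [1, 2] + [3, 4]  # + builds a new list from the two operands
print(combined)
```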

### `for` and `in`

Python's *for* and *in* constructs are extremely useful, and the first use of them we'll see is with lists. The *for* construct -- for var in list -- is an easy way to look at each element in a list (or other collection). Do not add or remove from the list during iteration.

In [43]:
squares = [1, 4, 9, 16]
total = 0  # named "total" rather than "sum" to avoid hiding the built-in sum()
for num in squares:
    total += num
print total

30


If you know what sort of thing is in the list, use a variable name in the loop that captures that information such as "num", or "name", or "url". Since python code does not have other syntax to remind you of types, your variable names are a key way for you to keep straight what is going on.

The *in* construct on its own is an easy way to test if an element appears in a list (or other collection) -- value in collection -- tests if the value is in the collection, returning True/False.

In [44]:
l = ['larry', 'curly', 'moe']
if 'curly' in l:
    print 'yay'

yay


The for/in constructs are very commonly used in Python code and work on data types other than list, so you should just memorize their syntax. You may have habits from other languages where you start manually iterating over a collection, where in Python you should just use for/in.

You can also use for/in to work on a string. The string acts like a list of its chars, so `for ch in s: print ch` prints all the chars in a string.
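As a quick sketch of for/in working on a string, the loop below counts vowels by visiting each character in turn and using `in` as a membership test. The sample text and variable names are made up.

```python
vowel_count = 0
for ch in 'hello there':
    if ch in 'aeiou':      # 'in' also works as a test on strings
        vowel_count += 1
print(vowel_count)
```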

### Range

The range(n) function yields the numbers 0, 1, ... n-1, and range(a, b) returns a, a+1, ... b-1 -- up to but not including the last number. The combination of the for-loop and the range() function allow you to build a traditional numeric for loop:

In [45]:
## print the numbers from 0 through 9
for i in range(10):
    print i

0
1
2
3
4
5
6
7
8
9


There is a variant xrange() which avoids the cost of building the whole list for performance sensitive cases (in Python 3, range() has the good performance behavior and you can forget about xrange()).
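The two forms of `range()` can be sketched as follows. The results are wrapped in `list()` so the numbers are visible on Python 3 as well; on Python 2, `range()` already returns a list and the wrapper is harmless.

```python
first_five = list(range(5))       # 0 up to but not including 5
two_to_five = list(range(2, 6))   # 2 up to but not including 6
print(first_five)
print(two_to_five)

# A traditional numeric loop: add up the numbers 1 through 10.
total = 0
for i in range(1, 11):
    total += i
print(total)
```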

### While Loop

Python also has the standard while-loop, and the *break* and *continue* statements work as in C++ and Java, altering the course of the innermost loop. The above for/in loop solves the common case of iterating over every element in a list, but the while loop gives you total control over the index numbers. Here's a while loop which accesses every 3rd element in a list:

In [None]:
a = range(30)

In [None]:
## Access every 3rd element in a list
i = 0
while i < len(a):
    print a[i]
    i = i + 3

**Pro-tip**: a more Pythonic way to do this uses `enumerate`, which emits each element as a tuple of (*index*, *element*):

In [None]:
for i, elem in enumerate(a):  # use a new name so the list "a" is not overwritten
    if i % 3 == 0:
        print elem
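To confirm that the two approaches pick out the same elements, here is a sketch that collects the results into lists instead of printing them one by one. The list names are illustrative.

```python
a = list(range(30))

# while loop: step the index by 3 ourselves.
picked_with_while = []
i = 0
while i < len(a):
    picked_with_while.append(a[i])
    i = i + 3

# enumerate: visit every element and keep those whose index divides by 3.
picked_with_enumerate = []
for index, elem in enumerate(a):
    if index % 3 == 0:
        picked_with_enumerate.append(elem)

print(picked_with_while)
print(picked_with_while == picked_with_enumerate)
```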

### List Methods

Here are some other common list methods.

* list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
* list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
* list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
* list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
* list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present)
* list.sort() -- sorts the list in place (does not return it). (The sorted() function shown below is preferred.)
* list.reverse() -- reverses the list in place (does not return it)
* list.pop(index) -- removes and returns the element at the given index. Removes and returns the rightmost element if index is omitted (roughly the opposite of append()).

Notice that these are *methods* on a list object, while len() is a function that takes the list (or string or whatever) as an argument.

In [None]:
l = ['larry', 'curly', 'moe']
l.append('shemp')         ## append elem at end
l

In [None]:
l.insert(0, 'xxx')        ## insert elem at index 0
l

In [None]:
l.extend(['yyy', 'zzz'])  ## add list of elems at end
l

In [None]:
l.index('curly')

In [None]:
l.remove('curly')         ## search and remove that element
l

In [None]:
l.pop(1)  ## removes and returns 'larry'

In [None]:
l
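Two of the pitfalls mentioned in the method list are easy to demonstrate: `append()` modifies the list and returns nothing useful, and `index()` raises an error for a missing element unless you check with `in` first. This is a minimal sketch with made-up variable names.

```python
names = ['larry', 'curly', 'moe']

returned = names.append('shemp')   # modifies names in place
print(returned)                    # None, not the new list
print(names)

last = names.pop()                 # no index: removes and returns the last element
print(last)

# Check membership first to avoid the ValueError from index().
if 'zeppo' in names:
    position = names.index('zeppo')
else:
    position = None
print(position)
```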

### List build up

One common pattern is to start a list as the empty list [], then use append() or extend() to add elements to it:

In [None]:
l = []          ## Start as the empty list
l.append('a')   ## Use append() to add elements
l

In [None]:
l.append('b')
l

### List slices

Slices work on lists just as with strings, and can also be used to change sub-parts of the list.

In [None]:
l = ['a', 'b', 'c', 'd']
l[1:-1]

In [None]:
l[0:2] = 'z'
l

## Sorting

The easiest way to sort is with the sorted(list) function, which takes a list and returns a new list with those elements in sorted order. The original list is not changed.

In [None]:
a = [5, 1, 4, 3]
print sorted(a)

In [None]:
print a

It's most common to pass a list into the sorted() function, but in fact it can take as input any sort of iterable collection. The older list.sort() method is an alternative detailed below. The sorted() function seems easier to use compared to sort(), so I recommend using sorted().

The sorted() function can be customized through optional arguments. The sorted() optional argument reverse=True, e.g. sorted(list, reverse=True), makes it sort backwards.

In [None]:
strs = ['aa', 'BB', 'zz', 'CC']
print sorted(strs)  ## case sensitive

In [None]:
print sorted(strs, reverse=True)

### Custom Sorting With `key=`

For more complex custom sorting, sorted() takes an optional "key=" specifying a "key" function that transforms each element before comparison. The key function takes in 1 value and returns 1 value, and the returned "proxy" value is used for the comparisons within the sort.

For example, with a list of strings, specifying key=len (the built in len() function) sorts the strings by length, from shortest to longest. The sort calls len() for each string to get the list of proxy length values, and then sorts with those proxy values.

In [None]:
strs = ['ccc', 'aaaa', 'd', 'bb']
print sorted(strs, key=len)

![](figures/0.2-sorted-key.png)

As another example, specifying "str.lower" as the key function is a way to force the sorting to treat uppercase and lowercase the same:

In [None]:
## "key" argument specifying str.lower function to use for sorting
strs = ['aa', 'BB', 'zz', 'CC']  ## mixed case, so the key function matters
print sorted(strs, key=str.lower)

You can also pass in your own function as the key function, like this:

In [None]:
## Say we have a list of strings we want to sort by the last letter of the string.
strs = ['xc', 'zb', 'yd', 'wa']

def last_letter(s):
    """ takes a string, and returns its last letter """
    return s[-1]

## Now pass key=last_letter to sorted() to sort by the last letter:
print sorted(strs, key=last_letter)

To use key= custom sorting, remember that you provide a function that takes one value and returns the proxy value to guide the sorting. There is also an optional argument "cmp=cmpFn" to sorted() that specifies a traditional two-argument comparison function that takes two values from the list and returns negative/0/positive to indicate their ordering. The built in comparison function for strings, ints, ... is cmp(a, b), so often you want to call cmp() in your custom comparator. The newer one argument key= sorting is generally preferable.
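To compare the two styles, here is a sketch that sorts the same list with a one-argument key function and with a two-argument comparison function. An assumption to flag: instead of passing `cmp=` directly, which only works in Python 2, the comparator is wrapped with `functools.cmp_to_key` (available in Python 2.7 and Python 3) so the sketch runs on both. The function `compare_last_letter` is a hypothetical name for this example.

```python
from functools import cmp_to_key

strs = ['xc', 'zb', 'yd', 'wa']


def last_letter(s):
    """Key function: one value in, one proxy value out."""
    return s[-1]


def compare_last_letter(a, b):
    """Comparison function: negative, zero, or positive for a vs. b."""
    return (a[-1] > b[-1]) - (a[-1] < b[-1])


by_key = sorted(strs, key=last_letter)
by_cmp = sorted(strs, key=cmp_to_key(compare_last_letter))
print(by_key)
print(by_cmp)

# key= and reverse= can be combined: longest strings first.
longest_first = sorted(['ccc', 'aaaa', 'd', 'bb'], key=len, reverse=True)
print(longest_first)
```

Both calls give the same ordering, and the key function is noticeably shorter to write.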

### sort() method

As an alternative to sorted(), the sort() method on a list sorts that list into ascending order, e.g. list.sort(). The sort() method changes the underlying list and returns None, so use it like this:

In [None]:
strs = list('zyxmnopabcdefg')
strs

In [None]:
strs.sort()
strs

In [None]:
sorted_strs = strs.sort()
print sorted_strs

The above is a very common misunderstanding with sort() -- it *does not return* the sorted list. The sort() method must be called on a list; it does not work on other iterable collections such as tuples or strings (whereas the sorted() function above accepts any iterable). The sort() method predates the sorted() function, so you will likely see it in older code. The sort() method does not need to create a new list, so it can be a little faster when the elements to sort are already in a list.
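The difference between the two can be sketched as follows. The variable names here (`letters`, `as_list`, `result`) are illustrative only.

```python
letters = ('z', 'b', 'm')          # a tuple, not a list

print(sorted(letters))             # ['b', 'm', 'z'], always a new list
print(sorted('zbm'))               # ['b', 'm', 'z'], strings work too

try:
    letters.sort()                 # tuples have no sort() method
except AttributeError as err:
    print('AttributeError: %s' % err)

as_list = list(letters)
result = as_list.sort()            # sorts the list in place...
print(as_list)                     # ['b', 'm', 'z']
print(result)                      # None, because sort() returns nothing
```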

### Tuples

A tuple is a fixed-size grouping of elements, such as an (x, y) coordinate. Tuples are like lists, except they are immutable and do not change size. (The immutability is shallow: you cannot replace an element of a tuple, but if an element is itself mutable, such as a list, that element can still be changed in place.) Tuples play a sort of "struct" role in Python -- a convenient way to pass around a little logical, fixed-size bundle of values. A function that needs to return multiple values can just return a tuple of the values. For example, if I wanted to have a list of 3-d coordinates, the natural Python representation would be a list of tuples, where each tuple is size 3 holding one (x, y, z) group.

To create a tuple, just list the values within parentheses, separated by commas. The "empty" tuple is just an empty pair of parentheses. Accessing the elements in a tuple is just like a list -- len(), [ ], for, in, etc. all work the same.

In [None]:
t = (1, 2, 'hi')
print len(t)

In [None]:
print t[2]

In [None]:
t[2] = 'bye'  ## NO, tuples cannot be changed

In [None]:
t = (1, 2, 'bye')  ## this works
t
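Immutability applies to the tuple's own slots, not to the objects stored in them. A small sketch, with a made-up `point` tuple that happens to contain a list:

```python
point = (1, 2, ['tag'])

point[2].append('another')   # allowed: this mutates the list, not the tuple
print(point)                 # (1, 2, ['tag', 'another'])

try:
    point[2] = []            # not allowed: rebinding a slot of the tuple
except TypeError as err:
    print('TypeError: %s' % err)

print(len(point))            # 3, the size never changes
```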

To create a size-1 tuple, the lone element must be followed by a comma.

In [None]:
t = ('hi',)   ## size-1 tuple
t

or even shorter:

In [None]:
t = 'hi',
t

It's a funny case in the syntax, but the comma is necessary to distinguish the tuple from the ordinary case of putting an expression in parentheses. In some cases you can omit the parentheses and Python will see from the commas that you intend a tuple.

Assigning a tuple to an identically sized tuple of variable names assigns all the corresponding values. This is called **unpacking**.

If the two sides are not the same size, Python raises a ValueError. This feature works for lists too.

In [None]:
x, y, z = (42, 13, "hike")
print z  ## hike
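Unpacking pairs naturally with functions that return several values. The sketch below uses a hypothetical helper, `min_max`, and also shows the error raised when the sizes do not match.

```python
def min_max(values):
    """Return the smallest and largest value as a 2-tuple."""
    return min(values), max(values)   # the comma builds the tuple

lo, hi = min_max([4, 9, 1, 7])        # unpack both results at once
print(lo)                             # 1
print(hi)                             # 9

a, b = [10, 20]                       # unpacking works for lists too
print(a + b)                          # 30

try:
    x, y = (1, 2, 3)                  # three values, but only two names
except ValueError as err:
    print('ValueError: %s' % err)
```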

### List comprehensions

List comprehensions are a more advanced feature which is nice for some cases but is not needed for the exercises and is not something you need to learn at first (i.e. you can skip this section). A list comprehension is a compact way to write an expression that expands to a whole list. Suppose we have a list nums [1, 2, 3, 4]. Here is the list comprehension to compute a list of their squares [1, 4, 9, 16]:

In [None]:
nums = [1, 2, 3, 4]

In [None]:
squares = [n * n for n in nums]
squares

The syntax is `[expr for var in list]` -- the for var in list looks like a regular for-loop, but without the colon (:). The expr to its left is evaluated once for each element to give the values for the new list. Here is an example with strings, where each string is changed to upper case with '!!!' appended:

In [None]:
strs = ['hello', 'and', 'goodbye']

In [None]:
shouting = [s.upper() + '!!!' for s in strs]
shouting

You can add an if test to the right of the for-loop to narrow the result. The if test is evaluated for each element, including only the elements where the test is true.

In [None]:
## Select values <= 2
nums = [2, 8, 1, 6]
small = [n for n in nums if n <= 2]
small

In [None]:
## Select fruits containing 'a', change to upper case
fruits = ['apple', 'cherry', 'banana', 'lemon']
a_fruits = [s.upper() for s in fruits if 'a' in s]
a_fruits
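If the compact syntax is hard to read at first, it may help to see a comprehension next to the ordinary loop it stands for. This is a sketch of the equivalence; `a_fruits_loop` is a name introduced only for the comparison.

```python
fruits = ['apple', 'cherry', 'banana', 'lemon']

# Explicit loop: test each element, transform it, append it.
a_fruits_loop = []
for s in fruits:
    if 'a' in s:
        a_fruits_loop.append(s.upper())

# Equivalent comprehension: [expr for var in list if test]
a_fruits = [s.upper() for s in fruits if 'a' in s]

print(a_fruits_loop)               # ['APPLE', 'BANANA']
print(a_fruits_loop == a_fruits)   # True
```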

### Dict Hash Table

Python's efficient key/value hash table structure is called a "dict". The contents of a dict can be written as a series of key:value pairs within braces { }, e.g. dict = {key1:value1, key2:value2, ... }. The "empty dict" is just an empty pair of curly braces {}.

Looking up or setting a value in a dict uses square brackets, e.g. dict['foo'] looks up the value under the key 'foo'. Strings, numbers, and tuples work as keys, and any type can be a value. Other types may or may not work as keys: a key must be hashable, so strings and tuples of immutable values work cleanly, while mutable built-ins such as lists and dicts raise a TypeError. Looking up a key that is not in the dict raises a KeyError. To avoid that, use "in" to check whether the key is present, or use dict.get(key), which returns the value or None if the key is missing (get(key, not-found) lets you specify the value to return in the not-found case).

In [None]:
## Can build up a dict by starting with the empty dict {}
## and storing key/value pairs into the dict like this:
## dict[key] = value-for-that-key
d = {}
d['a'] = 'alpha'
d['g'] = 'gamma'
d['o'] = 'omega'
d

In [None]:
print d['a']     ## Simple lookup, returns 'alpha'

In [None]:
d['a'] = 'aleph'       ## Replace the value stored under the existing key 'a'

In [None]:
'a' in d         ## True

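That line above gives the same answer as the following, although `'a' in d` checks the hash table directly instead of first building a list of keys: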

In [None]:
'a' in d.keys()

In [None]:
print d['z']                  ## Throws KeyError

In [None]:
if 'z' in d:
    print d['z']     ## Avoid KeyError

In [None]:
print d.get('z')  ## None (instead of KeyError)

In [None]:
print d.get('z', 'oops, not here')  ## default value if not found

![](figures/0.2-dict.png)
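Which types work as keys can be checked quickly. The sketch below uses a made-up `grid` dict keyed by (x, y) tuples and shows what happens when a list is used as a key instead.

```python
grid = {}
grid[(0, 0)] = 'origin'            # tuple key: fine
grid[(2, 3)] = 'treasure'
grid[42] = 'a number key'          # numbers work as keys too

print(grid[(2, 3)])                # treasure
print(grid.get((9, 9), 'empty'))   # empty

try:
    grid[[1, 2]] = 'nope'          # lists are mutable, so they cannot be keys
except TypeError as err:
    print('TypeError: %s' % err)

print(len(grid))                   # 3, the failed assignment stored nothing
```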

A for loop on a dictionary iterates over its keys by default. The keys will appear in an arbitrary order. The methods dict.keys() and dict.values() return lists of the keys or values explicitly. There's also an items() which returns a list of (key, value) tuples, which is the most efficient way to examine all the key value data in the dictionary. All of these lists can be passed to the sorted() function.

In [None]:
## By default, iterating over a dict iterates over its keys.
## Note that the keys come out in an arbitrary order.
for key in d:
    print key

In [None]:
## Exactly the same as above
for key in d.keys():
    print key

In [None]:
## Get the .keys() list:
d.keys()

In [None]:
## Likewise, there's a .values() list of values
d.values()

In [None]:
## Common case -- loop over the keys in sorted order,
## accessing each key/value
for key in sorted(d.keys()):
    print key, d[key]

In [None]:
## .items() is the dict expressed as (key, value) tuples
print d.items()  ##  e.g. [('a', 'aleph'), ('o', 'omega'), ('g', 'gamma')] (order is arbitrary)

In [None]:
## This loop syntax accesses the whole dict by looping
## over the .items() tuple list, accessing one (key, value)
## pair on each iteration.
for k, v in d.items():
    print k, '>', v

There are "iter" variants of these methods called iterkeys(), itervalues() and iteritems() which avoid the cost of constructing the whole list -- a performance win if the data is huge. However, I generally prefer the plain keys() and values() methods with their sensible names. In Python 3, the iter variants are gone: the plain keys(), values(), and items() methods return lightweight views instead of lists, so the distinction no longer matters.
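Since the results of keys(), values(), and items() can all be handed to sorted(), a sketch like the following covers the common patterns. The `counts` dict and the `count_of` helper are invented for illustration.

```python
counts = {'the': 3, 'cat': 1, 'sat': 2}

print(sorted(counts.keys()))      # ['cat', 'sat', 'the']
print(sorted(counts.values()))    # [1, 2, 3]

# (key, value) tuples compare by key first, so this sorts by key.
print(sorted(counts.items()))     # [('cat', 1), ('sat', 2), ('the', 3)]

def count_of(pair):
    """Proxy value for a (key, value) pair: the value."""
    return pair[1]

# Sort the pairs by value, largest first.
by_count = sorted(counts.items(), key=count_of, reverse=True)
print(by_count)                   # [('the', 3), ('sat', 2), ('cat', 1)]
```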

Strategy note: from a performance point of view, the dictionary is one of your greatest tools, and you should use it where you can as an easy way to organize data. For example, you might read a log file where each line begins with an IP address, and store the data into a dict using the IP address as the key and the list of lines where it appears as the value. Once you've read in the whole file, you can look up any IP address and instantly see its list of lines. The dictionary takes in scattered data and makes it into something coherent.
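A minimal sketch of that log-file idea, assuming each line starts with the IP address followed by a space. The sample lines are invented, and an in-memory list stands in for a real file.

```python
log_lines = [
    '10.0.0.1 GET /index.html',
    '10.0.0.2 GET /about.html',
    '10.0.0.1 POST /login',
]

lines_by_ip = {}
for line in log_lines:
    ip = line.split()[0]              # first whitespace-separated field
    if ip not in lines_by_ip:
        lines_by_ip[ip] = []          # first time we see this address
    lines_by_ip[ip].append(line)

print(lines_by_ip['10.0.0.1'])
# ['10.0.0.1 GET /index.html', '10.0.0.1 POST /login']
print(len(lines_by_ip))               # 2 distinct addresses
```

With a real log you would loop over the open file instead of `log_lines`; the dict-building logic stays the same.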

### Del

The "del" statement does deletions. In the simplest case, it removes the definition of a variable, as if that variable had never been defined. del can also be used on list elements or slices to delete that part of the list, and on dictionary keys to delete entries from a dictionary.

In [None]:
var = 6
del var  # var no more!
var      ## raises NameError, since var is no longer defined

In [None]:
l = ['a', 'b', 'c', 'd']
del l[0]     ## Delete first element
del l[-2:]   ## Delete last two elements
l

In [None]:
d = {'a':1, 'b':2, 'c':3}
del d['b']   ## Delete 'b' entry
d

### Files

The open() function opens and returns a file handle that can be used to read or write a file in the usual way. The code f = open('name', 'r') opens the file into the variable f, ready for reading operations; call f.close() when finished. Instead of 'r', use 'w' for writing, and 'a' for appending. The special mode 'rU' is the "Universal" option for text files, which converts different line endings so they always come through as a simple '\n'. The standard for-loop works for text files, iterating through the lines of the file (this works only for text files, not binary files). The for-loop technique is a simple and efficient way to look at all the lines in a text file:

In [None]:
# Collect every line of the file that mentions 'Mr. Bucket'
f = open('data/bleak-house.txt', 'rU')

mr_bucket_mentions = []

for line in f:
    if 'Mr. Bucket' in line:
        mr_bucket_mentions.append(line)
    
f.close()

In [None]:
mr_bucket_mentions[:10]

In [None]:
len(mr_bucket_mentions)
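The same loop can also be written with Python's `with` statement, which closes the file automatically when the block ends, even if an error occurs partway through. The notebook's own examples use explicit close() calls; this is an alternative sketch. So that it runs anywhere, it first writes a small throwaway file (the path and its three lines are invented) instead of reading data/bleak-house.txt.

```python
import os
import tempfile

# Create a small sample file to read back.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('Mr. Bucket arrives.\nNobody speaks.\nMr. Bucket leaves.\n')

mentions = []
with open(path, 'r') as f:        # f is closed when this block ends
    for line in f:
        if 'Mr. Bucket' in line:
            mentions.append(line)

print(f.closed)                   # True
print(len(mentions))              # 2
```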

Reading one line at a time has the nice quality that not all the file needs to fit in memory at one time -- handy if you want to look at every line in a 10 gigabyte file without using 10 gigabytes of memory. The f.readlines() method reads the whole file into memory and returns its contents as a list of its lines. The f.read() method reads the whole file into a single string, which can be a handy way to deal with the text all at once, such as with regular expressions we'll see later.

In [None]:
f = open('data/bleak-house.txt', 'rU')
mr_bucket_mentions = [l for l in f.readlines() if 'Mr. Bucket' in l]
f.close()

In [None]:
len(mr_bucket_mentions)
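To see the difference between read() and readlines() side by side, here is a sketch on a tiny two-line file. The file is created in a temporary directory purely for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'two-lines.txt')
f = open(path, 'w')
f.write('first line\nsecond line\n')
f.close()

f = open(path, 'r')
whole = f.read()          # one string containing the entire file
f.close()

f = open(path, 'r')
lines = f.readlines()     # list of lines, each keeping its trailing '\n'
f.close()

print(repr(whole))               # 'first line\nsecond line\n'
print(lines)                     # ['first line\n', 'second line\n']
print(''.join(lines) == whole)   # True
```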

For writing, the `f.write(string)` method is the easiest way to write data to an open output file.

In [None]:
f = open('scratch/misc.txt', 'w')  # 'w' opens for writing

for mention in mr_bucket_mentions:
    f.write(mention)
    
f.close()

In [None]:
!head scratch/misc.txt
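The 'w' and 'a' modes behave differently when the file already exists: 'w' starts the file over from empty, while 'a' adds to the end. Note also that write() does not add a newline for you. A short sketch, using a throwaway file in a temporary directory and a hypothetical helper `contents`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

def contents(p):
    """Return the whole file at path p as one string."""
    f = open(p, 'r')
    text = f.read()
    f.close()
    return text

f = open(path, 'w')       # 'w' creates the file (or empties an existing one)
f.write('one\n')          # write() adds no newline, so include it yourself
f.close()

f = open(path, 'a')       # 'a' keeps existing contents and appends
f.write('two\n')
f.close()
print(repr(contents(path)))   # 'one\ntwo\n'

f = open(path, 'w')       # reopening with 'w' discards what was there
f.write('three\n')
f.close()
print(repr(contents(path)))   # 'three\n'
```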