This chapter discusses capabilities built into the Python language that will be used
ubiquitously throughout the book.

We’ll start with Python’s workhorse data structures: tuples, lists, dicts, and sets. Then,
we’ll discuss creating your own reusable Python functions. Finally, we’ll look at the
mechanics of Python file objects and interacting with your local hard drive.

# Data Structures and Sequences

## Tuple
A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to
create one is with a comma-separated sequence of values:

In [1]:
tup = 4, 5, 6
tup

(4, 5, 6)

When you’re defining tuples in more complicated expressions, it’s often necessary to
enclose the values in parentheses.

In [2]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

You can convert any sequence or iterator to a tuple by invoking tuple:

In [3]:
tuple([1, 2, 3])

(1, 2, 3)

In [4]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [] as with most other sequence types.
As in C, C++, Java, and many other languages, sequences are 0-indexed in Python:

In [5]:
tup[0]

's'

While the objects stored in a tuple may be mutable themselves, once the tuple is created
it’s not possible to modify which object is stored in each slot:

In [6]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in-place:

In [None]:
tup[1].append(3)
tup

You can concatenate tuples using the + operator to produce longer tuples:

In [None]:
(4, None, 'foo') + (6, 0) + ('bar', )

Multiplying a tuple by an integer, as with lists, has the effect of concatenating together
that many copies of the tuple:

In [None]:
('foo', 'bar') * 4

**Note that the objects themselves are not copied, only the references to them.**

In [None]:
a = (4, [1, 2])
b = a * 2
print(b)
a[1][1] = 7
print(b)

### Unpacking tuples
If you try to assign to a tuple-like expression of variables, Python will attempt to
unpack the value on the right-hand side of the equals sign:

In [None]:
tup = (5, 7, 9)
a, b, c = tup
b

Even sequences with nested tuples can be unpacked:

In [None]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
b

Using this functionality you can easily swap the values of two variables:

In [None]:
a, b = 1, 2

In [None]:
b, a = a, b

**A common use of variable unpacking is iterating over sequences of tuples or lists:**

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a={}, b={}, c={}'.format(a, b, c))

Another common use is returning multiple values from a function.

Python 3 added some more advanced tuple unpacking to help
with situations where you may want to “pluck” a few elements from the beginning of
a tuple. This uses the special syntax *rest, which is also used in function signatures
to capture an arbitrarily long list of positional arguments:

In [None]:
values = 1, 2, 3, 4, 5, 6
a, b, *rest = values
a, b

In [None]:
rest

This rest bit is sometimes something you want to discard; there is nothing special
about the rest name. As a matter of convention, many Python programmers will use
the underscore (_) for unwanted variables:

In [None]:
a, b, *_ = values

### Tuple methods
Since the size and contents of a tuple cannot be modified, it is very light on instance
methods. A particularly useful one (also available on lists) is count, which counts the
number of occurrences of a value:

In [None]:
a = (1, 7, 7, 7, 3, 4, 7)
a.count(7)

## List
In contrast with tuples, lists are variable-length and their contents can be modified
in-place. You can define them using square brackets [] or using the list type function:

In [None]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

In [None]:
b_list[1] = 'peekaboo'
b_list

The list function is frequently used in data processing **as a way to materialize an
iterator or generator expression:**

In [7]:
gen = range(10)
gen

range(0, 10)

In [8]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Adding and removing elements
Elements can be appended to the end of the list with the append method:

In [9]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using insert you can insert an element at a specific location in the list:

In [10]:
b_list.insert(2, 'red')


The insertion index must be between 0 and the length of the list, inclusive.

insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element. If you need to insert elements at both the beginning and end of a sequence, you may wish to explore collections.deque, a double-ended queue, for this purpose.
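
As a minimal sketch of the deque alternative mentioned above (the sample values are hypothetical and chosen only for illustration), `collections.deque` supports adding and removing items at both ends without shifting the other elements:

```python
from collections import deque

# Hypothetical sample data, used only for illustration
queue = deque(['bar', 'baz'])
queue.appendleft('foo')   # add at the front without shifting other elements
queue.append('qux')       # add at the end, as with a list
first = queue.popleft()   # remove and return the front element
remaining = list(queue)   # materialize what is left as a plain list
```

A deque gives up some list conveniences (slicing, for instance, is not supported), so it tends to pay off mainly when both ends are modified frequently.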

The inverse operation to insert is pop, which removes and returns an element at a
particular index:

In [11]:
b_list.pop(2)

'red'

In [None]:
b_list

Elements can be removed by value with remove, which locates the first such value and
removes it from the list:

In [12]:
b_list.append('foo')
b_list

['foo', 'peekaboo', 'baz', 'dwarf', 'foo']

In [None]:
b_list.remove('foo')
b_list

If performance is not a concern, by using append and remove, you can use a Python
list as a perfectly suitable “multiset” data structure.

Check if a list contains a value using the ```in``` keyword:

In [13]:
'dwarf' in b_list

True

The keyword not can be used to negate in:

In [14]:
'dwarf' not in b_list

False

Checking whether a list contains a value is a lot slower than doing so with dicts and
sets, as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.
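
When the same collection is searched repeatedly, one option is to convert it to a set once and test membership against that. The following is a small sketch with hypothetical sample data, not a benchmark:

```python
values = ['foo', 'peekaboo', 'baz', 'dwarf']  # hypothetical sample list
lookup = set(values)  # one-time conversion, cost proportional to len(values)

found = 'dwarf' in lookup    # hash-based lookup instead of a linear scan
missing = 'red' in lookup    # absent values are rejected just as quickly
```

The conversion itself scans the whole list, so it is only worthwhile when more than a handful of lookups follow.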

### Concatenating and combining lists
Similar to tuples, adding two lists together with + concatenates them:

In [15]:
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to it using the
extend method:

In [16]:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

**Note that list concatenation by addition is a comparatively expensive operation since
a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.**
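
The difference can be sketched as follows, assuming a hypothetical `list_of_lists` input. Both loops produce the same result, but the second one allocates a fresh list on every iteration:

```python
list_of_lists = [[1, 2], [3, 4], [5]]  # hypothetical input chunks

# Preferred: grow the existing list in place
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

# Concatenative alternative: builds a new list each time through the loop
everything_slow = []
for chunk in list_of_lists:
    everything_slow = everything_slow + chunk
```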

### Sorting
You can sort a list in-place (without creating a new object) by calling its sort
method:

In [17]:
a = [7, 2, 5, 3, 1]
a.sort()
a

[1, 2, 3, 5, 7]

```sort``` has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key—that is, a function that produces a value to use to sort the
objects.

In [18]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

### Binary search and maintaining a sorted list
The built-in bisect module implements binary search and insertion into a sorted list.
bisect.bisect finds the location where an element should be inserted to keep it sorted,
while bisect.insort actually inserts the element into that location:

In [19]:
import bisect
c = [1, 2, 4, 5, 7]
bisect.bisect(c, 3), bisect.bisect(c, 6)

(2, 4)

In [20]:
bisect.insort(c, 7)
c

[1, 2, 4, 5, 7, 7]

**The bisect module functions do not check whether the list is sorted, as doing so would be computationally expensive. Thus, using them with an unsorted list will succeed without error but may lead to incorrect results.**
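
To make the warning concrete, here is a small sketch with hypothetical input: inserting into an unsorted list raises no error but leaves the list unsorted, whereas sorting once up front gives the expected result.

```python
import bisect

unsorted = [7, 1, 5, 2]       # hypothetical unsorted input
bisect.insort(unsorted, 3)    # succeeds, but the chosen position is meaningless

data = sorted([7, 1, 5, 2])   # sort once before using bisect
bisect.insort(data, 3)        # now the insertion keeps the list sorted
```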

### Slicing
You can select sections of most sequence types by using slice notation, which in its
basic form consists of start:stop passed to the indexing operator []:

In [21]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1: 5]

[2, 3, 7, 5]

Slices can also be assigned to with a sequence:

In [22]:
seq[3: 4] = [6, 3]
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

While the element at the start index is included, the stop index is not included, so that the number of elements in the result is stop - start.
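
A quick check of the stop - start rule, using a hypothetical sample sequence. The rule assumes both indices fall within the bounds of the sequence:

```python
letters = list('abcdefgh')    # hypothetical sample sequence
start, stop = 2, 6
chunk = letters[start:stop]   # includes index 2, excludes index 6
count = len(chunk)            # equals stop - start when both are in bounds
```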

Either the start or stop can be omitted, in which case they default to the start of the
sequence and the end of the sequence, respectively:

In [23]:
seq[: 5]

[7, 2, 3, 6, 3]

In [24]:
seq[3: ]

[6, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end:

In [25]:
seq[-4: ]

[5, 6, 0, 1]

In [26]:
seq[-6: -2]

[6, 3, 5, 6]

A step can also be used after a second colon to, say, take every other element:

In [27]:
seq[: : 2]

[7, 3, 3, 6, 1]

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:

In [28]:
seq[: : -1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]

## Built-in Sequence Functions

### enumerate
It’s common when iterating over a sequence to want to keep track of the index of the current item.

Since this is so common, Python has a built-in function, enumerate, which returns a
sequence of (i, value) tuples:
```python
for i, value in enumerate(collection):
    pass  # do something with value
```
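
As a runnable sketch (the `collection` values are hypothetical), the following compares manual index bookkeeping with enumerate:

```python
collection = ['foo', 'bar', 'baz']   # hypothetical sample data

# Manual bookkeeping of the index
i = 0
manual = []
for value in collection:
    manual.append((i, value))
    i += 1

# The same pairs produced by enumerate
with_enumerate = [(i, value) for i, value in enumerate(collection)]
```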

When you are indexing data, a helpful pattern that uses enumerate is computing a
dict mapping the values of a sequence (which are assumed to be unique) to their
locations in the sequence:

In [29]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping

{'foo': 0, 'bar': 1, 'baz': 2}

### sorted
The sorted function returns a new sorted list from the elements of any sequence:

In [30]:
a = [7, 1, 2, 6, 0, 3, 2]
sorted(a), a

([0, 1, 2, 2, 3, 6, 7], [7, 1, 2, 6, 0, 3, 2])

The sorted function accepts the same arguments as the sort method on lists.

### zip
zip “pairs” up the elements of a number of lists, tuples, or other sequences to **create an iterator of tuples, which you can materialize with list:**

In [31]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)

In [32]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

In [33]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:

In [34]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{}: {}, {}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the sequence.
Another way to think about this is converting a list of rows into a list of
columns. The syntax, which looks a bit magical, is:

In [35]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]

In [36]:
first_names, last_names = zip(*pitchers)
first_names, last_names

(('Nolan', 'Roger', 'Schilling'), ('Ryan', 'Clemens', 'Curt'))

### reversed
reversed iterates over the elements of a sequence in reverse order:

In [37]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that reversed returns a lazy iterator, so it does not create the reversed sequence until materialized (with list or a for loop, for example).
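
The lazy behavior can be observed directly. In this small sketch, values are only produced when requested, and each value is produced once:

```python
rev = reversed(range(10))   # nothing is materialized yet
first = next(rev)           # values are produced on demand
rest = list(rev)            # consumes whatever remains
leftover = list(rev)        # the iterator is now exhausted
```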

## dict
dict is likely the most important built-in Python data structure. A more common
name for it is hash map or associative array. It is a flexibly sized collection of key-value
pairs, where key and value are Python objects. One approach for creating one is to use
curly braces {} and colons to separate keys and values:

In [38]:
empty_dict = {}
d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [39]:
d1[7] = 'an integer'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [40]:
d1['b']

[1, 2, 3, 4]

You can **check if a dict contains a key** using the same syntax used for checking whether a list or tuple contains a value:

In [41]:
'some value' in d1, 'b' in d1

(False, True)

You can delete values either using the del keyword or the pop method:

In [42]:
d1[5] = 'some value'
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [43]:
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [44]:
ret = d1.pop('dummy')
ret

'another value'

In [45]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The keys and values methods give you iterable views of the dict’s keys and values, respectively. Since Python 3.7, dicts preserve insertion order, and these methods output the keys and values in that same order:

In [46]:
list(d1.keys())

['a', 'b', 7]

In [47]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

You can merge one dict into another using the update method:

In [48]:
d1.update({'b': 'foo', 'c': 12})
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

The update method changes dicts in-place, so any existing keys in the data passed to update will have their old values discarded.

### Creating dicts from sequences
It’s common to end up with two sequences that you want to pair up element-wise in a dict. Since a dict is essentially a collection of 2-tuples, the dict function accepts an iterable of 2-tuples, such as the result of zip:

In [49]:
mapping = dict(zip(range(5), reversed(range(5))))
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Later we’ll talk about dict comprehensions, another elegant way to construct dicts.

### Default values
It’s very common to have logic like:
```python
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```
Thus, the dict methods get and pop can take a default value to be returned, so that
the above if-else block can be written simply as:
```python
value = some_dict.get(key, default_value)
```
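
A short sketch of how the defaults behave for both methods, using a hypothetical `some_dict`:

```python
some_dict = {'a': 1}   # hypothetical sample dict

present = some_dict.get('a', 0)            # key exists, its value is returned
absent = some_dict.get('z', 0)             # key missing, default is returned
no_default = some_dict.get('z')            # None when no default is given
popped = some_dict.pop('z', 'fallback')    # default avoids the KeyError

raised = False
try:
    some_dict.pop('z')                     # no default: raises KeyError
except KeyError:
    raised = True
```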

get by default will return None if the key is not present, while pop will raise an exception. With setting values, a common case is for the values in a dict to be other collections, like lists.

In [50]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The setdefault dict method **inserts key with a value of default if key is not in the dictionary.** The preceding for loop can be rewritten as:

In [51]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in collections module has a useful class, defaultdict, which makes this
even easier. To create one, you pass a type or function for generating the default value
for each slot in the dict:

In [52]:
from collections import defaultdict
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

### Valid dict key types
While the values of a dict can be any Python object, **the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too).** The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dict) with the hash function:

In [53]:
hash('string')

-3872847231434160

In [54]:
hash((1, 2, (1, 2)))

-1429464707349485113

In [55]:
hash((1, 2, [1, 2]))

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:

In [56]:
d = {}
d[tuple([1, 2, 4])] = 5
d

{(1, 2, 4): 5}

## set
A set is **an unordered collection of unique elements.** You can think of them like dicts, but keys only, no values. A set can be created in two ways: via the set function or via a set literal with curly braces:

In [57]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [58]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference.

In [59]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [60]:
a.union(b), a | b

({1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 8})

The intersection contains the elements occurring in both sets. The & operator or the intersection method can be used:

In [61]:
a.intersection(b), a & b

({3, 4, 5}, {3, 4, 5})

|Function|Alternative syntax|Description|
|---|---|---|
|a.add(x)|N/A|Add element x to the set a|
|a.clear()|N/A|Reset the set a to an empty state, discarding all of its elements|
|a.remove(x)|N/A|Remove element x from the set a|
|a.pop()|N/A|Remove an arbitrary element from the set a, raising KeyError if the set is empty|
|a.union(b)|a \| b|All of the unique elements in a and b|
|a.update(b)|a \|= b|Set the contents of a to be the union of the elements in a and b|
|a.intersection(b)|a & b|All of the elements in both a and b|
|a.intersection_update(b)|a &= b|Set the contents of a to be the intersection of the elements in a and b|
|a.difference(b)|a - b|The elements in a that are not in b|
|a.difference_update(b)|a -= b|Set a to the elements in a that are not in b|
|a.symmetric_difference(b)|a ^ b|All of the elements in either a or b but not both|
|a.symmetric_difference_update(b)|a ^= b|Set a to contain the elements in either a or b but not both|
|a.issubset(b)|N/A|True if the elements of a are all contained in b|
|a.issuperset(b)|N/A|True if the elements of b are all contained in a|
|a.isdisjoint(b)|N/A|True if a and b have no elements in common|
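
Difference and symmetric difference, listed in the table but not shown so far, can be sketched with the same two sets used above:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b                  # same as a.difference(b)
in_one_not_both = a ^ b            # same as a.symmetric_difference(b)
disjoint = a.isdisjoint({9, 10})   # no shared elements with this set
```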

All of the logical set operations have in-place counterparts, which enable you to
replace the contents of the set on the left side of the operation with the result. For
very large sets, this may be more efficient:

In [62]:
c = a.copy()
c |= b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [63]:
d = c.copy()
d &= b
d

{3, 4, 5, 6, 7, 8}

Like dict keys, set elements generally must be immutable. To store list-like elements, you
must convert them to tuples:

In [64]:
my_data = [1, 2, 3, 4, 5]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4, 5)}

You can also check if a set is a subset of (is contained in) or a superset of (contains all
elements of) another set:

In [65]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set), a_set.issuperset({1, 2, 3})

(True, True)

Sets are equal if and only if their contents are equal:

In [66]:
{1, 2, 3} == {3, 2, 1}

True

## List, Set, and Dict Comprehensions

List comprehensions are one of the most-loved Python language features. They allow
you to form a new list in one concise expression by filtering the elements of a collection and
transforming the elements that pass the filter. They take the basic form:
```python
[expr for val in collection if condition]
```
This is equivalent to the following for loop:
```python
result = []
for val in collection:
    if condition:
        result.append(expr)
```
The filter condition can be omitted, leaving only the expression.

In [67]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in an
idiomatically similar way instead of lists. A dict comprehension looks like this:
```python
dict_comp = {key-expr : value-expr for value in collection if condition}
```

A set comprehension looks like the equivalent list comprehension except with curly braces instead of square brackets:
```python
set_comp = {expr for value in collection if condition}
```

In [68]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

We could also express this more functionally using the map function, introduced shortly:

In [69]:
set(map(len, strings))

{1, 2, 3, 4, 6}

As a simple dict comprehension example, we could create a lookup map of these strings to their locations in the list:

In [70]:
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

### Nested list comprehensions
Suppose we have a list of lists containing some English and Spanish names:

In [71]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

You might have gotten these names from a couple of files and decided to organize
them by language. Now, suppose we wanted to get a single list containing all names
with two or more e’s in them. We could certainly do this with a simple for loop:

In [72]:
name_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    name_of_interest.extend(enough_es)
name_of_interest

['Steven']

You can actually wrap this whole operation up in a single nested list comprehension,
which will look like:

In [73]:
result = [name for names in all_data for name in names if name.count('e') >= 2]
result

['Steven']

At first, nested list comprehensions are a bit hard to wrap your head around. The for
parts of the list comprehension are arranged according to the order of nesting, and
any filter condition is put at the end as before.

In [74]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the order of the for expressions would be the same if you wrote a nested for loop instead of a list comprehension.
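
For comparison, here is a sketch of the nested for loop that matches the flattening comprehension, with the for clauses appearing in the same order:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

flattened = []
for tup in some_tuples:     # first for clause of the comprehension
    for x in tup:           # second for clause of the comprehension
        flattened.append(x)

# The comprehension form, for reference
flattened_comp = [x for tup in some_tuples for x in tup]
```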

You can have arbitrarily many levels of nesting, though if you have more than two or
three levels of nesting you should probably start to question whether this makes sense
from a code readability standpoint. It’s important to distinguish the syntax just shown
from a list comprehension inside a list comprehension, which is also perfectly valid:

In [75]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Functions

Functions are declared with the def keyword, and the return keyword hands a value back to the caller:

In [76]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

There is no issue with having multiple return statements. If Python reaches the end
of a function without encountering a return statement, None is returned automatically.
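
The implicit None can be seen with a small sketch. The `log_value` helper is hypothetical and exists only to illustrate a function with no return statement:

```python
def log_value(x):
    # Hypothetical helper: prints its argument and has no return statement
    print('value:', x)

result = log_value(3)   # the call still evaluates to something: None
```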

Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In
the preceding function, x and y are positional arguments while z is a keyword argument.

In [77]:
my_function(1, 2, z=3), my_function(3, 2, 3.5), my_function(10, 20)

(9, 17.5, 45.0)

**The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any).**

It is possible to use keywords for passing positional arguments as
well. In the preceding example, we could also have written:

In [78]:
my_function(x=5, y=6, z=7), my_function(y=6, x=5, z=7)

(77, 77)

In some cases this can help with readability.

## Namespaces, Scope, and Local Functions
Functions can access variables in two different scopes: global and local. An alternative and more descriptive name describing a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed.
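
A minimal sketch of the local namespace at work, with hypothetical names: the list is created inside the function and its name disappears once the call finishes, while the returned object lives on.

```python
def func():
    local_values = []            # bound in func's local namespace
    for i in range(5):
        local_values.append(i)
    return local_values

result = func()

# The name local_values does not exist outside the function
try:
    local_values
except NameError:
    visible_outside = False
else:
    visible_outside = True
```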

Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the global keyword:

In [79]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

[]


**I generally discourage use of the global keyword. Typically global
variables are used to store some kind of state in a system. If you
find yourself using a lot of them, it may indicate a need for
object-oriented programming (using classes).**

## Returning Multiple Values

When I first programmed in Python after having programmed in C++, one
of my favorite features was the ability to return multiple values from a function with
simple syntax.

In [80]:
def f():
    a = 6
    b = 7
    c = 8
    return a, b, c
a, b, c = f()

What’s happening here is that the function is actually just returning one object, namely a tuple, which is then being unpacked into the result variables. In the preceding example, we could have done this instead:

In [81]:
return_value = f()
return_value

(6, 7, 8)

## Functions Are Objects
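Since Python functions are objects, many constructs can be easily expressed that are
difficult to do in other languages. Suppose we were doing some data cleaning and
needed to apply a bunch of transformations to the following list of strings: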

In [82]:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 
          'south carolina##', 'West virginia?']

Lots of things need to happen to make this list of strings uniform and
ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing on proper capitalization. One way to do this is to use built-in string methods along with the re standard library module for regular expressions:

In [83]:
import re
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

An alternative approach that you may find useful is to make a list of the operations you want to apply to a particular set of strings:

In [84]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

You can use functions as arguments to other functions like the built-in map function,
which applies a function to a sequence of some kind:

In [85]:
for x in map(remove_punctuation, states):
    print(x)

 Alabama 
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia


## Anonymous (Lambda) Functions
Python has support for so-called anonymous or lambda functions, which are a way of
writing functions consisting of a single expression, the result of which is the return
value. They are defined with the lambda keyword, which has no meaning other than
“we are declaring an anonymous function”:

In [86]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

Lambda functions are especially convenient in data analysis because, as you’ll see, there are
many cases where data transformation functions will take functions as arguments. It’s often
less typing (and clearer) to pass a lambda function as opposed to writing a full-out function
declaration or even assigning the lambda function to a local variable.

In [87]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [88]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key=lambda x: len(set(list(x))))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

One reason lambda functions are called anonymous functions is
that, unlike functions declared with the def keyword, the function
object is not given a name of its own: its __name__ attribute is just the generic string '<lambda>'.
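
The naming difference can be inspected directly through the `__name__` attribute, reusing the two functions defined earlier:

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

def_name = short_function.__name__   # the name given in the def statement
lambda_name = equiv_anon.__name__    # a generic placeholder name
```

This is one reason tracebacks that involve lambdas can be harder to read than those that involve named functions.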

## Currying: Partial Argument Application
Currying is computer science jargon (named after the mathematician Haskell Curry)
that means deriving new functions from existing ones by partial argument application. 

In [89]:
def add_numbers(x, y):
    return x + y

Using this function, we could derive a new function of one variable, add_five, that adds 5 to its argument:

In [90]:
add_five = lambda y: add_numbers(5, y)

The second argument to add_numbers is said to be curried. There’s nothing very fancy
here, as all we’ve really done is define a new function that calls an existing function.
The built-in functools module can simplify this process using the partial function:

In [91]:
from functools import partial
add_five = partial(add_numbers, 5)
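
A brief usage sketch for partial. The `add_to_100` name is hypothetical and shows that keyword arguments can be fixed as well:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five = partial(add_numbers, 5)         # fixes the first argument, x=5
total = add_five(10)

add_to_100 = partial(add_numbers, y=100)   # fixes y by keyword instead
other_total = add_to_100(1)
```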

## Generators
Having a consistent way to iterate over sequences, like objects in a list or lines in a
file, is an important Python feature. This is accomplished by means of the iterator
protocol, a generic way to make objects iterable.

In [92]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

a
b
c


When you write ```for key in some_dict```, the Python interpreter first attempts to create an iterator out of some_dict:

In [93]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x1ecfd393220>

An iterator is any object that will yield objects to the Python interpreter when used in
a context like a for loop. Most methods expecting a list or list-like object will also
accept any iterable object. This includes built-in methods such as min, max, and sum,
and type constructors like list and tuple:

In [94]:
list(dict_iterator)

['a', 'b', 'c']
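
Roughly speaking, a for loop drives an iterator by calling next on it until StopIteration is raised. A hand-written sketch of that protocol looks like this:

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}
it = iter(some_dict)

keys = []
while True:
    try:
        keys.append(next(it))   # ask the iterator for its next item
    except StopIteration:       # raised once the iterator is exhausted
        break
```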

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of
multiple results lazily, pausing after each one until the next one is requested. To create
a generator, use the yield keyword instead of return in a function:

In [95]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [96]:
gen = squares()
gen

<generator object squares at 0x000001ECFD3B00B0>

It is not until you request elements from the generator that it begins executing its
code:

In [97]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

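### Generator expressions
Another even more concise way to make a generator is by using a generator expression.
This is a generator analogue to list, dict, and set comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets: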

In [99]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x000001EC8CBBD200>

Generator expressions can be used instead of list comprehensions as function arguments in many cases:

In [100]:
sum(x ** 2 for x in range(100))

328350

In [101]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

### itertools module
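The standard library itertools module has a collection of generators for many common
data algorithms. For example, groupby takes any sequence and a function, grouping
consecutive elements in the sequence by the return value of the function: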

In [102]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is a lazy sub-iterator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
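
The letter A appears twice in the output above because groupby only groups consecutive elements. If one group per key is wanted, a common approach is to sort by the same key first. The following sketch assumes that is the goal (`by_letter` is a hypothetical name):

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the grouping key puts equal keys next to each other
ordered = sorted(names, key=first_letter)
by_letter = {
    letter: list(group)
    for letter, group in itertools.groupby(ordered, first_letter)
}
```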


Function|Description
---|---
combinations(iterable, k)|Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement (see also the companion function combinations_with_replacement)
permutations(iterable, k)|Generates a sequence of all possible k-tuples of elements in the iterable, respecting order 
groupby(iterable[, keyfunc])|Generates (key, sub-iterator) for each run of consecutive elements that share a key
product(*iterables, repeat=1)|Generates the Cartesian product of the input iterables as tuples, similar to a nested for loop

In [106]:
for i, j in itertools.product([2, 3], [4, 3, 2]):
    print((i,j))

(2, 4)
(2, 3)
(2, 2)
(3, 4)
(3, 3)
(3, 2)
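
The combinations and permutations functions from the table can be sketched in the same way, using hypothetical sample data:

```python
import itertools

items = ['a', 'b', 'c']   # hypothetical sample data

# Order is ignored, so ('a', 'b') appears but ('b', 'a') does not
pairs = list(itertools.combinations(items, 2))

# Order matters, so both ('a', 'b') and ('b', 'a') appear
ordered_pairs = list(itertools.permutations(items, 2))
```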


## Errors and Exception Handling
Handling Python errors or exceptions gracefully is an important part of building
robust programs. In data analysis applications, many functions only work on certain
kinds of input. As an example, Python’s float function is capable of casting a string
to a floating-point number, but fails with ValueError on improper inputs:

In [107]:
float('1.2345')

1.2345

In [108]:
float('string')

ValueError: could not convert string to float: 'string'

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a try/except block:

In [109]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

The code in the except part of the block will only be executed if float(x) raises an
exception:

In [112]:
attempt_float('1.2345'), attempt_float('string')

(1.2345, 'string')

In [113]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You might want to only suppress ValueError, since a TypeError (the input was not a
string or numeric value) might indicate a legitimate bug in your program. To do that,
write the exception type after except:

In [114]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [115]:
attempt_float((1, 2))

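TypeError: float() argument must be a string or a number, not 'tuple'

You can catch multiple exception types by writing a tuple of exception types instead (the parentheses are required):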

In [1]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
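
A short usage sketch of the version that catches both exception types. The input values are hypothetical, and each one is either converted or returned unchanged:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

# A convertible string, a non-numeric string, and a tuple
results = [attempt_float(v) for v in ['1.2345', 'string', (1, 2)]]
```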

### Exceptions in IPython

In [2]:
%run examples/ipython_bug.py

Exception: File `'examples/ipython_bug.py'` not found.

If an exception is raised while you are %run-ing a script or executing any statement,
IPython will by default print a full call stack trace (traceback) with a few lines of context around the position at each point in the stack.

Having additional context by itself is a big advantage over the standard Python interpreter
(which does not provide any additional context). You can control the amount
of context shown using the %xmode magic command, from Plain (same as the standard Python interpreter) to Verbose (which inlines function argument values and more). As you will see later in the chapter, you can step into the stack (using the
%debug or %pdb magics) after an error has occurred for interactive post-mortem
debugging.

# Files and the Operating System

Most of this book uses high-level tools like pandas.read_csv to read data files from
disk into Python data structures. However, it’s important to understand the basics of
how to work with files in Python.

To open a file for reading or writing, use the built-in open function with either a relative or absolute file path:

In [4]:
path = 'README.txt'
f = open(path)
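
A file opened this way stays open until it is explicitly closed. As a minimal sketch, a with block closes the file automatically. The scratch file and its contents here are hypothetical, created in a temporary directory so the example does not depend on README.txt existing:

```python
import os
import tempfile

# Hypothetical scratch file so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

# The with block closes the file automatically when it exits
with open(path, encoding='utf-8') as f:
    lines = [line.rstrip() for line in f]

file_closed = f.closed

# Clean up the scratch file and directory
os.remove(path)
os.rmdir(tmp_dir)
```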