In [None]:
# enable wordwrap
#https://stackoverflow.com/questions/58890109/line-wrapping-in-collaboratory-google-results
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
    td,th,p {
        font-size: 18px
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# 3.1 Data Structures and Sequences 

Python has simple but powerful data structures.



## Tuple

Tuples are fixed-length, immutable sequences of Python objects. Create tuples using comma-separated sequences of values.

In [None]:
tup = 4, 5, 6

In [None]:
tup

(4, 5, 6)

When you're defining tuples in more complicated expressions, it's often necessary to enclose the values in parentheses, as in this example of creating a tuple of tuples:


In [None]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

You can convert any sequence or iterator to a tuple by invoking `tuple`:

In [None]:
tuple([4, 0, 2])

(4, 0, 2)

In [None]:
tup = tuple('string')

In [None]:
tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets `[]` as with most other sequence types. Sequences are 0-indexed in Python.

In [None]:
tup[0]

's'

While the objects in a tuple may be mutable themselves, once the tuple is created it's not possible to modify which object is stored in each slot.

In [None]:
tup = tuple(['foo', [1, 2], True])

In [None]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object, such as a list, inside a tuple is mutable, you can modify it in-place.

In [None]:
tup[1].append(3)

In [None]:
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the + operator to produce longer tuples:

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating that many copies of the tuple:

In [None]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

The objects themselves are not copied, only the references to them.
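To see that repetition copies references rather than objects, here is a minimal sketch. The names `inner` and `repeated` are illustrative and not part of the original notes.

```python
inner = [1, 2]
repeated = (inner,) * 3   # three references to the same list

# Every slot points at the very same list object
print(repeated[0] is repeated[1])   # True

# Mutating the list through one slot is visible through all of them
repeated[0].append(3)
print(repeated)   # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
```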

### Unpacking tuples 📦

If you try to *assign* to a tuple-like expression of variables, Python tries to *unpack* the value on the right-hand side of the equals sign.

In [None]:
tup = (4, 5, 6)

In [None]:
a, b, c = tup

In [None]:
b

5

Even sequences with nested tuples can be unpacked.


In [None]:
tup = 4, 5, (6, 7)

a, b, (c, d) = tup
d

7

Using this functionality you can easily swap variable names, a task which in many languages might look like:

In [None]:
tmp = a
a = b
b = tmp

In Python, the swap can be done like this:

In [None]:
a, b = 1, 2

In [None]:
a

1

In [None]:
b

2

In [None]:
b, a = a, b

In [None]:
a

2

In [None]:
b

1

A common use of variable unpacking is iterating over sequences of tuples or lists:

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [None]:
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


Another common use is returning multiple values from a function. 

Python also supports more advanced tuple unpacking to help with situations where you may want to "pluck" a few elements from the beginning of a tuple. This uses the special syntax `*rest`, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [None]:
values = 1, 2, 3, 4, 5

a, b, *rest = values
a, b

(1, 2)

In [None]:
rest

[3, 4, 5]

The `rest` bit is sometimes something you want to discard. There is nothing special about the name `rest`; as a matter of convention, many Python programmers use the underscore (`_`) for unwanted variables:

In [None]:
a, b, *_ = values

### Tuple methods

Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one (also available on lists) is `count`, which counts the number of occurrences of a value.

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)

In [None]:
a.count(2)

4

## List

In contrast with tuples, lists are variable-length and their contents can be modified in-place. You can define lists using square brackets `[]` or using the `list` type function:

In [None]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [None]:
b_list[1] = 'peekaboo'

In [None]:
b_list

['foo', 'peekaboo', 'baz']

Lists and tuples are semantically similar, although tuples cannot be modified. Lists and tuples can be used interchangeably in many functions.

The `list` function is used often in data processing as a way to materialize an iterator or generator expression.


In [None]:
gen = range(10)
gen

range(0, 10)

In [None]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Adding and removing elements

To append elements to the end of a list, use the `append` method.

In [None]:
b_list.append('dwarf')

In [None]:
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

`insert` inserts an element at a specific location in a list.

In [None]:
b_list.insert(1, 'red')

In [None]:
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

The insertion index must be between 0 and the length of the list, inclusive.

📓❗

`insert` is computationally expensive compared with `append`. To insert elements at both the beginning and end of a sequence, you may want to explore `collections.deque`, a double-ended queue.
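As a rough illustration of the `collections.deque` suggestion, the sketch below adds and removes elements at both ends. The variable names are made up for the example.

```python
from collections import deque

dq = deque(['b', 'c'])
dq.appendleft('a')    # add at the front without shifting the other elements
dq.append('d')        # add at the end, as with a list
print(dq)             # deque(['a', 'b', 'c', 'd'])

first = dq.popleft()  # remove from the front
last = dq.pop()       # remove from the end
print(first, last, list(dq))   # a d ['b', 'c']
```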


The inverse operation to `insert` is `pop`, which removes and returns the element at a particular index.

In [None]:
b_list.pop(2)

'peekaboo'

In [None]:
b_list

['foo', 'red', 'baz', 'dwarf']

Use `remove` to remove an element by value; it locates the first matching value and removes it from the list.

In [None]:
b_list.append('foo')

In [None]:
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [None]:
b_list.remove('foo')

In [None]:
b_list

['red', 'baz', 'dwarf', 'foo']

If performance is not a concern, by using `append` and `remove`, you can use a Python list as a perfectly suitable "multiset" data structure.

Check if a list contains a value using the `in` keyword:

In [None]:
'dwarf' in b_list

True

The keyword `not` can be used to negate `in`.

In [None]:
'dwarf' not in b_list

False

Checking whether a list contains a value is a lot slower than doing so with dicts and sets (to be introduced shortly). Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.
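One way to observe the difference yourself is a small timing sketch with the standard `timeit` module. The collection size and repetition count are arbitrary choices for illustration, and actual timings vary by machine, so no numbers are quoted here.

```python
import timeit

values_list = list(range(100_000))
values_set = set(values_list)
target = 99_999   # last element: worst case for the list's linear scan

list_time = timeit.timeit(lambda: target in values_list, number=100)
set_time = timeit.timeit(lambda: target in values_set, number=100)

# Exact timings depend on the machine, so none are shown here
print(f'list: {list_time:.6f} s, set: {set_time:.6f} s')
```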

### Concatenating and combining lists

Similar to tuples, adding two lists together with `+` concatenates them:

In [None]:
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to it using the `extend` method:

In [None]:
x = [4, None, 'foo']

In [None]:
x.extend([7, 8, (2, 3)])

In [None]:
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using `extend` to append elements to an existing list, especially if you are building up a large list, is usually preferable.

```python
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

The `extend` version is faster than the concatenative alternative:

```python
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```

### Sorting

You can sort a list in-place (without creating a new object) by calling its `sort` method:

In [None]:
a = [7, 2, 5, 1, 3]

In [None]:
a.sort()

In [None]:
a

[1, 2, 3, 5, 7]

`sort` has a few options that come in handy. One is the ability to pass a *sort key*, that is, a function that produces a value to use to sort the objects. For example, we could sort a collection of strings by their lengths:

In [None]:
b = ['saw', 'small', 'He', 'foxes', 'six']

In [None]:
b.sort(key=len)

In [None]:
b

['He', 'saw', 'six', 'small', 'foxes']

The `sorted` function produces a sorted copy of a general sequence.

### Binary search and maintaining a sorted list

The built-in `bisect` module implements binary search and insertion into a sorted list.

`bisect.bisect` finds the location where an element should be inserted to keep the list sorted, while `bisect.insort` actually inserts the element into that location.

In [None]:
import bisect

c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c, 2)

4

In [None]:
bisect.bisect(c, 5)

6

In [None]:
bisect.insort(c, 6)

In [None]:
c

[1, 2, 2, 2, 3, 4, 6, 7]

📓❗

The `bisect` module functions do not check whether the list is sorted, as doing so would be computationally expensive. Using them with an unsorted list will succeed without error but may lead to incorrect results.
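A small sketch of what "incorrect results" can look like in practice. The list `unsorted` is a made-up example chosen so the problem is visible.

```python
import bisect

unsorted = [3, 1, 2]          # deliberately NOT sorted
bisect.insort(unsorted, 2)    # no error is raised...
print(unsorted)               # [3, 1, 2, 2]
print(unsorted == sorted(unsorted))   # False: the result is still not sorted

# Sorting first restores the guarantee
ordered = sorted([3, 1, 2])
bisect.insort(ordered, 2)
print(ordered)                # [1, 2, 2, 3]
```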

### Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of `start:stop` passed to the indexing operator `[]`.

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

In [None]:
seq[1:5]

[2, 3, 7, 5]

Slices can also be assigned a sequence, which replaces the selected section:

In [None]:
seq[3:4] = [6, 3]

In [None]:
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

The element at the `start` index is included, but the `stop` index is *not included*, so the number of elements in the result is `stop - start`.

Either the `start` or `stop` can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively.

In [None]:
seq[:5]

[7, 2, 3, 6, 3]

In [None]:
seq[3:]

[6, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end:

In [None]:
seq[-4:]

[5, 6, 0, 1]

In [None]:
seq[-6:-2]

[6, 3, 5, 6]

Slicing semantics takes a bit of getting used to if you're coming from R or MATLAB. Figure 3-1 provides an illustration of slicing with positive and negative integers.

A `step` can be used after a second colon to, say, take every other element.

In [None]:
seq[::2]

[7, 3, 3, 6, 1]

A clever use of this is to pass `-1`, which has the useful effect of reversing a list or tuple:

In [None]:
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]

<img src="https://drive.google.com/uc?id=1HapMGXj7QcPDbgSEANFY-ps3x6o1ZNho&authuser=scottminer1205%40gmail.com&usp=drive_fs"/>


## Built-in Sequence Functions

Python has a handful of useful sequence functions that you should familiarize yourself with and use at any opportunity.

### enumerate

Python has a built-in function, `enumerate`, which returns a sequence of `(i, value)` tuples.

```python
for i, value in enumerate(collection):
    pass  # do something with value
```

When indexing data, a helpful pattern that uses `enumerate` is computing a `dict` mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence:

In [None]:
some_list = ['foo', 'bar', 'baz']

mapping = {}

In [None]:
for i, v in enumerate(some_list):
    mapping[v] = i

In [None]:
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

### sorted

The `sorted` function returns a **new** sorted list from the elements of any sequence:

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

### zip 

`zip` "pairs" up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns an iterator of tuples, which you can materialize with `list`:


In [None]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']

zipped = zip(seq1, seq2)

list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

`zip` takes an arbitrary number of sequences, and the number of elements it produces is determined by the *shortest* sequence.

In [None]:
seq3 = [False, True]

In [None]:
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

A very common use of `zip` is simultaneously iterating over multiple sequences, possibly also combined with `enumerate`.

In [None]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print(f'{i}: {a}, {b}')

0: foo, one
1: bar, two
2: baz, three


Given a "zipped" sequence, `zip` can be applied in a clever way to "unzip" the sequence. Another way to think about this is converting a list of *rows* into a list of *columns*. The syntax, which looks a bit magical, is:

In [None]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]

first_names, last_names = zip(*pitchers)

In [None]:
first_names

('Nolan', 'Roger', 'Curt')

In [None]:
last_names

('Ryan', 'Clemens', 'Schilling')

### reversed

`reversed` iterates over the elements of a sequence in reverse order.

In [None]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that `reversed` returns a lazy iterator, so it does not create the reversed sequence until materialized (e.g., with `list` or a `for` loop).
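The lazy behavior can be sketched with `next`, which pulls one element at a time. The list `letters` is illustrative.

```python
letters = ['a', 'b', 'c']
rev = reversed(letters)   # nothing is copied yet; rev produces items on demand

print(next(rev))          # c
print(list(rev))          # ['b', 'a']  (the remaining elements)
print(list(rev))          # []  (rev is now exhausted)
print(letters)            # ['a', 'b', 'c']  (the original is unchanged)
```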

## dict

`dict` is likely the most important built-in Python data structure. In other languages it is often called a *hash map* or *associative array*. It is a flexibly sized collection of *key-value* pairs, where *key* and *value* are Python objects. One way to create a dict is to use curly braces `{}` and colons to separate keys and values.

In [None]:
empty_dict = {}

d1 = {'a': 'some value', 'b' : [1, 2, 3, 4]}

In [None]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple.

In [None]:
d1[7] = 'an integer'

d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [None]:
d1['b']

[1, 2, 3, 4]

You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value.

In [None]:
'b' in d1

True

You can delete values either using the `del` keyword or the `pop` method (which simultaneously returns the value and deletes the key).

In [None]:
d1[5] = 'some value'

In [None]:
d1

{5: 'some value', 7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [None]:
d1['dummy'] = 'another value'

In [None]:
d1

{5: 'some value',
 7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [None]:
del d1[5]

In [None]:
d1

{7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [None]:
ret = d1.pop('dummy')

In [None]:
ret

'another value'

In [None]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

The `keys` and `values` methods give you iterable views of the dict's keys and values, respectively. Since Python 3.7, dicts preserve insertion order, and both methods output the keys and values in that same order.

In [None]:
list(d1.keys())

['a', 'b', 7]

In [None]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

You can merge one dict into another using the `update` method.

In [None]:
d1.update({'b' : 'foo', 'c' : 12})

In [None]:
d1

{7: 'an integer', 'a': 'some value', 'b': 'foo', 'c': 12}

The `update` method changes dicts in-place, so any existing keys in the data passed to `update` will have their old values discarded.

### Creating dicts from sequences

It's common to end up with two sequences that you want to pair up element-wise in a dict. As a first cut, you might write code like this:

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```


Since a `dict` is essentially a collection of 2-tuples, the `dict` function accepts a list of 2-tuples.

In [None]:
mapping = dict(zip(range(5), reversed(range(5))))

In [None]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Later we talk about *dict comprehensions*, another elegant way to construct dicts.

### Default values

It's common to have logic like:

```python
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```

The `dict` methods `get` and `pop` can take a default value to be returned, so that the above `if-else` block can be written simply as:

```python
value = some_dict.get(key, default_value)
```
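A minimal sketch of how `get` and `pop` treat a missing key, with and without a default. The dict `inventory` and its keys are made-up example data.

```python
inventory = {'apples': 3}

print(inventory.get('pears'))       # None: missing key, no default given
print(inventory.get('pears', 0))    # 0: missing key, default returned
print(inventory.pop('pears', 0))    # 0: pop also accepts a default

try:
    inventory.pop('pears')          # missing key and no default
except KeyError as exc:
    print('KeyError:', exc)         # KeyError: 'pears'

print(inventory)                    # {'apples': 3}
```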

`get` by default will return `None` if the key is not present, while `pop` will raise an exception. With *setting* values, a common case is for the values in a dict to be other collections, like lists. For example, you could imagine categorizing a list of words by their first letters as a dict of lists:


In [None]:
words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = {}

In [None]:
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

In [None]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The `setdefault` dict method is for precisely this purpose. The preceding `for` loop can be rewritten as:

In [None]:
by_letter = {}  # start fresh so the words are not added a second time
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

The built-in `collections` module has a useful class, `defaultdict`, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict.

In [None]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

In [None]:
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

### Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable too). The technical term here is *hashability*. You can check whether an object is hashable (can be used as a key in a dict) with the `hash` function.


In [None]:
hash('string')

-5867497655514687004

In [None]:
hash((1, 2, (2, 3)))

1097636502276347782

In [None]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can.


In [None]:
d = {}

In [None]:
d[tuple([1, 2, 3])] = 5

In [None]:
d

{(1, 2, 3): 5}

## set🎾

A set is an unordered collection of unique elements. You can think of sets as being like dicts, but with keys only and no values. A set can be created in two ways: via the `set` function or via a *set literal* with curly braces.


In [None]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [None]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical *set operations* like union, intersection, difference, and symmetric difference. Consider these two example sets.

In [None]:
a = {1, 2, 3, 4, 5}

In [None]:
b = {3, 4, 5, 6, 7, 8}

The union of these two sets is the set of distinct elements occurring in either set. This can be computed with either the `union` method or the `|` binary operator.

In [None]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

The intersection contains the elements occurring in both sets. The `&` operator or the `intersection` method can be used.

In [None]:
a.intersection(b)

{3, 4, 5}

In [None]:
a & b

{3, 4, 5}

See Table 3-1 for a list of commonly used set methods.

| Function | Alternative syntax | Description |
| :--      | :--                | :--         |
| a.add(x) | N/A                | Add element `x` to the set `a` |
| a.clear() | N/A               | Reset the set `a` to an empty state, discarding all of its elements |
| a.remove(x) | N/A | Remove element `x` from the set `a` |
| a.pop() | N/A | Remove an arbitrary element from the set `a`, raising `KeyError` if the set is empty |
| a.union(b) | a \| b | All of the unique elements in `a` and `b` |
| a.update(b) | a \|= b | Set the contents of `a` to be the union of the elements in `a` and `b` |
| a.intersection(b) | a & b | All of the elements in *both* `a` and `b` |
| a.intersection_update(b) | a &= b | Set the contents of `a` to be the intersection of the elements in `a` and `b` |
| a.difference(b) | a - b | The elements in `a` that are not in `b` |
| a.difference_update(b) | a -= b | Set `a` to the elements in `a` that are not in `b` |
| a.symmetric_difference(b) | a ^ b | All of the elements in either `a` or `b` but *not both* |
| a.symmetric_difference_update(b) | a ^= b | Set `a` to contain the elements in either `a` or `b` but *not both* |
| a.issubset(b) | a <= b | `True` if the elements of `a` are all contained in `b` |
| a.issuperset(b) | a >= b | `True` if the elements of `b` are all contained in `a` |
| a.isdisjoint(b) | N/A | `True` if `a` and `b` have no elements in common |
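A few of the table entries that are not demonstrated elsewhere can be sketched as follows, reusing the example sets `a` and `b`. Results are passed through `sorted` only to make the printed order predictable; the name `remaining` is illustrative.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

print(sorted(a - b))            # [1, 2]: in a but not in b
print(sorted(a ^ b))            # [1, 2, 6, 7, 8]: in exactly one of the two
print(a.isdisjoint({9, 10}))    # True: no elements in common
print({3, 4}.issubset(a))       # True

remaining = a.copy()
remaining -= b                  # in-place difference, same as difference_update
print(sorted(remaining))        # [1, 2]
```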



All of the logical set operations have in-place counterparts, enabling you to replace the contents of the set on the left side of the operation with the result. For large sets, the following may be more efficient.


In [None]:
c = a.copy()

In [None]:
c |= b

In [None]:
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
d = a.copy()

In [None]:
d

{1, 2, 3, 4, 5}

In [None]:
b

{3, 4, 5, 6, 7, 8}

In [None]:
d &= b

In [None]:
d

{3, 4, 5}

Like dict keys, set elements generally must be immutable. To store list-like elements in a set, you must convert them to tuples.

In [None]:
my_data = [1, 2, 3, 4]

my_set = {tuple(my_data)}

In [None]:
my_set

{(1, 2, 3, 4)}

You can check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set:

In [None]:
a_set = {1, 2, 3, 4, 5}

In [None]:
{1, 2, 3}.issubset(a_set)

True

In [None]:
a_set.issuperset({1, 2, 3})

True

Sets are equal if and only if their contents are equal.

In [None]:
{1, 2, 3} == {3, 2, 1}

True

## List, Set, and Dict Comprehensions

*List comprehensions* are one of the most-loved Python language features. They allow you to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. They take the basic form:

```python
[expr for val in collection if condition]
```

This is equivalent to the following `for` loop:

```python
result = []
for val in collection:
    if condition:
        result.append(expr)
```

The filter condition can be omitted, leaving only the expression. For example, given a list of strings, we can filter out strings with length 2 or less and convert the rest to uppercase like this:



In [None]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are natural extensions, producing sets and dicts in an idiomatically similar way instead of lists. A dict comprehension looks like the following:

```python
dict_comp = {key_expr: value_expr for value in collection if condition}
```

Set comprehensions look like list comprehensions, except with curly braces instead of square brackets.

```python
set_comp = {expr for value in collection if condition}
```

Like list comprehensions, set and dict comprehensions are mostly conveniences, but they make code both easier to write and read. If we want to create a set containing just the lengths of the strings from the previous collection, we could compute this using a set comprehension.

In [None]:
unique_lengths = {len(x) for x in strings}

In [None]:
unique_lengths

{1, 2, 3, 4, 6}

We could express this more functionally using the `map` function.

In [None]:
set(map(len, strings))

{1, 2, 3, 4, 6}

As a simple dict comprehension example, we could create a lookup map from these strings to their locations in the list.

In [None]:
loc_mapping = {val : index for index, val in enumerate(strings)}

In [None]:
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

### Nested list comprehensions

Suppose we have a list of lists containing some English and Spanish names:


In [None]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

These names are organized by language. Now suppose we want to get a single list containing all names with two or more `e`'s in them. We could do this with a simple `for` loop.

In [None]:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)

In [None]:
names_of_interest

['Steven']

You could wrap this whole operation up in a single *nested list comprehension*, which looks like the following:

In [None]:
result = [name for names in all_data for name in names
          if name.count('e') >= 2]

In [None]:
result

['Steven']

The `for` parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before. Here is another example where we "flatten" a list of tuples of integers into a simple list of integers.

In [None]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the order of the `for` expressions would be the same if you wrote a nested `for` loop instead of a list comprehension.

In [None]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)

You can have arbitrarily many levels of nesting, though if you have more than two or three levels of nesting you might want to take a different approach. It's important to distinguish the syntax just shown from a list comprehension inside a list comprehension, which is also perfectly valid.

In [None]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

This produces a list of lists, rather than a flattened list of all of the inner elements.

# 3.2 Functions

Functions are declared with the `def` keyword, and they hand a result back to the caller with the `return` keyword.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
```

You can have multiple `return` statements. Also, if Python reaches the end of a function without encountering a `return` statement, `None` is returned automatically.
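Both points can be seen in a short sketch. The function `sign_label` is a hypothetical example written for this note, not something from the original material.

```python
def sign_label(x):
    # Two explicit return statements, one per branch
    if x > 0:
        return 'positive'
    if x < 0:
        return 'negative'
    # No return is reached for x == 0, so the function returns None

print(sign_label(3))    # positive
print(sign_label(-2))   # negative
print(sign_label(0))    # None
```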

Each function can have *positional* arguments and *keyword* arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In the preceding function, `x` and `y` are positional arguments while `z` is a keyword argument. The function can be called in any of these ways:

```python
my_function(5, 6, z=0.7)
my_function(3.14, 7, 3.5)
my_function(10, 20)
```

The keyword arguments *must* follow the positional arguments (if any). Keyword arguments can be specified in any order, which frees you from having to remember the order in which the function arguments were specified; you only need to remember their names.

It is possible to use keywords for passing positional arguments as well. In the preceding example, we could have written:

```python
my_function(x=5, y=6, z=7)
my_function(y=6, x=5, z=7)
```

## Namespaces, Scope, and Local Functions

Functions can access variables in *global* and *local* scopes. A *namespace* is an alternative and more descriptive name for a variable scope in Python. Any variables that are assigned within a function are by default assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function's arguments. When the function is finished, the local namespace is destroyed (with some exceptions). Consider the following function:

```python
def func():
    a = []
    for i in range(5):
        a.append(i)
```

When `func()` is called, the empty list `a` is created, five elements are appended, and then `a` is destroyed when the function exits. Suppose instead we declared `a` as follows:

```python
a = []
def func():
    for i in range(5):
        a.append(i)
```
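A brief sketch of this second version in use. Because `func` only mutates the existing list and never assigns to the name `a`, no `global` declaration is needed; this is general Python behavior rather than anything specific to these notes.

```python
a = []

def func():
    # a is looked up in the global namespace and mutated in place
    for i in range(5):
        a.append(i)

func()
print(a)   # [0, 1, 2, 3, 4]
func()
print(a)   # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```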

Assigning variables outside of the function's scope is possible, but those variables must be declared as global via the `global` keyword:




In [None]:
a = None

def bind_a_variable():
    global a
    a = []
bind_a_variable()

In [None]:
print(a)

[]


📓❗

The author discourages use of the `global` keyword. Global variables are typically used to store some kind of state in a system. Using a lot of globals may indicate a need to implement classes.

## Returning Multiple Values

You can return multiple values from a function using a simple syntax.

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
```

In data analysis, you may find yourself doing this often. What's happening here is that the function is actually returning just *one* object, a tuple, which is then unpacked into the result variables. We could also have written:

```python
return_value = f()
```

Here, `return_value` would be a 3-tuple with the three returned variables. A potentially attractive alternative to returning multiple values like before might be to return a dict instead.

```python
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}
```



The alternative technique may be useful depending on what you are trying to do.

## Functions Are Objects

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages. Suppose we were doing some data cleaning and needed to apply a bunch of transformations to a list of strings.

In [None]:
states = ['    Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virgina?']

One way to clean this data is to use built-in string methods along with the `re` standard library module for regular expressions.

In [None]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [None]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virgina']

An alternative approach is to make a list of the operations you want to apply to a particular set of strings:

In [None]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

Then we have the following:

In [None]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virgina']

A more *functional* pattern like this enables you to easily modify how the strings are transformed at a very high level. The `clean_strings` function is also now more reusable and generic.

You can use functions as arguments to other functions like the built-in `map` function, which applies a function to a sequence of some kind:

In [None]:
for x in map(remove_punctuation, states):
    print(x)

    Alabama 
Georgia
Georgia
georgia
FlOrIda
south carolina
West virgina


## Anonymous (Lambda) Functions

Python has support for so-called *anonymous* or *lambda* functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the `lambda` keyword, which has no meaning other than "we are declaring an anonymous function":

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2
```

It's often simplest to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable. 

```python
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)
```

You could have written `[x * 2 for x in ints]`, but here we passed a custom operator to the `apply_to_list` function.

Suppose you wanted to sort a collection of strings by the number of distinct letters in each string:


In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

Here we could pass a lambda function to the list's `sort` method.

In [None]:
strings.sort(key=lambda x: len(set(x)))

One reason lambda functions are called anonymous functions is that, unlike functions declared with the `def` keyword, the function object is never given a descriptive name: its `__name__` attribute is just the generic placeholder `'<lambda>'`.
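A quick check of the `__name__` attribute, reusing the two equivalent functions defined earlier in this section:

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

print(short_function.__name__)   # short_function
print(equiv_anon.__name__)       # <lambda>
```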

## Currying: Partial Argument Application

*Currying* is computer science jargon (named after the mathematician Haskell Curry) that means deriving new functions from existing ones by *partial argument application*. Suppose we had a trivial function that adds two numbers together.

```python
def add_numbers(x, y):
    return x + y
```

With this function, we could derive a new function of one variable, `add_five`, that adds 5 to its argument:

```python
add_five = lambda y: add_numbers(5, y)
```

The second argument to `add_numbers` is said to be *curried*. All we've really done is define a new function that calls an existing function. The standard library `functools` module can simplify this process using the `partial` function.

```python
from functools import partial
add_five = partial(add_numbers, 5)
```
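As a usage sketch, the object returned by `partial` is called like any other function. The second example, which fixes an argument by keyword, uses the hypothetical name `add_to_y_five` and is included only for illustration.

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Fix the first positional argument (x=5); y is supplied at call time
add_five = partial(add_numbers, 5)
print(add_five(10))  # 15

# Arguments can also be fixed by keyword
add_to_y_five = partial(add_numbers, y=5)
print(add_to_y_five(1))  # 6
```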

## Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature. This is accomplished by means of the *iterator protocol*, a generic way to make objects iterable. For example, iterating over a dict yields the dict keys.

In [None]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

a
b
c


When you write `for key in some_dict`, the Python interpreter attempts to create an iterator out of `some_dict`:

In [None]:
dict_iterator = iter(some_dict)

In [None]:
dict_iterator

<dict_keyiterator at 0x7f7ed92ac5f0>

An iterator is any object that yields objects to the Python interpreter when used in a context like a `for` loop. Most functions and methods expecting a list or list-like object will also accept any iterable object. This includes built-in functions like `min`, `max`, and `sum`, and type constructors like `list` and `tuple`:

In [None]:
list(dict_iterator)

['a', 'b', 'c']
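What a `for` loop does behind the scenes can be sketched by driving an iterator by hand with the built-in `next`. The variable names here are illustrative.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}
it = iter(some_dict)

first = next(it)     # 'a'
rest = list(it)      # ['b', 'c']: the iterator resumes where it left off
leftover = list(it)  # []: an exhausted iterator yields nothing more

# Asking an exhausted iterator for another item raises StopIteration,
# which is the signal a for loop uses to stop.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, rest, leftover, exhausted)
```

This also explains why calling `list(dict_iterator)` a second time would return an empty list: an iterator can only be consumed once.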

A *generator* is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the `yield` keyword instead of `return` in a function.

In [None]:
def squares(n=10):
    print(f'Generating squares from 1 to {n ** 2}')
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [None]:
gen = squares()

gen

<generator object squares at 0x7f7ed92fcb50>

It is not until you request elements from the generator that it begins executing its code.

In [None]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
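The pause-and-resume behavior can be observed by stepping through the same generator with `next`. This sketch repeats the `squares` definition so it runs on its own and uses `n=3` to keep the output short.

```python
def squares(n=10):
    print(f'Generating squares from 1 to {n ** 2}')
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)       # no output yet: the body has not started running
first = next(gen)      # prints the message, then runs up to the first yield
second = next(gen)     # resumes right after the first yield
remaining = list(gen)  # drains whatever is left
again = list(gen)      # the generator is now exhausted

print(first, second, remaining, again)
# 1 4 [9] []
```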

### Generator expressions

Another even more concise way to make a generator is by using a *generator expression*. This is a generator analogue to list, dict, and set comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [None]:
gen = (x ** 2 for x in range(100))

In [None]:
gen

<generator object <genexpr> at 0x7f7ed9337e50>

This is completely equivalent to the following more verbose generator:

In [None]:
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()

Generator expressions can be used instead of list comprehensions as function arguments in many cases.

In [None]:
sum(x ** 2 for x in range(100))

328350

In [None]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
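One practical difference from a list comprehension is that a generator expression produces its values on demand and can be consumed only once. A minimal sketch, with illustrative variable names:

```python
squares_list = [x ** 2 for x in range(100)]  # builds all 100 values up front
squares_gen = (x ** 2 for x in range(100))   # produces values only when asked

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
second_pass = sum(squares_gen)  # the generator was already consumed

print(total_from_list, total_from_gen, second_pass)
# 328350 328350 0
```

If the values are needed more than once, build a list; if they are consumed a single time by a function such as `sum`, the generator expression avoids creating the intermediate list.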

### itertools module

The standard library `itertools` module has a collection of generators for many common data algorithms. For example, `groupby` takes any sequence and a function, grouping consecutive elements in the sequence by the return value of the function. Here's an example.

In [None]:
import itertools

first_letter = lambda x: x[0]

In [None]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [None]:
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is a lazy iterator over one run of names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
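Note that `'A'` appears twice in the output because `groupby` only groups *consecutive* elements. If one group per key is wanted, a common approach is to sort by the same key function first. A sketch on the same data (the `grouped` dict is illustrative):

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the key places equal keys next to each other before grouping
sorted_names = sorted(names, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}
print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```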


The table below lists a few other `itertools` functions that are frequently useful.

| Function | Description |
| :-- | :-- |
| `combinations(iterable, k)` | Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement (see also the companion function `combinations_with_replacement`) |
| `permutations(iterable, k)` | Generates a sequence of all possible k-tuples of elements in the iterable, respecting order |
| `groupby(iterable[, keyfunc])` | Generates `(key, sub-iterator)` for each run of consecutive elements sharing the same key |
| `product(*iterables, repeat=1)` | Generates the Cartesian product of the input iterables as tuples, similar to a nested `for` loop |
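The differences between these functions are easiest to see on a very small input. The following sketch uses a three-element list chosen only for illustration:

```python
import itertools

small = [0, 1, 4]

combos = list(itertools.combinations(small, 2))
# [(0, 1), (0, 4), (1, 4)]

combos_repl = list(itertools.combinations_with_replacement(small, 2))
# [(0, 0), (0, 1), (0, 4), (1, 1), (1, 4), (4, 4)]

perms = list(itertools.permutations(small, 2))
# [(0, 1), (0, 4), (1, 0), (1, 4), (4, 0), (4, 1)]

pairs = list(itertools.product('ab', [1, 2]))
# [('a', 1), ('a', 2), ('b', 1), ('b', 2)]

print(len(combos), len(combos_repl), len(perms), len(pairs))
# 3 6 6 4
```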

In [200]:
sample_data = [ x ** 2 for x in range(12) ]
sample_data

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [206]:
import itertools

for x in itertools.combinations(sample_data, 2):
    print(x)

(0, 1)
(0, 4)
(0, 9)
(0, 16)
(0, 25)
(0, 36)
(0, 49)
(0, 64)
(0, 81)
(0, 100)
(0, 121)
(1, 4)
(1, 9)
(1, 16)
(1, 25)
(1, 36)
(1, 49)
(1, 64)
(1, 81)
(1, 100)
(1, 121)
(4, 9)
(4, 16)
(4, 25)
(4, 36)
(4, 49)
(4, 64)
(4, 81)
(4, 100)
(4, 121)
(9, 16)
(9, 25)
(9, 36)
(9, 49)
(9, 64)
(9, 81)
(9, 100)
(9, 121)
(16, 25)
(16, 36)
(16, 49)
(16, 64)
(16, 81)
(16, 100)
(16, 121)
(25, 36)
(25, 49)
(25, 64)
(25, 81)
(25, 100)
(25, 121)
(36, 49)
(36, 64)
(36, 81)
(36, 100)
(36, 121)
(49, 64)
(49, 81)
(49, 100)
(49, 121)
(64, 81)
(64, 100)
(64, 121)
(81, 100)
(81, 121)
(100, 121)


In [207]:
import itertools

for x in itertools.permutations(sample_data, 2):
    print(x)



(0, 1)
(0, 4)
(0, 9)
(0, 16)
(0, 25)
(0, 36)
(0, 49)
(0, 64)
(0, 81)
(0, 100)
(0, 121)
(1, 0)
(1, 4)
(1, 9)
(1, 16)
(1, 25)
(1, 36)
(1, 49)
(1, 64)
(1, 81)
(1, 100)
(1, 121)
(4, 0)
(4, 1)
(4, 9)
(4, 16)
(4, 25)
(4, 36)
(4, 49)
(4, 64)
(4, 81)
(4, 100)
(4, 121)
(9, 0)
(9, 1)
(9, 4)
(9, 16)
(9, 25)
(9, 36)
(9, 49)
(9, 64)
(9, 81)
(9, 100)
(9, 121)
(16, 0)
(16, 1)
(16, 4)
(16, 9)
(16, 25)
(16, 36)
(16, 49)
(16, 64)
(16, 81)
(16, 100)
(16, 121)
(25, 0)
(25, 1)
(25, 4)
(25, 9)
(25, 16)
(25, 36)
(25, 49)
(25, 64)
(25, 81)
(25, 100)
(25, 121)
(36, 0)
(36, 1)
(36, 4)
(36, 9)
(36, 16)
(36, 25)
(36, 49)
(36, 64)
(36, 81)
(36, 100)
(36, 121)
(49, 0)
(49, 1)
(49, 4)
(49, 9)
(49, 16)
(49, 25)
(49, 36)
(49, 64)
(49, 81)
(49, 100)
(49, 121)
(64, 0)
(64, 1)
(64, 4)
(64, 9)
(64, 16)
(64, 25)
(64, 36)
(64, 49)
(64, 81)
(64, 100)
(64, 121)
(81, 0)
(81, 1)
(81, 4)
(81, 9)
(81, 16)
(81, 25)
(81, 36)
(81, 49)
(81, 64)
(81, 100)
(81, 121)
(100, 0)
(100, 1)
(100, 4)
(100, 9)
(100, 16)
(100, 25)
(100, 36)
(100

## Errors and Exception Handling


In data analysis apps, many functions only work on certain kinds of input. Python's `float` function is capable of casting a string to a floating-point number, but fails with `ValueError` on improper inputs:

In [208]:
float('1.2345')

1.2345

In [209]:
float('something')

ValueError: ignored

If we want a version of `float` that fails gracefully by returning the input argument unchanged, we can write a function that encloses the call to `float` in a `try/except` block:


In [210]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

The code in the `except` part of the block is only executed if `float(x)` raises an exception.

In [211]:
attempt_float('1.2345')

1.2345

In [212]:
attempt_float('something')

'something'

`float` can raise exceptions other than `ValueError`.

In [213]:
float((1, 2))

TypeError: ignored

Perhaps you only want to suppress `ValueError`, since a `TypeError` might indicate a legitimate bug. To do that, write the exception type after `except`:

In [215]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

Then we have the following:

In [216]:
attempt_float((1, 2))

TypeError: ignored

You can catch multiple exception types by writing a tuple of exception types instead (the parentheses are required):

In [217]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
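With both exception types listed, the tuple input that previously raised `TypeError` is now returned unchanged. A short check, repeating the definition so it runs on its own:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

print(attempt_float('1.2345'))     # 1.2345
print(attempt_float('something'))  # something
print(attempt_float((1, 2)))       # (1, 2)
```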

When you do not want to suppress an exception but want some code to be executed regardless of whether the code in the `try` block succeeds or not, use `finally`.

```python
f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()
```

The file handle `f` *always* gets closed. You can also have code that executes only if the `try:` block succeeds using `else`:

```python
f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()
```
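The order in which these clauses run can be traced with a small self-contained sketch. The helper `run` is hypothetical and simply records which clauses execute for a given callable; it catches `Exception` rather than using a bare `except`.

```python
def run(action):
    """Record which clauses execute when calling `action` (illustrative helper)."""
    steps = []
    try:
        action()
    except Exception:
        steps.append('except')   # runs only if the try block raised
    else:
        steps.append('else')     # runs only if the try block succeeded
    finally:
        steps.append('finally')  # runs in both cases
    return steps

print(run(lambda: float('1.5')))        # ['else', 'finally']
print(run(lambda: float('something')))  # ['except', 'finally']
```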

### Exceptions in IPython

If an exception is raised while you are `%run`-ing a script or executing any statement, IPython will by default print a full call stack trace (traceback) with a few lines of context around the position at each point in the stack.

```python
%run examples/ipython_bug.py
```

Having additional context by itself is a big advantage over the standard Python interpreter, which does not provide any additional context. You can control the amount of context shown using the `%xmode` magic command, from `Plain` (same as the standard Python interpreter) to `Verbose` (which inlines function argument values and more). You can step *into the stack* (using the `%debug` or `%pdb` magics) after an error has occurred for interactive post-mortem debugging.

# 3.3 Files and the Operating System

To open a file for reading or writing, use the built-in `open` function with either a relative or absolute path:
