# Chapter 3 - Built-in Data Structures, Functions, and Files

## 3.1 Tuples

Nested tuple.

In [1]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

You can convert any sequence or iterator to a tuple by invoking tuple.

In [2]:
tuple([4, 0, 2])

(4, 0, 2)

In [4]:
s = tuple("string")
s

('s', 't', 'r', 'i', 'n', 'g')

While the objects stored in a tuple may be mutable themselves, once the tuple is created
it’s not possible to modify which object is stored in each slot.

In [6]:
tup = tuple(['foo', [1, 2], True])

In [7]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in-place.

In [8]:
tup[1].append(3)

In [9]:
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the + operator to produce longer tuples.

In [10]:
(1, 2, 3) + (4, 5)

(1, 2, 3, 4, 5)

Multiplying a tuple by an integer, as with lists, has the effect of concatenating together
that many copies of the tuple.

In [11]:
(1, 2, 3) * 5

(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)

Note that the objects themselves are not copied, only the references to them.
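A minimal sketch of what "only the references are copied" means in practice. The names `inner` and `repeated` are made up for this illustration: a tuple holding a list is repeated, and then the list is mutated.

```python
inner = [1, 2]
repeated = (inner,) * 3  # three references to the same list object

inner.append(3)  # mutate the list through its original name

print(repeated)  # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
print(all(item is inner for item in repeated))  # True
```

Every slot of `repeated` sees the change because all three slots point at one list.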

If you try to assign to a tuple-like expression of variables, Python will attempt to
unpack the value on the righthand side of the equals sign.

In [12]:
tup = (1, 2, 3)
a, b, c = tup

Even sequences with nested tuples can be unpacked.

In [13]:
tup = 4, 5, (6, 7)

In [14]:
a, b, (c, d) = tup

Using this functionality you can easily swap variable names.

In [15]:
a, b = 1, 2

In [16]:
a, b = b, a

A common use of variable unpacking is iterating over sequences of tuples or lists.

In [17]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [18]:
for a, b, c in seq:
    print(a, b, c)

1 2 3
4 5 6
7 8 9


Python also supports more advanced tuple unpacking to help
with situations where you may want to “pluck” a few elements from the beginning of
a tuple. This uses the special syntax *rest, which is also used in function signatures
to capture an arbitrarily long list of positional arguments.

In [19]:
values = 1, 2, 3, 4, 5

a, b, *rest = values

In [20]:
a

1

In [21]:
b

2

In [22]:
rest

[3, 4, 5]

This rest bit is sometimes something you want to discard; there is nothing special
about the rest name. As a matter of convention, many Python programmers will use
the underscore (_) for unwanted variables.

In [23]:
a, b, *_ = values

Since the size and contents of a tuple cannot be modified, it is very light on instance
methods. A particularly useful one (also available on lists) is count, which counts the
number of occurrences of a value.

In [24]:
a = (1, 2, 2, 2, 3, 4, 2)

In [25]:
a.count(2)

4

## 3.2 List

You can define lists using square brackets [] or using the list type function.

In [26]:
a_list = [2, 3, 7, None]

In [27]:
tup = ("foo", "bar", "baz")

In [29]:
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

The list function is frequently used in data processing as a way to materialize an
iterator or generator expression.

In [30]:
gen = range(10)

In [31]:
gen

range(0, 10)

In [32]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Elements can be appended to the end of the list with the append method.

In [34]:
a = [1, 2, 3]

In [35]:
a.append(4)

In [36]:
a

[1, 2, 3, 4]

Using insert you can insert an element at a specific location in the list. The insertion index must be between 0 and the length of the list, inclusive.

insert is computationally expensive compared with append,
because references to subsequent elements have to be shifted internally
to make room for the new element. If you need to insert elements
at both the beginning and end of a sequence, you may wish
to explore collections.deque, a double-ended queue, for this purpose.

In [37]:
a.insert(1, "red")

In [38]:
a

[1, 'red', 2, 3, 4]
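As a rough sketch of the collections.deque alternative mentioned above (the variable names here are only illustrative), a deque supports adding and removing elements at both ends.

```python
from collections import deque

dq = deque([2, 3, 4])
dq.appendleft(1)  # add at the beginning without shifting the other elements
dq.append(5)      # add at the end, as with a list

print(dq)  # deque([1, 2, 3, 4, 5])

first = dq.popleft()  # removes and returns 1
last = dq.pop()       # removes and returns 5
print(list(dq))  # [2, 3, 4]
```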

The inverse operation to insert is pop, which removes and returns an element at a
particular index.

In [39]:
a.pop(2)

2

In [40]:
a

[1, 'red', 3, 4]

Elements can be removed by value with remove, which locates the first such value and
removes it from the list.

In [41]:
a.append("foo")

In [42]:
a

[1, 'red', 3, 4, 'foo']

In [43]:
a.remove("foo")

In [44]:
a

[1, 'red', 3, 4]

Check if a list contains a value using the in keyword.

In [45]:
"dwarf" in a

False

In [46]:
"dwarf" not in a

True

Checking whether a list contains a value is a lot slower than doing so with dicts and
sets (to be introduced shortly), as Python makes a linear scan across the values of the
list, whereas it can check the others (based on hash tables) in constant time.

Similar to tuples, adding two lists together with + concatenates them.

In [47]:
[1, 2, 3] + [4, 5]

[1, 2, 3, 4, 5]

If you have a list already defined, you can append multiple elements to it using the
extend method.

In [48]:
l = [1, 2, 3]

In [51]:
l.extend([4, 5])

In [50]:
l

[1, 2, 3, 4, 5]

Note that list concatenation by addition is a comparatively expensive operation since
a new list must be created and the objects copied over. Using extend to append elements
to an existing list, especially if you are building up a large list, is usually preferable.
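The two approaches can be compared side by side. This is only a sketch, and `chunks` is a hypothetical list of lists standing in for data that arrives in pieces.

```python
chunks = [[1, 2], [3, 4], [5]]  # hypothetical input arriving in pieces

# Usually preferable: extend the same list in place
everything = []
for chunk in chunks:
    everything.extend(chunk)

# Slower for large inputs: each + builds a brand-new list
everything_concat = []
for chunk in chunks:
    everything_concat = everything_concat + chunk

print(everything)  # [1, 2, 3, 4, 5]
print(everything == everything_concat)  # True
```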

You can sort a list in-place (without creating a new object) by calling its sort
method.

In [52]:
a = [7, 2, 6, 1, 3]

In [54]:
a.sort()

In [55]:
a

[1, 2, 3, 6, 7]

sort has a few options that will occasionally come in handy. One is the ability to pass
a secondary sort key—that is, a function that produces a value to use to sort the
objects. For example, we could sort a collection of strings by their lengths.

In [56]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)

In [57]:
b

['He', 'saw', 'six', 'small', 'foxes']

The built-in bisect module implements binary search and insertion into a sorted list.
bisect.bisect finds the location where an element should be inserted to keep it sorted,
while bisect.insort actually inserts the element into that location.

In [58]:
import bisect

In [59]:
c = [1, 2, 2, 2, 3, 4, 7]

In [61]:
bisect.bisect(c, 2)

4

In [62]:
bisect.bisect(c, 5)

6

In [63]:
bisect.insort(c, 6)

In [64]:
c

[1, 2, 2, 2, 3, 4, 6, 7]

The bisect module functions do not check whether the list is sorted,
as doing so would be computationally expensive. Thus, using
them with an unsorted list will succeed without error but may lead
to incorrect results.
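A small sketch of that pitfall, with made-up data: the same call returns a position on an unsorted list, but inserting there does not produce a sorted list.

```python
import bisect

unsorted = [3, 1, 2]
pos_unsorted = bisect.bisect(unsorted, 1)  # no error, but the answer is meaningless
print(pos_unsorted)  # 2

ordered = sorted(unsorted)  # [1, 2, 3]
pos_sorted = bisect.bisect(ordered, 1)
print(pos_sorted)  # 1
```

If you are not sure the list is sorted, sorting it once up front is the safer route.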

Slices can also be assigned to with a sequence.

In [66]:
seq = [1, 2, 3, 4, 5]
seq[3:4] = [6, 3]

In [67]:
seq

[1, 2, 3, 6, 3, 5]

Either the start or stop can be omitted, in which case they default to the start of the
sequence and the end of the sequence, respectively.

In [68]:
seq[:4]

[1, 2, 3, 6]

In [69]:
seq[2:]

[3, 6, 3, 5]

Negative indices slice the sequence relative to the end.

In [70]:
seq[-4:]

[3, 6, 3, 5]

A step can also be used after a second colon to, say, take every other element.

In [71]:
seq[::2]

[1, 3, 3]

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple.

In [72]:
seq[::-1]

[5, 3, 6, 3, 2, 1]

## 3.3 Built-in sequence functions

Python has a handful of useful sequence functions that you should familiarize yourself
with and use at any opportunity.

### enumerate

It’s common when iterating over a sequence to want to keep track of the index of the
current item. Since this is so common, Python has a built-in function, enumerate, which returns a
sequence of (i, value) tuples.

In [73]:
for idx, val in enumerate([1, 2, 3]):
    print(idx, val)

0 1
1 2
2 3


When you are indexing data, a helpful pattern that uses enumerate is computing a
dict mapping the values of a sequence (which are assumed to be unique) to their
locations in the sequence.

In [74]:
some_list = ["foo", "bar", "baz"]

In [75]:
mapping = {}

In [76]:
for i, v in enumerate(some_list):
    mapping[v] = i  

In [77]:
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

### sorted

The sorted function returns a new sorted list from the elements of any sequence.

In [78]:
sorted([3, 4, 1])

[1, 3, 4]

The sorted function accepts the same arguments as the sort method on lists.
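For instance, the key and reverse arguments work the same way as with the sort method. This sketch reuses the strings from the sort example earlier.

```python
words = ["saw", "small", "He", "foxes", "six"]

by_length = sorted(words, key=len)
print(by_length)  # ['He', 'saw', 'six', 'small', 'foxes']

longest_first = sorted(words, key=len, reverse=True)
print(longest_first)  # ['small', 'foxes', 'saw', 'six', 'He']

# Unlike the sort method, sorted leaves the original list untouched
print(words)  # ['saw', 'small', 'He', 'foxes', 'six']
```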

### zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences to create a
list of tuples.

In [79]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]

In [80]:
zipped = zip(seq1, seq2)

In [81]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces
is determined by the shortest sequence.

In [82]:
seq3 = [False, True]

In [83]:
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

A very common use of zip is simultaneously iterating over multiple sequences, possibly
also combined with enumerate.

In [86]:
for idx, (a, b) in enumerate(zip(seq1, seq2)):
    print(idx, a, b)

0 foo one
1 bar two
2 baz three


In [87]:
for idx, a, b in enumerate(zip(seq1, seq2)):
    print(idx, a, b)

ValueError: not enough values to unpack (expected 3, got 2)

Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the
sequence. Another way to think about this is converting a list of rows into a list of
columns. The syntax, which looks a bit magical, is this.

In [89]:
pitchers = [('Nolan', 'Ryan'), 
            ('Roger', 'Clemens'),
            ('Curt', 'Schilling')]
pitchers

[('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]

In [90]:
first_names, last_names = zip(*pitchers)

In [91]:
first_names

('Nolan', 'Roger', 'Curt')

In [92]:
last_names

('Ryan', 'Clemens', 'Schilling')

### reversed

reversed iterates over the elements of a sequence in reverse order.

In [93]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that reversed is a generator (to be discussed in some more detail later),
so it does not create the reversed sequence until materialized (e.g., with list or a for
loop).

## 3.4 dict

dict is likely the most important built-in Python data structure. A more common
name for it is hash map or associative array. It is a flexibly sized collection of key-value
pairs, where key and value are Python objects.

You can check if a dict contains a key using the same syntax used for checking
whether a list or tuple contains a value.

In [95]:
d = {"a": 1, "b": 2, "c": 3}

In [96]:
"b" in d

True

You can delete values either using the del keyword or the pop method (which simultaneously
returns the value and deletes the key).

In [97]:
del d["a"]

In [98]:
d

{'b': 2, 'c': 3}

In [99]:
d.pop("b")

2

In [100]:
d

{'c': 3}

The keys and values methods give you iterators of the dict’s keys and values, respectively.
The order of the keys depends on the order of their insertion, and these methods output
the keys and values in the same respective order.

In [101]:
d = {"a": 1, "b": 2, "c": 3}

In [102]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [103]:
d.values()

dict_values([1, 2, 3])

In [104]:
d.items()

dict_items([('a', 1), ('b', 2), ('c', 3)])

You can merge one dict into another using the update method.

In [105]:
d.update({"d": "foo", "e": 12})

In [106]:
d

{'a': 1, 'b': 2, 'c': 3, 'd': 'foo', 'e': 12}

The update method changes dicts in-place, so any existing keys in the data passed to
update will have their old values discarded.
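A short sketch of that overwrite behavior, using a throwaway dict named `settings` that is not part of the examples above.

```python
settings = {"a": 1, "b": 2}
settings.update({"b": "new", "c": 3})  # "b" already exists, "c" does not

print(settings)  # {'a': 1, 'b': 'new', 'c': 3}
```

The old value of "b" is gone after the call, so copy the dict first if you still need it.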

It’s common to end up with two sequences that you want to pair up
element-wise in a dict. Since a dict is essentially a collection of 2-tuples, the dict function accepts a list of
2-tuples.

In [107]:
mapping = dict(zip(range(5), reversed(range(5))))

In [108]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

The dict methods get and pop can take a default value to be returned.

In [109]:
d.get("z", "-1")

'-1'

Imagine categorizing a list of words by their
first letters as a dict of lists. The built-in collections module has a useful class, defaultdict, which makes this
even easier. To create one, you pass a type or function for generating the default value
for each slot in the dict.

In [112]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in ["abba", "cda", "bbs", "bba"]:
    by_letter[word[0]].append(word)

In [113]:
by_letter

defaultdict(list, {'a': ['abba'], 'b': ['bbs', 'bba'], 'c': ['cda']})
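For comparison, here is a sketch of the same grouping written with a plain dict and the setdefault method. It is one possible alternative, not the only way to do it.

```python
words = ["abba", "cda", "bbs", "bba"]

by_letter = {}
for word in words:
    letter = word[0]
    # Create an empty list the first time a letter is seen, then append
    by_letter.setdefault(letter, []).append(word)

print(by_letter)  # {'a': ['abba'], 'c': ['cda'], 'b': ['bbs', 'bba']}
```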

While the values of a dict can be any Python object, the keys generally have to be
immutable objects like scalar types (int, float, string) or tuples (all the objects in the
tuple need to be immutable, too). The technical term here is hashability. You can
check whether an object is hashable (can be used as a key in a dict) with the hash
function.

In [114]:
hash("string")

8659243948610029811

In [115]:
hash([1, 2, 3])

TypeError: unhashable type: 'list'
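To use a list as a key, one option is to convert it to a tuple, which is hashable as long as its elements are. A minimal sketch with illustrative names:

```python
d = {}
d[tuple([1, 2, 3])] = 5  # a tuple of ints is hashable
print(d)  # {(1, 2, 3): 5}

# A tuple that contains a list is still not hashable
try:
    hash((1, 2, [3, 4]))
except TypeError as exc:
    message = str(exc)
    print(message)  # unhashable type: 'list'
```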

## 3.5 set

A set is an unordered collection of unique elements. You can think of them like dicts,
but keys only, no values.

A set can be created in two ways: via the set function or via
a set literal with curly braces.

In [116]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [117]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and
symmetric difference. Consider these two example sets.

In [118]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

The union of these two sets is the set of distinct elements occurring in either set. This
can be computed with either the union method or the | binary operator.

In [119]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [120]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

The intersection contains the elements occurring in both sets. The & operator or the
intersection method can be used.

In [121]:
a.intersection(b)

{3, 4, 5}

In [122]:
a & b

{3, 4, 5}

List of set operations.
```
Function                          Alternative syntax  Description
a.add(x)                          N/A                 Add element x to the set a
a.clear()                         N/A                 Reset the set a to an empty state, discarding all of its elements
a.remove(x)                       N/A                 Remove element x from the set a
a.pop()                           N/A                 Remove an arbitrary element from the set a, raising KeyError if the set is empty
a.union(b)                        a | b               All of the unique elements in a and b
a.update(b)                       a |= b              Set the contents of a to be the union of the elements in a and b
a.intersection(b)                 a & b               All of the elements in both a and b
a.intersection_update(b)          a &= b              Set the contents of a to be the intersection of the elements in a and b
a.difference(b)                   a - b               The elements in a that are not in b
a.difference_update(b)            a -= b              Set a to the elements in a that are not in b
a.symmetric_difference(b)         a ^ b               All of the elements in either a or b but not both
a.symmetric_difference_update(b)  a ^= b              Set a to contain the elements in either a or b but not both
a.issubset(b)                     N/A                 True if the elements of a are all contained in b
a.issuperset(b)                   N/A                 True if the elements of b are all contained in a
a.isdisjoint(b)                   N/A                 True if a and b have no elements in common
```
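A few of the operations not demonstrated elsewhere in these notes, sketched with the same two example sets. The results are passed through sorted only to make the printed order predictable.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b         # difference
in_exactly_one = a ^ b    # symmetric difference

print(sorted(only_in_a))       # [1, 2]
print(sorted(in_exactly_one))  # [1, 2, 6, 7, 8]
print(a.isdisjoint(b))         # False, they share 3, 4 and 5
print(only_in_a.isdisjoint(b)) # True
```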

All of the logical set operations have in-place counterparts, which enable you to
replace the contents of the set on the left side of the operation with the result. For
very large sets, this may be more efficient.

In [123]:
c = a.copy()

In [126]:
c |= b

In [127]:
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [128]:
d = a.copy()

In [129]:
d &= b

In [130]:
d

{3, 4, 5}

Like dict keys, set elements generally must be immutable. To store list-like elements
in a set, you must first convert them to tuples.

In [131]:
my_data = [1, 2, 3, 4]

In [133]:
my_set = {tuple(my_data)}

In [135]:
my_set

{(1, 2, 3, 4)}

You can also check if a set is a subset of (is contained in) or a superset of (contains all
elements of) another set.

In [136]:
a_set = {1, 2, 3, 4, 5}

In [137]:
{1, 2, 3}.issubset(a_set)

True

In [138]:
a_set.issuperset({1, 2, 3})

True

Sets are equal if and only if their contents are equal.

In [139]:
{1, 2, 3} == {3, 2, 1}

True

## 3.6 List, set and dict comprehensions

List comprehensions are one of the most-loved Python language features. They allow
you to concisely form a new list by filtering the elements of a collection, transforming
the elements passing the filter in one concise expression.

`[expr for val in collection if condition]`

This is equivalent to the following for loop.

```
result = []
for val in collection:
    if condition:
        result.append(expr)
```

Given a
list of strings, we could filter out strings with length 2 or less and also convert them to
uppercase like this.

In [1]:
strings = ["a", "as", "bat", "car", "dove", "python"]

In [2]:
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in an
idiomatically similar way instead of lists. A dict comprehension looks like this.

`dict_comp = {key-expr : value-expr for value in collection if condition}`

A set comprehension looks like the equivalent list comprehension except with curly
braces instead of square brackets.

`set_comp = {expr for value in collection if condition}`

Like list comprehensions, set and dict comprehensions are mostly conveniences, but
they similarly can make code both easier to write and read. Consider the list of strings
from before. Suppose we wanted a set containing just the lengths of the strings contained
in the collection; we could easily compute this using a set comprehension.

In [3]:
{len(x) for x in strings}

{1, 2, 3, 4, 6}

As a simple dict comprehension example, we could create a lookup map of these
strings to their locations in the list.

In [4]:
{val: index for index, val in enumerate(strings)}

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

Nested list comprehensions. Suppose we have a list of lists containing some English and Spanish names. Now, suppose we wanted to get a single list containing all names
with two or more e’s in them.

In [5]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

In [6]:
[name for names in all_data for name in names if name.count("e") >= 2]

['Steven']
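It may help to see the nested comprehension unrolled. The for parts are read left to right in the same order as the equivalent nested loops, with the filter at the innermost level. A sketch:

```python
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

result = []
for names in all_data:              # first "for" in the comprehension
    for name in names:              # second "for" in the comprehension
        if name.count("e") >= 2:    # the filter condition
            result.append(name)

print(result)  # ['Steven']
```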

Here is another example where we
“flatten” a list of tuples of integers into a simple list of integers.

In [7]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [10]:
[x for tup in some_tuples for x in tup]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

It’s important to distinguish the syntax just shown
from a list comprehension inside a list comprehension, which is also perfectly valid. This produces a list of lists, rather than a flattened list of all of the inner elements.

In [13]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [14]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

## 3.7 Functions

Each function can have positional arguments and keyword arguments. Keyword arguments
are most commonly used to specify default values or optional arguments. In
the following function, x and y are positional arguments while z is a keyword argument.
This means that the function can be called in any of these ways.

In [15]:
def my_func(x, y, z=1.5):
    return x + y + z

In [16]:
my_func(5, 6, z=0.7)
my_func(3, 2, 1)
my_func(1, 2)

4.5

The main restriction on function arguments is that the keyword arguments must follow
the positional arguments (if any). You can specify keyword arguments in any
order; this frees you from having to remember the order in which the function arguments
were specified. You need to remember only what their names are.

It is possible to use keywords for passing positional arguments as
well. In some cases this can help with readability. With the preceding function, we could also write the following.

In [17]:
my_func(x=5, y=6, z=7)

18

### 3.7.1 Namespaces, scope and local functions

Functions can access variables in two different scopes: global and local.

An alternative
and more descriptive name describing a variable scope in Python is a namespace.

Any
variables that are assigned within a function by default are assigned to the local
namespace. The local namespace is created when the function is called and immediately
populated by the function’s arguments. After the function is finished, the local
namespace is destroyed.

When func() is called, the empty list a is created, five elements are appended, and
then a is destroyed when the function exits.

In [18]:
def func():
    a = []
    for i in range(5):
        a.append(i)

In [19]:
a = []
def func():
    for i in range(5):
        a.append(i)

In [23]:
a = None

def func():
    global a
    a = 2

In [28]:
func()
a

2

I generally discourage use of the global keyword.
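One common alternative, sketched here with a hypothetical function name `make_values`, is to build the result locally and return it rather than writing to a global variable.

```python
def make_values(n):
    values = []           # local to this call
    for i in range(n):
        values.append(i)
    return values         # hand the result back instead of touching a global

a = make_values(5)
print(a)  # [0, 1, 2, 3, 4]
```

The caller decides where the result lives, so the function has no hidden dependency on module-level state.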

### 3.7.2 Anonymous (Lambda) functions

Python has support for so-called anonymous or lambda functions, which are a way of
writing functions consisting of a single expression, the result of which is the return
value. They are defined with the lambda keyword, which has no meaning other than
“we are declaring an anonymous function”.

In [29]:
def short_function(x):
    return x * 2

In [31]:
lambda x: x * 2

<function __main__.<lambda>>

Suppose you wanted to sort a collection of strings by the number
of distinct letters in each string.

In [32]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

In [33]:
strings.sort(key=lambda x: len(set(list(x))))

In [34]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

One reason lambda functions are called anonymous functions is
that, unlike functions declared with the def keyword, the function
object itself is never given an explicit name; its __name__ attribute
is just the generic '<lambda>'.
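You can inspect this directly. In the sketch below, `equiv_anon` is just an illustrative variable name; binding a lambda to a variable does not change what the function object calls itself.

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

print(short_function.__name__)  # short_function
print(equiv_anon.__name__)      # <lambda>
```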

### 3.7.3 Currying: partial argument application

Currying is computer science jargon (named after the mathematician Haskell Curry)
that means deriving new functions from existing ones by partial argument application.

In [35]:
def add_number(x, y):
    return x + y

Using this function, we could derive a new function of one variable, add_five, that
adds 5 to its argument.

In [None]:
add_five = lambda y: add_number(5, y)

The second argument to add_number is said to be curried. There’s nothing very fancy
here, as all we’ve really done is define a new function that calls an existing function.

The built-in functools module can simplify this process using the partial function.

In [36]:
from functools import partial
add_five = partial(add_number, 5)
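A brief usage sketch. The second example, `int_from_binary`, is a made-up name showing that partial can also fix keyword arguments.

```python
from functools import partial

def add_number(x, y):
    return x + y

add_five = partial(add_number, 5)  # fixes x=5, leaving y open
print(add_five(10))  # 15

# partial can pin keyword arguments as well
int_from_binary = partial(int, base=2)
print(int_from_binary("101"))  # 5
```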

### 3.7.4 Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a
file, is an important Python feature. This is accomplished by means of the iterator
protocol, a generic way to make objects iterable. For example, iterating over a dict
yields the dict keys.

In [37]:
some_dict = {"a": 1, "b": 2, "c": 3}

In [38]:
for key in some_dict:
    print(key)

a
b
c


When you write for key in some_dict, the Python interpreter first attempts to create
an iterator out of some_dict.

In [39]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x10f4fe688>

An iterator is any object that will yield objects to the Python interpreter when used in
a context like a for loop. Most methods expecting a list or list-like object will also
accept any iterable object. This includes built-in methods such as min, max, and sum,
and type constructors like list and tuple.

In [40]:
list(dict_iterator)

['a', 'b', 'c']

In [42]:
sum(dict_iterator)

0
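The 0 above is a consequence of iterators being single-use: once list has consumed dict_iterator, nothing is left for sum. A small sketch with the built-in next function makes the consumption visible.

```python
some_dict = {"a": 1, "b": 2, "c": 3}
it = iter(some_dict)

first = next(it)     # pulls one item: 'a'
rest = list(it)      # consumes what is left: ['b', 'c']
leftover = list(it)  # the iterator is now exhausted: []

print(first, rest, leftover)  # a ['b', 'c'] []
```

To iterate again, create a fresh iterator with iter(some_dict).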

A generator is a concise way to construct a new iterable object. Whereas normal functions
execute and return a single result at a time, generators return a sequence of
multiple results lazily, pausing after each one until the next one is requested. To create
a generator, use the yield keyword instead of return in a function.

In [43]:
def squares(n=10):
    print("Generating squares from 1 to {0}".format(n ** 2))
    for i in range(1, n+1):
        yield i ** 2

When you actually call the generator, no code is immediately executed.

In [46]:
gen = squares()
gen

<generator object squares at 0x10f4d2e60>

It is not until you request elements from the generator that it begins executing its
code.

In [47]:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

#### Generator expressions

Another even more concise way to make a generator is by using a generator expression.
This is a generator analogue to list, dict, and set comprehensions; to create one,
enclose what would otherwise be a list comprehension within parentheses instead of
brackets.

In [49]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x10f4ebe60>

This is completely equivalent to the following more verbose generator.

In [51]:
def _make_gen():
    for x in range(100):
        yield x ** 2

gen = _make_gen()
gen

<generator object _make_gen at 0x10f4d2fc0>

Generator expressions can be used instead of list comprehensions as function arguments
in many cases.

In [52]:
sum(x ** 2 for x in range(100))

328350

In [53]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

#### itertools module

The standard library itertools module has a collection of generators for many common
data algorithms. For example, groupby takes any sequence and a function,
grouping consecutive elements in the sequence by return value of the function.

In [56]:
import itertools

first_letter = lambda x: x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names)) # names is a generator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
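Because groupby only groups consecutive elements, "A" appears twice above. If you want one group per key, a common approach is to sort by the same key function first. A sketch, where the dict name `grouped` is illustrative:

```python
import itertools

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]
first_letter = lambda x: x[0]

grouped = {}
for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter):
    grouped[letter] = list(group)  # group is a lazy sub-iterator

print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```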


See Table 3-2 for a list of a few other itertools functions I’ve frequently found helpful.

```
combinations(iterable, k) Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement (see also the companion function combinations_with_replacement)

permutations(iterable, k) Generates a sequence of all possible k-tuples of elements in the iterable, respecting order

groupby(iterable[, keyfunc]) Generates (key, sub-iterator) for each unique key

product(*iterables, repeat=1) Generates the Cartesian product of the input iterables as tuples, similar to a nested for loop
```
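A quick sketch of three of these functions on tiny inputs, wrapped in list to materialize the results:

```python
import itertools

combos = list(itertools.combinations([1, 2, 3], 2))
print(combos)  # [(1, 2), (1, 3), (2, 3)]

perms = list(itertools.permutations([1, 2, 3], 2))
print(perms)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

pairs = list(itertools.product("ab", [0, 1]))
print(pairs)  # [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```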

See the itertools [documentation](https://docs.python.org/3/library/itertools.html) for more.

### 3.7.5 Errors and exception handling

Handling Python errors or exceptions gracefully is an important part of building
robust programs. In data analysis applications, many functions only work on certain
kinds of input.

In [57]:
float("1.234")

1.234

In [58]:
float("sdf")

ValueError: could not convert string to float: 'sdf'

Suppose we wanted a version of float that fails gracefully, returning the input argument.

In [59]:
def func(x):
    try:
        return float(x)
    except:
        return x

The code in the except part of the block will only be executed if float(x) raises an
exception.

In [60]:
func(2)

2.0

In [61]:
func("Sd")

'Sd'

You might want to only suppress ValueError, since a TypeError (the input was not a
string or numeric value) might indicate a legitimate bug in your program.

In [63]:
def func(x):
    try:
        return float(x)
    except ValueError:
        return x

In [64]:
func((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You can catch multiple exception types by writing a tuple of exception types instead
(the parentheses are required).

In [65]:
def func(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
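A usage sketch of this last version. The function is renamed `attempt_float` here purely so that the example stands by itself.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x  # fall back to the input unchanged

print(attempt_float("1.5"))   # 1.5
print(attempt_float("abc"))   # abc    (ValueError suppressed)
print(attempt_float((1, 2)))  # (1, 2) (TypeError suppressed)
```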

In some cases, you may not want to suppress an exception, but you want some code
to be executed regardless of whether the code in the try block succeeds or not. To do
this, use finally. Here, the file handle f will always get closed.

In [67]:
f = open("test", "w")
try:
    f.write("some text")
finally:
    f.close()
    print("close file")



close file


Similarly, you can have code that executes
only if the try: block succeeds using else.

In [68]:
print("Open")
try:
    print("Write to file")
except:
    print("Failed")
else:
    print("Succeeded")
finally:
    print("Close")

Open
Write to file
Succeeded
Close


## 3.8 Files and the operating system

To open a file for reading or writing, use the built-in open function with either a relative
or absolute file path.

In [69]:
path = "example/file.txt"

f = open(path)

We can then treat the file handle
f like a list and iterate over the lines like so.

In [None]:
for line in f:
    pass

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll
often see code to get an EOL-free list of lines in a file like this.

In [None]:
lines = [x.rstrip() for x in open(path)]

When you use open to create file objects, it is important to explicitly close the file
when you are finished with it. Closing the file releases its resources back to the operating
system.

In [None]:
f.close()

One of the ways to make it easier to clean up open files is to use the with statement. This will automatically close the file f when exiting the with block.

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
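A self-contained sketch of the with pattern. It writes a throwaway file into a temporary directory (the file name and contents are invented for the example) so it can run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")  # hypothetical file

    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    with open(path, encoding="utf-8") as f:
        lines = [x.rstrip() for x in f]

print(lines)     # ['first line', 'second line']
print(f.closed)  # True, the with block closed the file
```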

If we had typed f = open(path, 'w'), a new file at example/file.txt would
have been created (be careful!), overwriting any one in its place. There is also the 'x'
file mode, which creates a writable file but fails if the file path already exists.

See
Table 3-3 for a list of all valid file read/write modes.

```
r Read-only mode
w Write-only mode; creates a new file (erasing the data for any file with the same name)
x Write-only mode; creates a new file, but fails if the file path already exists
a Append to existing file (create the file if it does not already exist)
r+ Read and write
b Add to mode for binary files (i.e., 'rb' or 'wb')
t Text mode for files (automatically decoding bytes to Unicode). This is the default if not specified. Add t to other
modes to use this (i.e., 'rt' or 'xt')
```
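The difference between the 'w', 'a', and 'x' modes can be sketched as follows. The file lives in a temporary directory and its name is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file

    with open(path, "w") as f:   # creates the file
        f.write("one\n")
    with open(path, "a") as f:   # appends to the existing contents
        f.write("two\n")
    with open(path) as f:
        after_append = f.read()

    try:
        open(path, "x")          # refuses to touch an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, "w") as f:   # erases the previous contents
        f.write("three\n")
    with open(path) as f:
        after_rewrite = f.read()

print(repr(after_append))   # 'one\ntwo\n'
print(x_failed)             # True
print(repr(after_rewrite))  # 'three\n'
```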

read returns a certain number of characters from the file. What constitutes a
“character” is determined by the file’s encoding (e.g., UTF-8) or simply raw bytes if
the file is opened in binary mode.

In [None]:
f = open(path)
f.read(10)

The read method advances the file handle’s position by the number of bytes read.
tell gives you the current position.

In [None]:
f.tell()  # e.g., 11 if one of the 10 characters read takes two bytes in UTF-8

You can check
the default encoding in the sys module.

In [70]:
import sys
sys.getdefaultencoding()

'utf-8'

seek changes the file position to the indicated byte in the file.

In [None]:
f.seek(3)

See Table 3-4 for many of the most commonly used file methods.

```
read([size]) Return data from file as a string, with optional size argument indicating the number of bytes to read

readlines([size]) Return list of lines in the file, with optional size argument

write(str) Write passed string to file

writelines(strings) Write passed sequence of strings to the file

close() Close the handle

flush() Flush the internal I/O buffer to disk

seek(pos) Move to indicated file position (integer)

tell() Return current file position as integer

closed True if the file is closed
```
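A sketch of read, tell, and seek on a small throwaway file whose contents are invented for the example. It contains one non-ASCII character, so the difference between counting characters (text mode) and counting bytes (binary mode) shows up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as f:
        f.write("Sue\u00f1a el rico")  # \u00f1 is the letter n with a tilde

    with open(path, encoding="utf-8") as f:
        text_chars = f.read(5)      # text mode: 5 characters

    with open(path, "rb") as f:
        raw = f.read(5)             # binary mode: 5 bytes
        pos_after_read = f.tell()   # position in bytes
        f.seek(3)                   # jump to byte 3
        rest = f.read()             # everything from byte 3 onward

print(len(text_chars), len(text_chars.encode("utf-8")))  # 5 6
print(raw)             # b'Sue\xc3\xb1'
print(pos_after_read)  # 5
print(rest.decode("utf-8") == "\u00f1a el rico")  # True
```

Five characters occupy six bytes here because the non-ASCII letter is encoded as two bytes in UTF-8.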
