# Data Structures and Sequences

## Mutable and immutable objects

Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined types (classes), are mutable. This means that the object or values that they contain can be modified:

In [1]:
a_list = ["foo", 2, [4, 5]]
a_list[2] = (3, 4)
a_list

['foo', 2, (3, 4)]

Others, like strings and tuples, are immutable:

In [2]:
a_tuple = (3, 5, (4, 5))
a_tuple[1] = "four"

TypeError: 'tuple' object does not support item assignment
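
As a small sketch of what immutability means in practice (the names `greeting` and `shouted` are illustrative and not part of the original notes), methods that appear to modify a string actually return a new object and leave the original unchanged:

```python
greeting = "hello"
shouted = greeting.upper()  # returns a new string object

print(greeting)  # the original string is unchanged
print(shouted)

try:
    greeting[0] = "J"  # strings do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```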

## Tuples

A **tuple** is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values:

In [3]:
tup = 4, 5, 6

In [4]:
tup

(4, 5, 6)

In [5]:
nested_tup = (4, 5, 6), (7, 8)

In [6]:
nested_tup

((4, 5, 6), (7, 8))

You can convert any sequence or iterator to a tuple by invoking **tuple**:

In [7]:
tuple([4, 0, 2])

(4, 0, 2)

Elements can be accessed with square brackets **[ ]** as with most other sequence types.
As in C, C++, Java, and many other languages, sequences are **0-indexed** in Python:

In [8]:
tup = tuple("string")
tup[0]

's'

In [9]:
tup = tuple(["foo", [1, 2], True])

While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot:

In [10]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can still modify it in place:

In [11]:
tup[1].append(3)

In [12]:
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the **+** operator to produce longer tuples:

In [13]:
(4, None, "foo") + (6, 0) + ("bar",)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating together
that many copies of the tuple:

In [14]:
("foo", "bar") * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

## Unpacking tuples

If you assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign. Nested tuples can be unpacked the same way:

In [15]:
tup = (4, 5, 6)
a, b, c = tup

In [16]:
a, b, c

(4, 5, 6)

In [17]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup

In [18]:
a, b, c, d

(4, 5, 6, 7)

A common use of variable unpacking is iterating over sequences of tuples or lists:

In [19]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

for a, b, c in seq:
    print("a={0}, b={1}, c={2}".format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


Python also supports more advanced tuple unpacking for situations where you want to “pluck” a few elements from the beginning of a tuple.

This uses the special syntax `*rest`, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [20]:
values = 1, 2, 3, 4, 5

In [21]:
a, b, *rest = values

In [22]:
a, b

(1, 2)

In [23]:
rest

[3, 4, 5]

By convention, many Python programmers use the underscore (`_`) for variables they do not need:

In [24]:
a, b, *_ = values

In [25]:
a, b

(1, 2)
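
The starred name does not have to come last, and the same syntax works in a function signature. The sketch below uses a hypothetical helper `describe` (not from the original notes) to show both uses:

```python
def describe(first, *rest):
    # Inside a function, the starred parameter collects the
    # remaining positional arguments into a tuple.
    return first, rest


# In an assignment, the starred name collects the leftovers into a list.
head, *middle, tail = [1, 2, 3, 4, 5]

print(head, middle, tail)  # 1 [2, 3, 4] 5
print(describe(1, 2, 3))   # (1, (2, 3))
```

Note the asymmetry: unpacking in an assignment produces a list, while `*rest` in a signature produces a tuple.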

#### Tuple methods

**count**: counts the number of occurrences of a value:

In [26]:
a = (1, 2, 2, 2, 3, 4, 2)

In [27]:
a.count(2)

4

## Lists

Lists are variable-length and their contents can be modified in-place. You can define them using square brackets `[ ]` or using the `list` type function:

In [40]:
a_list = [2, 3, 7, None]
tup = ("foo", "bar", "baz")

In [41]:
b_list = list(tup)

In [42]:
b_list

['foo', 'bar', 'baz']

In [43]:
b_list[1] = "peekaboo"

In [44]:
b_list

['foo', 'peekaboo', 'baz']

The **list** function is frequently used in data processing as a way to materialize an
iterator or generator expression:

In [45]:
gen = range(10)

In [46]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### Adding and removing elements

Elements can be appended to the end of the list with the **append** method:

In [47]:
b_list.append("dwarf")

In [48]:
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using **insert** you can insert an element at a specific location in the list:

In [49]:
b_list.insert(1, "red")

In [50]:
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

The inverse operation to insert is **pop**, which removes and returns an element at a particular index:

In [51]:
b_list.pop(2)

'peekaboo'

In [52]:
b_list

['foo', 'red', 'baz', 'dwarf']

Elements can be removed by value with **remove**, which locates the first such value and
removes it from the list:

In [53]:
b_list.remove("red")

In [54]:
b_list

['foo', 'baz', 'dwarf']

Check whether a list contains a value using the **in** and **not in** keywords:

In [55]:
"dwarf" in b_list

True

In [57]:
"gg" not in b_list

True

#### Concatenating and combining lists

As with tuples, adding two lists together with **+** concatenates them:

In [58]:
[4, None, "foo"] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to it using the
**extend** method:

In [59]:
x = [4, None, "foo"]

In [60]:
x.extend([7, 8, (2, 3)])

In [61]:
x

[4, None, 'foo', 7, 8, (2, 3)]

**Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.**

```python
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

The `extend` version above is faster than the concatenative alternative:
```python
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```
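
A rough way to see why, without timing anything, is to check object identity. In this sketch `list_of_lists` is filled with made-up sample data; `extend` keeps working on the same list object, while `+` builds a new list on every iteration:

```python
list_of_lists = [[1, 2], [3], [4, 5, 6]]  # hypothetical sample data

everything = []
original_id = id(everything)
for chunk in list_of_lists:
    everything.extend(chunk)  # mutates the existing list
same_object = id(everything) == original_id

combined = []
original_combined = combined
for chunk in list_of_lists:
    combined = combined + chunk  # creates a new list and rebinds the name
rebound = combined is not original_combined

print(everything, same_object)
print(combined, rebound)
```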

#### Sorting

In [70]:
data = [7, 2, 5, 1, 3]

To sort and create a new object, leaving the original untouched, use `sorted()`:

In [71]:
data2 = sorted(data)  # returns a new sorted list; data is left unchanged

In [72]:
data

[7, 2, 5, 1, 3]

In [73]:
data2

[1, 2, 3, 5, 7]

You can sort a list in-place (without creating a new object) by calling its **sort**
method:

In [74]:
data.sort()

In [75]:
data

[1, 2, 3, 5, 7]

The **sort** method has a few options that occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces a value to use to sort the objects. For example, we could sort a collection of lists or strings by their lengths:

In [79]:
data_3 = [[3, 5, 7, 8], [5, 7, 8]]

In [80]:
data_3.sort(key=len)

In [81]:
data_3

[[5, 7, 8], [3, 5, 7, 8]]

In [82]:
b = ["saw", "small", "He", "foxes", "six"]

In [83]:
b.sort(key=len)

In [84]:
b

['He', 'saw', 'six', 'small', 'foxes']

In [85]:
sorted(b, key=len)

['He', 'saw', 'six', 'small', 'foxes']
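
The output above keeps `'saw'` before `'six'` and `'small'` before `'foxes'` because Python's sort is stable: items with equal keys keep their original relative order. A short sketch of this, together with the `reverse` option and another key function (the variable names are illustrative):

```python
words = ["saw", "small", "He", "foxes", "six"]

# Equal-length words keep their original relative order (stable sort).
by_length = sorted(words, key=len)

# reverse=True sorts in descending order of the key, still stably.
longest_first = sorted(words, key=len, reverse=True)

# Any one-argument callable works as a key; str.lower ignores case.
alphabetical = sorted(words, key=str.lower)

print(by_length)
print(longest_first)
print(alphabetical)
```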

#### Binary search and maintaining a sorted list

In [86]:
import bisect

The built-in **bisect** module implements binary search and insertion into a ___sorted list___.
`bisect.bisect` finds the location where an element should be inserted to keep it sorted, while `bisect.insort` actually inserts the element into that location:

In [87]:
c = [1, 2, 2, 2, 3, 4, 7]

In [88]:
bisect.bisect(c, 2)

4

In [89]:
bisect.bisect(c, 5)

6

In [90]:
bisect.insort(c, 6)

In [91]:
c

[1, 2, 2, 2, 3, 4, 6, 7]
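
Two details are worth a quick sketch. `bisect.bisect` is the same as `bisect.bisect_right`, so for repeated values it returns the position after the run, while `bisect.bisect_left` returns the position before it. Also, the module does not check that the list is sorted, so using it on unsorted input runs without error but gives a meaningless answer. The names `scores` and `unsorted` below are illustrative:

```python
import bisect

scores = [1, 2, 2, 2, 3, 4, 7]

left = bisect.bisect_left(scores, 2)    # position before the run of 2s
right = bisect.bisect_right(scores, 2)  # position after the run of 2s
print(left, right)  # 1 4

# No error is raised for unsorted input, but the result cannot be trusted.
unsorted = [7, 1, 4, 2]
bisect.insort(unsorted, 3)
print(unsorted == sorted(unsorted))  # False
```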

#### Slicing

You can select sections of most sequence types by using slice notation, which in its
basic form consists of _start:stop_ passed to the indexing operator **[ ]**:

In [92]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

Slices can also be assigned to with a sequence:

In [93]:
seq[3:4]

[7]

In [94]:
seq[3:4] = [6, 3]
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

While the element at the _start_ index is _included_, the _stop_ index is __not included__, so that the number of elements in the result is _stop - start_.

Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [95]:
seq[:5]

[7, 2, 3, 6, 3]

In [96]:
seq[3:]

[6, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end:

In [97]:
seq[-4:]

[5, 6, 0, 1]

In [98]:
seq[:-4]

[7, 2, 3, 6, 3]

In [99]:
seq[-6:-2]

[6, 3, 5, 6]

![alt text](images/slicing.png "Illustration of Python slicing conventions")

A __step__ can also be used after a second colon to, say, take every other element:

In [100]:
seq[::2]

[7, 3, 3, 6, 1]

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:

In [101]:
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]
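
The conventions above can be checked with a short sketch. The list `letters` is made-up data; the example confirms that a slice holds _stop - start_ elements, combines a start with a step, and shows that slicing a list yields a new list:

```python
letters = list("abcdefgh")  # illustrative data

start, stop = 2, 6
window = letters[start:stop]
assert len(window) == stop - start  # the stop index is excluded

every_third = letters[1::3]  # start at index 1, then take every third element
backwards = letters[5:1:-1]  # walk from index 5 down to, but not including, 1

# A slice of a list is a new list, so changing it leaves the original alone.
window[0] = "X"
print(window, every_third, backwards, letters[2])
```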

#### Built-in Sequence Functions

##### `enumerate`

`enumerate` returns an iterator of _(i, value)_ tuples, which lets you keep track of the index of the current item while looping:

In [102]:
some_list = ["foo", "bar", "baz"]
mapping = {}

In [103]:
for i, v in enumerate(some_list):
    mapping[v] = i

In [104]:
mapping

{'foo': 0, 'bar': 1, 'baz': 2}
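
`enumerate` also accepts an optional `start` argument for the first index. A minimal sketch (the names `pairs` and `numbered` are illustrative):

```python
some_list = ["foo", "bar", "baz"]

# enumerate is lazy; list() materializes the (index, value) pairs.
pairs = list(enumerate(some_list))

# start=1 shifts the numbering, which is handy for 1-based labels.
numbered = {i: v for i, v in enumerate(some_list, start=1)}

print(pairs)     # [(0, 'foo'), (1, 'bar'), (2, 'baz')]
print(numbered)  # {1: 'foo', 2: 'bar', 3: 'baz'}
```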

##### `sorted`

Returns a _new_ sorted list from the elements of any sequence:

In [105]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [106]:
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

##### `zip`

`zip` “pairs” up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns a lazy iterator of tuples, which you can materialize with `list` or pass directly to `dict`:

In [107]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
seq3 = [False, True]

In [108]:
dict(zip(seq1, seq2))

{'foo': 'one', 'bar': 'two', 'baz': 'three'}

In [109]:
zipped = zip(seq1, seq2)

In [110]:
zipped

<zip at 0x7f97bc506320>

In [111]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

**zip** can take an arbitrary number of sequences, and the number of elements it produces is determined by the _shortest_ sequence:

In [112]:
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
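
Because `zip` silently drops the extra elements of longer inputs, it can hide length mismatches. If padding is preferable to truncation, one option is `itertools.zip_longest` from the standard library, sketched here with the same sequences:

```python
from itertools import zip_longest

seq1 = ["foo", "bar", "baz"]
seq3 = [False, True]

truncated = list(zip(seq1, seq3))  # stops at the shortest input
padded = list(zip_longest(seq1, seq3, fillvalue=None))  # pads the shorter input

print(truncated)  # [('foo', False), ('bar', True)]
print(padded)     # [('foo', False), ('bar', True), ('baz', None)]
```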

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:

In [113]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print("{0}: {1}, {2}".format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the
sequence. Another way to think about this is converting a list of rows into a list of
columns. The syntax, which looks a bit magical, is:

In [114]:
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Curt", "Schilling")]
first_names, last_names = zip(*pitchers)

In [115]:
first_names

('Nolan', 'Roger', 'Curt')

In [116]:
last_names

('Ryan', 'Clemens', 'Schilling')

##### `reversed`

`reversed` iterates over the elements of a sequence in reverse order:

In [117]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

**Keep in mind that reversed returns a lazy iterator, so it does not create the reversed sequence until materialized (e.g., with list or a for loop).**
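
One consequence of this laziness is that the object returned by `reversed` can only be consumed once. A brief sketch (variable names are illustrative):

```python
rev = reversed([1, 2, 3])

first_pass = list(rev)   # consumes the iterator
second_pass = list(rev)  # nothing is left to yield

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
```

If the reversed data is needed more than once, store the materialized list rather than the iterator.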

## Dictionaries

A **dict** (dictionary) is a flexibly sized collection of key-value pairs, where both key and value are Python objects. One way to create one is to use
curly braces **{ }** with colons separating keys from values:

In [118]:
empty_dict = {}

In [119]:
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

In [120]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements
of a list or tuple:

In [121]:
d1[7] = "an integer"

In [122]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [123]:
d1["b"]

[1, 2, 3, 4]

You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value:

In [124]:
"b" in d1

True

You can delete values either using the **del** keyword or the **pop** method (which simultaneously returns the value and deletes the key):

In [125]:
d1[5] = "some value"

In [126]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [127]:
d1["dummy"] = "another value"

In [128]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [129]:
del d1[5]

In [130]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [131]:
ret = d1.pop("dummy")

In [132]:
ret

'another value'

In [133]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The `keys()` and `values()` methods give you view objects over the dict's keys and values, respectively. Since Python 3.7, dicts preserve insertion order, and these methods return the keys and values in the same respective order:

In [134]:
list(d1.keys())

['a', 'b', 7]

In [135]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

You can merge one dict into another using the `update()` method:

In [136]:
d1.update({"b": "foo", "c": 12})

In [137]:
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

**The _update_ method changes dicts _in-place_, so any existing keys in the data passed to update will have their old values discarded.**
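
When the original dict must stay intact, one alternative is to build a new dict with `**` unpacking instead of calling `update`. This is a sketch with made-up data (`base` and `overrides` are illustrative names); on Python 3.9 and later, `base | overrides` does the same:

```python
base = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

# Builds a new dict; for duplicate keys the later value wins.
merged = {**base, **overrides}
base_before_update = dict(base)  # snapshot to show base is still untouched

# update() mutates base itself.
base.update(overrides)

print(merged)
print(base_before_update)
print(base)
```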

##### Creating dicts from sequences

It is common to end up with two sequences that you want to pair up element-wise in a dict. As a first cut, you might write code like this:

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```

Since a dict is essentially a collection of 2-tuples, the `dict` function accepts a sequence of 2-tuples:

In [138]:
zip(range(5), reversed(range(5)))

<zip at 0x7f97bc4a9230>

In [139]:
list(zip(range(5), reversed(range(5))))

[(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]

In [140]:
mapping = dict(zip(range(5), reversed(range(5))))

In [141]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

##### Default values

The dict methods **get** and **pop** can take a _default_ value to be returned:

```python 
some_dict.get(key, default_value)
```

In [143]:
d1.get("no_key", "I'm the default value")

"I'm the default value"

`get()` by default will return **None** if the key is not present, while `pop()` will raise an exception.
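
The difference between the two methods can be sketched as follows, using a made-up `settings` dict:

```python
settings = {"theme": "dark"}  # illustrative data

missing = settings.get("font")            # no default given: returns None
fallback = settings.get("font", "serif")  # returns the default instead
popped = settings.pop("font", "serif")    # a default also keeps pop from raising

raised = False
try:
    settings.pop("font")  # no default and no such key
except KeyError:
    raised = True

print(missing, fallback, popped, raised)
```

Neither `get` nor `pop` with a default adds the missing key to the dict.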

When setting values, a common case is for the values in a dict to be other collections, like lists. For example, you could imagine categorizing a list of words by their first letters as a dict of lists:

In [151]:
words = ["apple", "bat", "bar", "atom", "book"]

In [152]:
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

In [153]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The `setdefault()` method is for precisely this purpose. The preceding for loop can be rewritten as:

In [154]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

In [155]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in _collections_ module has a useful class, `defaultdict`, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:

In [156]:
from collections import defaultdict

In [157]:
by_letter = defaultdict(list)

In [158]:
by_letter

defaultdict(list, {})

In [159]:
for word in words:
    by_letter[word[0]].append(word)

In [160]:
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

In [161]:
by_letter["a"]

['apple', 'atom']

#### `defaultdict`

Imagine that you’re trying to count the words in a document. 

In [162]:
document = ["ali", "hosein", "ali", "moh", "ahmad"]

In [163]:
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

In [164]:
word_counts

{'ali': 2, 'hosein': 1, 'moh': 1, 'ahmad': 1}

You could also use the “forgiveness is better than permission” approach and just handle the exception from trying to look up a missing key:

In [165]:
word_counts = {}
for word in document:
    try:
        word_counts[word] += 1
    except KeyError:
        word_counts[word] = 1

In [166]:
word_counts

{'ali': 2, 'hosein': 1, 'moh': 1, 'ahmad': 1}

A third approach is to use **get**, which behaves gracefully for missing keys:

In [167]:
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1

A `defaultdict` is like a regular dictionary, except that when you try to look up a key it doesn't contain, it first adds a value for it using a zero-argument function you provided when you created it. To use `defaultdict`, you have to import it from `collections`:

In [168]:
from collections import defaultdict

word_counts = defaultdict(int)  # int() produces 0
for word in document:
    word_counts[word] += 1

In [169]:
word_counts

defaultdict(int, {'ali': 2, 'hosein': 1, 'moh': 1, 'ahmad': 1})

They can also be useful with *list* or *dict* or even your own functions:

In [170]:
dd_list = defaultdict(list)  # list() produces an empty list
dd_list[2].append(1)  # now dd_list contains {2: [1]}

In [171]:
dd_dict = defaultdict(dict)  # dict() produces an empty dict
dd_dict["Joel"]["City"] = "Seattle"  # {"Joel": {"City": "Seattle"}}

In [172]:
dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1  # now dd_pair contains {2: [0,1]}
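
One side effect to keep in mind is that merely reading a missing key with square brackets inserts it. The sketch below (with illustrative key names) shows this, and that `in` and `get` do not trigger the default factory:

```python
from collections import defaultdict

counts = defaultdict(int)
counts["seen"] += 1

_ = counts["never_seen"]  # a plain lookup inserts the key with value 0
keys_after_lookup = sorted(counts)

# Membership tests and .get() leave the dict unchanged.
has_other = "other" in counts
got_other = counts.get("other")

print(keys_after_lookup)     # ['never_seen', 'seen']
print(has_other, got_other)  # False None
```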

#### `Counter`

A `Counter` turns a sequence of values into a defaultdict(int)-like object mapping keys to counts. We will primarily use it to create histograms:

In [173]:
from collections import Counter

c = Counter([0, 1, 2, 0])

In [174]:
c

Counter({0: 2, 1: 1, 2: 1})

In [175]:
word_counts = Counter(document)

In [176]:
word_counts

Counter({'ali': 2, 'hosein': 1, 'moh': 1, 'ahmad': 1})

In [177]:
# print the 10 most common words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)

ali 2
hosein 1
moh 1
ahmad 1
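
A few other `Counter` behaviors are sketched below with the same `document` list. Looking up a missing key returns 0 without inserting it, and `update` adds to existing counts rather than replacing them:

```python
from collections import Counter

document = ["ali", "hosein", "ali", "moh", "ahmad"]
word_counts = Counter(document)

top_one = word_counts.most_common(1)  # the single most frequent word
absent = word_counts["nobody"]        # 0, and the key is not added
still_absent = "nobody" not in word_counts

word_counts.update(["moh", "moh"])    # adds 2 to the existing count for "moh"

print(top_one, absent, still_absent, word_counts["moh"])
```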


##### Valid dict key types

While the **values of a dict can be any Python object**, the **keys generally have to be immutable objects** like scalar types (int, float, string) or tuples (all the objects in the
tuple need to be immutable, too). The technical term here is **hashability**. You can check whether an object is hashable (can be used as a key in a dict) with the hash function:

In [178]:
hash("string")

5479766956953368327

In [179]:
hash((1, 2, (2, 3)))

1097636502276347782

In [180]:
hash((1, 2, [2, 3]))

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:

In [181]:
d = {}

In [182]:
d[tuple([1, 2, 3])] = 5

In [183]:
d

{(1, 2, 3): 5}
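
The check described above can be wrapped in a small helper. The function `is_hashable` is a hypothetical name introduced here for illustration, not a built-in:

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("string"))        # True
print(is_hashable((1, 2, (2, 3))))  # True
print(is_hashable((1, 2, [2, 3])))  # False: the tuple contains a list
```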

## Sets

A **set** is an unordered collection of *unique* elements. You can think of them like dicts,
but keys only, no values. A set can be created in two ways: via the `set` function or via
a set _literal_ with curly braces:

In [184]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [185]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical set operations like _union, intersection, difference, and symmetric difference_.

In [186]:
a = {1, 2, 3, 4, 5}

In [187]:
b = {3, 4, 5, 6, 7, 8}

In [188]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [189]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [190]:
a.intersection(b)

{3, 4, 5}

In [191]:
a & b

{3, 4, 5}

![alt text](images/set_operations.png "Set operations")
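
The cells above cover union and intersection. For completeness, here is a short sketch of difference and symmetric difference on the same two sets (the result names are illustrative):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b       # same as a.difference(b)
in_exactly_one = a ^ b  # same as a.symmetric_difference(b)

print(only_in_a)       # {1, 2}
print(in_exactly_one)  # {1, 2, 6, 7, 8}
```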

All of the logical set operations have in-place counterparts, which enable you to replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:

In [192]:
c = a.copy()

In [193]:
c

{1, 2, 3, 4, 5}

In [194]:
c |= b

In [195]:
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [196]:
d = a.copy()

In [197]:
d &= b

In [198]:
d

{3, 4, 5}

As with dict keys, set elements generally must be immutable. To store list-like elements in a set, you must convert them to tuples:

In [199]:
my_data = [1, 2, 3, 4]

In [200]:
my_set = {tuple(my_data)}

In [201]:
my_set

{(1, 2, 3, 4)}

You can also check if a set is a subset of or a superset of another set:

In [202]:
a_set = {1, 2, 3, 4, 5}

In [203]:
{1, 2, 3}.issubset(a_set)

True

In [204]:
a_set.issuperset({1, 2, 3})

True

Sets are equal if and only if their contents are equal:

In [205]:
{1, 2, 3} == {3, 2, 1}

True

## List, Set, and Dict Comprehensions

General form:

```python
[expr for val in collection if condition]
```

This is equivalent to the following for loop:

```python
result = []
for val in collection:
    if condition:
        result.append(expr)
```

In [206]:
strings = ["a", "as", "bat", "car", "dove", "python"]

In [207]:
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
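
To make the equivalence concrete, this sketch spells out the same filter as an explicit loop and checks that both forms agree:

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

# Explicit loop version of the comprehension above.
result = []
for x in strings:
    if len(x) > 2:  # the filter condition
        result.append(x.upper())  # the expression

comprehension = [x.upper() for x in strings if len(x) > 2]
print(result == comprehension)  # True
```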

Set and dict comprehensions look like the equivalent list comprehension, except with curly braces instead of square brackets:

```python
dict_comp = {key_expr: value_expr for value in collection if condition}
```

```python
set_comp = {expr for value in collection if condition}
```

In [209]:
unique_lengths = {len(x) for x in strings}

In [210]:
unique_lengths

{1, 2, 3, 4, 6}

In [211]:
{x: len(x) for x in strings}

{'a': 1, 'as': 2, 'bat': 3, 'car': 3, 'dove': 4, 'python': 6}

We could also compute the set of unique lengths more functionally using the **map** function, introduced shortly:

In [212]:
set(map(len, strings))

{1, 2, 3, 4, 6}

In [213]:
loc_mapping = {val: index for index, val in enumerate(strings)}

In [214]:
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

### Nested list comprehensions

In [215]:
all_data = [
    ["John", "Emily", "Michael", "Mary", "Steven"],
    ["Maria", "Juan", "Javier", "Natalia", "Pilar"],
]

In [216]:
[name for name in all_data[0] if len(name) > 4]

['Emily', 'Michael', 'Steven']

Suppose we wanted to get a single list containing all names with two or more e’s in them. We could certainly do this with a simple for loop:

In [217]:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count("e") >= 2]
    names_of_interest.extend(enough_es)

In [218]:
names_of_interest

['Steven']

You can actually wrap this whole operation up in a single nested list comprehension:

In [219]:
result = [name for names in all_data for name in names if name.count("e") >= 2]

In [220]:
result

['Steven']

In [221]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [222]:
flattened = [x for tup in some_tuples for x in tup if x > 3]

In [223]:
flattened

[4, 5, 6, 7, 8, 9]

The order of the `for` expressions in the comprehension is the same as the order of the equivalent nested for loops. The loop below has the same structure but omits the `if x > 3` filter, so it keeps every element:

In [225]:
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

This is different from a list comprehension inside a list comprehension, which produces a list of lists rather than a flattened list of all the inner elements:

In [226]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
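
The two shapes can be compared side by side. This sketch applies the same filter (even numbers, chosen only for illustration) to a flattening comprehension, its explicit loop form, and a comprehension nested inside another:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Flattening form: the for clauses read left to right like nested loops.
flat = [x for tup in some_tuples for x in tup if x % 2 == 0]

flat_loop = []
for tup in some_tuples:     # first for clause
    for x in tup:           # second for clause
        if x % 2 == 0:      # trailing filter
            flat_loop.append(x)

# Comprehension inside a comprehension: the outer structure is preserved.
nested = [[x for x in tup if x % 2 == 0] for tup in some_tuples]

print(flat)    # [2, 4, 6, 8]
print(nested)  # [[2], [4, 6], [8]]
```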