# Built-In Data Structures, Functions, and Files

This notebook is based on [Chapter 3](https://wesmckinney.com/book/python-builtin) of *Python for Data Analysis (3rd ed.)* by *Wes McKinney*.

## Built-In Data Structures

### Built-In Sequence Functions

__`enumerate()` function__

*The __`enumerate()`__ function takes an iterable (e.g. a tuple) and returns an enumerate object. This object is an iterator that yields `(counter, item)` pairs, with the counter starting at `0`.*

In [11]:
# Use the enumerate() function
x = ('apple', 'banana', 'cherry')
y = enumerate(x)

In [12]:
y

<enumerate at 0x24ab8f062a0>

In [13]:
print(list(y))

[(0, 'apple'), (1, 'banana'), (2, 'cherry')]


*It's common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like:*

```Python
# do-it-yourself approach
index = 0
for value in collection:
    # do something with value
    index += 1
```

*Python has a built-in function, __`enumerate`__, which yields __`(i, value)`__ tuples and can simplify the above code.*

```Python
# Use enumerate() function
for index, value in enumerate(collection):
    # do something with value
```

In [14]:
# Example
for family in enumerate(['Lok Lok', 'Ka Ka', 'Bailey', 'Mui Mui', 'Moji']):
    print(family)

(0, 'Lok Lok')
(1, 'Ka Ka')
(2, 'Bailey')
(3, 'Mui Mui')
(4, 'Moji')
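
*When the counter should begin at something other than zero, `enumerate()` accepts an optional `start` argument. The following is a minimal sketch; the variable names `pets` and `numbered` are hypothetical and chosen only for illustration.*

```python
# Illustrative data (names reused from the example above)
pets = ["Lok Lok", "Ka Ka", "Bailey"]

# start=1 makes the counter begin at 1 instead of the default 0
numbered = list(enumerate(pets, start=1))
print(numbered)
```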


__`sorted()` function__

*The __`sorted()`__ function returns a new sorted list from the elements in any sequence.*

In [15]:
# sorted() function
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [16]:
# sorted() function
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
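
*`sorted()` also accepts the optional keyword arguments `key` and `reverse`. The sketch below uses a hypothetical list named `words` to show both. The input list itself is left unchanged because `sorted()` returns a new list.*

```python
# Hypothetical example data, not from the original notebook
words = ["banana", "fig", "cherry", "kiwi"]

# key=len sorts by string length; ties keep their original order (stable sort)
by_length = sorted(words, key=len)

# reverse=True sorts in descending (here: reverse alphabetical) order
descending = sorted(words, reverse=True)

print(by_length)
print(descending)
print(words)  # the original list is not modified
```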

__`zip()` function__

*__`zip()`__ "pairs" up the elements of a number of lists, tuples, or other sequences. It returns an iterator of tuples, which can be materialized with `list()`.*

In [17]:
# Using zip() to pair up list
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)

In [18]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

*__`zip()`__ can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence.*

In [19]:
# Pair up more than two sequences
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
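
*If dropping the unmatched elements is not what you want, one option from the standard library is `itertools.zip_longest`, which pads the shorter sequences instead of truncating. This is a small sketch rather than part of the original chapter; the names `truncated` and `padded` are illustrative.*

```python
from itertools import zip_longest

seq1 = ["foo", "bar", "baz"]
seq3 = [False, True]

# zip() stops at the shortest input, so "baz" is dropped
truncated = list(zip(seq1, seq3))

# zip_longest() continues to the longest input and fills the gaps
padded = list(zip_longest(seq1, seq3, fillvalue=None))

print(truncated)
print(padded)
```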

*A common use of __`zip()`__ is simultaneously iterating over multiple sequences, possibly also combined with __`enumerate`__.*

In [20]:
# Use zip() and enumerate() together
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")

0: foo, one
1: bar, two
2: baz, three


__`reversed()` function__

*__`reversed()`__ iterates over the elements of a sequence in reverse order.*

In [21]:
# Reverse a sequence
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

*Keep in mind that __`reversed()`__ returns an iterator, so it does not create the reversed sequence until materialized (e.g., with `list` or a `for` loop).*
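
*One practical consequence is that the object returned by `reversed()` can only be consumed once. The short sketch below, with the illustrative names `rev`, `first_pass`, and `second_pass`, shows a second pass coming back empty.*

```python
# reversed() returns a lazy, single-use iterator
rev = reversed([1, 2, 3])

first_pass = list(rev)   # consumes the iterator
second_pass = list(rev)  # nothing is left to iterate over

print(first_pass)
print(second_pass)
```

*If the reversed values are needed more than once, store the materialized list or call `reversed()` again.*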

### List, Set, and Dictionary Comprehensions

__List comprehensions__

*__List comprehensions__ are a convenient and widely used Python language feature. They let you concisely build a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. They take the basic form:*

```Python
# List comprehension basic form
[expr for value in collection if condition]
```

*This is equivalent to the following `for` loop:*

```Python
# List comprehension using for loop
result = []
for value in collection:
    if condition:
        result.append(expr)
```

*For example, given a list of strings, we could keep only the strings longer than `2` characters and convert them to uppercase.*

In [22]:
# List comprehensions example
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
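
*A trailing `if` removes elements, whereas a conditional expression placed before the `for` keeps every element and only changes how each one is transformed. The sketch below contrasts the two; the names `filtered` and `transformed` are illustrative.*

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

# Trailing `if`: short strings are dropped from the result
filtered = [x.upper() for x in strings if len(x) > 2]

# Conditional expression: every string is kept, only long ones are uppercased
transformed = [x.upper() if len(x) > 2 else x for x in strings]

print(filtered)
print(transformed)
```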

__Set and dictionary comprehensions__

*__Set__ and __dictionary comprehensions__ are a natural extension, producing sets and dictionaries, rather than lists, in an idiomatically similar way.*

*A __dictionary comprehension__ looks like this:*

```Python
# Dictionary comprehension
dict_comp = {key-expr: value-expr for value in collection if condition}
```

*A __set comprehension__ looks like the equivalent __list comprehension__ except with curly braces instead of square brackets:*

```Python
# Set comprehension
set_comp = {expr for value in collection if condition}
```

*Suppose we wanted a set containing just the lengths of the strings contained in the collection; we could easily compute this using a __set comprehension__.*

In [23]:
# Set comprehension
unique_length = {len(x) for x in strings}

In [24]:
unique_length

{1, 2, 3, 4, 6}

*We could also express this more functionally using the __`map()`__ function.*

In [25]:
# Same result using set() and map() instead of a comprehension
set(map(len, strings))

{1, 2, 3, 4, 6}

*As a simple __dictionary comprehension__ example, we could create a lookup map from these strings to their locations in the list.*

In [26]:
# Dictionary comprehension
loc_mapping = {value: index for index, value in enumerate(strings)}

In [27]:
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}
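
*Such a mapping lets you look up a position by value. Note that dictionary keys are unique, so if the input contains duplicates, the last index seen replaces the earlier ones. The sketch below illustrates both points; the list `dupes` and the names `position` and `last_seen` are hypothetical.*

```python
strings = ["a", "as", "bat", "car", "dove", "python"]
loc_mapping = {value: index for index, value in enumerate(strings)}

# Look up the position of a string by its value
position = loc_mapping["dove"]
print(position)

# Hypothetical input with a repeated value: the later index overwrites the earlier one
dupes = ["a", "b", "a"]
last_seen = {value: index for index, value in enumerate(dupes)}
print(last_seen)
```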

__Nested list comprehensions__

*Suppose we have a list of lists containing some English and Spanish names.*

In [28]:
# List of lists of names
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

*Suppose we wanted to get a single list containing all names with two or more `a`'s in them. We could certainly do this with a simple `for` loop:*

In [29]:
# Using for loop
names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)

In [30]:
names_of_interest

['Maria', 'Natalia']

*You can actually wrap this whole operation up in a single __nested list comprehension__, which will look like:*

In [31]:
# Nested list comprehension
result = [name for names in all_data for name in names if name.count("a") >= 2]

In [32]:
result

['Maria', 'Natalia']

*The `for` parts of the __list comprehension__ are arranged according to the order of nesting, and any filter condition is put at the end as before.*

*Here is another example where we "flatten" a list of tuples of integers into a simple list of integers.*

In [33]:
# Flatten a list of tuples
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]

In [34]:
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

*The order of the `for` expressions would be the same if you wrote a nested `for` loop instead of a list comprehension.*

In [35]:
# Nested for loop
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)

In [36]:
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

*Finally, it is also perfectly valid to have a __list comprehension__ inside a __list comprehension__.*

In [37]:
# List comprehension inside a list comprehension
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
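
*The inner comprehension does not have to copy each element group as is. As one possible illustration (not from the original chapter), the sketch below uses the same pattern to transpose the rows and columns, assuming every tuple has the same length. The name `transposed` is hypothetical.*

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Outer comprehension walks over column positions,
# inner comprehension collects that position from every tuple.
# Assumes all tuples have the same length as the first one.
transposed = [[row[i] for row in some_tuples]
              for i in range(len(some_tuples[0]))]

print(transposed)
```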