# Illustrations of iterator methods from lecture

In this week's lecture, some examples were given of useful iterator functions and methods. Here, I will demonstrate some example usage of those functions. You can use the code in this notebook as a framework to experiment yourself. Change the code here to explore how each function or method works.

## `enumerate()`

The function `enumerate()` is useful in any case where you want access to both an element in an iterable and the index of that element. Whenever you find yourself writing the following, consider using `enumerate()` instead:

In [1]:
x = ['a', 'b', 'c']
for i in range(len(x)):
    element = x[i]
    print(i, element)

0 a
1 b
2 c


In [2]:
for i, element in enumerate(x):
    print(i, element)

0 a
1 b
2 c


Note that `enumerate()` is actually yielding `tuple`s. We haven't covered tuples in class. However, they are basically just immutable lists. `tuple`s are denoted using parentheses, and their elements are comma-separated, just like lists, e.g., `(1, 2, 3)` or `('a', 'b', 'c')`. `tuple`s can contain any type of object; you just can't change them after making them. As they are immutable, you can use them as `dict` keys.

Note that you can store mutable objects in a `tuple`. If you do, you can change the mutable object, even though you can't change *which* elements are in the `tuple`. If you put mutable things in a `tuple`, then you cannot use it as a `dict` key, because `dict` keys must be hashable.
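
The `tuple` behaviour described above can be checked with a short sketch. The coordinates, feature names, and variable names here are made up for illustration.

```python
# A tuple of immutable values is hashable, so it can be a dict key.
positions = {("chr1", 100): "geneA", ("chr1", 250): "geneB"}
print(positions[("chr1", 100)])  # geneA

# A tuple can hold a mutable object, and that object can still be changed.
record = ("seq1", ["A", "T"])
record[1].append("G")
print(record)  # ('seq1', ['A', 'T', 'G'])

# A tuple that contains a list is not hashable, so it fails as a dict key.
key_failed = False
try:
    lookup = {record: "some value"}
except TypeError as err:
    key_failed = True
    print("TypeError:", err)
```

Try replacing the list inside `record` with another `tuple` and the `dict` creation should succeed.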

In [3]:
for thing in enumerate(x):
    print(type(thing), thing)

<class 'tuple'> (0, 'a')
<class 'tuple'> (1, 'b')
<class 'tuple'> (2, 'c')


When you put two variable names separated by a comma in a loop, as in `for i, element in enumerate(x)`, you are using `tuple` "unpacking". `enumerate()` yields a `tuple` of values and you are providing a `tuple` of variable names. The first value is "unpacked" into the first variable, the second value into the second variable, and so on.
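
Unpacking also works outside of loops. Here is a minimal sketch, with illustrative variable names, that also shows the optional `start` argument of `enumerate()`.

```python
pair = (0, "a")

# Unpacking assigns each element of the tuple to a name, left to right.
i, element = pair
print(i, element)  # 0 a

# The number of names must match the number of values,
# otherwise Python raises a ValueError.
try:
    i, element = (0, "a", "extra")
except ValueError:
    print("wrong number of values to unpack")

# enumerate() also accepts a start value, handy for 1-based positions.
numbered = list(enumerate("ATG", start=1))
print(numbered)  # [(1, 'A'), (2, 'T'), (3, 'G')]
```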

## `range()`

`range()` is useful whenever you want to iterate over a range of numerical values. You can also use `range()` to do something N times. One place where `range()` would help you extend the alignment script that you wrote this week is wrapping lines to fit in your terminal.
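
Before using `range()` for wrapping, it may help to look at the numbers it produces. This small sketch uses illustrative values only.

```python
# range(start, stop, step) yields start, start + step, ... and stops before stop.
starts = list(range(0, 50, 30))
print(starts)  # [0, 30]

# With one argument, range(n) counts from 0 to n - 1, so the loop body runs n times.
count = 0
for _ in range(3):
    count += 1
print(count)  # 3
```

The values in `starts` are the positions at which each wrapped line would begin for a 50-character sequence and 30 characters per line.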

In [4]:
seq1 = "ATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGAC"
seq2 = "ATGCAAGTCGAGCGGCAGCACAGAGGAACCTTGGGTGGCGAGCGGCGGAC"
aln = "|||||||||||||||  | |  |||    ||   |     ||||||||||"

# wrap alignment to up to 30 characters per line
nchars = 30
for i in range(0, len(seq1), nchars):
    print(seq1[i: i+nchars])
    print(aln[i: i+nchars])
    print(seq2[i: i+nchars])
    print()


ATGCAAGTCGAGCGGATGAAGGGAGCTTGC
|||||||||||||||  | |  |||    |
ATGCAAGTCGAGCGGCAGCACAGAGGAACC

TCCTGGATTCAGCGGCGGAC
|   |     ||||||||||
TTGGGTGGCGAGCGGCGGAC



You could also wrap the text so that it is split evenly over N lines:

In [5]:
# wrap alignment evenly over a fixed number of lines
num_lines = 2
chars_per_line = len(seq1)//num_lines # floor division because we need an int for slicing
for i in range(0, len(seq1), chars_per_line):
    print(seq1[i: i+chars_per_line])
    print(aln[i: i+chars_per_line])
    print(seq2[i: i+chars_per_line])
    print()

ATGCAAGTCGAGCGGATGAAGGGAG
|||||||||||||||  | |  |||
ATGCAAGTCGAGCGGCAGCACAGAG

CTTGCTCCTGGATTCAGCGGCGGAC
    ||   |     ||||||||||
GAACCTTGGGTGGCGAGCGGCGGAC
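
One caveat with floor division: when the sequence length is not a multiple of the number of lines, the leftover characters spill onto an extra line. A possible workaround is to round up instead. The helper name `chunk_evenly` is made up for this sketch, and it assumes `num_lines` is at least 1.

```python
def chunk_evenly(text, num_lines):
    """Split text into at most num_lines pieces of near-equal length."""
    # Ceiling division: round up so leftover characters fit in the last piece.
    # max(1, ...) avoids a zero step in range() when text is empty.
    chars_per_line = max(1, -(-len(text) // num_lines))
    return [text[i: i + chars_per_line] for i in range(0, len(text), chars_per_line)]

seq = "ATGCAAGTCGA"  # 11 characters, not divisible by 2
print(chunk_evenly(seq, 2))  # ['ATGCAA', 'GTCGA']
```

With floor division the same input would give pieces of 5, 5, and 1 characters, which is three lines rather than two.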



If you are feeling fancy, you could also write something that would wrap text automatically depending on the size of the terminal you are writing to. You can query the terminal size using [`os.get_terminal_size()`](https://docs.python.org/3/library/os.html#os.get_terminal_size).
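
Here is one way that idea might look. This sketch uses `shutil.get_terminal_size()`, the higher-level wrapper around `os.get_terminal_size()`, because it accepts a fallback size instead of raising an error when no terminal is attached. The function name `wrap_alignment` and the short sequences are made up for illustration.

```python
import shutil

def wrap_alignment(seq1, aln, seq2, width):
    """Return the alignment as a list of lines, wrapped to width characters."""
    lines = []
    for i in range(0, len(seq1), width):
        lines.append(seq1[i: i + width])
        lines.append(aln[i: i + width])
        lines.append(seq2[i: i + width])
        lines.append("")  # blank line between blocks
    return lines

# Fall back to 80 columns when no terminal is attached
# (for example, when output is piped to a file).
width = max(1, shutil.get_terminal_size(fallback=(80, 24)).columns)

for line in wrap_alignment("ATGCAAGTCG", "|||| |||||", "ATGCTAGTCG", width):
    print(line)
```

Returning a list of lines rather than printing inside the function makes the wrapping easy to check with a small, fixed `width`.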

## `zip()`

`zip()` is useful whenever you want to iterate over two or more iterables at the same time. This would be the easiest way to compare the bases in two sequences one by one.

In [6]:
aln = ""
for a, b in zip(seq1, seq2):
    if a == b:
        aln += "|"
    else:
        aln += " "

print(seq1)
print(aln)
print(seq2)

ATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGAC
|||||||||||||||  | |  |||    ||   |     ||||||||||
ATGCAAGTCGAGCGGCAGCACAGAGGAACCTTGGGTGGCGAGCGGCGGAC
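
One thing to keep in mind when comparing sequences this way: `zip()` stops as soon as the shortest input runs out, so extra bases in a longer sequence are silently ignored. The sketch below, using made-up sequences, shows that behaviour alongside `itertools.zip_longest()`, which pads the shorter input instead.

```python
from itertools import zip_longest

short = "ATGC"
longer = "ATGCAA"

# zip() stops at the end of the shortest input.
pairs = list(zip(short, longer))
print(len(pairs))  # 4

# zip_longest() keeps going and pads the shorter input with fillvalue.
padded = list(zip_longest(short, longer, fillvalue="-"))
print(padded[-2:])  # [('-', 'A'), ('-', 'A')]
```

On Python 3.10 and later, `zip(short, longer, strict=True)` raises a `ValueError` when the lengths differ, which can be a useful safety check.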


## `reversed()`

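As the name suggests, `reversed()` simply returns an iterator over the elements of a sequence in reverse order. For looping purposes, it's functionally equivalent to a slice with a step of -1, although the slice builds a reversed copy while `reversed()` does not.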

In [7]:
x = [1, 2, 3]

for i in reversed(x):
    print(i)

3
2
1


In [8]:
for i in x[::-1]:
    print(i)

3
2
1


Note that `reversed()` returns an iterator, so you can pair it with the other functions described here. For example, to compare our sequences when one is reversed:

In [9]:
aln = ""
for a, b in zip(reversed(seq1), seq2):
    if a == b:
        aln += "|"
    else:
        aln += " "

print(seq1[::-1]) # reversed doesn't print nicely directly so I'm using a slice here
print(aln)
print(seq2)

CAGGCGGCGACTTAGGTCCTCGTTCGAGGGAAGTAGGCGAGCTGAACGTA
  |   |       |   | |    ||     |   |      |   |  
ATGCAAGTCGAGCGGCAGCACAGAGGAACCTTGGGTGGCGAGCGGCGGAC
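
If you would rather not use a slice for printing, the output of `reversed()` can be joined back into a string. A small sketch with a made-up sequence:

```python
seq = "ATGC"

rev = reversed(seq)
print(rev)  # prints a reversed-object placeholder, not the sequence

# Join the characters from the iterator to get a printable string.
rev_str = "".join(rev)
print(rev_str)  # CGTA

# An iterator can only be consumed once, so a second join gives an empty string.
second_pass = "".join(rev)
print(repr(second_pass))  # ''
```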


## `dict` methods

The next examples of useful iteration functions/methods are all `dict` methods: `dict.keys()`, `dict.values()`, and `dict.items()`. They simply allow you to iterate over the keys, values, or key-value pairs stored in a `dict`.

`.keys()` returns the keys in a `dict` in the order in which they were added to the `dict` (this ordering is guaranteed from Python 3.7 onwards).

In [10]:
my_dict = {"a": 1, "b": 2, "c": 3}

for k in my_dict.keys():
    print(k)

a
b
c


Note that iterating over a `dict` directly gives you its keys, exactly as if you had called `.keys()`. In other words, if you set up a loop over an instance of `dict` but don't specify a method, you get the keys, because that is how the `dict` class defines its default iteration. We'll talk more about how that works when we cover classes.

In [11]:
for k in my_dict:
    print(k)

a
b
c


`.values()` returns the values in a `dict` in the order in which their associated keys were added.

In [12]:
for v in my_dict.values():
    print(v)

1
2
3


`.items()` returns the key-value pairs in a `dict` in the order in which the keys were added.

In [13]:
for i in my_dict.items():
    print(i)

('a', 1)
('b', 2)
('c', 3)


As you can see, `.items()` yields a `tuple` of the key and value for each entry. Accordingly, you can unpack the values into a `tuple` of variable names:

In [14]:
for k, v in my_dict.items():
    print(k, v)

a 1
b 2
c 3


As we just saw, `dict` has a few methods that allow you to iterate over its contents in different ways. However, only one of those methods, `.keys()`, matches the default iteration behaviour. It might have made sense for the default to be `.items()`. However, the author of the `dict` class chose `.keys()`.

The two takeaways from this section:
1. The default iter method of `dict` is `.keys()`
2. You can set the iter method of classes when you write them.
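
As a preview of the second takeaway, here is a minimal sketch of a class that defines its own iteration behaviour through the special `__iter__` method. The `Alignment` class is hypothetical and exists only for this illustration.

```python
class Alignment:
    """Hypothetical container for a pair of aligned sequences."""

    def __init__(self, seq1, seq2):
        self.seq1 = seq1
        self.seq2 = seq2

    def __iter__(self):
        # Looping over an Alignment yields pairs of aligned bases.
        return zip(self.seq1, self.seq2)

pairing = Alignment("ATG", "ACG")
for a, b in pairing:
    print(a, b)
```

Here the author of the class (us) decided that iterating should give pairs of bases, just as the author of `dict` decided that iterating should give keys.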

## `itertools`

The `itertools` module includes a variety of functions for iterating. We'll have a look at a few here. [Check out the full list on the documentation page](https://docs.python.org/3/library/itertools.html?highlight=itertools#module-itertools).

## `itertools.combinations()`

In cases where you want to do all the pairwise comparisons between elements in a collection of data, this is the function to use.

In [15]:
import itertools

foods = ["icecream", "corned beef", "kimchi"]

for a, b in itertools.combinations(foods, 2):
    print(f"I love {a} and {b} smoothies")

I love icecream and corned beef smoothies
I love icecream and kimchi smoothies
I love corned beef and kimchi smoothies
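
For a more biological use, pairwise comparison can be combined with `zip()` to count mismatches between every pair of sequences. The sequences and names below are made up for illustration, and the sketch assumes all sequences have the same length.

```python
import itertools

# Made-up sequences for illustration.
seqs = {"s1": "ATGCA", "s2": "ATGGA", "s3": "TTGGA"}

mismatches = {}
for (name_a, seq_a), (name_b, seq_b) in itertools.combinations(seqs.items(), 2):
    # Count positions where the two sequences differ.
    mismatches[(name_a, name_b)] = sum(a != b for a, b in zip(seq_a, seq_b))

for pair, n in mismatches.items():
    print(pair, n)
```

Each pair of names is stored as a `tuple` key, and each pair is compared exactly once.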


## `itertools.permutations()`

Permutations are the same as combinations except that order matters. For example, the only combination of "A" and "B" is AB, while the permutations are AB and BA.
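
The "A" and "B" case can be checked directly. The variable names are just for illustration.

```python
import itertools

combos = list(itertools.combinations("AB", 2))
perms = list(itertools.permutations("AB", 2))

print(combos)  # [('A', 'B')]
print(perms)   # [('A', 'B'), ('B', 'A')]
```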

In [16]:
for a, b in itertools.permutations(foods, 2):
    print(f"I love {a} flavoured {b}")

I love icecream flavoured corned beef
I love icecream flavoured kimchi
I love corned beef flavoured icecream
I love corned beef flavoured kimchi
I love kimchi flavoured icecream
I love kimchi flavoured corned beef


## `itertools.product()`

When you want to pair every element of one iterable with every element of another, `itertools.product()` is what you want. It is equivalent to nested `for` loops over the iterables.

In [17]:
blast_hits = ["hit1", "hit2"]
bed_features = ["ft1", "ft2"]

for hit, feat in itertools.product(blast_hits, bed_features):
    print(f"Is {hit} in {feat}?")

Is hit1 in ft1?
Is hit1 in ft2?
Is hit2 in ft1?
Is hit2 in ft2?
