# Loops: Describing How to Repeat Steps in a Program

## For Loops

Like R and Matlab, Python makes it straightforward to repeat commands for each element in a data collection.  For example, to print each element in a list:

```python
names = ['Ted', 'Roy', 'Keeley']
for name in names:
   print(name)
# prints "Ted" then "Roy" then "Keeley"
```



**Exercises**

Print each number in the list multiplied by 10.
For example, `[2, 4]` should print `20` then `40`.

In [None]:
nums = [4, 8, 10, 5]

Print the first letter of each name in the list

In [None]:
names = ["John", "Harry", "Moe", "Luke"]

Print the file extension of each file in the list

*Hint*: the pathlib.Path class is useful for this
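
As a small sketch of the hint above: the `suffix` attribute of a `Path` object holds the final extension of a file name, including the leading dot. The file name `report.txt` is made up for illustration and is not one of the exercise files.

```python
from pathlib import Path

# 'report.txt' is a hypothetical file name used only for illustration
path = Path('report.txt')
extension = path.suffix  # the last extension, including the dot
print(extension)  # prints ".txt"
```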

In [None]:
filenames = ['virus.fasta', 'birthday.jpg', 'hospital.csv', 'letter.pdf']


## List-Append Loop Pattern

For loops can also be used to create new data collections: start with an empty list, then append one result to it on each pass through the loop.  For example:

```python
old_names = ['ted', 'roy', 'keeley']
new_names = []
for name in old_names:
   new_names.append(name.title())
```
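
To see what the pattern produces, here is a minimal sketch of the same idea with a different transformation (`len()` instead of `.title()`). The variable name `name_lengths` is only illustrative.

```python
names = ['ted', 'roy', 'keeley']
name_lengths = []  # start with an empty list
for name in names:
    name_lengths.append(len(name))  # add one result per loop iteration
print(name_lengths)  # prints [3, 3, 6]
```

The new list ends up with exactly one entry per element of the original list, in the same order.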

**Exercises**

Make a list with the first codon of each sequence

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

Clean up the data.  Make a list that formats each sequence the same way.

In [None]:
seqs = ["GTAATCG", "gtaccaaa", "GGtAGtACCaC"]

Make a list of the number of A's in each sequence

In [None]:
seqs = ["GTAATCG", "gtaccaaa", "GGtAGtACCaC"]

Make a list of individual sequences from the list of space-separated sequence combinations.


In [None]:
seqs = ['GCAGA GATATC', 'GGTAAAA ACTAGA GGTATA', 'GGTAA']

## Sequence-Unpacking Loop Pattern

Loops are also useful when you have a collection of collections (e.g. a list of lists).
When it's a collection of same-length sequences, you can break apart each sequence inside the loop and work with it:

```python
pairs = [[4, 5], [7, 8], [2, 9]]
sums = []
for pair in pairs:
    first = pair[0]
    second = pair[1]
    sums.append(first + second)
```

This is so common that Python provides some syntactic sugar (i.e. a shortcut) for breaking known-length sequences apart, called **"unpacking"**:

```python
pairs = [[4, 5], [7, 8], [2, 9]]
sums = []
for pair in pairs:
    first, second = pair
    sums.append(first + second)
```

This can even be done in the header of the for-loop!

```python
pairs = [[4, 5], [7, 8], [2, 9]]
sums = []
for first, second in pairs:
    sums.append(first + second)
```
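
Unpacking is not limited to pairs. As a rough sketch, the same header syntax works for longer sequences as long as the number of names matches the length of each inner sequence (otherwise Python raises a `ValueError`). The `triples` data below is hypothetical.

```python
# hypothetical (x, y, z) triples, used only for illustration
triples = [(1, 2, 3), (4, 5, 6)]
products = []
for x, y, z in triples:  # three names for three-element sequences
    products.append(x * y * z)
print(products)  # prints [6, 120]
```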

**Exercises**

Make a list of the sum of each number pair

In [None]:
pairs = [[4, 7], [7, 2], [10, 2], [5, 9]]

Write each sequence to its corresponding file

In [None]:
dataset = [
    ('seq1.txt', 'GACCAGTA'),
    ('seqA.txt', 'GGAGAGTATAC'),
    ('myseq.txt', 'GTTTAAC'),
]

Make a list of `True` and `False` values saying whether the first sequence in each triplet (the wild type) has more G's than both the second and third sequences in the triplet (the two experimental sequences)

In [None]:
seq_collections = [
    ('GACAGGAGATTA', 'GACCAGATA', 'GCCAGAGGATAA'),
    ('gacccatagag', 'CAGATAcaga', 'GAGGAACCaca'),
    ('ACCAGATA', 'GAGAAAGACCA', 'CCAGAGATATTA'),
    ('AGGGACCCCA', 'CGCCCACCACCG', 'CCCATTATC'),
]


## The Zip Loop Pattern

Most of the time, you won't start with a collection of pairs; you need to build that collection yourself before you can loop over it.  The `zip()` function makes this straightforward!

```python
names = ['Zanarah', 'Joe', 'Weiwei']
ages = [20, 21, 22]
combined = list(zip(names, ages))  # [('Zanarah', 20), ('Joe', 21), ...]

for name, age in combined:
    print(name, age)
```

To make it more concise, `zip()` can also be called directly in the header of the for-loop:

```python
names = ['Zanarah', 'Joe', 'Weiwei']
ages = [20, 21, 22]
for name, age in zip(names, ages):
    print(name, age)
```
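
One detail worth knowing when using `zip()`: it stops as soon as the shortest input runs out, so extra elements in a longer list are silently dropped. A minimal sketch with made-up lists:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]  # one element shorter than letters
pairs = list(zip(letters, numbers))
print(pairs)  # prints [('a', 1), ('b', 2)]
```

If the lists are supposed to line up, it can be worth comparing their lengths with `len()` before zipping them.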




**Exercises**

Add each pair of numbers

In [None]:
firsts = [1, 2, 3, 4, 5]
seconds = [10, 20, 30, 40, 50]

Print the patient number and treatment group of each patient
(e.g. "Patient 32341: control")

In [None]:
patients = [32451, 435679, 4211235, 123121]
groups = ['control', 'treatment', 'treatment', 'control']


Compare the number of C's in each pair of sequences.  Does the first have more or fewer than the second?

In [None]:
firsts = ['GAGATTACA', 'CAGATGATA', 'GGAGGACCAAG']
seconds = ['GGAACCAA', 'CACAGGAGA', 'GATATAACA']

Write each of these sequences to a file named after its corresponding animal id (e.g. `324` becomes `'324.txt'`)

In [None]:
animals = [123, 342, 543, 654]
seqs = ['GADCAG', 'CADFAAD', 'GGGGCVAGDA', 'GGDADCA']

Write all the sequences of each animal to a file named after its corresponding animal id (e.g. `324` becomes `'324.txt'`)

In [None]:
animals = [123, 342, 543, 654]
seqs = [
    ('GACAG', 'GCCAGT'), 
    ('CAAA', 'GGGGCAGA', 'GGACA'),
    ('GGGATATCA', 'CCACAGATA', 'GGACAAATA'),
    ('GCCATATA', 'CAACTTTATA'),
]

## Enumerate Pattern

Sometimes you want to store the **index** of items in a sequence.  You could calculate this in a loop:

```python
bundesländer = ['Baden-Württemberg', 'Bayern', 'Thuringen']
idx = 0
for bundesland in bundesländer:
    print(idx, bundesland)  # prints 0 Baden-Württemberg
    idx += 1
```

Python's `enumerate()` function generates these (index, element) pairs for you:

```python
bundesländer = ['Baden-Württemberg', 'Bayern', 'Thuringen']
indices_bundesländer = list(enumerate(bundesländer))  # [(0, 'Baden-Württemberg'), ...]
for idx, bundesland in indices_bundesländer:
    print(idx, bundesland)  # prints 0 Baden-Württemberg
```

Like with `zip()`, this can be shortened by just putting it in the header of the for loop:

```python
bundesländer = ['Baden-Württemberg', 'Bayern', 'Thuringen']
for idx, bundesland in enumerate(bundesländer):
    print(idx, bundesland)  # prints 0 Baden-Württemberg
```
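
By default the index starts at 0. As a small sketch, `enumerate()` also accepts an optional `start` argument, which is handy for human-readable numbering. The `tasks` list is hypothetical.

```python
# hypothetical to-do items, used only for illustration
tasks = ['wash', 'dry', 'fold']
numbered = []
for number, task in enumerate(tasks, start=1):  # count from 1 instead of 0
    numbered.append(f"{number}. {task}")
print(numbered)  # prints ['1. wash', '2. dry', '3. fold']
```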



**Exercises**

## Dictionaries (a.k.a. `Dicts`)

Python has a valuable data structure for associating pairs of values: the `dict`.
Each `item` in a `dict` has two parts: the `key` and the `value`.  For example:

```python
phone_numbers = {'Ben': '+49123343282', 'Julie': '+45328472022'}
```

In this example, the names are the keys:

```python
>>> phone_numbers.keys()
dict_keys(['Ben', 'Julie'])
```

The numbers are the values:

```python
>>> phone_numbers.values()
dict_values(['+49123343282', '+45328472022'])
```

Altogether, the pairs are `items`:
```python
>>> phone_numbers.items()
dict_items([('Ben', '+49123343282'), ('Julie', '+45328472022')])
```



In [None]:
phone_numbers = {'Ben': '+49123343282', 'Julie': '+45328472022'}
phone_numbers.items()

dict_items([('Ben', '+49123343282'), ('Julie', '+45328472022')])
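
The point of pairing keys with values is that a value can be looked up by its key. A minimal sketch using the same `phone_numbers` dict (the variable names `bens_number` and `has_julie` are only illustrative):

```python
phone_numbers = {'Ben': '+49123343282', 'Julie': '+45328472022'}
bens_number = phone_numbers['Ben']  # look up a value by its key
print(bens_number)  # prints +49123343282
has_julie = 'Julie' in phone_numbers  # membership tests check the keys
print(has_julie)  # prints True
```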

## Looping through Dicts

The keys and values of a `dict` can each be looped over just like a list.  To loop over both together, like a zipped list of pairs, use `items()`:

```python
phone_numbers = {'Ben': '+49123343282', 'Julie': '+45328472022'}
for name in phone_numbers.keys():
    print(name)

for number in phone_numbers.values():
    print(number)

for name, number in phone_numbers.items():
    print(name, number)
```
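
When the values of a dict are themselves lists, one loop can be nested inside another: the outer loop unpacks each key and its list, and the inner loop walks through that list. This is a rough sketch with hypothetical data (`grades`), not a prescribed approach.

```python
# hypothetical data: each key maps to a list of values
grades = {'Ana': [90, 85], 'Bo': [70]}
rows = []
for student, scores in grades.items():  # outer loop: one (key, list) pair at a time
    for score in scores:  # inner loop: each element of that list
        rows.append((student, score))
print(rows)  # prints [('Ana', 90), ('Ana', 85), ('Bo', 70)]
```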


**Exercises**

Write a loop that prints each sequence in this dict:

In [None]:
seqs = {'Monday': "CATAACA", 'Tuesday': "GCCGTG", 'Wednesday': "CCATAAA"}


Print both the day of the week and the sequence in a loop

In [None]:
seqs = {'Monday': "CATAACA", 'Tuesday': "GCCGTG", 'Wednesday': "CCATAAA"}

Let's look at some richer data structures, combining dicts and lists together:

For each sequence, print the bacteria id next to it.  For example:
```
BacteriaID Sequence
3252 GCCAGGA
3252 CCCAGGA
1466 CAAGATGA
...
```

In [None]:
seqs = {
    3252: ["GCCAGGA", "CCCAGGA"],
    1466: ["CAAGATGA", "CCACATA"],
    6223: ["CCAACTAG", "CACCAC"],
}
print("BacteriaID", "Sequence")

BacteriaID Sequence


Using the same dict, print the total number of C's in all sequences for each bacteria.
For example:
```
BacteriaID TotalC 
3252 5
...
```

Print the total number of T's across all of the sequences:

```Total T Count: 3```

For each sequence, print the bacteria id and Day of Week next to it.  For example:
```
BacteriaID Sequence DayOfWeek
3252 GCCAGGA Monday
3252 CCCAGGA Monday
1466 CAAGATGA Monday
...
```

In [None]:
seqs = [
    {'Monday': {3252: "GCCAGGA", 1466: "CAAGATGA", 6223: "CCAACTAG"}},
    {'Tuesday': {3252: "CCCAGGA", 1466: "CCACATA", 6223: "CACCAC"}},
]


Write the above to a file called "sequences.tsv"