## Transforming Each Element of a Collection with a List Comprehension

Often, you want to do something to the data inside your collection--double it, increase it, compute some metric, etc.  
At the end, you still have the same *number* of elements, but the values themselves have changed.  
We will be looking at lots of ways to accomplish this in Python, but first we're going to use a **for-loop** in a format called a **comprehension**.

Comprehensions produce a new collection containing new values *"for each"* value *in* the original collection.  They look like this:

```python
>>> data = [1, 2, 3]
>>> squared = [x ** 2 for x in data]
>>> squared
[1, 4, 9]
```

```python
>>> import math
>>> data = [1, 4, 25]
>>> roots = [math.sqrt(x) for x in data]
>>> roots
[1.0, 2.0, 5.0]
```

*Tip*: When writing list comprehensions, say to yourself, "Give me a list that does *function* to *element* for each *element* in my *collection*."
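
To see what a comprehension is doing, it can help to write the same thing as an ordinary for-loop. The sketch below is only an illustration; the name `squared_loop` is made up for this comparison.

```python
data = [1, 2, 3]

# The long way: start with an empty list and append one new value per element.
squared_loop = []
for x in data:
    squared_loop.append(x ** 2)

# The comprehension packs the same loop into a single expression.
squared = [x ** 2 for x in data]

print(squared_loop)  # [1, 4, 9]
print(squared)       # [1, 4, 9]
```

Both versions leave `data` untouched and build a new list of the same length.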

**Exercises**

**Tip**: copy-and-pasting may be faster, but it isn't great for making memories.  To learn this better, it's best to directly type each exercise out.

Get a list that adds 1 to each value in this `data` list:


In [None]:
data = [-2, -1, 0, 1, 2, 3]

In [1]:
data = [-2, -1, 0, 1, 2, 3]
[el + 1 for el in data]

[-1, 0, 1, 2, 3, 4]

Get the absolute value of each element in this `data` list, using the built-in `abs()` function:

In [None]:
data = [-2, -1, 0, 1, 2, 3]

In [2]:
data = [-2, -1, 0, 1, 2, 3]
[abs(el) for el in data]

[2, 1, 0, 1, 2, 3]

Round all these numbers to the nearest integer (use the `round()` function):

In [3]:
data = [-2, -1, 0, 1, 2, 3]

In [4]:
data = [-2, -1, 0, 1, 2, 3]
[round(el) for el in data]

[-2, -1, 0, 1, 2, 3]

Get all the first letters of each name in this list:

In [None]:
names = ["John", "Harry", "Moe", "Luke"]

In [5]:
names = ["John", "Harry", "Moe", "Luke"]
[name[0] for name in names]

['J', 'H', 'M', 'L']

**Exercises with DNA**

Measure the lengths of each of these sequences:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [6]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[len(seq) for seq in seqs]

[7, 8, 11]

Get the first codon (first three nucleotides) from each of these sequences:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [7]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq[0:3] for seq in seqs]

['GTA', 'GTA', 'GGT']

Count the number of "A" in each of these sequences:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [8]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq.count("A") for seq in seqs]

[2, 4, 3]

Make all of these sequences use the same letter case:

In [None]:
seqs = ["GTAATCG", "gtaccaaa", "GGtAGtACCaC"]

In [9]:
seqs = ["GTAATCG", "gtaccaaa", "GGtAGtACCaC"]
[seq.upper() for seq in seqs]

['GTAATCG', 'GTACCAAA', 'GGTAGTACCAC']

Reverse each of these sequences:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [10]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq[::-1] for seq in seqs]

['GCTAATG', 'AAACCATG', 'CACCATGATGG']

## Filtering Collections in a List Comprehension

What if you only want to include *some* values in a collection?  With an **if** clause and a **logical expression**, you can do it in a comprehension!  For example:

```python
>>> data = [1, 2, 3, 4]
>>> [x for x in data if x > 2]
[3, 4]
```

This can be combined with various transformations as well!

```python
>>> data = ["John", "Harry", "Moe", "Luke"]
>>> [x[0] for x in data if len(x) < 5]
['J', 'M', 'L']
```
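
Written out as a for-loop, the filter becomes an `if` inside the loop body, and only the elements that pass it get transformed and appended. This is a sketch for comparison; `initials` and `initials_comp` are illustrative names.

```python
data = ["John", "Harry", "Moe", "Luke"]

initials = []
for x in data:
    if len(x) < 5:             # the filter: skip names with 5 or more letters
        initials.append(x[0])  # the transformation: keep only the first letter

# The same logic as a comprehension: transformation first, filter last.
initials_comp = [x[0] for x in data if len(x) < 5]

print(initials)       # ['J', 'M', 'L']
print(initials_comp)  # ['J', 'M', 'L']
```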

**Example:** Get all positive values in the following list:

In [1]:
data = [-6, 3, -1, 10, -5, 0]
[el for el in data if el > 0]

[3, 10]

Make a list of all names in this list that start with the letter "L":

In [None]:
names = ["John", "Harry", "Moe", "Luke"]

In [2]:
names = ["John", "Harry", "Moe", "Luke"]
[name for name in names if name[0] == "L"]

['Luke']

Only keep the following sequences that contain more than one thymine (T):

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [3]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq for seq in seqs if seq.count("T") > 1]

['GTAATCG', 'GGTAGTACCAC']

Only keep sequences in the following that are shorter than 9 nucleotides:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [4]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq for seq in seqs if len(seq) < 9]

['GTAATCG', 'GTACCAAA']

Lowercase the sequences (i.e. convert them to small letters), keeping only the ones that end with cytosine (C):

In [6]:
seqs = ["GTAATCG", "GTACCAAAC", "GGTAGTACCAC"]

In [8]:
seqs = ["GTAATCG", "GTACCAAAC", "GGTAGTACCAC"]
[seq.lower() for seq in seqs if seq[-1] == 'C']

['gtaccaaac', 'ggtagtaccac']

Remove the missing data from this list:

In [None]:
seqs = ["GTAATCG", None, None, "GTACCAAA", None, "GGTAGTACCAC", None]

In [9]:
seqs = ["GTAATCG", None, None, "GTACCAAA", None, "GGTAGTACCAC", None]
[seq for seq in seqs if seq is not None]

['GTAATCG', 'GTACCAAA', 'GGTAGTACCAC']
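
The test `seq is not None` removes only `None`. A plain truthiness test (`if seq`) would also drop empty strings, which may or may not be what you want. A small sketch, using a made-up list that contains an empty string:

```python
# Hypothetical data: one missing value (None) and one empty sequence ("").
seqs = ["GTAATCG", None, "", "GTACCAAA"]

# Removes None only; the empty string survives.
only_none_removed = [seq for seq in seqs if seq is not None]

# Removes everything "falsy": None and the empty string.
all_empty_removed = [seq for seq in seqs if seq]

print(only_none_removed)  # ['GTAATCG', '', 'GTACCAAA']
print(all_empty_removed)  # ['GTAATCG', 'GTACCAAA']
```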

## Conditional Transformations 

Sometimes you want to transform different data differently, conditioned on some parameter in your analysis.  For example, what if you want to lowercase the names in the list that have 4 letters, and uppercase the other names?

```python
>>> data = ["John", "Harry", "Moe", "Luke"]
>>> [x.lower() if len(x) == 4 else x.upper() for x in data]
['john', 'HARRY', 'MOE', 'luke']
```

...and only keep the names that end in the letter "e"?

```python
>>> data = ["John", "Harry", "Moe", "Luke"]
>>> [x.lower() if len(x) == 4 else x.upper() for x in data if x[-1] == 'e']
['MOE', 'luke']
```
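
The two `if`s in that last comprehension do different jobs: the one at the end decides *whether* an element is kept, while the `if ... else` at the front decides *which* transformation a kept element gets. Unrolled into a for-loop, it might look like the sketch below (`result` and `new_value` are illustrative names):

```python
data = ["John", "Harry", "Moe", "Luke"]

result = []
for x in data:
    if x[-1] == "e":  # filter: the `if` at the end of the comprehension
        # choice of transformation: the `if ... else` at the front
        new_value = x.lower() if len(x) == 4 else x.upper()
        result.append(new_value)

# The equivalent comprehension.
result_comp = [x.lower() if len(x) == 4 else x.upper() for x in data if x[-1] == "e"]

print(result)       # ['MOE', 'luke']
print(result_comp)  # ['MOE', 'luke']
```

Note that a filtering `if` at the end never takes an `else`, whereas the conditional expression at the front always needs one.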

**Exercises**

**Example**: If the second nucleotide is T in the following sequences, lowercase the letters.

In [16]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq.lower() if seq[1] == 'T' else seq for seq in seqs]

['gtaatcg', 'gtaccaaa', 'GGTAGTACCAC']

If these sequences are shorter than 8 nucleotides, make them twice as long.

In [11]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [14]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq * 2 if len(seq) < 8 else seq for seq in seqs]

['GTAATCGGTAATCG', 'GTACCAAA', 'GGTAGTACCAC']

If a sequence ends in "A", count the number of As in it.  Otherwise, count the number of Cs:

In [None]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]

In [15]:
seqs = ["GTAATCG", "GTACCAAA", "GGTAGTACCAC"]
[seq.count("A") if seq[-1] == "A" else seq.count("C") for seq in seqs]

[1, 4, 3]

If a sequence's length is divisible by 3, count its codons.  Otherwise, show `None` for that sequence:

In [None]:
seqs = ["GTAATCG", "GTACCAAAC", "GGTAGTACCAC", "GCATTA"]


In [26]:
seqs = ["GTAATCG", "GTACCAAAC", "GGTAGTACCAC", "GCATTA"]
[len(seq) // 3 if len(seq) % 3 == 0 else None for seq in seqs]

[None, 3, None, 2]
