# Comprehensions

## List comprehension - Clean loops

We've only just learned about loops, and the way we've written them so far works perfectly well. But for simple loops there is also a quicker, cleaner way that saves on typing: the list comprehension.

You will probably want to stick with standard loops for now, but let's have a quick look at how easy it is to look like an advanced Python programmer:

Let's first square this list of numbers the classic way:

In [None]:
numbers = [2, 12, 9, -3, 25, -5]
squared_list = []

for num in numbers:
    squared_list.append(num ** 2)

print(squared_list)

But with a list comprehension it can be compressed into one line, without first making an empty list to append to.

In [None]:
squared_list = [num ** 2 for num in numbers]

print(squared_list)

We can also put ```if``` statements in there. Let's only square a number if it is positive:


In [None]:
squared_list = [num ** 2 for num in numbers if num > 0]

print(squared_list)

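We can use ```if``` and ```else``` together too, although they then need to go at the beginning. The reason is that ```if ... else``` here is a conditional expression that chooses the value produced for every item, whereas a trailing ```if``` is a filter that decides which items are kept at all: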

In [None]:
squared_list = [num ** 2 if num > 0 else num * -1 for num in numbers]

print(squared_list)


Nice and clean!
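
To make the two positions of ```if``` easier to compare, here is a small sketch that puts them side by side and then combines them. The variable names `filtered`, `transformed` and `combined` are only illustrative.

```python
numbers = [2, 12, 9, -3, 25, -5]

# A trailing `if` is a filter: it decides WHICH items are kept.
filtered = [num ** 2 for num in numbers if num > 0]

# `if ... else` at the front is a conditional expression:
# it decides WHAT each item becomes, and every item is kept.
transformed = [num ** 2 if num > 0 else num * -1 for num in numbers]

# Both can be used together: filter first, then choose the value.
combined = [num ** 2 if num > 10 else num for num in numbers if num > 0]

print(filtered)     # [4, 144, 81, 625]
print(transformed)  # [4, 144, 81, 3, 625, 5]
print(combined)     # [2, 144, 9, 625]
```

Once a comprehension needs both a filter and a conditional expression, a standard loop is often easier to read.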

---

As above, all of this can be done with long-form loops and if/else statements, but comprehensions are elegant for a quick one-liner. Let's look at something a bit more bioinformatically relevant.

In [None]:
# Convert a list of DNA sequences to uppercase
dna_sequences = ['atcgatcg', 'gtcaatcga', 'ctaggtac']
uppercase_sequences = [sequence.upper() for sequence in dna_sequences]
print("Uppercase DNA Sequences:", uppercase_sequences)

# Filter DNA sequences based on length
sequences = ['ATCG', 'ATCGATCGAT', 'AT', 'ATCGAT']
filtered_sequences = [seq for seq in sequences if len(seq) > 5]
print("Filtered DNA Sequences:", filtered_sequences)

And let's finish off with everyone's favourite, the GC% calculation, all in one line:

In [None]:
# Calculate GC content of DNA sequences
sequences = ['ATCGATCG', 'CCGGTACGTA', 'GCGCGCGC', 'ATATATAT']

gc_contents = [((seq.count('G') + seq.count('C')) / len(seq)) * 100 for seq in sequences]

print("GC Content (%):", gc_contents)
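
The one-liner assumes every sequence is uppercase and non-empty: a lowercase sequence would give 0% and an empty one would raise a `ZeroDivisionError`. One possible way around this, sketched below, is to move the calculation into a small helper and keep the comprehension short. The helper name `gc_percent` and the choice to return 0.0 for an empty sequence are assumptions made for this example.

```python
def gc_percent(seq):
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()  # count lowercase bases as well
    if not seq:        # avoid dividing by zero for an empty sequence
        return 0.0
    return (seq.count('G') + seq.count('C')) / len(seq) * 100

# Example input with a lowercase and an empty sequence
sequences = ['ATCGATCG', 'ccggtacgta', '', 'ATATATAT']

gc_contents = [round(gc_percent(seq), 1) for seq in sequences]

print("GC Content (%):", gc_contents)  # GC Content (%): [50.0, 60.0, 0.0, 0.0]
```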

## Dictionary comprehensions

Just as we can use comprehensions to build lists, we can use them to build dictionaries in one line.

In [None]:
genes = ['TP53', 'BRCA1', 'EGFR', 'KRAS', 'MYC', 'PTEN', 'CDKN2A', 'AKT1', 'ERBB2', 'RB1']
exp = [(2.5, True), (1.8, True), (3.2, False), (0.9, True), (2.7, True), (1.2, False), (4.5, False), (2.0, True), (1.5, False), (3.8, True)]

genes_and_expression_dict = {key: value for key, value in zip(genes, exp)}

print(genes_and_expression_dict)


# Test the dictionary
test = "CDKN2A"
result = genes_and_expression_dict.get(test)

print(f"The gene {test} has expression of {result[0]}. PCR confirmed: {result[1]}")


{'TP53': (2.5, True), 'BRCA1': (1.8, True), 'EGFR': (3.2, False), 'KRAS': (0.9, True), 'MYC': (2.7, True), 'PTEN': (1.2, False), 'CDKN2A': (4.5, False), 'AKT1': (2.0, True), 'ERBB2': (1.5, False), 'RB1': (3.8, True)}
The gene CDKN2A has expression of 4.5. PCR confirmed: False
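
Dictionary comprehensions accept a trailing ```if``` filter just like list comprehensions. As a sketch, the example below keeps only the PCR-confirmed genes and stores just the expression value for each. It uses a shortened copy of the data above, and the names `level`, `pcr_confirmed` and `confirmed_expression` are made up for illustration.

```python
genes = ['TP53', 'BRCA1', 'EGFR', 'KRAS']
exp = [(2.5, True), (1.8, True), (3.2, False), (0.9, True)]

# Unpack each (expression, confirmed) tuple inside the loop
# and keep only the genes where PCR confirmation is True.
confirmed_expression = {
    gene: level
    for gene, (level, pcr_confirmed) in zip(genes, exp)
    if pcr_confirmed
}

print(confirmed_expression)  # {'TP53': 2.5, 'BRCA1': 1.8, 'KRAS': 0.9}
```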
