# <center> Python Tutorial Session 2 </center>

## What we covered in our previous session:
- Print/Input 
- Variables
- Function
- Loop
- List
- Dictionary
- Set
- String Manipulation

## <font color=green>Table of Contents II </font>
 - [Error Types and Error Handling](#errors)
 - [List Comprehension](#listcomp)
 - [Dictionary Comprehension](#Dictionarycomp)
 - [Lambda Functions](#LambdaFunc)
 - [Importing Modules](#Import)
 - [The Math module](#math)
 - [The Collections module](#collections)
 - [The OS module](#os)
 - [Working with random numbers](#random)
 - [Working with Time](#time)
 - [Regular Expressions](#re)
 - [Bioinformatics Examples](#Bioinformatics)
 - [What we covered](#Conclusion)

We are going to cover some of these intermediate concepts today.  
List comprehensions and dictionary comprehensions allow us to generate new lists and dictionaries using a more compact syntax.

## <a id="errors"> Different Error Types </a>

Depending on the errors in your script, Python raises different error types (exceptions).  Knowing the error type is helpful for debugging your script.

#### <b> Syntax Error </b>
This occurs when your code does not follow the syntax rules of the language.  A good example is calling the print function without parentheses, which was valid in Python 2 but is not valid in Python 3.  In this case, the error message also tells you that the parentheses are missing.  


In [1]:
print "Hello World"

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)? (796388850.py, line 1)

The next common error is an index error.  Suppose your list has 5 numbers, so the valid indices are 0 through 4, and you try to print index 5.  This results in an IndexError, as follows.  The output tells you that the list index is out of range.

In [2]:
a=[1,2,3,4,5]
print(a[5])

IndexError: list index out of range

A KeyError is raised when a key is not found in a dictionary.

In [3]:
dict_a={"a":"a","b":"b","c":"c","d":"d"}
print(dict_a['h'])

KeyError: 'h'
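
When a missing key is an expected situation rather than a bug, the lookup can be guarded instead of letting the KeyError stop the script. Below is a minimal sketch that reuses dict_a from above; the fallback string "not found" is an arbitrary choice for illustration.

```python
dict_a = {"a": "a", "b": "b", "c": "c", "d": "d"}

# Option 1: test for membership before indexing
if "h" in dict_a:
    print(dict_a["h"])
else:
    print("key 'h' is missing")

# Option 2: dict.get() returns a default (None unless one is given) instead of raising
value = dict_a.get("h", "not found")
print(value)
```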

A TypeError is raised when an operation is applied to a value of an inappropriate type.  For example, a string cannot be added to an integer or a float.

In [4]:
"a"+3

TypeError: can only concatenate str (not "int") to str

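A ValueError is raised when an argument to a function has the right type but an inappropriate value.  Here we tried converting "apple" to an integer using the int() function.  The argument is a string, which int() accepts, but its content is not a valid number.  By contrast, passing a list of integers to dict() raises a TypeError rather than a ValueError, because dict() expects a sequence of (key, value) pairs and each element here is a plain integer.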

In [5]:
int("apple")

ValueError: invalid literal for int() with base 10: 'apple'

In [6]:
dict([1,2,3])

TypeError: cannot convert dictionary update sequence element #0 to a sequence

A ZeroDivisionError occurs when you divide by 0, as the name implies.

In [7]:
3/0

ZeroDivisionError: division by zero

How do we handle errors in Python?

The try-except statement can be used to handle different error types. 

The program first tries to execute the try block.  If there are no errors, the value of j is set to True, and the loop ends.  If you enter 0, the division raises a ZeroDivisionError, and the matching except block prints the appropriate message (i.e. the number cannot be 0).  If the input cannot be converted to an integer, int() raises a ValueError and the second except block runs.  After either error, j is still equal to False (it is either unchanged or 0, and 0==False in Python), so the loop asks again.  Note that there can be many except blocks.

In [9]:
j=False
while(j==False):
    try:
        j=int(input("Enter a number: "))
        print(10/j)
        j=True
    except ZeroDivisionError:
        print("The number cannot be 0")
    except ValueError:
        print("The number is not an integer")
    

Enter a number: 
The number is not an integer
Enter a number: 2
5.0


We can extend the try-except statement by also including the "else" and "finally" blocks. The else block is executed only if the try block raised no error.  The "finally" block is always executed, whether or not an error occurred.  Let's extend the code above to include these two blocks.

In [10]:
j=False
ct=0
while(j==False):
    try:
        j=int(input("Enter a number: "))
        k=(10/j)
    except ZeroDivisionError:
        print("The number cannot be 0")
    except ValueError:
        print("The number is not an integer")
    else:
        j=True
        print(k)
    finally:
        ct=ct+1
        print(f"Iteration # {ct}")

Enter a number: "apple"
The number is not an integer
Iteration # 1
Enter a number: 0
The number cannot be 0
Iteration # 2
Enter a number: 3
3.3333333333333335
Iteration # 3
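
The same pattern can be wrapped in a function so that it can be tried without typing input. This is only a sketch: the function name safe_divide and its convention of returning None on failure are assumptions for illustration, not part of the session's code.

```python
def safe_divide(text, numerator=10):
    """Convert text to an integer and divide numerator by it.

    Returns the quotient, or None if the conversion or the division fails.
    """
    try:
        number = int(text)
        result = numerator / number
    except ZeroDivisionError as err:
        # 'as err' binds the exception object so that its message can be shown
        print(f"Cannot divide: {err}")
        return None
    except ValueError as err:
        print(f"Not an integer: {err}")
        return None
    else:
        # Runs only when the try block raised nothing
        return result
    finally:
        # Runs on every path, even when a return statement was reached above
        print(f"Finished processing {text!r}")


print(safe_divide("2"))
print(safe_divide("0"))
print(safe_divide("apple"))
```

Binding the exception with "as err" is optional, but it lets the message produced by Python itself be reported alongside your own.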


There are many other error types in Python, which I have not listed here.  The "Built-in Exceptions" page of the official Python documentation describes all of them.

There is also a raise statement if you want to raise an error yourself.  Suppose you want a value to be at least 3, but the user enters something smaller.  You can raise an exception with an appropriate message.

In [13]:
j=int(input("What is the value of x? "))
if(j<3 and j>=0):
    raise SyntaxError("The value is less than 3")
if(j<0):
    raise TypeError("negative value not allowed.")

What is the value of x? -3


TypeError: negative value not allowed.
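
For a value that has the right type but falls outside the allowed range, ValueError is the conventional exception to raise. The sketch below uses a hypothetical helper, check_minimum (the name and the minimum parameter are assumptions for illustration), and shows how the caller can catch what was raised.

```python
def check_minimum(value, minimum=3):
    """Return value unchanged, or raise ValueError if it is too small."""
    if value < 0:
        raise ValueError("negative value not allowed.")
    if value < minimum:
        raise ValueError(f"The value is less than {minimum}")
    return value


# The caller decides how to react to the raised error
for candidate in (5, 1, -3):
    try:
        print(check_minimum(candidate))
    except ValueError as err:
        print(f"Rejected {candidate}: {err}")
```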

## <a id="listcomp"> List Comprehension </a>
A list comprehension is a way to use a loop and conditions within square brackets ([]).  
In other words, it is a way to quickly loop through an iterable and generate a new list from the items that meet certain criteria.
A list comprehension uses the following syntax, enclosed in square brackets:

<i><b>[Expression for item in iterable if condition]</b></i>

The Expression can be the item itself, or any computation (usually one based on the item).

An iterable can be a list, a string, a tuple, a range, or any other object that you can loop over.  The "if condition" part is optional.

In [14]:
j=[i for i in range(10)]
print(j)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Of course, this particular list can also be generated without a list comprehension.

In [15]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This is a bit more compact than the list comprehension above. Let's first look at how to generate the even numbers from 1 to 10 using a traditional for loop.

In [16]:
for i in range(1,11):
    if i%2==0:
        print(i)

2
4
6
8
10


Here is a more compact way using a list comprehension.

In [17]:
[j for j in range(1,11) if j%2==0]

[2, 4, 6, 8, 10]

How about prime numbers?  The range starts at 2 because 1 is not a prime number.

In [18]:
[j for j in range(2,20) if 0 not in [j%k for k in range(2,j)]]

[2, 3, 5, 7, 11, 13, 17, 19]

It is also easy to list the non-prime (composite) numbers as follows.

In [19]:
[j for j in range(20) if 0 in [j%k for k in range(2,j)]]

[4, 6, 8, 9, 10, 12, 14, 15, 16, 18]

Here, the inner list comprehension computes the remainder of j divided by every k from 2 to j-1.  If any remainder is 0, then j is divisible by a number other than 1 and itself, so it is not prime.
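
The inner list can be replaced by the built-in all() function, which stops at the first divisor it finds instead of building the whole list of remainders. This is a sketch rather than the session's method: the helper name is_prime and the square-root bound are additions for illustration, and the expression inside all() is a comprehension written without square brackets.

```python
import math


def is_prime(n):
    """Return True if n is a prime number (trial division up to sqrt(n))."""
    if n < 2:
        return False
    # all() stops as soon as one remainder is zero
    return all(n % k != 0 for k in range(2, math.isqrt(n) + 1))


primes = [j for j in range(20) if is_prime(j)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```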

What if we want to convert each value in a list to a string?  The following will not work, because str() converts the whole list into a single string.

In [20]:
a=[1,2,3,4,5]
b=str(a)
b

'[1, 2, 3, 4, 5]'

This can be done in a single line using a list comprehension.

In [21]:
[str(j) for j in a]

['1', '2', '3', '4', '5']

Some other examples: repeat a number or a string multiple times.

In [22]:
print([1 for i in range(10)])
print(['orange' for i in range(5)])

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
['orange', 'orange', 'orange', 'orange', 'orange']


## <a id="Dictionarycomp">Dictionary Comprehension </a>

Dictionary comprehensions are similar to list comprehensions, except that they create a dictionary rather than a list.

The syntax is:

<b><i>{key:value for (key,value) in iterable}</i></b>

A basic example of this:

In [23]:
{key:value for (key,value) in ((1,2),(3,4),(5,6))}

{1: 2, 3: 4, 5: 6}

We can combine this with zip as follows:

In [24]:
keys=['fruit','vegetable']
values=[['apple','banana','mango'],['spinach','broccoli','asparagus']]
new_dict={key:value for (key,value) in zip(keys,values)}
print(new_dict)
print(new_dict['fruit'])

{'fruit': ['apple', 'banana', 'mango'], 'vegetable': ['spinach', 'broccoli', 'asparagus']}
['apple', 'banana', 'mango']


In [25]:
{key:value[0] for (key,value) in new_dict.items() if value[0]=="spinach" or value[0]=="apple"}

{'fruit': 'apple', 'vegetable': 'spinach'}

The line above creates a new dictionary that maps each key ("fruit" or "vegetable") to the first item of its list, keeping the entry only if that item is "apple" or "spinach".  The values are now strings rather than lists.
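
Since either side of the colon can be any expression, a comprehension can also derive a new dictionary from an existing one. The small sketch below counts the items in each category and then maps every item back to its category; the variable names counts and item_to_category are illustrative.

```python
new_dict = {
    "fruit": ["apple", "banana", "mango"],
    "vegetable": ["spinach", "broccoli", "asparagus"],
}

# Map each category to the number of items it holds
counts = {key: len(value) for key, value in new_dict.items()}
print(counts)

# Map every individual item back to its category (two loops in one comprehension)
item_to_category = {item: key for key, value in new_dict.items() for item in value}
print(item_to_category["mango"])
```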

## <a id="LambdaFunc"> Lambda Functions</a>

Lambda functions are anonymous functions: they are written as a single expression and do not need a name.  They are handy for sorting, where we want to specify the key to sort by.  Here is an example, where I sort a dictionary by its values.

In [20]:
example_dict={"a":1,
             "b":20,
             "c":9,
             "d":2,
             "e":11}

In [21]:
sorted(example_dict)

['a', 'b', 'c', 'd', 'e']

In [8]:
sorted(example_dict.items())

[('a', 1), ('b', 20), ('c', 9), ('d', 2), ('e', 11)]

In [17]:
dict(sorted(example_dict.items(),key=lambda x:x[1]))

{'a': 1, 'd': 2, 'c': 9, 'e': 11, 'b': 20}

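Here, we used an anonymous lambda function to extract the second element (the value) of each (key, value) tuple returned by example_dict.items(), so the items are sorted by value rather than by key.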
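
A few variations on the same idea may help. The sketch below sorts in descending order, shows that the lambda is shorthand for a small named function, and uses a key function with max(); the helper name second_item is an assumption for illustration.

```python
example_dict = {"a": 1, "b": 20, "c": 9, "d": 2, "e": 11}

# Sort by value, largest first
by_value_desc = sorted(example_dict.items(), key=lambda x: x[1], reverse=True)
print(by_value_desc)


# The lambda above is shorthand for a small named function like this one
def second_item(pair):
    return pair[1]


print(sorted(example_dict.items(), key=second_item, reverse=True) == by_value_desc)

# max() and min() accept the same kind of key function
print(max(example_dict, key=lambda k: example_dict[k]))
```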

Lambda functions can also be used with the filter and map functions.  First, the filter function.  It takes a function as its first argument and keeps only the items for which that function returns True.

In [22]:
a=[1,10,20,40,15]

In [26]:
list(filter(lambda x:x>10,a))

[20, 40, 15]

The command above can be done using a list comprehension as well (as was shown before).
We can also use lambda functions with map() as follows.  map() applies a function to each value in a list.

In [29]:
list(map(lambda a:a**3,a))

[1, 1000, 8000, 64000, 3375]

In most cases, a list comprehension can do what filter or map can do, so I find that these functions are not all that useful.
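
To make the comparison concrete, here is a side-by-side sketch using the same list a as above. The variable names are illustrative.

```python
a = [1, 10, 20, 40, 15]

# filter + lambda versus a comprehension with a condition
filtered_with_filter = list(filter(lambda x: x > 10, a))
filtered_with_comp = [x for x in a if x > 10]

# map + lambda versus a comprehension with an expression
cubed_with_map = list(map(lambda x: x ** 3, a))
cubed_with_comp = [x ** 3 for x in a]

# Filtering and mapping combined in one comprehension
cubes_of_large = [x ** 3 for x in a if x > 10]

print(filtered_with_filter == filtered_with_comp)
print(cubed_with_map == cubed_with_comp)
print(cubes_of_large)
```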

## <a id="Import"> Working with Modules </a>

The import statement is used to import modules, just as we used the library() function in R to load libraries.  For example, say that we want to find the square root of a number.  We can import the math module.

In [70]:
import math
import numpy as np
from math import sqrt

In the examples above, to import a module you are interested in, you specify the name of the module after the import keyword.  The "as" keyword gives the module a shorter alias (here, np for numpy).  If you want to import a specific function such as sqrt, you can use the "from \<module\> import \<function\>" form.
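
The forms of the import statement can be compared side by side. This small sketch uses only the standard math module; the alias names m and root are arbitrary choices for illustration.

```python
import math                      # access as math.sqrt
import math as m                 # alias: access as m.sqrt
from math import sqrt            # access as sqrt
from math import sqrt as root    # the same function under a different name

print(math.sqrt(16), m.sqrt(16), sqrt(16), root(16))

# All four names refer to the very same function object
print(math.sqrt is root)
```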

## <a id="math">The math module </a>

I am not going to cover math in great detail, but it can be used for operations such as calculating the square root of a number, getting the value of pi, raising a number to a power, etc. It can also handle trigonometry (cos, sin, etc.).  I use the dir function to list the names available in the math module.

In [72]:
import math
print(math.sqrt(4))
print(math.pi)
print(math.pow(10,2))
dir(math)

2.0
3.141592653589793
100.0


['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'comb',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'dist',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'isqrt',
 'lcm',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'nextafter',
 'perm',
 'pi',
 'pow',
 'prod',
 'radians',
 'remainder',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc',
 'ulp']

In [73]:
sqrt(20)

4.47213595499958

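Because we imported the sqrt function from math, we no longer have to write "math.sqrt" to call the function.

In the next cell, math.log computes the natural logarithm by default, so raising e to the power of log(10) gives back 10, up to floating-point rounding error.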

In [74]:
print(math.e**math.log(10))

10.000000000000002


Let's also try the ceil and floor functions.

In [77]:
math.ceil(3.5)

4

In [78]:
math.ceil(3.1)

4

In [79]:
math.floor(3.1)

3

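The ceil function returns the ceiling (the smallest integer greater than or equal to the number), while the floor function returns the floor (the largest integer less than or equal to the number). The ceiling of 3.1 is 4, while the floor is 3.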
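
The difference between these functions is easiest to see with negative numbers. The sketch below also includes math.trunc, which appears in the dir(math) listing above and simply drops the fractional part.

```python
import math

for x in (3.1, 3.5, -3.1, -3.5):
    # ceil rounds toward +infinity, floor toward -infinity, trunc toward zero
    print(x, math.ceil(x), math.floor(x), math.trunc(x))
```

For positive numbers trunc agrees with floor, while for negative numbers it agrees with ceil.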

Let me now highlight the collections module.  Here, one class that is useful is Counter(), which counts how many times each item occurs.

<b> Scientific Python and Data analysis related packages such as Numpy/Pandas will be covered in a different session. </b>

## <a id="collections"> The Collections Module </a>

In [32]:
import collections

In [33]:
print(collections.Counter("AAAAATTTGGGG"))
print(collections.Counter(['A','A','T','T']))
print(collections.Counter(["apple","dog","dog","apple"]))
print(collections.Counter({"apple":10,"banana":11}))

Counter({'A': 5, 'G': 4, 'T': 3})
Counter({'A': 2, 'T': 2})
Counter({'apple': 2, 'dog': 2})
Counter({'banana': 11, 'apple': 10})


You can also find the most common items in a collection using the most_common method.  Here I printed the 2 most common items in the collection.

In [34]:
collections.Counter("AAAAAAAATTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGCCCCCTTTAAA").most_common(2)

[('G', 24), ('A', 11)]

The deque (double-ended queue) class can be used for storing a fixed number of items when the maxlen argument is given. 

In [35]:
from collections import deque

In [36]:
j=deque([1,2,3,4,5],maxlen=10)

In [37]:
j.append(7)
j.append(8)
j.append(9)
j.append(10)
j.append(11)
j.append(12)

In [38]:
j

deque([2, 3, 4, 5, 7, 8, 9, 10, 11, 12], maxlen=10)

Notice that the first value in the collection got removed: once the deque is full, appending a new item on the right drops the oldest item on the left. This can be done using a list as well, but it takes a bit more code.  The deque is a more elegant solution. You can also pop from either the left or the right.

In [39]:
print(j.pop())
print(j.popleft())
print(j)

12
2
deque([3, 4, 5, 7, 8, 9, 10, 11], maxlen=10)
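
One common use of a bounded deque is to keep only the most recent items of a stream, for example to compute a moving average. A minimal sketch, with a window size of 3 and input values chosen only for illustration:

```python
from collections import deque

window = deque(maxlen=3)          # keeps only the 3 most recent values
moving_averages = []

for value in [1, 2, 3, 4, 5, 6]:
    window.append(value)          # once full, the oldest value is dropped automatically
    if len(window) == window.maxlen:
        moving_averages.append(sum(window) / len(window))

print(moving_averages)  # [2.0, 3.0, 4.0, 5.0]
```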


If you want the elements of your tuple to be named, you can use the namedtuple function. First, you give your tuple a type name.  In the example below, the type name is "Food".  Next, within square brackets, you list the field names.  To assign values, you call the "types_of_food" variable as if it were a function and pass in your items.  Afterwards, you can use "." to access a field by name.

In [40]:
types_of_food=collections.namedtuple('Food',["fruits","vegetables","juice"])
food_tuple=types_of_food("apple","spinach","orange juice")
print(food_tuple)

Food(fruits='apple', vegetables='spinach', juice='orange juice')


In [41]:
print(food_tuple.fruits)
print(food_tuple.vegetables)

apple
spinach


In [45]:
from collections import namedtuple
mods=namedtuple("modification",["nucleotide","modification"])
new_mods=mods(["A","U"],['m6A','pseudouridine'])

In [47]:
new_mods.nucleotide

['A', 'U']

In [48]:
new_mods.modification

['m6A', 'pseudouridine']

In [49]:
new_mods.nucleotide[1]

'U'

The example above shows that lists can be stored inside namedtuples as well.  
Below is a different type of container called ChainMap.  It allows us to combine multiple dictionaries.

In [50]:
nucleotide={"A":'Adenosine',
           'T':"Thymine",
           "C":"Cytosine",
           "G":"Guanine"}
modification={"Adenosine":"m6A","Cytosine":'m5C'}
combined=collections.ChainMap(nucleotide,modification)

In [51]:
print(combined["Adenosine"])
print(combined['A'])

m6A
Adenosine


While I called the ChainMap "combined", in reality a ChainMap does not merge the dictionaries.  It keeps track of them in a list and searches them in order when a key is looked up.
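
The list of underlying dictionaries is available as the maps attribute. The sketch below reuses the two dictionaries from above; the added "Guanine" entry and the override dictionary are made up purely to illustrate the lookup behavior.

```python
import collections

nucleotide = {"A": "Adenosine", "T": "Thymine", "C": "Cytosine", "G": "Guanine"}
modification = {"Adenosine": "m6A", "Cytosine": "m5C"}
combined = collections.ChainMap(nucleotide, modification)

# The underlying dictionaries are kept, in order, in the .maps list
print(combined.maps[0] is nucleotide)

# A change to an underlying dictionary is visible through the ChainMap
modification["Guanine"] = "m7G"   # illustrative entry, not from the session
print(combined["Guanine"])

# If a key exists in several dictionaries, the first dictionary wins
override = collections.ChainMap({"A": "Adenine"}, nucleotide)
print(override["A"])
```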

The collections module also has other classes such as <b>OrderedDict, defaultdict, etc.</b>  I am not going to cover these and leave the reader to look them up if necessary.

## <a id="os"> The OS Module </a>

The next module we are going to talk about is the os module, which can be used to print the current working directory, change directory, etc.

In [52]:
import os

In [53]:
os.getcwd()

'C:\\Users\\banskotan2\\Documents\\Python Class'

In [54]:
os.cpu_count()

20

There is also os.walk() for finding files in a directory tree, os.mkdir() for creating a directory, os.rmdir() for removing an empty directory, etc. <br>
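
These functions can be tried safely inside a temporary directory. The sketch below relies on the standard tempfile module, which is not covered in this session; the directory name "results" and the file name "notes.txt" are made up for illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing on disk is disturbed
with tempfile.TemporaryDirectory() as base:
    sub = os.path.join(base, "results")      # portable way to build a path
    os.mkdir(sub)
    with open(os.path.join(sub, "notes.txt"), "w") as handle:
        handle.write("hello")

    # os.walk yields (directory, subdirectories, files) for every level of the tree
    found = []
    for dirpath, dirnames, filenames in os.walk(base):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), base))
    print(found)

    os.remove(os.path.join(sub, "notes.txt"))
    os.rmdir(sub)                            # rmdir only removes empty directories
    remaining = os.listdir(base)
    print(remaining)
```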


## <a id="random"> Working with random numbers </a> <br>
The <b> random </b> module can be used to generate random numbers.

Note also that the numpy package has its own random functions as well!

In [62]:
import random

The random.random() function generates a floating-point value between 0 (inclusive) and 1 (exclusive).

In [70]:
random.random()

0.16970128136688856

In [71]:
random.random()

0.22972894714527736

The random.randint() function generates an integer between two values, with both endpoints included.

In [75]:
random.randint(0,1)

1

In [76]:
random.randint(0,100)

22

For reproducibility, set the seed so that the same sequence of values gets produced each time.  This can be important for algorithms that rely on random numbers, for example the k-means algorithm.

In [82]:
random.seed(5)
random.randint(5,10)

9

In [83]:
random.randint(5,10)

7

In [86]:
random.seed(5)
random.randint(5,10)

9

Some other random number functions: choice() picks one item at random, and shuffle() reorders a list in place.

In [87]:
random.choice([1,2,3,4,5])

3

In [89]:
random.choice(["A","T","G","C"])

'G'

In [90]:
random.choice("ATGC")

'A'

In [95]:
random.seed(10)
numbers=[1,2,3,4]
random.shuffle(numbers)
print(numbers)

[4, 3, 2, 1]
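
Seeding also works with a separate generator object, which leaves the module-level state untouched. The sketch below builds a random DNA string this way; the helper name random_dna is an assumption for illustration, and no output is shown because the values depend on the generator.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string; a fixed seed gives a repeatable result."""
    rng = random.Random(seed)        # independent generator with its own seed
    return "".join(rng.choices("ATGC", k=length))


first = random_dna(12, seed=5)
second = random_dna(12, seed=5)
print(first == second)               # same seed, same sequence

# sample() draws without replacement, choices() draws with replacement
picked = random.sample([1, 2, 3, 4, 5], k=3)
print(picked)
```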


## <a id="time"> Working with Time </a> <br>
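The time module can be used to work with time.  The time.time() function returns the current time as the number of seconds since the epoch (January 1, 1970, 00:00:00 UTC on most systems).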

In [96]:
import time

In [101]:
print(time.time())

1697490591.3094707


The time function can be used to check how long a piece of code takes to run, by recording the time before and after it.

In [106]:
start=time.time()
for i in range(100000000):
    pass
end=time.time()
print(f"time elapsed:{end-start}")

time elapsed:3.1873350143432617


In [107]:
start=time.time()
for i in range(500000000):
    pass
end=time.time()
print(f"time elapsed:{end-start}")

time elapsed:16.473917961120605


Jupyter Notebook also comes with the %%timeit magic command, which runs a cell many times and reports the average run time:

In [113]:
%%timeit
for i in range(100000):
    pass


1.65 ms ± 62.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
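
Outside of a notebook, similar measurements can be made with time.perf_counter(), a clock intended for measuring short durations, and with the standard timeit module. A small sketch; the loop sizes are arbitrary, and the timings will differ from machine to machine.

```python
import time
import timeit

# perf_counter() is designed for measuring elapsed time between two calls
start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start
print(f"time elapsed: {elapsed:.4f} s")

# timeit.timeit runs a statement repeatedly and returns the total time in seconds
seconds = timeit.timeit("sum(range(1000))", number=1000)
print(f"1000 runs took {seconds:.4f} s")
```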


## <a id="re"> Regular Expressions</a>

The re module is used for dealing with regular expressions. The name of the module is "re", and we will load it below.
First of all, what is a regular expression?  It is a pattern that describes the set of strings you are interested in matching.  Many programming languages support regular expressions, including Perl, R, and Python.  Here are a few regular expression operators to know. <br>



<font size="+5"><b>*</b></font>: Matches zero or more repetitions of the preceding character. Greedy.

<font size="+5"><b>*?</b></font>: Matches zero or more repetitions, but is not greedy.

<font size="+5"><b>+</b></font>: Matches one or more repetitions of the preceding character. Greedy.

<font size="+5"><b>+?</b></font>: Matches one or more repetitions, but is not greedy.

<font size="+5"><b>.</b></font>: Dot matches any one character except newline.

<font size="+5"><b>^</b></font>: Matches the start of the string.

<font size="+5"><b>$</b></font>: Matches the end of the string.

<font size="+5"><b>{m}</b></font>: Matches exactly m repetitions of the previous regular expression.

<font size="+5"><b>{m,n}</b></font>: Matches m to n repetitions. Greedy.

<font size="+5"><b>{m,n}?</b></font>: Matches m to n repetitions, but is not greedy.

<font size="+5"><b>\</b></font>: Escapes special characters. For example, you want to match a literal ".", which is otherwise a special character.

<font size="+5"><b>[]</b></font>: Matches any one of the characters listed inside the square brackets.

<font size="+5"><b>|</b></font>: Alternation. For example, A|B matches either A or B.

<font size="+5"><b>(...)</b></font>: Where "..." is a regular expression. Captures the match as a group.

<font size="+5"><b>\n</b></font>: Backreference, where n is a group number. For example, \1 matches the same text as group 1.

<font size="+5"><b>\b</b></font>: Matches the empty string at the beginning or end of a word.

<font size="+5"><b>\B</b></font>: Matches the empty string NOT at the beginning or end of a word.

<font size="+5"><b>\d</b></font>: Matches a digit.

<font size="+5"><b>\D</b></font>: Matches any character that is not a digit.

<font size="+5"><b>\s</b></font>: Matches a whitespace character.

<font size="+5"><b>\S</b></font>: Matches any character that is not whitespace.

<font size="+5"><b>\w</b></font>: Matches a word character (a letter, a digit, or an underscore).

<font size="+5"><b>\W</b></font>: Matches any character that is not a word character.


<b> And there are many other operators out there</b>
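
A few of these operators can be tried on a short string. This is a sketch with made-up inputs (the sequence and the coordinate string are illustrative only); the functions used here, re.search and re.findall, are explained in the rest of this section.

```python
import re

seq = "AAGGGTTGGA"

# Greedy: G+ takes as many G's as possible
print(re.search("G+", seq).group())            # GGG
# Non-greedy: G+? stops after the first G
print(re.search("G+?", seq).group())           # G
# {m,n} with []: runs of 2 to 3 characters that are each T or G
print(re.findall("[TG]{2,3}", seq))            # ['GGG', 'TTG']
# ^ and $ anchor the pattern to the start and the end of the string
print(bool(re.search("^AA", seq)), bool(re.search("TT$", seq)))   # True False
# \d+ matches a run of digits; the r"..." prefix keeps the backslash intact
print(re.findall(r"\d+", "chr7:5530601-5563902"))
```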

In [2]:
import re

In [3]:
txt="I want to know. Have you ever seen the rain?"

The re.search() function can be used to find the first match of a pattern anywhere in a string.  If there is a match, it returns a Match object; otherwise it returns None, so nothing is printed.

In [8]:
re.search("edljdkd",txt)

In [14]:
re.search(".*know\\.",txt)

<re.Match object; span=(0, 15), match='I want to know.'>

In [18]:
if re.search("know",txt): print("yes")

yes


In [20]:
if re.search("gibberish",txt):print("Yes")

In [21]:
if not re.search("gibberish",txt):print("No")

No


In [24]:
txt="AAAATTTTAAAGGGGCCTTT"
re.search("ATG*",txt)

<re.Match object; span=(3, 5), match='AT'>

The pattern above matches because G* means zero or more G characters, so "AT" alone is enough for a match.
<br>
The original string can be retrieved using the .string attribute of the match.

In [48]:
txt="AAAATTTTAAAGGGGCCTTT"
re.search("ATG*",txt).string

'AAAATTTTAAAGGGGCCTTT'

The .string attribute is not that informative, since it is just the input text.  However, the group() method returns the matched text, which can be useful.

In [53]:
txt="AAAATTTTAAAGGGGCCTTT"
re.search("ATG*",txt).group()

'AT'

Here are 3 examples for comparison.  First, we use the group() method without any argument.  This returns the full match.<br>
Next, we specify 1 within group().  This returns the first group, i.e. the text captured by the first pair of parentheses.
After this, we specify 2.  This returns the second group.

In [63]:
txt="AAAATTTTAAAGGGGCCTTT"
print(re.search("(ATG*).*(G.*CT)",txt).group())
print(re.search("(ATG*).*(G.*CT)",txt).group(1))
print(re.search("(ATG*).*(G.*CT)",txt).group(2))


ATTTTAAAGGGGCCT
AT
GCCT


The span() method returns the indices (start and end) of the match as a tuple.  The end index is exclusive, as in slicing.  As with the group() method, we can specify the group number.

In [69]:
txt="AAAATTTTAAAGGGGCCTTT"
print(re.search("(ATG*).*(G.*CT)",txt).span())
print(re.search("(ATG*).*(G.*CT)",txt).span(1))
print(re.search("(ATG*).*(G.*CT)",txt).span(2))


(3, 18)
(3, 5)
(14, 18)


Next is the findall() function, which returns all non-overlapping matches as a list.

In [26]:
re.findall("A",txt)

['A', 'A', 'A', 'A', 'A', 'A', 'A']

The findall() function returns an empty list if there is no match as can be seen below.

In [29]:
txt="AAAATTTTAAAGGGGCCTTT"
re.findall("ATGC",txt)

[]
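
When the positions of the matches are needed as well, re.finditer() yields one Match object per match. The sketch below also shows that findall() returns only the captured text when the pattern contains a single group; the patterns are chosen for illustration.

```python
import re

txt = "AAAATTTTAAAGGGGCCTTT"

# finditer yields one Match object per non-overlapping match
for match in re.finditer("A+", txt):
    print(match.group(), match.span())

# With one capture group, findall returns the content of the group only
print(re.findall("(G+)C", txt))     # ['GGGG']
```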

The re module also has a split() function, which is more powerful than the built-in split() string method because it can split on a pattern rather than on a fixed string.

In [32]:
txt="I am going to the park"
re.split("am|the",txt)

['I ', ' going to ', ' park']

In the example above, we use the | (or) operator to split wherever the text matches either "am" or "the".  Note that the matched text is not included in the returned list.  The maxsplit argument limits the number of splits.

In [36]:
re.split("am|the",txt,maxsplit=1)

['I ', ' going to the park']

The sub() function can be used for substitution. <br>
The syntax is re.sub(\<pattern\>, \<replacement\>, \<full text\>).
An example is shown below.  

In [42]:
ensembl_ID="ENSG000000003"
re.sub("ENSG","ENSMUSG",ensembl_ID)

'ENSMUSG000000003'
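
The replacement can also refer back to captured groups, and the optional count argument limits how many matches are replaced. A short sketch with made-up inputs:

```python
import re

# \1 and \2 in the replacement refer to the first and second captured groups
gene_pair = "TP53-MDM2"
print(re.sub(r"(\w+)-(\w+)", r"\2-\1", gene_pair))   # MDM2-TP53

# count limits the number of replacements
print(re.sub("T", "U", "ATGTTT", count=2))           # AUGUTT

# Without count, every match is replaced
print(re.sub("T", "U", "ATGTTT"))                    # AUGUUU
```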

## <a id="Bioinformatics">Bioinformatics Examples </a>

#### Example 1: 
Count the number of A,T,G,C in a sequence.

In [21]:
from collections import Counter
def count_sequence(sequence):
    return Counter(sequence)

In [22]:
print(count_sequence("AAAAAAAAAATTTTTTTTGCCCC"))

Counter({'A': 10, 'T': 8, 'C': 4, 'G': 1})


#### Example 2:
Report the percentage of A, T, G, and C in a sequence.

In [182]:
def percentage_sequence(sequence,rounding=1):
    seq_dict=Counter(sequence)
    # Total number of bases, computed once rather than once per key
    total=sum(seq_dict.values())
    return {key:round(100*(value/total),rounding) for key,value in seq_dict.items()}

In [183]:
print(percentage_sequence("AAAAAAAAAATTTTTTTTGCCCC"))

{'A': 43.5, 'T': 34.8, 'G': 4.3, 'C': 17.4}


In [186]:
print(percentage_sequence("AAAAAAAAAATTTTTTTTGCCCC",2))

{'A': 43.48, 'T': 34.78, 'G': 4.35, 'C': 17.39}


In [187]:
print(percentage_sequence("AAAATTTTGGGGCCCC",1))

{'A': 25.0, 'T': 25.0, 'G': 25.0, 'C': 25.0}


#### Example 3: <br>
Split a sequence into consecutive, non-overlapping k-mers of size k.  If the last fragment is shorter than k, exclude it.

In [227]:
def kmer_size_n(sequence,k=4):
    return [sequence[n:(n+k)] for n in range(0,len(sequence),k) if len(sequence[n:(n+k)])%k==0]

In [228]:
kmer_size_n("ATGCAT")

['ATGC']

In [229]:
kmer_size_n("ATGCAATT")

['ATGC', 'AATT']

A sequence shorter than k returns an empty list:

In [ ]:
kmer_size_n("AAA")

[]

In [230]:
kmer_size_n("AAAATTGCA")

['AAAA', 'TTGC']

In [232]:
kmer_size_n("ATGATC",3)

['ATG', 'ATC']
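
The function above steps through the sequence k bases at a time. A common variation slides the window one base at a time, so that the k-mers overlap. This is a sketch, and the function name overlapping_kmers is an assumption for illustration.

```python
from collections import Counter


def overlapping_kmers(sequence, k=4):
    """Return every k-mer obtained by sliding a window one base at a time."""
    # The last window starts at len(sequence) - k, so no short fragment is produced
    return [sequence[n:n + k] for n in range(len(sequence) - k + 1)]


print(overlapping_kmers("ATGCAT", 4))                            # ['ATGC', 'TGCA', 'GCAT']
print(Counter(overlapping_kmers("ATATAT", 2)).most_common(1))    # [('AT', 3)]
```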

#### Example 4: <br>
Return a k-mer only if it is of size k and it is a palindrome.  Here, a palindrome is a sequence that reads the same when reversed.  For example, the reverse of ATA is also ATA.

In [233]:
def kmer_palindrome(sequence,k=4):
    return [sequence[n:(n+k)] for n in range(0,len(sequence),k) if len(sequence[n:(n+k)])%k==0 and sequence[n:(n+k)]=="".join(reversed(sequence[n:(n+k)]))]

In [239]:
kmer_palindrome("ATAAAATTATATATTA")

['ATTA']

In [240]:
kmer_palindrome("ATATATTTAATT",k=3)

['ATA', 'TAT']

#### Example 5:

Return the reverse complement of a sequence.

In [36]:
def reverse_complement(seq):
    # Complement each base with translate(), then reverse the string with [::-1]
    return seq.translate(str.maketrans("ATGC","TACG"))[::-1]

In [37]:
reverse_complement("ATGCATAT")

'ATATGCAT'

In [38]:
reverse_complement("AAAATTTTTGGGCCC")

'GGGCCCAAAAATTTT'

In [39]:
reverse_complement("GCATATCTTAAGGCC")

'GGCCTTAAGATATGC'
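
In molecular biology, a "palindromic" site usually means a sequence that is equal to its own reverse complement, which is a different test from the plain string reversal used in Example 4. A sketch of that check follows; the function name is_rc_palindrome is an assumption for illustration, and reverse_complement is repeated so that the block runs on its own.

```python
def reverse_complement(seq):
    # Complement each base, then reverse the result
    return seq.translate(str.maketrans("ATGC", "TACG"))[::-1]


def is_rc_palindrome(seq):
    """Return True if seq reads the same as its reverse complement."""
    return seq == reverse_complement(seq)


print(is_rc_palindrome("GAATTC"))   # True: the EcoRI recognition site
print(is_rc_palindrome("ATTA"))     # False: a string palindrome, but not a reverse-complement one
```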

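#### Example 6:

Translate a coding sequence to protein, given just the coding region.  The codon table below uses T rather than U, so the input is the DNA version of the coding sequence, and its length must be a multiple of 3.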

In [65]:
def translate_seq(RNA):
    translation_dict={"TTT":"F",
                      "TTC":"F",
                      "TTA":"L",
                      "TTG":"L",
                      "CTT":"L",
                      "CTC":"L",
                      "CTA":"L",
                      "CTG":"L",
                      "ATT":"I",
                      "ATC":"I",
                      "ATA":"I",
                      "ATG":"M",
                      "GTT":"V",
                      "GTC":"V",
                      "GTA":"V",
                      "GTG":"V",
                      "TCT":"S",
                      "TCC":"S",
                      "TCA":"S",
                      "TCG":"S",
                      "CCT":"P",
                      "CCC":"P",
                      "CCA":"P",
                      "CCG":"P",
                      "ACT":"T",
                      "ACC":"T",
                      "ACG":"T",
                      "ACA":"T",
                      "GCT":"A",
                      "GCC":"A",
                      "GCA":"A",
                      "GCG":"A",
                      "TAT":"Y",
                      "TAC":"Y",
                      "TAA":"*",
                      "TAG":"*",
                      "CAT":"H",
                      "CAC":"H",
                      "CAA":"Q",
                      "CAG":"Q",
                      "AAT":"N",
                      "AAC":"N",
                      "AAA":"K",
                      "AAG":"K",
                      "GAT":"D",
                      "GAC":"D",
                      "GAA":"E",
                      "GAG":"E",
                      "TGT":"C",
                      "TGC":"C",
                      "TGA":"*",
                      "TGG":"W",
                      "CGT":"R",
                      "CGC":"R",
                      "CGA":"R",
                      "CGG":"R",
                      "AGT":"S",
                      "AGC":"S",
                      "AGA":"R",
                      "AGG":"R",
                      "GGT":"G",
                      "GGC":"G",
                      "GGA":"G",
                      "GGG":"G"}
    return "".join([translation_dict[RNA[k:k+3]] for k in range(0,len(RNA),3)])
    

In [66]:
translate_seq("ATGCAG")

'MQ'

In [59]:
sequence_from_function=translate_seq("ATGGCGGCTAACGCTACTACCAACCCGTCGCAGCTGCTGCCCTTAGAGCTTGTGGACAAATGTATAGGATCAAGAATTCACATCGTGATGAAGAGTGATAAGGAAATTGTTGGTACTCTTCTAGGATTTGATGACTTTGTCAATATGGTACTGGAAGATGTCACTGAGTTTGAAATCACACCAGAAGGAAGAAGGATTACTAAATTAGATCAGATTTTGCTAAATGGAAATAATATAACAATGCTGGTTCCTGGAGGAGAAGGACCTGAAGTGTGA")

In [64]:
print(sequence_from_function)

MAANATTNPSQLLPLELVDKCIGSRIHIVMKSDKEIVGTLLGFDDFVNMVLEDVTEFEITPEGRRITKLDQILLNGNNITMLVPGGEGPEV*


In [62]:
actual_sequence="MAANATTNPSQLLPLELVDKCIGSRIHIVMKSDKEIVGTLLGFDDFVNMVLEDVTEFEITPEGRRITKLDQILLNGNNITMLVPGGEGPEV*"

In [63]:
sequence_from_function==actual_sequence

True
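
The function above raises a KeyError if the input contains U instead of T or if its length is not a multiple of 3. The sketch below is a more tolerant variant, not a replacement for the session's function: it builds the same standard codon table from two strings, accepts either alphabet, ignores a trailing partial codon, and stops at the first stop codon. The names BASES, AMINO_ACIDS, CODON_TABLE, and translate_until_stop are assumptions for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Build the 64-entry codon table by pairing each codon with its amino acid
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_until_stop(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    Accepts DNA (T) or RNA (U) letters; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for k in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[k:k + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_until_stop("AUGCAGUGAAAA"))   # MQ
print(translate_until_stop("ATGCAGT"))        # MQ
```

Unlike translate_seq, this variant leaves the "*" out of the returned protein.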

## <a id="Conclusion"> What we covered </a>

Print, Input statements

Conditionals

Functions

Iteration/Loops

String manipulation

Lists, Tuples, Sets, Dictionaries

List and Dictionary Comprehension

Collections package

Error Types and error handling

OS package

Math package

Time package

Random package


<b>What we have not covered (and we can cover in future classes)</b>:<br>
- Generators
- Decorators
- Python for Bioinformatics
    - Biopython
- Scientific Computing, Data Analysis and Visualization
    - Pandas
    - Matplotlib, Seaborn
    - Numpy
    - Scipy
- Deep Learning
    - Packages:
        - Tensorflow
        - Pytorch
    - Algorithms that can be covered:
        - Fully-connected neural network
        - Convolutional Neural Network
    - Natural Language Processing:
        - Large Language Models (LLMs) using Hugging Face.