In [None]:
%%html
<script src="./resources/autostyle.js"></script>

# BIOC0003 - Introduction to Python
Alan R. Lowe

Welcome to the BIOC0003 Introduction to Python notebook. 

In this notebook you will use Python to calculate some properties of an unknown protein and to calculate its concentration from spectrophotometer measurements. This will give you the opportunity to use the ideas from the lectures and to develop your skills by solving a typical problem in biochemistry.

You are presented with some data regarding an unknown protein (**Protein X**):
+ The amino acid sequence
+ The absorbance of a sample of the protein at 280 nm

Your job is to write some Python code to calculate the concentration of the protein. To do so, we need to calculate some properties of **Protein X**:

+ Calculate the number of Tryptophan, Tyrosine and Cysteine residues
+ Calculate the predicted extinction coefficient 


<div class='task_red'> NOTE: You need to run the lines of code in order. </div>


### Analysing the sequence of protein X

Here is the sequence of the unknown protein, **Protein X**:

```
MCDKEFMWALKNGDLDEVKDYVAKGEDVNRTLEGGRKPLHYAADCGQLEILEFLLLKGADINAPDK  
HHITPLLSAVYEGHVSCVKLLLSKGADKTVKGPDGLTAFEATDNQAIKALLQ
```

Let's start by storing this sequence as a Python variable named `sequence`:

In [None]:
sequence = "MCDKEFMWALKNGDLDEVKDYVAKGEDVNRTLEGGRKPLHYAADCGQLEILEFLLLKGADINAPDKHHITPLLSAVYEGHVSCVKLLLSKGADKTVKGPDGLTAFEATDNQAIKALLQ"

By using `"` we are telling Python to treat the data assigned to the variable named `sequence` as a string of text. Now that we have stored the data, we can use Python to calculate properties of it.

For example, how many amino acids does **Protein X** contain? We can calculate this using Python:

In [None]:
num_amino_acids = len(sequence)
print(num_amino_acids)

In these two lines of code, we calculate the length of the string stored in `sequence` using a special built-in function of Python, called `len`. We store the result in a new variable named `num_amino_acids`. Finally, we use the `print` function to print out the results. Try running these lines of code.
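
To see how `len` and `print` behave on a string you can count by eye, here is a minimal sketch using a short, made-up peptide. The variable names `peptide` and `peptide_length`, and the peptide sequence itself, are illustrative assumptions and are not part of the Protein X data.

```python
# A short, made-up peptide used only for illustration (not Protein X)
peptide = "MKTAYIAK"

# len counts every character in the string, so each one-letter code counts once
peptide_length = len(peptide)
print(peptide_length)  # prints 8
```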

<div class='task_blue'> How many amino acids are in Protein X?</div>

<details><summary>ANSWER</summary> There are 118 amino acids in Protein X</details>

We can also use Python to count the number of occurrences of a specific amino acid in the sequence. For example, let's count the number of Alanine (A) residues in the sequence. We can do that using the following method:

In [None]:
num_alanine = sequence.count('A')
print(num_alanine)

You will notice that the syntax is slightly different here. We're using a new function called `count` that is associated with the `sequence` variable. The function `count` has a single argument (in this case `'A'`), which is the string that we want to find within `sequence`. This is a special kind of function, called a method, that works with variables of the type `str`. In this line of code, we create a new variable called `num_alanine` that stores the output of the `count` function applied to `sequence`.
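
One detail worth knowing is that `count` is case-sensitive, so the one-letter codes must be written in upper case to match the sequence. The sketch below shows this on a short, made-up peptide; the names `peptide`, `num_lysine` and `num_lower` are illustrative assumptions and not part of the exercise.

```python
# A short, made-up peptide used only for illustration (not Protein X)
peptide = "MKTAYIAK"

num_lysine = peptide.count("K")  # upper-case one-letter code for lysine
num_lower = peptide.count("k")   # lower-case "k" never appears in the string

print(num_lysine)  # prints 2
print(num_lower)   # prints 0
```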

<div class='task_blue'>How many alanines are found in Protein X?</div>

<details><summary>ANSWER</summary> There should be 12 Alanine residues in the sequence of Protein X</details>

### Predicting the extinction coefficient of Protein X

In order to calculate the concentration of **Protein X** we need to use the Beer-Lambert equation:

\begin{equation}
A_{280} = \epsilon c \ell
\end{equation}

where $A_{280}$ is the absorbance of the sample at 280 nm, $c$ is the concentration of the protein (Molar, $M$), $\ell$ is the path length of the cuvette in the spectrophotometer (cm), and $\epsilon$ is the molar extinction coefficient ($M^{-1}.cm^{-1}$). We can rearrange the equation to calculate the protein concentration from the absorbance:

\begin{equation}
c = \frac{A_{280}}{\epsilon\ell}
\end{equation}
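
As an illustration with made-up numbers (these are assumptions chosen for easy arithmetic, not the values for **Protein X**), a sample with $A_{280} = 0.5$, an extinction coefficient of $\epsilon = 10000\ M^{-1}.cm^{-1}$ and a path length of $\ell = 1$ cm would give:

\begin{equation}
c = \frac{0.5}{10000 \times 1} = 5 \times 10^{-5}\ M
\end{equation}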

To apply this equation to **Protein X** we need three quantities:
+ The absorbance at 280 nm ($A_{280}$)
+ The path length of the cuvette in centimeters ($\ell$)
+ The molar extinction coefficient for **protein X** in per Molar per cm ($\epsilon$)

The absorbance is our measurement, and the path length is known ($\ell = 1$ cm), but we do not know the extinction coefficient.

The extinction coefficient is a measure of how strongly the sample attenuates light at a given wavelength, in this case in the UV spectrum at 280 nm. The aromatic side chains of tyrosine and tryptophan both contribute to the absorbance spectrum of a protein at 280 nm. Cysteine also contributes to absorbance at 280 nm, although much more weakly.

The extinction coefficients of these three isolated amino acids are known in pure solutions:
+ Tryptophan (5500 $M^{-1}.cm^{-1}$)
+ Tyrosine (1490 $M^{-1}.cm^{-1}$)
+ Cysteine (125 $M^{-1}.cm^{-1}$)

Since we know the sequence of the protein, we can start by calculating the number of Tryptophan, Tyrosine and Cysteine residues in **Protein X**.

<div class='task_green'>TASK: Use Python to calculate the number of each type of residue, storing each one in its own variable. </div>

<details><summary>HINT</summary> You can use the same principle as when calculating the number of Alanine residues </details>

In [None]:
num_tryptophan =

In [None]:
num_tyrosine = 

In [None]:
num_cysteine = 

<div class='task_green'>TASK: Print out how many of each type of amino acid there are.</div>
<details><summary>ANSWER</summary> nW = 1, nY = 3, nC = 3</details>

In [None]:
print(num_tryptophan, num_tyrosine, num_cysteine)

For a pure sample of a protein, we can calculate the *predicted* extinction coefficient using the following equation:

\begin{equation}
\epsilon = (nW \times 5500) + (nY \times 1490) + (nC \times 125)
\end{equation}

where $nW$ is the number of tryptophan (W) residues, $nY$ is the number of tyrosine (Y) residues and $nC$ is the number of cysteine (C) residues. Now that you have a way to calculate the number of each type of amino acid in the sequence, we can write a function to calculate the predicted extinction coefficient for us.

In [None]:
def estimate_extinction_coefficient(num_trp, num_tyr, num_cys):
    # epsilon = (nW x 5500) + (nY x 1490) + (nC x 125) 
    epsilon = (num_trp * 5500) + (num_tyr * 1490) + (num_cys * 125)
    return epsilon

In the above block of code, we have defined a new function using the `def` keyword. Our function is named `estimate_extinction_coefficient` and it takes three arguments as input: the number of tryptophans, tyrosines and cysteines in the protein. The function calculates the value of the extinction coefficient, stores it in `epsilon` and returns it.

In [None]:
epsilon = estimate_extinction_coefficient(1, 3, 3)
print(epsilon)
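
The call above passes the three counts by position, so their order matters. As a sketch of an alternative, the same function can be called with keyword arguments, which makes it harder to mix up the counts. The function is repeated here so the block runs on its own, and the residue counts and the name `example_epsilon` are made-up assumptions for a hypothetical protein, not **Protein X**.

```python
def estimate_extinction_coefficient(num_trp, num_tyr, num_cys):
    # Same formula as above: epsilon = (nW x 5500) + (nY x 1490) + (nC x 125)
    return (num_trp * 5500) + (num_tyr * 1490) + (num_cys * 125)

# Made-up residue counts for a hypothetical protein (not Protein X)
example_epsilon = estimate_extinction_coefficient(num_trp=2, num_tyr=4, num_cys=1)
print(example_epsilon)  # prints 17085
```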

<div class='task_green'>TASK: Try calling the function using your num_tryptophan, num_tyrosine and num_cysteine variables as input. Do you get the same answer?</div>

### Calculating the protein concentration

We can now write a function to calculate the protein concentration using the Beer-Lambert law:

\begin{equation}
c = \frac{A_{280}}{\epsilon\ell}
\end{equation}

<div class='task_green'>TASK: Complete the function below to calculate the concentration of the protein</div>

In [None]:
def calculate_protein_concentration(absorbance, epsilon, path_length):
    # c = A / (e*l)
    concentration = 
    return concentration

Now that you have defined a function, let's calculate the protein concentration using the following values:
+ $A_{280}$ = 0.1
+ $\ell$ = 1 cm

<div class='task_green'> TASK: Calculate the concentration of Protein X</div>

In [None]:
concentration = 

In [None]:
print(concentration)
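
Protein concentrations in molar units are often very small numbers, which can be hard to read when printed directly. The sketch below shows one possible way to format such a value using an f-string. The value and the names `example_concentration` and `example_micromolar` are made-up assumptions for illustration and are not the answer for **Protein X**.

```python
# Made-up concentration in molar, used only for illustration
example_concentration = 5e-5

# Scientific notation with two decimal places
print(f"{example_concentration:.2e} M")  # prints 5.00e-05 M

# Convert molar to micromolar (1 M = 1,000,000 uM) and show one decimal place
example_micromolar = example_concentration * 1e6
print(f"{example_micromolar:.1f} uM")  # prints 50.0 uM
```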

You have now completed the exercise.

# End of exercise