# Calculate charge from a protein sequence
Author: Jurre Hageman

## The charge of proteins

The charge of a protein depends on the pH. Every protein has an N-terminus and a C-terminus that can be protonated or deprotonated, and some amino acids have side chains that can be protonated or deprotonated as well. The net charge of the protein is the sum of the charges of all these groups. The pKa of the N-terminal amino group is approximately 9.5: below pH 9.5 the amino group has a charge of +1, and above this pH it has a charge of 0. The pKa of the C-terminal carboxyl group is approximately 2.3: above pH 2.3 the carboxyl group has a charge of -1, and below this pH it has a charge of 0.

As mentioned above, some amino acid side chains can be protonated or deprotonated, depending on the pH. We call these side chains [ionizable](https://www.yumpu.com/en/document/read/10990979/ionizable-amino-acids-protonated-deprotonated-forms-).  
They can be divided into two types.
- Amino acids with side chains that can acquire a **negative charge**: Aspartic Acid (D), Glutamic Acid (E), Cysteine (C), Tyrosine (Y). If the pH is **above** the pKa value of the side chain, the amino acid has a charge of -1. Otherwise the charge is 0.
- Amino acids with side chains that can acquire a **positive charge**: Lysine (K), Arginine (R), Histidine (H). If the pH is **below** the pKa value of the side chain, the amino acid has a charge of +1. Otherwise the charge is 0.


To summarize:
- The amino terminus as well as K, R and H acquire a +1 charge at a pH below their pKa value. Otherwise, the charge is 0.
- The carboxyl terminus as well as D, E, C and Y acquire a -1 charge at a pH above their pKa value. Otherwise, the charge is 0.
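
The two rules above can be sketched as a small function. This is a minimal illustration of the step model described in this section, not a reference implementation; the name `group_charge` and its `kind` parameter are hypothetical and chosen only for this example.

```python
def group_charge(ph, pka, kind):
    """Charge of one ionizable group under the simple step model.

    kind is "positive" (amino terminus, K, R, H) or
    "negative" (carboxyl terminus, D, E, C, Y).
    Hypothetical helper, for illustration only.
    """
    if kind == "positive":
        # Protonated (+1) only below the pKa, otherwise neutral.
        return 1 if ph < pka else 0
    if kind == "negative":
        # Deprotonated (-1) only above the pKa, otherwise neutral.
        return -1 if ph > pka else 0
    raise ValueError(f"unknown kind: {kind!r}")


print(group_charge(7, 9.5, "positive"))  # amino terminus at pH 7
print(group_charge(7, 2.3, "negative"))  # carboxyl terminus at pH 7
```

At pH 7 the amino terminus (pKa 9.5) is still protonated and the carboxyl terminus (pKa 2.3) is already deprotonated, so the two calls print `1` and `-1`.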

The pKa values of the ionizable side chains and the two termini are:  

Positives:
- H: 6.0  
- K: 10.5  
- R: 12.5
- Amino terminus: 9.5

Negatives:
- D: 3.9  
- E: 4.1 
- C: 8.4  
- Y: 10.5  
- Carboxyl terminus: 2.3
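
One possible way to hold these values in Python is a pair of dictionaries. The sketch below copies the values from the lists above; the names `POSITIVE_PKA`, `NEGATIVE_PKA`, `charged_groups` and the keys `N_TERM` and `C_TERM` are assumptions made for this example and say nothing about the layout of `pka.txt`.

```python
# pKa values copied from the lists above.
# "N_TERM" and "C_TERM" are made-up keys for the two termini.
POSITIVE_PKA = {"H": 6.0, "K": 10.5, "R": 12.5, "N_TERM": 9.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.4, "Y": 10.5, "C_TERM": 2.3}


def charged_groups(ph):
    """Return (positive, negative): sorted names of the groups charged at this pH."""
    positive = sorted(g for g, pka in POSITIVE_PKA.items() if ph < pka)
    negative = sorted(g for g, pka in NEGATIVE_PKA.items() if ph > pka)
    return positive, negative


print(charged_groups(7))
```

At pH 7 this lists K, R and the amino terminus as positively charged, and D, E and the carboxyl terminus as negatively charged. H, C and Y are neutral at this pH.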

Let's look at an example protein sequence and calculate the charge at pH 1, 7 and 14:

Peptide sequence: `MHGC`

At pH 1:  
- M: 0
- H: +1
- G: 0
- C: 0
- amino terminus: +1
- carboxyl terminus: 0  
Thus the net charge is +2.

At pH 7:
- M: 0
- H: 0
- G: 0
- C: 0
- amino terminus: +1
- carboxyl terminus: -1  
Thus the net charge is 0.


At pH 14:
- M: 0
- H: 0
- G: 0
- C: -1
- amino terminus: 0
- carboxyl terminus: -1  
Thus the net charge is -2.
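
The hand calculation for `MHGC` can be checked with a few lines of Python. This is a sketch under the step model used in this section, with the pKa values hard-coded rather than read from a file; `net_charge` and the constant names are hypothetical and are not the functions asked for in the exercise.

```python
# pKa values from the lists in this section (hard-coded for the example).
POSITIVE_PKA = {"H": 6.0, "K": 10.5, "R": 12.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.4, "Y": 10.5}
N_TERMINUS_PKA = 9.5
C_TERMINUS_PKA = 2.3


def net_charge(sequence, ph):
    """Sum the step-model charges of both termini and all side chains."""
    charge = 0
    if ph < N_TERMINUS_PKA:   # amino terminus is +1 below its pKa
        charge += 1
    if ph > C_TERMINUS_PKA:   # carboxyl terminus is -1 above its pKa
        charge -= 1
    for residue in sequence:
        if residue in POSITIVE_PKA and ph < POSITIVE_PKA[residue]:
            charge += 1
        elif residue in NEGATIVE_PKA and ph > NEGATIVE_PKA[residue]:
            charge -= 1
        # Residues without an ionizable side chain (such as M and G) add 0.
    return charge


for ph in (1, 7, 14):
    print(f"pH {ph}: {net_charge('MHGC', ph)}")
```

The loop prints 2, 0 and -2, matching the three worked examples above.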

## Summary in a video

A short video on this subject is available [at this link](https://www.youtube.com/watch?v=Gkb4it5nOuc&feature=youtu.be).

## The exercise

Write a script that:
- contains the function `read_pka` that reads the file `pka.txt` and returns a dictionary that maps each amino acid to its pKa value.
- contains the function `read_seq` that reads the file `sequence.fasta` and returns the sequence as a string.
- contains the function `calc_charge` that calculates the charge of the protein based on a pH value.
- all functions should be called from the `main` function.
- test your script using the list `ph_values = [1, 7, 14]`

The outcome should be as follows:  
charge at pH 1 is: 24  
charge at pH 7 is: 0  
charge at pH 14 is: -21  

In [1]:
###YOUR CODE HERE###







## Verify your result

You can verify your result at [this web page](http://protcalc.sourceforge.net/cgi-bin/protcalc).
Note that the results may differ slightly because that tool may use different pKa values.

The end...