<a name="top"></a>
# Introduction to Python Programming for Bioinformatics. Lesson 11

<details>
<summary>
About this notebook
</summary>

This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/), starting with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics, using some simple geneticy examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

</details>
<details>
<summary>
Using this notebook
</summary>

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_9.ipynb) and from [Rob's Google Drive]()

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!

</details>

<a name="lessons"></a>
# Lesson Links

* [Lesson 11 - BioPython](#Lesson-11---BioPython)

Previous Lesson: [GitHub](Python_Lesson_9.ipynb) | [Google Colab](https://colab.research.google.com/drive/1JGRJpUPKkkVukyNvtfEJYVVCcdpkyRLZ)

Next Lesson: GitHub | Google Colab

<!-- #region id="qXu_bY7yPpsS" -->

# Lesson 11 - BioPython

Earlier (in lessons 8 and 9), we talked about modules and using other people's code. One of the most important libraries for bioinformatics is called [BioPython](https://biopython.org/). There is a [complete tutorial on BioPython](https://biopython.org/DIST/docs/tutorial/Tutorial.html), and the BioPython group also provides [an excellent cookbook of recipes](https://biopython.org/wiki/Category%3ACookbook) that will help you out!

BioPython is designed to help with common biological problems, and is particularly good at:

* Parsing files
  * fasta
  * fastq
  * GenBank
* Manipulating sequences
  * Reverse complement
  * Translating
  * Aligning (wrappers to aligners)
  * Slicing
* Connecting to biological databases
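
A couple of the sequence manipulations listed above can be sketched as follows. This is a minimal illustration that assumes biopython is already installed (see the installation step that follows); the variable names are only for this example.

```python
from Bio.Seq import Seq

dna = Seq("TCGCGCACGCTGATCGTGGGGTGA")

# Reverse complement returns a new Seq object
revcomp = dna.reverse_complement()

# Slicing works like slicing a Python string and also returns a Seq
first_codon = dna[:3]

print(revcomp)
print(first_codon)
```

Because both results are `Seq` objects, you can keep chaining operations on them, for example taking the reverse complement of a slice.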

**Before you carry on!** Make sure you have installed biopython by uploading the `requirements.txt` file and running the installation command:


```
!pip install -r requirements.txt
```
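
One quick way to check that the installation worked is to import the package and print its version. This is only a sanity check; the version you see depends on what `requirements.txt` installs.

```python
# The biopython package is imported under the name Bio
import Bio

print(Bio.__version__)
```

If this cell raises `ModuleNotFoundError`, rerun the installation command above.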


Now we can `import` biopython and create a new sequence object.

Here is how we would translate a DNA sequence:


```
from Bio.Seq import Seq
dna = Seq("TCGCGCACGCTGATCGTGGGGTGA")
dna.translate()
```
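
In a notebook, the last line of that cell displays the translated protein. As a small sketch of what to expect, `translate()` returns a new `Seq` in which `*` marks a stop codon, and passing `to_stop=True` ends the translation at the first in-frame stop codon instead. The variable names here are just for illustration.

```python
from Bio.Seq import Seq

dna = Seq("TCGCGCACGCTGATCGTGGGGTGA")

# translate() uses the standard genetic code by default
protein = dna.translate()
print(protein)  # SRTLIVG*

# to_stop=True stops at the first in-frame stop codon and leaves out the *
trimmed = dna.translate(to_stop=True)
print(trimmed)  # SRTLIVG
```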



Can you use BioPython to answer the question from [Lesson 10](https://colab.research.google.com/drive/1trXzcwT0VnmdnVQY_Wj9b__pXVY8_7GJ) and translate these sequences?

```
TCGCGCACGCTGATCGTGGGGTGA
AGTAAAACTTTAATTGTTGGTTAA
```

## Reading a fastq file

Here is some simple code to read a fastq file.

Note that we only read the first 10 sequences (records) from this fastq file, not the whole file! Often, fastq files are _huge_ and so this just provides a glimpse of the sequences and their qualities.


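```
from Bio import SeqIO

# Count the records so we can stop after the first 10
i = 0
for sequence in SeqIO.parse('barcode01.fastq', 'fastq'):
    i += 1
    # The bases of this read
    print(sequence.seq)
    # One Phred quality score per base
    print(sequence.letter_annotations["phred_quality"])
    if i == 10:
        break
```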
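
The loop above needs `barcode01.fastq` to be present. As a self-contained sketch of the same idea, the example below parses a single made-up fastq record from a string (the read name, bases, and quality string are invented for illustration) and uses `itertools.islice` as an alternative way to stop after a fixed number of records.

```python
from io import StringIO
from itertools import islice

from Bio import SeqIO

# A made-up fastq record: name, bases, separator, quality string
fastq_text = "@read1\nACGT\n+\nII5!\n"

records = []
# islice stops after at most 10 records, like the counter and break above
for record in islice(SeqIO.parse(StringIO(fastq_text), "fastq"), 10):
    records.append(record)
    print(record.id)
    print(record.seq)
    # Each quality character encodes one Phred score (ASCII code minus 33)
    print(record.letter_annotations["phred_quality"])
```

With a real file you would pass the filename to `SeqIO.parse` instead of the `StringIO` object, exactly as in the loop above.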



