# I - Download sequences

With this code, you can download a specific sequence from GenBank or the Ensembl database. All you need is the sequence's accession number (for Ensembl, its stable ID). The downloads are saved in FASTA format.

You can find all material in our GitHub repository iGEM_UGent_2020. 

## Set your working directory

First, set up your working directory. By default, this is the folder where this Jupyter notebook is located, and the files created below are written there.

In [1]:
import os
wdir = os.getcwd()
print(wdir)
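If you prefer to keep the downloaded files together instead of next to the notebook, a small helper can build paths inside a subfolder of the working directory. This is a minimal sketch: the function name output_path and the folder name "sequences" are hypothetical choices, not part of the original notebook.

```python
import os


def output_path(filename, folder="sequences", base=None):
    """Return a path to `filename` inside `folder`, creating the folder if needed.

    `folder` defaults to a hypothetical "sequences" subfolder; `base`
    defaults to the current working directory, as printed in the cell above.
    """
    base = os.getcwd() if base is None else base
    target_dir = os.path.join(base, folder)
    os.makedirs(target_dir, exist_ok=True)  # no error if the folder already exists
    return os.path.join(target_dir, filename)
```

For example, open(output_path("BE_isolate.txt"), "w") would write the GenBank download below into the sequences subfolder.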

## Sequences from NCBI GenBank

### Nucleotide sequence 

Use this code to retrieve the full GenBank record for accession number MT747438, print its description and sequence, and save it as a FASTA file. This is a nucleic acid sequence.

In [2]:
from Bio import SeqIO
from Bio import Entrez

# NCBI asks for an email address with every Entrez request; replace it with your own
Entrez.email = "A.N.Other@example.com"

# Fetch the GenBank record for a single accession and parse it
with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="MT747438") as handle:
    record = SeqIO.read(handle, "gb")

print(record.description)
print(record.seq)
nuc_des = str(record.description)
nuc_seq = str(record.seq)

# Save the nucleotide sequence in FASTA format
with open("BE_isolate.txt", "w") as f:
    f.write(">" + nuc_des + "\n" + nuc_seq + "\n")

Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/Felis catus/BEL/BE-MG-0320/2020, complete genome
AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTTCATGCACTTTGTCCGAACAACTGGACTTTATTGACACTA
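The cell above writes the whole sequence on a single line after the header. Most tools accept that, but FASTA files are often wrapped to a fixed line width (Biopython's SeqIO.write uses 60 characters for FASTA). The sketch below uses only the standard library; write_fasta and read_fasta are hypothetical helper names, and read_fasta is included so you can check that a saved file reads back to the same description and sequence.

```python
def write_fasta(path, description, sequence, width=60):
    """Write one record to `path` in FASTA format, wrapping lines at `width` characters."""
    with open(path, "w") as fasta:
        fasta.write(">" + description + "\n")
        for start in range(0, len(sequence), width):
            fasta.write(sequence[start:start + width] + "\n")


def read_fasta(path):
    """Return a list of (description, sequence) tuples from a FASTA file."""
    records = []
    description, chunks = None, []
    with open(path) as fasta:
        for line in fasta:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if description is not None:
                    records.append((description, "".join(chunks)))
                description, chunks = line[1:], []
            else:
                chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records
```

With the variables from the cell above, write_fasta("BE_isolate.txt", nuc_des, nuc_seq) produces a wrapped version of the same file, and read_fasta("BE_isolate.txt") returns [(nuc_des, nuc_seq)].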

### Protein sequence

Use this code to retrieve the full GenBank record for accession number YP_009724390.1, print its description and sequence, and save it as a FASTA file. This is a protein sequence.

In [3]:
from Bio import SeqIO
from Bio import Entrez

# NCBI asks for an email address with every Entrez request; replace it with your own
Entrez.email = "A.N.Other@example.com"

# Same approach as above, but querying the protein database
with Entrez.efetch(db="protein", rettype="gb", retmode="text", id="YP_009724390.1") as handle:
    record = SeqIO.read(handle, "gb")

print(record.description)
print(record.seq)
prot_des = str(record.description)
prot_seq = str(record.seq)

# Save the protein sequence in FASTA format
with open("surface_glycoprotein.txt", "w") as f:
    f.write(">" + prot_des + "\n" + prot_seq + "\n")

surface glycoprotein [Severe acute respiratory syndrome coronavirus 2]
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNS
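The cells above download the GenBank flat file and then build the FASTA text by hand. NCBI's E-utilities can also return FASTA directly when rettype is "fasta"; in Biopython that is Entrez.efetch(db="protein", id="YP_009724390.1", rettype="fasta", retmode="text"). The sketch below shows the same request with only the standard library, which makes the underlying efetch URL visible. The helper names efetch_fasta_url and fetch_fasta are hypothetical, and fetch_fasta needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein", email="A.N.Other@example.com"):
    """Build an E-utilities efetch URL that returns the record as FASTA text."""
    params = {
        "db": db,            # "protein" or "nucleotide"
        "id": accession,
        "rettype": "fasta",  # FASTA instead of the GenBank flat file
        "retmode": "text",
        "email": email,      # NCBI asks callers to identify themselves
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="protein", email="A.N.Other@example.com"):
    """Download the record and return the FASTA text (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db, email)) as response:
        return response.read().decode("utf-8")
```

The returned text already starts with a ">" header line, so it can be written to a file as-is. Note that NCBI's own FASTA header includes the accession, so it will not be identical to the record.description used above.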

## Sequences from Ensembl

Use this code to retrieve the sequence with Ensembl stable ID ENSG00000134827 (the human TCN1 gene) and save it as a FASTA file. This is a nucleic acid sequence: for a gene ID, Ensembl returns the genomic sequence. The same code can be used to download a protein sequence.

First, uncomment the line below to install the ensembl_rest package. You only need to do this once.

In [4]:
# %pip install ensembl_rest

Code to download the sequence

In [6]:
import ensembl_rest

# Query Ensembl's sequence/id endpoint; the result is a dict with keys such as "seq" and "desc"
tcn1 = ensembl_rest.sequence_id('ENSG00000134827')

tcn1_seq = tcn1.get("seq")
tcn1_des = tcn1.get("desc")

print(">" + tcn1_des + "\n" + tcn1_seq + "\n")

# Save the sequence in FASTA format
with open("tcn1.txt", "w") as f:
    f.write(">" + tcn1_des + "\n" + tcn1_seq + "\n")

>chromosome:GRCh38:11:59852800:59866489:-1
CTGGTACACTGTTGGAGAGATGAGACAGTCACACCAGCTGCCCCTAGTGGGGCTCTTACTGTTTTCTTTTATTCCAAGCCAACTATGCGAGATTTGTGGTGAGTAAACTTTGAGCTAAAATTACTCTAGAGTGATAGTCTACTGAAAACTGTTGACAAGAGACTCTATGTAGAGTTACCTGAGAAAAGATCCCATTCCAGTTTCGTATGTGGTTGGACTCCATGAATGATGAAAAGGGTGGCAGGAGATGGTGAGGAAAGTCTCACAATGCATGTTGTTGTAAGAGGAATTATAGATTGTGTATTCACTTGCCTAACTGCTCTTTCTATGCTCAGATCCTCTCTAGCACAGCATCCCTACCAAGTAACTCCCTATTCATGTGACTTTTCAATGACAGAAAATTTATGCTCCCAGAAACTTTTCTAGATCATGGAAAGTTATCTTTTGGTCTTGATATGCAATCTTTTTTCCCTGAAATTCAATGCATGATTTTTGGGTTAACCCCTTGTAACTACATAGAAAAAAATAATAATCTCTTTCAAATAGTGATGACTGCTGTCTTGTCATGAGTGAGTTTAGATGTTCCAAGTTGCATGTCCACCTTCCTTCAACTCTTTCCCAGGACACATGGGATAAAGTTCCATCATCTCTTGCATGGGTGTTGACCCCAGAGATTGTGTGGTTCTGTAAAAGAACTTTAAAGTAAGAGAAGGGAAAACTAAATTCAGGTTTAGTTTGATCCCTTAGTTATTGAATGACCCAACATTTCTGAATTTCTTTAGCTTCATTCATAAATAAAGGATAATGAGGGGAGACTATTGAAGCAAATTATCTCATGAAGTCTGGTGCAGTTCTCAAGTGCATGAATTCTCTTTTGTGAGACTTTACTCTGAAATTGAGCTTCAAACCAGTCCAAAGATTACTAGGTGGTACTATCTGTGCTCTCTAACCAGTAAG
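The ensembl_rest call above wraps Ensembl's GET sequence/id/:id REST endpoint. As we understand the Ensembl REST documentation, that endpoint takes a type parameter ("genomic", "cdna", "cds" or "protein"), and because a gene ID such as ENSG00000134827 maps to several transcripts and proteins, non-genomic requests for a gene also need multiple_sequences=1. The sketch below calls the endpoint with only the standard library so these parameters are explicit. The helper names ensembl_sequence_url and fetch_ensembl_sequences are hypothetical, and the fetch function needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_sequence_url(stable_id, seq_type="genomic", multiple=False):
    """Build a URL for Ensembl's GET sequence/id/:id endpoint.

    seq_type: "genomic", "cdna", "cds" or "protein".
    multiple: request all sequences (needed when a gene ID maps to several
    transcripts or proteins).
    """
    params = {"type": seq_type}
    if multiple:
        params["multiple_sequences"] = 1
    return f"{ENSEMBL_REST}/sequence/id/{stable_id}?{urlencode(params)}"


def fetch_ensembl_sequences(stable_id, seq_type="genomic", multiple=False):
    """Return the JSON response as a list of dicts (requires network access)."""
    request = Request(ensembl_sequence_url(stable_id, seq_type, multiple),
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        data = json.loads(response.read().decode("utf-8"))
    # One sequence comes back as a single object, several as a list
    return data if isinstance(data, list) else [data]
```

For example, fetch_ensembl_sequences("ENSG00000134827", "protein", multiple=True) should return one dict per TCN1 protein, each of which can be saved with the same ">" + desc + "\n" + seq pattern used above.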

## References

Chang, J., Chapman, B., Friedberg, I., Hamelryck, T., de Hoon, M., Cock, P., Antao, T., Talevich, E., Wilczynski, B. (2020). Biopython Tutorial and Cookbook, Biopython 1.77. Available via biopython.org

Moss, S. (2013). pyEnsemblRest. Available via https://github.com/gawbul/pyEnsemblRest
