# Introduction to Biopython
<img src="img/logo_biopython.PNG" width="500" height="300"/>


Biopython is a set of freely available Python tools that help us to work with biological data. It contains components that are developed specifically for bioinformatics purposes. Today I'll give you the highlights and the rest will be yours to explore. 

On a small side note: before diving into your (biological) data and analyzing it with complex self-written scripts, it often makes sense to search the official documentation for parts that might help you. Biopython is a huge library that covers a variety of tasks such as sequence analysis, multiple alignments, protein structures, phylogenetics, population genetics, etc.

**Installing** the complete module can be done:
- Using Anaconda's environments and searching for the package, or
- Immediately in a Notebook using the following code:

In [2]:
#pip install biopython 

# Import the Biopython library
import Bio

# Check the version to confirm a proper installation (v1.74)
print(Bio.__version__)

1.74


It makes sense to import only the functions or submodules of Biopython that you need, which makes them easier to use. If you want to work with sequences, for example, you can import only the `Seq` object.

Content for today:
- Working with sequences: `Seq` and `Alphabets`,
- Sequence annotations with: `SeqRecord` objects,
- Reading, writing and parsing files with: `SeqIO`
- Querying NCBI with: `Entrez`
- BLAST from within Python

## Working with sequences
- A sequence is stored in a `Seq`-object

The main objects in bioinformatics are arguably sequences (DNA, RNA, proteins). Hence the biggest part of Biopython is built around sequences as well. In Biopython, a sequence is stored in a so-called `Seq` object.

In [3]:
# Imports
from Bio.Seq import Seq

In [4]:
# Creating our first Seq object
my_seq = Seq("AGTACACTGGAT")
my_seq

Seq('AGTACACTGGAT')

In [5]:
type(my_seq)

Bio.Seq.Seq

Is the snippet from a DNA or protein sequence?

In [6]:
print(f'Translation of DNA sequence: {my_seq.translate()}')
print(f'The protein AA sequence: {my_seq}')

Translation of DNA sequence: STLD
The protein AA sequence: AGTACACTGGAT


## Working with sequences
- A sequence is stored in a `Seq`-object
- Each `Seq`-object has an additional attribute that contains the `Alphabet`. 
- Avoid any misinterpretation!

In [7]:
my_seq.alphabet

Alphabet()

In [8]:
# importing the IUPAC alphabets
from Bio.Alphabet import IUPAC
#dir(IUPAC)
print(IUPAC.unambiguous_dna.letters)
print(IUPAC.ambiguous_dna.letters)

GATC
GATCRYWSMKHBVDN


In [9]:
# Define Alphabet
my_dna = Seq("AGTACACTGG", IUPAC.unambiguous_dna)
my_prot = Seq("AGTACACTGG", IUPAC.protein)

We can specify that we're working with DNA by passing the alphabet as a second argument to the `Seq` constructor. In the example above we assign a DNA sequence to one variable and a protein sequence to another.
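One caveat worth flagging: `Bio.Alphabet` was removed in Biopython 1.78, so the alphabet cells above only run on older releases such as the 1.74 used here. As a minimal sketch, assuming a newer release, the molecule type can instead be recorded as an annotation on a `SeqRecord` (introduced later in this notebook). The `id` values are hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Same letters, two interpretations: the molecule type is stored in the
# annotations dictionary instead of in an alphabet object.
dna_record = SeqRecord(
    Seq("AGTACACTGG"),
    id="demo_dna",  # hypothetical identifier
    annotations={"molecule_type": "DNA"},
)
prot_record = SeqRecord(
    Seq("AGTACACTGG"),
    id="demo_prot",  # hypothetical identifier
    annotations={"molecule_type": "protein"},
)

print(dna_record.annotations["molecule_type"])
print(prot_record.annotations["molecule_type"])
```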

## Working with sequences
- `Seq`-objects behave like strings

In [13]:
my_seq = Seq('AGTACACTGGAT', IUPAC.unambiguous_dna)
# Get element at position 0
my_seq[0]

# Find how many times "GAT" appears in the sequence
print(my_seq.count("GAT"))

# Find where "GAT" appears in the sequence
print(my_seq.find("GAT"))

1
9
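A few more string-like operations, as a small sketch on the same sequence (without an alphabet, so it does not depend on `Bio.Alphabet`). Note that `count()` counts non-overlapping matches, just like Python strings.

```python
from Bio.Seq import Seq

my_seq = Seq("AGTACACTGGAT")

length = len(my_seq)              # number of letters
has_motif = "ACT" in my_seq       # substring test, as with a string
gc_count = my_seq.count("G") + my_seq.count("C")
gc_fraction = gc_count / length   # fraction of G and C letters

# Iterating over a Seq yields one letter at a time
for index, letter in enumerate(my_seq[:3]):
    print(index, letter)

print(length, has_motif, gc_count, round(gc_fraction, 3))
```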


- `Seq`-objects are immutable

In [14]:
my_seq[2] = 'A'

TypeError: 'Seq' object does not support item assignment

## Working with sequences
- `Seq`-objects can be sliced

In [15]:
# Slicing in its most basic form
my_seq[2:6]

Seq('TACA', IUPACUnambiguousDNA())
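Slices also accept a stride, exactly as Python strings do. The sketch below picks out the first, second and third codon positions of the same sequence and reverses it (a plain reversal, not a reverse complement).

```python
from Bio.Seq import Seq

my_seq = Seq("AGTACACTGGAT")

first_positions = my_seq[0::3]   # 1st letter of every codon
second_positions = my_seq[1::3]  # 2nd letter of every codon
third_positions = my_seq[2::3]   # 3rd letter of every codon
reversed_seq = my_seq[::-1]      # reversed order, letters unchanged

print(first_positions, second_positions, third_positions, reversed_seq)
```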

- Turning `Seq`-objects into strings

In [16]:
str(my_seq)
#print(my_seq)

'AGTACACTGGAT'

- Concatenating or adding sequences

In [17]:
# This will work
protein_seq1 = Seq("EVRNAK")
protein_seq2 = Seq("AGGATC", IUPAC.protein)
protein_seq1 + protein_seq2

Seq('EVRNAKAGGATC', IUPACProtein())

- and much more...

## Methods on `Seq` objects
- transcribe()
- translate()
- complement()
- reverse_complement()

Which of these methods make sense depends on the alphabet of the sequence (e.g. `transcribe()` expects DNA).

![transcription](img/transcriptionprocess.png)

In [25]:
coding_dna = Seq("ATGGCCATTGTAATGG")
#dir(coding_dna)
print(f'Original DNA seq: {str(coding_dna):>26}')
print(f"Complement DNA seq: {str(coding_dna.complement()):>24}")
print(f"Reverse complement DNA seq: {str(coding_dna.reverse_complement()):>15}")
print(f"mRNA seq: {str(coding_dna.transcribe()):>34}")
print(f"Protein seq: {str(coding_dna.translate()):>20}")

Original DNA seq:           ATGGCCATTGTAATGG
Complement DNA seq:         TACCGGTAACATTACC
Reverse complement DNA seq: CCATTACAATGGCCAT
mRNA seq:                   AUGGCCAUUGUAAUGG
Protein seq:                MAIVM


## Codon tables
- Choose the correct codon table that is relevant for the organism you're working with.   
- Imported from NCBI: standard, vertebrate mitochondrial, yeast mitochondrial, bacterial, etc. 

In [26]:
# Import the codon tables
from Bio.Data import CodonTable
# Look up a table by name; replace 'Standard' with e.g. 'Vertebrate Mitochondrial' for another table
standard_table = CodonTable.unambiguous_dna_by_name["Standard"]
print(standard_table)

Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------

In [None]:
# Specify the table by its NCBI name or by its table number (e.g. table=2)
coding_dna.translate(table="Vertebrate Mitochondrial")
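To see why the choice of table matters, here is a small sketch that compares the stop codons of the standard and the vertebrate mitochondrial tables and translates one sequence with both. The sequence `demo_dna` is only an illustration and is not part of the notebook's data.

```python
from Bio.Seq import Seq
from Bio.Data import CodonTable

# Tables can be looked up by NCBI table number as well as by name
standard = CodonTable.unambiguous_dna_by_id[1]
vert_mito = CodonTable.unambiguous_dna_by_id[2]

print(sorted(standard.stop_codons))
print(sorted(vert_mito.stop_codons))

# Illustrative sequence containing an in-frame TGA codon
demo_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

standard_protein = demo_dna.translate()        # TGA is read as a stop (*)
mito_protein = demo_dna.translate(table=2)     # TGA is read as W
mito_by_name = demo_dna.translate(table="Vertebrate Mitochondrial")

print(standard_protein)
print(mito_protein)
```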

Example:

In [27]:
gene = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA" + \
 "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT" + \
 "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCTAAGACCACGGCTGGTGGAAACAACAT" + \
 "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT" + \
 "AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")

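Another possible option is the argument `cds=True`, which tells `translate()` to treat the sequence as a complete coding sequence: it must begin with a valid start codon, have a length that is a multiple of three, and end with a single in-frame stop codon.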
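A minimal sketch of the difference between `to_stop=True` and `cds=True`, using two short illustrative sequences (not the `gene` defined above). With `cds=True`, a sequence that breaks the rules of a complete coding sequence raises a `TranslationError`.

```python
from Bio.Seq import Seq
from Bio.Data.CodonTable import TranslationError

# A complete CDS: start codon, whole codons only, one final stop codon
complete_cds = Seq("ATGGCCATTGTAATGGGCCGCTAA")
protein = complete_cds.translate(cds=True)  # the stop codon is not included

# A sequence with an extra in-frame stop codon (TGA) in the middle
with_internal_stop = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
up_to_stop = with_internal_stop.translate(to_stop=True)  # stops at the first stop codon

try:
    with_internal_stop.translate(cds=True)
    cds_error = None
except TranslationError as err:
    cds_error = str(err)  # cds=True rejects the extra stop codon

print(protein, up_to_stop, cds_error)
```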

# Exercises
- 5.2.1 +
- 5.3.1 +
- 5.7 +++ 

# `SeqRecord`-object
- `Seq` = sequence 
- `SeqRecord` = `Seq` + metadata
- `SeqRecord`
    - Main attributes: id & seq
    - Additional attributes: name, description, dbxrefs, features, annotations

In [32]:
# Imports
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

Up until now, we've been using sequence (`Seq`) objects that store a sequence and its alphabet. Biopython allows us to annotate these `Seq` objects with additional information such as an identifier, a name, a description, features and a set of annotations. All of this information is stored in the so-called `SeqRecord` object, which builds on the `Seq` object.
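A `SeqRecord` does not have to come from a file. As a small sketch, the record below is built by hand and rendered as FASTA text; the identifier, name, description and annotation are all made up for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

demo_record = SeqRecord(
    Seq("AGTACACTGGAT"),
    id="demo_001",                              # hypothetical identifier
    name="demo",                                # hypothetical name
    description="hypothetical example record",  # free-text description
)
# Annotations behave like a normal dictionary
demo_record.annotations["source"] = "made up for this sketch"

# Render the record as a FASTA-formatted string
fasta_text = demo_record.format("fasta")
print(fasta_text)
```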

In [33]:
# Read a SeqRecord object from a file with the SeqIO module
from Bio import SeqIO
record = SeqIO.read("data/NC_005816.gb","gb")
print(record)

ID: NC_005816.1
Name: NC_005816
Description: Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence
Database cross-references: Project:58037
Number of features: 41
/molecule_type=DNA
/topology=circular
/data_file_division=BCT
/date=21-JUL-2008
/accessions=['NC_005816']
/sequence_version=1
/gi=45478711
/keywords=['']
/source=Yersinia pestis biovar Microtus str. 91001
/organism=Yersinia pestis biovar Microtus str. 91001
/taxonomy=['Bacteria', 'Proteobacteria', 'Gammaproteobacteria', 'Enterobacteriales', 'Enterobacteriaceae', 'Yersinia']
/references=[Reference(title='Genetics of metabolic variations between Yersinia pestis biovars and the proposal of a new biovar, microtus', ...), Reference(title='Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans', ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]
/comment=PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The 

In this case we read in a GenBank file, *NC_005816.gb*, using the `SeqIO` module. The record is available at [NCBI](https://www.ncbi.nlm.nih.gov/nuccore/NC_005816). The next chapter discusses the `SeqIO` module in detail; here we only use it to read a `SeqRecord` object from a file.

The following elements are present (amongst others):
- **ID**: usually the accession number of the sequence
- **Name**: the more commonly used name of the sequence (often the same as accession number)
- **Description**: a description or expressive name for the sequence
- **Features**: a list of SeqFeature objects with more structured information about the sequence (discussed below)
- **Annotations**: a dictionary of additional information about the sequence. 
- **Seq**: the sequence itself

In [40]:
# ID
print(record.id)
# Name
print(record.name)
# Description
print(record.description)
# Features
#print(record.features)
# Annotations
record.annotations['taxonomy']
# Sequence
record.seq

NC_005816.1
NC_005816
Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence


Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA())

Features and their `SeqFeature` objects are fairly complex in their own right. Each one describes a region of the parent sequence and carries more detailed, structured information about it, in an attempt to capture as much of what is known about the sequence as possible.
This allows us, for example, to extract CDSs from a longer sequence.
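Extracting a CDS can be sketched with a hand-made feature. The parent sequence, the coordinates and the locus tag below are invented for illustration; in a real GenBank record the features are created for you when the file is parsed.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Illustrative parent sequence: 3 flanking letters, a 24-letter CDS, 4 flanking letters
parent = Seq("AAA" + "ATGGCCATTGTAATGGGCCGCTAA" + "GGGG")

# The feature covers positions 3 up to (not including) 27 on the forward strand
cds_feature = SeqFeature(
    location=FeatureLocation(3, 27, strand=1),
    type="CDS",
    qualifiers={"locus_tag": ["DEMO_01"]},  # hypothetical locus tag
)

cds_seq = cds_feature.extract(parent)       # cut the region out of the parent
cds_protein = cds_seq.translate(cds=True)

inside = 10 in cds_feature    # position 10 lies within the feature
outside = 1 in cds_feature    # position 1 does not

print(cds_seq, cds_protein, inside, outside)
```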

Example SeqFeatures:

In [41]:
# Extract features and check whether SNP of interest (4350) is present
my_snp = 4350
record = SeqIO.read("data/NC_005816.gb", "genbank")

for feature in record.features:
    if my_snp in feature: 
        print(f"Feature type: {feature.type}, Locus tag(s):  {feature.qualifiers.get('locus_tag')}")

Feature type: source, Locus tag(s):  None
Feature type: gene, Locus tag(s):  ['YP_pPCP05']
Feature type: CDS, Locus tag(s):  ['YP_pPCP05']


**db_xref** - A list of database cross-references as strings.

**Locus tags** are identifiers applied systematically to every gene in a sequencing project. If two submitters of different genomes use the same systematic names to describe different genes, this can be a source of confusion. Therefore, INSDC maintains a registry of locus tag prefixes to avoid overlap between genome annotation projects. The prefix is then used systematically to give a new unambiguous name to every gene.

# Exercises
- 6.1.1 ++

## Reading, writing and parsing files
- `SeqIO` reads, writes and parses `SeqRecord` objects
- The output of `Bio.SeqIO.parse()` is a `SeqRecord` iterator.
- The output of `Bio.SeqIO.read()` is one `SeqRecord` object

In [42]:
# Import
from Bio import SeqIO

If there is only one record in the file, you can use the `Bio.SeqIO.read()` function instead. It takes the same two arguments as `parse()` and returns a single `SeqRecord` object.
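The difference between the two functions can be sketched without any file on disk by wrapping FASTA text in a `StringIO` handle. The two records are made up for illustration. `read()` raises a `ValueError` when the input holds more than one record.

```python
from io import StringIO
from Bio import SeqIO

# Two made-up FASTA records held in memory
two_records = ">seq1 first\nACGT\n>seq2 second\nGGCC\n"
one_record = ">seq1 first\nACGT\n"

# parse() returns an iterator over all records
ids = [rec.id for rec in SeqIO.parse(StringIO(two_records), "fasta")]

# read() expects exactly one record
single = SeqIO.read(StringIO(one_record), "fasta")

try:
    SeqIO.read(StringIO(two_records), "fasta")
    read_error = None
except ValueError as err:
    read_error = str(err)

print(ids, single.id, read_error)
```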

Example parsing:

In [43]:
# Here we're using the explicit path to a fasta file (in the data folder)
for seq_record in SeqIO.parse("data/ls_orchid.fasta", "fasta"):
    print(seq_record.id)

gi|2765658|emb|Z78533.1|CIZ78533
gi|2765657|emb|Z78532.1|CCZ78532
gi|2765656|emb|Z78531.1|CFZ78531
gi|2765655|emb|Z78530.1|CMZ78530
gi|2765654|emb|Z78529.1|CLZ78529
gi|2765652|emb|Z78527.1|CYZ78527
gi|2765651|emb|Z78526.1|CGZ78526
gi|2765650|emb|Z78525.1|CAZ78525
gi|2765649|emb|Z78524.1|CFZ78524
gi|2765648|emb|Z78523.1|CHZ78523
gi|2765647|emb|Z78522.1|CMZ78522
gi|2765646|emb|Z78521.1|CCZ78521
gi|2765645|emb|Z78520.1|CSZ78520
gi|2765644|emb|Z78519.1|CPZ78519
gi|2765643|emb|Z78518.1|CRZ78518
gi|2765642|emb|Z78517.1|CFZ78517
gi|2765641|emb|Z78516.1|CPZ78516
gi|2765640|emb|Z78515.1|MXZ78515
gi|2765639|emb|Z78514.1|PSZ78514
gi|2765638|emb|Z78513.1|PBZ78513
gi|2765637|emb|Z78512.1|PWZ78512
gi|2765636|emb|Z78511.1|PEZ78511
gi|2765635|emb|Z78510.1|PCZ78510
gi|2765634|emb|Z78509.1|PPZ78509
gi|2765633|emb|Z78508.1|PLZ78508
gi|2765632|emb|Z78507.1|PLZ78507
gi|2765631|emb|Z78506.1|PLZ78506
gi|2765630|emb|Z78505.1|PSZ78505
gi|2765629|emb|Z78504.1|PKZ78504
gi|2765628|emb|Z78503.1|PCZ78503
gi|2765627

Alternatively:

In [44]:
with open("data/ls_orchid.fasta", "r") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        print(record.id)

gi|2765658|emb|Z78533.1|CIZ78533
gi|2765657|emb|Z78532.1|CCZ78532
gi|2765656|emb|Z78531.1|CFZ78531
gi|2765655|emb|Z78530.1|CMZ78530
gi|2765654|emb|Z78529.1|CLZ78529
gi|2765652|emb|Z78527.1|CYZ78527
gi|2765651|emb|Z78526.1|CGZ78526
gi|2765650|emb|Z78525.1|CAZ78525
gi|2765649|emb|Z78524.1|CFZ78524
gi|2765648|emb|Z78523.1|CHZ78523
gi|2765647|emb|Z78522.1|CMZ78522
gi|2765646|emb|Z78521.1|CCZ78521
gi|2765645|emb|Z78520.1|CSZ78520
gi|2765644|emb|Z78519.1|CPZ78519
gi|2765643|emb|Z78518.1|CRZ78518
gi|2765642|emb|Z78517.1|CFZ78517
gi|2765641|emb|Z78516.1|CPZ78516
gi|2765640|emb|Z78515.1|MXZ78515
gi|2765639|emb|Z78514.1|PSZ78514
gi|2765638|emb|Z78513.1|PBZ78513
gi|2765637|emb|Z78512.1|PWZ78512
gi|2765636|emb|Z78511.1|PEZ78511
gi|2765635|emb|Z78510.1|PCZ78510
gi|2765634|emb|Z78509.1|PPZ78509
gi|2765633|emb|Z78508.1|PLZ78508
gi|2765632|emb|Z78507.1|PLZ78507
gi|2765631|emb|Z78506.1|PLZ78506
gi|2765630|emb|Z78505.1|PSZ78505
gi|2765629|emb|Z78504.1|PKZ78504
gi|2765628|emb|Z78503.1|PCZ78503
gi|2765627

Alternatively:

In [None]:
# List comprehension
identifiers = [seq_record.id for seq_record in SeqIO.parse("data/ls_orchid.gbk","genbank")]

# Exercises
- 7.1.1 +

The first thing we'll want to do is read in sequence records. The function we need for that is `Bio.SeqIO.parse()`, which expects two arguments:
1. A path to a file or a file handle (which may also be a handle to data downloaded from the internet)  
2. A lower case string specifying the sequence format. Examples are: clustal, fasta, embl, fastq, genbank or gb, pdb-atom, swiss, uniprot-xml,... You must specify the file format because [*explicit is better than implicit*](https://www.python.org/dev/peps/pep-0020/). 
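Writing works along the same lines: `SeqIO.write()` takes the records, a file name or handle, and the format string, and returns the number of records written. A minimal sketch with two made-up records, written to an in-memory handle and parsed back:

```python
from io import StringIO
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two made-up records for illustration
records = [
    SeqRecord(Seq("ACGTACGT"), id="demo_1", description=""),
    SeqRecord(Seq("GGCCTTAA"), id="demo_2", description=""),
]

out_handle = StringIO()  # stands in for a file opened for writing
n_written = SeqIO.write(records, out_handle, "fasta")

# Parse the written text again to check the round trip
round_trip = list(SeqIO.parse(StringIO(out_handle.getvalue()), "fasta"))

print(n_written, [rec.id for rec in round_trip])
```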

## Parsing from the internet
- Download and parse sequences from internet (NCBI, Swiss-prot, ExPASy, etc.)
- Big files: fetch once and store
- [Entrez](https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html):
    - data retrieval system 
    - provides users access to NCBI's databases: PubMed, GenBank, GEO, and many others.
- Output typically in `XML`

In [50]:
# Imports
from Bio import Entrez
from Bio import SeqIO

In the previous sections, we looked at parsing sequence data from a file (using a filename or handle). As discussed in the introduction of this chapter, it's also possible to download and parse sequences from the internet. Note that just because you can download sequence data and parse it into a SeqRecord object in one go doesn't mean this is a good idea. In general, you should probably download sequences once and save them to a file for reuse.

You can access Entrez from a web browser to manually enter queries, or you can use Biopython's Bio.Entrez module for programmatic access to Entrez. 

- `Entrez.email`
- `Entrez.esearch`
- `Entrez.efetch` 
- `Entrez.read`
- `Entrez.parse`
- ...

In [51]:
# Provide email address
Entrez.email = "hello@its.me"

# Use e-search to search any of the databases of NCBI
with Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]", idtype="acc") as handle:
    records = Entrez.read(handle)

In [52]:
records

{'Count': '542', 'RetMax': '20', 'RetStart': '0', 'IdList': ['MT683624.1', 'MK935187.1', 'MH659838.1', 'MN016934.1', 'NC_045279.1', 'NC_045278.1', 'NC_045400.1', 'MN602053.1', 'MN535015.1', 'MN535014.1', 'KX886268.1', 'KX886267.1', 'KX886266.1', 'KX886265.1', 'KX886264.1', 'KX886263.1', 'KX886262.1', 'KX886261.1', 'KX886260.1', 'KX886259.1'], 'TranslationSet': [{'From': 'Cypripedioideae[Orgn]', 'To': '"Cypripedioideae"[Organism]'}], 'TranslationStack': [{'Term': '"Cypripedioideae"[Organism]', 'Field': 'Organism', 'Count': '6249', 'Explode': 'Y'}, {'Term': 'matK[Gene]', 'Field': 'Gene', 'Count': '190434', 'Explode': 'N'}, 'AND'], 'QueryTranslation': '"Cypripedioideae"[Organism] AND matK[Gene]'}

The output is a dictionary-like record; the accession numbers of the hits are listed under `IdList`.

In [62]:
Entrez.email = "hello@its.me"

with Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", id="6273291") as handle:
    record = SeqIO.read(handle, "fasta")
record

SeqRecord(seq=Seq('TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAA...AGA', SingleLetterAlphabet()), id='AF191665.1', name='AF191665.1', description='AF191665.1 Opuntia marenae rpl16 gene; chloroplast gene for chloroplast product, partial intron sequence', dbxrefs=[])

In [63]:
Entrez.email = "hello@its.me"

with Entrez.efetch(db="nucleotide", id="6273291", retmode="xml", rettype="fasta") as handle:
    record = Entrez.read(handle)
record

[{'TSeq_seqtype': StringElement('', attributes={'value': 'nucleotide'}), 'TSeq_accver': 'AF191665.1', 'TSeq_taxid': '106980', 'TSeq_orgname': 'Grusonia marenae', 'TSeq_defline': 'Opuntia marenae rpl16 gene; chloroplast gene for chloroplast product, partial intron sequence', 'TSeq_length': '902', 'TSeq_sequence': 'TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAATCTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTAATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAAGAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTCATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGATGTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCTATGNTCATTGGAAGGATGGCGGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAAACTAAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATTGCTCATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGACAAACTATATATATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAAAAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATGTATATATTAATTCAATATTATTATTCTA

First, we have to identify ourselves by setting `Entrez.email`. Then we tell `Entrez.efetch()` which database to access (`nucleotide`), the return type (`fasta`), the return mode (`text` or `xml`) and the id of the record. The result is returned as a handle, which we subsequently read with `Bio.SeqIO.read()` (for text) or `Entrez.read()` (for XML).
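The earlier advice to download sequences once and save them for reuse could be sketched as follows. The helper name `fetch_fasta_once` and the cache path in the example call are hypothetical; only `Entrez.efetch`, `SeqIO.read` and `SeqIO.write` are Biopython functions.

```python
import os
from Bio import Entrez, SeqIO

def fetch_fasta_once(accession, cache_path, email):
    """Return a SeqRecord, contacting NCBI only if cache_path does not exist yet."""
    if not os.path.exists(cache_path):
        Entrez.email = email  # identify ourselves to NCBI
        with Entrez.efetch(db="nucleotide", id=accession,
                           rettype="fasta", retmode="text") as handle:
            record = SeqIO.read(handle, "fasta")
        # Store a local copy so later calls skip the download
        SeqIO.write(record, cache_path, "fasta")
    return SeqIO.read(cache_path, "fasta")

# Example call (needs network access on the first run only; the path is hypothetical):
# record = fetch_fasta_once("AF191665.1", "data/AF191665.fasta", "hello@its.me")
```

On the second and later calls the function reads the local file, so the NCBI servers are not queried again.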

# Exercises
- Sickle cell ++ (long)

## BLAST in Biopython
- e.g. find out which organism a sequence belongs to
- Function `qblast()` part of `Bio.Blast.NCBIWWW`

The last thing we will see today is BLAST, which stands for Basic Local Alignment Search Tool. Imagine that you obtained a sequence in the lab: BLAST finds regions of similarity between your sequence and a database full of sequences by trying to align your sequence to the ones in the database. Together with each alignment it also calculates a statistical significance (the E-value), which expresses how many hits of similar quality you would expect to find by chance. This matters because you can expect multiple possible alignments, some more likely than others (based on mismatches in the alignment).

The function that we will use for this is `qblast()`, which is part of `Bio.Blast.NCBIWWW`.

In [64]:
# Imports
from Bio.Blast import NCBIWWW

- Three arguments: 
    1. blast program: blastn, blastp, blastx, etc
    2. databases: e.g. nt for nucleotide
    3. query: a sequence, a sequence in FASTA format, or an identifier

In [65]:
# Don't run this all at once
result_handle = NCBIWWW.qblast("blastn", "nt", "8332116")
#result_handle = NCBIWWW.qblast("blastn", "nt", fasta_string)
#result_handle = NCBIWWW.qblast("blastn", "nt", record.seq)

- Takes a long time to run

- Output in handle object (by default XML)
- Next step:
    - read/parse XML output into Python objects
    - Save a local copy
- The handle can only be read once: calling `read()` again returns an empty string.

Saving a local copy is especially useful when debugging code that extracts info from the BLAST results (because re-running the online search is slow and wastes NCBI computer time).
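The read-once behaviour is easy to see with a stand-in handle. In the sketch below a `StringIO` object plays the role of `result_handle` (no BLAST search is run), and the file name is hypothetical.

```python
import os
import tempfile
from io import StringIO

# Stand-in for the handle returned by qblast(); the content is a placeholder
result_handle = StringIO("<BlastOutput>stand-in for real BLAST XML</BlastOutput>")

first_read = result_handle.read()    # returns the full content
second_read = result_handle.read()   # handle is exhausted: empty string

# Save what was read the first time, then work from the file
xml_path = os.path.join(tempfile.mkdtemp(), "my_blast_demo.xml")
with open(xml_path, "w") as out_handle:
    out_handle.write(first_read)

with open(xml_path) as saved:
    reread = saved.read()            # the file can be reopened as often as needed

print(repr(second_read), reread == first_read)
```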

Parse XML output into Python objects:

In [None]:
from Bio.Blast import NCBIXML

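# Option 1: iterate over one or more BLAST results
blast_records = NCBIXML.parse(result_handle)

# Alternatively, for only one BLAST result
# (use this instead of parse(): the handle can only be read once)
blast_record = NCBIXML.read(result_handle)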

Save a local copy:

In [None]:
# Option 2: save the raw XML to a file, then read it back from disk
with open("file.xml", "w") as out_handle:
    out_handle.write(result_handle.read())
    
with open("file.xml", "r") as fh:
    my_blast = fh.read()

The Blast record holds the information of the BLAST output, e.g.:
- Description: information about one hit description,
- Alignment: information about one alignment hit, 
- HSP: information about one HSP. 
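A common next step is to keep only the significant hits. The sketch below walks the alignments and HSPs of a BLAST record and filters on the E-value. It assumes the attribute names `alignments`, `hsps`, `title`, `score` and `expect` of the records returned by `NCBIXML`; the function name, the threshold and the stand-in record built with `SimpleNamespace` are hypothetical and only there so the sketch runs without a BLAST search.

```python
from types import SimpleNamespace

def significant_hits(blast_record, e_value_threshold=0.001):
    """Collect (title, score, e-value) for every HSP below the threshold."""
    hits = []
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_threshold:
                hits.append((alignment.title, hsp.score, hsp.expect))
    return hits

# Stand-in record with made-up values, mimicking the structure of a parsed result
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="hit A", hsps=[SimpleNamespace(score=120.0, expect=1e-30)]),
    SimpleNamespace(title="hit B", hsps=[SimpleNamespace(score=18.0, expect=2.5)]),
])

demo_hits = significant_hits(demo_record)
print(demo_hits)
```

With a real result, `blast_record` would be one of the records yielded by `NCBIXML.parse()`.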


In [67]:
from Bio.Blast import NCBIXML
# This is a blast result output that has been saved in a file, called my_blast.xml
with open("data/my_blast.xml", "r") as fh:
    # Read in records
    blast_records = NCBIXML.parse(fh)
    for blast_record in blast_records:
        # Extract information!
        for descr in blast_record.descriptions:
            title = descr.title
            score = descr.score
            e = descr.e
            print(title, score, e)

gi|45357364|gb|AE017046.1| Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence 19218.0 0.0
gi|311902116|gb|HM807366.1| Yersinia pestis strain C790 plasmid pPCP1, complete sequence 19196.0 0.0
gi|262363963|gb|CP001588.1| Yersinia pestis D106004 plasmid pPCY1, complete sequence 19189.0 0.0
gi|1518099693|gb|CP033698.1| Yersinia pestis strain FDAARGOS_601 plasmid unnamed2, complete sequence 19187.0 0.0
gi|1046916732|gb|CP016276.1| Yersinia pestis strain Cadman plasmid pPCP1, complete sequence 19187.0 0.0
gi|294352540|gb|CP001596.1| Yersinia pestis Z176003 plasmid pPCP1, complete sequence 19187.0 0.0
gi|5763810|emb|AL109969.1| Yersinia pestis CO92 plasmid pPCP1 19187.0 0.0
gi|2996216|gb|AF053945.1| Yersinia pestis strain KIM5 plasmid pPCP1, complete sequence >gi|311902126|gb|HM807367.1| Yersinia pestis strain C2614 plasmid pPCP1, complete sequence >gi|311902136|gb|HM807368.1| Yersinia pestis strain C2944 plasmid pPCP1, complete sequence 19185.0 0.0
gi|162350793|gb|CP