## Use ExPASy programmatically

The next two lines import the Biopython libraries needed to:

1. Access ExPASy (`ExPASy`)

2. Read and write sequence records (`SeqIO`)

In [1]:
from Bio import ExPASy
from Bio import SeqIO

Now, we will fetch one entry (with accession number P40337) from ExPASy. The call returns an open connection to the entry's raw text, which we store in a variable that we will name *handle*.

In [2]:
handle = ExPASy.get_sprot_raw("P40337")

Next, we will read the information from *handle* and store it in another variable (*Record*). We specify that the entry we are dealing with is in Swiss-Prot format (*swiss*).

In [3]:
Record = SeqIO.read(handle, "swiss")

As we are done reading data from ExPASy, we close the connection.

In [4]:
handle.close()
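
Cells 2 to 4 open, read, and close the handle in three separate steps. A minimal sketch of the same steps using a `with` block, which closes the connection even if parsing fails, might look like the following. It assumes a recent Biopython release on Python 3, where the returned handle can be used as a context manager; `fetch_swissprot_record` is a hypothetical helper name, not part of Biopython.

```python
from Bio import ExPASy
from Bio import SeqIO


def fetch_swissprot_record(accession):
    """Download one Swiss-Prot entry and parse it into a SeqRecord.

    Hypothetical helper for illustration; it needs a network connection.
    """
    # The with block closes the handle when it exits, even on an error.
    with ExPASy.get_sprot_raw(accession) as handle:
        return SeqIO.read(handle, "swiss")
```

Calling `fetch_swissprot_record("P40337")` should then give a record equivalent to *Record* above, with no separate `handle.close()` cell.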

Finally, we will print information from the retrieved record:
- ID
- Name
- Description
- Sequence (in FASTA format)
- Length of the amino acid sequence
- Organism

In [5]:
print(Record.id)
print(Record.name)
print(Record.description)
print(Record.format("fasta"))
print("Length %i " % len(Record))
print(Record.annotations["organism"])

P40337
VHL_HUMAN
RecName: Full=von Hippel-Lindau disease tumor suppressor; AltName: Full=Protein G7; AltName: Full=pVHL;
>P40337 RecName: Full=von Hippel-Lindau disease tumor suppressor; AltName: Full=Protein G7; AltName: Full=pVHL;
MPRRAENWDEAEVGAEEAGVEEYGPEEDGGEESGAEESGPEESGPEELGAEEEMEAGRPR
PVLRSVNSREPSQVIFCNRSPRVVLPVWLNFDGEPQPYPTLPPGTGRRIHSYRGHLWLFR
DAGTHDGLLVNQTELFVPSLNVDGQPIFANITLPVYTLKERCLQVVRSLVKPENYRRLDI
VRSLYEDLEDHPNVQKDLERLTQERIAHQRMGD

Length 213 
Homo sapiens (Human)
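
The FASTA text printed above has a simple layout: one header line starting with `>` (the ID, a space, then the description), followed by the sequence wrapped over several lines. To make that layout concrete, here is a small sketch that splits the same text by hand using only the Python standard library. `parse_single_fasta` is a hypothetical helper written for this illustration; in practice `SeqIO` does this work for you.

```python
# FASTA text copied from the output of the previous cell.
fasta_text = """>P40337 RecName: Full=von Hippel-Lindau disease tumor suppressor; AltName: Full=Protein G7; AltName: Full=pVHL;
MPRRAENWDEAEVGAEEAGVEEYGPEEDGGEESGAEESGPEESGPEELGAEEEMEAGRPR
PVLRSVNSREPSQVIFCNRSPRVVLPVWLNFDGEPQPYPTLPPGTGRRIHSYRGHLWLFR
DAGTHDGLLVNQTELFVPSLNVDGQPIFANITLPVYTLKERCLQVVRSLVKPENYRRLDI
VRSLYEDLEDHPNVQKDLERLTQERIAHQRMGD
"""


def parse_single_fasta(text):
    """Split a one-record FASTA string into (identifier, description, sequence)."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    # The header is everything after '>'; the ID ends at the first space.
    identifier, _, description = lines[0][1:].partition(" ")
    # The remaining lines are pieces of one sequence; join them back together.
    sequence = "".join(lines[1:])
    return identifier, description, sequence


identifier, description, sequence = parse_single_fasta(fasta_text)
print(identifier)
print("Length %i" % len(sequence))
```

The length computed this way matches the value reported by `len(Record)`.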


Suppose that we prefer the GenBank format to the simple FASTA format. We just copy the code from the previous cell and replace "*fasta*" with "*gb*". Try it:

In [6]:
print(Record.id)
print(Record.name)
print(Record.description)
print(Record.format("gb"))
print("Length %i " % len(Record))
print(Record.annotations["organism"])

P40337
VHL_HUMAN
RecName: Full=von Hippel-Lindau disease tumor suppressor; AltName: Full=Protein G7; AltName: Full=pVHL;
LOCUS       VHL_HUMAN                213 aa                     UNK 01-FEB-1995
DEFINITION  RecName: Full=von Hippel-Lindau disease tumor suppressor; AltName:
            Full=Protein G7; AltName: Full=pVHL;.
ACCESSION   P40337
VERSION     P40337
DBLINK      EMBL: AF010238
            EMBL: L15409
            EMBL: AK315799
            EMBL: AC034193
            EMBL: CH471055
            EMBL: BC058831
            EMBL: U54612
            EMBL: X96489
            CCDS: CCDS2597.1
            CCDS: CCDS2598.1
            PIR: I38926
            RefSeq: NP_000542.1
            RefSeq: NP_937799.1
            PDB: 1LM8
            PDB: 1LQB
            PDB: 1VCB
            PDB: 3ZRC
            PDB: 3ZRF
            PDB: 3ZTC
            PDB: 3ZTD
            PDB: 3ZUN
            PDB: 4AJY
            PDB: 4AWJ
            PDB: 4B95
            PDB: 4B9K
            

As you can see, a very convenient feature of a Jupyter notebook is that very long output is confined to a scrollable box: you see a small part of it and can navigate inside the box without affecting the rest of the notebook.
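
Instead of copying the whole cell for each format, the format name can be kept in a loop variable. The sketch below shows the idea on a small record built in memory, so it runs without a network connection. `demo_record` is a hypothetical stand-in that holds only the first 60 residues of P40337, and the `molecule_type` annotation assumes Biopython 1.78 or later, which needs it to write GenBank output.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical stand-in for the downloaded record: first 60 residues only.
demo_record = SeqRecord(
    Seq("MPRRAENWDEAEVGAEEAGVEEYGPEEDGGEESGAEESGPEESGPEELGAEEEMEAGRPR"),
    id="P40337",
    name="VHL_HUMAN",
    description="von Hippel-Lindau disease tumor suppressor (first 60 residues)",
    # The GenBank writer needs to know that this is a protein sequence.
    annotations={"molecule_type": "protein"},
)

# Render the same record once per format name.
formatted = {fmt: demo_record.format(fmt) for fmt in ("fasta", "gb")}

for fmt, text in formatted.items():
    print("--- %s ---" % fmt)
    print(text)
```

With the record fetched earlier, the same loop would work by replacing `demo_record` with *Record*.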