In [1]:
import warnings
warnings.filterwarnings(action='ignore')

# Chapter 10 Swiss-Prot and ExPASy

## 10.1 Parsing Swiss-Prot files

Swiss-Prot (https://web.expasy.org/docs/swiss-prot_guideline.html) is a hand-curated database of protein sequences. Biopython can parse the “plain text” Swiss-Prot file format, which is still used for the UniProt Knowledgebase, which combines Swiss-Prot, TrEMBL and PIR-PSD.

Although in the following we focus on the older, human-readable plain text format, Bio.SeqIO can read both this and the newer UniProt XML file format for annotated protein sequences.

### 10.1.1 Parsing Swiss-Prot records

In Section 5.3.2, we described how to extract the sequence of a Swiss-Prot record as a SeqRecord object. Alternatively, you can store the Swiss-Prot record in a Bio.SwissProt.Record object, which stores the complete information contained in the Swiss-Prot record. In this section, we describe how to extract Bio.SwissProt.Record objects from a Swiss-Prot file.

To parse a Swiss-Prot record, we first get a handle to a Swiss-Prot record. There are several ways to do so, depending on where and how the Swiss-Prot record is stored:

In [2]:
# Open a Swiss-Prot file locally:
handle = open("uniprot_sprot.dat")

In [3]:
# Open a gzipped Swiss-Prot file:
import gzip
handle = gzip.open("uniprot_sprot.dat.gz", "rt")

In [4]:
# Open a Swiss-Prot file over the internet:
from urllib.request import urlopen
url = "https://raw.githubusercontent.com/biopython/biopython/master/Tests/SwissProt/F2CXE6.txt"
handle = urlopen(url)

In [5]:
# read(): To read one Swiss-Prot record from the handle
from Bio import SwissProt
record = SwissProt.read(handle)

In [6]:
print(record.description)

SubName: Full=Plasma membrane intrinsic protein {ECO:0000313|EMBL:BAN04711.1}; SubName: Full=Predicted protein {ECO:0000313|EMBL:BAJ87517.1};


In [7]:
for ref in record.references:
    print("authors:", ref.authors)
    print("title:", ref.title)

authors: Matsumoto T., Tanaka T., Sakai H., Amano N., Kanamori H., Kurita K., Kikuta A., Kamiya K., Yamamoto M., Ikawa H., Fujii N., Hori K., Itoh T., Sato K.
title: Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries.
authors: Shibasaka M., Sasano S., Utsugi S., Katsuhara M.
title: Functional characterization of a novel plasma membrane intrinsic protein2 in barley.
authors: Shibasaka M., Katsuhara M., Sasano S.
title: 


In [8]:
print(record.organism_classification)

['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'Liliopsida', 'Poales', 'Poaceae', 'BEP clade', 'Pooideae', 'Triticeae', 'Hordeum']


#### How to parse a file that contains more than one Swiss-Prot record

To parse a file that contains more than one Swiss-Prot record, we use the parse function instead. This function allows us to iterate over the records in the file.

For example, let’s parse the full Swiss-Prot database and collect all the descriptions. You can download this from the [ExPASy FTP site](ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz) as a single gzipped file, `uniprot_sprot.dat.gz` (about 300MB). This archive contains a single file, `uniprot_sprot.dat` (over 1.5GB).

As described at the start of this section, you can use the Python library `gzip` to open and uncompress a `.gz` file, like this:

In [9]:
import gzip
handle = gzip.open("./uniprot_sprot.dat.gz","rt")

However, uncompressing a large file takes time, and each time you open the file for reading in this way, it has to be decompressed on the fly. So, if you can spare the disk space you’ll save time in the long run if you first decompress the file to disk, to get the `uniprot_sprot.dat` file inside. Then you can open the file for reading as usual:

In [10]:
handle = open("uniprot_sprot.dat")
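
The one-time decompression to disk described above can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `decompress_gzip` and the throwaway file names are illustrative assumptions, not part of Biopython.

```python
import gzip
import os
import shutil
import tempfile


def decompress_gzip(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a .gz file to dst_path, one chunk at a time."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


# Self-contained demonstration on a small throwaway file (hypothetical content).
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "example.dat.gz")
dat_path = os.path.join(workdir, "example.dat")
with gzip.open(gz_path, "wt") as out:
    out.write("ID   EXAMPLE\n//\n")

decompress_gzip(gz_path, dat_path)
with open(dat_path) as handle:
    print(handle.read())
```

For the real database, the equivalent call would be `decompress_gzip("uniprot_sprot.dat.gz", "uniprot_sprot.dat")`. Copying in chunks keeps memory use small even though the uncompressed file is over 1.5GB.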

In [11]:
# parse() with list comprehension
from Bio import SwissProt
handle = open("uniprot_sprot.dat") # gzip.open("uniprot_sprot.dat.gz", "rt")
descriptions = [record.description for record in SwissProt.parse(handle)]
len(descriptions)

569213

In [12]:
descriptions[:5]

['RecName: Full=Putative transcription factor 001R;',
 'RecName: Full=Uncharacterized protein 002L;',
 'RecName: Full=Uncharacterized protein 002R;',
 'RecName: Full=Uncharacterized protein 003L;',
 'RecName: Full=Uncharacterized protein 3R; Flags: Precursor;']
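
The description strings shown above pack several fields into one line. As a rough sketch, the `Full=` names can be pulled out with a regular expression; the helper `full_names` is a hypothetical name, and the pattern assumes that names contain neither `;` nor `{`, which holds for the examples in this chapter but is not guaranteed in general.

```python
import re


def full_names(description):
    """Return every 'Full=' name in a description, without evidence tags."""
    # Stop at ';' (end of the field) or '{' (start of an ECO evidence tag).
    return [name.strip() for name in re.findall(r"Full=([^;{]+)", description)]


examples = [
    "RecName: Full=Putative transcription factor 001R;",
    "RecName: Full=Uncharacterized protein 3R; Flags: Precursor;",
    "SubName: Full=Plasma membrane intrinsic protein {ECO:0000313|EMBL:BAN04711.1}; "
    "SubName: Full=Predicted protein {ECO:0000313|EMBL:BAJ87517.1};",
]
for description in examples:
    print(full_names(description))
```

Applied to the `descriptions` list built above, `[full_names(d) for d in descriptions]` would give one list of names per record.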

In [13]:
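# Or, using a for loop over the record iterator
from Bio import SwissProt
descriptions = []

# Reopen the file: the handle used above has already been read to the end,
# so iterating over it again would yield no records.
with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        descriptions.append(record.description)

In [14]:
len(descriptions)

569213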

In [15]:
dir(record)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'accessions',
 'annotation_update',
 'comments',
 'created',
 'cross_references',
 'data_class',
 'description',
 'entry_name',
 'features',
 'gene_name',
 'host_organism',
 'host_taxonomy_id',
 'keywords',
 'molecule_type',
 'organelle',
 'organism',
 'organism_classification',
 'protein_existence',
 'references',
 'seqinfo',
 'sequence',
 'sequence_length',
 'sequence_update',
 'taxonomy_id']

### 10.1.2 Parsing the Swiss-Prot keyword and category list
Swiss-Prot also distributes a file `keywlist.txt`, which lists the keywords and categories used in Swiss-Prot. The file contains entries in the following form:

```
ID   2Fe-2S.
AC   KW-0001
DE   Protein which contains at least one 2Fe-2S iron-sulfur cluster: 2 iron
DE   atoms complexed to 2 inorganic sulfides and 4 sulfur atoms of
DE   cysteines from the protein.
SY   Fe2S2; [2Fe-2S] cluster; [Fe2S2] cluster; Fe2/S2 (inorganic) cluster;
SY   Di-mu-sulfido-diiron; 2 iron, 2 sulfur cluster binding.
GO   GO:0051537; 2 iron, 2 sulfur cluster binding
HI   Ligand: Iron; Iron-sulfur; 2Fe-2S.
HI   Ligand: Metal-binding; 2Fe-2S.
CA   Ligand.
//
ID   3D-structure.
AC   KW-0002
DE   Protein, or part of a protein, whose three-dimensional structure has
DE   been resolved experimentally (for example by X-ray crystallography or
DE   NMR spectroscopy) and whose coordinates are available in the PDB
DE   database. Can also be used for theoretical models.
HI   Technical term: 3D-structure.
CA   Technical term.
//
ID   3Fe-4S.
...
```

The entries in this file can be parsed by the parse function in the `Bio.SwissProt.KeyWList` module. Each entry is then stored as a `Bio.SwissProt.KeyWList.Record`, which is a Python dictionary.

In [16]:
from Bio.SwissProt import KeyWList
handle = open("keywlist.txt")
records = KeyWList.parse(handle)
for record in records:
    print(record["ID"])
    print(record["DE"])

2Fe-2S.
Protein which contains at least one 2Fe-2S iron-sulfur cluster: 2 iron atoms complexed to 2 inorganic sulfides and 4 sulfur atoms of cysteines from the protein.


## 10.2 Parsing Prosite records

Prosite is a database containing protein domains, protein families, functional sites, as well as the patterns and profiles to recognize them. Prosite was developed in parallel with Swiss-Prot. In Biopython, a Prosite record is represented by the `Bio.ExPASy.Prosite.Record` class, whose members correspond to the different fields in a Prosite record.

In general, a Prosite file can contain more than one Prosite record. For example, the full set of Prosite records, which can be downloaded as a single file (`prosite.dat`) from the [ExPASy FTP site](ftp://ftp.expasy.org/databases/prosite/prosite.dat), contained 2073 records as of version 20.24 (released on 4 December 2007). To parse such a file, we again make use of an iterator:

In [17]:
from Bio.ExPASy import Prosite
handle = open("prosite.dat")
records = Prosite.parse(handle)
record = next(records)

In [18]:
record.accession

'PS00001'

In [19]:
record.name

'ASN_GLYCOSYLATION'

In [20]:
record.pdoc

'PDOC00001'

In [21]:
record = next(records)

In [22]:
record.accession

'PS00004'

In [23]:
record.name

'CAMP_PHOSPHO_SITE'

In [24]:
record.pdoc

'PDOC00004'

In [25]:
record = next(records)

In [26]:
record.accession

'PS00005'

In [27]:
record.name

'PKC_PHOSPHO_SITE'

In [28]:
record.pdoc

'PDOC00005'

If you’re interested in how many Prosite records there are, you could use

In [29]:
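from Bio.ExPASy import Prosite
handle = open("prosite.dat")
records = Prosite.parse(handle)

# Count the records, skipping any entry the parser rejects with a ValueError
# (for example, the entry containing "DT   PADR1 domain profile.").
n = 0
while True:
    try:
        record = next(records)
    except StopIteration:
        break
    except ValueError:
        continue
    n += 1

# If every record parses cleanly, a plain loop is enough:
# for record in records:
#     n += 1

print(n)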

2633


To read exactly one Prosite record from the handle, you can use the `read` function:

* `mysingleprositerecord.dat`
```
//
ID   GLA_1; PATTERN.
AC   PS00011;
DT   01-APR-1990 CREATED; 01-MAY-2005 DATA UPDATE; 22-FEB-2023 INFO UPDATE.
DE   Vitamin K-dependent carboxylation domain.
PA   E-x(2)-[ERK]-E-x-C-x(6)-[EDR]-x(10,11)-[FYA]-[YW].
NR   /RELEASE=2023_01,569213;
NR   /TOTAL=120(120); /POSITIVE=118(118); /UNKNOWN=0(0); /FALSE_POS=2(2);
NR   /FALSE_NEG=3; /PARTIAL=6;
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=1;
CC   /VERSION=2;
DR   Q1L659    , FA101_PSETE, T; Q1L658    , FA102_PSETE, T;
DR   P00743    , FA10_BOVIN , T; P25155    , FA10_CHICK , T;
DR   P00742    , FA10_HUMAN , T; O88947    , FA10_MOUSE , T;
DR   O19045    , FA10_RABIT , T; Q63207    , FA10_RAT   , T;
DR   Q4QXT9    , FA10_TROCA , T; P22457    , FA7_BOVIN  , T;
DR   P08709    , FA7_HUMAN  , T; P70375    , FA7_MOUSE  , T;
DR   Q2F9P4    , FA7_PANPA  , T; Q2F9P2    , FA7_PANTR  , T;
DR   P98139    , FA7_RABIT  , T; Q8K3U6    , FA7_RAT    , T;
DR   P00741    , FA9_BOVIN  , T; P19540    , FA9_CANLF  , T;
DR   Q804X6    , FA9_CHICK  , T; Q6SA95    , FA9_FELCA  , T;
DR   P00740    , FA9_HUMAN  , T; P16294    , FA9_MOUSE  , T;
DR   Q95ND7    , FA9_PANTR  , T; P16293    , FA9_PIG    , T;
DR   P16296    , FA9_RAT    , T; Q58L95    , FAXC_OXYMI , T;
DR   Q58L96    , FAXC_OXYSU , T; Q56VR3    , FAXC_PSETE , T;
DR   A6MFK7    , FAXD1_DEMVE, T; P82807    , FAXD1_NOTSC, T;
DR   P00744    , PROZ_BOVIN , T; P22891    , PROZ_HUMAN , T;
DR   Q9CQW3    , PROZ_MOUSE , T; P00735    , THRB_BOVIN , T;
DR   P00734    , THRB_HUMAN , T; P19221    , THRB_MOUSE , T;
DR   Q19AZ8    , THRB_PIG   , T; Q5R537    , THRB_PONAB , T;
DR   P18292    , THRB_RAT   , T; A7Z070    , TMG1_BOVIN , T;
DR   O14668    , TMG1_HUMAN , T; Q5RCB6    , TMG1_PONAB , T;
DR   O14669    , TMG2_HUMAN , T; Q8R182    , TMG2_MOUSE , T;
DR   Q9BZD7    , TMG3_HUMAN , T; Q6PAQ9    , TMG3_MOUSE , T;
DR   Q9BZD6    , TMG4_HUMAN , T; Q8BGN6    , TMG4_MOUSE , T;
DR   P0CY52    , FAXD_NOTSN , P; P83347    , MGP_PRIGL  , P;
DR   P86312    , OSTCN_CAMHE, P; P83473    , OSTCN_HALDD, P;
DR   P84351    , OSTCN_HOMNE, P; P84122    , THRB_SALSA , P;
DR   Q800Y2    , MGP_ARGRE  , N; A0A024QYT3, OST2A_ONCMY, N;
DR   K9J977    , OST2B_ONCMY, N;
DR   O51378    , KAD_BORBU  , F; Q661K1    , KAD_BORGP  , F;
3D   1CFH; 1CFI; 1DAN; 1EZQ; 1F0R; 1F0S; 1IOD; 1J34; 1J35; 1KSN; 1LPG; 1LPK;
3D   1LPZ; 1LQD; 1MGX; 1NFU; 1NFW; 1NFX; 1NFY; 1NL0; 1NL1; 1NL2; 1O5D; 1P0S;
3D   1PFX; 1Q3M; 1Q8H; 1VZM; 1W0Y; 1W2K; 1WHE; 1WHF; 1WQV; 1WSS; 1WTG; 1WUN;
3D   1WV7; 1X7A; 1Z6J; 2A2Q; 2AEI; 2AER; 2B7D; 2B8O; 2BOH; 2C4F; 2CJI; 2EC9;
3D   2F9B; 2FIR; 2FLB; 2FLR; 2J2U; 2J34; 2J38; 2J4I; 2J94; 2J95; 2PF1; 2PF2;
3D   2SPT; 2UWL; 2UWO; 2UWP; 2VH0; 2VH6; 2WYG; 2WYJ; 2Y7X; 2Y7Z; 2Y80; 2Y81;
3D   2Y82; 2ZP0; 2ZWL; 2ZZU; 3ELA; 3TH2; 3TH3; 3TH4; 4BXS; 4BXW; 4IBL; 4MZZ;
3D   4NZQ; 4O03; 4Y6D; 4Y71; 4Y76; 4Y79; 4Y7A; 4Y7B; 4YLQ; 4ZH8; 4ZHA; 4ZMA;
3D   5EDK; 5EDM; 6BJR; 6C2W; 6M3B; 6M3C; 7TPP; 7TPQ;
DO   PDOC00011;
```

In [30]:
from Bio.ExPASy import Prosite
handle = open("mysingleprositerecord.dat")
record = Prosite.read(handle)

## 10.3 Parsing Prosite documentation records

We use the parser in `Bio.ExPASy.Prodoc` to parse Prosite documentation records. For example, to create a list of all accession numbers of Prosite documentation records, you can use:
* Download test file : [`prosite.doc`](https://ftp.expasy.org/databases/prosite/prosite.doc)

In [31]:
from Bio.ExPASy import Prodoc
handle = open("prosite.doc")
records = Prodoc.parse(handle)

# If every record parses cleanly, a list comprehension is enough:
# accessions = [record.accession for record in records]

# Otherwise, skip entries the parser rejects with a ValueError
# (for example, "Line does not start with '{PDOC':").
accessions = []
while True:
    try:
        record = next(records)
    except StopIteration:
        break
    except ValueError:
        continue
    accessions.append(record.accession)

print(accessions)

['PDOC00000', 'PDOC00001', 'PDOC00004', 'PDOC00005', 'PDOC00006', 'PDOC00007', 'PDOC00008', 'PDOC00009', 'PDOC00010', 'PDOC00011', 'PDOC00012', 'PDOC00013', 'PDOC00014', 'PDOC00015', 'PDOC00016', 'PDOC00017', 'PDOC00018', 'PDOC00019', 'PDOC00020', 'PDOC00021', 'PDOC00022', 'PDOC00023', 'PDOC00024', 'PDOC00025', 'PDOC00026', 'PDOC00027', 'PDOC00028', 'PDOC00029', 'PDOC00030', 'PDOC00031', 'PDOC00032', 'PDOC00033', 'PDOC00034', 'PDOC00035', 'PDOC00036', 'PDOC00037', 'PDOC00038', 'PDOC00039', 'PDOC00040', 'PDOC00041', 'PDOC00042', 'PDOC00043', 'PDOC00044', 'PDOC00045', 'PDOC00046', 'PDOC00047', 'PDOC00048', 'PDOC00049', 'PDOC00050', 'PDOC00051', 'PDOC00052', 'PDOC00053', 'PDOC00054', 'PDOC00055', 'PDOC00056', 'PDOC00057', 'PDOC00058', 'PDOC00059', 'PDOC00060', 'PDOC00061', 'PDOC00062', 'PDOC00063', 'PDOC00064', 'PDOC00065', 'PDOC00066', 'PDOC00067', 'PDOC00068', 'PDOC00069', 'PDOC00070', 'PDOC00071', 'PDOC00072', 'PDOC00073', 'PDOC00074', 'PDOC00075', 'PDOC00076', 'PDOC00077', 'PDOC00078'

Again a `read()` function is provided to read exactly one Prosite documentation record from the handle.

## 10.4 Parsing Enzyme records

ExPASy’s Enzyme database is a repository of information on enzyme nomenclature.

In a typical Enzyme record, such as the lipoprotein lipase record parsed below, the first line shows the EC (Enzyme Commission) number and the second line the enzyme name. Alternative names of lipoprotein lipase are "clearing factor lipase", "diacylglycerol lipase", and "diglyceride lipase" (lines 3 through 5). The line starting with "CA" shows the catalytic activity of this enzyme. Comment lines start with "CC". The "PR" line shows references to the Prosite documentation records, and the "DR" lines show references to Swiss-Prot records. Not all of these entries are necessarily present in an Enzyme record.

In Biopython, an Enzyme record is represented by the `Bio.ExPASy.Enzyme.Record` class. This record derives from a Python dictionary and has keys corresponding to the two-letter codes used in Enzyme files. To read an Enzyme file containing one Enzyme record, use the read function in `Bio.ExPASy.Enzyme`:
* Download test file : [lipoprotein.txt](https://github.com/biopython/biopython/blob/master/Tests/Enzymes/lipoprotein.txt)

In [32]:
from Bio.ExPASy import Enzyme
with open("lipoprotein.txt") as handle:
    record = Enzyme.read(handle)

In [33]:
record["ID"]

'3.1.1.34'

In [34]:
record["DE"]

'Lipoprotein lipase.'

In [35]:
record["AN"]

['Clearing factor lipase.', 'Diacylglycerol lipase.', 'Diglyceride lipase.']

In [36]:
record["CA"]

'Triacylglycerol + H(2)O = diacylglycerol + a carboxylate.'

In [37]:
record["PR"]

['PDOC00110']

In [38]:
record["CC"]

['Hydrolyzes triacylglycerols in chylomicrons and very low-density lipoproteins (VLDL).',
 'Also hydrolyzes diacylglycerol.']

In [39]:
record["DR"]

[['P11151', 'LIPL_BOVIN'],
 ['P11153', 'LIPL_CAVPO'],
 ['P11602', 'LIPL_CHICK'],
 ['P55031', 'LIPL_FELCA'],
 ['P06858', 'LIPL_HUMAN'],
 ['P11152', 'LIPL_MOUSE'],
 ['O46647', 'LIPL_MUSVI'],
 ['P49060', 'LIPL_PAPAN'],
 ['P49923', 'LIPL_PIG'],
 ['Q06000', 'LIPL_RAT'],
 ['Q29524', 'LIPL_SHEEP']]

The full set of Enzyme records can be downloaded as a single file (`enzyme.dat`) from the [ExPASy FTP site](ftp://ftp.expasy.org/databases/enzyme/enzyme.dat), containing 4877 records (release of 3 March 2009). To parse such a file containing multiple Enzyme records, use the `parse` function in `Bio.ExPASy.Enzyme` to obtain an iterator:

In [40]:
from Bio.ExPASy import Enzyme
handle = open("enzyme.dat")
records = Enzyme.parse(handle)

We can now iterate over the records one at a time. For example, we can make a list of all EC numbers for which an Enzyme record is available:

In [41]:
ecnumbers = [record["ID"] for record in records]
ecnumbers

['1.1.1.1',
 '1.1.1.2',
 '1.1.1.3',
 '1.1.1.4',
 '1.1.1.5',
 '1.1.1.6',
 '1.1.1.7',
 '1.1.1.8',
 '1.1.1.9',
 '1.1.1.10',
 '1.1.1.11',
 '1.1.1.12',
 '1.1.1.13',
 '1.1.1.14',
 '1.1.1.15',
 '1.1.1.16',
 '1.1.1.17',
 '1.1.1.18',
 '1.1.1.19',
 '1.1.1.20',
 '1.1.1.21',
 '1.1.1.22',
 '1.1.1.23',
 '1.1.1.24',
 '1.1.1.25',
 '1.1.1.26',
 '1.1.1.27',
 '1.1.1.28',
 '1.1.1.29',
 '1.1.1.30',
 '1.1.1.31',
 '1.1.1.32',
 '1.1.1.33',
 '1.1.1.34',
 '1.1.1.35',
 '1.1.1.36',
 '1.1.1.37',
 '1.1.1.38',
 '1.1.1.39',
 '1.1.1.40',
 '1.1.1.41',
 '1.1.1.42',
 '1.1.1.43',
 '1.1.1.44',
 '1.1.1.45',
 '1.1.1.46',
 '1.1.1.47',
 '1.1.1.48',
 '1.1.1.49',
 '1.1.1.50',
 '1.1.1.51',
 '1.1.1.52',
 '1.1.1.53',
 '1.1.1.54',
 '1.1.1.55',
 '1.1.1.56',
 '1.1.1.57',
 '1.1.1.58',
 '1.1.1.59',
 '1.1.1.60',
 '1.1.1.61',
 '1.1.1.62',
 '1.1.1.63',
 '1.1.1.64',
 '1.1.1.65',
 '1.1.1.66',
 '1.1.1.67',
 '1.1.1.68',
 '1.1.1.69',
 '1.1.1.70',
 '1.1.1.71',
 '1.1.1.72',
 '1.1.1.73',
 '1.1.1.74',
 '1.1.1.75',
 '1.1.1.76',
 '1.1.1.77',
 '1.1.1.
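
Once the EC numbers are in a list, simple summaries follow directly. The sketch below counts entries per top-level EC class; the helper name `count_ec_classes` and the three-element sample list are illustrative assumptions, and in practice you would pass the `ecnumbers` list built above.

```python
from collections import Counter

# The seven top-level classes of the Enzyme Commission nomenclature.
EC_CLASS_NAMES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",
}


def count_ec_classes(ecnumbers):
    """Count EC numbers per top-level class (the part before the first dot)."""
    return Counter(ec.split(".")[0] for ec in ecnumbers)


sample = ["1.1.1.1", "1.1.1.2", "3.1.1.34"]
for ec_class, count in sorted(count_ec_classes(sample).items()):
    print(ec_class, EC_CLASS_NAMES.get(ec_class, "unknown"), count)
```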

## 10.5 Accessing the ExPASy server

If you do a search on Swiss-Prot, you can find three orchid proteins for chalcone synthase, with accession numbers O23729, O23730 and O23731. The code below fetches each one from the ExPASy server and parses it:

In [42]:
from Bio import ExPASy
from Bio import SwissProt
accessions = ["O23729", "O23730", "O23731"]
records = []

for accession in accessions:
    handle = ExPASy.get_sprot_raw(accession)
    record = SwissProt.read(handle) # read() : to read one Swiss-Prot record from the handle
    records.append(record)
records

[<Bio.SwissProt.Record at 0x7fe2614f0f10>,
 <Bio.SwissProt.Record at 0x7fe2614f0070>,
 <Bio.SwissProt.Record at 0x7fe292eee970>]

In [43]:
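records = []
for accession in accessions:
    try:
        # A ValueError signals an accession that could not be found or parsed
        handle = ExPASy.get_sprot_raw(accession)
        record = SwissProt.read(handle)
    except ValueError:
        print("WARNING: Accession %s not found" % accession)
        continue  # skip this accession instead of appending a stale record
    records.append(record)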

In [44]:
from Bio import ExPASy
handle = ExPASy.get_prosite_raw("PS00001")
text = handle.read()
print(text)

ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DT   01-APR-1990 CREATED; 01-APR-1990 DATA UPDATE; 01-APR-1990 INFO UPDATE.
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
CC   /SITE=1,carbohydrate;
CC   /SKIP-FLAG=TRUE;
CC   /VERSION=1;
PR   PRU00498;
DO   PDOC00001;
//
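
The `PA` line in the record above uses Prosite pattern syntax. To make that syntax concrete, here is a hand-rolled sketch (not a Biopython function) that translates a pattern into a Python regular expression and lists the matching sites. The names `prosite_to_regex` and `find_sites` and the test sequence are assumptions made for illustration, and only the common syntax is handled: `x`, `[..]`, `{..}`, `(n)`, `(n,m)`, and `<` / `>` outside square brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite PA pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}  -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2) -> x{2}
        element = element.replace("x", ".")                     # x    -> any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", "AANGTAANPSAANKSP"))
```

In the made-up test sequence, only the first asparagine qualifies: the second is followed by a proline, and the third has a proline in the fourth position.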



In [45]:
from Bio import ExPASy
from Bio.ExPASy import Prosite
handle = ExPASy.get_prosite_raw("PS00001")
record = Prosite.read(handle)

In [46]:
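handle = ExPASy.get_prosite_raw("PDOC00001")  # fetch the documentation record; the previous handle is already consumed
record = Prodoc.read(handle)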

In [47]:
from Bio import ExPASy
handle = ExPASy.get_prosite_entry("PS00001")
html = handle.read()
with open("myprositerecord.html", "w") as out_handle:
    out_handle.write(html)

In [48]:
from Bio import ExPASy
handle = ExPASy.get_prodoc_entry("PDOC00001")
html = handle.read()
with open("myprodocrecord.html", "w") as out_handle:
    out_handle.write(html)

## 10.6 Scanning the Prosite database

In [49]:
sequence = (
    "MEHKEVVLLLLLFLKSGQGEPLDDYVNTQGASLFSVTKKQLGAGSIEECAAKCEEDEEFT"
    "CRAFQYHSKEQQCVIMAENRKSSIIIRMRDVVLFEKKVYLSECKTGNGKNYRGTMSKTKN"
)

In [50]:
from Bio.ExPASy import ScanProsite
handle = ScanProsite.scan(seq=sequence)

In [51]:
result = ScanProsite.read(handle)

In [52]:
type(result)

Bio.ExPASy.ScanProsite.Record

In [53]:
result.n_seq

1

In [54]:
result.n_match

1

In [55]:
len(result)

1

In [56]:
result[0]

{'sequence_ac': 'USERSEQ1',
 'start': 16,
 'stop': 98,
 'signature_ac': 'PS50948',
 'score': '8.873',
 'level': '0'}
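
Each hit is a plain dictionary, so it can be combined with the query sequence directly. The sketch below extracts the matched segment for each hit. It assumes that `start` and `stop` are 1-based, inclusive positions, which the output above does not confirm, and it uses a hard-coded copy of the hit shown above so that it runs without a network connection. The helper name `matched_segments` is hypothetical.

```python
sequence = (
    "MEHKEVVLLLLLFLKSGQGEPLDDYVNTQGASLFSVTKKQLGAGSIEECAAKCEEDEEFT"
    "CRAFQYHSKEQQCVIMAENRKSSIIIRMRDVVLFEKKVYLSECKTGNGKNYRGTMSKTKN"
)

# Copy of the hit printed above; with a live result, pass `result` instead.
hits = [
    {
        "sequence_ac": "USERSEQ1",
        "start": 16,
        "stop": 98,
        "signature_ac": "PS50948",
        "score": "8.873",
        "level": "0",
    }
]


def matched_segments(sequence, hits):
    """Return (signature accession, score or None, matched segment) per hit."""
    segments = []
    for hit in hits:
        # Assumption: positions are 1-based and inclusive.
        segment = sequence[hit["start"] - 1 : hit["stop"]]
        # The score arrives as a string and may be absent, so convert defensively.
        score = float(hit["score"]) if "score" in hit else None
        segments.append((hit["signature_ac"], score, segment))
    return segments


for signature, score, segment in matched_segments(sequence, hits):
    print(signature, score, len(segment), segment)
```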

In [57]:
handle = ScanProsite.scan(seq=sequence, lowscore=1)
result = ScanProsite.read(handle)
result.n_match

2