This notebook explores how references (i.e., database prefixes and identifiers) are used in BioPAX models and whether they are valid according to the Bioregistry.

First, we load a BioPAX model.

In [1]:
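import pybiopax

# Alternative: load a full Pathway Commons archive instead of a query result.
# model = pybiopax.model_from_owl_url('https://www.pathwaycommons.org/archives/PC2/v12/PathwayCommons12.pid.BIOPAX.owl.gz')

# Query Pathway Commons for paths from MAPK1 to ELK1 and load the result as a BioPAX model.
model = pybiopax.model_from_pc_query('pathsfromto', source=['MAPK1'], target=['ELK1'])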


Next, we iterate over all references in the model, look up each prefix in the Bioregistry, and check each identifier against the regular-expression pattern of the matching resource.
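To make the idea concrete before using the real registry, here is a minimal, self-contained sketch of the same two checks. It does not call Bioregistry: `TOY_REGISTRY`, `normalize_prefix`, `validate_local_id`, and `check` are hypothetical names, and the patterns, banana, and synonyms in the toy registry are illustrative assumptions that may differ from the actual Bioregistry entries.

```python
import re

# Hypothetical, hard-coded entries for illustration only; the real
# patterns, bananas, and synonyms come from Bioregistry and may differ.
TOY_REGISTRY = {
    "chebi": {
        "pattern": r"^\d+$",
        "banana": "CHEBI",
        "synonyms": set(),
    },
    "hgnc.symbol": {
        "pattern": r"^[A-Za-z0-9-]+$",
        "banana": None,
        "synonyms": {"hgnc symbol"},
    },
}


def normalize_prefix(prefix):
    """Return the canonical toy prefix for a prefix or synonym, or None if unknown."""
    key = prefix.lower()
    for canonical, entry in TOY_REGISTRY.items():
        if key == canonical or key in entry["synonyms"]:
            return canonical
    return None


def validate_local_id(canonical, identifier):
    """Match an identifier against the toy pattern, allowing an optional banana."""
    entry = TOY_REGISTRY[canonical]
    pattern = entry["pattern"]
    if entry["banana"]:
        # Accept both "CHEBI:14791" and "14791" by making the banana optional.
        pattern = f"^({re.escape(entry['banana'])}:)?{pattern.lstrip('^')}"
    return bool(re.match(pattern, identifier))


def check(prefix, identifier):
    """Return (prefix_status, id_valid); id_valid is None for unknown prefixes."""
    canonical = normalize_prefix(prefix)
    if canonical is None:
        return ("UNKNOWN", None)
    prefix_status = "VALID" if canonical == prefix else "NONSTANDARD"
    return (prefix_status, validate_local_id(canonical, identifier))


for pair in [("chebi", "CHEBI:14791"), ("hgnc symbol", "MAPK3"),
             ("uniprot knowledgebase", "P28482")]:
    print(pair, check(*pair))
```

The sketch separates the two questions asked of each reference: whether the prefix is canonical, a known synonym, or unknown, and whether the identifier fits the resource's pattern once an optional embedded prefix is allowed for.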

In [3]:
import re
import bioregistry
from pybiopax.references import get_prefix_id_pairs

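def validate_id(resource, identifier):
    """Return True/False if the identifier matches the resource's pattern, or None if there is no pattern."""
    pattern = resource.get_pattern()
    if pattern is None:
        return None
    # A "banana" is a redundant prefix embedded in the identifier itself
    # (e.g. the "CHEBI:" in "CHEBI:14791"); accept identifiers with or without it.
    banana = resource.get_banana()
    banana_peel = resource.get_banana_peel() or ':'
    if banana:
        # pattern[1:] drops the leading '^' anchor so the optional banana group can precede it.
        pattern = f'^({banana}{banana_peel})?{pattern[1:]}'
    return bool(re.match(pattern, identifier))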


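for prefix, identifier in get_prefix_id_pairs(model):
    # Skip references without a database prefix.
    if not prefix:
        continue
    res = bioregistry.get_resource(prefix)
    if not res:
        print(prefix + '... UNKNOWN PREFIX')
    else:
        id_valid = validate_id(res, identifier)
        # The prefix is standard only if it equals the canonical Bioregistry prefix;
        # synonyms that resolve to a resource are reported as NONSTANDARD.
        prefix_valid = (res.prefix == prefix)
        prefix_status = 'VALID' if prefix_valid else 'NONSTANDARD'
        # Note: a resource without a pattern (id_valid is None) is also reported as INVALID.
        id_status = 'VALID' if id_valid else 'INVALID'
        print(f'{prefix} / {identifier}... {prefix_status} PREFIX / {id_status} ID')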


uniprot knowledgebase... UNKNOWN PREFIX
uniprot knowledgebase... UNKNOWN PREFIX
chebi / CHEBI:14791... VALID PREFIX / VALID ID
pubmed / 12665513... VALID PREFIX / VALID ID
reactome / R-HSA-3009350... VALID PREFIX / VALID ID
biogrid / 111580... VALID PREFIX / VALID ID
reactome / R-ALL-29358... VALID PREFIX / VALID ID
chebi / CHEBI:2359... VALID PREFIX / VALID ID
reactome / R-HSA-112359... VALID PREFIX / VALID ID
hgnc symbol / MAPK6... NONSTANDARD PREFIX / VALID ID
reactome / R-HSA-109845... VALID PREFIX / VALID ID
hgnc symbol / MAPK3... NONSTANDARD PREFIX / VALID ID
chebi / CHEBI:38702... VALID PREFIX / VALID ID
biogrid / 260718... VALID PREFIX / VALID ID
pubmed / 8007980... VALID PREFIX / VALID ID
uniprot knowledgebase... UNKNOWN PREFIX
chebi / CHEBI:37045... VALID PREFIX / VALID ID
pubmed / 21730285... VALID PREFIX / VALID ID
pubmed / 23322625... VALID PREFIX / VALID ID
molecular interactions ontology... UNKNOWN PREFIX
chebi / CHEBI:16761... VALID PREFIX / VALID ID
pubmed / 15899879..