# Constructing BioPAX models from scratch
This notebook shows how PyBioPAX can be used to create new BioPAX models as Python objects either manually or programmatically from scratch. This is different from the workflow in which a BioPAX model is obtained (deserialized) from a file or web service.

The notebook provides examples of the following:
- Creating protein and RNA physical entities
- Constructing protein modifications and attaching them to proteins
- Constructing entity references and attaching them to physical entities
- Creating conversions over physical entities (biochemical reaction, degradation)
- Representing gene expression through template reactions that produce RNA
- Creating controllers over conversions (e.g., regulator of a template reaction)
- Adding the created objects to a `BioPaxModel`
- Serializing and printing the completed model

You can learn more about the BioPAX specification and the semantics of various parts of it here:
- The BioPAX Level 3 specification: http://www.biopax.org/release/biopax-level3-documentation.pdf
- The BioPAX OWL docs: http://www.biopax.org/owldoc/Level3/

Note that in the PyBioPAX implementation, all class parameters and attributes follow the Python naming convention (snake_case), so, for example, the `sequencePosition` attribute in the BioPAX specification is named `sequence_position` in PyBioPAX. This only affects the in-memory representation of BioPAX objects, not the serialized BioPAX XML that PyBioPAX reads and writes.
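To make the naming rule concrete, here is a minimal sketch of how a BioPAX property name maps to its Python-style counterpart. The helper `to_snake_case` is hypothetical and written only for illustration; it is not PyBioPAX's own conversion code, and it does not try to handle names that contain acronyms.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase BioPAX property name to snake_case.

    Illustrative helper only (an assumption, not part of PyBioPAX):
    an underscore is inserted before every uppercase letter that is
    not at the start of the name, then the result is lowercased.
    """
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


# Property names that appear in the BioPAX XML shown in this notebook
for spec_name in ['sequencePosition', 'displayName', 'entityReference',
                  'templateDirection']:
    print(spec_name, '->', to_snake_case(spec_name))
```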

In [1]:
import pybiopax
from pybiopax.biopax import *

## Creating a modification feature
A typical modification feature represents phosphorylation. The type of modification and the residue are defined using a `SequenceModificationVocabulary`, while the site is provided in a `SequenceSite`. Note that in PyBioPAX, values of simple types (str, int, float) are all represented as strings, so, for example, `sequence_position` is given as `'202'` rather than `202`.
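One practical consequence of string-valued positions is that they compare lexicographically unless converted first. The following sketch uses plain Python strings as stand-ins for `sequence_position` values; the positions other than `'202'` are made up for illustration.

```python
# Stand-ins for sequence_position values, stored as strings
# (only '202' comes from this notebook; the others are hypothetical)
positions = ['202', '9', '185']

# Sorting the raw strings compares them character by character
lexicographic = sorted(positions)

# Converting to int in the sort key gives the numeric order
numeric = sorted(positions, key=int)

print(lexicographic)
print(numeric)
```

Converting with `int(...)` at the point of use keeps the stored value a string, which matches how the value is represented in memory.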

In [2]:
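# Vocabulary term naming the modification type and the modified residue
sm = SequenceModificationVocabulary(uid='smf', term=['Phosphothreonine'])
# Position of the modified residue (note the string value)
ss = SequenceSite(uid='ss', sequence_position='202', position_status='EQUAL')
# The feature ties the modification type to its location
mf = ModificationFeature(uid='mf',
                         modification_type=sm,
                         feature_location=ss)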

## Creating physical entities and entity references
One example of a physical entity is a protein. A physical entity can carry features that represent its state, and it usually points to an entity reference, which grounds the underlying entity independently of any specific state.
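The grounding in the cell that follows is expressed through the entity reference's `uid`, which there is an identifiers.org-style URI. As a small sketch of what such a URI encodes, the hypothetical helper `split_identifiers_uri` (not a PyBioPAX function) separates the namespace from the accession using only the standard library.

```python
from urllib.parse import urlparse


def split_identifiers_uri(uri: str):
    """Split an identifiers.org-style URI into (namespace, accession).

    Hypothetical helper for illustration; it assumes the path has the
    form /<namespace>/<accession>.
    """
    path = urlparse(uri).path.strip('/')
    namespace, _, accession = path.partition('/')
    return namespace, accession


namespace, accession = split_identifiers_uri(
    'http://identifiers.org/uniprot/P27361')
print(namespace, accession)
```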

In [3]:
er = ProteinReference(uid='http://identifiers.org/uniprot/P27361', display_name='MAPK3')
p1 = Protein(uid='p1', display_name='Erk1', entity_reference=er)
p2 = Protein(uid='p2', display_name='Erk1(p)', feature=[mf], entity_reference=er)

## Creating a biochemical reaction
A `BiochemicalReaction` can have `left` and `right` sides and various further optional attributes (here we set the `data_source` attribute to a `Provenance` object as an example).

In [4]:
provenance = Provenance(uid='prov', display_name='My Database')
br = BiochemicalReaction(uid='br',
                         left=[p1],
                         right=[p2],
                         data_source=provenance)

## Creating a model from objects
We next create a `BioPaxModel` from all the objects we have created. When constructing a model manually, we also call the model's `add_reverse_links` method to make sure implicit reverse links between the objects are added explicitly. These reverse links only exist in memory and help in certain model traversal tasks.
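The idea of a reverse link can be sketched with toy classes. Everything in the block below is a stand-in written for illustration: `ToyEntity`, `ToyReaction`, the attribute name `participant_of`, and the free function `add_reverse_links` are assumptions and do not reproduce PyBioPAX's classes or implementation.

```python
class ToyEntity:
    """Toy stand-in for a physical entity (not a PyBioPAX class)."""

    def __init__(self, uid):
        self.uid = uid
        # Reverse links: reactions that refer to this entity
        self.participant_of = set()


class ToyReaction:
    """Toy stand-in for a conversion holding forward links."""

    def __init__(self, uid, left, right):
        self.uid = uid
        self.left = left
        self.right = right


def add_reverse_links(objects):
    """Record on each participant the reactions that point to it."""
    for obj in objects:
        if isinstance(obj, ToyReaction):
            for participant in obj.left + obj.right:
                participant.participant_of.add(obj)


substrate, product = ToyEntity('substrate'), ToyEntity('product')
rxn = ToyReaction('rxn', left=[substrate], right=[product])
add_reverse_links([substrate, product, rxn])

# Starting from an entity, we can now reach the reaction it takes part in
reaction_uids = [r.uid for r in substrate.participant_of]
print(reaction_uids)
```

Without the reverse link, finding the reactions an entity takes part in would require scanning every reaction in the model.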

In [5]:
objects = [p1, p2, er, sm, ss, mf, br, provenance]
model = BioPaxModel(objects=objects)
model.add_reverse_links()

## Printing the serialized model
We can now serialize the model into an XML string and print it or write it to a file. Note that the reverse links never appear in serialized form; they only exist in memory, where they can be used for complex model traversal.

In [6]:
owl_str = pybiopax.model_to_owl_str(model)
print(owl_str)

<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#" xml:base="http://www.biopax.org/release/biopax-level3.owl#">
  <owl:Ontology rdf:about="">
 <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#"/>
  </owl:Ontology>

<bp:Protein rdf:ID="p1">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Erk1</bp:displayName>
 <bp:entityReference rdf:resource="http://identifiers.org/uniprot/P27361"/>
</bp:Protein>

<bp:Protein rdf:ID="p2">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Erk1(p)</bp:displayName>
 <bp:entityReference rdf:resource="http://identifiers.org/uniprot/P27361"/>
 <bp:feature rdf:resource="#mf"/>
</bp:Protein>

<bp:ProteinReference rdf:about="http://identifiers.org/uniprot/P27361">
 <bp:displayName rdf:datatyp

## Representing the regulation of gene expression
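Here we show an example of representing the expression of a gene and its control: a `TemplateReaction` produces the CAPN6 RNA, and a `TemplateReactionRegulation` makes the ATF4 protein a controller of that reaction.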

In [7]:
# The regulator (a protein) and the expressed product (an RNA)
atf4 = Protein(uid='atf4', display_name='ATF4')
capn6 = Rna(uid='capn6', display_name='CAPN6')
# Template reaction that produces the RNA
tr = TemplateReaction(uid='tr',
                      product=[capn6],
                      template_direction='FORWARD')
# Control interaction: ATF4 regulates the template reaction
trr = TemplateReactionRegulation(uid='trr',
                                 controller=[atf4],
                                 controlled=tr)
model = BioPaxModel([atf4, capn6, tr, trr])
print(pybiopax.model_to_owl_str(model))

<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#" xml:base="http://www.biopax.org/release/biopax-level3.owl#">
  <owl:Ontology rdf:about="">
 <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#"/>
  </owl:Ontology>

<bp:Protein rdf:ID="atf4">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ATF4</bp:displayName>
</bp:Protein>

<bp:Rna rdf:ID="capn6">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CAPN6</bp:displayName>
</bp:Rna>

<bp:TemplateReaction rdf:ID="tr">
 <bp:product rdf:resource="#capn6"/>
 <bp:templateDirection rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FORWARD</bp:templateDirection>
</bp:TemplateReaction>

<bp:TemplateReactionRegulation rdf:ID="trr">
 <bp:controlled rdf:resource="#tr"/>
 <bp:

## Representing degradation
A `Degradation` is a conversion in which the degraded entity appears on the `left` side and the `right` side is left empty.

In [8]:
pr = Protein(uid='mdm2', display_name='MDM2')
deg = Degradation(uid='deg', left=[pr], right=[])
model = BioPaxModel([pr, deg])
print(pybiopax.model_to_owl_str(model))

<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#" xml:base="http://www.biopax.org/release/biopax-level3.owl#">
  <owl:Ontology rdf:about="">
 <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#"/>
  </owl:Ontology>

<bp:Protein rdf:ID="mdm2">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MDM2</bp:displayName>
</bp:Protein>

<bp:Degradation rdf:ID="deg">
 <bp:left rdf:resource="#mdm2"/>
</bp:Degradation>
</rdf:RDF>
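Because the serialized model is ordinary RDF/XML, it can be inspected with generic XML tooling. The sketch below copies the degradation output shown above into a string and reads it back with the standard library's `xml.etree.ElementTree`; it deliberately does not use PyBioPAX, and the helper names `bp` and `rdf` are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

BP_NS = 'http://www.biopax.org/release/biopax-level3.owl#'
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# The serialized degradation model, copied from the output above
owl_str = """<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:xsd="http://www.w3.org/2001/XMLSchema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#" xml:base="http://www.biopax.org/release/biopax-level3.owl#">
  <owl:Ontology rdf:about="">
 <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#"/>
  </owl:Ontology>
<bp:Protein rdf:ID="mdm2">
 <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MDM2</bp:displayName>
</bp:Protein>
<bp:Degradation rdf:ID="deg">
 <bp:left rdf:resource="#mdm2"/>
</bp:Degradation>
</rdf:RDF>
"""


def bp(tag):
    """Qualified name of a tag in the BioPAX namespace."""
    return '{' + BP_NS + '}' + tag


def rdf(attr):
    """Qualified name of an attribute in the RDF namespace."""
    return '{' + RDF_NS + '}' + attr


root = ET.fromstring(owl_str.encode('utf-8'))

# Collect (class name, identifier) for every top-level BioPAX element
elements = []
for child in root:
    if child.tag.startswith('{' + BP_NS + '}'):
        class_name = child.tag.split('}', 1)[1]
        uid = child.get(rdf('ID')) or child.get(rdf('about'))
        elements.append((class_name, uid))
print(elements)

# The degradation has one left participant and no right participants
deg_elem = root.find(bp('Degradation'))
left_refs = [e.get(rdf('resource')) for e in deg_elem.findall(bp('left'))]
right_refs = deg_elem.findall(bp('right'))
print(left_refs, right_refs)
```

The empty `right` list passed to `Degradation` produces no `bp:right` element at all, which is why only `bp:left` appears in the XML.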

