The primary objective of this project is to convert BioPAX files into the GO-CAM format using Python. This serves as a modernized and optimized rewrite of the existing BioPAX to GO-CAM conversion code, which was originally implemented in Java. The goal is to produce code that is more readable, maintainable, and efficient.
- Parsing: Utilize the PyBioPAX library to parse BioPAX data.
- Transformation/Derivation Stage: Implement various inference steps to transform and derive new data values:
  - inferTransportProcess
  - inferMolecularFunctionFromEnablers
  - inferOccursInFromEntityLocations
  - inferRegulatesViaOutputRegulates
  - inferRegulatesViaOutputEnables
  - inferProvidesInput
  - inferSmallMoleculeRegulators
- Transformer Classes: Create transformer classes, such as ReactomeTransformer, potentially using the Factory pattern if multiple transformers are needed.
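The Factory pattern mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: `BaseTransformer`, `YeastTransformer`, `create_transformer`, and the `transform` method are hypothetical names chosen for illustration, and the method bodies are placeholders where the real inference steps would run.

```python
from abc import ABC, abstractmethod


class BaseTransformer(ABC):
    """Hypothetical common interface for all transformers (an assumption)."""

    @abstractmethod
    def transform(self, model):
        """Turn a parsed BioPAX model into GO-CAM-ready data."""


class ReactomeTransformer(BaseTransformer):
    def transform(self, model):
        # Placeholder: the inference steps would be applied here.
        return {"source": "reactome", "model": model}


class YeastTransformer(BaseTransformer):
    # Hypothetical second transformer, included only to show the pattern.
    def transform(self, model):
        return {"source": "yeast", "model": model}


# Registry mapping a parser type to the class that handles it.
_TRANSFORMERS = {
    "reactome": ReactomeTransformer,
    "yeast": YeastTransformer,
}


def create_transformer(parser_type):
    """Factory function: return a transformer instance for the given type."""
    try:
        return _TRANSFORMERS[parser_type]()
    except KeyError:
        raise ValueError(f"Unknown parser type: {parser_type!r}") from None


transformer = create_transformer("reactome")
result = transformer.transform("parsed-biopax-model")
```

Keeping the registry in one dictionary means that supporting a new data source only requires a new subclass and one new entry, without touching the calling code.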
The project is primarily developed in Python. Key libraries and frameworks include:
- PyBioPAX: For parsing and processing BioPAX files.
- rdflib: For working with RDF data, aiding in generating the GO-CAM models in .ttl format.
- Ontobio: Contains GO-CAM specific functions using rdflib for quick generation of GO-CAM TTL.
- OAK: Provides ontology parsing and traversal utilities for tasks requiring GO, CHEBI, and other ontologies.
- Clone the repository:
git clone https://github.com/geneontology/pybiopax2gocam.git
cd pybiopax2gocam
- Install the required packages:
pip install -r requirements.txt
To run the converter, use the following command:
python3 -m src.controllers.biopax_controller -t [parser_type] -v [view_type] -i [path_to_biopax]
- -t, --parser_type: Specifies the parser type. Choices are yeast or reactome.
- -v, --view_type: Specifies the view type. Choices are gocamgen, json, yaml, or vis.
- -i, --biopax_path: Path to the BioPAX file or folder containing BioPAX files.
Example:
python3 -m src.controllers.biopax_controller -t reactome -v json -i resources/test_biopax/reactome/R-HSA-204174_level3.owl
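For readers who want to see how these options fit together, the interface described above can be sketched with the standard `argparse` module. This is an illustration under stated assumptions rather than the controller's actual implementation: the function names `build_arg_parser` and `collect_biopax_files` are hypothetical, marking all three options as required is an assumption, and so is treating `.owl` as the file extension to look for when a folder is given.

```python
import argparse
from pathlib import Path


def build_arg_parser():
    """Build a parser mirroring the documented -t, -v, and -i options."""
    parser = argparse.ArgumentParser(
        description="Convert BioPAX files to GO-CAM."
    )
    parser.add_argument(
        "-t", "--parser_type",
        choices=["yeast", "reactome"],
        required=True,  # assumption: the README does not say if it is optional
        help="Parser type.",
    )
    parser.add_argument(
        "-v", "--view_type",
        choices=["gocamgen", "json", "yaml", "vis"],
        required=True,  # assumption
        help="View type.",
    )
    parser.add_argument(
        "-i", "--biopax_path",
        type=Path,
        required=True,  # assumption
        help="BioPAX file or folder containing BioPAX files.",
    )
    return parser


def collect_biopax_files(path):
    """Return the files to convert: a folder's .owl files, or the path itself."""
    if path.is_dir():
        # Assumption: BioPAX files in a folder use the .owl extension.
        return sorted(path.glob("*.owl"))
    return [path]


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_arg_parser().parse_args(
    ["-t", "reactome", "-v", "json", "-i", "some/file.owl"]
)
files = collect_biopax_files(args.biopax_path)
```

With `choices` set, `argparse` rejects any value outside the documented lists and prints a usage message, so invalid parser or view types fail before any file is read.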
By the end of the project, the following deliverables are expected:
- A rewritten BioPAX to GO-CAM converter tool in Python.
- Comprehensive documentation detailing the usage of the converter tool.
- Validation and testing mechanisms for the converter tool.
- Integration with existing systems.
- A final project report detailing the development process, challenges, and outcomes.