PairProphet

Check out our project's documentation via Read the Docs!

Background

This repository contains the project code associated with three courses at UW: Data Science Methods for Clean Energy Research (ChemE 545), Software Engineering for Molecular Data Scientists (ChemE 546), and Molecular Data Science Capstone (ChemE 547).

PairProphet is a project developed by Humood Alanzi, Ryan Francis, Amin Mosallenejad, Logan Roberts, and Chau Vuong.

Purpose: This package is developed to validate functionality between a pair of protein sequences.

Overview

Protein pair validation is time-consuming and resource-intensive, because proteins can be related through many unique functions, both direct and inferred. Many software tools specialize in characterizing proteins based on a few of these functions. Our pipeline aims to combine different tools, spanning sequence alignment, structure and folding prediction, and residue conservation, into a single workflow to improve prediction quality and streamline the characterization process.

Requirements & Installation

To create and activate the environment specified in environment.yml and install the PairProphet package, run the following commands:

conda env create --file environment.yml
conda activate pairpro
pip install .

PairProphet depends on Python 3.11 and requires a conda environment with the external dependencies biopython and hmmer. For more detail on the modular, importable code, please see the component docs.
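As a quick sanity check after installation, the stated requirements can be probed from Python. This is a minimal sketch, not part of the package: the function name `check_environment` is hypothetical, and it assumes that biopython is importable as `Bio` and that a HMMER install exposes the `hmmscan` executable on the PATH.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a dict describing whether each stated requirement appears to be met."""
    return {
        # The README states a dependency on Python 3.11.
        "python_3_11": sys.version_info[:2] == (3, 11),
        # Biopython is imported under the package name "Bio".
        "biopython": importlib.util.find_spec("Bio") is not None,
        # Assumption: HMMER is detected by looking for its hmmscan tool on PATH.
        "hmmer": shutil.which("hmmscan") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Running the script inside the activated pairpro environment prints one line per requirement, which makes a missing dependency easy to spot before starting the workflow.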

Workflow

1. Retrieve data from Learn2Therm DB.
2. Sample from large DB, select features, format data.
   a) sampling notebook found in notebooks
3. Family identification with Pfam.
   a) examples of generating outputs from Pfam in examples
4. Structure-based family identification with Fatcat and other software tools.
5. Develop model to predict protein pair functionality. Include engineered features in addition to features selected in 1).
   a) examples of model development in notebooks
6. Report whether each protein pair is functional along with confidence metrics.
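To make the idea of engineered features concrete, the sketch below computes two simple pairwise features from raw sequences. The features (`length_ratio`, `composition_distance`) and the function names are illustrative assumptions and are not taken from the PairProphet code; the actual feature set is defined in the project notebooks.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Hypothetical engineered features for one protein pair."""
    comp_a, comp_b = composition(seq_a), composition(seq_b)
    shorter, longer = sorted((len(seq_a), len(seq_b)))
    return {
        # Ratio of the shorter to the longer sequence length, in [0, 1].
        "length_ratio": shorter / longer if longer else 0.0,
        # L1 distance between composition vectors: 0 for identical
        # composition, 2 for sequences sharing no residue types.
        "composition_distance": sum(abs(a - b) for a, b in zip(comp_a, comp_b)),
    }
```

Features like these would be computed per pair and joined with the features selected from the database before model training.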

Outputs

Boolean prediction of whether protein pair is functional. Text file with confidence statistics.
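One way the two outputs could fit together is sketched below: a model probability is thresholded into a Boolean prediction and written to a text file with a confidence value. The function name `write_report`, the 0.5 threshold, and the file layout are assumptions for illustration only; the README does not specify the actual output format.

```python
from pathlib import Path


def write_report(pair_id, probability, path, threshold=0.5):
    """Write a Boolean prediction and its confidence to a text file."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    is_functional = probability >= threshold
    # Confidence is reported as the probability of the predicted class.
    confidence = probability if is_functional else 1.0 - probability
    lines = [
        f"pair: {pair_id}",
        f"functional: {is_functional}",
        f"confidence: {confidence:.3f}",
    ]
    Path(path).write_text("\n".join(lines) + "\n")
    return is_functional
```

For example, `write_report("pair_1", 0.9, "report.txt")` returns `True` and writes a three-line report to report.txt.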

Community Guidelines

Our software is open source. We welcome feature requests and bug reports. Please see our code of conduct first, then our contributing guidelines, for more information.
