Phosphorylation_prediction

Bioinformatics research project at Charles University. The supervisors are Marian Novotny and David Hoksza, with help from Hamza Gamouh and Michael Heinzinger. I'm trying to predict phosphorylated residues from sequence and structure data. The prediction algorithm is a graph neural network (GNN).
The data analysis part was already done in a previous master's thesis by another student.

Setup:\
Clone the GitHub repo.\
Download Anaconda Navigator.\
Go to the Environments tab and import environment.yml.\
Start the Anaconda command line and run this command:
pip install -U "bio-embeddings[allennlp] @ git+https://github.com/sacdallago/bio_embeddings.git"

How do I use this?

For training: in the file 5dlt, the last residue record has to be removed:\
HETATM 6575 CA CA A 907\
Otherwise it creates a conflict with the sequence length.
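The manual removal above can also be scripted. The following is a minimal sketch, not part of the repository: the function name `drop_hetatm` and the file names in the usage comment are hypothetical. It assumes the file follows the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26).

```python
def drop_hetatm(pdb_lines, res_name, chain_id, res_seq):
    """Return pdb_lines without the HETATM records of one residue.

    Hypothetical helper; relies on the fixed-column PDB layout.
    """
    kept = []
    for line in pdb_lines:
        is_target = (
            line.startswith("HETATM")
            and line[17:20].strip() == res_name      # residue name, columns 18-20
            and line[21:22] == chain_id              # chain ID, column 22
            and line[22:26].strip() == str(res_seq)  # residue number, columns 23-26
        )
        if not is_target:
            kept.append(line)
    return kept


# Example usage (file names are assumptions, adjust to your copy of 5dlt):
# with open("5dlt.pdb") as handle:
#     cleaned = drop_hetatm(handle.readlines(), "CA", "A", 907)
# with open("5dlt_cleaned.pdb", "w") as handle:
#     handle.writelines(cleaned)
```

Matching on residue name, chain and residue number rather than on the atom serial number keeps the helper usable if the atoms in the file have been renumbered.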

For prediction:\
Install the Anaconda environment from "environment.yml".\
Download a protein structure file. It has to be in the .pdb format. Put it in the ML_data/new_pdbs directory.\
Go to line 129 of the parse_new_pdb.py file. Change the file name to your protein file name, and provide the model and chain ID.\
Run emb_graph_new_pdb.py to generate the embeddings and a graph.\
Run predictor.py to obtain the predictions in a format that is usable in PyMOL.
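Before editing parse_new_pdb.py, it can help to check which model numbers and chain IDs a downloaded file actually contains. The sketch below is not part of the repository: `list_models_and_chains` is a hypothetical helper, and it assumes the standard PDB layout, with MODEL records carrying the model number and the chain ID in column 22 of ATOM and HETATM records.

```python
def list_models_and_chains(pdb_lines):
    """Collect model numbers and chain IDs from the lines of a PDB file.

    Hypothetical helper; files without MODEL records (common for
    single-model structures) yield an empty model list.
    """
    models = []
    chains = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            fields = line.split()
            if len(fields) > 1:
                models.append(int(fields[1]))
        elif record in ("ATOM", "HETATM") and len(line) > 21:
            chain = line[21]  # chain ID, column 22
            if chain not in chains:
                chains.append(chain)
    return models, chains


# Example usage (the file name is an assumption):
# with open("ML_data/new_pdbs/my_protein.pdb") as handle:
#     print(list_models_and_chains(handle.readlines()))
```

The chain ID passed to the parser should be one of the returned chains; otherwise the later steps would have nothing to embed.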

