A biological data mining project undertaken as part of a Master's degree at the University of Edinburgh.
There are three parts to this project. Firstly, replicating and building on previous work in the area of protein interaction prediction. Secondly, applying several supervised classification algorithms to this data as a machine learning exercise. Thirdly, using the resulting classifier to build a weighted graph, to see whether the edge weights can improve the performance of a community detection algorithm used at the University of Edinburgh.
For detailed information about the proposed project, look at the proposal.
Using the code
If you would like to repeat the edge weighting and community detection (and you have access to the required data), run the following from start to finish:
- Run the Training and Test set generation notebook
- Run the Classifier Training notebook
- Run the Bayesian edge weighting notebook
- Run the Community Detection notebook
This assumes you have access to the extracted features. If you would like to repeat those, the notebooks required are referenced in the report's appendices.
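For readers without access to the data, the first three steps can be illustrated with a minimal sketch on synthetic data. This is not the code from the notebooks: the feature matrix, labels and protein indices below are randomly generated, logistic regression is an arbitrary choice of classifier, and the predicted interaction probability is used directly as the edge weight as a stand-in for the Bayesian weighting. All variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted features: one row per candidate protein pair.
n_pairs, n_features = 200, 5
X = rng.normal(size=(n_pairs, n_features))

# Synthetic labels: 1 = interacting pair, 0 = non-interacting pair.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_pairs) > 0).astype(int)

# Hypothetical protein indices (0..49) identifying the two proteins in each pair.
pairs = rng.integers(0, 50, size=(n_pairs, 2))

# Step 1: generate training and test sets.
X_train, X_test, y_train, y_test, pairs_train, pairs_test = train_test_split(
    X, y, pairs, test_size=0.25, random_state=0
)

# Step 2: train a classifier (an assumed choice; any probabilistic classifier would do).
clf = LogisticRegression().fit(X_train, y_train)

# Step 3: weight each held-out edge by its predicted probability of interaction.
weights = clf.predict_proba(X_test)[:, 1]
weighted_edges = [
    (int(a), int(b), float(w)) for (a, b), w in zip(pairs_test, weights)
]

print(weighted_edges[:3])
```

The resulting list of `(protein_a, protein_b, weight)` tuples is the kind of weighted edge list that a community detection algorithm could take as input in the final step.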
The ocbio package contains the dedicated code that was written as part of this project.
This is mostly useful for accessing the various pre-extracted features, if you have access to the data.
Instructions on how to use it in general can be found in the ocbio.extract usage notes.
The other pieces of Python code in ocbio are not intended to be used directly - they are simply imported in
Most of this code requires only pylab and scikit-learn. For a full list of all possible requirements and their versions, look at the notebook on required packages.
This project is complete. The report is available in the repository.
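To compare your own environment against that list, a small check along these lines may help. It is a sketch assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the package names passed to it are only the obvious ones: pylab is distributed as part of matplotlib, so matplotlib is the package to look for.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


print(installed_versions(["numpy", "matplotlib", "scikit-learn"]))
```

A value of `None` means the package is missing and needs to be installed before running the notebooks.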