
DrOGA

DrOGA is a benchmark to predict the driver status of somatic non-synonymous DNA mutations.
Explore the docs »

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Usage
  3. Roadmap
  4. Contributing
  5. License
  6. Contact
  7. Acknowledgments

About The Project

Citation

Our paper is available here. If you use this work, please cite:

  @article{
    
  }

(back to top)

Built With

Our released implementation is tested on:

  • Ubuntu 20.04
  • Python 3.9.7

(back to top)

Usage

Dataset

Our documented dataset for training and testing can be downloaded here. After downloading and extracting the files into the Data folder, you will get the following structure:

  Data
  ├── train.csv	# Dataset for training
  ├── test.csv	# Dataset for testing
  └── SupplemetaryMaterial1.xlsx # Complete documentation of the dataset
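
As a quick sanity check after extraction, the files can be loaded with pandas. This is a minimal sketch; the printed columns depend on the dataset schema documented in SupplemetaryMaterial1.xlsx:

  # Minimal sanity check of the downloaded files, assuming pandas is installed.
  import pandas as pd

  train = pd.read_csv("Data/train.csv")
  test = pd.read_csv("Data/test.csv")

  print(train.shape, test.shape)  # rows (mutations) x columns (features)
  print(train.columns.tolist())   # feature names, per the documentation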

Training

Training traditional ML models

Traditional ML models are trained on the data obtained in the previous step and contained in the Data folder. Hyperparameters are optimized with Random Search over the following models and parameter grids:

  Model                Parameter      Set of values
  Logistic Regression  penalty        'l1', 'l2', 'elasticnet', 'none'
                       C              100, 10, 1.0, 0.1, 0.01
                       tol            1e-3, 1e-4, 1e-5
  SVM                  C              0.01, 0.1, 1, 10, 100
                       kernel         'poly', 'rbf', 'sigmoid', 'linear'
                       gamma          10, 1, 0.1, 0.01, 0.001, 0.0001, 'scale', 'auto'
                       tol            1e-3, 1e-4, 1e-2
  Decision Tree        max_depth      2-20
                       criterion      'gini', 'entropy'
  Random Forest        max_depth      5, 10, 'None'
                       criterion      'gini', 'entropy'
                       max_features   'auto', 'log2', 'None'
                       n_estimators   100, 200, 300, 400, 500, 600, 1000, 2000, 3000, 4000, 5000, 6000
                       bootstrap      True, False
  XGBoost              n_estimators   100, 500, 1000, 2000
                       learning rate  0.001, 0.01, 0.05, 0.1, 0.3, 0.5
                       max_depth      5-20, 'None'
                       booster        'gbtree', 'gblinear', 'dart'
                       reg_alpha      1, 0.1, 0.01, 0.001, 0
                       reg_lambda     1, 0.1, 0.01, 0.001, 0

These parameter grids can be modified directly in the source code; training is then run with:

python train_traditional_ml.py
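
For orientation, the snippet below sketches what such a Random Search setup looks like for the logistic-regression grid above, assuming scikit-learn; the "label" column name is a placeholder, not the dataset's actual target column:

  # Illustrative Random Search over the logistic-regression grid above.
  # Assumes scikit-learn; "label" is a hypothetical target-column name.
  import pandas as pd
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import RandomizedSearchCV

  train = pd.read_csv("Data/train.csv")
  X, y = train.drop(columns=["label"]), train["label"]

  # Grid from the table; 'none' is omitted here because recent
  # scikit-learn versions expect penalty=None instead of the string 'none'.
  param_distributions = {
      "penalty": ["l1", "l2", "elasticnet"],
      "C": [100, 10, 1.0, 0.1, 0.01],
      "tol": [1e-3, 1e-4, 1e-5],
  }

  search = RandomizedSearchCV(
      # saga supports all three penalties; l1_ratio only affects elasticnet
      LogisticRegression(solver="saga", l1_ratio=0.5, max_iter=5000),
      param_distributions,
      n_iter=20,      # number of sampled configurations
      scoring="f1",
      cv=5,
  )
  search.fit(X, y)
  print(search.best_params_, search.best_score_)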

Training DL models

Deep Learning models are also trained on the data obtained in the previous step and contained in the Data folder. Hyperparameters are optimized with Ray over the following models and parameters:

  Model                         Parameter               Set of values
  Deep Multi-Layer Perceptron   number of layers        1, 3, 5, 7
                                starting exponent       4, 5, 6, 7
                                alpha                   1.00-3.00
                                gamma                   1.0-4.0
                                lr                      1e-8 - 1e-6
                                weight decay            0, 0.1
                                batch size              32, 64, 128, 256
                                warm up steps           0-100
  Convolutional Neural Network  number of filters 1     8, 16
                                number of filters 2     32, 64
                                number of neurons 1     512, 256
                                number of neurons 2     256, 128, 64
                                number of neurons 3     64, 32
                                number of filters skip  32, 64
                                number of neurons skip  4, 8, 16
                                alpha                   1.00-3.00
                                gamma                   1.0-4.0
                                lr                      1e-8 - 1e-6
                                weight decay            0, 0.1
                                batch size              32, 64, 128, 256
                                warm up steps           0-100

To train a model, first select which of the three available models to train:

  • MLP: 'mlp'
  • CNN: 'cnn'
  • CNN with skip connections: 'cnn-skip'

These parameters can also be modified directly in the source code. An example of training the CNN model:

python train_dl.py --model cnn
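
As a rough illustration of the Ray-based search, the sketch below encodes the MLP row of the table above as a Ray Tune search space using the legacy tune.run API; the trainable name (train_mlp) and the config keys are illustrative, not the repository's actual interface:

  # Hedged sketch of a Ray Tune search space for the MLP row above.
  # train_mlp and the config keys are hypothetical names.
  from ray import tune

  search_space = {
      "num_layers": tune.choice([1, 3, 5, 7]),
      "starting_exponent": tune.choice([4, 5, 6, 7]),
      "alpha": tune.uniform(1.0, 3.0),
      "gamma": tune.uniform(1.0, 4.0),
      "lr": tune.loguniform(1e-8, 1e-6),
      "weight_decay": tune.choice([0.0, 0.1]),
      "batch_size": tune.choice([32, 64, 128, 256]),
      "warmup_steps": tune.randint(0, 101),  # 0-100 inclusive
  }

  def train_mlp(config):
      # ... build the MLP from config, train, evaluate on validation data ...
      tune.report(f1=0.0)  # placeholder: report the real validation F1 here

  analysis = tune.run(train_mlp, config=search_space, num_samples=50)
  print(analysis.get_best_config(metric="f1", mode="max"))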

Testing

Pre-trained models can also be tested to reproduce the results reported in our publication. To test all traditional ML and DL models, download the weights here and add them to the weights folder as follows:

  weights
  ├── CNN	          # Weights and parameters of CNN
  │    └── ...	  
  ├── CNN_SKIP	          # Weights and parameters of CNN with skip-connections
  │    └── ...	  
  ├── MLP	          # Weights and parameters of MLP
  │    └── ...	  
  ├── DecisionTree.h5     # Weights of Decision Tree
  ├── Logistic.h5	  # Weights of Logistic Classification
  ├── RF.h5	          # Weights of Random Forest
  ├── SVM.h5	          # Weights of Support Vector Machine
  └── XGB.h5              # Weights of XGB

These models can be tested together, producing accuracy, precision, recall and F1 metrics over the test split of our dataset. Using a GPU is recommended to accelerate testing, but the CPU is selected automatically if no CUDA device is found.

python test.py
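
For reference, the automatic device fallback and the reported metrics correspond to standard PyTorch and scikit-learn calls; the snippet below is a sketch with placeholder predictions, not the repository's test code:

  # Sketch of the CUDA/CPU fallback and the reported metrics.
  # y_true / y_pred are placeholders, not real model outputs.
  import torch
  from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  print("Running on", device)

  y_true = [0, 1, 1, 0, 1]  # placeholder test labels
  y_pred = [0, 1, 0, 0, 1]  # placeholder model predictions
  print("accuracy :", accuracy_score(y_true, y_pred))
  print("precision:", precision_score(y_true, y_pred))
  print("recall   :", recall_score(y_true, y_pred))
  print("F1       :", f1_score(y_true, y_pred))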

(back to top)

Roadmap

See the open issues for a full list of proposed features (and known issues).

(back to top)

Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star. Thanks!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/my_feature)
  3. Commit your Changes (git commit -m 'Add my_feature')
  4. Push to the Branch (git push origin feature/my_feature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Matteo Bastico - @matteobastico - matteo.bastico@gmail.com

Project Link: https://github.com/matteo-bastico/DrOGA

(back to top)

Acknowledgments

This work was supported by the H2020 European Project GenoMed4ALL (https://genomed4all.eu/), Grant no. 101017549. The authors are with the Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain (e-mail: mab@gatv.ssr.upm.es, afg@gatv.ssr.upm.es, abh@gatv.ssr.upm.es, sum@gatv.ssr.upm.es).

(back to top)
