Prediction and characterization of human ageing-related proteins by using machine learning. Related publication: https://doi.org/10.1038/s41598-018-22240-w
kerepesi/aging_ml
*******************************************************************************************
Feature tables and main scripts of the paper
"Prediction and characterization of human ageing-related proteins by using machine learning"
*******************************************************************************************

Citation of the paper:
Kerepesi C, Daróczy B, Sturm Á, Vellai T, Benczúr A.
Prediction and characterization of human ageing-related proteins by using machine learning.
Scientific Reports, Vol 8, 4094 (2018).

Requirements (versions used by us in parentheses):
- Linux (Ubuntu 16.04.3 LTS)
- Python 3.5.2
- Python packages: pandas (0.20.3), numpy (1.13.1), sklearn (0.19.0)

Running commands:
- Running 20 experiments of XGBoost with 5-fold CV (predictions are averaged; parameters: max_depth=1, n_est=50, n_exp=20):
  $ python XGBoost_CV.py 1 50 20 Final_32_features.csv
- Dumping trees of the XGBoost model (max_d=1, n_est=50, inputfile=Final_32_features.csv):
  $ python TreeDumper.py 1 50 Final_32_features.csv

Description of the files (for more information, please see the Methods section of the paper):
- aging_labels.csv:
  Labels of the classification (1 if the given protein is included in the GenAge database, 0 otherwise).
- uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv.zip:
  Gene Ontology features with ancestors and with ageing GOs.
- uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv.zip:
  Gene Ontology features with ancestors and without ageing GOs (called 'GO' in Table 4 of the paper).
- uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv-f_sel_based_on_GO_stats.py-thr1000.csv.zip:
  Feature set containing only the GO features that occur in at least 1000 proteins (called 'Frequent GOs' in Table 5 of the paper).
- uniprot_sprot_human.dat-InteractionDigger.py.txt-CreateNetworkFT_paired.py.csv-join_CytoscapeStats.csv:
  PPI network features.
- RNA_CoEx_vs_GenAge_FT.csv:
  Co-expression features.
- Final_32_features.csv:
  Table of the final 32 features (selected by XGBoost).
- Final_32_features.csv-XGBoost_CV_preds-n_est50-exp20.csv:
  Output file of the command 'python XGBoost_CV.py 1 50 20 Final_32_features.csv'.
- Final_32_features.csv_Trees-n_est50-max_d1.txt:
  Output file of the command 'python TreeDumper.py 1 50 Final_32_features.csv'.
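
Illustrative sketch (not taken from XGBoost_CV.py):
The README says that XGBoost_CV.py runs 20 experiments of 5-fold CV and averages the predictions, but it does not show how. The sketch below is one plausible reading of that procedure, using only the Python standard library. The names repeated_cv_average and baseline_fit_predict are hypothetical, the labels are toy data, and a trivial placeholder model stands in for XGBoost. It is assumed here that each experiment reshuffles the fold assignment and that every protein gets one out-of-fold score per experiment.

```python
import random
from statistics import mean


def repeated_cv_average(labels, fit_predict, n_exp=20, n_folds=5, seed=0):
    """Average out-of-fold predictions over n_exp shuffled k-fold CV runs.

    fit_predict(train_idx, test_idx) must return one score per test index.
    """
    n = len(labels)
    totals = [0.0] * n
    rng = random.Random(seed)
    for _ in range(n_exp):
        order = list(range(n))
        rng.shuffle(order)  # assumption: a new fold assignment per experiment
        folds = [order[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            scores = fit_predict(train_idx, test_idx)
            for i, score in zip(test_idx, scores):
                totals[i] += score
    # Every sample falls in exactly one test fold per experiment,
    # so dividing by n_exp gives its mean out-of-fold score.
    return [t / n_exp for t in totals]


# Toy labels (hypothetical data): 1 = ageing-related, 0 = otherwise.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]


def baseline_fit_predict(train_idx, test_idx):
    # Placeholder for a real model: predicts the positive rate of the training folds.
    rate = mean(labels[i] for i in train_idx)
    return [rate] * len(test_idx)


averaged = repeated_cv_average(labels, baseline_fit_predict, n_exp=20, n_folds=5)
print(len(averaged))
```

To adapt the sketch, replace baseline_fit_predict with a function that trains a classifier on the rows in train_idx and returns its predicted probabilities for the rows in test_idx. Averaging over experiments reduces the dependence of each protein's score on a single random fold split.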