
genepattern/RandomForest.GPU


Random Forest GPU

Omar Halawa (ohalawa@ucsd.edu) of the GenePattern Team @ Mesirov Lab - UCSD


This repository contains a GenePattern module written in Python 3, designed to run in a Singularity container.

It performs random forest classification (random forest is a machine learning algorithm built from an ensemble of decision trees) in one of two modes: cross-validation, which takes one dataset as input, or test-train prediction, which takes two datasets (test and train). Each dataset consists of two input files: one for feature data (.gct) and one for target data (.cls). The module processes these files and classifies them with RAPIDS.ai's RandomForestClassifier. It generates a prediction results file (.pred.odf) that compares each sample's "true" class with the model's prediction, and, in test-train prediction with a training dataset, it also outputs a feature importance file (.feat.odf). The module also supports importing and exporting trained models, and it exposes the classifier's parameters as optional arguments for use as a GenePattern module.
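To make the two input file types concrete, the sketch below reads a GCT 1.2 feature file and a CLS target file into a samples-by-features matrix and a list of labels, the shape a classifier such as RandomForestClassifier expects. It is a minimal standard-library sketch, not the module's own loader: the function names `parse_gct` and `parse_cls` are hypothetical, and it assumes tab-separated GCT rows and a CLS file whose optional `#` line lists class names in the order of the numeric labels.

```python
"""Minimal stdlib sketch: read a GCT 1.2 feature file and a CLS target file.

Assumption: illustrative parsers only, not the module's actual loading code.
"""
from typing import List, Tuple


def parse_gct(text: str) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (feature_names, sample_names, matrix) with matrix[sample][feature]."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines[0].startswith("#1.2"):
        raise ValueError("expected a GCT 1.2 version line")
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    header = lines[2].split("\t")
    sample_names = header[2:]  # first two columns are Name and Description
    if len(sample_names) != n_cols:
        raise ValueError("sample count does not match dimension line")
    feature_names: List[str] = []
    by_feature: List[List[float]] = []
    for row in lines[3:3 + n_rows]:
        fields = row.split("\t")
        feature_names.append(fields[0])
        by_feature.append([float(v) for v in fields[2:]])
    if len(feature_names) != n_rows:
        raise ValueError("feature count does not match dimension line")
    # GCT stores one feature per row; transpose so each row is one sample.
    matrix = [list(col) for col in zip(*by_feature)]
    return feature_names, sample_names, matrix


def parse_cls(text: str) -> List[str]:
    """Return one class label per sample, using names from line 2 when given."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes, _ = (int(x) for x in lines[0].split())
    has_names = lines[1].startswith("#")
    names = lines[1][1:].split() if has_names else []
    raw = (lines[2] if has_names else lines[1]).split()
    if len(raw) != n_samples:
        raise ValueError("label count does not match header")
    if names and len(names) != n_classes:
        raise ValueError("class-name count does not match header")
    # Assumption: numeric labels index into the class-name list.
    if names and all(lbl.isdigit() for lbl in raw):
        return [names[int(lbl)] for lbl in raw]
    return raw


if __name__ == "__main__":
    gct = ("#1.2\n2\t3\nName\tDescription\tS1\tS2\tS3\n"
           "g1\tna\t1.0\t2.0\t3.0\ng2\tna\t4.0\t5.0\t6.0\n")
    cls = "3 2 1\n# ALL AML\n0 0 1\n"
    features, samples, X = parse_gct(gct)
    print(samples, X, parse_cls(cls))
```

The transpose matters because GCT files store one feature (for example, a gene) per row, whereas scikit-learn-style classifiers take one sample per row.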

Documentation on usage and implementation is found here. A detailed step-by-step explanation of how the Random Forest algorithm works is found here. For better reproducibility and portability, all source files are available, along with example runs and their outputs: cross-validation runs on the all_aml_train (.gct, .cls), BRCA_HUGO (.gct, .cls), and iris (.gct, .cls) datasets, and a test-train run with all_aml_test (.gct, .cls) and all_aml_train (.gct, .cls). The outputs are only "examples": because the classifier uses randomness, each run will likely produce different results. To see how randomness could potentially be "reproduced," read this.
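Random forests draw much of their randomness from bootstrap sampling of the training rows and from random feature selection. The following standard-library sketch shows why fixing the random seed makes those draws repeatable. It does not use RAPIDS, and `draw_tree_samples` is a hypothetical helper; for brevity it draws one feature subset per tree, whereas a typical random forest redraws candidate features at each split.

```python
"""Sketch: seeded bootstrap and feature draws for a random forest (stdlib only)."""
import random
from typing import List, Optional, Tuple


def draw_tree_samples(n_samples: int, n_features: int, n_trees: int,
                      max_features: int, seed: Optional[int] = None
                      ) -> List[Tuple[List[int], List[int]]]:
    """For each tree, draw bootstrap row indices and a random feature subset."""
    rng = random.Random(seed)  # seed=None: seeded from OS entropy or the clock
    plan = []
    for _ in range(n_trees):
        # Bootstrap: n_samples row indices drawn WITH replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Simplification: one feature subset per tree, drawn WITHOUT replacement.
        feats = sorted(rng.sample(range(n_features), max_features))
        plan.append((rows, feats))
    return plan


if __name__ == "__main__":
    a = draw_tree_samples(10, 5, n_trees=3, max_features=2, seed=42)
    b = draw_tree_samples(10, 5, n_trees=3, max_features=2, seed=42)
    print(a == b)  # True: the same seed gives the same draws
```

With `seed=None`, `random.Random` seeds itself from an OS or time-based source, so repeated calls generally differ, which mirrors why the example outputs vary between runs.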

This module is a GPU-backed implementation that may run faster than the original RandomForest module.
