chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

chemmodlab


chemmodlab contains a suite of methods for fitting machine learning models and for validating the resulting models:

  • ModelTrain() fits a series of classification or regression models to sets of descriptors and computes cross-validated measures of model performance. Repeated k-fold cross-validation is performed with several different fold assignments of the data ("splits").

  • MakeModelDefaults() makes a list containing the default parameters for all models implemented in ModelTrain so that they can be modified.

  • plot.chemmodlab() takes a chemmodlab object output by the ModelTrain function and creates a series of accumulation curve plots for assessing model and descriptor set performance.

  • CombineSplits() evaluates a specified performance measure across all splits created by ModelTrain and conducts statistical tests to determine the best performing descriptor set and model (D-M) combinations.

  • Performance() can evaluate many performance measures across all splits created by ModelTrain, then outputs a data frame for each D-M combination.

  • chemmodlab() is the constructor for the chemmodlab object.

The statistical methodologies comprise a comprehensive collection of approaches whose validity and utility have been accepted by experts in the cheminformatics field. As promising new methodologies emerge from the statistical and data-mining communities, they will be incorporated into the laboratory. These methods are aimed at discovering quantitative structure-activity relationships (QSARs). However, users can directly input their own choices of descriptors and responses, so the capability for comparing models is effectively unlimited.
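To make the idea of "splits" concrete, the following is a minimal base-R sketch of repeated k-fold fold assignment. It is not taken from chemmodlab's source: the helper `make_fold_assignments()` and its arguments are hypothetical names used only for illustration, and ModelTrain() may assign folds differently.

```r
# Hypothetical helper (not part of chemmodlab): draw one random k-fold
# assignment per split, so each observation gets a fold label in every split.
make_fold_assignments <- function(n, nfolds = 10, nsplits = 3, seed = 1) {
  set.seed(seed)
  # Returns an n x nsplits integer matrix; column s holds the fold labels
  # (1..nfolds) used in split s.
  vapply(seq_len(nsplits), function(s) {
    sample(rep(seq_len(nfolds), length.out = n))
  }, integer(n))
}

folds <- make_fold_assignments(n = 25, nfolds = 5, nsplits = 3)
dim(folds)         # one row per observation, one column per split
table(folds[, 1])  # fold sizes within the first split
```

Within a split, each fold is held out once while the model is fit to the remaining folds. Repeating this over several splits shows how much a performance measure varies with the fold assignment alone.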

Installation

# install from CRAN
install.packages("chemmodlab")

# Or install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("jrash/chemmodlab")

Usage

library(chemmodlab)

data(aid364)
cml <- ModelTrain(aid364, ids = TRUE, xcol.lengths = c(24, 147),
                  des.names = c("BurdenNumbers", "Pharmacophores"))
plot(cml, splits = 1, meths = c("NNet", "KNN"))

CombineSplits(cml, metric = "enhancement", m = 100)
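
The `metric = "enhancement"` and `m = 100` arguments above relate to how well a model ranks active compounds near the top of its predictions. The sketch below illustrates the usual definition in base R, assuming that enhancement at m is the hit rate among the m highest-scored compounds divided by the overall hit rate. The function `enhancement_at()` and the toy data are hypothetical and are not chemmodlab's implementation.

```r
# Hypothetical helper (not chemmodlab's code): enhancement among the top m
# compounds, assuming y is a 0/1 activity vector and score ranks compounds.
enhancement_at <- function(y, score, m) {
  stopifnot(length(y) == length(score), m >= 1, m <= length(y))
  top <- order(score, decreasing = TRUE)[seq_len(m)]
  hit_rate_top <- mean(y[top])  # proportion of actives among the top m
  hit_rate_all <- mean(y)       # proportion of actives overall
  hit_rate_top / hit_rate_all
}

# Toy data: 10 compounds, 3 of them active
y     <- c(1, 0, 1, 0, 0, 0, 1, 0, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)

# Accumulation curve: actives found after testing the top 1, 2, ... compounds
accumulation <- cumsum(y[order(score, decreasing = TRUE)])

enh <- enhancement_at(y, score, m = 3)
```

Under this definition, values above 1 mean the model places actives among its top-ranked compounds more often than random selection would.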