chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models



chemmodlab contains a suite of methods for fitting machine learning models and for validating the resulting models:

  • ModelTrain() fits a series of classification or regression models to sets of descriptors and computes cross-validated measures of model performance. Repeated k-fold cross-validation is performed, with each repetition using a different assignment of the data to folds (a "split").

  • MakeModelDefaults() makes a list containing the default parameters for all models implemented in ModelTrain so that they can be modified.

  • plot.chemmodlab() takes a chemmodlab object output by the ModelTrain function and creates a series of accumulation curve plots for assessing model and descriptor set performance.

  • CombineSplits() evaluates a specified performance measure across all splits created by ModelTrain and conducts statistical tests to determine the best performing descriptor set and model (D-M) combinations.

  • Performance() evaluates many performance measures across all splits created by ModelTrain and outputs a data frame for each D-M combination.

  • chemmodlab() is the constructor for the chemmodlab object.
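The idea of "splits" can be sketched in base R without the package. This is a minimal illustration of repeated k-fold fold assignment, not chemmodlab's own implementation; the helper make_splits() and its arguments are hypothetical names chosen for this example.

```r
# Hypothetical helper, not part of chemmodlab: draw `nsplits` independent
# k-fold assignments for `n` observations.
make_splits <- function(n, nfolds = 10, nsplits = 3, seed = 1) {
  set.seed(seed)
  # Each column is one split: a random permutation of balanced fold labels.
  sapply(seq_len(nsplits), function(s) sample(rep_len(seq_len(nfolds), n)))
}

folds <- make_splits(n = 25, nfolds = 5, nsplits = 3)
dim(folds)         # 25 rows (observations), 3 columns (splits)
table(folds[, 1])  # every fold holds 5 observations in the first split
```

Within a split, each fold is held out once while a model is fit to the rest; repeating this over several splits shows how much a performance measure varies with the fold assignment alone.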

The statistical methodologies comprise a comprehensive collection of approaches whose validity and utility have been accepted by experts in the cheminformatics field. As promising new methodologies emerge from the statistical and data-mining communities, they will be incorporated into the laboratory. These methods are aimed at discovering quantitative structure-activity relationships (QSARs). However, users can supply their own descriptors and responses directly, so the model comparison tools are not limited to QSAR data.


# Install from CRAN:
install.packages("chemmodlab")

# Or use the development version from GitHub:
# install.packages("devtools")
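library(chemmodlab)

# Fit the models to two descriptor sets: 24 Burden numbers, 147 pharmacophores
data(aid364)
cml <- ModelTrain(aid364, ids = TRUE, xcol.lengths = c(24, 147),
                  des.names = c("BurdenNumbers", "Pharmacophores"))

# Accumulation curves for the first split, restricted to two of the models
plot(cml, splits = 1, meths = c("NNet", "KNN"))

# Compare D-M combinations across splits using the enhancement metric
CombineSplits(cml, metric = "enhancement", m = 100)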
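
To make the accumulation curves and the "enhancement" metric more concrete, here is a small base-R sketch on made-up data. It assumes the common definitions: an accumulation curve counts the actives found as compounds are tested in order of decreasing predicted score, and enhancement at m tests is the hit rate among the top m compounds divided by the hit rate in the whole data set. The function enhancement_at() is a hypothetical name, and the package's exact computation may differ.

```r
# Toy data (made up for illustration): 1 = active, 0 = inactive,
# with one predicted score per compound.
y     <- c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0)
score <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)

# Accumulation curve: cumulative actives found when compounds are
# tested in order of decreasing predicted score.
ord  <- order(score, decreasing = TRUE)
hits <- cumsum(y[ord])

# Enhancement at m tests (assumed definition): hit rate among the top m
# compounds divided by the hit rate in the whole data set.
enhancement_at <- function(hits, m) {
  n <- length(hits)
  (hits[m] / m) / (hits[n] / n)
}

plot(seq_along(hits), hits, type = "s",
     xlab = "Compounds tested", ylab = "Actives found")
enhancement_at(hits, m = 3)  # (2/3) / (3/10), about 2.22
```

An enhancement above 1 means the model ranks actives ahead of what random testing order would achieve on average.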