Neural Network package validation
Most R packages that implement perceptron-type neural networks (one input layer, one normalization layer, one hidden layer with a nonlinear activation function, usually tanh(), one normalization layer, one output layer) for regression purposes (i.e. NN(X1, …, Xn) = E[Y], as opposed to classification) use very poor learning algorithms and never find the global minimum of the objective function in the parameter space. Most of the time, a first-order algorithm is used, whereas neural networks, like any nonlinear function, require a second-order algorithm.
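The contrast between the two families of algorithms can be sketched in base R. The example below is only an illustration under stated assumptions: the toy dataset, the network size `H`, the fixed step size and the helper names (`unpack`, `hidden`, `mse`, `grad_mse`) are all hypothetical choices, and BFGS is a quasi-Newton method that approximates curvature rather than computing an exact Hessian.

```r
# Toy 1-D regression data (an illustrative assumption, not a benchmark dataset).
set.seed(1)
n <- 50
x <- seq(-2, 2, length.out = n)
y <- sin(2 * x) + rnorm(n, sd = 0.05)

H <- 3  # number of hidden neurons (arbitrary choice)

# Unpack the parameter vector p = (w1, b1, w2, b2).
unpack <- function(p) {
  list(w1 = p[1:H], b1 = p[(H + 1):(2 * H)],
       w2 = p[(2 * H + 1):(3 * H)], b2 = p[3 * H + 1])
}

# Hidden-layer activations: an n x H matrix of tanh(b1[h] + w1[h] * x[i]).
hidden <- function(q) tanh(sweep(outer(x, q$w1), 2, q$b1, "+"))

# Objective: mean squared error of y_hat = b2 + sum_h w2[h] * hidden[, h].
mse <- function(p) {
  q <- unpack(p)
  mean((as.vector(hidden(q) %*% q$w2) + q$b2 - y)^2)
}

# Analytic gradient of the MSE (the first-order information).
grad_mse <- function(p) {
  q <- unpack(p)
  a <- hidden(q)
  r <- as.vector(a %*% q$w2) + q$b2 - y   # residuals, length n
  d <- sweep(1 - a^2, 2, q$w2, "*")       # d y_hat / d pre-activation
  (2 / n) * c(colSums(r * d * x), colSums(r * d), colSums(r * a), sum(r))
}

p0 <- rnorm(3 * H + 1, sd = 0.5)  # common starting point for both methods

# First-order method: gradient descent with a fixed step size.
p_gd <- p0
for (i in 1:200) p_gd <- p_gd - 0.05 * grad_mse(p_gd)

# Quasi-Newton method: BFGS builds a curvature estimate from successive gradients.
fit_bfgs <- optim(p0, mse, grad_mse, method = "BFGS",
                  control = list(maxit = 200))

cat("start:", mse(p0), " gradient descent:", mse(p_gd),
    " BFGS:", fit_bfgs$value, "\n")
```

Both methods receive the same gradient and the same iteration budget, so any gap in the final loss comes from how each one uses that information. The numbers depend on the seed and the starting point, which is one reason a formal benchmark should repeat each fit from several initializations.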
In 2015, Patrice Kiener conducted a private benchmark on more than twenty R packages with a few known datasets. The result was a disaster: more than 18 packages did not converge correctly and only 2 packages found the right values. We feel that an updated and more formal evaluation should be carried out and communicated to the whole R community. We therefore invite a student to apply for this new benchmark under our guidance and to publish the results in a task view and in the R-Journal.
Such a benchmark has never been conducted on R packages so far. This work is acknowledged by AIM SIG, which received some funding from the R Consortium (section PSI application for collaboration to create online R package validation repository) to validate Base and Recommended R packages in the pharmaceutical field and clinical trials. Visit the pharmar website. Some connections can also be established with the histoRicalg project.
Details of your coding project
We expect a student with a sound knowledge of nonlinear regression algorithms (BFGS, Levenberg-Marquardt). The purpose of this work is (1) to benchmark 20 to 30 R packages with 3 to 5 simple datasets and (2) to write a comprehensive report on the performance of each package.
(1) Simple code is to be written to call each package, test it against the datasets and collect the results. This can lead to a meta-package that eases the benchmark procedure and makes it possible to run other benchmarks in the future.
(2) The biggest effort will be on writing the results in a nice report, if possible directly in the R-Journal format (which can be easily accessed through the “rticles” package). An introduction to both neural networks and optimization methods is expected.
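As a rough idea of what the code in item (1) could look like, here is a minimal sketch of a benchmark loop. Everything in it is an assumption made for illustration: the wrapper convention (each fitter takes `x` and `y` and returns a prediction function), the names `fitters`, `datasets` and `rmse`, the two synthetic datasets, and the use of `lm` as a non-neural baseline. The `nnet` wrapper is only added when that package is installed.

```r
# Hypothetical wrapper convention: fit(x, y) returns a function newx -> predictions.
fitters <- list(
  lm_baseline = function(x, y) {
    model <- lm(y ~ x, data = data.frame(x = x, y = y))
    function(newx) as.vector(predict(model, newdata = data.frame(x = newx)))
  }
)

# Add a neural network package only if it is available on this machine.
if (requireNamespace("nnet", quietly = TRUE)) {
  fitters$nnet <- function(x, y) {
    model <- nnet::nnet(x = matrix(x, ncol = 1), y = y, size = 3,
                        linout = TRUE, trace = FALSE, maxit = 500)
    function(newx) as.vector(predict(model, matrix(newx, ncol = 1)))
  }
}

# Two synthetic datasets (placeholders for the 3 to 5 real ones).
set.seed(42)
x <- seq(-2, 2, length.out = 60)
datasets <- list(
  sine = list(x = x, y = sin(2 * x) + rnorm(60, sd = 0.05)),
  bump = list(x = x, y = exp(-x^2) + rnorm(60, sd = 0.05))
)

# Root mean squared error on the training data.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Run every fitter on every dataset and collect one row per combination.
results <- do.call(rbind, lapply(names(fitters), function(pkg) {
  do.call(rbind, lapply(names(datasets), function(ds) {
    d <- datasets[[ds]]
    elapsed <- system.time(pred <- fitters[[pkg]](d$x, d$y))[["elapsed"]]
    data.frame(package = pkg, dataset = ds,
               rmse = rmse(d$y, pred(d$x)), seconds = elapsed)
  }))
}))
print(results)
```

Keeping each package behind a small wrapper with a common signature means that adding a package to the benchmark is a matter of adding one entry to the list, which is the kind of structure a meta-package could formalize.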
With this work, we wish to alert R users about the varying performance of neural network packages. Users, both from academia and private companies, should be aware of the strengths and the weaknesses of the packages they use. We expect a bigger impact on package maintainers and authors and hope that such a benchmark will convince them to shift to better algorithms.
Neural networks are often regarded as black boxes, especially nowadays with the advent of machine learning and artificial intelligence procedures, but a minimum of care and a sound mathematical approach should be taken when writing an R package.
Students, please contact mentors below after completing at least one of the tests below.
- Patrice Kiener <firstname.lastname@example.org> is the author of the R package FatTailsR and has 18 years of experience with neural networks of perceptron type.
- Christophe Dutang <email@example.com> has authored or contributed to more than 10 packages and maintains the task views related to Probability Distributions and Extreme Value Analysis. He also had previous GSOC experience with the markovchain package in 2015 and 2016, see https://summerofcode.withgoogle.com/archive/2016/.
Students, please do one or more of the following tests before contacting the mentors above.
- Can you explain the difference between first and second order algorithms?
- Can you cite a few books on this topic? Which ones have you read? Which ones have you understood?
- Is back-propagation necessary for neural networks?
- Is back-propagation useful for neural networks? Why?
- How many local minima can you expect after regression?
- In case of several outputs, why is it better to have one model per output?
- Give a few examples of your R code.
- Have you ever contributed to an R package? If yes, which one?
If several students have an equal score after this first series of questions, a few other (unpublished) questions will be asked.
Solutions of tests
Students, please send your test results to Patrice Kiener <firstname.lastname@example.org>.
List your answers to the various questions in the body of the email. For the R code, attach at most one .R file, or one .7z archive that bundles a few .R (or .Rmd) files.