
NFLT journal repository

This repository contains binary-class, multi-class, and regression datasets, alongside the R scripts used to show empirically that the No Free Lunch Theorem (NFLT) of statistical machine learning indeed holds for every learning problem considered.

Abstract

In this paper, we provide a substantial empirical demonstration of the statistical machine learning result known as the No Free Lunch Theorem (NFLT). We compare the predictive performances of a wide variety of machine learning algorithms/methods on a wide variety of qualitatively and quantitatively different datasets. Our work provides strong evidence in favor of the NFLT by using an overall ranking of methods and their corresponding learning machines, revealing that none of the learning machines considered predictively outperforms all the others on all the widely different datasets analyzed. It is noteworthy, however, that while the evidence from the various datasets and methods supports the NFLT rather emphatically, some learning machines, such as Random Forest, Adaptive Boosting, and Support Vector Machines (SVM), tend to yield predictive performances that are almost always among the best.
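To make the ranking idea concrete, the R sketch below shows one way such an overall ranking could be computed: learning machines are ranked within each dataset by their test-set score, and the ranks are then averaged across datasets. The `scores` matrix and its values are purely hypothetical illustrations, not results from the paper.

```r
# Hypothetical matrix of test-set scores (rows = datasets, columns = learning
# machines), where smaller values mean better performance (e.g. misclassification
# rate). The numbers are illustrative only.
scores <- matrix(
  c(0.10, 0.12, 0.15,
    0.22, 0.18, 0.25,
    0.05, 0.07, 0.06),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("dataset_A", "dataset_B", "dataset_C"),
    c("RandomForest", "AdaBoost", "SVM")
  )
)

# Rank the machines within each dataset (1 = best), then average the ranks
# across datasets to obtain an overall ranking of the methods.
per_dataset_ranks <- t(apply(scores, 1, rank))
overall_ranking   <- sort(colMeans(per_dataset_ranks))
overall_ranking
```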

Keywords: Learning Machine, Generalization, Bayes Risk, Predictive Performance, No Free Lunch Theorem (NFLT), Empirical Evidence, Statistical Learning, Data Science, Dataset, Function Space, Random Split, Score Function.

Implementation

To provide tangible practical evidence that the NFLT is indeed valid, we trained fifteen (15) different models, spanning linear and non-linear as well as parametric and non-parametric methods, on different binary-class, multi-class, and regression datasets, using 80% of each dataset for training and the remaining 20% for assessing model performance. Evaluation of the models on the test set (misclassification rate) showed that each learning model performs differently across the various datasets involved. A sketch of this protocol is given below.
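The following R sketch illustrates the protocol for a single learner on a single dataset: a random 80/20 split, model fitting on the training portion, and the misclassification rate computed on the held-out 20%. The use of the randomForest package and a two-class subset of the built-in iris data is an assumption made here purely for illustration; the repository's own R scripts should be consulted for the exact models and datasets.

```r
# Minimal sketch of the 80/20 evaluation protocol, not the repository's exact scripts.
library(randomForest)  # assumed available; any of the 15 learners could be slotted in

set.seed(2020)
# Illustrative binary-class data frame `df` with outcome column `y`
df <- droplevels(subset(iris, Species != "setosa"))
names(df)[names(df) == "Species"] <- "y"

# Random 80/20 split: 80% for training, 20% held out for assessing performance.
train_idx <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
train_set <- df[train_idx, ]
test_set  <- df[-train_idx, ]

# Fit one learning machine (Random Forest here) and predict on the test set.
fit  <- randomForest(y ~ ., data = train_set)
pred <- predict(fit, newdata = test_set)

# Misclassification rate on the held-out 20%.
misclassification_rate <- mean(pred != test_set$y)
misclassification_rate
```

Repeating this step for each of the fifteen learners on each dataset, and comparing the resulting test-set scores, is what reveals that no single machine dominates across all datasets.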