# Simple classifier comparison
The cells below illustrate the behavior of several classifiers on three kinds of 2-dimensional datasets:
 1. linearly separable data: should be easy for most classifiers, provided the classes overlap only minimally
 2. overlapping crescents: should be difficult for any strictly linear model but feasible for many others
 3. concentric circles: essentially impossible for strictly linear models, but other models fare better
 
The code has been shamelessly copied from the [scikit-learn website](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html), which provides brief summaries for each of the models explored here, along with some excellent demonstration scripts such as the one below.
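
For orientation, here is a minimal sketch of how three datasets like the ones listed above could be generated with scikit-learn's own generators. It loosely follows the scikit-learn example linked above. The helper name `make_toy_datasets`, the dictionary keys, and the noise settings are assumptions made for illustration, and `generate_datasets` in `comparison_utils` may well work differently.

```python
import numpy as np
from sklearn.datasets import make_circles, make_classification, make_moons


def make_toy_datasets(n_samples=100, seed=1):
    """Hypothetical helper: build one dataset of each kind described above."""
    # Linearly separable: two informative features, one cluster per class.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_redundant=0,
        n_informative=2,
        n_clusters_per_class=1,
        random_state=seed,
    )
    # Add uniform jitter so the two classes overlap slightly (assumed amount).
    rng = np.random.RandomState(seed)
    X = X + 2 * rng.uniform(size=X.shape)

    return {
        "linearly separable": (X, y),
        # Two interleaving half-moons with Gaussian noise.
        "crescents": make_moons(n_samples=n_samples, noise=0.3, random_state=seed),
        # A small circle inside a larger one, with Gaussian noise.
        "circles": make_circles(
            n_samples=n_samples, noise=0.2, factor=0.5, random_state=seed
        ),
    }


toy = make_toy_datasets(100, 1)
for name, (X, y) in toy.items():
    print(name, X.shape, np.unique(y).tolist())
```

Each entry is an `(X, y)` pair in which `X` has one row per point and two columns, and `y` holds the binary class labels.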
 
## Model codes
To make the command-line interface easy to use, the script uses codes to identify each type of classification model.  These are `K, L, R, P, D, F, N, A, B, Q` and `G`.  They map to classifier models as follows:

In [1]:
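# Import only the names this notebook uses instead of a wildcard import.
from comparison_utils import (
    MODEL_CODES,
    MODEL_NAME,
    compare_classifiers,
    generate_datasets,
    model_factory,
)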

# Show the models available:
for c in MODEL_CODES:
    print('  {} = {}'.format(c, MODEL_NAME[c]))

  K = k-NN
  L = Linear SVM
  R = RBF SVM
  P = Gauss. Proc.
  D = Dec. Tree
  F = RF
  N = Neural Net
  A = AdaBoost
  B = Naive Bayes
  Q = QDA
  G = Grad. Boost
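
A code-to-model mapping like the one printed above can be sketched as a small registry of constructors. This is only an illustration: the names `REGISTRY` and `build_classifiers` are hypothetical, the hyperparameters are assumed values, and only five of the eleven codes are covered. The actual `MODEL_NAME` and `model_factory` in `comparison_utils` may be implemented differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: code -> (display name, zero-argument constructor).
# Hyperparameters are illustrative, not taken from comparison_utils.
REGISTRY = {
    "K": ("k-NN", lambda: KNeighborsClassifier(n_neighbors=3)),
    "L": ("Linear SVM", lambda: SVC(kernel="linear", C=0.025)),
    "D": ("Dec. Tree", lambda: DecisionTreeClassifier(max_depth=5)),
    "F": ("RF", lambda: RandomForestClassifier(n_estimators=10, max_depth=5)),
    "B": ("Naive Bayes", lambda: GaussianNB()),
}


def build_classifiers(codes):
    """Return {display name: fresh estimator} for a string of codes."""
    built = {}
    for code in codes:
        if code not in REGISTRY:
            raise ValueError("unknown model code: {!r}".format(code))
        name, constructor = REGISTRY[code]
        # Call the constructor so every request gets a new, unfitted model.
        built[name] = constructor()
    return built


demo = build_classifiers("KLDFB")
for name, model in demo.items():
    print("  {} -> {}".format(name, type(model).__name__))
```

Storing constructors rather than instances means each call returns unfitted estimators, so repeated runs do not share state.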


In [None]:
# To keep things manageable for the first run, we restrict our set to just
# five classifiers: k-nearest-neighbors (k-NN), linear SVM, decision tree, random forest (RF) and naive Bayes.
# Once you've seen this all the way through at least once, try some other codes from the list above
# to see how they perform.  As you learn more about the models themselves, try to figure out
# why some perform better than others.
# All codes: KLRPDFNABQG
selected_codes = 'KLDFB'
selected_models = [MODEL_NAME[c] for c in selected_codes]

# Set up the classifiers
classifiers = {}
for name in selected_models:
    print('  instantiating {}'.format(name))
    classifiers[name] = model_factory(name)

In [None]:
# Generate three datasets for testing with 100 data points each (random seed 1)
# Try using different dataset sizes too: 10, 1000, 1000000 ... but beware that large
# datasets will take longer to train and test.
datasets = generate_datasets(100, 1)

print('Created the following datasets:')
for name in datasets:
    print('  {}'.format(name))

In [None]:
# This is really the meat of the script: it runs all the classifiers and renders their results.
# The final output will not appear until all classifiers have been trained and tested on all
# datasets.  If you use a large number of classifiers and a large dataset, you can expect to wait
# several minutes or more to see results.  For this reason verbose output is turned on, so you
# can follow the progress while you wait.
compare_classifiers(datasets, classifiers, 1, output=None, verbose=True)
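
The core of a comparison like the one above is a nested loop: split each dataset, fit every classifier on the training part, and score it on the held-out part. The sketch below shows only that loop, without any plotting. The function name `score_classifiers`, the 60/40 split, and the feature scaling step are assumptions for illustration, so `compare_classifiers` itself may do things differently.

```python
from sklearn.base import clone
from sklearn.datasets import make_circles, make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def score_classifiers(datasets, classifiers, seed=1, verbose=False):
    """Hypothetical helper: {dataset name: {classifier name: test accuracy}}."""
    results = {}
    for ds_name, (X, y) in datasets.items():
        # Hold out 40% of the points for testing (assumed split).
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.4, random_state=seed
        )
        results[ds_name] = {}
        for clf_name, clf in classifiers.items():
            # Clone so the caller's estimator is never fitted in place,
            # and standardize features before fitting.
            model = make_pipeline(StandardScaler(), clone(clf))
            model.fit(X_train, y_train)
            accuracy = model.score(X_test, y_test)
            results[ds_name][clf_name] = accuracy
            if verbose:
                print("  {} on {}: {:.2f}".format(clf_name, ds_name, accuracy))
    return results


demo_data = {
    "crescents": make_moons(n_samples=100, noise=0.3, random_state=1),
    "circles": make_circles(n_samples=100, noise=0.2, factor=0.5, random_state=1),
}
demo_models = {
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "Dec. Tree": DecisionTreeClassifier(max_depth=5, random_state=1),
}
scores = score_classifiers(demo_data, demo_models, seed=1, verbose=True)
```

Accuracy on a held-out split is only one way to compare models, and with 100 points the scores will vary noticeably from one random seed to another.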