# Simple classifier comparison
The cells below illustrate the behavior of several different classifiers using three different kinds of 2-dimensional data sets.  The code has been shamelessly copied from the [scikit-learn website](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html).  In addition to excellent demonstration scripts such as this one, the scikit-learn site also provides nice summaries for each of the models explored here.
 
## Model codes
To make the command-line interface easy to use, the script uses codes to identify each type of classification model.  These are `K, L, R, P, D, F, N, A, B, Q` and `G`.  Run the following cell to see how they map to classifier models:

In [None]:
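# Import only the names this notebook uses, rather than everything in the module
from comparison_utils import (MODEL_CODES, MODEL_NAME, model_factory,
                              generate_datasets, compare_classifiers)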

# Show the models available:
for c in MODEL_CODES:
    print('  {} = {}'.format(c, MODEL_NAME[c]))

In [None]:
# To keep things manageable for the first run, we restrict our set to just five classifiers:
# k-nearest-neighbors (k-NN), linear SVM, decision tree, random forest (RF) and naive Bayes.
# After you've run every cell at least once, try some other codes from the list above
# to see how they perform.  As you learn more about the models themselves, try to figure out
# why some perform better than others.
# All codes: KLRPDFNABQG
selected_codes = 'KLDFB'
selected_models = [MODEL_NAME[c] for c in selected_codes]

# Set up the classifiers
classifiers = {}
for name in selected_models:
    print('  instantiating {}'.format(name))
    classifiers[name] = model_factory(name)
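
The internals of `comparison_utils` are not shown in this notebook, so the following is only a minimal sketch of how a code-to-model lookup like the one above could be built. The names `EXAMPLE_MODEL_NAME` and `example_model_factory`, the display names, and the hyperparameters are all assumptions made for illustration, and only the five codes selected above are covered.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical code-to-name table (the real MODEL_NAME may use other names).
EXAMPLE_MODEL_NAME = {
    'K': 'Nearest Neighbors',
    'L': 'Linear SVM',
    'D': 'Decision Tree',
    'F': 'Random Forest',
    'B': 'Naive Bayes',
}

# Hypothetical constructors with illustrative hyperparameters.
# Each entry is a callable so every request gets a fresh, unfitted model.
_EXAMPLE_CONSTRUCTORS = {
    'Nearest Neighbors': lambda: KNeighborsClassifier(n_neighbors=3),
    'Linear SVM': lambda: SVC(kernel='linear', C=0.025),
    'Decision Tree': lambda: DecisionTreeClassifier(max_depth=5),
    'Random Forest': lambda: RandomForestClassifier(max_depth=5, n_estimators=10),
    'Naive Bayes': lambda: GaussianNB(),
}


def example_model_factory(name):
    """Return a new, unfitted classifier for the given display name."""
    try:
        return _EXAMPLE_CONSTRUCTORS[name]()
    except KeyError:
        raise ValueError('unknown model name: {}'.format(name)) from None


# Mirror the cell above: turn a string of codes into a dict of classifiers.
example_classifiers = {
    EXAMPLE_MODEL_NAME[c]: example_model_factory(EXAMPLE_MODEL_NAME[c])
    for c in 'KLDFB'
}
```

Storing constructors rather than instances means that two calls with the same name never share fitted state, which matters when the same model type is trained on several datasets.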

## Simulated data
The following cell generates three different kinds of 2-dimensional data sets:
 1. **linearly separable** data: should be easy for most classifiers if there are minimal overlaps between the classes
 2. **overlapping crescents**: should be difficult for any strictly linear models but feasible for many others
 3. **concentric circles**: pretty much impossible for strictly linear models, but other models fare better
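
For readers curious how such data can be produced, here is a rough sketch using scikit-learn's built-in generators. It is not the actual `generate_datasets` implementation, which is not shown here: the function name `example_generate_datasets`, the dataset names, and the noise settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles, make_classification, make_moons


def example_generate_datasets(n_samples, seed):
    """Hypothetical stand-in: return {name: (X, y)} for three 2-D datasets."""
    rng = np.random.RandomState(seed)

    # Two informative features, one cluster per class: roughly linearly separable.
    X, y = make_classification(n_samples=n_samples, n_features=2,
                               n_informative=2, n_redundant=0,
                               n_clusters_per_class=1, random_state=seed)
    # Add uniform jitter so the two classes overlap slightly (assumed amount).
    X = X + 2 * rng.uniform(size=X.shape)

    return {
        'linearly separable': (X, y),
        'overlapping crescents': make_moons(n_samples=n_samples, noise=0.3,
                                            random_state=seed),
        'concentric circles': make_circles(n_samples=n_samples, noise=0.2,
                                           factor=0.5, random_state=seed),
    }


example_datasets = example_generate_datasets(100, 1)
for name, (X, y) in example_datasets.items():
    print('  {}: X shape {}, classes {}'.format(name, X.shape, sorted(set(y.tolist()))))
```

Raising the `noise` arguments makes the crescents and circles overlap more, which is a quick way to see how each classifier degrades.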


In [None]:
# Generate three datasets for testing with 100 data points each (random seed 1)
# Try using different dataset sizes too: 10, 1000, ... but beware that large
# datasets will take longer to train, test and render.
datasets = generate_datasets(100, 1)

print('Created the following datasets:')
for name in datasets:
    print('  {}'.format(name))

## Testing the classifiers
This final cell runs each of the classifiers selected above on the three datasets.
Each time, it trains the model on a subset of the data and tests it on the remaining data.
It also produces a scatter plot of the test data, overlaid with contours
that depict the model's decision boundaries.
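
The train, test, and decision-boundary steps described above can be sketched for a single classifier and a single dataset as follows. This is an illustration only, not the code inside `compare_classifiers`: the 40% test fraction, the choice of k-NN, and the 50x50 grid are assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: two noisy crescents.
X, y = make_moons(n_samples=100, noise=0.3, random_state=1)

# Hold out part of the data for testing (the 40% fraction is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

# Train on the training subset, then score on the held-out subset.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print('test accuracy: {:.2f}'.format(test_accuracy))

# Evaluate the model on a regular grid covering the data; the places where
# the predicted class changes trace out the decision boundary.
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
grid_points = np.c_[xx.ravel(), yy.ravel()]
Z = clf.predict(grid_points).reshape(xx.shape)

# With matplotlib available, the picture could then be drawn with:
#   plt.contourf(xx, yy, Z, alpha=0.3)
#   plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
```

Using hard class predictions on the grid gives sharp regions; models that expose `predict_proba` or `decision_function` can be shaded by confidence instead.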

In [None]:
# This is really the meat of the script: it runs all the classifiers and renders their results.
# The final output will not appear until all classifiers have been trained and tested on all
# datasets.  If you use a large number of classifiers and a large dataset, you can expect to wait
# several minutes or more to see results, so verbose output is turned on to show progress.
compare_classifiers(datasets, classifiers, 1, output=None, verbose=True)