In [None]:
import numpy, pandas
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report
from skimage.color import rgb2lab, rgb2hsv
from helpers.colour import plot_predictions
import matplotlib.pyplot as plot
%matplotlib inline

# Exercise: colour words

In this exercise, we will try to convert RGB colours (which you might have seen in graphics programs or CSS) to colour words (like "red").

The data we have was collected from SFU students by having them [select a word for a randomly-chosen colour](http://cmpt732.csil.sfu.ca/colour/). The colours in the data file have each of the red/green/blue components on a scale of 0&ndash;1, instead of the 0&ndash;255 scale you might have seen before. (The colour tools we're using expect 0&ndash;1, so this will make them happy.)

The labels we have are the [English basic colour terms](https://en.wikipedia.org/wiki/Color_term#Basic_color_terms): "black", "white", "red", "green", "yellow", "blue", "brown", "orange", "pink", "purple", and "grey".

We hope to predict that (1,0,0) is called "red" and (0,1,0) is called "green", and to label many other values that are less obvious.

Here is some data.

TODO: 
* Create feature and label arrays `X` and `y`.
* Split into training and testing data.

In [None]:
data = pandas.read_csv('data/colour-data.csv')
data

## Attempt 1: Just Use RGB Colours and Naive Bayes

To start with, don't do any work on the features: just train a Gaussian naive Bayes classifier on the RGB values, as we have before, and see what you get.

TODO:
* Create a model. Train it on the training data.
* Calculate an accuracy score on the testing data.
* We have provided a helper to plot the predictions being made. Try: `plot_predictions(model)`
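
The train-and-score steps might look something like the sketch below. It uses two synthetic, well-separated clusters of RGB values (an assumption made purely so the example runs on its own), so the accuracy it prints says nothing about what to expect on the real data. The plotting helper is left out because it depends on the course's `helpers` module.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a cluster of reddish and a cluster of bluish colours.
rng = np.random.default_rng(0)
reds = np.clip(rng.normal([0.9, 0.1, 0.1], 0.05, size=(100, 3)), 0, 1)
blues = np.clip(rng.normal([0.1, 0.1, 0.9], 0.05, size=(100, 3)), 0, 1)
X = np.vstack([reds, blues])
y = np.array(['red'] * 100 + ['blue'] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)

# score() returns the fraction of test rows whose label was predicted correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
print(model.predict(np.array([[1.0, 0.0, 0.0]])))
```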

## Attempt 2: Convert to HSV Colours in a Pipeline

You probably didn't get great results above. One of the problems: the RGB colour space isn't perfectly arranged to represent the things we call colours. It is designed for computer screens, not human eyes.

The [HSV colour space](https://en.wikipedia.org/wiki/HSL_and_HSV) represents colours using values for "hue", "saturation", and "value". The value for "hue" is probably closely related to the thing called "colour" that we're trying to predict. Maybe if we **transform** to that colour space before going to the classifier, it will have more meaningful values to work with.
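
To get a feel for why hue might help, here is a small illustration using the standard library's `colorsys` module. That choice is only for illustration (the notebook itself uses `rgb2hsv` from scikit-image), and the colour names in the list are informal labels, not taken from the data.

```python
import colorsys

# A few RGB colours on the 0-1 scale; the names are informal labels.
examples = [
    ('red', (1.0, 0.0, 0.0)),
    ('dark red', (0.5, 0.0, 0.0)),
    ('light red', (1.0, 0.5, 0.5)),
    ('green', (0.0, 1.0, 0.0)),
    ('blue', (0.0, 0.0, 1.0)),
]

hsv = {}
for name, rgb in examples:
    hsv[name] = colorsys.rgb_to_hsv(*rgb)
    h, s, v = hsv[name]
    print(f'{name}: hue={h:.2f} saturation={s:.2f} value={v:.2f}')
```

The three reds have quite different RGB values but share the same hue, differing only in saturation and value.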

The function `transform_rgb2hsv` provided below will convert an array of RGB colour values to an array of HSV colour values. It's something you can use with a `FunctionTransformer` in a pipeline.

TODO:
* Create a pipeline model containing a `FunctionTransformer` and a `GaussianNB`.
* Train, test, and plot as above.
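
The shape of such a pipeline could be sketched as follows. To keep the example runnable by itself, it swaps in a hypothetical `rgb_to_hsv_rows` function built on the standard library's `colorsys` in place of `transform_rgb2hsv`, and trains on synthetic green and blue clusters; both are assumptions for illustration. In the notebook, you would pass `transform_rgb2hsv` to the `FunctionTransformer` and use your real training data.

```python
import colorsys
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def rgb_to_hsv_rows(X_rgb):
    """Hypothetical stand-in for transform_rgb2hsv: (n, 3) RGB -> (n, 3) HSV."""
    return np.array([colorsys.rgb_to_hsv(*row) for row in X_rgb])

# Synthetic stand-in data: a cluster of greenish and a cluster of bluish colours.
rng = np.random.default_rng(0)
greens = np.clip(rng.normal([0.1, 0.8, 0.1], 0.05, size=(100, 3)), 0, 1)
blues = np.clip(rng.normal([0.1, 0.1, 0.8], 0.05, size=(100, 3)), 0, 1)
X = np.vstack([greens, blues])
y = np.array(['green'] * 100 + ['blue'] * 100)

# The transformer runs first, so GaussianNB only ever sees HSV values.
hsv_model = make_pipeline(
    FunctionTransformer(rgb_to_hsv_rows),
    GaussianNB(),
)
hsv_model.fit(X, y)

predictions = hsv_model.predict(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
print(predictions)
```

Because the conversion lives inside the pipeline, `fit`, `predict`, and `score` all accept plain RGB input, just as the model in Attempt 1 does.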

In [None]:
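def transform_rgb2hsv(Xrgb):
    # Xrgb: NumPy array of shape (n, 3) with RGB values in 0-1.
    # Present the rows as a 1 x n image for rgb2hsv, then flatten back to (n, 3).
    return rgb2hsv(Xrgb.reshape(1, -1, 3)).reshape(-1, 3)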