<strong>Problem formulation:</strong>
<p>Given the coordinates and colors of a set of pixels from an image, use 3 supervised learning classifiers to predict the color of any given pixel.</p>
<p>Choose the first classifier between a naive Bayes classifier and a decision tree classifier, and the second classifier between an ANN and an SVM. The third classifier is your choice and can be any classifier different from the two already used.</p>

<strong>Example:</strong>
<img src = "files/data.png" width = "256" height = "256"/>
<center><strong>Figure 1</strong></center>

For more examples of images and a better understanding of the problem formulation, please see:
<a href="http://playground.tensorflow.org">http://playground.tensorflow.org</a>

<strong>Task to be completed:</strong>

<strong>1.</strong> Generate 2 images using Paint, similar to Figure 1, containing sets of points of two or more colors (minimum 200 points), such that in the first image the set of points should be linearly separable and in the second image, the set of points should be non-linearly separable. Save the images as "data1.png" and "data2.png".

<strong>2.</strong> Create two datasets based on the 2 images from step 1, using the code given below. Analyze and comment on this code.

In [41]:
from PIL import Image
import numpy as np

def rgb_to_int(r,g,b):
    return (r<<16) + (g<<8) + b

def read_data(filename):
    x = []
    y = []
    back_color = rgb_to_int(255,255,255)
    
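    # convert to RGB so that every pixel unpacks to exactly (r, g, b),
    # even if the PNG was saved with an alpha channel or a palette
    image = Image.open(filename).convert("RGB")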
    width,height = image.size
    pixels = image.load()
 
    for i in range(width):
        for j in range(height):
            r,g,b = pixels[i,j]
            color = rgb_to_int(r,g,b)
            
            if (color != back_color):
                x.append([i,j])
                y.append(color)
    return x,y

In [42]:
# create the datasets here
x1, y1 = read_data('data1.png')
x2, y2 = read_data('data2.png')
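As an aid to analyzing `rgb_to_int`, the sketch below packs an RGB triple into a single integer and unpacks it again. The helper names `pack_rgb` and `unpack_rgb` are illustrative only and are not part of the assignment code.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one 24-bit integer laid out as 0xRRGGBB."""
    return (r << 16) + (g << 8) + b


def unpack_rgb(value):
    """Inverse of pack_rgb: recover (r, g, b) with shifts and an 8-bit mask."""
    return (value >> 16) & 255, (value >> 8) & 255, value & 255


red = pack_rgb(255, 0, 0)
print(red, hex(red))            # 16711680 0xff0000
print(unpack_rgb(16711680))     # (255, 0, 0)
print(pack_rgb(255, 255, 255))  # 16777215, the white background skipped by read_data
```

Each distinct color therefore becomes one integer class label, so a label such as 16711680 stands for pure red.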

<strong>3.</strong> Split the first dataset into a training set and a test set (using 70% for training and 30% for testing).

In [43]:
#your code here

import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

# create training and testing vars


X1_train, X1_test, y1_train, y1_test = train_test_split(x1, y1, test_size=0.3)
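The split above is random, so it changes on every run. A minimal sketch of a reproducible variant follows, assuming toy data in place of `x1` and `y1`; the variable names `points` and `labels` are hypothetical. Both `random_state` and `stratify` are optional parameters of `train_test_split`.

```python
from sklearn.model_selection import train_test_split

# Toy stand-in for (x1, y1): 100 pixel coordinates with two color labels.
points = [[i, i % 10] for i in range(100)]
labels = [16711680] * 50 + [1908872] * 50

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
p_train, p_test, l_train, l_test = train_test_split(
    points, labels, test_size=0.3, random_state=0, stratify=labels
)
print(len(p_train), len(p_test))  # 70 30
```

Fixing the seed makes results comparable between runs, which helps when comparing classifiers later in the assignment.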


<strong>4.</strong> Choose either MultinomialNB (naive bayes) or DecisionTreeClassifier from sklearn and train it on the training set generated in step 3.

In [44]:
# your code here (decision tree classifier)

import numpy as np
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X1_train, y1_train)

z1 = clf.predict(X1_train[2:3])
print('predicted', z1)
print('target', y1_train[2:3])


predicted [16711680]
target [16711680]


<strong>5.</strong> Use the classifier trained in step 4 to make predictions on the test set generated in step 3.

In [45]:
#your code here

w1 = clf.predict(X1_test)

<strong>6.</strong> Compute the accuracy of the classifier on the test set generated in step 3.

In [46]:
#your code here

from sklearn.metrics import accuracy_score

a1 = accuracy_score(y1_test, w1)

print(a1)

1.0


<strong>7.</strong> Compute precision and recall of the classifier on the test set generated in step 3 and save to file or display the results. Define (theoretically) precision and recall.

In [47]:
#your code here

from sklearn.metrics import classification_report 

b1 = classification_report(y1_test,w1)
print(b1)

             precision    recall  f1-score   support

    1908872       1.00      1.00      1.00      1140
   16711680       1.00      1.00      1.00      1298

avg / total       1.00      1.00      1.00      2438
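For the theoretical definitions: treating one class as positive, precision is TP / (TP + FP), the fraction of points predicted as that class that truly belong to it, and recall is TP / (TP + FN), the fraction of points of that class that were found. The sketch below computes both by hand on made-up labels; the function name `precision_recall` is illustrative.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1]
print(precision_recall(y_true, y_pred, positive=1))  # (0.6, 0.75)
```

In the report above, each row applies these formulas with that row's color as the positive class.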



<strong>8.</strong> Predict the color of every pixel of the first image and save the predicted colors to a new image using the code below (partial code is given and must be completed). Be able to explain the code below.

In [48]:
image = Image.open('data1.png')
width, height = image.size

def generate_pixel_coordinates():
    points = []
    for i in range (width):
        for j in range(height):
            points.append([i,j])
            
    return points
        
def getRGBfromI(RGBint):#convert int color code to rgb color code
    blue =  RGBint & 255
    green = (RGBint >> 8) & 255
    red =   (RGBint >> 16) & 255
    return red, green, blue

def save_data(pixels, colors, output_filename):
    
    im = Image.new("RGB", (width, height))
    pix = im.load()
    for i in range(len(pixels)):
        pix[pixels[i][0],pixels[i][1]] = getRGBfromI(colors[i])

    im.save(output_filename, "PNG")    

In [49]:
#your code here 

g1 = generate_pixel_coordinates()

z2 = clf.predict(g1)

save_data(g1,z2,'results2.png')
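The nested loops in `generate_pixel_coordinates` can also be written with NumPy. The sketch below is one possible equivalent; the function name `pixel_grid` is hypothetical. It produces the same `[i, j]` pairs in the same order, with `i` (the column) varying slowest.

```python
import numpy as np


def pixel_grid(width, height):
    """All [i, j] pairs for i in range(width) and j in range(height)."""
    ii, jj = np.meshgrid(np.arange(width), np.arange(height), indexing="ij")
    return np.column_stack([ii.ravel(), jj.ravel()])


grid = pixel_grid(3, 2)
print(grid.tolist())  # [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
```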

<img src = "files/data1.png" width = "256" height = "256"/>
<img src = "files/results2.png" width = "256" height = "256"/>

<strong>9.</strong> Repeat steps 3-8, replacing the split in step 3 with 5-fold cross-validation to divide the data into training and test sets. Compute the accuracy of each fold and the mean accuracy. Report the results for all runs and compare them.

In [50]:
#your code here



<strong>10.</strong> Repeat steps 3-8 for the second classifier (chosen between ANN and SVM).

<strong>11.</strong> Study the scikit-learn documentation for the second classifier chosen, select two representative hyperparameters, and repeat steps 4-8 at least 2 times for different values of these hyperparameters. Report the results for all runs and compare them.
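As an illustration of such a comparison, the sketch below assumes the SVM was chosen and takes `C` and `gamma` as the two hyperparameters; both the choice and the values tried are assumptions, not requirements. The data is synthetic (an inner disc and an outer ring), with labels 0 and 1 standing in for color codes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic non-linearly separable data: an inner disc and an outer ring.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0, 30, 100), rng.uniform(60, 90, 100)])
X = np.column_stack([128 + radii * np.cos(angles), 128 + radii * np.sin(angles)])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train one model per (C, gamma) pair and record its test accuracy.
results = {}
for C in (0.1, 10.0):
    for gamma in (1e-4, 1e-2):
        model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        results[(C, gamma)] = model.score(X_test, y_test)
        print(f"C={C}, gamma={gamma}: test accuracy {results[(C, gamma)]:.3f}")
```

Because pixel coordinates range over hundreds of units, small `gamma` values are used here; scaling the features first is a common alternative.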

<strong>12.</strong> Use grid search cross-validation to optimize the hyperparameters of the classifiers. Report the optimal parameters found by the search. Predict the color of every pixel of the first image using the model with the optimal parameters.
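A minimal grid search sketch follows, assuming a decision tree and a small hypothetical grid over `max_depth` and `min_samples_leaf`. The data is synthetic and the grid values are placeholders to be replaced with ones suited to the chosen classifier.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic points labeled by which side of the diagonal they fall on.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(300, 2))
y = (X[:, 0] > X[:, 1]).astype(int)

# Every combination in the grid is scored with 5-fold cross-validation.
param_grid = {"max_depth": [1, 3, 10], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)

# With the default refit=True, best_estimator_ is retrained on all of X, y.
best_model = search.best_estimator_
predicted = best_model.predict(X[:5])
```

`best_model` can then be passed the full list of pixel coordinates, in the same way as `clf` in step 8.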

<strong>13.</strong> Repeat all the steps from above for the second image (using the same classifiers). Compare the results obtained with the same classifier respectively for the linear and non-linear cases.

<strong>14.</strong> Use the third classifier for predicting the color for all pixels of the first and second image. Compare the results obtained for the linear and non-linear cases. Compare the results with those of the other two classifiers used.
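The sketch below shows how a linear and a non-linear case might be compared for a third classifier. k-nearest neighbors is used only as one possible choice, and both datasets are synthetic stand-ins for the two images.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Linear case: the label depends on which side of a vertical line the point lies.
X_lin = rng.uniform(0, 256, size=(300, 2))
y_lin = (X_lin[:, 0] > 128).astype(int)

# Non-linear case: the label depends on the distance from the image center.
X_non = rng.uniform(0, 256, size=(300, 2))
y_non = (np.hypot(X_non[:, 0] - 128, X_non[:, 1] - 128) > 80).astype(int)

# Same classifier settings for both cases so the accuracies are comparable.
knn = KNeighborsClassifier(n_neighbors=5)
acc_lin = cross_val_score(knn, X_lin, y_lin, cv=5).mean()
acc_non = cross_val_score(knn, X_non, y_non, cv=5).mean()
print(f"linear case: {acc_lin:.3f}, non-linear case: {acc_non:.3f}")
```

Running the other two classifiers on the same two datasets with the same folds gives a like-for-like comparison.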

<strong>15.</strong> For the assessment, you must complete all the required steps above, report and compare the results, give a short presentation of the chosen classifiers (directly in the Jupyter Notebook), and answer questions related to the presentation.
<p>For a better understanding of how the results must be presented, see Figure 2.</p>
<img src="files/Example.PNG" width="1024" height="1024"/>
<center><strong>Figure 2</strong></center>