
In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets

# Tutorial 5 - A Classifier Class
Dr. Daugherity, PHYS 453

VERY soon we will start using sklearn's built-in classifiers.  To make sure we understand what is going on under the hood, we will turn our homework 1 solution into a more user-friendly class.

**Classes for Novice Programmers**

The goal is to bundle our data and the functions that act on it into a single package with an obvious interface.  In addition to the initialization function where we set the parameters, every sklearn classifier also has:


*  fit(X,y) - trains the classifier on features X with known targets y
*  predict(X) - uses the trained classifier to predict the targets of the given features X

The goal of this tutorial is to implement these functions for our lousy 1D classifier.
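
To see the interface in action before building our own, here is a minimal sketch using one of sklearn's built-in classifiers. The choice of `DecisionTreeClassifier` with `max_depth=1` (a single threshold split) and the toy arrays `X_toy` and `y_toy` are assumptions made for illustration; they are not part of this tutorial's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up 1D data: small values are class 0, large values are class 1
X_toy = np.array([[1.0], [1.5], [2.0], [4.0], [4.5], [5.0]])  # shape (n_samples, n_features)
y_toy = np.array([0, 0, 0, 1, 1, 1])

stump = DecisionTreeClassifier(max_depth=1)        # parameters are set at initialization
stump.fit(X_toy, y_toy)                            # train on features and known targets
y_new = stump.predict(np.array([[0.5], [6.0]]))    # predict targets for new features
print(y_new)
```

Note that sklearn expects a 2D feature array even when there is only one feature, which is why each sample is wrapped in its own list.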

# Python Class Example
Python is a beautifully designed, clean, and simple language.  One thing that looks a bit ugly and feels clunky is defining classes.  Remember that these design choices are meant to remove any possible ambiguity about which variable we are using.

How to write python classes:
*   any variable saved into the class must start with a ```self.```  For example use ```self.x``` instead of ```x```  
*   functions must also take ```self``` as the first parameter
*   to initialize the object define a function called ```__init__``` (two underscores on each side)

An example will help:




In [None]:
class example:

  def __init__(self,a=0,b=0):    # initialization function that runs automatically when we make an object 
    self.a = a
    self.b = b

  def set(self,a,b):
    self.a = a
    self.b = b
  
  def show(self):
    print(self.a,self.b)

x = example()  # make an object of our class
x.show()
x.set(1,2)
x.show()

0 0
1 2


In [None]:
y = example()
y.show()

0 0


In [None]:
y.b

0

## Problem Description
Use **brute force** to find a single decision boundary threshold for a one-feature classifier.  


## Solution Method
The plan is to write two functions:
1. Given a feature number and a threshold, calculate the accuracy of using this threshold to distinguish species 0 from species 1.  **You can make the (sometimes terrible) assumption that $ X < thresh $ is always species 0**
1. Use brute force to try lots and lots of possible thresholds and find the best one.
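
The two steps above can be sketched compactly with NumPy. This is only an illustration under stated assumptions: the helper name `threshold_accuracy` and the toy arrays `x_toy` and `y_toy` are made up, and the loop-based class below remains the version this tutorial actually builds.

```python
import numpy as np

def threshold_accuracy(x, y, thresh):
    """Fraction of samples classified correctly when x < thresh means class 0."""
    pred = (x >= thresh).astype(int)   # 0 below the threshold, 1 at or above it
    return np.mean(pred == y)

# Made-up 1D feature values and labels, for illustration only
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_toy = np.array([0, 0, 1, 0, 1, 1])

# Step 2: brute force over a grid of candidate thresholds
candidates = np.linspace(x_toy.min(), x_toy.max(), num=11)
scores = np.array([threshold_accuracy(x_toy, y_toy, t) for t in candidates])
best = candidates[scores.argmax()]     # argmax returns the first of any tied maxima
print(best, scores.max())
```

Because the toy classes overlap, no threshold reaches 100% accuracy here; brute force simply reports the best one it tried.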
 

# CREATE A CLASS

In [None]:
class bfc:
  """Our 1D BRUTE FORCE CLASSIFIER!!!!!!"""
  def __init__(self):
    self.thresh = 0

  def find_accuracy(self, X, y, thresh):
    """Returns the fraction of samples classified correctly using features X, targets y, and a threshold"""
    count = 0
    for i in range(len(X)):   # loop over every sample instead of assuming there are 100
      if X[i] < thresh:
        pred = 0
      else:
        pred = 1

      if y[i] == pred:
        count += 1
    return count / len(X)

  def fit(self,X,y):
    """Finds the threshold of features in X with targets in y"""
    thresholds = np.linspace(X.min(), X.max(),num=1000 )  # candidate thresholds spanning the data

    A = np.zeros_like(thresholds)  # accuracy of each candidate threshold

    for i in range(len(thresholds)):
      thresh = thresholds[i]
      A[i] = self.find_accuracy(X, y, thresh)  # pass y explicitly rather than relying on a global

    self.thresh = thresholds[ A.argmax()]  # best threshold (the first one if there are ties)
    return 

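  def predict(self,X):
    """Predicts 0 for features below the fitted threshold and 1 otherwise"""
    y = np.zeros_like(X)  # same shape and dtype as X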
    for i in range(len(X)):
      if X[i] < self.thresh:
        y[i] = 0
      else:
        y[i] = 1
    return y
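
The class above always assumes that values below the threshold are species 0, which the problem statement admits is sometimes a terrible assumption. One possible way to relax it is sketched here: try both orientations for each threshold and remember which one worked better. The class name `ThresholdClassifier`, its `flip` attribute, and the toy data are hypothetical and not part of the original solution. This sketch also returns `self` from `fit`, as sklearn estimators do, so that calls can be chained.

```python
import numpy as np

class ThresholdClassifier:
    """Hypothetical 1D threshold classifier that also learns which side is class 0."""

    def __init__(self, num=1000):
        self.num = num        # number of candidate thresholds to try
        self.thresh = 0.0
        self.flip = False     # False: x < thresh -> 0; True: x < thresh -> 1

    def _raw_predict(self, X, thresh, flip):
        below = np.asarray(X) < thresh
        # below the threshold -> 0 normally, -> 1 when flipped
        return np.where(below, 1, 0) if flip else np.where(below, 0, 1)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        best = -1.0
        for thresh in np.linspace(X.min(), X.max(), num=self.num):
            for flip in (False, True):
                acc = np.mean(self._raw_predict(X, thresh, flip) == y)
                if acc > best:   # keep the first best (threshold, orientation) pair
                    best, self.thresh, self.flip = acc, thresh, flip
        return self

    def predict(self, X):
        return self._raw_predict(X, self.thresh, self.flip)

# Made-up data where class 1 has the SMALLER feature values
x_toy = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
y_toy = np.array([1, 1, 1, 0, 0, 0])
clf2 = ThresholdClassifier(num=50).fit(x_toy, y_toy)
print(clf2.flip, clf2.predict(np.array([0.0, 10.0])))
```

The cost is twice as many accuracy evaluations, which is negligible for data this small.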


## Input


In [None]:
Xall, yall = datasets.load_iris(return_X_y=True)
X = Xall[yall<2]  # get rid of species 2
y = yall[yall<2]

## Analysis
Fit the classifier on feature 0 (sepal length), then check how well it does.


In [None]:
clf = bfc()
clf.fit(X[:,0],y)
print(clf.thresh)

5.402702702702703


In [None]:
# How well did we do on the training data?
from sklearn.metrics import classification_report
y_pred = clf.predict(X[:,0])
print(classification_report(y,y_pred))

              precision    recall  f1-score   support

           0       0.88      0.90      0.89        50
           1       0.90      0.88      0.89        50

    accuracy                           0.89       100
   macro avg       0.89      0.89      0.89       100
weighted avg       0.89      0.89      0.89       100
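
To make the columns of the report less mysterious, the sketch below computes precision, recall, and accuracy for class 1 by hand from a confusion matrix. The label arrays `y_true` and `y_hat` are made up for illustration and are not the iris results above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_hat  = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_hat)   # rows = true class, columns = predicted class
tn, fp, fn, tp = cm.ravel()            # binary case: [[tn, fp], [fn, tp]]

precision_1 = tp / (tp + fp)      # of the samples predicted as 1, how many really are 1
recall_1 = tp / (tp + fn)         # of the samples that really are 1, how many we found
accuracy = (tp + tn) / cm.sum()   # fraction of all samples classified correctly
print(cm)
print(precision_1, recall_1, accuracy)
```

The "support" column in the report is simply the number of true samples in each class, which is the sum of each row of the confusion matrix.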



In [None]:
# Predict new samples
X_pred = np.array([3,4,4.5,7,8,212345])
y_pred = clf.predict(X_pred)
print(y_pred)

[0. 0. 0. 1. 1. 1.]
