<a href="https://colab.research.google.com/github/guptaankit894/AAIM/blob/main/google_colab_files/machine_learning_algorithms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Programming Exercise**

This exercise gives an overview of several machine learning methods. It follows the same order as the slides.

1. Supervised Learning Methods (Decision Tree, Random Forest, k-Nearest Neighbour, Naive Bayes, and Support Vector Machines).

2. Semi-Supervised Methods (Support Vector Machines).

3. Unsupervised Methods (k-means clustering).

4. Regression

**A customized Function**

First, let's create a custom function that visualizes the decision boundary of a fitted model. Because the boundary is drawn in a plane, we will use only two features (plus the labels) instead of all features. The function requires the following libraries: **matplotlib.pyplot**, **sklearn**, and **numpy**.

In [None]:
from sklearn.inspection import DecisionBoundaryDisplay
import matplotlib.pyplot as plt
import numpy as np

# few parameters for plotting
plot_colors="ryb" # Red, yellow, blue


def plot_decision_boundary(clf, X, y, ax, xlabel, ylabel, n_classes, labels, plot_colors):
  DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        cmap=plt.cm.RdYlBu,
        response_method="predict",
        ax=ax,
        xlabel=xlabel,
        ylabel=ylabel,
    )
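  # Plot the data points passed in, one colour per class
  for i, color in zip(range(n_classes), plot_colors):
    mask = y == i  # boolean mask selecting the samples of class i
    # draw on the axes passed in, not on whatever figure happens to be current
    ax.scatter(X[mask, 0], X[mask, 1], c=color, label=labels[i], edgecolor="black", s=15)
  ax.legend()  # show the class labels assigned above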
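
To make clearer what a decision-boundary plot shows, the following is a minimal sketch of the underlying idea: predict a class for every point of a regular grid over the two features, then reshape the predictions back to the grid. The feature columns, the grid resolution of 100, and the `random_state` are illustrative assumptions, and this is not necessarily how `DecisionBoundaryDisplay` is implemented internally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Two illustrative iris features (columns 1 and 3 are an assumption).
iris = load_iris()
X = iris.data[:, [1, 3]]
y = iris.target

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Build a regular grid that covers the range of both features.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))

# Predict a class for every grid point, then reshape back to the grid.
grid_points = np.c_[xx.ravel(), yy.ravel()]
Z = clf.predict(grid_points).reshape(xx.shape)

print(Z.shape)       # one predicted class id per grid cell
print(np.unique(Z))  # class ids that occur somewhere on the grid
```

Colouring each grid cell by its value in `Z` (for example with `plt.contourf(xx, yy, Z)`) gives the shaded regions seen in the plots.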

In [None]:
# load dataset, and split data for classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score  # performance metric for classification
iris=load_iris()
pair=[1,3]  # use only 2 features: sepal width and petal width
X = iris.data[:, pair]  # select exactly the columns named in pair, so the axis labels match the data
y = iris.target
X_train, X_test, Y_train, Y_test=train_test_split(X,y, test_size=0.3)

**Decision Trees**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
model = DecisionTreeClassifier()
model.fit(X_train, Y_train)

ax=plt.subplot(1,1,1)

# no trailing commas here: a trailing comma would turn each label into a tuple
xlabel=iris.feature_names[pair[0]]
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)

In [None]:
accuracy_score(Y_test, model.predict(X_test))  # argument order: (y_true, y_pred)

In [None]:
from sklearn import metrics
confusion_matrix = metrics.confusion_matrix(Y_test, model.predict(X_test))

cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix = confusion_matrix, display_labels = [0, 1, 2])

cm_display.plot()
plt.show()
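
The confusion matrix and the accuracy score are closely related. As a small sketch on made-up labels (the arrays `y_true` and `y_pred` below are hypothetical and not taken from the iris split), the accuracy is the sum of the diagonal divided by the total count, and the per-class recall is the diagonal divided by the row sums:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels, made up purely for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

# Rows correspond to the true class, columns to the predicted class.
cm = confusion_matrix(y_true, y_pred)

# Correct predictions sit on the diagonal.
accuracy_from_cm = np.trace(cm) / cm.sum()

# Per-class recall: correct predictions divided by the number of true samples.
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print(accuracy_from_cm, accuracy_score(y_true, y_pred))
print(recall_per_class)
```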

In [None]:
# Plot tree
plt.figure(figsize=(20, 20))
# the model was trained on two features only, so pass just their names
tree.plot_tree(model, feature_names=[iris.feature_names[i] for i in pair], fontsize=14)

**Random Forest**

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn import tree
model = RandomForestClassifier()
model.fit(X_train, Y_train)

ax=plt.subplot(1,1,1)
pair=[1,3]
xlabel=iris.feature_names[pair[0]]  # no trailing comma, otherwise the label becomes a tuple
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)

In [None]:
# Tree plotting
plt.figure(figsize=(20,20))
tree.plot_tree(model.estimators_[0])

**k-Nearest Neighbour**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
model=KNeighborsClassifier(3)
model.fit(X_train,Y_train)

ax=plt.subplot(1,1,1)
pair=[1,3]
xlabel=iris.feature_names[pair[0]]  # no trailing comma, otherwise the label becomes a tuple
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)

In [None]:
accuracy_score(Y_test, model.predict(X_test))  # argument order: (y_true, y_pred)

**Naive Bayes**

In [None]:
from sklearn.naive_bayes import GaussianNB
model=GaussianNB()

model.fit(X_train, Y_train)

ax=plt.subplot(1,1,1)
pair=[1,3]
xlabel=iris.feature_names[pair[0]]  # no trailing comma, otherwise the label becomes a tuple
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)

In [None]:
accuracy_score(Y_test, model.predict(X_test))  # argument order: (y_true, y_pred)

**Semi-supervised Learning**
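SVM using a linear kernel (note that `SVC` is trained here on fully labelled data, exactly like the supervised models above)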

In [None]:
from sklearn import svm
model=svm.SVC(kernel="linear")  # the default kernel is "rbf", so the linear kernel must be requested explicitly
model.fit(X_train, Y_train)

ax=plt.subplot(1,1,1)
pair=[1,3]
xlabel=iris.feature_names[pair[0]]  # no trailing comma, otherwise the label becomes a tuple
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)

In [None]:
accuracy_score(Y_test, model.predict(X_test))  # argument order: (y_true, y_pred)

**Unsupervised Learning**
k-means clustering

In [None]:
from sklearn.cluster import KMeans
model=KMeans(3)
model.fit(X_train)

ax=plt.subplot(1,1,1)
pair=[1,3]
xlabel=iris.feature_names[pair[0]]  # no trailing comma, otherwise the label becomes a tuple
ylabel=iris.feature_names[pair[1]]


plot_decision_boundary(model, X_test, Y_test, ax,xlabel, ylabel, 3, iris.target_names, plot_colors)
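
k-means assigns arbitrary cluster ids, so cluster 0 need not correspond to class 0, and `accuracy_score` is not directly meaningful here. One possible way to compare the clusters with the true labels is the adjusted Rand index, which ignores how the clusters are numbered. The sketch below is an illustration under assumptions: the feature columns, `n_init=10`, and `random_state=0` are chosen here and are not part of the exercise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X = iris.data[:, [1, 3]]  # assumed feature pair, for illustration
y = iris.target

# Cluster without using the labels at all.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Compare the grouping with the true classes.
ari = adjusted_rand_score(y, cluster_ids)

# Renumbering the clusters does not change the score.
renumbered = (cluster_ids + 1) % 3
ari_renumbered = adjusted_rand_score(y, renumbered)

print(ari, ari_renumbered)
```

A value near 1 means the clusters largely coincide with the classes, while a value near 0 is what a random assignment would give.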

**Regression Task**

For the regression task, we will use the **diabetes dataset** for prediction and R-squared **(Coefficient of Determination)** as the performance metric.

For plotting the fitting line, we will use seaborn library. You can install it using the following pip command:

**pip install seaborn**
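
The R-squared metric mentioned above is defined as $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the true values from their mean. The sketch below, on made-up numbers (the arrays are hypothetical), computes it by hand and checks it against `r2_score`. It also shows that the argument order matters, because the mean is taken over the first argument.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical values, made up purely for illustration.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

r2_correct = r2_score(y_true, y_pred)  # signature is r2_score(y_true, y_pred)
r2_swapped = r2_score(y_pred, y_true)  # swapped arguments give a different value

print(r2_manual, r2_correct, r2_swapped)
```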

**Linear Regression**

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
import seaborn as sns
from sklearn.metrics import r2_score # Performance metric for regression
diabetes=load_diabetes()
X = diabetes.data[:,2]
y = diabetes.target

X_train, X_test, Y_train, Y_test=train_test_split(X,y, test_size=0.3)
X_train=X_train.reshape(-1,1)
X_test=X_test.reshape(-1,1)

In [None]:
model=LinearRegression()
model.fit(X_train, Y_train)
y_pred = model.predict(X_test)


In [None]:
plt.scatter(X_test, Y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)

In [None]:
r2_score(Y_test, y_pred)  # argument order is (y_true, y_pred); R-squared is not symmetric

**Polynomial Regression**

Polynomial regression is linear regression fitted on polynomial terms of the input. To create these terms, we use the **PolynomialFeatures** transformer from **sklearn.preprocessing**.

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score # Performance metric for regression
diabetes=load_diabetes()
X = diabetes.data[:,3]
y = diabetes.target

X_train, X_test, Y_train, Y_test=train_test_split(X,y, test_size=0.3)
X_train=X_train.reshape(-1,1)
X_test=X_test.reshape(-1,1)

In [None]:
poly = PolynomialFeatures(degree=2, include_bias=False)

In [None]:
poly_X_train = poly.fit_transform(X_train)  # fit the transformer on the training data
poly_X_test = poly.transform(X_test)  # reuse the fitted transformer on the test data

In [None]:
model=LinearRegression()
model.fit(poly_X_train, Y_train)
y_pred=model.predict(poly_X_test)

In [None]:
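# sort the test points by feature value so the fitted curve is drawn from left to right
order = np.argsort(X_test[:, 0])
plt.scatter(X_test, Y_test, color="black")
plt.plot(X_test[order], y_pred[order], color="blue", linewidth=3)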

In [None]:
r2_score(Y_test, y_pred)  # argument order is (y_true, y_pred); R-squared is not symmetric