
# 3.5 K-Means

We implement the k-means clustering algorithm to partition a set of observations into *k* clusters, with each observation assigned to the nearest cluster center. The algorithm seeks to minimize the within-cluster sum of squares (WCSS), so that the observations in each cluster are tightly grouped around their centroid. We start by selecting *k* distinct data points at random as the initial centroids, then assign each observation to its nearest centroid by Euclidean distance. Next, we recompute each centroid as the mean of the points assigned to it. These assignment and update steps repeat until convergence. In the code, convergence means that no centroid coordinate moves by `tolerance` or more between iterations; once no observation switches clusters, the centroids stop moving entirely, so this condition is met. Our function outputs the final cluster assignments, the centroids, and the final WCSS. The result is a local optimum of the WCSS, not necessarily the global one, and it depends on the random initialization.
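In symbols, writing $C_j$ for the set of observations assigned to cluster $j$ and $\boldsymbol{\mu}_j$ for its centroid, the objective and the two alternating steps are:

$$
\begin{aligned}
\text{WCSS} &= \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert^2, \\
\text{assignment:} \quad c_i &= \arg\min_{j \in \{1,\dots,k\}} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert, \\
\text{update:} \quad \boldsymbol{\mu}_j &= \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i .
\end{aligned}
$$

The assignment step minimizes the WCSS over the assignments with the centroids held fixed, and the update step minimizes it over the centroids with the assignments held fixed (the mean is the point that minimizes a sum of squared distances). Hence the WCSS never increases from one iteration to the next, which is why the procedure settles at a local optimum.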

The code below partitions the data into clusters by proximity to the centroids, updating iteratively until convergence. The function prints the data, the cluster assignments, the final centroids, and the final WCSS, and returns the last three as a tuple.
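The assignment step in the function relies on NumPy broadcasting to compute every point-to-centroid distance at once. A small sketch with made-up toy values (the names `toy_points` and `toy_centroids` are ours, for illustration only) shows the shapes involved:

```python
import numpy as np

# Toy data: 4 points in 2-D and 2 candidate centroids (illustrative values)
toy_points = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]])
toy_centroids = np.array([[0.5, 0.0], [9.5, 9.5]])

# toy_points[:, np.newaxis] has shape (4, 1, 2); subtracting toy_centroids (2, 2)
# broadcasts to (4, 2, 2): one difference vector per (point, centroid) pair
diffs = toy_points[:, np.newaxis] - toy_centroids
print(diffs.shape)  # (4, 2, 2)

# Euclidean norm over the last axis gives a (4, 2) distance matrix
distances = np.linalg.norm(diffs, axis=2)

# Index of the nearest centroid for each point
clusters = np.argmin(distances, axis=1)
print(clusters)  # [0 0 1 1]
```

The same pattern scales to the 100 × 2 data below: the intermediate array has shape `(n, k, d)`, which is fine for small data but grows with `n * k`.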

In [1]:
import numpy as np

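def kmeans(data, k, max_iters=100, tolerance=1e-4):
    # randomly initialize k centroids by picking k distinct data points
    centroids = data[np.random.choice(data.shape[0], k, replace=False)]

    for _ in range(max_iters):
        # assignment step: distance from every point to every centroid, shape (n, k)
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)

        # update step: each centroid becomes the mean of its assigned points;
        # an empty cluster keeps its previous centroid instead of becoming NaN
        new_centroids = np.array([
            data[clusters == i].mean(axis=0) if np.any(clusters == i) else centroids[i]
            for i in range(k)
        ])

        # converged once no centroid coordinate moves by tolerance or more
        if np.all(np.abs(new_centroids - centroids) < tolerance):
            break
        centroids = new_centroids

    # reassign against the final centroids so clusters and centroids agree
    # (this only changes anything when max_iters is reached without converging)
    distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
    clusters = np.argmin(distances, axis=1)

    # within-cluster sum of squares (WCSS) for the final clustering
    wcss = sum(np.sum((data[clusters == i] - centroids[i]) ** 2) for i in range(k))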

    print("Data:", data)
    print("Cluster Assignments:", clusters)
    print("Centroids:", centroids)
    print("Final WCSS:", wcss)
    return clusters, centroids, wcss

# example usage with random data
data = np.random.rand(100, 2)
kmeans(data, k=3)


Data: [[0.17815032 0.16649381]
 [0.26395108 0.09592343]
 [0.0489664  0.70168597]
 [0.39328807 0.19526838]
 [0.0786828  0.45932218]
 [0.46469784 0.69805083]
 [0.70609367 0.45976031]
 [0.76773662 0.73898506]
 [0.83185063 0.33068694]
 [0.36928968 0.04275911]
 [0.13435237 0.77124807]
 [0.87680827 0.56379172]
 [0.74995786 0.34686866]
 [0.12162982 0.52562682]
 [0.68830562 0.7706164 ]
 [0.35481756 0.96936797]
 [0.76970986 0.13308975]
 [0.68582971 0.2483069 ]
 [0.55575035 0.43682401]
 [0.99281436 0.83198675]
 [0.90062359 0.02362868]
 [0.57813343 0.71573626]
 [0.94034646 0.44281004]
 [0.30550552 0.0907058 ]
 [0.96180695 0.42356202]
 [0.52879098 0.18129932]
 [0.18197054 0.61245402]
 [0.98562424 0.32168385]
 [0.56411705 0.08023373]
 [0.57295438 0.43803939]
 [0.71552084 0.27692609]
 [0.65778979 0.98413155]
 [0.25083718 0.83160855]
 [0.32760401 0.06460335]
 [0.0781543  0.65326539]
 [0.36837947 0.23884985]
 [0.45488507 0.16857916]
 [0.7134972  0.54057113]
 [0.93834854 0.76852562]
 [0.93260223 0.4224

(array([2, 2, 2, 2, 2, 0, 1, 0, 1, 2, 0, 1, 1, 2, 0, 0, 1, 1, 1, 0, 1, 0,
        1, 2, 1, 1, 2, 1, 1, 1, 1, 0, 0, 2, 2, 2, 2, 1, 0, 1, 0, 1, 1, 1,
        1, 0, 2, 2, 2, 0, 1, 1, 1, 1, 1, 2, 0, 2, 0, 0, 2, 2, 1, 2, 1, 2,
        2, 2, 0, 0, 2, 2, 1, 2, 0, 0, 0, 2, 0, 1, 2, 2, 0, 2, 1, 1, 2, 1,
        0, 0, 2, 1, 1, 0, 1, 0, 0, 2, 2, 0]),
 array([[0.55946568, 0.8186213 ],
        [0.7551256 , 0.31388377],
        [0.2258632 , 0.31240088]]),
 6.056855454334686)
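To check the claim that k-means drives the WCSS down, a minimal sketch can record the WCSS after every assignment step. The helper `wcss_history` is hypothetical: it mirrors the steps of `kmeans` but uses a seeded `numpy.random.Generator` and a fixed number of iterations, and it keeps a centroid unchanged if its cluster empties.

```python
import numpy as np

def wcss_history(data, k, n_iters=20, seed=0):
    """Hypothetical helper: run Lloyd iterations and record the WCSS each round."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(data.shape[0], k, replace=False)]
    history = []
    for _ in range(n_iters):
        # Assignment step: nearest centroid for every point
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        clusters = np.argmin(distances, axis=1)
        # WCSS of the current (assignments, centroids) pair
        history.append(float(np.sum((data - centroids[clusters]) ** 2)))
        # Update step: mean of each cluster (keep the old centroid if a cluster is empty)
        centroids = np.array([
            data[clusters == j].mean(axis=0) if np.any(clusters == j) else centroids[j]
            for j in range(k)
        ])
    return history

# Random 2-D points, generated separately so the notebook's `data` is untouched
points = np.random.default_rng(1).random((100, 2))
history = wcss_history(points, k=3)
print([round(v, 4) for v in history])
```

The printed values should never increase; they typically drop over the first few iterations and then stay constant once no point switches clusters. Trying several `seed` values can end at different final WCSS values, which is the local-optimum behaviour described earlier.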

# 3.6 Support Vector Machine
We implement a basic linear Support Vector Machine (SVM) for binary classification, trained with stochastic (per-sample) subgradient descent. The SVM looks for a hyperplane that separates the two classes with the largest possible margin. The model is initialized with a learning rate, a regularization parameter λ (`lambda_param`) that controls the trade-off between a wide margin and correctly classifying the training points, and the number of training iterations (passes over the data).

The `fit` method loops over the training points and updates the weight vector $\mathbf{w}$ and bias $b$ according to whether each point lies on the correct side of its margin boundary. If a point satisfies $y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1$, it is correctly classified and outside the margin, so only the regularization term updates the weights. Otherwise the point lies inside the margin or is misclassified, and both the regularization term and the point's own contribution adjust the weights and bias, pushing the boundary toward classifying that point correctly with a margin. Finally, the `predict` method returns the sign of $\mathbf{w}\cdot\mathbf{x} - b$ for each test point, that is, which side of the hyperplane it falls on. On other datasets, the learning rate, λ, and number of iterations usually need tuning to obtain a boundary that separates the classes well and generalizes to new data.
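The updates in `fit` can be read as per-sample subgradient steps on the regularized hinge-loss objective below, with $\eta$ = `learning_rate` and $\lambda$ = `lambda_param`. This is our reading of the code; the code itself does not state the objective.

$$
\begin{aligned}
&\min_{\mathbf{w},\,b}\; \lambda \lVert \mathbf{w} \rVert^2 + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(\mathbf{w}\cdot\mathbf{x}_i - b)\bigr), \\[4pt]
&\text{if } y_i(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1: \quad \mathbf{w} \leftarrow \mathbf{w} - \eta\,(2\lambda\mathbf{w}), \\
&\text{otherwise:} \quad \mathbf{w} \leftarrow \mathbf{w} - \eta\,(2\lambda\mathbf{w} - y_i\mathbf{x}_i), \qquad b \leftarrow b - \eta\, y_i .
\end{aligned}
$$

The bias step comes from differentiating $1 - y_i(\mathbf{w}\cdot\mathbf{x}_i - b)$ with respect to $b$, which gives $+y_i$; because $b$ enters the decision function with a minus sign, the bias moves opposite to $y_i$.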

In [2]:
import numpy as np

class SupportVectorMachine:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        # initializing learning rate, lambda for regularization, and number of iterations
        self.learning_rate = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        # initializing weights and bias, number of samples, and features
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0

        # per-sample (stochastic) subgradient descent on the regularized hinge loss
        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # margin condition: correctly classified and outside the margin
                condition = y[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # only the regularization term contributes; the bias is unchanged
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w)
                else:
                    # inside the margin or misclassified: the hinge term also contributes
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w - y[idx] * x_i)
                    self.b -= self.learning_rate * y[idx]

    def predict(self, X):
        # predict the class from the sign of w.x - b (a scaled signed distance from the hyperplane);
        # a point exactly on the hyperplane gets 0
        linear_output = np.dot(X, self.w) - self.b
        return np.sign(linear_output)


# example usage:
# let's create some data and train an SVM model with it
if __name__ == "__main__":
    # creating a dataset
    X = np.array([
        [1, 2],
        [2, 3],
        [3, 3],
        [2, 1],
        [3, 2]
    ])
    y = np.array([1, 1, 1, -1, -1])  # labels must be either 1 or -1

    # initializing and training the SVM
    svm = SupportVectorMachine(learning_rate=0.001, lambda_param=0.01, n_iters=1000)
    svm.fit(X, y)

    # making predictions
    predictions = svm.predict(X)
    print("Predictions:", predictions)

    # printing the model parameters
    print("Weights:", svm.w)
    print("Bias:", svm.b)


Predictions: [ 1.  1.  1. -1. -1.]
Weights: [-1.01430102  1.25080626]
Bias: 0.23700000000000018
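Perfect training predictions do not mean every point clears the margin. A short check, using the training data from the example and the weights and bias copied from the printed output above (rounded as printed), computes the functional margins $y_i(\mathbf{w}\cdot\mathbf{x}_i - b)$:

```python
import numpy as np

# Training data from the example above
X = np.array([[1, 2], [2, 3], [3, 3], [2, 1], [3, 2]])
y = np.array([1, 1, 1, -1, -1])

# Parameters copied from the printed output above
w = np.array([-1.01430102, 1.25080626])
b = 0.237

# Functional margin y_i (w . x_i - b):
#   >= 1        -> correct side, outside the margin
#   in (0, 1)   -> correctly classified but inside the margin
#   <= 0        -> on the hyperplane or misclassified
margins = y * (X @ w - b)
print(np.round(margins, 3))
print("Inside margin:", X[(margins > 0) & (margins < 1)].tolist())
```

With these values, all five margins are positive (so all predictions are correct), but the points (3, 3) and (3, 2) have margins of roughly 0.47 and 0.78, so they still fall inside the margin. More iterations or a different λ would change these numbers.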
