<a href="https://www.kaggle.com/code/naychilynn/linear-regression-k-nearest-neighbor-with-python?scriptVersionId=165327444" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

Below is an example case study in Python that focuses on implementing fundamental machine learning algorithms from scratch and on practicing efficient Python programming.

By the end of this notebook, you should understand the concepts behind two machine learning algorithms (linear regression and k-nearest neighbors), be able to compute an evaluation metric, and use libraries like NumPy effectively.


A dataset provided by the scikit-learn library (the Iris dataset) is used here to train and evaluate the implemented algorithms.

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

Implementing a *Linear Regression* model from scratch in Python, with functions for:

1. Calculating the coefficients using the least squares method.
2. Predicting the output given the input and coefficients.
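In matrix form, the least squares fit solves the normal equation. Here $X_b$ denotes the input matrix with a leading column of ones for the bias term (this notation is ours, not the original notebook's), $\mathbf{y}$ is the target vector, and $\hat{\boldsymbol{\beta}}$ holds the intercept followed by one coefficient per feature:

$$
\hat{\boldsymbol{\beta}} = \left(X_b^{\top} X_b\right)^{-1} X_b^{\top} \mathbf{y},
\qquad
\hat{\mathbf{y}} = X_b \hat{\boldsymbol{\beta}},
\qquad
X_b = \begin{bmatrix} \mathbf{1} & X \end{bmatrix}
$$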

In [2]:
import numpy as np

def linear_regression_fit(X, y):
    # Add bias term
    X_bias = np.c_[np.ones(X.shape[0]), X]
    
    # Compute coefficients using least squares
    coefficients = np.linalg.inv(X_bias.T.dot(X_bias)).dot(X_bias.T).dot(y)
    return coefficients

def linear_regression_predict(X, coefficients):
    # Add bias term
    X_bias = np.c_[np.ones(X.shape[0]), X]
    
    # Predict output
    y_pred = X_bias.dot(coefficients)
    return y_pred
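Explicitly inverting `X_bias.T.dot(X_bias)` works for small, well-conditioned problems like Iris, but it can be numerically fragile when features are nearly collinear. The following is a minimal sketch of an alternative that keeps the same bias-column convention and lets NumPy's `np.linalg.lstsq` solve the least squares problem directly. The helper name `linear_regression_fit_lstsq` and the synthetic data are hypothetical and are included only for illustration:

```python
import numpy as np

def linear_regression_fit_lstsq(X, y):
    # Hypothetical helper: same bias-column convention as linear_regression_fit,
    # but solves min ||X_bias @ b - y||^2 with np.linalg.lstsq instead of
    # explicitly inverting X_bias.T @ X_bias.
    X_bias = np.c_[np.ones(X.shape[0]), X]
    coefficients, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
    return coefficients

# Synthetic, noise-free data (assumed for illustration): y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2 + 3 * X_demo[:, 0] - 1 * X_demo[:, 1]

coef_demo = linear_regression_fit_lstsq(X_demo, y_demo)
print(coef_demo)  # expected to be very close to [2, 3, -1]
```

Because the synthetic targets contain no noise, the recovered intercept and slopes should match the true values up to floating-point error.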


Implementing the *K-Nearest Neighbors* (KNN) algorithm, with functions for:

1. Calculating the Euclidean distance between two points.
2. Finding the k-nearest neighbors.
3. Predicting the class label based on the majority vote of nearest neighbors.
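Written out, the distance and the majority vote from the list above take the following form. Here $p$ is the number of features and $N_k(\mathbf{x})$ denotes the indices of the $k$ training points closest to $\mathbf{x}$ (notation introduced for this summary):

$$
d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_{j=1}^{p} \left(x_j - x'_j\right)^2},
\qquad
\hat{y}(\mathbf{x}) = \arg\max_{c} \sum_{i \in N_k(\mathbf{x})} \mathbb{1}\left[y_i = c\right]
$$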

In [3]:
from collections import Counter

def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

def knn_predict(X_train, y_train, x_test, k):
    distances = [euclidean_distance(x_test, x_train) for x_train in X_train]
    k_indices = np.argsort(distances)[:k]
    k_nearest_labels = [y_train[i] for i in k_indices]
    most_common = Counter(k_nearest_labels).most_common(1)
    return most_common[0][0]
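`knn_predict` computes distances one training point at a time inside a Python list comprehension. Below is a sketch of the same logic using NumPy broadcasting, which computes all distances in a single array operation. The function name `knn_predict_vectorized` and the toy data are hypothetical, and the sketch assumes `y_train` is a NumPy array so that it can be indexed with an array of indices:

```python
import numpy as np
from collections import Counter

def knn_predict_vectorized(X_train, y_train, x_test, k):
    # Hypothetical variant of knn_predict: broadcasting subtracts x_test from
    # every row of X_train at once, giving all Euclidean distances in one step.
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Indices of the k smallest distances
    k_indices = np.argsort(distances)[:k]
    # Majority vote among the labels of the k nearest neighbors
    return Counter(y_train[k_indices]).most_common(1)[0][0]

# Toy data (assumed for illustration): two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict_vectorized(X_toy, y_toy, np.array([0.05, 0.05]), k=3))  # 0
print(knn_predict_vectorized(X_toy, y_toy, np.array([5.0, 5.1]), k=3))    # 1
```

In typical cases this should give the same predictions as `knn_predict` while avoiding the Python-level loop over training points.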


For the *evaluation metric*, we write a function that computes the accuracy of a classification model, that is, the fraction of predicted labels that match the actual labels.

In [4]:
def accuracy_score(y_true, y_pred):
    correct = sum(y_t == y_p for y_t, y_p in zip(y_true, y_pred))
    total = len(y_true)
    accuracy = correct / total
    return accuracy
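The same metric can be expressed in NumPy by comparing the two label arrays element-wise and taking the mean of the boolean result. Because accuracy only makes sense for discrete labels, the sketch also includes mean squared error, a common metric for continuous regression outputs. Both helper names are hypothetical and are not part of the original notebook:

```python
import numpy as np

def accuracy_score_np(y_true, y_pred):
    # Hypothetical NumPy variant of accuracy_score: the mean of a boolean
    # array is the fraction of positions where prediction equals truth.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def mean_squared_error_np(y_true, y_pred):
    # Hypothetical helper: average squared difference between targets and
    # predictions, suited to continuous regression outputs.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

print(accuracy_score_np([0, 1, 2, 2], [0, 1, 1, 2]))      # 0.75
print(mean_squared_error_np([0, 1, 2], [0.5, 1.0, 1.5]))  # about 0.1667
```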


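The cell below loads the Iris dataset, holds out 20% of it as a test set, fits both models, and evaluates them with `accuracy_score`. Adjust parameters such as `test_size`, `random_state`, and `k` to suit your requirements. Note that linear regression outputs continuous values, so comparing its raw predictions with the integer class labels by exact equality will rarely, if ever, count a match.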

In [5]:
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression (class labels 0, 1, 2 are treated as continuous targets)
coefficients = linear_regression_fit(X_train, y_train)
y_pred_lr = linear_regression_predict(X_test, coefficients)

# K-Nearest Neighbors
k = 5
y_pred_knn = [knn_predict(X_train, y_train, x_test, k) for x_test in X_test]

# Evaluation
accuracy_lr = accuracy_score(y_test, y_pred_lr)
accuracy_knn = accuracy_score(y_test, y_pred_knn)

print("Linear Regression Accuracy:", accuracy_lr)
print("K-Nearest Neighbors Accuracy:", accuracy_knn)


Linear Regression Accuracy: 0.0
K-Nearest Neighbors Accuracy: 1.0
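The linear regression accuracy of 0.0 does not mean that the model learned nothing. `linear_regression_predict` returns non-integer values rather than exactly 0, 1, or 2, and `accuracy_score` counts only exact matches. One possible workaround is to round each prediction to the nearest integer and clip it to the valid label range. This sketch assumes it is acceptable to treat the Iris class indices as ordered numeric targets, and the helper `regression_to_class_labels` is hypothetical:

```python
import numpy as np

def regression_to_class_labels(y_pred, n_classes):
    # Hypothetical post-processing step: round continuous outputs to the
    # nearest integer, then clip into the valid label range [0, n_classes - 1].
    return np.clip(np.rint(y_pred), 0, n_classes - 1).astype(int)

# Illustrative continuous predictions (assumed values, not notebook output)
y_continuous = np.array([-0.2, 0.4, 1.6, 2.7, 1.49])
print(regression_to_class_labels(y_continuous, n_classes=3))  # [0 0 2 2 1]
```

In the notebook above, this could be applied as `accuracy_score(y_test, regression_to_class_labels(y_pred_lr, n_classes=3))`. Even so, linear regression remains a regression method, and KNN is the more natural fit for a classification task like Iris.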


Now we are able to implement two fundamental machine learning algorithms, understand evaluation metrics, and use a common numerical library like NumPy effectively.
*Happy Coding with ML!!!*
> Nay Chi Lynn