Welcome to the K-Nearest Neighbors (KNN) Classifier project! This repository contains code to implement a basic machine learning classifier using the K-Nearest Neighbors algorithm. This algorithm classifies data points based on the majority class of the closest data points, or "neighbors."
Email: ktobia10@wgu.edu
LinkedIn: Kelvin R. Tobias
Bluesky: @kelvintechnical.bsky.social
Instagram: @kelvinintech
The K-Nearest Neighbors (KNN) Classifier is a simple and effective classification algorithm that works well with smaller datasets and is easy to understand. In this project, we use Python libraries like numpy, matplotlib, and scikit-learn to build and test our KNN model on the popular Iris dataset, classifying different species of iris flowers based on physical features.
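To make the "majority class of the closest neighbors" idea concrete, here is a minimal from-scratch sketch using only numpy. It is not the code used in this project (which relies on scikit-learn's KNeighborsClassifier); the function name knn_predict and the toy data are made up for illustration, and Euclidean distance is assumed as the distance measure.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest training points.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k training points with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0])))  # query near the first cluster
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0])))  # query near the second cluster
```

With k=3, each query's three nearest neighbors all come from the cluster it sits in, so the vote is unanimous. On real data the neighbors can disagree, which is why the choice of k matters.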
While implementing and reviewing the code, here are five key insights I gained:
- Importance of Data Splitting: Splitting data into training and testing sets makes it possible to detect overfitting, because the model is scored on data it never saw during training.
- Cross-Validation: Using cross-validation provides a better estimate of model performance by evaluating it on multiple subsets of the dataset.
- Feature Extraction: The CountVectorizer transforms text into numerical data by counting word occurrences, which is crucial for text-based machine learning.
- Saving Models: The joblib library allows for saving and loading models and preprocessing tools, making it easier to deploy and reuse them.
- Understanding Metrics: Metrics like precision, recall, and F1-score provide a detailed view of model performance beyond just accuracy.
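Several of these insights can be tried directly on the Iris data. The sketch below is not part of knn_classifier.py; it assumes the same split and n_neighbors=3 used in this project, picks 5 folds arbitrarily, and uses a hypothetical file name (knn_model.joblib) inside a temporary directory. CountVectorizer is left out because the Iris dataset contains no text.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Cross-validation: score the model on 5 different held-out folds of the training data
# (5 folds is an arbitrary choice for this sketch)
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X_train, y_train, cv=5)
print("Mean cross-validation accuracy:", cv_scores.mean())

# Metrics: per-class precision, recall, and F1-score on the test set
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(classification_report(y_test, knn.predict(X_test), target_names=iris.target_names))

# Saving models: write the fitted model to disk with joblib and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "knn_model.joblib")  # hypothetical file name
    joblib.dump(knn, model_path)
    restored = joblib.load(model_path)

# The restored model should behave exactly like the original
same_predictions = bool((restored.predict(X_test) == knn.predict(X_test)).all())
print("Restored model matches original:", same_predictions)
```

In a real project the saved file would live in a permanent location; the temporary directory here just keeps the example from leaving files behind.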
Here is an overview of the key components of the code in this project:
# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
# Load dataset
data = load_iris()
X = data.data
y = data.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Test the model
accuracy = knn.score(X_test, y_test)
print("Model accuracy:", accuracy)
- data = load_iris(): Loads the Iris dataset, which contains data for classifying iris flowers.
- train_test_split: Splits data into training and testing sets.
- KNeighborsClassifier: Initializes the KNN classifier with n_neighbors=3.
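The value n_neighbors=3 is only one possible setting. As a rough sketch of how other values could be compared, the snippet below scores a few candidates with 5-fold cross-validation. The candidate list and the fold count are arbitrary assumptions for illustration, not choices made by this project.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9]  # arbitrary candidates chosen for this sketch
mean_scores = {}

for k in candidate_ks:
    # Mean accuracy across 5 cross-validation folds for this value of k
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy {mean_scores[k]:.3f}")

# Pick the candidate with the highest mean accuracy (ties go to the smaller k)
best_k = max(mean_scores, key=mean_scores.get)
print("Best k among candidates:", best_k)
```

On a dataset as small and clean as Iris, the differences between candidates tend to be small, so the result is better read as a sanity check than as a definitive tuning procedure.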
If you want to use or modify this code, you can "fork" it to make your own copy:
- Fork this repository by clicking the "Fork" button at the top-right of this page.
- Clone the forked repository to your local machine:
git clone https://github.com/your-username/K-Nearest-Neighbors-KNN-Classifier.git
- Navigate to the project directory:
cd K-Nearest-Neighbors-KNN-Classifier
- Install the necessary libraries:
pip install numpy scikit-learn matplotlib
- Run the code:
python knn_classifier.py
Contributions are welcome! Feel free to make pull requests to improve the code or add new features.
This project is open-source and free to use. Please credit this repository if you use it in your own projects.