K-Nearest Neighbors (KNN) Classifier

Welcome to the K-Nearest Neighbors (KNN) Classifier project! This repository contains code to implement a basic machine learning classifier using the K-Nearest Neighbors algorithm. This algorithm classifies data points based on the majority class of the closest data points, or "neighbors."


📫 How to reach me:

Email: ktobia10@wgu.edu
LinkedIn: Kelvin R. Tobias
Bluesky: @kelvintechnical.bsky.social
Instagram: @kelvinintech


Project Overview

The K-Nearest Neighbors (KNN) Classifier is a simple and effective classification algorithm that works well with smaller datasets and is easy to understand. In this project, we use Python libraries like numpy, matplotlib, and scikit-learn to build and test our KNN model on the popular Iris dataset, classifying different species of iris flowers based on physical features.
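
For intuition, here is a minimal from-scratch sketch of what the algorithm does under the hood, independent of scikit-learn. The knn_predict helper below is hypothetical and only illustrates the distance-and-majority-vote idea; the project itself uses scikit-learn's implementation, shown later.

# Minimal sketch of the KNN idea (illustrative only, not part of the project code)
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]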


5 Things I Learned from This Project

While implementing and reviewing the code, here are five key insights I gained:

  1. Importance of Data Splitting: Splitting data into training and testing sets helps prevent overfitting and ensures the model generalizes well to unseen data.
  2. Cross-Validation: Using cross-validation provides a better estimate of model performance by evaluating the model on multiple subsets of the dataset (see the sketch after this list).
  3. Feature Extraction: The CountVectorizer transforms text into numerical data by counting word occurrences, which is crucial for text-based machine learning.
  4. Saving Models: The joblib library allows for saving and loading models and preprocessing tools, making it easier to deploy and reuse them.
  5. Understanding Metrics: Metrics like precision, recall, and F1-score provide a detailed view of model performance beyond just accuracy.
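
As a rough sketch of points 2, 4, and 5, here is how cross-validation, detailed metrics, and model saving might look with scikit-learn and joblib. This is an illustrative example, not code from the repository, and the file name knn_model.joblib is just an example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
import joblib

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Cross-validation: score the model on 5 different train/test splits
scores = cross_val_score(knn, X, y, cv=5)
print("Mean cross-validation accuracy:", scores.mean())

# Precision, recall, and F1-score on a held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn.fit(X_train, y_train)
print(classification_report(y_test, knn.predict(X_test)))

# Save the trained model so it can be reloaded later with joblib.load
joblib.dump(knn, "knn_model.joblib")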

Code Explanation

Here is an overview of the key components of the code in this project:

# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Test the model
accuracy = knn.score(X_test, y_test)
print("Model accuracy:", accuracy)

Explanation:

  • data = load_iris(): Loads the Iris dataset, which contains measurements for classifying iris flowers into three species.
  • train_test_split: Splits the data into training and testing sets (here, 80% training and 20% testing).
  • KNeighborsClassifier: Initializes the KNN classifier with n_neighbors=3, so each prediction is a majority vote among the 3 nearest training points.
  • knn.fit: Trains the classifier on the training data.
  • knn.score: Returns the model's accuracy on the test set.
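
Once trained, the model can also classify new measurements directly. The values below are made-up example measurements (sepal length, sepal width, petal length, petal width, in cm):

# Classify a new flower from its four measurements (example values)
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = knn.predict(sample)
print("Predicted species:", data.target_names[prediction[0]])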

How to Use This Repository

If you want to use or modify this code, you can "fork" it to make your own copy:

  1. Fork this repository by clicking the "Fork" button at the top-right of this page.
  2. Clone the forked repository to your local machine:
git clone https://github.com/your-username/K-Nearest-Neighbors-KNN-Classifier.git
  3. Navigate to the project directory:
cd K-Nearest-Neighbors-KNN-Classifier
  4. Install the necessary libraries:
pip install numpy scikit-learn matplotlib
  5. Run the code:
python knn_classifier.py

Contributing

Contributions are welcome! Feel free to make pull requests to improve the code or add new features.


License

This project is open-source and free to use. Please credit this repository if you use it in your own projects.
