# 📌 Student Details
**Name:** Or Kattan  
**ID:** 211312657  

This notebook solves a classification task on the Wine dataset: predicting a wine's class (0, 1, or 2) from its chemical attributes with a k-nearest neighbors (KNN) classifier.

### 🤖 Prompt/LLM Usage
This project was developed with the help of ChatGPT, which provided technical assistance with Python, the machine learning workflow, and code refinement.


# 🧪 Wine Classification Project

This notebook runs an end-to-end classification workflow on the Wine dataset: feature scaling, training a KNN model, tuning it with GridSearchCV, evaluating it on validation and test data, and PCA visualization.
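
KNN labels a sample by majority vote among the k training points closest to it; scikit-learn's `KNeighborsClassifier` defaults to the Minkowski metric with `p=2`, i.e. Euclidean distance. The sketch below is a minimal from-scratch version on hypothetical toy points (not the notebook's data), checked against scikit-learn. The helper `knn_predict` is illustrative only; its tie-breaking between equally voted classes (first label seen) may differ from scikit-learn's.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.asarray(X_query, dtype=float):
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest].tolist())   # count labels among the neighbors
        preds.append(votes.most_common(1)[0][0])     # most frequent label wins
    return np.array(preds)


# Toy data: two well-separated clusters (hypothetical values)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.2, 0.2], [5.5, 5.4]])

ours = knn_predict(X_toy, y_toy, queries, k=3)
sk = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy).predict(queries)
print(ours)  # [0 1]
print(sk)    # [0 1]
```

Because the vote depends only on distances, a feature measured in the hundreds (such as proline) would dominate one measured in single digits, which is why the notebook standardizes features before fitting KNN.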

In [92]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.decomposition import PCA

In [94]:
# Load the training and test datasets
train_df = pd.read_csv('wine_train.csv')
test_df = pd.read_csv('wine_test.csv')
# Remove exact duplicate rows from the training set only (the test set is left untouched)
train_df = train_df.drop_duplicates()
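
`drop_duplicates()` with its default `keep='first'` removes only rows that match an earlier row in every column, including `target`. A minimal sketch on a hypothetical three-row table (column names chosen for illustration) shows that two rows with identical features but different labels both survive:

```python
import pandas as pd

# Hypothetical mini table: rows 0 and 1 are identical in every column,
# row 2 has the same features as row 0 but a different target.
df_demo = pd.DataFrame({
    "alcohol": [13.2, 13.2, 13.2],
    "proline": [1050, 1050, 1050],
    "target": [0, 0, 1],
})

n_dupes = int(df_demo.duplicated().sum())  # rows flagged as repeats of an earlier row
deduped = df_demo.drop_duplicates()        # keeps the first occurrence of each distinct row
print(n_dupes)       # 1
print(len(deduped))  # 2
```

Checking `train_df.duplicated().sum()` before dropping is a quick way to see how many rows the cell actually removes.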

In [96]:
features = [col for col in train_df.columns if col != 'target']  # every column except the label

# Separate features and labels for the training and test sets
X = train_df[features]
y = train_df['target']
X_test = test_df[features]
y_test = test_df['target']

In [98]:
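# Fit the scaler on all training rows (train + validation), then reuse those statistics for the test set
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_test_scaled = scaler.transform(X_test)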
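
`StandardScaler` maps each feature to $z = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and population standard deviation (`ddof=0`) learned in `fit`; `transform` on new data reuses those same training statistics. The sketch below verifies this on hypothetical values, not the notebook's CSVs:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical "training" and "test" rows: two features on very different scales
X_tr = np.array([[13.0, 1000.0], [12.0, 500.0], [14.0, 750.0]])
X_te = np.array([[13.5, 600.0]])

scaler_demo = StandardScaler().fit(X_tr)

mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # np.std defaults to ddof=0, matching StandardScaler

manual_tr = (X_tr - mu) / sigma
manual_te = (X_te - mu) / sigma  # test rows are standardized with *training* mean/std

print(np.allclose(scaler_demo.transform(X_tr), manual_tr))  # True
print(np.allclose(scaler_demo.transform(X_te), manual_te))  # True
```

This is why the notebook calls `fit_transform` on the training data but only `transform` on `X_test`: the test set never influences $\mu$ or $\sigma$.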

In [100]:
# Hold out 20% of the training data for validation, keeping the class proportions (stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42
)
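
With `stratify=y`, `train_test_split` allocates each class to the two splits in proportion to its frequency, which matters for a small, slightly imbalanced dataset like this one. A sketch on hypothetical labels with a 50/30/20 class split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 50 / 30 / 20 samples of classes 0 / 1 / 2
y_strat = np.array([0] * 50 + [1] * 30 + [2] * 20)
X_strat = np.arange(len(y_strat)).reshape(-1, 1)  # dummy single feature

_, _, y_tr_strat, y_val_strat = train_test_split(
    X_strat, y_strat, test_size=0.2, stratify=y_strat, random_state=42
)

# Both splits keep the 50/30/20 ratio
print(np.bincount(y_tr_strat))   # [40 24 16]
print(np.bincount(y_val_strat))  # [10  6  4]
```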

In [102]:
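# 5-fold cross-validated search over k, scored by macro-averaged F1
param_grid = {'n_neighbors': [3, 5, 7, 9]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='f1_macro')
grid.fit(X_train, y_train)
best_model = grid.best_estimator_  # refit on all of X_train with the best k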
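
In the cells above, the scaler is fit on all of `X` before `train_test_split`, so the validation rows, and every fold inside `GridSearchCV`, are standardized with statistics that include held-out data. For `StandardScaler` this leakage is usually mild, but wrapping the scaler and model in a `Pipeline` removes it, because each fold refits the scaler on its own training portion. A minimal sketch, assuming scikit-learn's bundled `load_wine` data as a stand-in for `wine_train.csv`; the step names `scaler` and `knn` are my own choice:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled copy of the Wine dataset
X_demo, y_demo = load_wine(return_X_y=True)

# The scaler is a pipeline step, so inside each CV fold it is fit on that
# fold's training portion only; the held-out fold never affects mean/std.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as <step name>__<parameter>
param_grid_pipe = {"knn__n_neighbors": [3, 5, 7, 9]}
grid_pipe = GridSearchCV(pipe, param_grid_pipe, cv=5, scoring="f1_macro")
grid_pipe.fit(X_demo, y_demo)

print(grid_pipe.best_params_, round(grid_pipe.best_score_, 3))
```

The fitted `grid_pipe.best_estimator_` contains both the scaler and the model, so it can be applied to raw, unscaled test features directly.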

In [103]:
y_val_pred = best_model.predict(X_val)
print('Validation Classification Report:')
print(classification_report(y_val, y_val_pred))
print('Validation Confusion Matrix:')
print(confusion_matrix(y_val, y_val_pred))

Validation Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.92      0.96        12
           2       0.88      1.00      0.93         7

    accuracy                           0.97        29
   macro avg       0.96      0.97      0.96        29
weighted avg       0.97      0.97      0.97        29

Validation Confusion Matrix:
[[10  0  0]
 [ 0 11  1]
 [ 0  0  7]]
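
The `macro avg` row is the unweighted mean of the per-class scores, while `weighted avg` weights each class by its support. The sketch below rebuilds label vectors from the validation confusion matrix printed above (rows are true classes, columns are predicted classes) and recomputes both F1 averages; the variable names are illustrative:

```python
import numpy as np
from sklearn.metrics import f1_score

# Validation confusion matrix copied from the output above
cm = np.array([[10, 0, 0],
               [0, 11, 1],
               [0, 0, 7]])

# Expand each cell (true t, predicted p, count n) into n label pairs
t_idx, p_idx = np.indices(cm.shape)
y_true_cm = np.repeat(t_idx.ravel(), cm.ravel())
y_pred_cm = np.repeat(p_idx.ravel(), cm.ravel())

per_class = f1_score(y_true_cm, y_pred_cm, average=None)
macro = f1_score(y_true_cm, y_pred_cm, average="macro")        # plain mean over classes
weighted = f1_score(y_true_cm, y_pred_cm, average="weighted")  # mean weighted by support

print(np.round(per_class, 4))
print(round(macro, 2), round(weighted, 2))  # 0.96 0.97
```

Class 1 has F1 $= 2 \cdot 1 \cdot \tfrac{11}{12} / (1 + \tfrac{11}{12}) = \tfrac{22}{23}$ and class 2 has F1 $= 2 \cdot \tfrac{7}{8} \cdot 1 / (\tfrac{7}{8} + 1) = \tfrac{14}{15}$, so the macro F1 is $(1 + \tfrac{22}{23} + \tfrac{14}{15})/3 \approx 0.963$, matching the 0.96 in the report.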


In [106]:
y_test_pred = best_model.predict(X_test_scaled)
print('Test Classification Report:')
print(classification_report(y_test, y_test_pred))

Test Classification Report:
              precision    recall  f1-score   support

           0       0.92      1.00      0.96        11
           1       1.00      0.86      0.92        14
           2       0.92      1.00      0.96        11

    accuracy                           0.94        36
   macro avg       0.94      0.95      0.95        36
weighted avg       0.95      0.94      0.94        36



In [108]:
# GridSearchCV results summary
pd.DataFrame(grid.cv_results_)[['param_n_neighbors', 'mean_test_score', 'rank_test_score']]

   param_n_neighbors  mean_test_score  rank_test_score
0                  3         0.949019                3
1                  5         0.939212                4
2                  7         0.956184                1
3                  9         0.956184                1
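
Since k=7 and k=9 tie at rank 1, it helps to know which one `best_model` holds. As I understand scikit-learn's behavior, `GridSearchCV` sets `best_index_` to the position of the minimum `rank_test_score`, so the first tied candidate in grid order wins. The sketch mirrors that rule on the values from the table above (the `results` frame is rebuilt by hand, not read from `grid`):

```python
import numpy as np
import pandas as pd

# cv_results_ summary copied from the table above
results = pd.DataFrame({
    "param_n_neighbors": [3, 5, 7, 9],
    "mean_test_score": [0.949019, 0.939212, 0.956184, 0.956184],
    "rank_test_score": [3, 4, 1, 1],
})

# argmin returns the first index of the lowest rank, so ties go to the earlier candidate
best_idx = int(np.argmin(results["rank_test_score"].to_numpy()))
best_k = int(results.loc[best_idx, "param_n_neighbors"])
print(best_k)  # 7
```

In the notebook itself, `grid.best_params_` reports the selected value directly.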


### Additional Models

#### Decision Tree

In [113]:
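from sklearn.tree import DecisionTreeClassifier
# Trees do not require scaled inputs; the same scaled split is reused so the comparison with KNN is like-for-like
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)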
y_val_tree_pred = tree.predict(X_val)
print('Validation Report – Decision Tree')
print(classification_report(y_val, y_val_tree_pred))

Validation Report – Decision Tree
              precision    recall  f1-score   support

           0       1.00      0.90      0.95        10
           1       0.91      0.83      0.87        12
           2       0.78      1.00      0.88         7

    accuracy                           0.90        29
   macro avg       0.90      0.91      0.90        29
weighted avg       0.91      0.90      0.90        29



In [115]:
# First 5 KNN (best_model) predictions on the test set
y_test_pred[:5]

array([2, 0, 2, 1, 2], dtype=int64)

#### Logistic Regression

In [119]:
from sklearn.linear_model import LogisticRegression
# max_iter=1000 gives the solver enough iterations to converge
logreg = LogisticRegression(max_iter=1000, random_state=42)
logreg.fit(X_train, y_train)
y_val_logreg = logreg.predict(X_val)
print("Validation Report – Logistic Regression")
print(classification_report(y_val, y_val_logreg))
print("Confusion Matrix – Logistic Regression")
print(confusion_matrix(y_val, y_val_logreg))


Validation Report – Logistic Regression
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.92      0.96        12
           2       0.88      1.00      0.93         7

    accuracy                           0.97        29
   macro avg       0.96      0.97      0.96        29
weighted avg       0.97      0.97      0.97        29

Confusion Matrix – Logistic Regression
[[10  0  0]
 [ 0 11  1]
 [ 0  0  7]]
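
KNN and Logistic Regression produce identical reports on the 29-row validation set, and a single split that small cannot reliably separate them. A fairer comparison scores every model on the same cross-validation folds. A minimal sketch, assuming scikit-learn's bundled `load_wine` data as a stand-in for the notebook's CSVs and taking k=7 from the grid-search table; scaling sits inside a pipeline so each fold is standardized on its own training portion:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's bundled Wine dataset
X_demo, y_demo = load_wine(return_X_y=True)

models = {
    "KNN (k=7)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
    ),
}

# One fixed set of shuffled, stratified folds shared by every model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = {}
for name, model in models.items():
    s = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="f1_macro")
    scores[name] = s
    print(f"{name:20s} mean={s.mean():.3f} std={s.std():.3f}")
```

Reporting the fold-to-fold standard deviation alongside the mean makes it easier to judge whether a gap between two models is larger than the noise of the evaluation.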
