# Plant Recommendation Using KNN Classifier

This notebook loads a balanced synthetic dataset, scales the features and encodes the labels, tunes a KNN classifier with GridSearchCV (5-fold cross-validation), and saves the trained model together with its preprocessors. It then reloads the saved artifacts and tests them on a sample sensor reading.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score
import joblib

## Data Loading and Preprocessing

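The dataset is loaded from `synthetic_combined_dataset_balanced.csv`. The `Plant_Type` column is the target and all remaining columns are features. Features are standardized with `StandardScaler`, target labels are encoded as integers with `LabelEncoder`, and 20% of the rows are held out as a test set. The scaler is fit on the full dataset before the split, so the test rows contribute to the scaling statistics.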

In [2]:
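data = pd.read_csv('synthetic_combined_dataset_balanced.csv')
# Separate the features from the target column
X = data.drop('Plant_Type', axis=1)
y = data['Plant_Type']
# Standardize every feature to zero mean and unit variance (KNN is distance-based)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Map plant names to integer class ids
le = LabelEncoder()
y_encoded = le.fit_transform(y)
# Hold out 20% of the rows for the final test evaluation
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_encoded, test_size=0.2, random_state=42)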
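
The cell above fits the scaler on all rows before splitting. A common alternative is to split first and fit the scaler on the training rows only, so the test set stays unseen during preprocessing. The following is a minimal sketch of that ordering, not the notebook's method; it uses randomly generated stand-in data, and the names `X_demo`, `y_demo`, and `demo_scaler` are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 200 rows, 4 numeric features, 4 classes.
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 4))
y_demo = rng.integers(0, 4, size=200)

# Split the raw features first; stratify keeps class proportions similar in both parts.
X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then reuse its statistics for the test rows.
demo_scaler = StandardScaler()
X_tr = demo_scaler.fit_transform(X_tr_raw)
X_te = demo_scaler.transform(X_te_raw)

print("Train column means:", X_tr.mean(axis=0).round(6))
print("Test column means: ", X_te.mean(axis=0).round(3))
```

The training columns come out with zero mean by construction, while the test columns are only approximately centered, which is the expected behavior when the test set plays no part in fitting.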

## Hyperparameter Tuning and Model Training

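A KNN classifier is tuned using GridSearchCV with 5-fold cross-validation on the training set. The grid covers `n_neighbors` in {3, 5, 7, 9, 11} and both `uniform` and `distance` weighting, and candidates are ranked by accuracy.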

In [3]:
knn = KNeighborsClassifier()
# 5 neighbor counts x 2 weighting schemes = 10 candidate configurations
param_grid = {'n_neighbors': [3, 5, 7, 9, 11], 'weights': ['uniform', 'distance']}
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print("Best parameters found:", grid_search.best_params_)
# best_estimator_ is refit on the whole training set with the best parameters
best_knn = grid_search.best_estimator_

Best parameters found: {'n_neighbors': 5, 'weights': 'uniform'}
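
To make the `weights` option in the grid concrete, here is a small sketch on a three-point toy dataset (made up for illustration, unrelated to the plant data) where `uniform` and `distance` weighting disagree. With `uniform`, each of the k neighbors casts one vote; with `distance`, each vote is weighted by the inverse of its distance to the query.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data (assumption): one class-0 point near the query, two class-1 points farther away.
X_toy = np.array([[0.0], [2.0], [2.1]])
y_toy = np.array([0, 1, 1])
query = np.array([[0.5]])

# Uniform: votes are 1 for class 0 and 2 for class 1, so class 1 wins.
uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_toy, y_toy)
uniform_pred = uniform_knn.predict(query)[0]

# Distance: class 0 gets 1/0.5 = 2.0, class 1 gets 1/1.5 + 1/1.6 (about 1.29), so class 0 wins.
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_toy, y_toy)
distance_pred = distance_knn.predict(query)[0]

print("uniform :", uniform_pred)   # 1
print("distance:", distance_pred)  # 0
```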


## Cross-Validation and Test Evaluation

In [4]:
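# cross_val_score clones best_knn and refits it on each fold of the full scaled dataset
cv_scores = cross_val_score(best_knn, X_scaled, y_encoded, cv=5, scoring='accuracy')
print("Cross-validation scores:", cv_scores)
print("Mean CV Accuracy: {:.3f}".format(cv_scores.mean()))
# Evaluate the tuned model on the held-out 20% test split
y_pred = best_knn.predict(X_test)
print("Test Accuracy: {:.3f}".format(accuracy_score(y_test, y_pred)))
print("\nClassification Report:")
# target_names maps the integer class ids back to plant names in the report
print(classification_report(y_test, y_pred, target_names=le.classes_))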

Cross-validation scores: [1. 1. 1. 1. 1.]
Mean CV Accuracy: 1.000
Test Accuracy: 1.000

Classification Report:
              precision    recall  f1-score   support

      Bonsai       1.00      1.00      1.00       389
       Ficus       1.00      1.00      1.00       411
      Orchid       1.00      1.00      1.00       385
   Succulent       1.00      1.00      1.00       415

    accuracy                           1.00      1600
   macro avg       1.00      1.00      1.00      1600
weighted avg       1.00      1.00      1.00      1600
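
The cross-validation above runs on `X_scaled`, which was standardized using statistics from every row, including the rows that serve as validation folds. One way to keep each fold's validation rows out of the scaling step is to wrap the scaler and the classifier in a scikit-learn pipeline, so the scaler is refit on the training part of every fold. The sketch below assumes synthetic data from `make_classification` as a stand-in for the CSV; `X_syn`, `y_syn`, and `pipe` are illustrative names, and the scores it prints say nothing about the plant dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 13 features and 4 classes, mirroring the notebook's shape only.
X_syn, y_syn = make_classification(
    n_samples=400, n_features=13, n_informative=6, n_classes=4, random_state=0
)

# The pipeline refits StandardScaler inside each training fold before fitting KNN.
pipe = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights="uniform"),
)

pipe_scores = cross_val_score(pipe, X_syn, y_syn, cv=5, scoring="accuracy")
print("Per-fold accuracy:", pipe_scores.round(3))
print("Mean accuracy: {:.3f}".format(pipe_scores.mean()))
```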



## Save the Model and Preprocessors

In [5]:
joblib.dump(best_knn, 'knn_plant_identifier.pkl')
joblib.dump(scaler, 'scaler_knn.pkl')
joblib.dump(le, 'label_encoder_knn.pkl')
print("Model and preprocessors saved.")

Model and preprocessors saved.
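
The cell above writes three separate files, which must be kept together because the model only makes sense with the scaler and label encoder it was trained with. As a hedged alternative, the three objects could be stored in a single file as a dictionary. The sketch below trains a throwaway model on random data and writes to a temporary directory; the file name `knn_bundle_demo.pkl` and the dictionary keys are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Throwaway training data (assumption): 40 rows, 3 features, 2 plant labels.
rng = np.random.default_rng(0)
X_small = rng.normal(size=(40, 3))
labels_small = np.array(["Ficus", "Orchid"] * 20)

sc = StandardScaler().fit(X_small)
enc = LabelEncoder().fit(labels_small)
model = KNeighborsClassifier(n_neighbors=5).fit(
    sc.transform(X_small), enc.transform(labels_small)
)

# Keep the model and its preprocessors in one object so they cannot drift apart.
bundle = {"model": model, "scaler": sc, "label_encoder": enc}

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "knn_bundle_demo.pkl")  # hypothetical file name
    joblib.dump(bundle, bundle_path)
    restored = joblib.load(bundle_path)

# The restored bundle should reproduce the original predictions exactly.
same_predictions = np.array_equal(
    model.predict(sc.transform(X_small)),
    restored["model"].predict(restored["scaler"].transform(X_small)),
)
print("Round trip preserved predictions:", same_predictions)
```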


## Model Testing with a Sample Input

The following cell loads the saved model and preprocessors and tests them with a sample sensor reading. The 13 features, in the order the model expects, are:
- Air quality: AQI, CO2_Level_ppm, NO2_Level_ppm, PM2_5_ug_m3, PM10_ug_m3, VOC_Level_ppm
- Soil and plot: pH_Level, Nitrogen_mg_kg, Phosphorus_mg_kg, Potassium_mg_kg, Moisture_Level_%, Organic_Matter_%, Plot_Area_m2

In [6]:
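# Reload the fitted model and preprocessors from disk
knn = joblib.load('knn_plant_identifier.pkl')
scaler = joblib.load('scaler_knn.pkl')
le = joblib.load('label_encoder_knn.pkl')
# One reading, with values in the feature order listed above
test_sample = np.array([[180, 900, 40, 30, 35, 3.0, 6.5, 70, 45, 50, 55, 2.5, 100]])
# The scaler was fit on a DataFrame; reusing its column names avoids a feature-name warning
test_sample = pd.DataFrame(test_sample, columns=scaler.feature_names_in_)
test_sample_scaled = scaler.transform(test_sample)
predicted_class = knn.predict(test_sample_scaled)
# Convert the integer class id back to the plant name
predicted_label = le.inverse_transform(predicted_class)
print("Test Sample Prediction:", predicted_label[0])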

Test Sample Prediction: Bonsai
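
Because the sample is passed as a bare list of numbers, a value placed in the wrong position would go unnoticed. A small helper that takes a reading as a dictionary and arranges it in the expected order can guard against that. This is a sketch under stated assumptions: `predict_from_reading` is a hypothetical function, and the demo fits a tiny model on made-up values for three of the feature names rather than loading the saved files.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler


def predict_from_reading(reading, feature_order, scaler, model, encoder):
    """Order a {feature_name: value} reading, scale it, and return the predicted label."""
    missing = [name for name in feature_order if name not in reading]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    row = np.array([[float(reading[name]) for name in feature_order]])
    encoded = model.predict(scaler.transform(row))
    return encoder.inverse_transform(encoded)[0]


# Made-up demo data (assumption): three features, two well-separated classes.
demo_features = ["pH_Level", "Moisture_Level_%", "Plot_Area_m2"]
X_fit = np.array([
    [5.5, 30.0, 50.0], [5.6, 32.0, 55.0], [5.4, 28.0, 45.0],
    [7.5, 70.0, 150.0], [7.4, 72.0, 160.0], [7.6, 68.0, 140.0],
])
y_fit = np.array(["Succulent"] * 3 + ["Ficus"] * 3)

demo_scaler = StandardScaler().fit(X_fit)
demo_encoder = LabelEncoder().fit(y_fit)
demo_model = KNeighborsClassifier(n_neighbors=3).fit(
    demo_scaler.transform(X_fit), demo_encoder.transform(y_fit)
)

# Key order in the dictionary does not matter; the helper reorders by feature_order.
reading = {"Plot_Area_m2": 52, "pH_Level": 5.5, "Moisture_Level_%": 31}
prediction = predict_from_reading(
    reading, demo_features, demo_scaler, demo_model, demo_encoder
)
print("Predicted plant:", prediction)
```

With the notebook's own artifacts, the same helper could be called with the 13 feature names listed earlier and the objects returned by `joblib.load`.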


