# Plant Recommendation Using KNN Classifier

This notebook loads a balanced synthetic dataset, scales the features and encodes the target labels, tunes a KNN classifier with GridSearchCV under 5-fold cross-validation, and saves the trained model together with its scaler and label encoder. It then reloads the saved artifacts and runs them on five sample sensor readings.

In [4]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score
import joblib

## Data Loading and Preprocessing

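The dataset is loaded from `synthetic_combined_dataset_india_plants.csv`. Features are standardized with `StandardScaler`, target labels are integer-encoded with `LabelEncoder`, and 20% of the rows are held out as a test set.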
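To make the two preprocessing steps concrete, the following minimal sketch applies them to made-up toy values (not rows from the dataset). `StandardScaler` subtracts each column's mean and divides by its standard deviation, and `LabelEncoder` maps class names to integers in sorted order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy values for illustration only; they are not rows from the dataset.
toy_feature = np.array([[1.0], [2.0], [3.0]])
toy_labels = ["Neem", "Amla", "Mango", "Amla"]

toy_scaler = StandardScaler()
scaled = toy_scaler.fit_transform(toy_feature)
# The column now has mean 0 and unit (population) standard deviation.
print(scaled.ravel())

toy_encoder = LabelEncoder()
encoded = toy_encoder.fit_transform(toy_labels)
print(toy_encoder.classes_)  # class names in sorted order
print(encoded)               # integer code for each label
print(toy_encoder.inverse_transform(encoded))  # back to the names
```

The same sorted ordering is why `le.classes_` can be passed directly as `target_names` when printing a classification report.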

In [2]:
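# Load the dataset and separate the features from the target column.
data = pd.read_csv('synthetic_combined_dataset_india_plants.csv')
X = data.drop('Plant_Type', axis=1)
y = data['Plant_Type']

# Standardize every feature to zero mean and unit variance.
# Note: the scaler is fitted on all rows, so test rows influence the scaling statistics.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Encode the plant names as integers.
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_encoded, test_size=0.2, random_state=42)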
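The cell above fits the scaler on all rows before splitting, so the test rows contribute to the scaling statistics. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on stand-in data from `make_classification` (an assumption, so the example runs without the CSV). The `stratify` argument is an optional extra that keeps class proportions equal across the split; the notebook does not use it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 500 rows, 13 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=13, n_informative=8,
    n_classes=5, random_state=42,
)

# Split first so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)
```

With this ordering the test rows are scaled using training statistics only, which is closer to how new sensor readings would be handled at prediction time.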

## Hyperparameter Tuning and Model Training

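A KNN classifier is tuned using GridSearchCV with 5-fold cross-validation, searching over the number of neighbors (odd values from 3 to 11) and the vote weighting scheme (`uniform` or `distance`).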

In [5]:
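knn = KNeighborsClassifier()
# Candidate neighbor counts and vote weighting schemes.
param_grid = {'n_neighbors': [3, 5, 7, 9, 11], 'weights': ['uniform', 'distance']}
# Evaluate every combination with 5-fold cross-validation on the training set.
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print("Best parameters found:", grid_search.best_params_)
# best_estimator_ is refitted on all of X_train with the best parameters (refit=True by default).
best_knn = grid_search.best_estimator_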

Best parameters found: {'n_neighbors': 11, 'weights': 'distance'}
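One way to keep the scaling inside each cross-validation fold is to wrap the scaler and the classifier in a `Pipeline` and tune that instead. This is a sketch of an alternative setup, not the procedure used above; it reuses the same parameter grid on stand-in data from `make_classification` (an assumption, so the example runs without the CSV).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 500 rows, 13 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=13, n_informative=8,
    n_classes=5, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is refitted on the training part of every fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
pipe_grid = {
    "knn__n_neighbors": [3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
}

pipe_search = GridSearchCV(pipe, pipe_grid, cv=5, scoring="accuracy")
pipe_search.fit(X_tr, y_tr)
print("Best parameters found:", pipe_search.best_params_)
print("Held-out accuracy:", pipe_search.score(X_te, y_te))
```

A fitted pipeline also bundles the scaler with the classifier, so raw, unscaled readings can be passed straight to `predict`.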


## Cross-Validation and Test Evaluation

In [6]:
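# cross_val_score refits clones of best_knn on each fold of the full dataset
# (test rows included); best_knn itself is left unchanged.
cv_scores = cross_val_score(best_knn, X_scaled, y_encoded, cv=5, scoring='accuracy')
print("Cross-validation scores:", cv_scores)
print("Mean CV Accuracy: {:.3f}".format(cv_scores.mean()))
# Evaluate the model fitted on X_train against the held-out test split.
y_pred = best_knn.predict(X_test)
print("Test Accuracy: {:.3f}".format(accuracy_score(y_test, y_pred)))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))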

Cross-validation scores: [0.9685 0.971  0.975  0.9715 0.9745]
Mean CV Accuracy: 0.972
Test Accuracy: 0.970

Classification Report:
              precision    recall  f1-score   support

        Amla       0.93      0.94      0.93       399
   Drumstick       0.99      0.99      0.99       416
       Jamun       0.95      0.94      0.94       377
       Mango       0.99      0.98      0.99       399
        Neem       0.99      0.99      0.99       409

    accuracy                           0.97      2000
   macro avg       0.97      0.97      0.97      2000
weighted avg       0.97      0.97      0.97      2000
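The report shows Amla and Jamun scoring lower than the other classes, but it does not say which classes they are mistaken for. A confusion matrix would show that directly. A minimal sketch on made-up labels (the toy lists are illustrative, not notebook output):

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only.
class_names = ["Amla", "Jamun", "Neem"]
toy_true = ["Amla", "Amla", "Jamun", "Jamun", "Neem"]
toy_pred = ["Amla", "Jamun", "Jamun", "Jamun", "Neem"]

# Rows are true classes, columns are predicted classes,
# both in the order given by class_names.
cm = confusion_matrix(toy_true, toy_pred, labels=class_names)
print(cm)

# The off-diagonal entry cm[0, 1] counts Amla samples predicted as Jamun.
print("Amla predicted as Jamun:", cm[0, 1])
```

In the notebook, the same call on `y_test` and `y_pred` would give one row and column per encoded class, in the order of `le.classes_`.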



## Save the Model and Preprocessors

In [7]:
joblib.dump(best_knn, 'knn_plant_identifier.pkl')
joblib.dump(scaler, 'scaler_knn.pkl')
joblib.dump(le, 'label_encoder_knn.pkl')
print("Model and preprocessors saved.")

Model and preprocessors saved.
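Saving three separate files means the model, scaler, and encoder have to be kept in sync by hand. An alternative is to store them in one dictionary and dump that once. The sketch below is an assumption about how this could look, using a hypothetical file name `knn_bundle.pkl` and small stand-in objects so it runs on its own:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Stand-in training data (assumption): two features, three classes.
X_toy = np.array([
    [0.0, 0.0], [0.1, 0.2],
    [5.0, 5.0], [5.2, 4.9],
    [9.0, 0.0], [9.1, 0.3],
])
y_toy = ["Amla", "Amla", "Mango", "Mango", "Neem", "Neem"]

toy_scaler = StandardScaler().fit(X_toy)
toy_encoder = LabelEncoder().fit(y_toy)
toy_model = KNeighborsClassifier(n_neighbors=1).fit(
    toy_scaler.transform(X_toy), toy_encoder.transform(y_toy)
)

# Keep everything needed for prediction in a single artifact.
bundle = {"model": toy_model, "scaler": toy_scaler, "label_encoder": toy_encoder}
bundle_path = os.path.join(tempfile.mkdtemp(), "knn_bundle.pkl")  # hypothetical name
joblib.dump(bundle, bundle_path)

# Round trip: reload the bundle and predict for one query point.
loaded = joblib.load(bundle_path)
query = np.array([[5.1, 5.0]])
code = loaded["model"].predict(loaded["scaler"].transform(query))
print(loaded["label_encoder"].inverse_transform(code))
```

As with any pickle-based format, such files should only be loaded from trusted sources and with a compatible scikit-learn version.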


## Model Testing with a Sample Input

The following cells load the saved model, scaler, and label encoder and run them on five sample sensor readings. Each reading lists the 13 features in this order:
- `AQI`, `CO2_Level_ppm`, `NO2_Level_ppm`, `PM2_5_ug_m3`, `PM10_ug_m3`, `VOC_Level_ppm`
- `pH_Level`, `Nitrogen_mg_kg`, `Phosphorus_mg_kg`, `Potassium_mg_kg`, `Moisture_Level_%`, `Organic_Matter_%`, `Plot_Area_m2`

In [9]:
knn = joblib.load('knn_plant_identifier.pkl')
scaler = joblib.load('scaler_knn.pkl')
le = joblib.load('label_encoder_knn.pkl')

In [10]:
test_samples = np.array([
    [100, 415, 12, 54.4, 116, 4, 6.8, 100, 50, 80, 55, 3.0, 500],
    [90, 410, 10, 55, 115, 3.8, 7.0, 70, 40, 40, 40, 2.0, 300],
    [110, 420, 14, 53, 118, 4.2, 6.5, 90, 60, 60, 60, 3.5, 400],
    [95, 415, 12, 54, 116, 4, 7.0, 80, 55, 65, 50, 3.0, 350],
    [105, 412, 13, 56, 117, 4.1, 6.2, 60, 45, 55, 45, 2.5, 200]
])

In [11]:
# Apply the scaling learned during training, predict the encoded classes,
# and map the integer codes back to plant names.
test_samples_scaled = scaler.transform(test_samples)
preds = knn.predict(test_samples_scaled)
predicted_plants = le.inverse_transform(preds)

for i, plant in enumerate(predicted_plants, start=1):
    print(f"Test Sample {i}: Recommended Plant Type: {plant}")

Test Sample 1: Recommended Plant Type: Mango
Test Sample 2: Recommended Plant Type: Neem
Test Sample 3: Recommended Plant Type: Jamun
Test Sample 4: Recommended Plant Type: Amla
Test Sample 5: Recommended Plant Type: Drumstick
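The predictions above give only a class label. `predict_proba` additionally reports how the neighbor votes are distributed across classes, which can serve as a rough indication of confidence. A minimal sketch on stand-in data (the `make_classification` dataset is an assumption; the hyperparameters are the best ones reported by the grid search):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 500 rows, 13 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=13, n_informative=8,
    n_classes=5, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_scaler = StandardScaler().fit(X_tr)
demo_knn = KNeighborsClassifier(n_neighbors=11, weights="distance")
demo_knn.fit(demo_scaler.transform(X_tr), y_tr)

# One row per sample, one column per class; each row sums to 1.
proba = demo_knn.predict_proba(demo_scaler.transform(X_te[:5]))
for i, row in enumerate(proba, start=1):
    best = row.argmax()
    print(f"Sample {i}: class {demo_knn.classes_[best]} "
          f"with vote share {row[best]:.2f}")
```

With `weights='distance'` the shares are weighted by inverse distance, so a sample that sits very close to a training point can receive a share near 1 even when other neighbors disagree.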


