# Hyperparameter Tuning Using Feature Set 2

---

**Feature Set 2** includes:

- **Soil Moisture**
- **Temperature**
- **Soil Humidity**
- **Air temperature (C)**
- **Wind speed (Km/h)**
- **Pressure (KPa)**

The dataset used for this analysis is available [here](https://www.kaggle.com/datasets/nelakurthisudheer/dataset-for-predicting-watering-the-plants).

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, 
    precision_score, 
    recall_score, 
    f1_score, 
    confusion_matrix, 
    classification_report, 
    roc_curve, 
    auc
)

# For handling warnings
import warnings
warnings.filterwarnings('ignore')

# Data Preparation and Feature Selection

In [7]:
df = pd.read_csv('soil_data.csv')
required_columns = ['Soil Moisture', 'Temperature', 'Soil Humidity', 'Air temperature (C)', 'Wind speed (Km/h)', 'Pressure (KPa)', 'Status']
df = df.dropna(subset=required_columns)
df['Status'] = df['Status'].map({'OFF': 0, 'ON': 1})

fs2 = ['Soil Moisture', 'Temperature', 'Soil Humidity', 'Air temperature (C)', 'Wind speed (Km/h)', 'Pressure (KPa)'] 

X = df[fs2]
y = df['Status']
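
One thing worth checking after the `Status` mapping above: `Series.map` with a dict returns `NaN` for any label that is not a key in the dict. The sketch below illustrates this on a hypothetical toy column (the values, including the lowercase `'on'`, are made up for illustration and are not taken from `soil_data.csv`).

```python
import pandas as pd

# Hypothetical toy column; the real labels come from the 'Status' column of the dataset.
status = pd.Series(['ON', 'OFF', 'on', 'ON'])

# Same mapping as in the cell above
mapped = status.map({'OFF': 0, 'ON': 1})

# Any label that is not a key of the dict ('on' here) is mapped to NaN
n_unmapped = int(mapped.isna().sum())
print(f"Unmapped labels: {n_unmapped}")
```

If a count like this is non-zero on the real data, the affected rows would need to be cleaned or dropped before fitting a classifier.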

In [9]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.3, 
    random_state=42, 
    stratify=y
)

print(f"Training set size: {X_train.shape[0]} samples")
print(f"Evaluation set size: {X_test.shape[0]} samples")

Training set size: 16796 samples
Evaluation set size: 7199 samples
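
The `stratify=y` argument used above keeps the class proportions (approximately) equal in the training and evaluation sets. A minimal sketch on synthetic labels shows the effect; the 70/30 class balance and the `_demo` names are assumptions for illustration, not properties of the soil dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (assumption): 700 samples of class 0, 300 of class 1
y_demo = np.array([0] * 700 + [1] * 300)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.3,
    random_state=42,
    stratify=y_demo
)

# Share of the positive class in each split
train_ratio = y_tr.mean()
test_ratio = y_te.mean()
print(f"Positive share: train={train_ratio:.3f}, test={test_ratio:.3f}")
```

Without `stratify`, the two shares can drift apart by chance, which matters most when one class is rare.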


## Logistic Regression
### Tuning hyperparameters
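For logistic regression, the main hyperparameters tuned here are the solver (`solver`), the penalty type (`penalty`), and the inverse regularization strength (`C`). Smaller values of `C` mean stronger regularization.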
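
To make the role of the regularization strength concrete, the sketch below fits logistic regression with the default L2 penalty at three values of `C` and compares the size of the learned coefficients. It uses synthetic data from `make_classification` (an assumption for illustration, not the soil dataset), and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 6-feature binary problem (assumption: stands in for the real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=4,
    n_redundant=0, random_state=0
)

coef_norms = {}
for C in [100, 1.0, 0.01]:
    # Default penalty is L2; C is the inverse of the regularization strength
    model = LogisticRegression(C=C, solver='lbfgs', max_iter=1000)
    model.fit(X_demo, y_demo)
    coef_norms[C] = float(np.linalg.norm(model.coef_))

for C, norm in coef_norms.items():
    print(f"C={C}: coefficient norm = {norm:.4f}")
```

The coefficient norm shrinks as `C` decreases, because a smaller `C` puts more weight on the penalty term relative to the data fit.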

In [16]:
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV

logreg = LogisticRegression()
solvers = ['newton-cg', 'lbfgs', 'liblinear']
penalty = ['l2']
c_values = [100, 10, 1.0, 0.1, 0.01]


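# Define the grid search: 3 solvers x 1 penalty x 5 C values = 15 candidates
grid = dict(solver=solvers, penalty=penalty, C=c_values)
# 10-fold stratified CV repeated 3 times = 30 fits per candidate
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=logreg, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy', error_score=0)
# Note: the search is fit on the full dataset (X, y), not only on X_train
grid_result = grid_search.fit(X, y)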


# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.701591 using {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
0.701522 (0.009796) with: {'C': 100, 'penalty': 'l2', 'solver': 'newton-cg'}
0.701272 (0.009827) with: {'C': 100, 'penalty': 'l2', 'solver': 'lbfgs'}
0.701272 (0.009827) with: {'C': 100, 'penalty': 'l2', 'solver': 'liblinear'}
0.701536 (0.009814) with: {'C': 10, 'penalty': 'l2', 'solver': 'newton-cg'}
0.701272 (0.009827) with: {'C': 10, 'penalty': 'l2', 'solver': 'lbfgs'}
0.701272 (0.009827) with: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
0.701536 (0.009840) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'newton-cg'}
0.701272 (0.009827) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'lbfgs'}
0.701272 (0.009827) with: {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}
0.701591 (0.009784) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
0.701272 (0.009827) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}
0.701272 (0.009827) with: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
0.701369 (0.009604) wit