# **Smart Irrigation Prediction**

## Problem Statement
The objective of this project is to predict whether irrigation is required for a crop
based on soil type, crop growth stage, and environmental conditions such as moisture
index, temperature, and humidity.

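This prediction helps farmers make informed irrigation decisions, reduce water
wastage, and prevent crop stress. The problem was framed as a binary classification
task (irrigation needed or not) on the `result` column. Note, however, that the
evaluation reports below list three labels (0, 1, and 2), so the models are
effectively trained as multiclass classifiers.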



**Dataset Loading**

In [2]:
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/cropdata_updated.csv")
df.head()

| | crop ID | soil_type | Seedling Stage | MOI | temp | humidity | result |
|---|---|---|---|---|---|---|---|
| 0 | Wheat | Black Soil | Germination | 1 | 25 | 80.0 | 1 |
| 1 | Wheat | Black Soil | Germination | 2 | 26 | 77.0 | 1 |
| 2 | Wheat | Black Soil | Germination | 3 | 27 | 74.0 | 1 |
| 3 | Wheat | Black Soil | Germination | 4 | 28 | 71.0 | 1 |
| 4 | Wheat | Black Soil | Germination | 5 | 29 | 68.0 | 1 |


**Data Preprocessing**

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16411 entries, 0 to 16410
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   crop ID         16411 non-null  object 
 1   soil_type       16411 non-null  object 
 2   Seedling Stage  16411 non-null  object 
 3   MOI             16411 non-null  int64  
 4   temp            16411 non-null  int64  
 5   humidity        16411 non-null  float64
 6   result          16411 non-null  int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 897.6+ KB


In [8]:
df.columns

Index(['crop ID', 'soil_type', 'Seedling_Stage', 'MOI', 'temp', 'humidity',
       'result'],
      dtype='object')

In [4]:
df.isnull().sum()

| Column | Null count |
|---|---|
| crop ID | 0 |
| soil_type | 0 |
| Seedling Stage | 0 |
| MOI | 0 |
| temp | 0 |
| humidity | 0 |
| result | 0 |


The dataset was checked for missing values, and no null values were found.
Therefore, no imputation was required.


In [9]:
# Rename columns to remove spaces so they are easier to reference in code
df = df.rename(columns={'Seedling Stage': 'Seedling_Stage', 'crop ID': 'crop_ID'})

Column names were standardized to remove spaces for cleaner and more readable code.
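
If more columns ever need the same treatment, the renaming can be generalized instead of listing each column by hand. The following is a minimal sketch on a hypothetical one-row frame named `demo` that mimics the dataset's original headers; it is not part of the original notebook.

```python
import pandas as pd

# Hypothetical one-row frame that mimics the dataset's original headers.
demo = pd.DataFrame(
    [["Wheat", "Black Soil", "Germination", 1, 25, 80.0, 1]],
    columns=["crop ID", "soil_type", "Seedling Stage", "MOI", "temp", "humidity", "result"],
)

# Strip surrounding whitespace and replace inner spaces with underscores.
demo.columns = demo.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(demo.columns))
```

This keeps the renaming step correct even if further columns with spaces are added to the CSV later.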

**Encode categorical features**

In [10]:
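from sklearn.preprocessing import LabelEncoder

categorical_cols = ['crop_ID', 'soil_type', 'Seedling_Stage']

# Fit a separate encoder per column and keep it, so the integer codes
# can later be mapped back to the original labels with inverse_transform
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])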

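Categorical features (crop ID, soil type, and seedling stage) were label-encoded as integers so that the models can consume them. These integer codes imply an ordering that has no agronomic meaning, which matters more for linear models such as Logistic Regression than for tree-based models.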
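
One-hot encoding is a common alternative when the categories have no natural order. The sketch below is not what the notebook does; it uses `pandas.get_dummies` on a hypothetical frame, and the category values "Potato" and "Red Soil" are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; "Potato" and "Red Soil" are invented example categories.
demo = pd.DataFrame({
    "crop_ID": ["Wheat", "Wheat", "Potato"],
    "soil_type": ["Black Soil", "Red Soil", "Black Soil"],
    "MOI": [1, 2, 3],
})

# Each category becomes its own 0/1 column, so no artificial order is implied.
encoded = pd.get_dummies(demo, columns=["crop_ID", "soil_type"], dtype=int)

print(encoded.columns.tolist())
```

The trade-off is a wider feature matrix, which is usually acceptable when each categorical column has only a handful of distinct values.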


**Split Features and Target**

In [11]:
X = df.drop('result', axis=1)
y = df['result']

The dataset was split into input features (X) and target variable (y),
where the target indicates whether irrigation is required.
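
Before modeling, it can help to check how many distinct labels the target holds and how balanced they are. The sketch below uses a small hypothetical label series `y_demo`; in the notebook the same calls would be applied to `y`.

```python
import pandas as pd

# Hypothetical labels standing in for the notebook's `y`.
y_demo = pd.Series([0, 0, 0, 1, 1, 2], name="result")

# Absolute counts and relative shares per class, ordered by label.
counts = y_demo.value_counts().sort_index()
shares = y_demo.value_counts(normalize=True).sort_index()

print(counts)
print(shares.round(2))
```

A heavily under-represented class in this output is an early warning that accuracy alone may hide poor performance on that class.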

**Train–Test Split**

In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

The data was split 80/20 into training and testing sets, with a fixed random seed
for reproducibility, to evaluate model performance on unseen data.
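
When one class is much rarer than the others, a stratified split keeps the class proportions the same in both sets. This is a sketch of that option, not the split used in the notebook; the arrays `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60 / 30 / 10 samples across three classes.
y_demo = np.array([0] * 60 + [1] * 30 + [2] * 10)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo preserves the 60/30/10 ratio in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(np.bincount(y_te))
```

Applying `stratify=y` to the notebook's own split would change which rows land in the test set, so the reported metrics would need to be recomputed.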


**Feature Scaling**

In [13]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

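Feature scaling standardizes each feature to zero mean and unit variance, so that
features with larger numeric ranges do not dominate model training. The scaler is
fitted on the training set only and then applied to the test set, which avoids
leaking test-set statistics into training.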
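
The effect of standardization can be verified directly. The sketch below uses a tiny hypothetical array with two columns on different scales (loosely modeled on the `MOI` and `temp` values in the preview above).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test rows: [MOI, temp].
train = np.array([[1.0, 25.0], [2.0, 26.0], [3.0, 27.0], [4.0, 28.0]])
test = np.array([[5.0, 29.0]])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learn mean/std from train only
test_scaled = demo_scaler.transform(test)        # reuse the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The training columns end up with mean 0 and standard deviation 1, while the test row is expressed relative to the training distribution rather than its own.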


**Model Training (Logistic Regression)**

In [14]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

Logistic Regression was used as a baseline classification model
due to its simplicity and interpretability.


**Evaluation Metrics**

In [15]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test_scaled)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.8215047212915016

Classification Report:
               precision    recall  f1-score   support

           0       0.84      0.90      0.87      1835
           1       0.81      0.85      0.83      1231
           2       0.00      0.00      0.00       217

    accuracy                           0.82      3283
   macro avg       0.55      0.58      0.57      3283
weighted avg       0.77      0.82      0.80      3283
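
A confusion matrix shows where a class with zero precision and recall is being misassigned. The sketch below uses hypothetical label lists in which class 2 is never predicted, similar in spirit to the report above; in the notebook the same call would take `y_test` and `y_pred`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: the model never predicts class 2.
y_true_demo = [0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 0, 1, 0, 1, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2])
print(cm)

# zero_division=0 reports 0.00 for the unpredicted class without a warning.
print(classification_report(y_true_demo, y_pred_demo, zero_division=0))
```

An all-zero column for a class means the model never outputs that label, and the row for that class shows which labels its samples are absorbed into.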



**Inference Explanation**

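The model predicts whether irrigation is required based on soil type,
crop growth stage, moisture index, temperature, and humidity.
Lower moisture levels combined with higher temperatures are expected to increase
the likelihood of irrigation being required, although the notebook does not
inspect the fitted coefficients to confirm this. The baseline also never predicts
class 2 (precision and recall of 0.00 on 217 test samples), so its 0.82 accuracy
overstates how well it handles that class.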
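
One way to check the direction of each feature's effect is to read the signs of the fitted coefficients. The sketch below trains on synthetic data generated from an assumed toy rule (irrigate when moisture is low and temperature is high); the data, the rule, and the names `demo`, `label`, and `clf` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "MOI": rng.uniform(0, 100, 200),
    "temp": rng.uniform(10, 45, 200),
})

# Assumed toy rule, for illustration only: low moisture and high temperature.
label = ((demo["MOI"] < 40) & (demo["temp"] > 25)).astype(int)

clf = LogisticRegression().fit(StandardScaler().fit_transform(demo), label)

# On standardized inputs, the sign shows the direction of each feature's effect.
coefs = pd.Series(clf.coef_[0], index=demo.columns)
print(coefs)
```

For the notebook's three-class model, `coef_` has one row per class, so each row would need to be read separately.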

**Model Improvement: Random Forest Classifier**

In [16]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

rf_model = RandomForestClassifier(
    n_estimators=100,
    random_state=42
)

# Tree-based models are insensitive to feature scale, so the unscaled features are used
rf_model.fit(X_train, y_train)

rf_pred = rf_model.predict(X_test)

**Random Forest Evaluation**

In [17]:
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("\nClassification Report:\n", classification_report(y_test, rf_pred))

Random Forest Accuracy: 0.9899482180932074

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1835
           1       0.98      1.00      0.99      1231
           2       0.97      0.88      0.92       217

    accuracy                           0.99      3283
   macro avg       0.98      0.96      0.97      3283
weighted avg       0.99      0.99      0.99      3283



Random Forest was used as an improved model to capture non-linear relationships
between environmental factors and irrigation requirements. Compared to Logistic
Regression, it raised test accuracy from 0.82 to 0.99 and macro-averaged F1 from
0.57 to 0.97. The largest gain is on class 2, where recall rose from 0.00 to 0.88.
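
A single train/test split can be optimistic or pessimistic by chance, so a comparison like this is often repeated with cross-validation. The sketch below is one possible setup on synthetic three-class data from `make_classification`; the data and the `models` dictionary are hypothetical, and the scores it prints say nothing about the irrigation dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the six-feature, three-class problem.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=42
)

models = {
    # The pipeline refits the scaler inside each fold, avoiding leakage.
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

# Macro F1 weights every class equally, so a rare class is not hidden.
scores = {
    name: cross_val_score(m, X_demo, y_demo, cv=5, scoring="f1_macro").mean()
    for name, m in models.items()
}
print(scores)
```

Macro-averaged F1 is used here rather than accuracy because it penalizes a model that ignores a minority class.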


**Feature Importance**

In [18]:
import pandas as pd

feature_importance = pd.Series(
    rf_model.feature_importances_,
    index=X.columns
).sort_values(ascending=False)

feature_importance

| Feature | Importance |
|---|---|
| MOI | 0.356602 |
| temp | 0.298932 |
| humidity | 0.204474 |
| Seedling_Stage | 0.082513 |
| crop_ID | 0.031438 |
| soil_type | 0.02604 |


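Feature importance analysis shows that moisture index (MOI, 0.36) and temperature (0.30) are the most influential features, followed by humidity (0.20). Together these three environmental variables account for about 86% of the total importance, while seedling stage, crop ID, and soil type contribute the remaining 14%. This ranking aligns with real-world agricultural understanding.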
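
The `feature_importances_` attribute of a Random Forest is impurity-based and computed on the training data. Permutation importance on held-out data is a complementary check. The sketch below is an illustration on synthetic data, not a result from the notebook; `X_demo`, `y_demo`, and `rf_demo` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the notebook would pass X_test and y_test instead.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn on held-out data and measure the score drop.
perm = permutation_importance(rf_demo, X_te, y_te, n_repeats=5, random_state=0)
print(perm.importances_mean)
```

If both methods rank the same features at the top, the ranking is less likely to be an artifact of how the trees were grown.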


# **Final Inference**

Among the models tested, Random Forest outperformed Logistic Regression (0.99
versus 0.82 test accuracy) and was the only one that identified class 2. The model
identifies irrigation requirements based on soil moisture, temperature, humidity,
and crop growth stage. It can serve as a decision-support tool to help farmers
optimize water usage and prevent crop stress.
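
As a rough idea of how such a decision-support step might look, the sketch below wraps a trained model in a helper that scores one new reading. Everything here is hypothetical: the four training rows, the `demo_model`, the `needs_irrigation` helper, and the assumption that label 1 means "irrigation needed". In practice the notebook's `rf_model` and the fitted label encoders would be used instead.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, already-encoded rows using the notebook's feature names.
train = pd.DataFrame({
    "crop_ID": [0, 0, 0, 0],
    "soil_type": [0, 0, 0, 0],
    "Seedling_Stage": [0, 0, 0, 0],
    "MOI": [5, 10, 80, 90],
    "temp": [35, 33, 20, 22],
    "humidity": [30.0, 35.0, 80.0, 75.0],
    "result": [1, 1, 0, 0],  # assumed: 1 = irrigation needed
})
features = [c for c in train.columns if c != "result"]

demo_model = RandomForestClassifier(n_estimators=20, random_state=42)
demo_model.fit(train[features], train["result"])


def needs_irrigation(reading: dict) -> int:
    """Return the predicted label for one encoded sensor reading."""
    # Reorder the columns to match the order used during training.
    row = pd.DataFrame([reading])[features]
    return int(demo_model.predict(row)[0])


reading = {"crop_ID": 0, "soil_type": 0, "Seedling_Stage": 0,
           "MOI": 8, "temp": 34, "humidity": 32.0}
print(needs_irrigation(reading))
```

Raw categorical inputs such as a crop name would first have to be converted with the same encoders that were fitted during training.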