# Project Title: Crop Advisory System Model

# Goal

# ML Task type: Classification

Objective: The objective of the model is to predict the most suitable crop to be cultivated based on various input parameters such as levels of Nitrogen (N), Phosphorus (P), Potassium (K), Temperature, Humidity, pH level, and Rainfall. This prediction helps in providing advisory recommendations to farmers, aiming to optimize crop yield and ensure sustainable agricultural practices.



# Description:

In a classification task for a crop advisory system, the model learns from historical data where the features (N, P, K, Temperature, Humidity, pH, Rainfall) are used to predict the class label (crop type) of the data instances. The model uses this learned relationship to classify new, unseen data into one of several predefined crop categories. This assists farmers in making informed decisions about which crop(s) are likely to thrive best under given environmental conditions, thereby maximizing productivity and profitability while minimizing risks such as crop failure or suboptimal yields.

# Let's start coding

# Import all necessary libraries

In [1]:

# Data handling
import pandas as pd
import numpy as np
# Plotting (not used in the cells shown here)
import matplotlib.pyplot as plt
import seaborn as sns
# Model evaluation
from sklearn.metrics import classification_report
from sklearn import metrics
from sklearn import tree
# Silence library warnings; note that this also hides useful ones
import warnings
warnings.filterwarnings('ignore')

# Load the dataset

In [6]:
data=pd.read_csv('Crop_dataset.csv')
data

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,irish potato
1,85,58,41,21.770462,80.319644,7.038096,226.655537,irish potato
2,60,55,44,23.004459,82.320763,7.840207,263.964248,irish potato
3,74,35,40,26.491096,80.158363,6.980401,242.864034,irish potato
4,78,42,42,20.130175,81.604873,7.628473,262.717340,irish potato
...,...,...,...,...,...,...,...,...
2195,107,34,32,26.774637,66.413269,6.780064,177.774507,coffee
2196,99,15,27,27.417112,56.636362,6.086922,127.924610,coffee
2197,118,33,30,24.131797,67.225123,6.362608,173.322839,coffee
2198,117,32,34,26.272418,52.127394,6.758793,127.175293,coffee


In [7]:
data.head()

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,irish potato
1,85,58,41,21.770462,80.319644,7.038096,226.655537,irish potato
2,60,55,44,23.004459,82.320763,7.840207,263.964248,irish potato
3,74,35,40,26.491096,80.158363,6.980401,242.864034,irish potato
4,78,42,42,20.130175,81.604873,7.628473,262.71734,irish potato


In [8]:
data.tail()

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
2195,107,34,32,26.774637,66.413269,6.780064,177.774507,coffee
2196,99,15,27,27.417112,56.636362,6.086922,127.92461,coffee
2197,118,33,30,24.131797,67.225123,6.362608,173.322839,coffee
2198,117,32,34,26.272418,52.127394,6.758793,127.175293,coffee
2199,104,18,30,23.603016,60.396475,6.779833,140.937041,coffee


In [10]:
data.size

17600

In [11]:
data.columns

Index(['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall', 'label'], dtype='object')

In [12]:
data.dtypes

N                int64
P                int64
K                int64
temperature    float64
humidity       float64
ph             float64
rainfall       float64
label           object
dtype: object

In [13]:
data['label'].unique()

array(['irish potato', 'maize', 'chickpea', 'wheat', 'pigeonpeas',
       'mothbeans', 'mungbean', 'blackgram', 'lentil', 'pomegranate',
       'banana', 'mango', 'grapes', 'watermelon', 'muskmelon', 'apple',
       'orange', 'papaya', 'coconut', 'cotton', 'jute', 'coffee'],
      dtype=object)

# Data preparation

Class filtering: From the label list above, I keep only Irish potato, wheat and maize, because those are the crops suitable for our region (Karongi). The dataset is therefore filtered down to these three labels. Note that this step filters rows by class; it does not select features, and all seven input features (N, P, K, temperature, humidity, ph, rainfall) are kept. Restricting the labels to locally relevant crops means the model can only recommend crops that farmers in the region can actually grow, and it also makes the results easier to interpret and the model cheaper to train.

In [14]:
df = data[data['label'].isin(['irish potato', 'wheat', 'maize'])]

In [15]:
df

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,irish potato
1,85,58,41,21.770462,80.319644,7.038096,226.655537,irish potato
2,60,55,44,23.004459,82.320763,7.840207,263.964248,irish potato
3,74,35,40,26.491096,80.158363,6.980401,242.864034,irish potato
4,78,42,42,20.130175,81.604873,7.628473,262.717340,irish potato
...,...,...,...,...,...,...,...,...
395,27,65,18,20.109938,23.223238,5.595032,73.363865,wheat
396,30,63,16,23.605066,21.905396,5.525905,100.597873,wheat
397,37,70,25,19.731369,24.894874,5.819404,84.063541,wheat
398,27,63,19,20.934099,21.189301,5.562202,133.191442,wheat


In [16]:
df['label'].unique()

array(['irish potato', 'maize', 'wheat'], dtype=object)

# Checking for null or missing values in every column

In [17]:
df.isnull().sum()

N              0
P              0
K              0
temperature    0
humidity       0
ph             0
rainfall       0
label          0
dtype: int64

# Checking for any Outliers

In [20]:
df.describe()[['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']]

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall
count,300.0,300.0,300.0,300.0,300.0,300.0,300.0
mean,59.466667,54.52,26.57,22.06454,56.323476,6.140024,142.289293
std,29.764554,12.086489,9.883186,2.858299,25.808927,0.583949,72.059271
min,0.0,35.0,15.0,15.330426,18.09224,5.005307,60.275525
25%,28.0,44.75,19.0,19.919017,23.404594,5.715107,84.188154
50%,68.5,55.0,22.0,22.22304,65.303845,5.950298,108.574448
75%,83.25,61.0,38.0,24.51887,80.908597,6.50202,203.401698
max,100.0,80.0,45.0,26.929951,84.969072,7.868475,298.560117
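
`describe()` gives the range and quartiles of each column but does not flag outliers on its own. One common rule of thumb is to count values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The sketch below is only an illustration: the helper name `iqr_outlier_counts`, the multiplier `k=1.5`, and the small `sample` frame (a few rows copied from the tables above so the snippet runs on its own) are assumptions, not part of the original notebook. In the notebook, `df` and the seven feature columns would be passed instead.

```python
import pandas as pd


def iqr_outlier_counts(frame, columns, k=1.5):
    """Count, per column, the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    # Compare every column against its own lower/upper bound
    below = frame[columns].lt(q1 - k * iqr, axis="columns")
    above = frame[columns].gt(q3 + k * iqr, axis="columns")
    return (below | above).sum()


# Stand-in data: rows 0-4 and 395-398 as displayed earlier (two columns only)
sample = pd.DataFrame({
    "N": [90, 85, 60, 74, 78, 27, 30, 37, 27],
    "rainfall": [202.935536, 226.655537, 263.964248, 242.864034, 262.717340,
                 73.363865, 100.597873, 84.063541, 133.191442],
})

print(iqr_outlier_counts(sample, ["N", "rainfall"]))
```

A flagged value is not necessarily an error; it is a prompt to look at the row before deciding whether to keep it.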


# Data cleaning

Removing duplicates is an important data-cleaning step: repeated rows give some observations extra weight, and if copies of the same row land in both the training and the test set, the measured accuracy is inflated. In this dataset the step removes nothing, since the row count is 300 both before and after it.

In [22]:
df = df.drop_duplicates()
df

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,irish potato
1,85,58,41,21.770462,80.319644,7.038096,226.655537,irish potato
2,60,55,44,23.004459,82.320763,7.840207,263.964248,irish potato
3,74,35,40,26.491096,80.158363,6.980401,242.864034,irish potato
4,78,42,42,20.130175,81.604873,7.628473,262.717340,irish potato
...,...,...,...,...,...,...,...,...
395,27,65,18,20.109938,23.223238,5.595032,73.363865,wheat
396,30,63,16,23.605066,21.905396,5.525905,100.597873,wheat
397,37,70,25,19.731369,24.894874,5.819404,84.063541,wheat
398,27,63,19,20.934099,21.189301,5.562202,133.191442,wheat


In [23]:
df['label'].value_counts()

label
irish potato    100
maize           100
wheat           100
Name: count, dtype: int64

# Train model

In [24]:
df.head()

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,irish potato
1,85,58,41,21.770462,80.319644,7.038096,226.655537,irish potato
2,60,55,44,23.004459,82.320763,7.840207,263.964248,irish potato
3,74,35,40,26.491096,80.158363,6.980401,242.864034,irish potato
4,78,42,42,20.130175,81.604873,7.628473,262.71734,irish potato


In [25]:
# Input features and the class label to predict
features = df[['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']]
target = df['label']

# Splitting the dataset

In [26]:
from sklearn.model_selection import train_test_split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(features,target,test_size = 0.2,random_state =42)
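
The split above is purely random, so the three crops need not appear in equal numbers in the test set (the classification report further down shows 22, 16 and 22 test rows). `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both parts. The sketch below shows the idea on made-up stand-in data; `toy_features` and `toy_target` are hypothetical names, and in the notebook `features` and `target` would be passed with `stratify=target`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for `features` and `target`: 30 rows, 10 per class (made-up values)
toy_features = pd.DataFrame({"N": range(30), "rainfall": range(100, 130)})
toy_target = pd.Series(["irish potato"] * 10 + ["maize"] * 10 + ["wheat"] * 10)

# stratify=... keeps the class proportions equal in the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_features, toy_target,
    test_size=0.2, random_state=42, stratify=toy_target,
)

print(y_te.value_counts())
```

Switching the notebook to a stratified split would change which rows end up in the test set, so the scores reported below would need to be recomputed.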

In [27]:
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest of 20 trees on the training split
RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(Xtrain, Ytrain)

# Evaluate on the held-out test split
predicted_values = RF.predict(Xtest)

accuracy = metrics.accuracy_score(Ytest, predicted_values)
print("RF's Accuracy is: ", accuracy * 100)

print(classification_report(Ytest, predicted_values))

RF's Accuracy is:  100.0
              precision    recall  f1-score   support

irish potato       1.00      1.00      1.00        22
       maize       1.00      1.00      1.00        16
       wheat       1.00      1.00      1.00        22

    accuracy                           1.00        60
   macro avg       1.00      1.00      1.00        60
weighted avg       1.00      1.00      1.00        60
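
A perfect score on a single 60-row test set is encouraging, but it is a small sample. Stratified k-fold cross-validation is one way to check whether the result holds across several different splits. The sketch below is a minimal illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in so that it runs on its own, and the choice of 5 folds is arbitrary. In the notebook, `features` and `target` would replace `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 300 rows, 7 features, 3 classes (not the crop data)
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Same model settings as in the notebook
model = RandomForestClassifier(n_estimators=20, random_state=0)

# Each fold keeps the class proportions; the model is refit for every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy: %.3f" % scores.mean())
```

If the fold accuracies on the real data are all close to 1.0, the three crops are most likely well separated by the input features rather than the test split being lucky.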



In [28]:
# Prompt user for input
N = float(input("Enter N value: "))
P = float(input("Enter P value: "))
K = float(input("Enter K value: "))
Temperature = float(input("Enter Temperature value: "))
Humidity = float(input("Enter Humidity value: "))
PH = float(input("Enter PH value: "))
Rainfall = float(input("Enter Rainfall value: "))

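# Build a one-row DataFrame with the same column names used in training
# (this avoids overwriting `data` and the feature-name mismatch of a bare array)
sample = pd.DataFrame([[N, P, K, Temperature, Humidity, PH, Rainfall]],
                      columns=['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall'])

# Make prediction
prediction = RF.predict(sample)
print("Predicted crop class:", prediction)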

Enter N value: 67
Enter P value: 77
Enter K value: 12
Enter Temperature value: 23
Enter Humidity value: 22
Enter PH value: 8
Enter Rainfall value: 99
Predicted crop class: ['maize']


In [31]:
import joblib
joblib.dump(RF,'rf_model.pkl')

['rf_model.pkl']
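
A model saved with `joblib.dump` can be read back with `joblib.load` and used for prediction without retraining. The sketch below shows the round trip. To stay runnable on its own it trains a tiny stand-in model on four rows copied from the tables above and writes to a temporary directory; the names `stand_in`, `loaded` and `path` are assumptions for illustration. In practice, `joblib.load('rf_model.pkl')` would be called in the application that serves the recommendations.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

columns = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]

# Stand-in training data: rows 0, 1, 395 and 396 as displayed earlier
train = pd.DataFrame(
    [
        [90, 42, 43, 20.879744, 82.002744, 6.502985, 202.935536],
        [85, 58, 41, 21.770462, 80.319644, 7.038096, 226.655537],
        [27, 65, 18, 20.109938, 23.223238, 5.595032, 73.363865],
        [30, 63, 16, 23.605066, 21.905396, 5.525905, 100.597873],
    ],
    columns=columns,
)
train_labels = ["irish potato", "irish potato", "wheat", "wheat"]

stand_in = RandomForestClassifier(n_estimators=20, random_state=0)
stand_in.fit(train, train_labels)

# Save, then load the model back from disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_model.pkl")
    joblib.dump(stand_in, path)
    loaded = joblib.load(path)

# The loaded model expects the same columns, in the same order, as in training
print(loaded.predict(train.iloc[[0]]))
```

Two cautions apply: the file should be loaded with the same scikit-learn version that wrote it where possible, and pickle-based files should only be loaded from trusted sources, because loading can execute arbitrary code.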