* Crop: The name of the crop cultivated.
* Crop_Year: The year in which the crop was grown.
* Season: The specific cropping season (e.g., Kharif, Rabi, Whole Year).
* State: The Indian state where the crop was cultivated.
* Area: The total land area (in hectares) under cultivation for the specific crop.
* Production: The quantity of crop production (in metric tons).
* Annual_Rainfall: The annual rainfall received in the crop-growing region (in mm).
* Fertilizer: The total amount of fertilizer used for the crop (in kilograms).
* Pesticide: The total amount of pesticide used for the crop (in kilograms).
* Yield: The calculated crop yield (production per unit area).
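
As a rough illustration of "production per unit area", the sketch below recomputes a plain ratio for the first row displayed in this notebook (Arecanut, Assam, 1997). The column name `Yield_ratio` is a hypothetical helper, not part of the dataset. Note that the simple ratio (about 0.768) does not equal the stored `Yield` (0.796087), so the dataset's `Yield` was presumably derived in some other way; that is an assumption, since the source does not say how it was computed.

```python
import pandas as pd

# Values copied from the first row shown in the notebook output.
sample = pd.DataFrame({
    "Crop": ["Arecanut"],
    "Area": [73814.0],        # hectares
    "Production": [56708],    # metric tons
    "Yield": [0.796087],      # value stored in the dataset
})

# Hypothetical helper column: plain ratio of production to area.
sample["Yield_ratio"] = sample["Production"] / sample["Area"]

print(sample[["Crop", "Yield", "Yield_ratio"]])
```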

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Load the data and drop unnecessary columns

In [102]:
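# Load the raw data and keep an untouched copy for reference.
df = pd.read_csv("data/original_crop_data.csv")
df_copy = df.copy()
# Keep only the columns used in the rest of the notebook.
df = df.drop(columns=['Crop_Year', 'Fertilizer', 'Pesticide', 'Yield'])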

In [103]:
df_copy.head(1)

Unnamed: 0,Crop,Crop_Year,Season,State,Area,Production,Annual_Rainfall,Fertilizer,Pesticide,Yield
0,Arecanut,1997,Whole Year,Assam,73814.0,56708,2051.4,7024878.38,22882.34,0.796087


In [104]:
df.sample(5)

Unnamed: 0,Crop,Season,State,Area,Production,Annual_Rainfall
6851,Gram,Rabi,Maharashtra,920000.0,592200,1166.5
18733,Urad,Kharif,Sikkim,3220.0,2826,2247.5
636,Tobacco,Whole Year,Karnataka,70504.0,52131,1213.3
4254,Jowar,Kharif,Karnataka,115546.0,159843,1238.5
7101,Rapeseed &Mustard,Rabi,Chhattisgarh,48187.0,15950,896.4


In [105]:
df.shape

(19689, 6)

In [106]:
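numeric_df = df.select_dtypes(include=['number'])
# Preview the numeric columns other than Production.
# drop() returns a new frame; numeric_df itself is unchanged.
numeric_df.drop(columns=['Production'])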

Unnamed: 0,Area,Annual_Rainfall
0,73814.0,2051.4
1,6637.0,2051.4
2,796.0,2051.4
3,19656.0,2051.4
4,1739.0,2051.4
...,...,...
19684,4000.0,1498.0
19685,1000.0,1498.0
19686,310883.0,1356.2
19687,275746.0,1356.2


In [107]:
categorical_df = df.select_dtypes(exclude=['number'])
categorical_df

Unnamed: 0,Crop,Season,State
0,Arecanut,Whole Year,Assam
1,Arhar/Tur,Kharif,Assam
2,Castor seed,Kharif,Assam
3,Coconut,Whole Year,Assam
4,Cotton(lint),Kharif,Assam
...,...,...,...
19684,Small millets,Kharif,Nagaland
19685,Wheat,Rabi,Nagaland
19686,Maize,Kharif,Jammu and Kashmir
19687,Rice,Kharif,Jammu and Kashmir


In [108]:
columns = categorical_df.columns
print(columns, '\n')

# Print the distinct values of each categorical column.
for col in columns:
    print(col)
    print(df[col].unique())
    print("*" * 50)

Index(['Crop', 'Season', 'State'], dtype='object') 

Crop
['Arecanut' 'Arhar/Tur' 'Castor seed' 'Coconut ' 'Cotton(lint)'
 'Dry chillies' 'Gram' 'Jute' 'Linseed' 'Maize' 'Mesta' 'Niger seed'
 'Onion' 'Other  Rabi pulses' 'Potato' 'Rapeseed &Mustard' 'Rice'
 'Sesamum' 'Small millets' 'Sugarcane' 'Sweet potato' 'Tapioca' 'Tobacco'
 'Turmeric' 'Wheat' 'Bajra' 'Black pepper' 'Cardamom' 'Coriander' 'Garlic'
 'Ginger' 'Groundnut' 'Horse-gram' 'Jowar' 'Ragi' 'Cashewnut' 'Banana'
 'Soyabean' 'Barley' 'Khesari' 'Masoor' 'Moong(Green Gram)'
 'Other Kharif pulses' 'Safflower' 'Sannhamp' 'Sunflower' 'Urad'
 'Peas & beans (Pulses)' 'other oilseeds' 'Other Cereals' 'Cowpea(Lobia)'
 'Oilseeds total' 'Guar seed' 'Other Summer Pulses' 'Moth']
**************************************************
Season
['Whole Year ' 'Kharif     ' 'Rabi       ' 'Autumn     ' 'Summer     '
 'Winter     ']
**************************************************
State
['Assam' 'Karnataka' 'Kerala' 'Meghalaya' 'West Bengal' 'Puduc
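
The unique values above show trailing whitespace in some labels (for example `'Coconut '` and `'Kharif     '`). A minimal sketch of one way to clean this up is shown below on a small hand-made frame; `demo` is a hypothetical stand-in for `df`, and whether the original author stripped these values is not shown in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for df, using labels seen in the output above.
demo = pd.DataFrame({
    "Crop": ["Coconut ", "Gram"],
    "Season": ["Whole Year ", "Rabi       "],
    "State": ["Assam", "Maharashtra"],
})

# Strip leading/trailing whitespace from every non-numeric column.
text_cols = demo.select_dtypes(exclude=["number"]).columns
for col in text_cols:
    demo[col] = demo[col].str.strip()

print(demo["Season"].unique())
```

Without this step, `'Kharif'` and `'Kharif     '` would be treated as different categories by any later grouping or encoding.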

In [109]:
df.describe()

Unnamed: 0,Area,Production,Annual_Rainfall
count,19689.0,19689.0,19689.0
mean,179926.6,16435940.0,1437.755177
std,732828.7,263056800.0,816.909589
min,0.5,0.0,301.3
25%,1390.0,1393.0,940.7
50%,9317.0,13804.0,1247.6
75%,75112.0,122718.0,1643.7
max,50808100.0,6326000000.0,6552.7
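
In the summary above, the mean of `Production` is far larger than its median, which points to a strongly right-skewed column. The sketch below shows the same pattern on the `Production` values from the first five rows of `df` displayed in this notebook, and applies `np.log1p` as one common way to compress such a range. The log transform is an illustrative assumption, not a step taken in the original notebook.

```python
import numpy as np
import pandas as pd

# Production values from the first five rows of df shown in this notebook.
production = pd.Series([56708, 4685, 22, 126905000, 794], name="Production")

mean_value = production.mean()
median_value = production.median()
print(f"mean={mean_value:.1f}, median={median_value:.1f}")

# log1p maps 0 to 0 and compresses very large values.
log_production = np.log1p(production)
print(log_production.round(2).tolist())
```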


In [110]:
df.isnull().sum()

Crop               0
Season             0
State              0
Area               0
Production         0
Annual_Rainfall    0
dtype: int64
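
`isnull()` reports no missing values, but `describe()` gives a minimum `Production` of 0.0, so zero entries exist and may deserve a separate look. A minimal sketch of counting them follows; `toy` and its values are hypothetical, made up only to demonstrate the check.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative, not taken from the data.
toy = pd.DataFrame({
    "Area": [73814.0, 0.5, 1000.0],
    "Production": [56708, 0, 3000],
})

# Boolean mask of rows with zero production.
zero_mask = toy["Production"] == 0
print("rows with zero Production:", int(zero_mask.sum()))
print(toy[zero_mask])
```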

In [111]:
df

Unnamed: 0,Crop,Season,State,Area,Production,Annual_Rainfall
0,Arecanut,Whole Year,Assam,73814.0,56708,2051.4
1,Arhar/Tur,Kharif,Assam,6637.0,4685,2051.4
2,Castor seed,Kharif,Assam,796.0,22,2051.4
3,Coconut,Whole Year,Assam,19656.0,126905000,2051.4
4,Cotton(lint),Kharif,Assam,1739.0,794,2051.4
...,...,...,...,...,...,...
19684,Small millets,Kharif,Nagaland,4000.0,2000,1498.0
19685,Wheat,Rabi,Nagaland,1000.0,3000,1498.0
19686,Maize,Kharif,Jammu and Kashmir,310883.0,440900,1356.2
19687,Rice,Kharif,Jammu and Kashmir,275746.0,5488,1356.2


In [None]:
# df.to_csv("data/required_feature.csv", index=False)
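
The `index=False` argument in the commented-out call matters when the file is read back: without it, pandas writes the row index as an unnamed first column, which `read_csv` then loads as `Unnamed: 0`. The sketch below demonstrates this with in-memory buffers instead of files; `toy` is a hypothetical frame.

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for df.
toy = pd.DataFrame({"Crop": ["Wheat", "Rice"], "Area": [1000.0, 275746.0]})

# Default behaviour: the index is written as an extra, unnamed column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```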
