
# Crop Yield Prediction - Machine Learning Project

## Problem Understanding
Agriculture plays a critical role in the global economy, and accurate crop yield prediction helps farmers, policymakers, and supply chain stakeholders make informed decisions. The goal of this project is to build a machine learning model that predicts crop yield using real-world agricultural data such as rainfall, temperature, pesticide usage, and cultivated area.

By analyzing historical patterns, the model can forecast future yields, which helps reduce risk, optimize resource allocation, and improve food security. This project focuses on applying core machine learning fundamentals, including data preprocessing, model training, evaluation, and inference, rather than on maximizing accuracy alone.


In [None]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


In [None]:

df = pd.read_csv("crop_yield.csv")  # Replace with your dataset path
df.head()
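
Before preprocessing, it helps to check column types and missing values, since these decide which columns need encoding and how many rows `dropna()` will discard. The sketch below uses a small hypothetical DataFrame in place of `crop_yield.csv`. The column names (`Crop`, `Rainfall`, `Temperature`, `Pesticides`, `Area`, `Yield`) and values are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for crop_yield.csv; column names and values are assumptions
df_demo = pd.DataFrame({
    "Crop": ["Wheat", "Rice", "Maize", "Wheat", "Rice"],
    "Rainfall": [650.0, 1200.0, np.nan, 700.0, 1100.0],
    "Temperature": [18.5, 26.0, 22.1, 19.0, np.nan],
    "Pesticides": [120.0, 300.0, 210.0, 130.0, 280.0],
    "Area": [50.0, 80.0, 65.0, 55.0, 75.0],
    "Yield": [3.1, 4.5, 5.2, 3.3, 4.4],
})

# Shape and dtypes: non-numeric (object/string) columns will need encoding
print(df_demo.shape)
print(df_demo.dtypes)

# Missing values per column, to decide between dropping and imputing
missing_counts = df_demo.isna().sum()
print(missing_counts)

# Share of rows that dropna() would remove
dropped_fraction = 1 - len(df_demo.dropna()) / len(df_demo)
print(f"dropna() would remove {dropped_fraction:.0%} of rows")
```

On the real data, running `df.isna().sum()` and `df.dtypes` directly gives the same information.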



## Data Preprocessing
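- Handle missing values (rows with any missing value are dropped)  
- Convert categorical variables into numerical format (one-hot encoding)  
- Scale features with `StandardScaler` (Random Forest is insensitive to feature scaling, so this step is optional here, but it keeps the data ready for scale-sensitive models such as linear regression)  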
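
Dropping rows can shrink a small agricultural dataset considerably. A common alternative, shown here as a swapped-in technique rather than this notebook's method, is median imputation with scikit-learn's `SimpleImputer`, fitted on the training split only so the test rows do not influence the fill values. The toy columns below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical numeric features with gaps; column names are assumptions
X_demo = pd.DataFrame({
    "Rainfall": [650.0, 1200.0, np.nan, 700.0, 1100.0, 900.0, np.nan, 800.0],
    "Temperature": [18.5, 26.0, 22.1, 19.0, np.nan, 21.0, 23.5, 20.0],
})

X_tr, X_te = train_test_split(X_demo, test_size=0.25, random_state=0)

# Learn per-column medians from the training rows only, to avoid leakage
imputer = SimpleImputer(strategy="median")
X_tr_imputed = imputer.fit_transform(X_tr)
X_te_imputed = imputer.transform(X_te)  # reuses the training medians

print(imputer.statistics_)  # medians learned from X_tr
```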


In [None]:

df = df.dropna()                          # drop rows with any missing value
df = pd.get_dummies(df, drop_first=True)  # one-hot encode categorical columns

X = df.drop("Yield", axis=1)  # features; ensure the target column is named 'Yield'
y = df["Yield"]               # target
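
`pd.get_dummies(..., drop_first=True)` replaces each categorical column with one indicator column per category, minus the first category in sorted order, which becomes the implicit baseline. Numeric columns such as the target pass through unchanged. A minimal illustration on a hypothetical `Crop` column:

```python
import pandas as pd

# Hypothetical frame: one categorical feature, one numeric feature, the target
toy = pd.DataFrame({
    "Crop": ["Wheat", "Rice", "Maize", "Wheat"],
    "Rainfall": [650.0, 1200.0, 900.0, 700.0],
    "Yield": [3.1, 4.5, 5.2, 3.3],
})

encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
# "Maize" sorts first and is dropped: a row with Crop_Rice == 0 and
# Crop_Wheat == 0 represents Maize

X_toy = encoded.drop("Yield", axis=1)
y_toy = encoded["Yield"]
```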


In [None]:

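# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)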
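
The split-then-scale order matters: `fit_transform` learns each feature's mean and standard deviation from the training rows, and `transform` reuses those statistics on the test rows, so no test information leaks into preprocessing. The check below uses synthetic data (an assumption, not the crop dataset) to show that training features come out with mean 0 and unit variance, while test features only approximately do.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic features loosely standing in for rainfall, temperature, pesticides
X_syn = rng.normal(loc=[800.0, 22.0, 150.0], scale=[200.0, 3.0, 40.0], size=(200, 3))

X_tr, X_te = train_test_split(X_syn, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)  # statistics come from training rows only
X_te_s = scaler.transform(X_te)      # same mean_ and scale_ applied to test rows

print(X_tr_s.mean(axis=0).round(6), X_tr_s.std(axis=0).round(6))
print(X_te_s.mean(axis=0).round(3), X_te_s.std(axis=0).round(3))
```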



## Model Selection & Training
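A Random Forest Regressor is used because it captures nonlinear relationships and feature interactions without manual feature engineering, averages many decorrelated decision trees to reduce the overfitting of a single tree, and typically performs strongly on structured (tabular) datasets.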
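A Random Forest regressor fits many decision trees on bootstrap samples of the training data and averages their predictions. The sketch below, on synthetic nonlinear data (an assumption for illustration), checks that `predict` equals the mean of the individual trees in `estimators_`, and compares the forest with a plain `LinearRegression` on a target that peaks at a mid-range value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic nonlinear target: peaks when the first feature is near 5
X_syn = rng.uniform(0, 10, size=(500, 2))
y_syn = -(X_syn[:, 0] - 5) ** 2 + 0.5 * X_syn[:, 1] + rng.normal(0, 0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)
lin = LinearRegression().fit(X_tr, y_tr)

# The forest's prediction is the average of its trees' predictions
tree_mean = np.mean([tree.predict(X_te) for tree in rf.estimators_], axis=0)
print(np.allclose(tree_mean, rf.predict(X_te)))

print("Random Forest R2:", r2_score(y_te, rf.predict(X_te)))
print("Linear R2:", r2_score(y_te, lin.predict(X_te)))
```

A linear model cannot represent the peak, so it explains little of the variance here, while the forest approximates it with piecewise-constant splits.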


In [None]:

# 100 trees; a fixed random_state makes the results reproducible
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
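
Once fitted, a Random Forest exposes impurity-based `feature_importances_`, which give a rough ranking of the inputs the trees rely on most. Because `X_train` becomes a NumPy array after scaling, the names have to come from the DataFrame `X`, for example `pd.Series(model.feature_importances_, index=X.columns)`. The self-contained sketch below uses hypothetical features and synthetic data, including a pure-noise column for contrast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Hypothetical features; only Rainfall and Temperature drive the target
X_named = pd.DataFrame({
    "Rainfall": rng.uniform(300, 1500, 300),
    "Temperature": rng.uniform(10, 35, 300),
    "Noise": rng.normal(size=300),
})
y_named = (0.01 * X_named["Rainfall"]
           - 0.1 * (X_named["Temperature"] - 22) ** 2
           + rng.normal(0, 0.3, 300))

forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_named, y_named)

# Impurity-based importances sum to 1; higher means more variance reduced by splits
importances = pd.Series(forest.feature_importances_, index=X_named.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favour high-cardinality features, so they are best read as a rough guide rather than a causal ranking.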



## Evaluation Metrics
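- MAE (Mean Absolute Error): average absolute difference between predicted and actual yield, in the target's units  
- RMSE (Root Mean Squared Error): also in the target's units, but penalizes large errors more heavily than MAE  
- R² Score (coefficient of determination): share of yield variance explained by the model (1 is perfect; 0 is no better than always predicting the mean)  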
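
With $y_i$ the actual yields, $\hat{y}_i$ the predictions and $\bar{y}$ the mean actual yield over $n$ test rows, the metrics are $\text{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, $\text{RMSE} = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$ and $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. The sketch below computes them by hand on made-up values (not results from the crop dataset) and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only, not results from the crop dataset
y_true = np.array([3.0, 4.5, 5.0, 2.5, 4.0])
y_pred = np.array([2.8, 4.9, 4.6, 2.7, 4.0])

errors = y_true - y_pred
mae_manual = np.mean(np.abs(errors))
rmse_manual = np.sqrt(np.mean(errors ** 2))
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand-computed values agree with scikit-learn's implementations
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
print(mae_manual, rmse_manual, r2_manual)
```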


In [None]:

predictions = model.predict(X_test)

mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = r2_score(y_test, predictions)

print("MAE:", mae)
print("RMSE:", rmse)
print("R2 Score:", r2)
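
Metric values are easier to judge against a trivial baseline. scikit-learn's `DummyRegressor(strategy="mean")` always predicts the training-set mean, so its test R² is at most zero, and a useful model should clearly beat its MAE and RMSE. The synthetic data below is an assumption for illustration; in the notebook the same comparison would use `X_train`, `y_train`, `X_test` and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Synthetic stand-in for the crop features and yield
X_syn = rng.uniform(0, 1, size=(400, 3))
y_syn = 4 * X_syn[:, 0] + np.sin(6 * X_syn[:, 1]) + rng.normal(0, 0.2, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

for name, est in [("mean baseline", baseline), ("random forest", forest)]:
    pred = est.predict(X_te)
    print(f"{name}: MAE={mean_absolute_error(y_te, pred):.3f}, R2={r2_score(y_te, pred):.3f}")
```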



## Inference Explanation
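The model learns relationships between environmental and agricultural factors (such as rainfall, temperature, pesticide usage, and cultivated area) and crop yield. Low MAE and RMSE relative to typical yield values, together with a high R² on the held-out test set, indicate that the model generalizes to records it has not seen, which can support agricultural planning and decision-making. Because the train/test split is random rather than by year, these scores describe performance on unseen records from the same period and may overstate accuracy when forecasting future seasons.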
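For inference on new data, a new record must go through the same transformations as the training data: one-hot encoding aligned to exactly the training columns, then the already-fitted scaler. A minimal sketch of the alignment step, assuming a hypothetical helper `prepare_new_rows` and hypothetical column names; in the notebook, `train_columns` would be `X.columns`.

```python
import pandas as pd

def prepare_new_rows(new_df, train_columns):
    """Hypothetical helper: encode new rows and align them to the training columns."""
    # drop_first is deliberately not used: with a single new row it would
    # drop the only indicator column for that row's category
    encoded = pd.get_dummies(new_df)
    # Add indicator columns missing from the new data (as 0) and drop
    # categories never seen during training
    return encoded.reindex(columns=train_columns, fill_value=0)

# Columns as produced by get_dummies(..., drop_first=True) on hypothetical training data
train_columns = ["Rainfall", "Temperature", "Crop_Rice", "Crop_Wheat"]

new_record = pd.DataFrame({"Crop": ["Rice"], "Rainfall": [1150.0], "Temperature": [25.0]})
aligned = prepare_new_rows(new_record, train_columns)
print(aligned)

# In the notebook, the prediction step would then be:
# prediction = model.predict(scaler.transform(aligned))
```

A record for the baseline category (here `Maize`) ends up with all indicator columns set to 0, which matches how `drop_first=True` encoded it during training.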



## Model Pipeline Description
1. Load dataset  
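2. Clean data (drop rows with missing values)  
3. Encode categorical variables (one-hot encoding)  
4. Split data into training and test sets, then scale features using statistics from the training set only  
5. Train the Random Forest model  
6. Evaluate performance on the test set (MAE, RMSE, R²)  
7. Generate predictions  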
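
The steps above can also be bundled into a single scikit-learn `Pipeline`, so that imputation, encoding, scaling and the model are fitted together on the training split and reapplied identically at prediction time. This is an alternative sketch, not the notebook's code: the column names and synthetic data are assumptions, `SimpleImputer` replaces `dropna()`, and `OneHotEncoder(handle_unknown="ignore")` stands in for `pd.get_dummies`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(3)
n = 300
# Synthetic stand-in for crop_yield.csv; columns are hypothetical
df_syn = pd.DataFrame({
    "Crop": rng.choice(["Maize", "Rice", "Wheat"], size=n),
    "Rainfall": rng.uniform(300, 1500, n),
    "Temperature": rng.uniform(10, 35, n),
})
crop_effect = df_syn["Crop"].map({"Maize": 5.0, "Rice": 4.0, "Wheat": 3.0})
df_syn["Yield"] = crop_effect + 0.002 * df_syn["Rainfall"] + rng.normal(0, 0.2, n)

numeric = ["Rainfall", "Temperature"]
categorical = ["Crop"]

preprocess = ColumnTransformer([
    # Impute, then scale the numeric columns
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # One-hot encode crops; unseen categories become all-zero rows
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(n_estimators=100, random_state=42)),
])

X_syn = df_syn.drop("Yield", axis=1)
y_syn = df_syn["Yield"]
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

pipe.fit(X_tr, y_tr)       # every step is fitted on training rows only
pred = pipe.predict(X_te)  # and reapplied unchanged to the test rows
print("MAE:", mean_absolute_error(y_te, pred))

# New raw records can be passed directly, without manual encoding
new = pd.DataFrame({"Crop": ["Rice"], "Rainfall": [1100.0], "Temperature": [24.0]})
print(pipe.predict(new))
```

Keeping preprocessing inside the pipeline also makes it safe to use with `cross_val_score`, because each fold refits the encoder and scaler on its own training portion.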
