**Crop Yield Prediction for Food Security (SDG 2: Zero Hunger)**

🏁 1. Project Overview

SDG Goal: SDG 2 – Zero Hunger

Problem Statement:
Unpredictable weather patterns make it hard for farmers to estimate crop yields. This project uses machine learning to predict crop yield based on environmental factors such as rainfall, temperature, pesticide use, and soil quality.

Objective:
Build a regression model that predicts crop yield, so that farmers and policymakers can plan for food security and allocate resources more effectively.

📊 2. Import Libraries

In [None]:
# Basic Libraries
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score


3. Load and Explore Dataset

In [None]:
# Load dataset (replace with your dataset name)
df = pd.read_csv("crop_yield.csv")

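# Display the first few rows (print is needed because only the last
# expression in a cell is displayed automatically)
print(df.head())
# Dataset information (info() prints its own output)
df.info()
# Summary statistics
df.describe()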


🧹 4. Data Cleaning and Preprocessing

In [None]:
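# Check for missing values
print(df.isnull().sum())
# Fill missing numeric values with the column mean
# (numeric_only=True skips non-numeric columns, which would otherwise raise an error)
df.fillna(df.mean(numeric_only=True), inplace=True)
# Drop duplicates if any
df.drop_duplicates(inplace=True)
# Verify the result (non-numeric columns may still contain missing values)
df.isnull().sum()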
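
Mean imputation only applies to numeric columns, so text columns need a separate rule. The sketch below shows one possible approach on a small synthetic frame: numeric gaps are filled with the column mean and text gaps with the most frequent value. The `demo` frame and its `crop` column are assumptions for illustration, not part of the project dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for crop_yield.csv; the columns are illustrative only.
demo = pd.DataFrame({
    "crop": ["maize", "rice", None, "maize"],
    "average_rainfall": [800.0, np.nan, 1200.0, 1000.0],
    "avg_temp": [24.0, 26.0, np.nan, 28.0],
})

# Fill numeric gaps with each column's mean.
numeric_cols = demo.select_dtypes(include="number").columns
demo[numeric_cols] = demo[numeric_cols].fillna(demo[numeric_cols].mean())

# Fill text gaps with the most frequent value (the mode).
demo["crop"] = demo["crop"].fillna(demo["crop"].mode()[0])

print(demo)
```

Whether the mode is a sensible fill for a text column depends on the data; dropping such rows is an equally valid choice.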


📈 5. Exploratory Data Analysis (EDA)

In [None]:
# Correlation Heatmap
plt.figure(figsize=(8,6))
# numeric_only=True keeps text columns out of the correlation matrix
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='Greens')
plt.title("Correlation Heatmap")
plt.show()
# Distribution of Crop Yield
plt.figure(figsize=(7,5))
sns.histplot(df['yield'], bins=20, kde=True, color='green')
plt.title("Crop Yield Distribution")
plt.xlabel("Yield")
plt.ylabel("Frequency")
plt.show()
# Scatter plot: Rainfall vs Yield
plt.scatter(df['average_rainfall'], df['yield'], color='blue')
plt.xlabel('Average Rainfall (mm)')
plt.ylabel('Crop Yield')
plt.title('Rainfall vs Crop Yield')
plt.show()


6. Feature Selection

Choose the features expected to influence crop yield. The column names below are examples; adjust them to match the columns in your dataset.

In [None]:
# Example feature selection
X = df[['average_rainfall', 'avg_temp', 'pesticide_use', 'soil_quality']]
y = df['yield']
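
One rough way to sanity-check a feature list is to rank the numeric columns by the absolute value of their correlation with the target. The sketch below does this on synthetic data; the `demo` frame, the `noise` column, and the way `yield` is generated are assumptions made for the example. Correlation only captures linear relationships, so a low rank does not prove a feature is useless.

```python
import numpy as np
import pandas as pd

# Synthetic data: yield depends on rainfall, while "noise" is unrelated.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "crop": rng.choice(["maize", "rice"], n),  # text column, ignored below
    "average_rainfall": rng.normal(1000, 200, n),
    "avg_temp": rng.normal(25, 3, n),
    "noise": rng.normal(0, 1, n),
})
demo["yield"] = 0.005 * demo["average_rainfall"] + rng.normal(0, 0.2, n)

# Absolute correlation of each numeric feature with the target, strongest first.
corr_with_yield = (
    demo.corr(numeric_only=True)["yield"]
    .drop("yield")
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_yield)
```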



🔀 7. Split Dataset

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


🤖 8. Model Training
🧩 Linear Regression

In [None]:
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)


🌳 Random Forest Regressor

In [None]:
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
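
A single 80/20 split can give a noisy picture of model quality, especially on small datasets. As a hedged sketch, 5-fold cross-validation with scikit-learn's `cross_val_score` gives a mean and spread for each model. The data here comes from `make_regression` and is a synthetic stand-in, so the printed scores say nothing about real crop data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X and y (4 features, like the example feature list).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=4, noise=10.0, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Score each model on 5 folds and report the mean and standard deviation of R².
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")
    cv_scores[name] = scores
    print(f"{name}: mean R2 = {scores.mean():.3f} (std {scores.std():.3f})")
```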


📊 9. Model Evaluation

In [None]:
# Evaluate both models
print("Linear Regression R²:", r2_score(y_test, y_pred_lr))
print("Linear Regression MAE:", mean_absolute_error(y_test, y_pred_lr))
print("Random Forest R²:", r2_score(y_test, y_pred_rf))
print("Random Forest MAE:", mean_absolute_error(y_test, y_pred_rf))
# Visualization: Actual vs Predicted (Random Forest)
plt.figure(figsize=(7,5))
plt.scatter(y_test, y_pred_rf, color='green')
plt.xlabel("Actual Yield")
plt.ylabel("Predicted Yield")
plt.title("Actual vs Predicted Crop Yield (Random Forest)")
plt.show()
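
MAE treats all errors equally, while RMSE penalizes large misses more heavily, so reporting both can be informative. The helper below is a minimal sketch; `regression_report` is a hypothetical name, not part of scikit-learn, and the four values passed to it are made up so the arithmetic can be checked by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": r2_score(y_true, y_pred),
    }


# Every prediction is off by exactly 1, so MAE = RMSE = 1.
# R² = 1 - 4/20 = 0.8 (residual sum of squares 4, total sum of squares 20).
report = regression_report([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0])
print(report)
```

In the notebook, the same helper could be called with `y_test` and each model's predictions.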


📘 10. Results Summary

In [None]:
results = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest'],
    'R2 Score': [r2_score(y_test, y_pred_lr), r2_score(y_test, y_pred_rf)],
    'MAE': [mean_absolute_error(y_test, y_pred_lr), mean_absolute_error(y_test, y_pred_rf)]
})

results


🌱 11. Ethical Reflection

Bias & Fairness:
If the data mostly represents specific regions or crops, predictions may not generalize to others. Including data from diverse regions reduces this risk and makes the model more broadly usable.
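
One simple way to look for this kind of bias is to compare the error per group instead of relying on a single overall score. The sketch below assumes a hypothetical `region` column and uses made-up values; a large gap between groups would suggest the model serves some regions worse than others.

```python
import pandas as pd

# Made-up evaluation results; "region" is a hypothetical grouping column.
eval_df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "actual": [3.0, 4.0, 2.0, 5.0],
    "predicted": [3.5, 3.5, 4.0, 2.0],
})

# Mean absolute error within each region.
eval_df["abs_error"] = (eval_df["actual"] - eval_df["predicted"]).abs()
mae_by_region = eval_df.groupby("region")["abs_error"].mean()
print(mae_by_region)
```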

Sustainability Impact:
Accurate predictions can help reduce food waste, improve planning, and support smallholder farmers, contributing directly to Zero Hunger (SDG 2).

🧭 12. Conclusion and Next Steps

Summary:

We used regression techniques to predict crop yields.

Random Forest performed better than Linear Regression.

The model can assist in agricultural decision-making and planning.

Next Steps:

Add real-time weather data via API (e.g., OpenWeatherMap).

Test additional models like Gradient Boosting or XGBoost.

Deploy the model via Streamlit for user interaction.
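
As a starting point for the additional-models step, the sketch below trains scikit-learn's `GradientBoostingRegressor` with the same 80/20 split used earlier. The data is a synthetic stand-in from `make_regression`, and the hyperparameters are untuned example values, so the scores are not indicative of performance on real crop data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop dataset (4 features).
X_demo, y_demo = make_regression(
    n_samples=300, n_features=4, noise=10.0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Untuned example hyperparameters.
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=42
)
gbr.fit(X_tr, y_tr)
y_pred_gbr = gbr.predict(X_te)

print("Gradient Boosting R2:", r2_score(y_te, y_pred_gbr))
print("Gradient Boosting MAE:", mean_absolute_error(y_te, y_pred_gbr))
```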

🧮 13. Optional: Save Model

In [None]:
import joblib
joblib.dump(rf, "crop_yield_model.pkl")
print("Model saved successfully!")
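
A saved model is only useful if it can be reloaded and produces the same predictions. The sketch below shows the round trip with `joblib.load` on a small model trained on random data; the data, the temporary path, and the sample values are all assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small model trained on random data, standing in for the notebook's rf.
rng = np.random.default_rng(42)
X_demo = rng.random((50, 4))
y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "crop_yield_model.pkl")
    joblib.dump(model, path)
    loaded = joblib.load(path)

# The reloaded model should reproduce the original predictions.
new_sample = np.array([[0.5, 0.5, 0.5, 0.5]])
pred_original = model.predict(new_sample)
pred_loaded = loaded.predict(new_sample)
print("Original:", pred_original, "Reloaded:", pred_loaded)
```

Because the notebook's `rf` was fitted on a DataFrame, new inputs should be passed as a DataFrame with the same column names and order. Loading is most reliable with the same scikit-learn version that saved the file.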
