# Climate Change Impact on Agriculture: Machine Learning Project
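This project analyzes the effects of climate change on agricultural yield using a dataset of environmental and agricultural factors: temperature, precipitation, CO2 emissions, extreme weather events, irrigation access, pesticide and fertilizer use, and soil health. We preprocess the data, select features, and train machine learning models to predict crop yield per hectare (`Crop_Yield_MT_per_HA`).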

## 1. Importing Required Libraries
We begin by installing `xgboost`, which the later modeling steps depend on. The imports for data manipulation, visualization, and machine learning follow in the first code cell of Section 2.

In [9]:
!pip install xgboost
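Re-running the install cell on every kernel restart is unnecessary if the package is already present. As a minimal sketch, the check below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and not part of the original notebook. In Jupyter, `%pip install xgboost` is a commonly recommended alternative to `!pip`, because it installs into the environment of the running kernel.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Only prompt for installation when the package is actually missing
if is_installed("xgboost"):
    print("xgboost is already available")
else:
    print("xgboost is missing; run `pip install xgboost` first")
```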



## 2. Loading the Dataset
After the imports, the cell below loads the dataset of climate factors and crop yields. The dataset is a CSV file, read into a DataFrame with `pd.read_csv`; `data.head()` then previews the first five rows.

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
import xgboost as xgb

# Load dataset from the provided path
file_path = r'C:\Users\admin\Desktop\ML\FINAL PROJECT\climate_change_impact_on_agriculture_2024.csv'
data = pd.read_csv(file_path)

# Preview dataset
print(data.head())

   Year Country         Region  Crop_Type  Average_Temperature_C  \
0  2001   India    West Bengal       Corn                   1.55   
1  2024   China          North       Corn                   3.23   
2  2001  France  Ile-de-France      Wheat                  21.11   
3  2001  Canada       Prairies     Coffee                  27.85   
4  1998   India     Tamil Nadu  Sugarcane                   2.19   

   Total_Precipitation_mm  CO2_Emissions_MT  Crop_Yield_MT_per_HA  \
0                  447.06             15.22                 1.737   
1                 2913.57             29.82                 1.737   
2                 1301.74             25.75                 1.719   
3                 1154.36             13.91                 3.890   
4                 1627.48             11.81                 1.080   

   Extreme_Weather_Events  Irrigation_Access_%  Pesticide_Use_KG_per_HA  \
0                       8                14.54                    10.08   
1                       8 
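Beyond `data.head()`, a few structural checks help before modeling: column types, missing values, and duplicate rows. The sketch below runs them on a tiny stand-in frame; the column names come from the preview above, but the values are made up for illustration, and the `summarize` helper is hypothetical. On the real data the same calls would be applied to `data`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame (illustrative values, not the project data)
sample = pd.DataFrame({
    "Year": [2001, 2024, 2001, 2001],
    "Crop_Type": ["Corn", "Corn", "Wheat", "Wheat"],
    "Average_Temperature_C": [1.55, 3.23, np.nan, 21.11],
    "Crop_Yield_MT_per_HA": [1.737, 1.737, 1.719, 1.719],
})


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count, and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


summary = summarize(sample)
print(sample.shape)
print(summary)
print("duplicate rows:", int(sample.duplicated().sum()))
```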

## 3. Data Preprocessing
Before applying machine learning models, we need to preprocess the data. In this step, we define the feature columns and the target variable, split the data into training and test sets, and standardize the features.

In [11]:
# Data preprocessing
# Assume target variable is 'Crop_Yield_MT_per_HA'
features = ['Average_Temperature_C', 'Total_Precipitation_mm', 'CO2_Emissions_MT', 
            'Extreme_Weather_Events', 'Irrigation_Access_%', 'Pesticide_Use_KG_per_HA', 
            'Fertilizer_Use_KG_per_HA', 'Soil_Health_Index']
target = 'Crop_Yield_MT_per_HA'

X = data[features]
y = data[target]

# Split dataset into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (fit on the training set only, then reuse those statistics for the test set)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
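The cell above indexes `data` directly, which fails with a `KeyError` if a column name is misspelled and passes any missing values straight to the scaler. A defensive variant could look like the sketch below. The helper `split_features_target` is hypothetical, the demo values are illustrative, and dropping incomplete rows is only one possible policy, assumed here for simplicity.

```python
import pandas as pd


def split_features_target(df, feature_cols, target_col):
    """Return (X, y) after checking that every requested column exists."""
    missing = [c for c in [*feature_cols, target_col] if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in dataset: {missing}")
    # Drop rows with a missing value in any used column (one simple policy)
    clean = df.dropna(subset=[*feature_cols, target_col])
    return clean[feature_cols], clean[target_col]


# Illustrative frame with one missing temperature value
demo = pd.DataFrame({
    "Average_Temperature_C": [1.55, 3.23, None, 27.85],
    "Total_Precipitation_mm": [447.06, 2913.57, 1301.74, 1154.36],
    "Crop_Yield_MT_per_HA": [1.737, 1.737, 1.719, 3.890],
})
X_demo, y_demo = split_features_target(
    demo,
    ["Average_Temperature_C", "Total_Precipitation_mm"],
    "Crop_Yield_MT_per_HA",
)
print(X_demo.shape, y_demo.shape)
```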

### 3.1. Feature Scaling
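Some machine learning models are sensitive to the scale of their inputs, so the features were standardized with `StandardScaler` at the end of the previous cell. The scaler is fitted on the training set only, which gives each training feature mean 0 and variance 1; the test set is transformed with the same training statistics, so its mean and variance are only approximately 0 and 1. The next notebook cell (In [12]) fits a linear regression baseline on the scaled features and reports RMSE and R^2 on the test set.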
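To make the effect of standardization concrete, the following sketch applies the same fit-on-train, transform-both pattern to synthetic features on very different scales. The data is randomly generated for illustration and is not the project dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features on very different scales (not the project data)
X = np.column_stack([
    rng.normal(20, 8, size=200),      # temperature-like column
    rng.normal(1500, 600, size=200),  # precipitation-like column
])
X_tr, X_te = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_tr)  # statistics come from the training rows only
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)      # reuses the training mean and std

print("train mean:", X_tr_s.mean(axis=0).round(3))
print("train std: ", X_tr_s.std(axis=0).round(3))
print("test mean: ", X_te_s.mean(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1 up to floating-point error, while the test columns are close to but not exactly at those values, which is expected.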

In [12]:
# Linear Regression Model
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# Predictions
y_pred_lr = lr.predict(X_test_scaled)

# Metrics
rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred_lr))
r2_lr = r2_score(y_test, y_pred_lr)
print(f"Linear Regression RMSE: {rmse_lr}, R^2: {r2_lr}")


Linear Regression RMSE: 0.9920008984649176, R^2: 0.06765327874966898
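An R^2 of about 0.07 means the linear model explains roughly 7% of the variance in test-set yield, only slightly better than always predicting the mean. Because RMSE = std(y_test) * sqrt(1 - R^2), the reported numbers imply a test-set standard deviation of about 1.03. One way to put such scores in context is to compare against a mean-only baseline, sketched below with scikit-learn's `DummyRegressor` on synthetic data with a deliberately weak signal. The data and the `evaluate` helper are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic data with a weak linear signal buried in noise (not the project data)
X = rng.normal(size=(1000, 3))
y = 0.3 * X[:, 0] + rng.normal(size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)


def evaluate(model):
    """Fit on the training split and return (RMSE, R^2) on the test split."""
    pred = model.fit(X_tr, y_tr).predict(X_te)
    return np.sqrt(mean_squared_error(y_te, pred)), r2_score(y_te, pred)


rmse_base, r2_base = evaluate(DummyRegressor(strategy="mean"))
rmse_lin, r2_lin = evaluate(LinearRegression())
print(f"mean baseline: RMSE={rmse_base:.3f}, R^2={r2_base:.3f}")
print(f"linear model:  RMSE={rmse_lin:.3f}, R^2={r2_lin:.3f}")

# Identity linking the two metrics: RMSE = std(y_test) * sqrt(1 - R^2)
identity_holds = np.isclose(rmse_lin, y_te.std() * np.sqrt(1 - r2_lin))
print("identity holds:", identity_holds)
```

The mean baseline can never score above 0 in R^2 on the test set, so any model worth keeping should clear it by a clear margin.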
