<a href="https://colab.research.google.com/github/official-okello/DS_bootcamp_with_gomycode/blob/master/Climate_Change.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## *Introduction*

Climate change has a profound impact on global agriculture, affecting crop yields, soil health, and farming sustainability. This synthetic dataset is designed to simulate real-world agricultural data, enabling researchers, data scientists, and policymakers to explore how climate variations influence food production across different regions.



🔍 Key Features:

✔️ Climate Variables – Simulated data on temperature changes, precipitation levels, and extreme weather events

✔️ Crop Productivity – Modeled impact of climate shifts on yields of key crops like wheat, rice, and corn

✔️ Regional Insights – Includes various geographic regions to analyze diverse climate-agriculture interactions

✔️ Ideal for Predictive Modeling – Supports climate risk assessment, food security studies, and sustainability research



📊 Dataset Overview:
This dataset has been synthetically generated and does not contain real-world agricultural records. It is intended for academic learning, climate impact analysis, and machine learning applications in environmental studies.


📖 Columns Description:

**Region** – Simulated geographic region

**Year** – Modeled year of data collection

**Average_Temperature_C** – Simulated average temperature (°C)

**Total_Precipitation_mm** – Modeled annual precipitation (mm)

**Crop_Yield_MT_per_HA** – Synthetic yield data for selected crops (metric tons per hectare)

**Extreme_Weather_Events** – Number of modeled extreme weather occurrences per year

⚠️ Disclaimer:
This dataset is completely synthetic and should not be used for real-world climate policy decisions or agricultural forecasting. It is meant for educational purposes, research, and data science applications.

🔹 I'll use this dataset to analyze climate trends, build predictive models, and explore solutions for sustainable agriculture! 🌱📊

## *Loading the data*

In [None]:
# import libraries
import pandas as pd

In [None]:
# load data and convert to dataframe
data = pd.read_csv('/content/climate_change_impact_on_agriculture_2024.csv')

In [None]:
# first 5 records
data.head()

Unnamed: 0,Year,Country,Region,Crop_Type,Average_Temperature_C,Total_Precipitation_mm,CO2_Emissions_MT,Crop_Yield_MT_per_HA,Extreme_Weather_Events,Irrigation_Access_%,Pesticide_Use_KG_per_HA,Fertilizer_Use_KG_per_HA,Soil_Health_Index,Adaptation_Strategies,Economic_Impact_Million_USD
0,2001,India,West Bengal,Corn,1.55,447.06,15.22,1.737,8,14.54,10.08,14.78,83.25,Water Management,808.13
1,2024,China,North,Corn,3.23,2913.57,29.82,1.737,8,11.05,33.06,23.25,54.02,Crop Rotation,616.22
2,2001,France,Ile-de-France,Wheat,21.11,1301.74,25.75,1.719,5,84.42,27.41,65.53,67.78,Water Management,796.96
3,2001,Canada,Prairies,Coffee,27.85,1154.36,13.91,3.89,5,94.06,14.38,87.58,91.39,No Adaptation,790.32
4,1998,India,Tamil Nadu,Sugarcane,2.19,1627.48,11.81,1.08,9,95.75,44.35,88.08,49.61,Crop Rotation,401.72
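
The header row above lists the columns the rest of the notebook relies on. As a minimal sketch, the check below reports which of those columns are absent from a frame, which can help if a different copy of the CSV is loaded. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration, and the toy frame stands in for the real file.

```python
import pandas as pd

# Column names copied from the header of the head() output above.
EXPECTED_COLUMNS = [
    "Year", "Country", "Region", "Crop_Type",
    "Average_Temperature_C", "Total_Precipitation_mm", "CO2_Emissions_MT",
    "Crop_Yield_MT_per_HA", "Extreme_Weather_Events", "Irrigation_Access_%",
    "Pesticide_Use_KG_per_HA", "Fertilizer_Use_KG_per_HA", "Soil_Health_Index",
    "Adaptation_Strategies", "Economic_Impact_Million_USD",
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [name for name in expected if name not in df.columns]


# Toy frame standing in for the real CSV: it only has two of the columns.
toy = pd.DataFrame({"Year": [2001], "Country": ["India"]})
print(missing_columns(toy))
```

An empty list means every expected column is present.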


In [None]:
# information on each column
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Year                         10000 non-null  int64  
 1   Country                      10000 non-null  object 
 2   Region                       10000 non-null  object 
 3   Crop_Type                    10000 non-null  object 
 4   Average_Temperature_C        10000 non-null  float64
 5   Total_Precipitation_mm       10000 non-null  float64
 6   CO2_Emissions_MT             10000 non-null  float64
 7   Crop_Yield_MT_per_HA         10000 non-null  float64
 8   Extreme_Weather_Events       10000 non-null  int64  
 9   Irrigation_Access_%          10000 non-null  float64
 10  Pesticide_Use_KG_per_HA      10000 non-null  float64
 11  Fertilizer_Use_KG_per_HA     10000 non-null  float64
 12  Soil_Health_Index            10000 non-null  float64
 13  Adaptation_Strate

In [None]:
# missing values per column
data.isnull().sum()

Unnamed: 0,0
Year,0
Country,0
Region,0
Crop_Type,0
Average_Temperature_C,0
Total_Precipitation_mm,0
CO2_Emissions_MT,0
Crop_Yield_MT_per_HA,0
Extreme_Weather_Events,0
Irrigation_Access_%,0


In [None]:
# statistical description of each column
data.describe(include='all')

Unnamed: 0,Year,Country,Region,Crop_Type,Average_Temperature_C,Total_Precipitation_mm,CO2_Emissions_MT,Crop_Yield_MT_per_HA,Extreme_Weather_Events,Irrigation_Access_%,Pesticide_Use_KG_per_HA,Fertilizer_Use_KG_per_HA,Soil_Health_Index,Adaptation_Strategies,Economic_Impact_Million_USD
count,10000.0,10000,10000,10000,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000,10000.0
unique,,10,34,10,,,,,,,,,,5,
top,,Australia,South,Wheat,,,,,,,,,,Water Management,
freq,,1032,754,1047,,,,,,,,,,2049,
mean,2007.0887,,,,15.241299,1611.663834,15.246608,2.240017,4.9809,55.248332,24.955735,49.973708,64.901278,,674.269658
std,10.084245,,,,11.466955,805.016815,8.589423,0.998342,3.165808,25.988305,14.490962,28.711027,20.195882,,414.591431
min,1990.0,,,,-4.99,200.15,0.5,0.45,0.0,10.01,0.0,0.01,30.0,,47.84
25%,1999.0,,,,5.43,925.6975,7.76,1.449,2.0,32.6775,12.5275,25.39,47.235,,350.545
50%,2007.0,,,,15.175,1611.16,15.2,2.17,5.0,55.175,24.93,49.635,64.65,,583.92
75%,2016.0,,,,25.34,2306.9975,22.82,2.93,8.0,77.5825,37.47,74.825,82.4725,,917.505


In [None]:
# count fully duplicated rows
data.duplicated().sum()

np.int64(0)
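
`duplicated()` with no arguments only flags rows that match in every column, so a count of zero does not rule out several rows sharing the same year, country, region, and crop. The sketch below, on a made-up toy frame, shows one way to count such rows with the `subset` argument. `KEYS` is a name chosen here for illustration; whether these four columns are the right key for this dataset is an assumption.

```python
import pandas as pd

# Assumed key columns; not stated as a key anywhere in the dataset description.
KEYS = ["Year", "Country", "Region", "Crop_Type"]

# Made-up rows: the first two share the same key but differ in yield.
toy = pd.DataFrame({
    "Year": [2001, 2001, 2001, 1998],
    "Country": ["India", "India", "France", "India"],
    "Region": ["West Bengal", "West Bengal", "Ile-de-France", "Tamil Nadu"],
    "Crop_Type": ["Corn", "Corn", "Wheat", "Sugarcane"],
    "Crop_Yield_MT_per_HA": [1.7, 2.1, 1.7, 1.1],
})

# Rows identical in every column.
full_row_duplicates = int(toy.duplicated().sum())
# Rows whose key also appears on at least one other row (keep=False marks all of them).
shared_key_rows = int(toy.duplicated(subset=KEYS, keep=False).sum())
# Number of distinct key combinations.
n_groups = toy.groupby(KEYS).ngroups

print(full_row_duplicates, shared_key_rows, n_groups)
```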

In [None]:
# countries
data['Country'].unique()

array(['India', 'China', 'France', 'Canada', 'USA', 'Argentina',
       'Australia', 'Nigeria', 'Russia', 'Brazil'], dtype=object)

In [None]:
# crops
data['Crop_Type'].unique()

array(['Corn', 'Wheat', 'Coffee', 'Sugarcane', 'Fruits', 'Rice', 'Barley',
       'Vegetables', 'Soybeans', 'Cotton'], dtype=object)
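
`unique()` lists the categories but not how often each occurs. As a small sketch on made-up rows, `value_counts()` returns the number of rows per category, sorted from most to least frequent.

```python
import pandas as pd

# Made-up rows standing in for the Crop_Type column.
toy = pd.DataFrame({"Crop_Type": ["Corn", "Wheat", "Corn", "Rice", "Corn", "Wheat"]})

# Rows per crop, most frequent first.
crop_counts = toy["Crop_Type"].value_counts()
print(crop_counts)
```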

In [None]:
# normalize text columns: lowercase and strip surrounding whitespace
for column in ['Country', 'Region', 'Crop_Type', 'Adaptation_Strategies']:
  data[column] = data[column].astype(str).str.lower().str.strip()
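
The loop above casts each column with `astype(str)` first. This dataset has no missing values, so that is harmless here, but on a column with gaps the cast can turn a missing entry into a literal string, depending on the pandas version and dtype. A minimal sketch of a variant that applies the `.str` methods directly, so missing entries stay missing, follows. `normalize_text` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd


def normalize_text(series):
    """Strip and lowercase a text column, leaving missing values missing (hypothetical helper)."""
    return series.str.strip().str.lower()


# Made-up column with stray whitespace, mixed case, and one missing entry.
toy = pd.DataFrame({"Country": ["  India", "CHINA ", None]})

normalized = normalize_text(toy["Country"])
print(normalized.tolist())
```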

In [None]:
# drop the Adaptation_Strategies column
data = data.drop('Adaptation_Strategies', axis=1)

In [None]:
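# mean temperature per (Year, Country, Region, Crop_Type) group
average_temp_per_crop_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Average_Temperature_C'].mean().reset_index()
# note: the grouped frame has one row per group, so this assignment aligns on row index, not on the group keys; rows past the last group become NaN
data['Average_Temperature_C'] = average_temp_per_crop_per_year['Average_Temperature_C']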

In [None]:
total_precipitation_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Total_Precipitation_mm'].sum().reset_index()
data['Total_Precipitation_mm'] = total_precipitation_per_year['Total_Precipitation_mm']

In [None]:
total_co2_emissions_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['CO2_Emissions_MT'].sum().reset_index()
data['CO2_Emissions_MT'] = total_co2_emissions_per_year['CO2_Emissions_MT']

In [None]:
total_yield_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Crop_Yield_MT_per_HA'].sum().reset_index()
data['Crop_Yield_MT_per_HA'] = total_yield_per_year['Crop_Yield_MT_per_HA']

In [None]:
total_extreme_weather_events_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Extreme_Weather_Events'].sum().reset_index()
data['Extreme_Weather_Events'] = total_extreme_weather_events_per_year['Extreme_Weather_Events']

In [None]:
average_irrigation_access_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Irrigation_Access_%'].mean().reset_index()
data['Irrigation_Access_%'] = average_irrigation_access_per_year['Irrigation_Access_%']

In [None]:
total_pesticides_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Pesticide_Use_KG_per_HA'].sum().reset_index()
data['Pesticide_Use_KG_per_HA'] = total_pesticides_per_year['Pesticide_Use_KG_per_HA']

In [None]:
total_fertilizer_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Fertilizer_Use_KG_per_HA'].sum().reset_index()
data['Fertilizer_Use_KG_per_HA'] = total_fertilizer_per_year['Fertilizer_Use_KG_per_HA']

In [None]:
average_soil_health_index_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Soil_Health_Index'].mean().reset_index()
data['Soil_Health_Index'] = average_soil_health_index_per_year['Soil_Health_Index']

In [None]:
total_economic_impact_per_year = data.groupby(['Year', 'Country', 'Region', 'Crop_Type'])['Economic_Impact_Million_USD'].sum().reset_index()
data['Economic_Impact_Million_USD'] = total_economic_impact_per_year['Economic_Impact_Million_USD']
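
The cells above assign each grouped result back into `data` by row index, so an aggregated value lands on whichever row shares its index number rather than on the rows of its own group. If the goal is one row per (Year, Country, Region, Crop_Type) combination, a single `groupby(...).agg(...)` call keeps the keys and the aggregated values together. The sketch below is one possible alternative, not the notebook's method. It reuses the mean/sum choices from the cells above, `KEYS`, `AGGREGATIONS`, and `aggregate_by_keys` are hypothetical names, and the toy frame is made up.

```python
import pandas as pd

KEYS = ["Year", "Country", "Region", "Crop_Type"]

# Same aggregation choice per column as the cells above.
AGGREGATIONS = {
    "Average_Temperature_C": "mean",
    "Total_Precipitation_mm": "sum",
    "CO2_Emissions_MT": "sum",
    "Crop_Yield_MT_per_HA": "sum",
    "Extreme_Weather_Events": "sum",
    "Irrigation_Access_%": "mean",
    "Pesticide_Use_KG_per_HA": "sum",
    "Fertilizer_Use_KG_per_HA": "sum",
    "Soil_Health_Index": "mean",
    "Economic_Impact_Million_USD": "sum",
}


def aggregate_by_keys(df, keys=KEYS, aggregations=AGGREGATIONS):
    """Return one row per key combination with each column aggregated (hypothetical helper)."""
    # Only aggregate the columns that are actually present in df.
    present = {col: how for col, how in aggregations.items() if col in df.columns}
    return df.groupby(keys, as_index=False).agg(present)


# Made-up rows: the first two belong to the same group.
toy = pd.DataFrame({
    "Year": [2001, 2001, 1998],
    "Country": ["india", "india", "india"],
    "Region": ["west bengal", "west bengal", "tamil nadu"],
    "Crop_Type": ["corn", "corn", "sugarcane"],
    "Average_Temperature_C": [10.0, 20.0, 5.0],
    "Total_Precipitation_mm": [100.0, 300.0, 50.0],
})

aggregated = aggregate_by_keys(toy)
print(aggregated)
```

Because the result is a new frame with one row per group, it contains no NaN rows introduced by misalignment. If the original row count should be preserved instead, `groupby(...).transform(...)` is the usual pandas tool for broadcasting a group value back onto each row.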

In [None]:
# drop duplicate rows and rows with missing values, then save the result
cleaned_data = data.drop_duplicates().dropna()
cleaned_data.to_csv('climate_change.csv', index=False)

In [None]:
cleaned_data.describe(include='all')

Unnamed: 0,Year,Country,Region,Crop_Type,Average_Temperature_C,Total_Precipitation_mm,CO2_Emissions_MT,Crop_Yield_MT_per_HA,Extreme_Weather_Events,Irrigation_Access_%,Pesticide_Use_KG_per_HA,Fertilizer_Use_KG_per_HA,Soil_Health_Index,Economic_Impact_Million_USD
count,7133.0,7133,7133,7133,7133.0,7133.0,7133.0,7133.0,7133.0,7133.0,7133.0,7133.0,7133.0,7133.0
unique,,10,34,10,,,,,,,,,,
top,,nigeria,northeast,vegetables,,,,,,,,,,
freq,,753,558,753,,,,,,,,,,
mean,2007.016122,,,,15.249887,2259.447405,21.374748,3.140357,6.982896,55.201972,34.98631,70.059874,64.865106,945.28201
std,10.080656,,,,10.482118,1448.718826,14.469436,1.922729,4.961403,23.677546,23.882771,46.90375,18.520331,666.709416
min,1990.0,,,,-4.99,200.15,0.5,0.45,0.0,10.04,0.03,0.01,30.0,53.76
25%,1998.0,,,,6.98,1195.67,10.6,1.755,3.0,36.61,17.24,35.21,50.385,444.39
50%,2007.0,,,,15.18,2080.98,19.73,2.7,7.0,55.38,32.49,65.42,64.635,785.0
75%,2016.0,,,,23.58,2854.44,28.15,4.032,10.0,73.96,46.5,92.77,79.38,1276.47


In [None]:
!pip install ydata-profiling


In [None]:
from ydata_profiling import ProfileReport

profile = ProfileReport(cleaned_data, title="Climate Change Profile Report")
profile.to_file("climate_change_profile.html")
