In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# 🌾📊 **Agriculture Production Forecasting Using Machine Learning: A Predictive Analysis on Global Rice Production Trends**


## 📘 Introduction

Agriculture plays a vital role in ensuring food security across the globe. This project focuses on analyzing historical data of rice production and building a machine learning model to predict future rice production trends.

Using Linear Regression, a fundamental supervised learning technique, we aim to understand production behavior over the years and forecast values for future planning. This type of predictive modeling is crucial for stakeholders such as policy makers, agricultural organizations, and researchers.

We use Python libraries like Pandas, Matplotlib, and Scikit-learn to clean the data, build the model, and visualize the trends.


## 🛠️ Tools & Technologies Used

- **Python** – programming language  
- **Pandas** – data manipulation  
- **Matplotlib** – data visualization  
- **Seaborn** – advanced visualizations  
- **Scikit-learn** – machine learning algorithms  
- **Jupyter Notebook** – development environment  

## 📂 Dataset Description

The dataset includes food production data for multiple countries across several years. It contains the following key columns:

- `year`: Year of observation  
- `Country`: Name of the country  
- `rice_production`: Quantity of rice produced  
- `wheat_production`: Quantity of wheat produced  
- `vegetable_production`: Quantity of vegetables produced  

This dataset helps us study how production has evolved globally and build models to estimate future values.


## 🧼 Data Cleaning and Preprocessing

- Loaded the dataset using Pandas
- Removed or handled missing values
- Selected relevant features for model training
- Converted non-numeric values (if needed)
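
The steps above could look roughly like the following. This is a minimal sketch on a made-up three-row frame: the column names mirror the ones used in this notebook, but the values and the choice of `pd.to_numeric(..., errors="coerce")` are illustrative assumptions, not a description of the actual dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real CSV (values are made up).
raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "year": [2000, 2001, 2002],
    "rice_production": ["10.5", "n/a", "12.0"],  # numbers stored as text
})

# Convert text to numbers; anything that cannot be parsed becomes NaN.
raw["rice_production"] = pd.to_numeric(raw["rice_production"], errors="coerce")

# Drop only the rows that are missing the value we want to model.
clean = raw.dropna(subset=["rice_production"]).reset_index(drop=True)

print(clean)
```

Passing `subset=` keeps rows that are complete for the modeled column even if some other column is empty, which a bare `dropna()` would discard.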


## 📊 Exploratory Data Analysis (EDA)

We analyze the data trends, outliers, correlations, and distributions using various plots:
- Line chart of rice production over the years
- Top 10 countries by average rice production
- Correlation between rice and wheat production
- Heatmap of numerical feature correlations

In [None]:
import pandas as pd

# Load dataset
data = pd.read_csv("/kaggle/input/world-food-production/world_food_production new.csv")

# Remove missing values
data.dropna(inplace=True)
print(data.head())
print(data.columns)

In [None]:
# Independent variable
X = data[['year']]  

# Dependent variable
y = data['rice_production']  

## 🤖 Model Building (Linear Regression)

We use **Linear Regression** from scikit-learn to train a model on historical rice production data:

- `X`: Year (independent variable)  
- `y`: Rice production (dependent variable)  
- Fit model and predict future values (2024–2030)
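
Assuming `data` holds one row per country and year, fitting on it directly describes production for a typical country rather than the world total. If the goal is a global trend, one option is to sum production per year before fitting. The sketch below uses made-up numbers, and `np.polyfit` stands in for `LinearRegression` only to keep the example dependency-light; both fit an ordinary least-squares line.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two countries, three years (values are made up).
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "rice_production": [10.0, 12.0, 14.0, 1.0, 2.0, 3.0],
})

# Collapse to one global total per year before fitting.
global_rice = toy.groupby("year", as_index=False)["rice_production"].sum()

# Least-squares line: production ~ slope * year + intercept.
slope, intercept = np.polyfit(
    global_rice["year"], global_rice["rice_production"], deg=1
)

print(global_rice)
print(f"estimated change per year: {slope:.2f}")
```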


In [None]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)

## Visualizing Predictions

We plot the actual values against the predicted values and draw the fitted trend line. This shows how well the model fits the historical data and what it implies for future years.
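
One way to put a number on how well the model fits is the coefficient of determination, R², which is also the quantity scikit-learn's `LinearRegression.score` reports. The helper name `r_squared` below is hypothetical, introduced only for illustration, and the values are made up.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model misses
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in the data
    return 1.0 - ss_res / ss_tot

# Made-up actual and predicted values, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.3f}")
```

In the notebook itself, `model.score(X, y)` would give the same quantity for the fitted model. A value near 1 means the line explains most of the variation; a value near 0 means it explains little.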


### 📈 Rice Production Trend Analysis

The overall trend of rice production over the years reveals a **steady increase**, reflecting technological improvements, policy initiatives, and agricultural development. This upward trajectory indicates global efforts in ensuring food security and improving yield efficiency.


In [None]:
import matplotlib.pyplot as plt
y_pred = model.predict(X)
plt.scatter(X, y, color='blue', label='Actual')
plt.plot(X, y_pred, color='red', label='Regression Line')
plt.xlabel("Year")
plt.ylabel("Rice Production")
plt.title("Rice Production Trend Analysis")
plt.legend()
plt.show()


### 🌾 Top Countries by Average Rice Production

Countries like **China, India, and Indonesia** consistently appear at the top of rice production rankings. These countries contribute significantly to global rice supply due to large arable land, favorable climate, and a rice-based diet culture. This insight helps identify key players in the global rice market.


In [None]:
top_countries = data.groupby('Country')['rice_production'].mean().sort_values(ascending=False).head(10)
print(top_countries)


### 🌟 Rice Production Trend of the Top Country

This line chart visualizes the rice production trend of the **top-producing country** (e.g., China or India). The plot helps to understand how rice production has changed over the years in that specific country.

- It highlights the country's **year-wise contribution** to global rice production.
- A steady or steep upward trend reflects **consistent growth**, while fluctuations may indicate climate, policy, or economic impacts.
- This type of country-specific analysis can support **regional planning and decision-making** in agriculture.


In [None]:
top_country = top_countries.index[0]
country_data = data[data['Country'] == top_country]
plt.plot(country_data['year'], country_data['rice_production'], marker='o')
plt.title(f"Rice Production Trend: {top_country}")
plt.xlabel("Year")
plt.ylabel("Rice Production")
plt.grid()
plt.show()


### 🔗 Correlation Between Rice and Wheat Production

The scatter plot shows a **moderate positive correlation** between rice and wheat production, especially in agriculturally rich countries. This suggests that regions with robust agricultural infrastructure and resources tend to produce both crops in large quantities. However, climatic and regional preferences may cause variations.


In [None]:
import seaborn as sns

sns.scatterplot(x='rice_production', y='wheat_production', data=data)
plt.title("Correlation between Rice and Wheat Production")
plt.xlabel("Rice Production")
plt.ylabel("Wheat Production")
plt.grid()
plt.show()

# Optional: Correlation coefficient
correlation = data[['rice_production', 'wheat_production']].corr()
print(correlation)

### 🌍 Global Agricultural Production Over Time

A line graph comparing rice, wheat, and vegetable production shows that **all three food categories have grown**, with **rice and wheat showing the most consistent upward trends**. This highlights global progress in food production, which is essential for meeting the demands of a growing population.


In [None]:
total_production = data.groupby('year')[['rice_production', 'wheat_production', 'vegetable_production']].sum()
total_production.plot(figsize=(10, 6), marker='o')
plt.title("Global Agricultural Production Over Time")
plt.xlabel("Year")
plt.ylabel("Production Volume")
plt.grid(True)
plt.show()


### 🔮 Future Rice Production Prediction (2024–2030)

Based on the linear regression model, rice production is predicted to **continue increasing steadily in the coming years**. These estimates are useful for government planning, export/import decisions, and food security analysis. However, real-world factors like climate change, pests, or policies may influence actual outcomes.


In [None]:
import numpy as np

# Predict future years
future_years = pd.DataFrame({'year': np.arange(2024, 2031)})
future_predictions = model.predict(future_years)

# Visualize future predictions
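# X has one point per data row (several countries per year), so use a scatter rather than a connected line
plt.scatter(X, y, alpha=0.3, label='Historical Data')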
plt.plot(future_years, future_predictions, 'ro--', label='Predicted (2024–2030)')
plt.xlabel("Year")
plt.ylabel("Rice Production")
plt.title("Future Rice Production Prediction")
plt.legend()
plt.grid()
plt.show()

# Optional: Print future predictions
future_years['predicted production'] = future_predictions
print(future_years)


### 🧪 Feature Correlation Heatmap

The heatmap shows the **correlation strength between all numeric features**. It helps identify how different types of crop production (like rice, wheat, and vegetables) relate to one another. Features with strong correlation can support multi-variable modeling and more complex predictions in future versions of the project.



In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Select only numeric columns
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Plot heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(numeric_data.corr(), annot=True, cmap="YlGnBu")
plt.title("Feature Correlation Heatmap")
plt.show()


### 🥧 Rice Production Share by Country (Pie Chart)

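This pie chart shows how rice production is split among the **top 10 rice-producing countries**, ranked by total production across all years in the dataset. The percentages are shares of the top-10 total, not of global production. It helps identify which countries dominate the rice production landscape.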

**Insight:**  
Countries like China and India contribute significantly to global rice supply. Understanding this distribution is important for analyzing global trade, food security, and export policies.

In [None]:
top_total = data.groupby('Country')['rice_production'].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(8, 8))
plt.pie(top_total, labels=top_total.index, autopct='%1.1f%%', startangle=140)
plt.title("Top 10 Countries by Total Rice Production Share")
plt.axis('equal')
plt.show()

### 📉 Global Rice Production – 5-Year Moving Average Trend

To better observe long-term trends, a **moving average** is applied over annual production data. This smooths out short-term fluctuations and highlights the underlying trend.

**Insight:**  
The production trend shows consistent long-term growth. This indicates sustained efforts in agriculture through improved technology, seeds, and farming techniques.
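
A small sketch of how a 5-year rolling mean behaves, using made-up totals. With the default settings the first four years have no average, because a full window is not yet available; `min_periods=1` is one optional way to fill those in, at the cost of averaging over fewer years at the start.

```python
import pandas as pd

# Made-up yearly totals, for illustration only.
totals = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    index=range(2000, 2006),
    name="rice_production",
)

# Default behaviour: NaN until a full 5-year window exists.
strict_avg = totals.rolling(window=5).mean()

# Optional: average over however many years are available so far.
partial_avg = totals.rolling(window=5, min_periods=1).mean()

print(pd.DataFrame({"total": totals, "strict": strict_avg, "partial": partial_avg}))
```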


In [None]:
global_trend = data.groupby('year')['rice_production'].sum().reset_index()
global_trend['moving_avg'] = global_trend['rice_production'].rolling(window=5).mean()

plt.plot(global_trend['year'], global_trend['rice_production'], label='Annual Production', alpha=0.4)
plt.plot(global_trend['year'], global_trend['moving_avg'], label='5-Year Moving Average', color='red')
plt.title("Global Rice Production with Moving Average")
plt.xlabel("Year")
plt.ylabel("Total Rice Production")
plt.legend()
plt.grid()
plt.show()

### 🌾 Rice vs Wheat Production Over Time

This dual line chart compares **global rice and wheat production** trends from year to year.

**Insight:**  
Both crops show increasing trends, but their rate of growth may differ. The comparison helps assess whether food security is diversifying or relying on specific crops.


In [None]:
trend = data.groupby('year')[['rice_production', 'wheat_production']].sum().reset_index()

plt.plot(trend['year'], trend['rice_production'], label='Rice Production', marker='o')
plt.plot(trend['year'], trend['wheat_production'], label='Wheat Production', marker='s')
plt.title("Global Rice vs Wheat Production Over Time")
plt.xlabel("Year")
plt.ylabel("Production Quantity")
plt.legend()
plt.grid()
plt.show()

### 📈 Year-over-Year Growth Rate of Global Rice Production

This line chart shows the **percentage change in global rice production** from one year to the next.

- Positive values indicate years with **growth**, while negative values show **decline** compared to the previous year.
- A horizontal dashed line at 0% separates growth from decline, making it easy to interpret.

**Insight:**  
This visualization helps identify:
- **Strong growth periods** in global rice production
- **Crisis or disruption years** where production fell (possibly due to climate events, economic issues, or policy changes)
- **Volatility or stability** in global agriculture output year by year

It provides a deeper look beyond absolute production, focusing on **rate of change and momentum**.
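
The growth rate for year t is (P_t - P_(t-1)) / P_(t-1) × 100, where P is total production. The sketch below checks, on made-up totals, that pandas' `pct_change` matches that formula; the first year has no previous value, so its growth rate is missing.

```python
import pandas as pd

# Made-up yearly totals, for illustration only.
totals = pd.Series([100.0, 110.0, 99.0], index=[2000, 2001, 2002])

# Built-in: fractional change from the previous row, scaled to percent.
growth = totals.pct_change() * 100

# Same calculation written out by hand.
previous = totals.shift(1)
manual = (totals - previous) / previous * 100

print(pd.DataFrame({"total": totals, "growth_pct": growth, "manual_pct": manual}))
```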


In [None]:
rice_growth = data.groupby('year')['rice_production'].sum().pct_change() * 100

plt.figure(figsize=(10, 5))
plt.plot(rice_growth.index, rice_growth.values, marker='o', color='purple')
plt.axhline(0, color='black', linestyle='--')
plt.title("Year-over-Year Growth Rate of Global Rice Production")
plt.xlabel("Year")
plt.ylabel("Growth Rate (%)")
plt.grid(True)
plt.show()

### 🌍 Stacked Area Chart – Rice Production Over Time (Top 5 Countries)

This stacked area chart illustrates how the **top 5 rice-producing countries** have contributed to global rice production over the years.

- Each colored area represents one country’s annual rice production.
- The total height of the stack shows the **combined production of these five countries** (not the global total), while each **individual area** reflects that country's **share of the combined figure** year by year.

**Insight:**  
This chart reveals:
- Which countries have **consistently dominated** rice production
- How their contributions have **increased, decreased, or remained stable** over time
- Whether the **gap between countries is widening or narrowing**

It’s a great way to understand **global rice supply dynamics** and shifts in agricultural dominance across countries.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Clean invalid country names
data = data[data['Country'].notna()]  # Remove NaN
data = data[data['Country'].astype(str) != '0']  # Remove '0'
data = data[~data['Country'].astype(str).str.isnumeric()]  # Remove other numeric country names

# Step 2: Find top 5 valid countries by total rice production
top_countries_list = data.groupby('Country')['rice_production'].sum().sort_values(ascending=False).head(5).index

# Step 3: Filter dataset to only include those top 5 countries
filtered_data = data[data['Country'].isin(top_countries_list)]

# Step 4: Create pivot table (year vs country)
pivot_data = filtered_data.pivot_table(index='year', columns='Country', values='rice_production', aggfunc='sum')

# Step 5: Plot stacked area chart
pivot_data.plot.area(figsize=(12, 6), colormap='Set3', alpha=0.8)
plt.title("Rice Production Over Time (Top 5 Countries)")
plt.xlabel("Year")
plt.ylabel("Production Volume")
plt.legend(title="Country")
plt.grid(True)
plt.show()


### 📦 Box Plot – Yearly Distribution of Rice Production

This box plot shows the **distribution of rice production values** across all countries for each year in the dataset.

- The **box** represents the interquartile range (IQR), showing the middle 50% of values.
- The **line inside the box** is the median.
- The **whiskers** show variability outside the upper and lower quartiles.
- **Dots beyond the whiskers** indicate outliers — countries with unusually high or low production in that year.

**Insight:**  
This plot helps identify:
- **Variation** in production among countries each year
- **Outlier-producing countries**
- Trends in **data spread and skewness** over time  
It also reveals whether global rice production is becoming more consistent or more variable across nations.
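
By default, matplotlib and seaborn draw whiskers out to 1.5 × IQR beyond the quartiles and plot anything further away as an outlier dot. The sketch below reproduces that rule by hand on made-up values, so the flagged points can be listed rather than only seen.

```python
import pandas as pd

# Made-up production values for a single year, for illustration only.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])

q1 = values.quantile(0.25)  # lower edge of the box
q3 = values.quantile(0.75)  # upper edge of the box
iqr = q3 - q1               # height of the box

# Points outside these fences are the ones drawn as dots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(outliers)
```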

In [None]:
import seaborn as sns

plt.figure(figsize=(14, 6))
sns.boxplot(x='year', y='rice_production', data=data)
plt.title("Distribution of Rice Production by Year")
plt.xlabel("Year")
plt.ylabel("Rice Production")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()


### 🧯 Outlier Detection in Global Rice Production (Z-Score Method)

This chart identifies **anomalous years** in global rice production using the Z-score technique. A Z-score measures how far a data point lies from the mean, in units of standard deviation.

- **Red dots** mark years where total production lies more than 2 standard deviations above or below the mean.
- This method highlights **unusual events** such as natural disasters, agricultural revolutions, or global crises (e.g., pandemics, wars).

**Insight:**  
Outliers in agricultural data can help explain disruptions in food supply, changes in export/import patterns, and identify years worth deeper investigation. These years may also correlate with external events like droughts, floods, or policy changes.
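
For reference, the Z-score of a value x is (x - mean) / standard deviation. The sketch below computes it by hand on made-up totals; `ddof=0` is passed because `scipy.stats.zscore` uses the population standard deviation by default, whereas pandas' `std` defaults to the sample version.

```python
import pandas as pd

# Made-up yearly totals with one unusual year, for illustration only.
totals = pd.Series(
    [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 150.0],
    index=range(2000, 2010),
)

# Z-score: distance from the mean in units of standard deviation.
z = (totals - totals.mean()) / totals.std(ddof=0)

# Flag years more than 2 standard deviations from the mean.
outlier_years = totals[z.abs() > 2]

print(outlier_years)
```

When a series trends strongly upward, Z-scores on raw totals tend to flag the earliest and latest years simply because they sit far from the overall mean. Applying the same rule to year-over-year changes is one possible alternative.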


In [None]:
from scipy.stats import zscore

global_total = data.groupby('year')['rice_production'].sum()
z_scores = zscore(global_total)

outlier_years = global_total[(z_scores > 2) | (z_scores < -2)]

plt.plot(global_total.index, global_total.values, label='Rice Production')
plt.scatter(outlier_years.index, outlier_years.values, color='red', label='Outliers')
plt.title("Outlier Detection in Rice Production")
plt.xlabel("Year")
plt.ylabel("Production Volume")
plt.legend()
plt.grid(True)
plt.show()

### 🧭 Radar Chart – Average Crop Production Comparison (Top 5 Countries)

This radar chart compares the **average production of rice, wheat, and vegetables** among the top 5 rice-producing countries.

- Each axis represents a different crop: **Rice**, **Wheat**, and **Vegetables**.
- Each polygon corresponds to a country’s average production levels across the three crops.
- Larger, well-rounded shapes indicate **balanced agricultural output**, while more pointed shapes show **specialization in specific crops**.

**Insight:**  
This visualization helps identify:
- **Crop diversity vs. specialization** across countries
- How major rice producers perform in **other key agricultural areas**
- Which countries have **strong overall food production systems**

Radar charts are useful for multi-variable comparisons in a single view and give a quick understanding of a country's agricultural strengths.
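
If the three crops are measured on very different scales, the largest one can dominate the shape of every polygon. One optional remedy, shown here as a sketch on made-up averages, is to divide each crop column by its maximum so that every axis runs from 0 to 1. The frame `avg` and its values are hypothetical.

```python
import pandas as pd

# Made-up average production per country, for illustration only.
avg = pd.DataFrame(
    {
        "rice_production": [200.0, 150.0, 50.0],
        "wheat_production": [100.0, 120.0, 10.0],
        "vegetable_production": [500.0, 100.0, 50.0],
    },
    index=["A", "B", "C"],
)

# Scale each crop so the largest producer of that crop sits at 1.0.
scaled = avg / avg.max()

print(scaled.round(2))
```

After scaling, the polygons compare each country against the leader for that crop rather than comparing raw volumes across crops.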


In [None]:
import numpy as np

# Get top 5 countries for radar
top_5 = data.groupby('Country')['rice_production'].sum().sort_values(ascending=False).head(5).index
avg_data = data[data['Country'].isin(top_5)].groupby('Country')[['rice_production', 'wheat_production', 'vegetable_production']].mean()

# Radar setup
labels = ['Rice', 'Wheat', 'Vegetables']
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]

plt.figure(figsize=(8, 8))

for country in avg_data.index:
    values = avg_data.loc[country].tolist()
    values += values[:1]
    plt.polar(angles, values, label=country, marker='o')

plt.xticks(angles[:-1], labels)
plt.title("Average Crop Production Comparison (Top 5 Countries)")
plt.legend(loc='upper right')
plt.show()


## ✅ Conclusion

This project successfully explored and analyzed global rice production trends using a combination of **data visualization**, **statistical analysis**, and **machine learning (Linear Regression)**.

We derived meaningful insights from a multi-dimensional dataset that included rice, wheat, and vegetable production across various countries and years. Key achievements include:

- Cleaning and preprocessing real-world agricultural data
- Identifying **top-performing countries** in rice production
- Visualizing long-term trends, yearly fluctuations, and country-wise comparisons
- Detecting **anomalies and outliers** using statistical methods like **Z-score**
- Predicting future rice production using **Linear Regression**
- Comparing countries and crops using **radar charts**, **box plots**, and **stacked area charts**

**Final Insight:**  
Global rice production has shown a steady upward trend, but **fluctuations, regional differences, and outliers** reveal the impact of external factors like climate, economy, and policy. Predictive modeling can assist **governments, farmers, and organizations** in making better decisions about food security, resource planning, and agricultural investments.

---

### 📌 Future Enhancements
- Incorporate **time-series forecasting** models like ARIMA or Prophet for more accurate predictions.
- Analyze the impact of **external features** such as rainfall, temperature, or fertilizer usage.
- Build an **interactive dashboard** using Streamlit or Plotly for dynamic exploration.
- Extend the project to include **yield per capita** or **crop vs climate analysis**.

This project demonstrates the power of **data science and machine learning in agriculture**, offering a foundation for more advanced research and impactful solutions.
