# 🇹🇷 Turkey Agricultural Data Exploration

**AgroDataZoom Portfolio Project**

---

## Overview

This notebook provides a comprehensive exploration of Turkey's agricultural data sourced from TÜİK (Turkish Statistical Institute). As part of the AgroDataZoom global agricultural analysis project, this analysis focuses on understanding Turkey's agricultural landscape, production patterns, and regional variations.

### Objectives
1. **Data Quality Assessment**: Evaluate TÜİK agricultural datasets
2. **Production Analysis**: Analyze crop production trends over time
3. **Regional Comparison**: Compare agricultural performance across provinces
4. **Seasonality Patterns**: Identify seasonal trends in agricultural production
5. **Economic Impact**: Assess the economic significance of different agricultural sectors

### Dataset Information
- **Source**: TÜİK (Turkish Statistical Institute)
- **Coverage**: National and provincial agricultural statistics
- **Time Period**: Multi-year agricultural production data
- **Categories**: Crop production, livestock, agricultural trade

---

**Author**: Data Scientist | Agricultural Analytics Specialist  
**Location**: Canada 🇨🇦  
**Project**: Global Food Security & Agricultural Trends Analysis

## 1. Import Required Libraries

In [None]:
# Core Data Science Libraries
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Statistical Analysis
from scipy import stats
import statsmodels.api as sm

# System and Path Management
import os
import sys
from pathlib import Path

# Add project root to Python path for importing custom modules
project_root = Path.cwd().parent.parent
sys.path.append(str(project_root))

# Import custom project modules (optional; the notebook runs without them)
try:
    from src.data_processing import TuikDataProcessor
    from src.visualization import TurkeyVisualizer
    from src.utils import setup_logging, create_metadata
    from config.config import TUIK_DATA_DIR, VIZ_CONFIG
    print("✅ Custom modules imported successfully")
    logger = setup_logging("INFO")
except ImportError as e:
    print(f"⚠️ Could not import custom modules: {e}")
    print("📝 Note: Falling back to the standard logging module; custom utilities are optional")
    # setup_logging is unavailable here, so configure standard-library logging instead
    import logging
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("📚 Core libraries imported successfully!")
print(f"🐍 Python version: {sys.version}")
print(f"🐼 Pandas version: {pd.__version__}")
print(f"📊 Matplotlib version: {plt.matplotlib.__version__}")


## 2. Load and Inspect Data

In this section, we'll load TÜİK agricultural datasets and perform initial data inspection to understand the structure, quality, and characteristics of our data.
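As a complement to the file-listing and template code in the cell below, here is a minimal sketch of a loader that reads every CSV/Excel file in the TÜİK directory into a dictionary keyed by file name. The helper name `load_tuik_tables` is hypothetical; it assumes each file holds one table (the first sheet for Excel files), and reading `.xlsx` or `.xls` files requires the `openpyxl` or `xlrd` engine to be installed.

```python
from pathlib import Path

import pandas as pd


def load_tuik_tables(data_dir):
    """Read every CSV/Excel file in data_dir into a dict keyed by file stem.

    Hypothetical helper: assumes one table per file (first sheet for Excel).
    Non-data files such as README.md are skipped.
    """
    readers = {
        '.csv': pd.read_csv,
        '.xlsx': pd.read_excel,  # requires openpyxl
        '.xls': pd.read_excel,   # requires xlrd
    }
    tables = {}
    for path in sorted(Path(data_dir).iterdir()):
        reader = readers.get(path.suffix.lower())
        if reader is None:
            continue  # not a recognized data file
        tables[path.stem] = reader(path)
    return tables
```

Usage, once files are in place: `tables = load_tuik_tables(data_dir)` followed by `{name: df.shape for name, df in tables.items()}` gives a quick inventory of what was loaded.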

In [None]:
# Define data paths
data_dir = project_root / "data" / "raw" / "turkey" / "tuik"
print(f"📁 Looking for data in: {data_dir}")

# Check if data directory exists
if not data_dir.exists():
    print(f"⚠️ Data directory does not exist: {data_dir}")
    print("📝 Please download TÜİK data and place it in the appropriate directory")
    print("📖 Refer to data/raw/turkey/tuik/README.md for data collection guidelines")
else:
    print(f"✅ Data directory found: {data_dir}")
    
    # List available data files
    data_files = list(data_dir.glob("*.xlsx")) + list(data_dir.glob("*.xls")) + list(data_dir.glob("*.csv"))
    
    if data_files:
        print(f"\n📊 Found {len(data_files)} data files:")
        for file in data_files:
            print(f"  • {file.name}")
    else:
        print("📝 No data files found. Please add TÜİK datasets to begin analysis.")

# Sample data loading (replace with actual TÜİK data files)
print("\n" + "="*60)
print("📝 SAMPLE DATA LOADING TEMPLATE")
print("="*60)

# Example: Loading agricultural production data
# Uncomment and modify when actual data is available
"""
# Load crop production data
crop_production_file = data_dir / "tuik_agriculture_crop_production_2010_2023.xlsx"

if crop_production_file.exists():
    print(f"📁 Loading: {crop_production_file.name}")
    
    # Read the first sheet (reading .xlsx files requires the openpyxl engine)
    df_crops = pd.read_excel(crop_production_file, sheet_name=0)
    
    print(f"✅ Data loaded successfully!")
    print(f"📊 Shape: {df_crops.shape}")
    print(f"📅 Columns: {list(df_crops.columns)}")
    
    # Display first few rows
    display(df_crops.head())
    
else:
    print(f"⚠️ File not found: {crop_production_file.name}")
"""

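# Create synthetic sample data for demonstration purposes only (random values, NOT TÜİK statistics)
print("\n🔄 Creating sample data for demonstration...")

rng = np.random.default_rng(42)  # fixed seed so results are reproducible

years = list(range(2015, 2024))
provinces = ['Ankara', 'İstanbul', 'İzmir', 'Antalya', 'Konya']
crops = ['Wheat', 'Barley', 'Corn', 'Rice', 'Cotton']

# Full Year × Province × Crop grid (9 × 5 × 5 = 225 rows). Repeating two 5-item
# lists in lockstep would pair each province with exactly one crop, confounding
# every province comparison with a crop comparison.
df_sample = pd.MultiIndex.from_product(
    [years, provinces, crops], names=['Year', 'Province', 'Crop_Type']
).to_frame(index=False)

n_rows = len(df_sample)
df_sample['Cultivated_Area_Hectares'] = rng.integers(10000, 100000, n_rows)
df_sample['Yield_per_Hectare'] = rng.uniform(2.5, 8.5, n_rows)
# Derive production from area × yield so the three columns are mutually consistent
df_sample['Production_Tons'] = (df_sample['Cultivated_Area_Hectares'] * df_sample['Yield_per_Hectare']).round()

print("✅ Sample data created for demonstration")
print(f"📊 Sample data shape: {df_sample.shape}")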

# Display sample data
print("\n📋 Sample Data Preview:")
display(df_sample.head(10))


## 3. Data Preprocessing

Assess data quality (data types, missing values, duplicates, and outliers), then standardize data types and numeric precision and add derived fields for analysis.
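Real TÜİK exports are likely to need two kinds of standardization that the synthetic sample does not exercise. Turkish number formatting uses `.` as the thousands separator and `,` as the decimal mark, and the Turkish dotted capital İ breaks naive lowercasing: in Python, `'İ'.lower()` returns `'i'` followed by a combining dot (U+0307), so `'İstanbul'.lower() != 'istanbul'`. The helper names below are hypothetical, and treating `'-'` or an empty cell as missing is an assumption about how a given table marks gaps; check the source file.

```python
import math

# Map Turkish-specific letters to ASCII before lowercasing
_TR_ASCII = str.maketrans({
    'İ': 'i', 'I': 'i', 'ı': 'i',
    'Ş': 's', 'ş': 's', 'Ğ': 'g', 'ğ': 'g',
    'Ç': 'c', 'ç': 'c', 'Ö': 'o', 'ö': 'o', 'Ü': 'u', 'ü': 'u',
})


def province_key(name):
    """Return an ASCII, lowercase key so 'İSTANBUL', 'İstanbul' and 'Istanbul' match."""
    return name.strip().translate(_TR_ASCII).lower()


def parse_tr_number(text):
    """Parse a Turkish-formatted number such as '1.234.567,89' into a float.

    Assumption: '-' or an empty cell marks a missing value.
    Only apply this to text columns: a value already parsed as 1234.5
    would have its decimal point stripped.
    """
    text = str(text).strip()
    if text in {'', '-'}:
        return math.nan
    return float(text.replace('.', '').replace(',', '.'))
```

Usage: `df['Province_Key'] = df['Province'].map(province_key)` gives a join key that survives spelling and casing differences between tables, and `df[col].map(parse_tr_number)` converts a text column of Turkish-formatted figures.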

In [None]:
# Data preprocessing and quality assessment
print("🔍 DATA QUALITY ASSESSMENT")
print("="*50)

# Basic information about the dataset
print(f"📊 Dataset Shape: {df_sample.shape}")
print(f"📝 Columns: {list(df_sample.columns)}")
print(f"🗓️ Time Range: {df_sample['Year'].min()} - {df_sample['Year'].max()}")
print(f"🌍 Provinces: {df_sample['Province'].nunique()} unique provinces")
print(f"🌾 Crop Types: {df_sample['Crop_Type'].nunique()} unique crop types")

# Check data types
print("\n📋 Data Types:")
print(df_sample.dtypes)

# Check for missing values
print("\n❓ Missing Values:")
missing_values = df_sample.isnull().sum()
if missing_values.sum() == 0:
    print("✅ No missing values found")
else:
    print(missing_values[missing_values > 0])

# Check for duplicates
duplicates = df_sample.duplicated().sum()
print(f"\n🔄 Duplicate Rows: {duplicates}")

# Statistical summary
print("\n📈 Statistical Summary:")
display(df_sample.describe())

# Check for outliers using the IQR method (Tukey's fences)
def detect_outliers(df, column):
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for one column."""
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
    return len(outliers)

print("\n🎯 Outlier Detection:")
numeric_columns = ['Production_Tons', 'Cultivated_Area_Hectares', 'Yield_per_Hectare']
for col in numeric_columns:
    outlier_count = detect_outliers(df_sample, col)
    print(f"  • {col}: {outlier_count} outliers detected")

# Data cleaning and standardization
print("\n🧹 DATA CLEANING")
print("="*50)

# Create a clean copy of the data
df_clean = df_sample.copy()

# Ensure proper data types
df_clean['Year'] = df_clean['Year'].astype(int)
df_clean['Province'] = df_clean['Province'].astype(str)
df_clean['Crop_Type'] = df_clean['Crop_Type'].astype(str)

# Round numerical values to appropriate precision
df_clean['Production_Tons'] = df_clean['Production_Tons'].round(0)
df_clean['Cultivated_Area_Hectares'] = df_clean['Cultivated_Area_Hectares'].round(0)
df_clean['Yield_per_Hectare'] = df_clean['Yield_per_Hectare'].round(2)

# Add calculated fields (production in thousand tons, area in thousand hectares)
df_clean['Production_Thousand_Tons'] = df_clean['Production_Tons'] / 1000
df_clean['Area_Thousand_Hectares'] = df_clean['Cultivated_Area_Hectares'] / 1000

print("✅ Data cleaning completed")
print(f"📊 Clean dataset shape: {df_clean.shape}")

# Display cleaned data sample
print("\n📋 Cleaned Data Sample:")
display(df_clean.head())
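Before real TÜİK tables replace `df_sample`, it helps to confirm they match the long format the EDA cells assume (one row per year, province, and crop) and that yield agrees with production divided by area. A minimal sketch: the helper name `check_schema` and the 5% tolerance are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

REQUIRED_COLUMNS = ['Year', 'Province', 'Crop_Type', 'Production_Tons',
                    'Cultivated_Area_Hectares', 'Yield_per_Hectare']


def check_schema(df, rel_tol=0.05):
    """Return a list of human-readable problems (empty list = none found)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # The EDA cells assume one row per (Year, Province, Crop_Type)
    dup_keys = df.duplicated(subset=['Year', 'Province', 'Crop_Type']).sum()
    if dup_keys:
        problems.append(f"{dup_keys} duplicate Year/Province/Crop_Type rows")

    # Yield should match production / area; zero-area rows become NaN and are skipped
    area = df['Cultivated_Area_Hectares'].replace(0, np.nan)
    implied = df['Production_Tons'] / area
    gap = (implied - df['Yield_per_Hectare']).abs()
    mismatch = (gap > rel_tol * df['Yield_per_Hectare'].abs()).sum()
    if mismatch:
        problems.append(f"{mismatch} rows where yield differs from production/area by >{rel_tol:.0%}")
    return problems
```

Running `check_schema(df_clean)` should return an empty list for a table that is ready for the analysis cells.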


## 4. Exploratory Data Analysis

Explore the agricultural data through visualizations and statistical analysis to identify patterns, trends, and insights.

In [None]:
# 📊 EXPLORATORY DATA ANALYSIS
print("📊 EXPLORATORY DATA ANALYSIS")
print("="*60)

# Set up plotting parameters
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

# 1. TEMPORAL TRENDS ANALYSIS
print("\n🕒 1. TEMPORAL TRENDS ANALYSIS")
print("-" * 40)

# Production trends over time
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Turkey Agricultural Production Trends Over Time', fontsize=16, fontweight='bold')  # emoji glyphs are missing from Matplotlib's default font

# Total production by year
yearly_production = df_clean.groupby('Year')['Production_Tons'].sum()
axes[0,0].plot(yearly_production.index, yearly_production.values, marker='o', linewidth=2, markersize=6)
axes[0,0].set_title('Total Production by Year')
axes[0,0].set_xlabel('Year')
axes[0,0].set_ylabel('Production (Tons)')
axes[0,0].grid(True, alpha=0.3)

# Average yield by year
yearly_yield = df_clean.groupby('Year')['Yield_per_Hectare'].mean()
axes[0,1].plot(yearly_yield.index, yearly_yield.values, marker='s', color='green', linewidth=2, markersize=6)
axes[0,1].set_title('Average Yield per Hectare by Year')
axes[0,1].set_xlabel('Year')
axes[0,1].set_ylabel('Yield (Tons/Hectare)')
axes[0,1].grid(True, alpha=0.3)

# Cultivated area by year
yearly_area = df_clean.groupby('Year')['Cultivated_Area_Hectares'].sum()
axes[1,0].plot(yearly_area.index, yearly_area.values, marker='^', color='orange', linewidth=2, markersize=6)
axes[1,0].set_title('Total Cultivated Area by Year')
axes[1,0].set_xlabel('Year')
axes[1,0].set_ylabel('Area (Hectares)')
axes[1,0].grid(True, alpha=0.3)

# Production by crop type over time
crop_trends = df_clean.groupby(['Year', 'Crop_Type'])['Production_Tons'].sum().unstack()
for crop in crop_trends.columns:
    axes[1,1].plot(crop_trends.index, crop_trends[crop], marker='o', label=crop, linewidth=2)
axes[1,1].set_title('Production Trends by Crop Type')
axes[1,1].set_xlabel('Year')
axes[1,1].set_ylabel('Production (Tons)')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 2. REGIONAL ANALYSIS
print("\n🗺️ 2. REGIONAL ANALYSIS")
print("-" * 40)

# Provincial production comparison
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Total production by province
provincial_production = df_clean.groupby('Province')['Production_Tons'].sum().sort_values(ascending=True)
axes[0].barh(provincial_production.index, provincial_production.values, color='skyblue')
axes[0].set_title('Total Production by Province')
axes[0].set_xlabel('Production (Tons)')

# Average yield by province
provincial_yield = df_clean.groupby('Province')['Yield_per_Hectare'].mean().sort_values(ascending=True)
axes[1].barh(provincial_yield.index, provincial_yield.values, color='lightgreen')
axes[1].set_title('Average Yield by Province')
axes[1].set_xlabel('Yield (Tons/Hectare)')

plt.tight_layout()
plt.show()

# 3. CROP TYPE ANALYSIS
print("\n🌾 3. CROP TYPE ANALYSIS")
print("-" * 40)

# Create subplots for crop analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Crop Type Analysis', fontsize=16, fontweight='bold')  # emoji glyphs are missing from Matplotlib's default font

# Production by crop type (pie chart)
crop_production = df_clean.groupby('Crop_Type')['Production_Tons'].sum()
axes[0,0].pie(crop_production.values, labels=crop_production.index, autopct='%1.1f%%', startangle=90)
axes[0,0].set_title('Production Share by Crop Type')

# Yield comparison by crop type
crop_yield = df_clean.groupby('Crop_Type')['Yield_per_Hectare'].mean().sort_values(ascending=True)
axes[0,1].barh(crop_yield.index, crop_yield.values, color='gold')
axes[0,1].set_title('Average Yield by Crop Type')
axes[0,1].set_xlabel('Yield (Tons/Hectare)')

# Area allocation by crop type
crop_area = df_clean.groupby('Crop_Type')['Cultivated_Area_Hectares'].sum().sort_values(ascending=True)
axes[1,0].barh(crop_area.index, crop_area.values, color='coral')
axes[1,0].set_title('Cultivated Area by Crop Type')
axes[1,0].set_xlabel('Area (Hectares)')

# Box plot of yield distribution by crop type (seaborn; pandas' boxplot(by=...)
# would overwrite the figure suptitle with "Boxplot grouped by Crop_Type")
sns.boxplot(data=df_clean, x='Crop_Type', y='Yield_per_Hectare', ax=axes[1,1])
axes[1,1].set_title('Yield Distribution by Crop Type')
axes[1,1].set_xlabel('Crop Type')
axes[1,1].set_ylabel('Yield (Tons/Hectare)')

plt.tight_layout()
plt.show()

# 4. CORRELATION ANALYSIS
print("\n🔗 4. CORRELATION ANALYSIS")
print("-" * 40)

# Calculate correlations
numeric_cols = ['Production_Tons', 'Cultivated_Area_Hectares', 'Yield_per_Hectare']
correlation_matrix = df_clean[numeric_cols].corr()

# Create correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix of Agricultural Variables', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Print correlation insights
print("\n📊 Key Correlations:")
for i in range(len(correlation_matrix.columns)):
    for j in range(i+1, len(correlation_matrix.columns)):
        corr_value = correlation_matrix.iloc[i, j]
        var1 = correlation_matrix.columns[i]
        var2 = correlation_matrix.columns[j]
        print(f"  • {var1} vs {var2}: {corr_value:.3f}")

print("\n✅ Exploratory Data Analysis completed!")
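The yield panels above average `Yield_per_Hectare` row by row, which gives a small field the same weight as a large one. An area-weighted yield (total production divided by total cultivated area) is usually the more representative aggregate. A minimal sketch with a hypothetical helper name, shown on a toy frame:

```python
import pandas as pd


def area_weighted_yield(df, by):
    """Yield per group computed as total production / total cultivated area."""
    totals = df.groupby(by)[['Production_Tons', 'Cultivated_Area_Hectares']].sum()
    return totals['Production_Tons'] / totals['Cultivated_Area_Hectares']


# Toy example: one small, high-yield field and one large, low-yield field
toy = pd.DataFrame({
    'Province': ['Konya', 'Konya'],
    'Production_Tons': [100.0, 10.0],
    'Cultivated_Area_Hectares': [10.0, 100.0],
})
toy['Yield_per_Hectare'] = toy['Production_Tons'] / toy['Cultivated_Area_Hectares']

simple_mean = toy.groupby('Province')['Yield_per_Hectare'].mean()  # (10 + 0.1) / 2 = 5.05
weighted = area_weighted_yield(toy, 'Province')                     # 110 / 110 = 1.0
```

On the notebook's data, `area_weighted_yield(df_clean, 'Province')` or `area_weighted_yield(df_clean, 'Crop_Type')` can be compared directly with the unweighted means plotted above.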


## 5. Interactive Visualizations

Create interactive visualizations using Plotly for better data exploration and presentation.
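The cell below, and Section 6, aggregate with a column-to-function dictionary; in Section 6 the resulting MultiIndex columns are then renamed by position, which silently mislabels columns if the dictionary order changes. pandas' named aggregation (`groupby(...).agg(new_name=(column, func))`, available since pandas 0.25) produces flat, explicitly named columns in one step. A sketch on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    'Province': ['Ankara', 'Ankara', 'Konya'],
    'Production_Tons': [100.0, 300.0, 250.0],
    'Yield_per_Hectare': [2.0, 4.0, 5.0],
})

# Each keyword becomes an output column: new_name=(source_column, aggregation)
summary = toy.groupby('Province').agg(
    Total_Prod=('Production_Tons', 'sum'),
    Avg_Prod=('Production_Tons', 'mean'),
    Avg_Yield=('Yield_per_Hectare', 'mean'),
)
```

The same pattern applied to `df_clean` would replace both the aggregation dictionary and the positional `.columns = [...]` assignment.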

In [None]:
# 🎭 INTERACTIVE VISUALIZATIONS WITH PLOTLY
print("🎭 INTERACTIVE VISUALIZATIONS")
print("="*60)

# 1. Interactive Time Series Plot
print("\n📈 1. Interactive Production Trends")

# Aggregate data for time series
ts_data = df_clean.groupby(['Year', 'Crop_Type']).agg({
    'Production_Tons': 'sum',
    'Cultivated_Area_Hectares': 'sum',
    'Yield_per_Hectare': 'mean'
}).reset_index()

# Create interactive line plot
fig_ts = px.line(ts_data, x='Year', y='Production_Tons', color='Crop_Type',
                title='🇹🇷 Turkey Agricultural Production Trends by Crop Type',
                labels={'Production_Tons': 'Production (Tons)', 'Year': 'Year'},
                template='plotly_white')

fig_ts.update_layout(
    hovermode='x unified',
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)
fig_ts.show()

# 2. Interactive Regional Comparison
print("\n🗺️ 2. Interactive Regional Analysis")

# Create regional summary
regional_data = df_clean.groupby(['Province', 'Crop_Type']).agg({
    'Production_Tons': 'sum',
    'Cultivated_Area_Hectares': 'sum',
    'Yield_per_Hectare': 'mean'
}).reset_index()

# Interactive bar chart
fig_regional = px.bar(regional_data, x='Province', y='Production_Tons', color='Crop_Type',
                     title='🏛️ Production by Province and Crop Type',
                     labels={'Production_Tons': 'Production (Tons)', 'Province': 'Province'},
                     template='plotly_white')

fig_regional.update_layout(
    xaxis_tickangle=-45,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)
fig_regional.show()

# 3. Interactive Scatter Plot
print("\n🎯 3. Production Efficiency Analysis")

# Scatter plot: Area vs Production with Yield as color
fig_scatter = px.scatter(df_clean, x='Cultivated_Area_Hectares', y='Production_Tons', 
                        color='Yield_per_Hectare', size='Yield_per_Hectare',
                        hover_data=['Province', 'Crop_Type', 'Year'],
                        title='🎯 Production Efficiency: Area vs Production (Colored by Yield)',
                        labels={
                            'Cultivated_Area_Hectares': 'Cultivated Area (Hectares)',
                            'Production_Tons': 'Production (Tons)',
                            'Yield_per_Hectare': 'Yield (Tons/Hectare)'
                        },
                        template='plotly_white',
                        color_continuous_scale='Viridis')

fig_scatter.show()

# 4. Interactive Sunburst Chart
print("\n☀️ 4. Hierarchical Data View")

# Create hierarchical data for sunburst
sunburst_data = df_clean.groupby(['Province', 'Crop_Type'])['Production_Tons'].sum().reset_index()

fig_sunburst = px.sunburst(sunburst_data, 
                          path=['Province', 'Crop_Type'], 
                          values='Production_Tons',
                          title='☀️ Agricultural Production Hierarchy: Province → Crop Type',
                          template='plotly_white')
fig_sunburst.show()

# 5. Interactive Box Plot
print("\n📦 5. Yield Distribution Analysis")

fig_box = px.box(df_clean, x='Crop_Type', y='Yield_per_Hectare', color='Province',
                title='📦 Yield Distribution by Crop Type and Province',
                labels={'Yield_per_Hectare': 'Yield (Tons/Hectare)', 'Crop_Type': 'Crop Type'},
                template='plotly_white')

fig_box.update_layout(
    xaxis_tickangle=-45,
    legend=dict(orientation="v", yanchor="top", y=1, xanchor="left", x=1.02)
)
fig_box.show()

# 6. Multi-metric Dashboard
print("\n📊 6. Multi-Metric Dashboard")

# Create subplots
fig_dashboard = make_subplots(
    rows=2, cols=2,
    subplot_titles=('📈 Production Trends', '🏛️ Provincial Comparison', 
                    '🌾 Crop Distribution', '📏 Yield Analysis'),
    specs=[[{"secondary_y": False}, {"secondary_y": False}],
           [{"type": "domain"}, {"secondary_y": False}]]
)

# Production trends
yearly_prod = df_clean.groupby('Year')['Production_Tons'].sum()
fig_dashboard.add_trace(
    go.Scatter(x=yearly_prod.index, y=yearly_prod.values, mode='lines+markers', name='Production'),
    row=1, col=1
)

# Provincial comparison
prov_prod = df_clean.groupby('Province')['Production_Tons'].sum()
fig_dashboard.add_trace(
    go.Bar(x=prov_prod.index, y=prov_prod.values, name='Province Production'),
    row=1, col=2
)

# Crop distribution (pie chart)
crop_dist = df_clean.groupby('Crop_Type')['Production_Tons'].sum()
fig_dashboard.add_trace(
    go.Pie(labels=crop_dist.index, values=crop_dist.values, name="Crop Distribution"),
    row=2, col=1
)

# Yield analysis
crop_yield = df_clean.groupby('Crop_Type')['Yield_per_Hectare'].mean()
fig_dashboard.add_trace(
    go.Bar(x=crop_yield.index, y=crop_yield.values, name='Average Yield'),
    row=2, col=2
)

fig_dashboard.update_layout(
    title_text="🇹🇷 Turkey Agricultural Dashboard",
    height=800,
    showlegend=False,
    template='plotly_white'
)

fig_dashboard.show()

print("\n✨ Interactive visualizations completed!")
print("💡 Tip: Hover over the plots for more details and use the interactive controls")


## 6. Statistical Analysis & Key Insights

Perform statistical tests and summarize key findings from the agricultural data analysis.
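Two derived metrics in the cell below are easiest to read as formulas. For province $p$ with mean yield $\bar{y}_p$ and total production $P_p$, the composite `Efficiency_Score` weights relative yield and relative production equally (the 0.5/0.5 split is a modeling choice, not a standard); for crop $c$ with yield standard deviation $s_c$ and mean yield $\bar{y}_c$, `Yield_Stability` is the coefficient of variation, so lower values indicate more stable yields:

$$
\text{Efficiency\_Score}_p = 100\left(0.5\,\frac{\bar{y}_p}{\max_q \bar{y}_q} + 0.5\,\frac{P_p}{\max_q P_q}\right),
\qquad
\text{Yield\_Stability}_c = 100\,\frac{s_c}{\bar{y}_c}
$$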

In [None]:
# 📊 STATISTICAL ANALYSIS & KEY INSIGHTS
print("📊 STATISTICAL ANALYSIS & KEY INSIGHTS")
print("="*60)

# 1. TREND ANALYSIS
print("\n📈 1. TREND ANALYSIS")
print("-" * 40)

# Calculate year-over-year growth rates
yearly_totals = df_clean.groupby('Year').agg({
    'Production_Tons': 'sum',
    'Cultivated_Area_Hectares': 'sum',
    'Yield_per_Hectare': 'mean'
}).reset_index()

yearly_totals['Production_Growth'] = yearly_totals['Production_Tons'].pct_change() * 100
yearly_totals['Area_Growth'] = yearly_totals['Cultivated_Area_Hectares'].pct_change() * 100
yearly_totals['Yield_Growth'] = yearly_totals['Yield_per_Hectare'].pct_change() * 100

print("Year-over-Year Growth Rates:")
print(yearly_totals[['Year', 'Production_Growth', 'Area_Growth', 'Yield_Growth']].to_string(index=False))

# Calculate overall trends using linear regression
from scipy.stats import linregress

years = yearly_totals['Year'].values
production_slope, production_intercept, production_r, production_p, _ = linregress(years, yearly_totals['Production_Tons'])
yield_slope, yield_intercept, yield_r, yield_p, _ = linregress(years, yearly_totals['Yield_per_Hectare'])

print(f"\n🔍 Trend Analysis Results:")
print(f"  • Production Trend: {production_slope:,.0f} tons/year (R²={production_r**2:.3f}, p-value={production_p:.3f})")
print(f"  • Yield Trend: {yield_slope:.3f} tons/hectare/year (R²={yield_r**2:.3f}, p-value={yield_p:.3f})")

# 2. REGIONAL PERFORMANCE ANALYSIS
print("\n🏛️ 2. REGIONAL PERFORMANCE ANALYSIS")
print("-" * 40)

# Calculate provincial performance metrics
provincial_stats = df_clean.groupby('Province').agg({
    'Production_Tons': ['sum', 'mean', 'std'],
    'Yield_per_Hectare': ['mean', 'std'],
    'Cultivated_Area_Hectares': ['sum', 'mean']
}).round(2)

provincial_stats.columns = ['Total_Prod', 'Avg_Prod', 'Std_Prod', 'Avg_Yield', 'Std_Yield', 'Total_Area', 'Avg_Area']

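# Composite performance score (max 100): equal, arbitrarily chosen weights on relative average yield and relative total production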
provincial_stats['Efficiency_Score'] = (
    provincial_stats['Avg_Yield'] / provincial_stats['Avg_Yield'].max() * 0.5 +
    provincial_stats['Total_Prod'] / provincial_stats['Total_Prod'].max() * 0.5
) * 100

print("Provincial Performance Rankings:")
top_provinces = provincial_stats.sort_values('Efficiency_Score', ascending=False)
print(top_provinces[['Total_Prod', 'Avg_Yield', 'Efficiency_Score']].to_string())

# 3. CROP PERFORMANCE ANALYSIS
print("\n🌾 3. CROP PERFORMANCE ANALYSIS")
print("-" * 40)

# Analyze crop performance
crop_stats = df_clean.groupby('Crop_Type').agg({
    'Production_Tons': ['sum', 'mean', 'std'],
    'Yield_per_Hectare': ['mean', 'std', 'min', 'max'],
    'Cultivated_Area_Hectares': ['sum', 'mean']
}).round(2)

crop_stats.columns = ['Total_Prod', 'Avg_Prod', 'Std_Prod', 'Avg_Yield', 'Std_Yield', 'Min_Yield', 'Max_Yield', 'Total_Area', 'Avg_Area']

# Yield variability as coefficient of variation (%): lower values = more stable yields
crop_stats['Yield_Stability'] = (crop_stats['Std_Yield'] / crop_stats['Avg_Yield'] * 100).round(2)

print("Crop Performance Analysis:")
print(crop_stats[['Total_Prod', 'Avg_Yield', 'Yield_Stability']].to_string())

# 4. STATISTICAL TESTS
print("\n🧪 4. STATISTICAL TESTS")
print("-" * 40)

# Test for significant differences in yield between provinces
# (one-way ANOVA; assumes roughly normal yields with similar variances in each group)
from scipy.stats import f_oneway

province_yields = [group['Yield_per_Hectare'].values for name, group in df_clean.groupby('Province')]
f_stat, p_value_provinces = f_oneway(*province_yields)

print(f"ANOVA Test - Yield differences between provinces:")
print(f"  • F-statistic: {f_stat:.3f}")
print(f"  • P-value: {p_value_provinces:.6f}")
print(f"  • Result: {'Significant' if p_value_provinces < 0.05 else 'Not significant'} differences between provinces")

# Test for significant differences in yield between crop types
crop_yields = [group['Yield_per_Hectare'].values for name, group in df_clean.groupby('Crop_Type')]
f_stat_crops, p_value_crops = f_oneway(*crop_yields)

print(f"\nANOVA Test - Yield differences between crop types:")
print(f"  • F-statistic: {f_stat_crops:.3f}")
print(f"  • P-value: {p_value_crops:.6f}")
print(f"  • Result: {'Significant' if p_value_crops < 0.05 else 'Not significant'} differences between crop types")

# 5. KEY INSIGHTS SUMMARY
print("\n🔑 5. KEY INSIGHTS SUMMARY")
print("="*60)

insights = []

# Production insights
total_production = df_clean['Production_Tons'].sum()
avg_yield = df_clean['Yield_per_Hectare'].mean()
total_area = df_clean['Cultivated_Area_Hectares'].sum()

insights.append(f"📊 Total agricultural production analyzed: {total_production:,.0f} tons")
insights.append(f"🌾 Average yield across all crops: {avg_yield:.2f} tons/hectare")
insights.append(f"🗺️ Total cultivated area: {total_area:,.0f} hectares")

# Top performers
top_province = provincial_stats['Efficiency_Score'].idxmax()
top_crop = crop_stats['Avg_Yield'].idxmax()
most_stable_crop = crop_stats['Yield_Stability'].idxmin()

insights.append(f"🏆 Most efficient province: {top_province}")
insights.append(f"🥇 Highest yielding crop: {top_crop}")
insights.append(f"📈 Most stable crop (lowest yield variation): {most_stable_crop}")

# Growth trends
if production_p < 0.05:
    trend_direction = "increasing" if production_slope > 0 else "decreasing"
    insights.append(f"📈 Production shows a significant {trend_direction} trend over time")
else:
    insights.append("📊 No significant production trend detected over the analyzed period")

# Regional variations
if p_value_provinces < 0.05:
    insights.append("🗺️ Significant regional differences in agricultural productivity detected")
else:
    insights.append("🗺️ Agricultural productivity is relatively uniform across regions")

# Print all insights
for i, insight in enumerate(insights, 1):
    print(f"{i:2d}. {insight}")

print("\n" + "="*60)
print("✅ STATISTICAL ANALYSIS COMPLETED")
print("📝 These insights provide a foundation for further detailed analysis")
print("🎯 Next steps: Focus on specific crops/regions for deeper investigation")
print("="*60)
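`f_oneway` reports the one-way ANOVA F-statistic: the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it with NumPy on a toy example to show what the number measures; on the notebook's data, `scipy.stats.f_oneway` returns the same statistic plus its p-value. The helper name is hypothetical.

```python
import numpy as np


def one_way_anova_f(*groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k))."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()

    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Three toy groups with means 2, 5, 8 and grand mean 5:
# SS_between = 3 * (9 + 0 + 9) = 54, SS_within = 3 * 2 = 6
# F = (54 / 2) / (6 / 6) = 27.0
f_toy = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

A large F means group means differ by much more than the scatter within groups would suggest; with random sample data, F should typically be close to 1.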


## 7. Conclusions & Next Steps

Summary of findings and recommendations for future analysis directions.

### 🎯 Summary of Findings

This exploratory analysis establishes a reusable workflow for Turkey's agricultural data. The figures, rankings, and test results produced above come from **randomly generated sample data**, so they are placeholders until real TÜİK datasets are loaded.

#### 📊 **Data Quality & Structure**
- Built a preprocessing pipeline covering data-type checks, missing-value and duplicate detection, and IQR-based outlier screening
- Organized the analysis along three dimensions: time (year), region (province), and crop type
- Prepared a loading template for TÜİK Excel/CSV files in `data/raw/turkey/tuik`

#### 🌾 **Agricultural Performance Patterns**
- Compared production and yield across provinces using summary statistics and a composite performance score
- Computed crop-specific yield statistics, including a coefficient-of-variation stability measure
- Set up province × crop breakdowns (grouped bar chart, sunburst) that can reveal regional specialization once real data is available

#### 📈 **Temporal Trends**
- Computed year-over-year growth rates for production, cultivated area, and yield
- Fitted linear trends and ran one-way ANOVA tests for yield differences between provinces and between crop types
- Established baseline metrics for comparative analysis

### 🚀 **Recommended Next Steps**

#### 🔬 **Deeper Analysis Opportunities**
1. **Climate Impact Study**: Integrate weather data to analyze climate-agriculture relationships
2. **Economic Analysis**: Include price data to assess economic productivity and profitability
3. **Predictive Modeling**: Develop forecasting models for production planning
4. **Comparative Studies**: Compare Turkey's performance with other countries

#### 📊 **Technical Improvements**
1. **Automated Data Integration**: Automate TÜİK data collection and processing
2. **Geospatial Analysis**: Incorporate GIS data for spatial agricultural analysis
3. **Dashboard Development**: Create interactive dashboards for stakeholder use
4. **Machine Learning**: Apply advanced ML techniques for pattern recognition

#### 🌍 **Global Context Integration**
1. **International Benchmarking**: Compare with FAO global agricultural data
2. **Trade Analysis**: Analyze import/export patterns and dependencies
3. **Food Security Assessment**: Evaluate Turkey's food security indicators
4. **Policy Impact Analysis**: Study the effects of agricultural policies

### 💼 **Portfolio Impact**

This analysis demonstrates:
- **Technical Proficiency**: Data science skills applied to an end-to-end agricultural analysis workflow designed for TÜİK data
- **Domain Expertise**: Understanding of agricultural economics and food security challenges
- **Analytical Thinking**: Systematic approach to complex data analysis problems
- **Communication Skills**: Clear presentation of findings through visualizations and insights

### 📚 **Data Sources & References**

- **Primary Source**: TÜİK (Turkish Statistical Institute)
- **Methodology**: Descriptive statistics, IQR outlier screening, linear trend fitting (`scipy.stats.linregress`), and one-way ANOVA (`scipy.stats.f_oneway`)
- **Tools Used**: Python ecosystem (Pandas, NumPy, Matplotlib, Seaborn, Plotly, SciPy)
- **Documentation**: Commented, cell-by-cell analysis code

---

**🔗 Connect with this work:**
- **GitHub Repository**: [AgroDataZoom](https://github.com/suatgonul/agrodatazoom)
- **Project Focus**: Global agricultural data science and food security analysis
- **Collaboration**: Open to research partnerships and data science consulting opportunities

*This analysis represents the beginning of a comprehensive global agricultural analysis project, contributing to evidence-based solutions for food security challenges worldwide.*