# Task 3: Expected Sales Volume Estimation

**Objective:** Estimate annual market capacity for each crop based on 2023 actual production.

**Assumption:** Market demand remains stable relative to the 2023 baseline, as specified in the problem statement.

**Method:** Use each crop's 2023 actual production as a proxy for its sustainable annual market capacity.

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.dpi'] = 150
sns.set_theme(style='whitegrid')

## 1. Load Clustered Data

In [15]:
# Load clustering results
df = pd.read_csv('clustered_data.csv')

# Load crop types
df_types = pd.read_excel('Attachment1_EN.xlsx', sheet_name='Village Crops')
df_types = df_types[['Crop Name', 'Crop Type']].drop_duplicates()

df = df.merge(df_types, on='Crop Name', how='left')

print(f"Loaded {len(df)} crop-plot combinations across {df['Class'].nunique()} classes")

Loaded 107 crop-plot combinations across 4 classes


## 2. Calculate 2023 Actual Production

Compute actual production from 2023 planting records to establish baseline market capacity.
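
One way to write the calculation performed in the next cell, using notation introduced here purely for illustration (the symbols do not come from the problem statement):

$$
Q_c \;=\; \sum_{(p,\,s)} A_{c,p,s}\; y_{c,s}
$$

Here $A_{c,p,s}$ is the 2023 planted area (mu) of crop $c$ on plot $p$ in season $s$, $y_{c,s}$ is the yield per mu (jin) matched on crop name and planting season, and $Q_c$ is the annual production of crop $c$ in jin. Under the stable-demand assumption, $Q_c$ is then taken as the expected sales volume of crop $c$.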

In [16]:
# Load 2023 data
planting_2023 = pd.read_excel('Attachment2_EN.xlsx', sheet_name='2023 Crop Planting Status')
stats_2023 = pd.read_excel('Attachment2_EN.xlsx', sheet_name='2023 Statistical Data')

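# Attach yield to each planting record (default inner join on crop and season),
# then compute production in jin as area (mu) x yield per mu (jin)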
merged_2023 = planting_2023.merge(
    stats_2023[['Crop Name', 'Planting Season', 'Yield per Mu (Jin)']],
    on=['Crop Name', 'Planting Season']
)
merged_2023['Production'] = merged_2023['Planting Area (Mu)'] * merged_2023['Yield per Mu (Jin)']

# Aggregate by crop (annual production)
actual_production_2023 = merged_2023.groupby('Crop Name')['Production'].sum()

print("2023 Production Summary:")
print(f"  Total production: {actual_production_2023.sum():,.0f} Jin")
print(f"  Unique crops: {len(actual_production_2023)}")
print(f"  Total planted area: {planting_2023['Planting Area (Mu)'].sum():.1f} mu")

2023 Production Summary:
  Total production: 2,833,300 Jin
  Unique crops: 41
  Total planted area: 1292.0 mu
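
The merge above joins on crop name and planting season only and uses the default inner join. If the statistics sheet happened to list a crop more than once per season, rows would be duplicated and production inflated; if a planting record had no matching yield, it would be dropped silently. The following is a minimal sketch of how both cases could be checked, assuming toy data (the values, the `'Single Season'` label, and the names `planting`, `stats`, `checked`, and `unmatched` are illustrative and are not taken from the attachments):

```python
import pandas as pd

# Toy stand-ins for the two 2023 sheets (illustrative values only)
planting = pd.DataFrame({
    'Crop Name': ['Wheat', 'Wheat', 'Corn'],
    'Planting Season': ['Single Season', 'Single Season', 'Single Season'],
    'Planting Area (Mu)': [80.0, 55.0, 30.0],
})
stats = pd.DataFrame({
    'Crop Name': ['Wheat', 'Corn'],
    'Planting Season': ['Single Season', 'Single Season'],
    'Yield per Mu (Jin)': [800.0, 1000.0],
})

keys = ['Crop Name', 'Planting Season']

# validate='many_to_one' raises pandas.errors.MergeError if `stats` has
# duplicate key rows; how='left' with indicator=True keeps planting rows
# that found no yield and marks them as 'left_only'
checked = planting.merge(stats, on=keys, how='left',
                         validate='many_to_one', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

checked['Production'] = checked['Planting Area (Mu)'] * checked['Yield per Mu (Jin)']
production = checked.groupby('Crop Name')['Production'].sum()

print(production)
print(f"Unmatched planting rows: {len(unmatched)}")
```

If the check raises on the real sheets, the yield table would need to be deduplicated or joined on an additional key before the production totals can be trusted.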


## 3. Map to Crop-Plot Combinations

In [17]:
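# Map each crop's 2023 production onto every crop-plot row (the value is a
# per-crop annual total shared by all plot types, not a per-row quantity).
# Crops with no 2023 production receive 0.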
df['Expected_Sales_Volume'] = df['Crop Name'].map(actual_production_2023).fillna(0)

# Auxiliary metric: production cost per jin
df['Cost_per_Jin'] = df['Cost_per_mu'] / df['Yield_per_mu']

print("\nMapping complete. Sample crops:")
sample = df.groupby('Crop Name')['Expected_Sales_Volume'].first().sort_values(ascending=False).head(10)
for crop, sales in sample.items():
    print(f"  {crop:<20} {sales:>12,.0f} jin/year")


Mapping complete. Sample crops:
  Wheat                     506,160 jin/year
  Corn                      384,750 jin/year
  Millet                    210,900 jin/year
  Soybean                   167,580 jin/year
  Chinese Cabbage           150,000 jin/year
  Sweet Potato              113,400 jin/year
  Pumpkin                   111,150 jin/year
  White Radish              100,000 jin/year
  Eggplant                   97,200 jin/year
  Mung Bean                  95,520 jin/year
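
Two details of the mapping step are easy to overlook: `fillna(0)` assigns zero capacity to any crop that was not produced in 2023, and the cost-per-jin ratio becomes infinite if a yield is zero. A minimal sketch of how both could be surfaced, assuming toy data (the crop `'Lotus Root'`, the plot type names, the numbers, and the names `toy`, `baseline`, and `missing_crops` are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy crop-plot table; 'Lotus Root' is a hypothetical crop absent from the baseline
toy = pd.DataFrame({
    'Crop Name': ['Wheat', 'Wheat', 'Lotus Root'],
    'Plot Type': ['Flat Dry Land', 'Terraced Field', 'Irrigated Land'],
    'Yield_per_mu': [800.0, 760.0, 0.0],
    'Cost_per_mu': [450.0, 450.0, 900.0],
})
# Toy per-crop 2023 production, indexed by crop name
baseline = pd.Series({'Wheat': 108000.0}, name='Production')

mapped = toy['Crop Name'].map(baseline)

# Record which crops have no baseline before the NaNs are replaced with 0
missing_crops = toy.loc[mapped.isna(), 'Crop Name'].unique().tolist()
toy['Expected_Sales_Volume'] = mapped.fillna(0)

# Replace zero yields with NaN so the ratio is NaN instead of inf
toy['Cost_per_Jin'] = toy['Cost_per_mu'] / toy['Yield_per_mu'].replace(0, np.nan)

print("Crops without a 2023 baseline:", missing_crops)
print(toy[['Crop Name', 'Expected_Sales_Volume', 'Cost_per_Jin']])
```

Whether a zero capacity is the right treatment for crops missing from the baseline is a modelling decision; the sketch only makes the affected crops visible.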


## 4. Export for Optimization Model

In [22]:
# Select relevant columns for Task 4
output_cols = [
    'Crop Name', 'Plot Type', 'Class', 'Crop Type',
    'Yield_per_mu', 'Cost_per_mu', 'Avg_Price', 'Cost_per_Jin',
    'Expected_Sales_Volume'
]

df[output_cols].to_csv('sales_volume_data.csv', index=False, encoding='utf-8-sig')
print("Output saved: sales_volume_data.csv")


Output saved: sales_volume_data.csv
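
Because `Expected_Sales_Volume` is a per-crop annual total repeated on every crop-plot row, summing the column directly over the exported file would count a crop once per plot type. A minimal sketch of a per-crop aggregation that avoids this, assuming a toy frame with the same column names (the values and the names `out`, `per_crop`, and `consistent` are illustrative):

```python
import pandas as pd

# Toy version of the exported table: Wheat appears on two plot types
out = pd.DataFrame({
    'Crop Name': ['Wheat', 'Wheat', 'Corn'],
    'Plot Type': ['Flat Dry Land', 'Terraced Field', 'Flat Dry Land'],
    'Expected_Sales_Volume': [108000.0, 108000.0, 30000.0],
})

# Naive column sum double-counts Wheat
naive_total = out['Expected_Sales_Volume'].sum()

# Keep one row per crop before aggregating
per_crop = (out.drop_duplicates('Crop Name')
               .set_index('Crop Name')['Expected_Sales_Volume'])
true_total = per_crop.sum()

# Sanity check: every crop carries exactly one distinct volume
consistent = bool(
    out.groupby('Crop Name')['Expected_Sales_Volume'].nunique().eq(1).all()
)

print(f"Naive sum: {naive_total:,.0f}  Per-crop sum: {true_total:,.0f}")
```

A downstream model that caps sales per crop would typically index the capacity by crop name, as `per_crop` does here, rather than by crop-plot row.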
