# Machine Learning System for Annual Crop Planning and Yield Prediction

## Overview

This system uses machine learning to optimize crop planning and predict yields for a full year. It addresses the following requirements:

1. Number of crops to be grown
2. Types of crops throughout the year
3. Growing periods
4. Expected output for each period
5. Farmer salaries
6. Profit and loss margins

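The profit and loss figures in requirement 6 are computed in `calculate_results` further down. Written out with the assumptions hard-coded there (a flat cost of 1,000 dollars per acre and a fixed 50,000-dollar salary), for planting $i$ with predicted yield $y_i$, price per unit $p_i$ and acreage $a_i$, per-acre cost $c$ and salary $s$:

$$
\text{profit} = \sum_i y_i\,p_i - \Big(c\sum_i a_i + s\Big), \qquad
\text{margin} = 100 \times \frac{\text{profit}}{\sum_i y_i\,p_i}
$$

The code sets the margin to 0 when revenue is 0. For the sample run at the end of the notebook (revenue 59,227.87 on 3 + 47 = 50 acres):

$$
\text{profit} = 59227.87 - (1000 \times 50 + 50000) = -40772.13, \qquad
\text{margin} = 100 \times \frac{-40772.13}{59227.87} \approx -68.84\%
$$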
## Data Requirements

To train our models, we'll need historical data including:

- Weather patterns (temperature, rainfall, humidity)
- Soil quality metrics
- Previous crop yields
- Market prices for various crops
- Labor costs
- Input costs (seeds, fertilizers, pesticides)
- Crop rotation patterns

## Machine Learning Pipeline

1. **Data Preprocessing**

   - Clean and normalize data
   - Handle missing values
   - Encode categorical variables

2. **Feature Engineering**

   - Create relevant features (e.g., growing degree days, soil moisture index)
   - Perform feature selection to identify most important factors

3. **Model Development**

   a. Crop Selection Model (Multi-label Classification)

   - Algorithm: Random Forest or Gradient Boosting
   - Output: Optimal set of crops for the year

   b. Crop Scheduling Model (Sequence Prediction)

   - Algorithm: Recurrent Neural Network (LSTM)
   - Output: Optimal sequence and timing of crops

   c. Yield Prediction Model (Regression)

   - Algorithm: Ensemble of Gradient Boosting and Neural Networks
   - Output: Expected yield for each crop

   d. Financial Forecasting Model (Time Series Analysis)

   - Algorithm: ARIMA or Prophet
   - Output: Predicted market prices, profit/loss margins

4. **Hyperparameter Tuning**

   - Use techniques like Grid Search or Bayesian Optimization

5. **Model Evaluation**
   - Use metrics like F1-score for classification, RMSE for regression, and MAPE for time series

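Growing degree days, listed under Feature Engineering, accumulate each day's mean temperature above a base temperature. A minimal sketch, assuming daily minimum and maximum temperatures in °C stored in hypothetical columns `t_min` and `t_max`, a base of 10 °C (often used for corn; treat it as an assumption here), and the simple averaging method that floors each day's contribution at zero:

```python
import pandas as pd


def growing_degree_days(daily: pd.DataFrame, t_base: float = 10.0) -> pd.Series:
    """Cumulative growing degree days from daily min/max temperatures.

    Assumes hypothetical columns 't_min' and 't_max' in °C and uses the simple
    averaging method: GDD_day = max(0, (t_max + t_min) / 2 - t_base).
    """
    daily_mean = (daily["t_max"] + daily["t_min"]) / 2
    daily_gdd = (daily_mean - t_base).clip(lower=0)  # cold days add nothing
    return daily_gdd.cumsum()


# Four illustrative days of weather (made-up values).
weather = pd.DataFrame(
    {"t_min": [8.0, 12.0, 5.0, 15.0], "t_max": [18.0, 26.0, 11.0, 29.0]},
    index=pd.date_range("2022-05-01", periods=4, freq="D"),
)
weather["gdd"] = growing_degree_days(weather)
print(weather)
```

The daily means are 13, 19, 8 and 22 °C, so the cumulative column reads 3, 12, 12, 24: the third day is below the base and contributes nothing. The accumulated value at harvest (or at a fixed date) can then be joined to per-season records as one feature.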
## System Output

The system will provide:

1. Number and types of crops to grow (from Crop Selection Model)
2. Planting and harvesting schedule (from Crop Scheduling Model)
3. Growing periods for each crop (from Crop Scheduling Model)
4. Expected yield for each crop and period (from Yield Prediction Model)
5. Recommended farmer salaries based on labor market data and predicted profits
6. Projected profit/loss margins (from Financial Forecasting Model)

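Item 1 above takes the number and types of crops from the Crop Selection Model, described in the pipeline as multi-label classification evaluated with F1. A minimal sketch on synthetic data, assuming a labelling rule invented purely for illustration (the thresholds are hypothetical, not agronomic guidance): scikit-learn's `RandomForestClassifier` fits a 2-D 0/1 label matrix directly, F1 is micro-averaged over all crop labels, and one predicted row is turned into a crop count and list.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

CROPS = ["Wheat", "Corn", "Soybeans", "Potatoes"]
rng = np.random.default_rng(0)

# Synthetic farm-year features: mean temperature, rainfall, soil quality.
n = 400
X = np.column_stack([
    rng.uniform(10, 35, n),
    rng.uniform(0, 200, n),
    rng.uniform(0, 1, n),
])
# Hypothetical labelling rule: one 0/1 column per crop ("suitable this year").
Y = np.column_stack([
    X[:, 0] < 22,   # Wheat: cooler years
    X[:, 0] > 18,   # Corn: warmer years
    X[:, 1] > 80,   # Soybeans: wetter years
    X[:, 2] > 0.4,  # Potatoes: better soil
]).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# RandomForestClassifier accepts a 2-D label matrix and predicts every column.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, Y_train)
Y_pred = clf.predict(X_test)
f1 = f1_score(Y_test, Y_pred, average="micro")
print(f"micro-averaged F1: {f1:.3f}")

# One predicted row becomes the "number and types of crops" output.
season = clf.predict([[25.0, 120.0, 0.7]])[0]
chosen = [crop for crop, flag in zip(CROPS, season) if flag]
print(len(chosen), chosen)
```

Micro averaging pools every crop decision before computing F1; `average="macro"` would instead weight each crop equally, which matters if some crops are rarely suitable.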
## Continuous Improvement

- Implement A/B testing for model updates
- Regularly retrain models with new data
- Use reinforcement learning to optimize long-term strategies

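For "Regularly retrain models with new data", one way to check that retraining on a growing history holds up is a rolling-origin evaluation. A minimal sketch, assuming a synthetic monthly price series and a plain lag-feature `LinearRegression` standing in for the ARIMA or Prophet models named in the pipeline (a swapped-in technique, chosen only to keep the example dependency-light): `TimeSeriesSplit` always trains on earlier months and tests on the months that follow, and each fold is scored with MAPE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(1)
# Synthetic monthly crop price: trend + yearly seasonality + noise (illustrative only).
months = np.arange(72)
price = 5 + 0.02 * months + 0.5 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.1, months.size)

# Features: the previous 12 monthly prices (lag-1 ... lag-12); target: this month.
n_lags = 12
X = np.column_stack([price[n_lags - k : len(price) - k] for k in range(1, n_lags + 1)])
y = price[n_lags:]

# Each split retrains on all earlier months and tests on the next 6,
# mimicking periodic retraining as new data arrives.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=4, test_size=6).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(mean_absolute_percentage_error(y[test_idx], model.predict(X[test_idx])))
    print(f"train size {len(train_idx):3d}  MAPE {scores[-1]:.3%}")
```

`mean_absolute_percentage_error` returns a fraction rather than a percentage, hence the `%` format. A rising MAPE across later folds would be one signal that the model needs retraining or new features.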

In [218]:
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler


In [219]:
class Crop:
    def __init__(self, name, growing_period, yield_per_acre, price_per_unit):
        self.name = name
        self.growing_period = growing_period  # days from planting to harvest
        self.yield_per_acre = yield_per_acre  # units of produce per acre
        self.price_per_unit = price_per_unit  # dollars per unit of produce

In [220]:
class MLCropPlan:
    def __init__(self, start_date, end_date, total_acres):
        self.start_date = start_date
        self.end_date = end_date
        self.total_acres = total_acres
        self.crops = []
        self.plan = []
        self.model = None
        self.scaler = StandardScaler()

    def add_crop(self, crop):
        self.crops.append(crop)

    def generate_synthetic_data(self, num_samples=1000):
        data = []
        for _ in range(num_samples):
            crop = np.random.choice(self.crops)
            temperature = np.random.uniform(10, 35)
            rainfall = np.random.uniform(0, 200)
            soil_quality = np.random.uniform(0, 1)
            acres = np.random.randint(1, self.total_acres + 1)

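            # Yield scales with acreage; the modifier raises or lowers it
            # linearly with temperature (relative to 20), rainfall (relative
            # to 100) and soil quality.
            base_yield = crop.yield_per_acre * acres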
            yield_modifier = (
                1
                + 0.1 * (temperature - 20) / 10
                + 0.2 * (rainfall - 100) / 100
                + 0.3 * soil_quality
            )
            actual_yield = base_yield * yield_modifier

            data.append(
                [crop.name, temperature, rainfall, soil_quality, acres, actual_yield]
            )

        return pd.DataFrame(
            data,
            columns=[
                "crop",
                "temperature",
                "rainfall",
                "soil_quality",
                "acres",
                "yield",
            ],
        )

    def train_model(self):
        data = self.generate_synthetic_data()
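        # Only weather, soil and acreage are used as features; the "crop"
        # column is not, so the model cannot distinguish between crops.
        X = data[["temperature", "rainfall", "soil_quality", "acres"]]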
        y = data["yield"]

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

        X_train_scaled = self.scaler.fit_transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)

        param_grid = {
            "n_estimators": [100, 200],
            "max_depth": [10, 20, 30],
            "min_samples_split": [2, 5, 10],
        }

        grid_search = GridSearchCV(
            RandomForestRegressor(random_state=42), param_grid, cv=5, verbose=2
        )
        grid_search.fit(X_train_scaled, y_train)

        self.model = grid_search.best_estimator_
        print(f"Best model parameters: {grid_search.best_params_}")
        print(f"Model R² score: {self.model.score(X_test_scaled, y_test):.2f}")

    def predict_yield(self, temperature, rainfall, soil_quality, acres):
        if self.model is None:
            raise ValueError("Model not trained. Call train_model() first.")

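        # Use the same column names as in training so the fitted scaler
        # does not warn about missing feature names.
        input_data = pd.DataFrame(
            [[temperature, rainfall, soil_quality, acres]],
            columns=["temperature", "rainfall", "soil_quality", "acres"],
        )
        scaled_input = self.scaler.transform(input_data)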
        predicted_yield = self.model.predict(scaled_input)[0]
        return predicted_yield

    def generate_plan(self):
        self.plan = []  # start fresh so repeated calls don't append duplicates
        current_date = self.start_date
        while current_date < self.end_date:
            crop = np.random.choice(self.crops)
            acres = np.random.randint(1, self.total_acres + 1)
            temperature = np.random.uniform(10, 35)
            rainfall = np.random.uniform(0, 200)
            soil_quality = np.random.uniform(0, 1)

            # Stop once the randomly drawn crop would overrun the season.
            harvest_date = current_date + timedelta(days=crop.growing_period)
            if harvest_date > self.end_date:
                break

            predicted_yield = self.predict_yield(
                temperature, rainfall, soil_quality, acres
            )
            self.plan.append((crop, current_date, harvest_date, acres, predicted_yield))
            current_date = harvest_date

    def calculate_results(self):
        total_revenue = 0
        total_cost = 0
        farmer_salary = 50000  # Assuming a fixed yearly salary for simplicity

        for crop, start, end, acres, predicted_yield in self.plan:
            revenue = predicted_yield * crop.price_per_unit
            cost = acres * 1000  # Assuming a fixed cost per acre for simplicity
            total_revenue += revenue
            total_cost += cost

        total_cost += farmer_salary
        profit = total_revenue - total_cost
        profit_margin = (profit / total_revenue) * 100 if total_revenue > 0 else 0

        return {
            "total_crops": len(self.plan),
            "farmer_salary": farmer_salary,
            "total_revenue": total_revenue,
            "total_cost": total_cost,
            "profit": profit,
            "profit_margin": profit_margin,
        }

    def print_plan(self):
        print(f"ML-Enhanced Crop Plan for {self.start_date.year}")
        print(
            f"{'Crop':<15} {'Start Date':<12} {'End Date':<12} {'Acres':<6} {'Predicted Yield':<15} {'Expected Revenue':<18}"
        )
        print("-" * 85)
        for crop, start, end, acres, predicted_yield in self.plan:
            revenue = predicted_yield * crop.price_per_unit
            print(
                f"{crop.name:<15} {start.strftime('%Y-%m-%d'):<12} {end.strftime('%Y-%m-%d'):<12} {acres:<6} {predicted_yield:<15.2f} ${revenue:<17.2f}"
            )

        results = self.calculate_results()
        print("\nSummary:")
        print(f"Total number of crops grown: {results['total_crops']}")
        print(f"Farmer's salary: ${results['farmer_salary']:.2f}")
        print(f"Total revenue: ${results['total_revenue']:.2f}")
        print(f"Total cost: ${results['total_cost']:.2f}")
        print(f"Profit: ${results['profit']:.2f}")
        print(f"Profit margin: {results['profit_margin']:.2f}%")

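As written, `train_model` above drops the `crop` column, so the regressor sees only temperature, rainfall, soil quality and acres and cannot tell a Wheat field from a Potatoes field. Item 3c of the pipeline also calls for an ensemble of gradient boosting and neural networks rather than a single random forest. A minimal sketch of a crop-aware alternative, assuming scikit-learn's `ColumnTransformer` with `OneHotEncoder` for the crop name, a `VotingRegressor` (which averages its estimators' predictions) over `GradientBoostingRegressor` and `MLPRegressor`, and `TransformedTargetRegressor` to standardize the target. The synthetic data mirrors the formula in `generate_synthetic_data`; hyperparameters are untuned placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Per-acre base yields taken from the Crop definitions in this notebook.
BASE_YIELD = {"Wheat": 50, "Corn": 180, "Soybeans": 40, "Potatoes": 200}
FEATURES = ["crop", "temperature", "rainfall", "soil_quality", "acres"]

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "crop": rng.choice(list(BASE_YIELD), n),
    "temperature": rng.uniform(10, 35, n),
    "rainfall": rng.uniform(0, 200, n),
    "soil_quality": rng.uniform(0, 1, n),
    "acres": rng.integers(1, 101, n),
})
# Same yield formula as generate_synthetic_data, but the crop is kept as a feature.
modifier = (1 + 0.1 * (df["temperature"] - 20) / 10
            + 0.2 * (df["rainfall"] - 100) / 100 + 0.3 * df["soil_quality"])
df["yield"] = df["crop"].map(BASE_YIELD) * df["acres"] * modifier

# One-hot encode the crop name and standardize the numeric columns.
preprocess = ColumnTransformer([
    ("crop", OneHotEncoder(handle_unknown="ignore"), ["crop"]),
    ("num", StandardScaler(), FEATURES[1:]),
])
# VotingRegressor averages the two models' predictions.
ensemble = VotingRegressor([
    ("gbr", GradientBoostingRegressor(random_state=42)),
    ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=42)),
])
# Standardizing the target too helps the MLP train on yields in the thousands.
model = TransformedTargetRegressor(
    regressor=Pipeline([("prep", preprocess), ("ensemble", ensemble)]),
    transformer=StandardScaler(),
)

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df["yield"], test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"R²: {model.score(X_test, y_test):.3f}  RMSE: {rmse:.1f}")

# Same field and weather, two different crops.
field = pd.DataFrame({
    "crop": ["Wheat", "Potatoes"], "temperature": [25.0, 25.0],
    "rainfall": [120.0, 120.0], "soil_quality": [0.7, 0.7], "acres": [47, 47],
})
preds = model.predict(field)
print({c: round(float(p), 1) for c, p in zip(field["crop"], preds)})
```

Because the whole chain is one estimator, it could be dropped into `GridSearchCV` the same way `train_model` tunes the random forest, with parameter names such as `regressor__ensemble__gbr__n_estimators`.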

In [221]:
start_date = datetime(2022, 1, 1)
end_date = datetime(2022, 12, 31)
total_acres = 100

In [222]:
plan = MLCropPlan(start_date, end_date, total_acres)
plan.add_crop(Crop("Wheat", 120, 50, 5))
plan.add_crop(Crop("Corn", 150, 180, 4))
plan.add_crop(Crop("Soybeans", 100, 40, 10))
plan.add_crop(Crop("Potatoes", 110, 200, 8))

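`calculate_results` charges a flat 1,000 dollars per acre (plus the fixed salary), so it helps to compare that with each crop's nominal revenue per acre, `yield_per_acre * price_per_unit`, at a yield modifier of 1. A short check using the four crops defined above:

```python
# (yield_per_acre, price_per_unit) for the crops added to the plan above.
CROP_ECONOMICS = {"Wheat": (50, 5), "Corn": (180, 4), "Soybeans": (40, 10), "Potatoes": (200, 8)}
COST_PER_ACRE = 1000  # flat per-acre cost assumed in calculate_results

margins = {}
for name, (yield_per_acre, price) in CROP_ECONOMICS.items():
    revenue = yield_per_acre * price  # nominal revenue per acre (modifier = 1)
    margins[name] = revenue - COST_PER_ACRE
    print(f"{name:<9} revenue/acre: {revenue:>5}  margin/acre: {margins[name]:>5}")
```

Only Potatoes (1,600 per acre) clears the per-acre cost; Wheat, Corn and Soybeans fall short by 750, 280 and 600. In the sample run below, the two plantings covered 3 + 47 = 50 acres, so per-acre costs were 50,000 against 59,227.87 of revenue, and the fixed 50,000 salary turned that into the reported loss of 40,772.13.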
In [223]:
plan.train_model()
plan.generate_plan()

Fitting 5 folds for each of 18 candidates, totalling 90 fits



In [224]:
plan.print_plan()

ML-Enhanced Crop Plan for 2022
Crop            Start Date   End Date     Acres  Predicted Yield Expected Revenue  
-------------------------------------------------------------------------------------
Potatoes        2022-01-01   2022-04-21   3      389.56          $3116.50          
Potatoes        2022-04-21   2022-08-09   47     7013.92         $56111.38         

Summary:
Total number of crops grown: 2
Farmer's salary: $50000.00
Total revenue: $59227.87
Total cost: $100000.00
Profit: $-40772.13
Profit margin: -68.84%
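
In the sample run, planning stopped after the second planting ended on 2022-08-09. From that date only Corn (150 days, ending 2023-01-06) overruns 2022-12-31; Wheat, Soybeans and Potatoes would all still fit, but `generate_plan` breaks as soon as the randomly drawn crop does not. A minimal sketch of a variant that draws only from crops that still fit, assuming the same sequential, one-crop-at-a-time model. The function `plan_season` is hypothetical, and yield prediction is left out to keep it self-contained.

```python
import random
from datetime import datetime, timedelta

# Growing periods in days, from the Crop definitions in this notebook.
GROWING_DAYS = {"Wheat": 120, "Corn": 150, "Soybeans": 100, "Potatoes": 110}


def plan_season(start, end, growing_days, seed=None):
    """Sequentially schedule crops, drawing only from those that finish by `end`."""
    rng = random.Random(seed)
    plan, current = [], start
    while True:
        fitting = [c for c, d in growing_days.items() if current + timedelta(days=d) <= end]
        if not fitting:  # nothing else fits before the season ends
            return plan
        crop = rng.choice(fitting)
        harvest = current + timedelta(days=growing_days[crop])
        plan.append((crop, current.date(), harvest.date()))
        current = harvest


for crop, sow, harvest in plan_season(datetime(2022, 1, 1), datetime(2022, 12, 31), GROWING_DAYS, seed=7):
    print(f"{crop:<9} {sow} -> {harvest}")
```

The random draw is kept only to match the original; choosing among the fitting crops by predicted profit instead would be a natural next step, and is a design choice the original code leaves open.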
