## **Kenya Regional Crop Yield Prediction Using Machine Learning**

**1. Business Understanding**

**Background**

Agriculture remains one of Kenya’s most critical economic sectors, contributing significantly to GDP, employment, food security, and rural livelihoods. However, crop production in Kenya is highly vulnerable to climate variability, input usage differences, and regional production disparities.

Unpredictable rainfall patterns, temperature fluctuations, and evolving agricultural practices make traditional yield estimation unreliable. Policymakers and agricultural planners often rely on historical trends rather than predictive intelligence, limiting proactive decision-making.

This project applies machine learning techniques to forecast crop yields across major Kenyan regions using historical agricultural production, climate indicators, and pesticide usage data.

## **Problem Statement**

The objective of this project is to develop a supervised machine learning model capable of predicting regional crop yield in Kenya using historical production data, climate variables, and agricultural input usage.

**Objectives**

The project sets out to answer four questions:

1. How accurately can regional crop yields be predicted using historical climate and input variables?

2. Which regions are most sensitive to rainfall and temperature variability?

3. How do pesticide usage and harvested area influence yield outcomes?

4. Can yield prediction models provide early warning signals for potential food shortages?

## **Data Sources**

1. Kenya Agricultural Production Dataset (Kaggle)

    https://www.kaggle.com/datasets/samuelkamau/kenyas-agricultural-production-1960-2022

- File: Kenya Agricultural production.xlsx
- Extracted CSV: kenya_crops_only.csv
- Columns: Year, Item, Item Code (CPC), Area_Harvested_ha, Production_tonnes, Yield_hg_per_ha
- Provides historical crop production and yield data.

2. HarvestStat Africa – Regional Crop Data

    https://github.com/HarvestStat/HarvestStat-Africa.git

- File: adm_crop_production_KE.csv
- Contains regional (admin-level) crop production data for modeling across Kenya’s main regions.

3. Climate Data (OpenAfrica)

    https://open.africa/dataset/1106e169-24ab-404e-bdbf-42a7e2f77c6c/resource/25acacc5-c606-4958-9c8a-1b7d6b12d66d/download/kenya-climate-data-1991-2016-rainfallmm.csv

    https://open.africa/dataset/1106e169-24ab-404e-bdbf-42a7e2f77c6c/resource/6ddd6aa0-fa30-44fa-85a1-ecb1d2bc4e05/download/kenya-climate-data-1991-2016-temp-degresscelcius.csv

- Temperature data (1991–2016)
- Rainfall data (1991–2016)
- Captures environmental effects on crop yield.

4. Pesticide Usage Data (KAPSARC Data Portal)

    https://datasource.kapsarc.org/explore/dataset/environment_pesticides_e_all_data/export/?disjunctive.item&disjunctive.element

- Kenya-specific data from 1999–2022
- Represents agricultural input intensity.

All datasets were aligned between 1999 and 2022 and merged by year and region.
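
The alignment step can be sketched as follows. This is a minimal illustration, not the project's actual merge code: the three frames, the column names (`Region`, `Rainfall_mm`, `Pesticide_t`) and every value are invented placeholders, and the pesticide series is assumed here to be national (one value per year). It shows that an inner join on the keys keeps only the years present in every source, so the merged range is bounded by the shortest series.

```python
import pandas as pd

# Hypothetical toy frames; names and values are placeholders, not project data.
yields = pd.DataFrame({
    "Year": [1998, 1999, 2000, 2001],
    "Region": ["A", "A", "A", "A"],
    "Yield_hg_per_ha": [15000.0, 15500.0, 14800.0, 16000.0],
})
climate = pd.DataFrame({
    "Year": [1999, 2000, 2001, 2002],
    "Region": ["A", "A", "A", "A"],
    "Rainfall_mm": [620.0, 480.0, 710.0, 655.0],
})
pesticides = pd.DataFrame({  # assumed national series: no Region column
    "Year": [1999, 2000],
    "Pesticide_t": [1200.0, 1350.0],
})

# Inner joins keep only the keys present on both sides.
merged = (
    yields
    .merge(climate, on=["Year", "Region"], how="inner")
    .merge(pesticides, on="Year", how="inner")
)

print(merged)
print("Years covered:", merged["Year"].min(), "to", merged["Year"].max())
```

Comparing the year range before and after each join is a quick way to confirm that no source silently shortens the modeling window.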

## **Proposed Solution**

1. Machine Learning Yield Prediction Model

    -  Forecast crop yield for specific regions and crops.

2. Regional Insights

    - Feature importance analysis to understand climate sensitivity and regional yield variability.

3. Future Deployment

    - Streamlit web application that lets users select a region and crop, enter climate and input values, and receive a predicted yield instantly.

In [1]:
# Importing libraries
import openpyxl  # engine pandas uses to read .xlsx files
import pandas as pd



In [None]:
# Reading the Excel file (one-off conversion, kept commented out for reference)

# read_file = pd.read_excel("data/Kenyas_Agricultural_Production.xlsx")

# read_file.to_csv("Crop_Yields.csv", index=None, header=True)  # convert to CSV

# df = pd.read_csv("Crop_Yields.csv")

In [3]:
# Reading the Crop_Yields.csv file
# Forward slashes avoid invalid escape sequences and work on every platform
Crop_Yields = pd.read_csv('data/Crop_Yields.csv')
Crop_Yields.head(20)

Unnamed: 0,Domain Code,Domain,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1976,1976,tonnes,10.0,E,Estimated value
1,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1977,1977,tonnes,10.0,E,Estimated value
2,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1978,1978,tonnes,10.0,E,Estimated value
3,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1979,1979,tonnes,10.0,E,Estimated value
4,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1980,1980,tonnes,10.0,E,Estimated value
5,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1981,1981,tonnes,10.0,E,Estimated value
6,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1982,1982,tonnes,10.0,E,Estimated value
7,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1983,1983,tonnes,20.0,E,Estimated value
8,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1984,1984,tonnes,35.0,E,Estimated value
9,QCL,Crops and livestock products,404,Kenya,5510,Production,1929.07,"Abaca, manila hemp, raw",1985,1985,tonnes,40.0,E,Estimated value


In [4]:
# Specifying the crops needed
crop_list = [
    'Abaca, manila hemp, raw',
    'Anise, badian, coriander, cumin, caraway, fennel and juniper berries, raw',
    'Apples',
    'Apricots',
    'Artichokes',
    'Asparagus',
    'Avocados',
    'Barley',
    'Bananas',
    'Beer of barley, malted',
    'Broad beans and horse beans, green',
    'Cabbages',
    'Cashew nuts, in shell',
    'Castor oil seeds',
    'Carrots and turnips',
    'Cauliflowers and broccoli',
    'Chestnuts, in shell',
    'Chick peas, dry',
    'Chillies and peppers, dry (Capsicum spp., Pimenta spp.), raw',
    'Chillies and peppers, green (Capsicum spp. and Pimenta spp.)',
    'Cloves (whole stems), raw',
    'Coconuts, in shell',
    'Coffee, green',
    'Cotton lint, ginned',
    'Cotton seed',
    'Cottonseed oil',
    'Cow peas, dry',
    'Cucumbers and gherkins',
    'Ginger, raw',
    'Dates',
    'Green garlic',
    'Green tea (not fermented), black tea (fermented) and partly fermented tea',
    'Groundnuts, excluding shelled',
    'Leeks and other alliaceous vegetables',
    'Lemons and limes',
    'Lentils, dry',
    'Lettuce and chicory',
    'Linseed',
    'Mangoes, guavas and mangosteens',
    'Maize (corn)',
    'Millet',
    'Mushrooms and truffles',
    'Nutmeg, mace, cardamoms, raw',
    'Oats',
    'Onions and shallots, dry (excluding dehydrated)',
    'Oranges',
    'Papayas',
    'Peaches and nectarines',
    'Pears',
    'Peas, green',
    'Wheat',
    'Potatoes',
    'Beans, dry',
    'Cassava, fresh',
    'Sorghum',
    'Sweet potatoes',
    'Pepper (Piper spp.), raw',
    'Pigeon peas, dry',
    'Pineapples',
    'Plantains and cooking bananas',
    'Plums and sloes',
    'Pomelos and grapefruits',
    'Pyrethrum, dried flowers',
    'Rice',
    'Seed cotton, unginned',
    'Sesame seed',
    'Soya beans',
    'Spinach',
    'Strawberries',
    'Sugar cane',
    'Sunflower seed',
    'Tangerines, mandarins, clementines',
    'Tea leaves',
    'Tomatoes',
    'Unmanufactured tobacco',
    'Vanilla, raw',
    'Watermelons',
    'Yams'
]

# Filter the dataset for the crops
crops_df = Crop_Yields[Crop_Yields['Item'].isin(crop_list)].copy()

# Pivot the Element column so each row has Production, Area Harvested, and Yield
pivot_df = crops_df.pivot_table(
    index=['Year', 'Item', 'Item Code (CPC)'],
    columns='Element',
    values='Value'
).reset_index()

# Clean column names
pivot_df.columns.name = None
pivot_df.rename(columns={
    'Production': 'Production_tonnes',
    'Area harvested': 'Area_Harvested_ha',
    'Yield': 'Yield_hg_per_ha'
}, inplace=True)
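
Because `isin` matches strings exactly, any entry in `crop_list` whose spelling or spacing differs from the `Item` column is dropped without warning. A check along these lines can surface such entries before filtering. It is only a sketch: the three-row series stands in for `Crop_Yields['Item']`, and `unmatched_names` is a hypothetical helper name introduced here.

```python
import pandas as pd

def unmatched_names(wanted, available):
    """Return the entries of `wanted` that do not appear in `available`."""
    available = set(available)
    return [name for name in wanted if name not in available]

# Stand-in for the Crop_Yields 'Item' column (illustrative rows only).
items = pd.Series(["Abaca, manila hemp, raw", "Apples", "Maize (corn)"])

# The first entry lacks the space after the first comma.
requested = ["Abaca,manila hemp, raw", "Apples", "Maize (corn)"]

missing = unmatched_names(requested, items.unique())
print(missing)
```

In the notebook, the same helper could be called as `unmatched_names(crop_list, Crop_Yields['Item'].unique())`; an empty result means every requested crop was found.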

In [5]:
pivot_df.head()

Unnamed: 0,Year,Item,Item Code (CPC),Area_Harvested_ha,Production_tonnes,Yield_hg_per_ha
0,1961,Apricots,1343,2.0,10.0,50000.0
1,1961,Avocados,1311,1100.0,16000.0,145455.0
2,1961,Bananas,1312,40000.0,400000.0,100000.0
3,1961,Barley,115,12666.0,13513.0,10669.0
4,1961,"Beans, dry",1701,115000.0,55000.0,4783.0
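
The `Yield_hg_per_ha` column is expressed in hectograms per hectare (1 tonne = 10,000 hg), so it should equal production divided by harvested area, multiplied by 10,000. The check below recomputes it for the five rows displayed above. It is a sanity-check sketch rather than part of the original pipeline; `Yield_check` and `HG_PER_TONNE` are names introduced here.

```python
import pandas as pd

HG_PER_TONNE = 10_000  # 1 tonne = 1,000 kg = 10,000 hectograms

# The five rows shown in the pivot_df.head() output
sample = pd.DataFrame({
    "Item": ["Apricots", "Avocados", "Bananas", "Barley", "Beans, dry"],
    "Area_Harvested_ha": [2.0, 1100.0, 40000.0, 12666.0, 115000.0],
    "Production_tonnes": [10.0, 16000.0, 400000.0, 13513.0, 55000.0],
    "Yield_hg_per_ha": [50000.0, 145455.0, 100000.0, 10669.0, 4783.0],
})

# Recompute yield from production and area, rounded to whole hg/ha
sample["Yield_check"] = (
    sample["Production_tonnes"] / sample["Area_Harvested_ha"] * HG_PER_TONNE
).round()

# Allow 1 hg/ha of slack for rounding in the published figures
agrees = (sample["Yield_check"] - sample["Yield_hg_per_ha"]).abs() <= 1

print(sample[["Item", "Yield_hg_per_ha", "Yield_check"]])
print("All rows consistent:", bool(agrees.all()))
```

Running the same comparison over all of `pivot_df` would flag rows where the three elements disagree, for example when one of them is an estimated value.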


In [6]:
pivot_df.to_csv('kenya_crops_only.csv', index=False)