# DigiCow Farmer Training Adoption Challenge

## Aim: 
To predict which farmers will turn training into action.

## Problem Statement:
Access to high-quality agricultural training is only the first step toward improving farm productivity. Understanding which farmers adopt improved practices after training, and why, remains a real challenge.

## Business Understanding: 
DigiCow supports smallholder farmers through digital tools, extension services, and targeted training programmes. However, like many real-world interventions, adoption rates remain low and uneven. The ability to predict adoption early can enable DigiCow and its partners to prioritise follow-ups, tailor support more effectively, and design stronger extension strategies.

## Project Pitch: 
Predict the probability that a farmer will adopt a DigiCow-supported practice within 120 days of their first training, using only information available at the time of training. The trained model must therefore output, for each farmer, a predicted probability of adoption within that 120-day window.
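
To make "output predicted probabilities" concrete, here is a minimal sketch on synthetic data. It assumes scikit-learn is available and uses `LogisticRegression` purely as a stand-in; it is not the model chosen for this project, and `X_toy` and `y_toy` are hypothetical arrays, not DigiCow data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 200 "farmers", 3 numeric features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
# Binary label loosely driven by the first feature plus noise
y_toy = (X_toy[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Any probabilistic classifier would do; logistic regression is just a simple example
model = LogisticRegression()
model.fit(X_toy, y_toy)

# predict_proba returns one column per class; column 1 is P(label == 1)
proba = model.predict_proba(X_toy)[:, 1]
print(proba[:5])
```

The point of the sketch is the last step: the quantity of interest is the class-1 probability for each row, not the hard 0/1 label returned by `predict`.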

In [5]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

import warnings
warnings.filterwarnings('ignore')

## Dataset Overview

In [6]:
# Load datasets
train_df = pd.read_csv('Train.csv')
test_df = pd.read_csv('Test.csv')
sample_sub = pd.read_csv('SampleSubmission.csv')
description = pd.read_csv('dataset_data_dictionary.csv')

In [7]:
description

Unnamed: 0,column_name,description
0,ID,unique identifier for each farmer entry
1,gender,Gender of the farmer
2,age,Age category of the farmer
3,registration,Registration method
4,belong_to_cooperative,Whether the farmer belongs to a cooperative (1...
5,county,County of residence
6,subcounty,Sub-county of residence
7,ward,Ward of residence
8,trainer,Trainer who delivered the first training
9,topics_list,List of possible training topics


- Based on the `description` DataFrame, the target variable is `adopted_within_120_days`.

- Perform a high-level overview of the training and test datasets to investigate their structure and assess their readiness for modelling.

In [11]:
# High-level datasets overview
print(f"\nTrain Dataset Shape: {train_df.shape}")
print(f"Test Dataset Shape: {test_df.shape}")
print(f"\nNumber of Features in Train: {train_df.shape[1]}")
print(f"Number of Features in Test: {test_df.shape[1]}")

# Check for target variable
if 'adopted_within_120_days' in train_df.columns and 'adopted_within_120_days' not in test_df.columns:
    print(f"\nTarget variable 'adopted_within_120_days' found in train set")
    print(f"Target variable absent from test set")
else:
    print("\n Check target variable presence!")


Train Dataset Shape: (11780, 19)
Test Dataset Shape: (5055, 18)

Number of Features in Train: 19
Number of Features in Test: 18

Target variable 'adopted_within_120_days' found in train set
Target variable absent from test set
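
The shape check above could be extended into a slightly fuller readiness summary. The sketch below is one possible approach, not part of the original notebook: `summarise_readiness` is a hypothetical helper, and the toy frames stand in for `train_df` and `test_df` so the block runs on its own.

```python
import pandas as pd

def summarise_readiness(train, test, target):
    """Hypothetical helper: compare columns, missingness and target balance."""
    return {
        # Columns present in one frame but not the other
        "train_only_columns": sorted(set(train.columns) - set(test.columns)),
        "test_only_columns": sorted(set(test.columns) - set(train.columns)),
        # Share of missing values per train column, largest first
        "missing_share": train.isna().mean().sort_values(ascending=False),
        # Proportion of each target class in train
        "target_share": train[target].value_counts(normalize=True),
    }

# Toy stand-ins for train_df / test_df (assumption: not the real data)
toy_train = pd.DataFrame({
    "ID": ["a", "b", "c", "d"],
    "gender": ["F", "M", None, "F"],
    "adopted_within_120_days": [1, 0, 0, 0],
})
toy_test = pd.DataFrame({"ID": ["e", "f"], "gender": ["M", "F"]})

summary = summarise_readiness(toy_train, toy_test, "adopted_within_120_days")
print(summary["train_only_columns"])
print(summary["missing_share"])
print(summary["target_share"])
```

On the real data, calling `summarise_readiness(train_df, test_df, 'adopted_within_120_days')` should report the target as the only train-only column if the one-column difference between the 19 and 18 columns seen above is indeed the target.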
