# Investigating the labels in the `ref_african_crops_kenya_01` dataset

Contents:

1. How many fields per major crop?
2. How many fields have an additional second crop?
3. What is the crop density per crop type?
4. How many crops are planted in each field?

In [14]:
%matplotlib inline
import geojson
from matplotlib import pyplot as plt
from pathlib import Path
from collections import defaultdict
import numpy as np

from typing import Dict

First, we load the GeoJSON labels:

In [2]:
labels_path = Path(
    "../../data/ref_african_crops_kenya_01/ref_african_crops_kenya_01_001/ref_african_crops_kenya_01_001.geojson"
)

In [3]:
with labels_path.open("r") as f:
    labels_geojson = geojson.load(f)
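
For orientation, the loops in this notebook assume the file is a GeoJSON `FeatureCollection` whose features carry `Crop1` to `Crop5` and `Crop Density` in their `properties`. A minimal sketch of that shape, using the standard-library `json` module and made-up values (the `sample` string is hypothetical and not taken from the dataset; storing the density as a string is also an assumption):

```python
import json

# Hypothetical two-field sample mirroring the property names used in this notebook.
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null,
     "properties": {"Crop1": "Maize", "Crop2": "Bean", "Crop Density": "40"}},
    {"type": "Feature", "geometry": null,
     "properties": {"Crop1": "Cassava", "Crop2": "", "Crop Density": "60"}}
  ]
}
"""

sample_geojson = json.loads(sample)

# Each feature is one field; its labels live under "properties".
for feature in sample_geojson["features"]:
    props = feature["properties"]
    print(props["Crop1"], repr(props["Crop2"]), int(props["Crop Density"]))
```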

#### 1. How many fields per major crop?

In [4]:
fields_as_major = defaultdict(int)
for label in labels_geojson["features"]:
    fields_as_major[label["properties"]["Crop1"]] += 1

In [5]:
fields_as_major

defaultdict(int,
            {'Millet': 16,
             'Cassava': 59,
             'Maize': 197,
             'Groundnut': 10,
             'Soybean': 2,
             'Sorghum': 2,
             'Bean': 2,
             'Fallowland': 23,
             'Tomato': 2,
             'Sugarcane': 3,
             'Cabbage': 1,
             'Sweetpotato': 1,
             'Banana': 1})
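
The raw counts are easier to read as shares of all fields. A small sketch, re-entering the counts printed above into a `collections.Counter` so that it runs on its own (the names `major_counts` and `total_fields` are introduced here for illustration):

```python
from collections import Counter

# Counts copied from the output above.
major_counts = Counter({
    "Millet": 16, "Cassava": 59, "Maize": 197, "Groundnut": 10,
    "Soybean": 2, "Sorghum": 2, "Bean": 2, "Fallowland": 23,
    "Tomato": 2, "Sugarcane": 3, "Cabbage": 1, "Sweetpotato": 1, "Banana": 1,
})

total_fields = sum(major_counts.values())

# most_common() orders crops from most to least frequent.
for crop, count in major_counts.most_common():
    print(f"{crop}: {count} fields ({100 * count / total_fields:.1f}%)")
```

In the notebook itself, `Counter(fields_as_major)` should give the same starting point without retyping the counts.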

#### 2. How many fields have an additional second crop?

In [6]:
fields_as_1_or_2 = defaultdict(int)
for label in labels_geojson["features"]:
    crop1 = label["properties"]["Crop1"]
    crop2 = label["properties"]["Crop2"]

    # Join the two crops with "/" only when a second crop is recorded.
    crop_string = f"{crop1}/{crop2}" if crop2 != "" else crop1

    fields_as_1_or_2[crop_string] += 1

In [7]:
fields_as_1_or_2

defaultdict(int,
            {'Millet': 13,
             'Cassava': 51,
             'Maize': 128,
             'Groundnut': 9,
             'Maize/Bean': 15,
             'Soybean/Groundnut': 1,
             'Soybean': 1,
             'Maize/Groundnut': 26,
             'Sorghum/Bean': 1,
             'Bean': 2,
             'Maize/Cowpea': 1,
             'Maize/Cassava': 17,
             'Fallowland': 23,
             'Cassava/Maize': 4,
             'Cassava/Groundnut': 3,
             'Maize/Soybean': 6,
             'Tomato': 2,
             'Millet/Groundnut': 1,
             'Millet/Fallowland': 1,
             'Sugarcane': 3,
             'Millet/Sorghum': 1,
             'Maize/Fallowland': 1,
             'Maize/Millet': 2,
             'Cassava/Fallowland': 1,
             'Sorghum/Soybean': 1,
             'Groundnut/Fallowland': 1,
             'Cabbage': 1,
             'Sweetpotato': 1,
             'Banana': 1,
             'Maize/Sweetpotato': 1})

We can now compare fields with a single crop against fields with more than one:

In [8]:
total_mixed_fields = sum([val for key, val in fields_as_1_or_2.items() if "/" in key])

In [9]:
total_mixed_fields

84

In [10]:
total_unmixed_fields = sum([val for key, val in fields_as_1_or_2.items() if "/" not in key])

In [11]:
total_unmixed_fields

235
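
To put the two totals side by side, a short sketch computing the share of mixed fields from the values shown above (re-entered as literals so the snippet runs on its own; `mixed_share` is a name introduced here):

```python
# Totals copied from the two outputs above.
total_mixed_fields = 84
total_unmixed_fields = 235

total_fields = total_mixed_fields + total_unmixed_fields
mixed_share = total_mixed_fields / total_fields

print(f"{total_mixed_fields} of {total_fields} fields ({mixed_share:.1%}) have a second crop")
```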

#### 3. What is the crop density per crop type?

Each label also records a crop density. We can compare the mean density of each major crop across its fields:

In [12]:
crop_densities_per_crop = defaultdict(list)
for label in labels_geojson["features"]:
    crop_densities_per_crop[label["properties"]["Crop1"]].append(int(label["properties"]["Crop Density"]))

In [13]:
for crop, densities in crop_densities_per_crop.items():
    print(f"{crop}: {round(np.mean(densities), 2)}% ({fields_as_major[crop]} fields)")

Millet: 75.94% (16 fields)
Cassava: 47.71% (59 fields)
Maize: 40.48% (197 fields)
Groundnut: 52.2% (10 fields)
Soybean: 40.0% (2 fields)
Sorghum: 65.0% (2 fields)
Bean: 52.5% (2 fields)
Fallowland: 89.13% (23 fields)
Tomato: 20.0% (2 fields)
Sugarcane: 75.0% (3 fields)
Cabbage: 40.0% (1 fields)
Sweetpotato: 45.0% (1 fields)
Banana: 10.0% (1 fields)
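
A mean can hide a wide spread, especially for crops with only a handful of fields. A sketch of a fuller per-crop summary using the standard-library `statistics` module; `summarise` is a hypothetical helper and the `sample_densities` values are made up for illustration:

```python
from statistics import mean, median
from typing import Dict, List


def summarise(densities_per_crop: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Return mean, median, min and max density for each crop."""
    return {
        crop: {
            "mean": round(mean(values), 2),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for crop, values in densities_per_crop.items()
    }


# Made-up densities, in percent; not taken from the dataset.
sample_densities = {"Maize": [20, 40, 60], "Millet": [70, 90]}

summary = summarise(sample_densities)
for crop, stats in summary.items():
    print(crop, stats)
```

In the notebook, `summarise(crop_densities_per_crop)` would apply the same summary to the real labels.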


#### 4. How many crops are planted in each field?

Up to 5 crops are documented per field. We can count how many fields have each number of crops:

In [19]:
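def find_crops_per_field(properties_dict: Dict) -> int:
    # Crops fill the slots Crop1..Crop5 in order; the first empty slot ends the count.
    for i in range(1, 6):
        crop_string = f"Crop{i}"
        crop = properties_dict[crop_string]
        if crop == "":
            return i - 1
    # All five slots are filled.
    return 5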

In [20]:
crops_per_field = defaultdict(int)

for label in labels_geojson["features"]:

    num_crops = find_crops_per_field(label["properties"])
    crops_per_field[num_crops] += 1

In [24]:
for num_crops, num_fields in crops_per_field.items():
    print(f"{num_fields} fields have {num_crops} crops planted")

235 fields have 1 crops planted
63 fields have 2 crops planted
11 fields have 3 crops planted
10 fields have 4 crops planted
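
As a sanity check on the counting logic, here is a self-contained sketch that counts consecutive filled crop slots, assuming unused slots hold an empty string as in the code above. `count_crops`, `MAX_CROPS` and the sample dictionaries are hypothetical and not part of the dataset:

```python
from typing import Dict

MAX_CROPS = 5  # the labels document Crop1 to Crop5


def count_crops(properties: Dict[str, str]) -> int:
    """Count filled crop slots, stopping at the first empty one."""
    count = 0
    for i in range(1, MAX_CROPS + 1):
        if properties.get(f"Crop{i}", "") == "":
            break
        count += 1
    return count


# Made-up examples covering one crop, three crops and all five slots filled.
one = {"Crop1": "Maize", "Crop2": "", "Crop3": "", "Crop4": "", "Crop5": ""}
three = {"Crop1": "Maize", "Crop2": "Bean", "Crop3": "Cassava", "Crop4": "", "Crop5": ""}
five = {
    f"Crop{i}": crop
    for i, crop in enumerate(["Maize", "Bean", "Cassava", "Millet", "Sorghum"], start=1)
}

print(count_crops(one), count_crops(three), count_crops(five))
```

Checking the all-slots-filled case explicitly is worthwhile because a loop that returns early can easily be off by one when it runs to completion.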
