
## <a name="Wheat Detection">About this Competition</a>

In this competition, you’ll detect wheat heads from outdoor images of wheat plants, including wheat datasets from around the globe. Using worldwide data, you will focus on a generalized solution to estimate the number and size of wheat heads. To better gauge the performance for unseen genotypes, environments, and observational conditions, the training dataset covers multiple regions. You will use more than 3,000 images from Europe (France, UK, Switzerland) and North America (Canada). The test data includes about 1,000 images from Australia, Japan, and China.

Wheat is a staple across the globe, which is why this competition must account for different growing conditions. Models developed for wheat phenotyping need to be able to generalize between environments. If successful, researchers can accurately estimate the density and size of wheat heads in different varieties. With improved detection farmers can better assess their crops, ultimately bringing cereal, toast, and other favorite dishes to your table.

## <a name="Wheat Detection">Introduction: Wheat Detection Dataset - Exploratory Data Analysis</a>

#### <a name="About_Competition"> Introduction </a>
*Supporting wheat growers, so that there is bread on the dinner table.*


#### <a name="Challenges">Challenges in Detecting Wheat Heads</a>

However, accurate wheat head detection in outdoor field images can be visually challenging. Dense wheat plants often overlap, and wind can blur the photographs; both make it difficult to identify single heads. Additionally, appearances vary with maturity, color, genotype, and head orientation. Finally, because wheat is grown worldwide, different varieties, planting densities, patterns, and field conditions must be considered. Models developed for wheat phenotyping need to generalize between different growing environments. Current detection methods involve one- and two-stage detectors (YOLOv3 and Faster R-CNN), but even when trained with a large dataset, a bias towards the training region remains.


#### <a name="objective">Objective</a>
Predict bounding boxes around each wheat head in the images.


#### <a name="dataset_description">Dataset Description</a>: 

The data is images of wheat fields, with bounding boxes for each identified wheat head. Not all images include wheat heads / bounding boxes. The images were recorded in many locations around the world.

The CSV data is simple - the image ID matches up with the filename of a given image, and the width and height of the image are included, along with a bounding box (see below). There is a row in train.csv for each bounding box. Not all images have bounding boxes.
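Because train.csv has one row per bounding box, images with no wheat heads never appear in it. A minimal sketch of recovering per-image box counts that also includes those zero-box images, assuming the full list of image IDs is available separately (for example from the file names in the `train/` folder). The helper `boxes_per_image` and the toy IDs are hypothetical, for illustration only.

```python
import pandas as pd


def boxes_per_image(boxes_df: pd.DataFrame, all_image_ids) -> pd.Series:
    """Count bounding boxes per image, giving 0 to images absent from boxes_df.

    `all_image_ids` is assumed to list every training image, e.g. the file names
    in the train/ folder with the '.jpg' extension stripped.
    """
    counts = boxes_df["image_id"].value_counts()
    # reindex keeps one entry per known image; images with no rows get a count of 0
    return counts.reindex(list(all_image_ids), fill_value=0)


# Toy data with made-up IDs: image "c" has no rows, i.e. no wheat heads
toy_df = pd.DataFrame({"image_id": ["a", "a", "b"]})
counts = boxes_per_image(toy_df, ["a", "b", "c"])
print(dict(zip(counts.index, counts.tolist())))  # {'a': 2, 'b': 1, 'c': 0}
```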

Most of the test set images are hidden. A small subset of test images has been included for your use in writing code.

What am I predicting?
You are attempting to predict bounding boxes around each wheat head in images that have them. If there are no wheat heads, you must predict no bounding boxes.

File details:

1. train.csv - the training data
2. sample_submission.csv - a sample submission file in the correct format
3. train.zip - training images
4. test.zip - test images

Columns in train.csv:

1. image_id - the unique image ID
2. width, height - the width and height of the images
3. bbox - a bounding box, formatted as a Python-style list of [xmin, ymin, width, height]
4. source - the data source the image comes from (used in the source analysis below)
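Since `bbox` is stored as text, it has to be parsed before use. A small sketch of parsing one cell and converting it to corner format `[xmin, ymin, xmax, ymax]`, which is what many detection libraries (for example torchvision's detection models) expect. The bbox value below is illustrative, not taken from the dataset.

```python
import ast


def parse_bbox(bbox_str):
    """Parse a bbox cell such as '[834.0, 222.0, 56.0, 36.0]' into four floats."""
    # ast.literal_eval only evaluates Python literals, so it is safe on CSV text
    x_min, y_min, width, height = ast.literal_eval(bbox_str)
    return [float(x_min), float(y_min), float(width), float(height)]


def to_corners(bbox):
    """Convert [xmin, ymin, width, height] to [xmin, ymin, xmax, ymax]."""
    x_min, y_min, width, height = bbox
    return [x_min, y_min, x_min + width, y_min + height]


box = parse_bbox("[834.0, 222.0, 56.0, 36.0]")  # illustrative value
print(box)              # [834.0, 222.0, 56.0, 36.0]
print(to_corners(box))  # [834.0, 222.0, 890.0, 258.0]
```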


#### <a name="target_variable">Target Variable</a>                                        
* __Submission data__  
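    `image_id` and `PredictionString`, where each predicted box is written as `<confidence> <xmin> <ymin> <width> <height>` and the boxes of one image are joined with spaces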
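A minimal sketch of building one `PredictionString`, assuming the confidence-first, space-separated layout used in sample_submission.csv. The helper name `format_prediction_string` is hypothetical, and rounding coordinates to whole pixels is a formatting choice, not a requirement. An image with no detections gets an empty string, matching the rule that images without wheat heads should have no predicted boxes.

```python
def format_prediction_string(boxes, scores):
    """Join predictions into '<confidence> <xmin> <ymin> <width> <height> ...'.

    `boxes` holds [xmin, ymin, width, height] in pixels and `scores` the matching
    confidences. An image with no predicted wheat heads gets an empty string.
    """
    parts = []
    for (x_min, y_min, width, height), score in zip(boxes, scores):
        # Rounding to whole pixels and 4 decimal places is a formatting choice
        parts.append(f"{score:.4f} {round(x_min)} {round(y_min)} {round(width)} {round(height)}")
    return " ".join(parts)


print(format_prediction_string([[10, 20, 30, 40], [50.4, 60.6, 70, 80]], [0.9, 0.75]))
# 0.9000 10 20 30 40 0.7500 50 61 70 80
print(repr(format_prediction_string([], [])))  # ''
```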




##  <a name="Facts"> Wheat facts </a> 


#### What is a head of wheat?
* Wheat has a single main stem plus typically 2-3 tillers per plant. ... These wrap around the stem at the point where the leaf sheath meets the leaf blade. Spike. The spike (also called the ear or head) forms at the top of the plant.


#### How many grains are there on a wheat stalk?
* Nowadays, thanks to breeding, a stalk of wheat can carry up to 200 grains, whereas most wild wheat has 10-18 grains per stalk. It takes about 150 grams (5 oz) of wheat berries to make one cup of flour.

#### How much wheat can one seed produce?
* On average, there are 22 seeds per head and 5 heads per plant, or 110 seeds per plant. With an average seed size of 15,000 seeds per pound (900,000 seeds per 60-pound bushel), a pound of average-sized seed with 80% germination and emergence has a yield potential of approximately 1.5 bushels.
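The roughly 1.5-bushel estimate above can be reproduced by chaining the quoted numbers; the calculation contains no area term, so it describes the yield from one pound of planted seed. This only re-derives the figures in the text and adds no new data.

```python
# Chain the numbers quoted above (no new data), reading the result as the
# yield from one pound of planted seed.
seeds_per_head = 22
heads_per_plant = 5
seeds_per_pound = 15_000
seeds_per_bushel = 900_000
emergence_pct = 80  # germination and emergence rate, in percent

seeds_per_plant = seeds_per_head * heads_per_plant         # 110
plants_per_pound = seeds_per_pound * emergence_pct // 100  # 12,000 plants
seeds_harvested = plants_per_pound * seeds_per_plant       # 1,320,000 seeds
bushels = seeds_harvested / seeds_per_bushel

print(f"{bushels:.2f} bushels")  # 1.47 bushels
```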


#### How long does wheat take to grow?
* About seven to eight months.
Winter wheat is planted in the fall, usually between October and December, grows over the winter, and is harvested in the spring or early summer. It typically takes seven to eight months to reach maturity and adds a pretty golden contrast to spring gardens.


#### How much does a bag of wheat seed cost?
* Average seed costs per 50-pound bag currently range from 12.50 USD to 12.95 USD. However, the cost depends on the variety (public or private) and on the quantity of seed purchased.


#### What parts of wheat are used?
* What is the wheat kernel? The wheat kernel, or wheat berry, is the grain portion of the wheat plant and the source of flour. It consists of three main parts – the endosperm, bran and germ – which are usually separated for different flours and uses.

## Machine Learning Use Cases in Agriculture

In [None]:
from IPython.display import HTML
HTML('<center><iframe width="700" height="400" src="https://www.youtube.com/embed/NlpS-DhayQA?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe></center>')

<font color="red" size=3>Please upvote this kernel if you like it. It motivates me to produce more quality content  :) </font>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import ast
import os

import cv2
import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.patches as patches
from matplotlib import pyplot as plt  # plotting
import seaborn as sns

import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from tqdm import tqdm

tqdm.pandas()  # enables Series.progress_apply, used when loading images


In [None]:
# Some constants
dataset_path = '/kaggle/input/global-wheat-detection'
dataset_img_train='/kaggle/input/global-wheat-detection/train/'
dataset_img_test='/kaggle/input/global-wheat-detection/test/'

In [None]:
%%time
train_df = pd.read_csv(os.path.join(dataset_path, 'train.csv'))
sample_sub_df = pd.read_csv(os.path.join(dataset_path, 'sample_submission.csv'))

In [None]:
train_df.head()

In [None]:
sample_sub_df.head()

In [None]:
print(f'Shape of training data: {train_df.shape}')
print(f'Shape of given test data: {sample_sub_df.shape}')

In [None]:
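%%time
SAMPLE_LEN = 2000

def load_image(image_id):
    """Read one training image and convert it from OpenCV's BGR order to RGB."""
    image = cv2.imread(dataset_img_train + image_id + ".jpg")
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# train.csv has one row per bounding box, so take unique image IDs first;
# otherwise the same image would be decoded once for every box it contains.
unique_ids = pd.Series(train_df["image_id"].unique()[:SAMPLE_LEN])
train_images = unique_ids.progress_apply(load_image)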

### Plot Sample Image

In [None]:
fig = px.imshow(cv2.resize(train_images[99], (205, 136)))
fig.show()

## Descriptive statistics of Image data

In [None]:
print(f'Total no. of bounding boxes (rows)         : {train_df.shape[0]}')
print(f'Total no. of unique images with boxes      : {train_df["image_id"].nunique()}')
print(f'Checking dimensions - widths and heights   : {train_df["width"].unique()}, {train_df["height"].unique()}')
print(f'Maximum number of wheat heads in one image : {train_df["image_id"].value_counts().max()}')
print(f'Average wheat heads per image              : {len(train_df) / train_df["image_id"].nunique():.2f}')

## Distribution of images by wheat heads

In [None]:
sns.histplot(train_df['image_id'].value_counts(), kde=True)  # distplot is deprecated in recent seaborn
plt.xlabel('# of wheat heads')
plt.ylabel('# of images')
plt.title('# of wheat heads vs. # of images')
plt.show()

## Bounding Boxes per Image

In [None]:
box_count = train_df["image_id"].value_counts()

hist_data = [box_count.values]
group_labels = ['Count'] # name of the dataset

fig = ff.create_distplot(hist_data, group_labels, bin_size=2)
fig.update_layout(title_text="Number of bounding boxes per image", template="simple_white", title_x=0.5)
fig.show()

## Create separate columns for 'x_min', 'y_min', 'width' and 'height' in the training dataset

In [None]:
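# Parse the bbox string into four numeric columns. Note: this overwrites the image-level
# 'width'/'height' columns with the bounding-box width/height.
train_df[['x_min', 'y_min', 'width', 'height']] = pd.DataFrame([ast.literal_eval(x) for x in train_df.bbox.tolist()], index=train_df.index)
train_df = train_df[['image_id', 'bbox', 'source', 'x_min', 'y_min', 'width', 'height']]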
train_df

## Visualize a few samples from the training dataset

In [None]:
# Visualize a few samples from the training dataset
fig, ax = plt.subplots(nrows=2, ncols=4, figsize=(20, 10))
count=1000
for row in ax:
    for col in row:
        img = plt.imread(f'{os.path.join(dataset_path, "train", train_df["image_id"].unique()[count])}.jpg')
        col.grid(False)
        col.set_xticks([])
        col.set_yticks([])
        col.imshow(img)
        count += 1
plt.show()

## Visualize a few training samples with their bounding boxes

In [None]:
## Thanks to the https://www.kaggle.com/kaushal2896/global-wheat-detection-starter-eda kernel; kindly upvote that kernel too

def get_bbox(image_id, df, col, color='white'):
    """Draw every bounding box of `image_id` in `df` onto the matplotlib Axes `col`."""
    bboxes = df[df['image_id'] == image_id]
    
    for i in range(len(bboxes)):
        # Create a Rectangle patch
        rect = patches.Rectangle(
            (bboxes['x_min'].iloc[i], bboxes['y_min'].iloc[i]),
            bboxes['width'].iloc[i], 
            bboxes['height'].iloc[i], 
            linewidth=2, 
            edgecolor=color, 
            facecolor='none')

        # Add the patch to the Axes
        col.add_patch(rect)

In [None]:
# Visualize a few training samples together with their ground-truth boxes
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 20))
count=0
for row in ax:
    for col in row:
        img_id = train_df["image_id"].unique()[count]
        img = plt.imread(f'{os.path.join(dataset_path, "train", img_id)}.jpg')
        col.grid(False)
        col.set_xticks([])
        col.set_yticks([])
        get_bbox(img_id, train_df, col, color='red')
        col.imshow(img)
        count += 1
plt.show()

## Images with Maximum and Minimum Wheat heads

In [None]:
box_counts = train_df['image_id'].value_counts()
image_id = box_counts.idxmax()  # image with the most bounding boxes
print('Maximum wheat heads :', box_counts.max())
img = plt.imread(f'{os.path.join(dataset_path, "train", image_id)}.jpg')
fig, ax = plt.subplots(1, figsize=(12, 12))
ax.axis('off')  # hides the ticks, labels and grid
get_bbox(image_id, train_df, ax, color='orange')
ax.imshow(img)
plt.show()
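The heading also mentions the minimum. A small sketch for finding the image with the fewest annotated wheat heads; the helper `fewest_boxes` is hypothetical. Images with no wheat heads at all are absent from train.csv, so this is the minimum over images with at least one box.

```python
import pandas as pd


def fewest_boxes(boxes_df: pd.DataFrame):
    """Return (image_id, box count) for the image with the fewest boxes.

    Only images that appear in boxes_df are considered, so images with no
    wheat heads at all (absent from train.csv) are not candidates.
    """
    counts = boxes_df["image_id"].value_counts()
    return counts.idxmin(), int(counts.min())


# Toy data with made-up IDs
toy_df = pd.DataFrame({"image_id": ["a", "a", "a", "b", "c", "c"]})
print(fewest_boxes(toy_df))  # ('b', 1)
```

In the notebook, `fewest_boxes(train_df)` returns an ID that can be passed to the same `plt.imread` / `get_bbox` plotting code used for the maximum.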

### Wheat Source - Share in the dataset

In [None]:
# value_counts() counts rows of train.csv, i.e. bounding boxes per source (not images)
source = train_df['source'].value_counts()
print(source)
wheat_src_df = train_df.groupby(['source']).agg({'image_id': 'count'}).reset_index()
wheat_src_df.rename(columns={'image_id': 'count'}, inplace=True)
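Because the counts above are per bounding box, a source with dense images can look larger than one with more images. A minimal sketch that reports boxes, distinct images and boxes per image side by side; `source_summary` and the toy data are hypothetical.

```python
import pandas as pd


def source_summary(boxes_df: pd.DataFrame) -> pd.DataFrame:
    """Per source: bounding boxes (rows), distinct images, and boxes per image."""
    summary = boxes_df.groupby("source").agg(
        boxes=("image_id", "count"),
        images=("image_id", "nunique"),
    )
    summary["boxes_per_image"] = summary["boxes"] / summary["images"]
    return summary.reset_index()


# Toy data with made-up sources and IDs
toy_df = pd.DataFrame({"source": ["x", "x", "x", "y"], "image_id": ["a", "a", "b", "c"]})
print(source_summary(toy_df))
```

In the notebook, `source_summary(train_df)` gives the image-level share alongside the box-level one.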

In [None]:
wheat_src_df

### Wheat - Source Share

In [None]:
fig = go.Figure(data=[
    go.Pie(labels=source.index, values=source.values)
])

fig.update_layout(title='Source distribution')
fig.show()

In [None]:
fig = go.Figure(go.Bar(x=train_df['source'].value_counts().index, 
                       y=train_df['source'].value_counts(),
                       marker_color='lightsalmon'))
fig.update_layout(title_text="Bar chart of sources", title_x=0.5)
fig.show()

### Average Bbox count per image, by source

In [None]:
bbox_count = train_df.groupby("source")["image_id"].apply(lambda X: X.value_counts().mean()).reset_index().rename(columns={"image_id": "bbox_count"})

fig = go.Figure(go.Bar(x=bbox_count.source, 
                       y=bbox_count.bbox_count,
                       name='Bbox counts', marker_color='indianred'))
fig.update_layout(title_text="Mean number of bboxes per image, by source", template="simple_white", title_x=0.5)
fig.show()

## Channel Distributions

In [None]:
red_values = [np.mean(train_images[idx][:, :, 0]) for idx in range(len(train_images))]
green_values = [np.mean(train_images[idx][:, :, 1]) for idx in range(len(train_images))]
blue_values = [np.mean(train_images[idx][:, :, 2]) for idx in range(len(train_images))]
values = [np.mean(train_images[idx]) for idx in range(len(train_images))]
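The four list comprehensions above scan every image four times. A sketch of computing all three channel means in one pass, assuming each image is an H x W x 3 RGB array like those returned by `load_image`; `channel_means` is a hypothetical helper. Because every channel has the same number of pixels, an image's overall mean equals the average of its three channel means.

```python
import numpy as np


def channel_means(images):
    """Return an (N, 3) array of per-image mean R, G, B values.

    Each image is assumed to be an H x W x 3 RGB array, like those from load_image.
    """
    return np.stack([img.reshape(-1, 3).mean(axis=0) for img in images])


# Toy check: two 2 x 2 images, each filled with a single colour
toy_images = [np.full((2, 2, 3), [10, 20, 30], dtype=np.uint8),
              np.full((2, 2, 3), [0, 100, 200], dtype=np.uint8)]
means = channel_means(toy_images)
red, green, blue = means[:, 0], means[:, 1], means[:, 2]
overall = means.mean(axis=1)  # equals np.mean(img) because every channel has H*W pixels
```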

In [None]:
fig = ff.create_distplot([values], group_labels=["Channels"], colors=["purple"])
fig.update_layout(showlegend=False, template="simple_white")
fig.update_layout(title_text="Distribution of channel values")
fig.data[0].marker.line.color = 'rgb(0, 0, 0)'
fig.data[0].marker.line.width = 0.5
fig

### Red Channel Values

In [None]:
fig = ff.create_distplot([red_values], group_labels=["R"], colors=["red"])
fig.update_layout(showlegend=False, template="simple_white")
fig.update_layout(title_text="Distribution of red channel values")
fig.data[0].marker.line.color = 'rgb(0, 0, 0)'
fig.data[0].marker.line.width = 0.5
fig

### Blue Channel Values

In [None]:
fig = ff.create_distplot([blue_values], group_labels=["B"], colors=["blue"])
fig.update_layout(showlegend=False, template="simple_white")
fig.update_layout(title_text="Distribution of blue channel values")
fig.data[0].marker.line.color = 'rgb(0, 0, 0)'
fig.data[0].marker.line.width = 0.5
fig

### Green Channel Values

In [None]:
fig = ff.create_distplot([green_values], group_labels=["G"], colors=["green"])
fig.update_layout(showlegend=False, template="simple_white")
fig.update_layout(title_text="Distribution of green channel values")
fig.data[0].marker.line.color = 'rgb(0, 0, 0)'
fig.data[0].marker.line.width = 0.5
fig

### All the Channels Together

In [None]:
fig = go.Figure()

# One box plot of per-image mean values for each colour channel
for color, channel_values in zip(["Red", "Green", "Blue"], [red_values, green_values, blue_values]):
    fig.add_trace(go.Box(x=[color] * len(channel_values), y=channel_values,
                         name=color, marker=dict(color=color.lower())))

fig.update_layout(yaxis_title="Mean value", xaxis_title="Color channel",
                  title="Mean value vs. Color channel", template="plotly_white")

In [None]:
fig = ff.create_distplot([red_values, green_values, blue_values],
                         group_labels=["R", "G", "B"],
                         colors=["red", "green", "blue"])
fig.update_layout(title_text="Distribution of Red, Green, Blue channel values", template="simple_white")
# Outline the histogram bars of all three channels (the first three traces)
for trace in fig.data[:3]:
    trace.marker.line.color = 'rgb(0, 0, 0)'
    trace.marker.line.width = 0.5
fig

## Let's Categorise the Images Based on Wheat Heads

In [None]:
df_img_wht_heads = train_df.groupby(['image_id']).agg({'source':'count'}).reset_index().rename(columns={'source':'wheat_head_cnt'})

In [None]:
df_img_wht_heads.head()

#### Based on the five-number summary, let's categorise the data

In [None]:
df_img_wht_heads['wheat_head_cnt'].describe(include="all")

In [None]:
sns.boxplot(df_img_wht_heads['wheat_head_cnt'])

The box plot shows at least one outlier in the per-image wheat head count, i.e. an image with unusually many wheat heads compared with the rest.
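The points drawn past the whiskers follow Tukey's rule: anything below Q1 - 1.5·IQR or above Q3 + 1.5·IQR, which is the default whisker length in seaborn's box plot. A minimal sketch for listing those values; `iqr_outliers` and the toy counts are hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values beyond Tukey's fences Q1 - k*IQR and Q3 + k*IQR.

    With k = 1.5 these are the points a default box plot draws past the whiskers.
    """
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - k * iqr) | (values > q3 + k * iqr)]


toy_counts = pd.Series([10, 12, 13, 14, 15, 16, 18, 100])  # made-up counts
print(iqr_outliers(toy_counts).tolist())  # [100]
```

In the notebook this could be called as `iqr_outliers(df_img_wht_heads['wheat_head_cnt'])`.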

In [None]:
def catagory(col):
    # The cut-offs 28, 43 and 59 are meant to match the quartiles reported by describe() above
    if col <= 28:
        ctg = "Less_Wheat_heads"
    elif col <= 43:
        ctg = "Medium_Wheat_heads"
    elif col <= 59:
        ctg = "High_Wheat_heads"
    else:
        ctg = "Extra_High_Wheat_heads"
    return ctg

In [None]:
def binary_catagory(col):
    if col >= 0 and col <= 43 :
        ctg=0
    else:
        ctg=1
    return ctg

In [None]:
df_img_wht_heads['Wheat_head_catagory']=df_img_wht_heads['wheat_head_cnt'].apply(catagory)
df_img_wht_heads['Wheat_heads_ctg_High_Low']=df_img_wht_heads['wheat_head_cnt'].apply(binary_catagory)
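The same binning can be written with `pd.cut`, which is vectorised and makes the bin edges explicit. A sketch, assuming the same cut-offs as `catagory()` and `binary_catagory()`; `categorise_counts` is a hypothetical name. Right-closed bins `[0, 28]`, `(28, 43]`, `(43, 59]`, `(59, inf)` reproduce the if/elif chain.

```python
import numpy as np
import pandas as pd

LABELS = ["Less_Wheat_heads", "Medium_Wheat_heads",
          "High_Wheat_heads", "Extra_High_Wheat_heads"]


def categorise_counts(counts: pd.Series) -> pd.Series:
    # include_lowest=True makes the first bin closed on the left, so 0 is included
    return pd.cut(counts, bins=[0, 28, 43, 59, np.inf],
                  labels=LABELS, include_lowest=True)


sample = pd.Series([0, 28, 29, 43, 44, 59, 60, 116])
print(categorise_counts(sample).tolist())
binary = (sample > 43).astype(int)  # same rule as binary_catagory()
```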

In [None]:
fig = go.Figure(go.Bar(x=df_img_wht_heads['Wheat_head_catagory'].value_counts().index, 
                       y=df_img_wht_heads['Wheat_head_catagory'].value_counts(),
                       marker_color='lightsalmon'))
fig.update_layout(title_text="Bar chart of Wheat Head Category", title_x=0.5)
fig.show()

## Quartile-Based Cut-offs Give Roughly Balanced Categories

In [None]:
df_img_wht_heads['Wheat_head_catagory'].unique()

In [None]:
df_img_wht_heads.head()

In [None]:
# One-hot encode the category column; dtype=int keeps 0/1 integers (newer pandas defaults to bool)
df_img_wht_heads = pd.get_dummies(df_img_wht_heads, columns=["Wheat_head_catagory"], prefix="", dtype=int)

In [None]:
df_img_wht_heads.columns

In [None]:
fig = px.parallel_categories(df_img_wht_heads[['_Extra_High_Wheat_heads', '_High_Wheat_heads', '_Less_Wheat_heads','_Medium_Wheat_heads']], \
                             color="_Less_Wheat_heads", color_continuous_scale="sunset",\
                             title="Parallel categories plot of targets")
fig

### Observation:
#### *In the plot above we can see the relationship between the four categories. As expected, they are mutually exclusive: an image with `_Less_Wheat_heads == 1` always has 0 in the Medium, High and Extra High columns.*
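The mutual-exclusivity observation can also be checked directly: every row of a one-hot encoding has exactly one 1. A small sketch; `is_one_hot` is a hypothetical helper, and the toy frame only mimics the column names produced by `get_dummies(..., prefix="")`.

```python
import pandas as pd


def is_one_hot(df: pd.DataFrame, columns) -> bool:
    """True if every row has exactly one 1 across the given indicator columns."""
    return bool((df[columns].astype(int).sum(axis=1) == 1).all())


dummy_cols = ['_Extra_High_Wheat_heads', '_High_Wheat_heads',
              '_Less_Wheat_heads', '_Medium_Wheat_heads']
toy = pd.get_dummies(
    pd.DataFrame({"Wheat_head_catagory": ["Less_Wheat_heads", "High_Wheat_heads",
                                          "Extra_High_Wheat_heads", "Medium_Wheat_heads"]}),
    columns=["Wheat_head_catagory"], prefix="")
print(is_one_hot(toy, dummy_cols))  # True
```

In the notebook, `is_one_hot(df_img_wht_heads, dummy_cols)` performs the same check on the real data.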

## Understanding Evaluation Metrics


This competition is evaluated on the **mean average precision** at different intersection over union (IoU) thresholds.

**mAP (mean average precision)** is the mean of the average precision (AP) values. In some contexts AP is computed for each class and then averaged over classes; in others, the two terms mean the same thing. For example, in the COCO context there is no difference between AP and mAP.
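Written out, the competition score, as we read the evaluation page linked below, is as follows. A prediction is a true positive at threshold $t$ when its IoU with a not-yet-matched ground-truth box exceeds $t$. The per-image value is averaged over the thresholds, and the final score is the mean over all test images. An image with no ground truth but some predictions has $TP = 0$ and $FP > 0$, so it scores zero, consistent with the note below.

$$
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

$$
\text{score}(\text{image}) = \frac{1}{|T|} \sum_{t \in T} \frac{TP(t)}{TP(t) + FP(t) + FN(t)},
\qquad T = \{0.50, 0.55, \dots, 0.75\}
$$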


![](https://i.stack.imgur.com/JlHnn.jpg)

> Important note: if there are no ground truth objects at all for a given image, ANY number of predictions (false positives) will result in the image receiving a score of zero, and being included in the mean average precision.
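A minimal sketch of the per-image score described above, for local validation. It is not the official scorer: the greedy highest-confidence-first matching and the `None` returned for an image with neither ground truth nor predictions (a 0/0 case) are assumptions of this sketch, and the helper names are hypothetical. Boxes use the dataset's `[xmin, ymin, width, height]` format.

```python
import numpy as np

THRESHOLDS = (0.5, 0.55, 0.6, 0.65, 0.7, 0.75)


def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def image_score(gt_boxes, pred_boxes, scores, thresholds=THRESHOLDS):
    """Mean over IoU thresholds of TP / (TP + FP + FN) for a single image.

    Returns None when the image has neither ground truth nor predictions,
    since the ratio is then 0 / 0.
    """
    if len(gt_boxes) == 0 and len(pred_boxes) == 0:
        return None
    # Greedy matching, highest confidence first (an assumption of this sketch)
    order = np.argsort(scores)[::-1]
    precisions = []
    for t in thresholds:
        matched_gt = set()
        tp = 0
        for i in order:
            best_j, best_iou = None, t
            for j, gt in enumerate(gt_boxes):
                if j in matched_gt:
                    continue
                overlap = iou(pred_boxes[i], gt)
                if overlap > best_iou:  # a hit needs IoU strictly above t
                    best_j, best_iou = j, overlap
            if best_j is not None:
                matched_gt.add(best_j)
                tp += 1
        fp = len(pred_boxes) - tp
        fn = len(gt_boxes) - tp
        precisions.append(tp / (tp + fp + fn))
    return float(np.mean(precisions))


gt = [[0, 0, 10, 10], [20, 20, 10, 10]]
preds = [[0, 0, 10, 10], [21, 21, 10, 10]]  # second box has IoU 81/119, about 0.68
print(image_score(gt, preds, [0.9, 0.8]))  # about 0.778 (= 7/9)
```

The dataset-level score would then be the mean of `image_score` over all images.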




For more on mAP, see the following links:
* https://www.kaggle.com/c/global-wheat-detection/overview/evaluation
* https://kharshit.github.io/blog/2019/09/20/evaluation-metrics-for-object-detection-and-segmentation
* https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52
* https://datascience.stackexchange.com/questions/25119/how-to-calculate-map-for-detection-task-for-the-pascal-voc-challenge
* https://www.kaggle.com/rohitsingh9990/eda-visualization-simple-baseline - thanks to Rohit Singh for his kernel

## Conclusion

* The whole dataset is under 1 GB, so models can be trained and iterated on relatively quickly.
* It is an interesting dataset to work with, and a good wheat head detector can support better crop management.

#### All in all, it should be a good competition indeed


<font color="red" size=3>Please upvote this kernel if you like it. It motivates me to create kernels with great content :) </font>