![Wheat](http://cdn-a.william-reed.com/var/wrbm_gb_food_pharma/storage/images/1/9/4/3/2903491-1-eng-GB/Global-wheat-production-to-fall-in-2016-season-reports-FAO_wrbm_large.jpg)

"About this Competition"
"Supporting the effort to put bread on the dinner table"

> In this competition, you’ll detect wheat heads from outdoor images of wheat plants, including wheat datasets from around the globe. Using worldwide data, you will focus on a generalized solution to estimate the number and size of wheat heads. To better gauge the performance for unseen genotypes, environments, and observational conditions, the training dataset covers multiple regions. You will use more than 3,000 images from Europe (France, UK, Switzerland) and North America (Canada). The test data includes about 1,000 images from Australia, Japan, and China.

Wheat is a staple across the globe, which is why this competition must account for different growing conditions. Models developed for wheat phenotyping need to be able to generalize between environments. If successful, researchers can accurately estimate the density and size of wheat heads in different varieties. With improved detection farmers can better assess their crops, ultimately bringing cereal, toast, and other favorite dishes to your table.



**Urbanization, rising incomes and working women are driving a rapid rise in global wheat consumption. Models predict that by 2050 consumers will require 60 percent more wheat than today. Challenges are big: this demand must be met without opening new land and with better use of fertilizer, water, and labor.**

![Wheat](https://wheat.org/wp-content/uploads/sites/4/2014/10/LoadingOven-08-890x1024.jpg)

In [None]:
# Basic imports
import hashlib
import os
import re

import cv2
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# Plotting
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots

# Image helpers; note that IPython's Image shadows PIL's Image because it is imported last
from PIL import Image
from IPython.display import Image, display

In [None]:
DIR = "../input/global-wheat-detection/"
TRAIN = "train.csv"

TRAIN_IMG = "train"
TEST_IMG = "test"
WIDTH = 1024
HEIGHT = 1024

# Full paths of every image file in the train and test folders
TRAIN_IMAGES = [os.path.join(DIR, TRAIN_IMG, fname) for fname in os.listdir(os.path.join(DIR, TRAIN_IMG))]
TEST_IMAGES = [os.path.join(DIR, TEST_IMG, fname) for fname in os.listdir(os.path.join(DIR, TEST_IMG))]

train_df = pd.read_csv(os.path.join(DIR, TRAIN))


In [None]:
train_df.info()

In [None]:
train_df.head()

In [None]:
print("unique ids : ", len(train_df.image_id.unique()))
print("unique width : ", len(train_df.width.unique()))
print("unique height : ", len(train_df.height.unique()))
print("unique source : ", len(train_df.source.unique()))
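
The four counts above can also be read off in a single call. A minimal sketch on a toy frame (the rows are made up for illustration and are not taken from `train.csv`), using pandas' `DataFrame.nunique`:

```python
import pandas as pd

# Toy stand-in for train_df; values are illustrative only.
toy_df = pd.DataFrame({
    "image_id": ["a", "a", "b", "c"],
    "width": [1024, 1024, 1024, 1024],
    "height": [1024, 1024, 1024, 1024],
    "source": ["src_1", "src_1", "src_2", "src_2"],
})

# One unique-value count per column, returned as a Series.
unique_counts = toy_df[["image_id", "width", "height", "source"]].nunique()
print(unique_counts)
```

On the real data the same call would be `train_df[["image_id", "width", "height", "source"]].nunique()`.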

In [None]:
train_df.source.value_counts()
# 7 unique sources of wheat head images

> **Total training and test images**

In [None]:
print(f"Total training images: {len(TRAIN_IMAGES)}")
print(f"Total test images: {len(TEST_IMAGES)}")


**Wait, only ten test images?**
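
The number of files on disk and the number of unique ids in the CSV need not agree: an image with no wheat heads would have no rows in the annotations. A small sketch for checking this, assuming file names follow the `<image_id>.jpg` pattern used elsewhere in this notebook; the helper name `ids_without_boxes` and the example values are hypothetical:

```python
import os

def ids_without_boxes(image_paths, annotated_ids):
    """Return image ids that exist on disk but have no row in the annotations."""
    on_disk = {os.path.splitext(os.path.basename(p))[0] for p in image_paths}
    return sorted(on_disk - set(annotated_ids))

# Illustrative paths and ids only.
example_paths = ["train/a.jpg", "train/b.jpg", "train/c.jpg"]
example_ids = ["a", "a", "c"]
missing = ids_without_boxes(example_paths, example_ids)
print(missing)
```

With the notebook's variables, the call would be `ids_without_boxes(TRAIN_IMAGES, train_df.image_id)`.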

> **EDA**

In [None]:
# Mean number of bounding boxes per image, computed separately for each source
bbox_wrt_source = train_df.groupby(["source"]).apply(lambda x: x["image_id"].value_counts().mean())

In [None]:
bbox_wrt_source
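
The same per-source average can be written as two explicit `groupby` steps, which may be easier to follow than the `apply` with a lambda. A sketch on toy rows (illustrative ids and source names, not real data):

```python
import pandas as pd

# One row per bounding box, as in train.csv; values are illustrative only.
toy_df = pd.DataFrame({
    "image_id": ["a", "a", "a", "b", "c"],
    "source":   ["s1", "s1", "s1", "s1", "s2"],
})

# Step 1: number of boxes (rows) for each image within each source.
boxes_per_image = toy_df.groupby(["source", "image_id"]).size()

# Step 2: average those per-image counts within each source.
mean_boxes_per_source = boxes_per_image.groupby(level="source").mean()
print(mean_boxes_per_source)
```

Here image `a` has three boxes and `b` has one, so source `s1` averages two boxes per image.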

In [None]:
source=train_df['source'].value_counts()
fig = go.Figure(data=[
    go.Pie(labels=source.index, values=source.values)
])

fig.update_layout(title='Source distribution for data')
fig.show()

**Distribution of the number of bounding boxes per image:**

In [None]:
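plt.figure(figsize=(12, 8))
# distplot is deprecated in seaborn >= 0.11; histplot with kde=True is its replacement
sns.histplot(train_df['image_id'].value_counts().values, kde=True, stat="density")
plt.show()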
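
A histogram is easier to read alongside a few summary numbers. A minimal sketch, again on toy ids rather than the real data, that counts boxes per image with `value_counts` and summarizes the counts with `describe`:

```python
import pandas as pd

# Toy image ids, one entry per bounding box (illustrative only).
toy_ids = pd.Series(["a", "a", "a", "b", "c", "c"], name="image_id")

# Boxes per image, then min / median / mean / max of those counts.
boxes_per_image = toy_ids.value_counts()
summary = boxes_per_image.describe()
print(summary[["min", "50%", "mean", "max"]])
```

On the real data, `train_df["image_id"].value_counts().describe()` gives the same summary.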

In [None]:
plt.figure(figsize=(12, 8))
bbox_wrt_source.plot(kind='bar')
plt.show()

> Area per image

In [None]:
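# Parse the bbox strings here: at this point train_df's width/height columns hold the image size, not the box size
import ast
boxes = np.array([ast.literal_eval(b) for b in train_df["bbox"]])  # columns: x_min, y_min, box width, box height
box_area = pd.Series(boxes[:, 2] * boxes[:, 3], index=train_df.index)
# Summed box area as a percentage of the image area (overlapping boxes are counted more than once)
area_per_image = 100 * box_area.groupby(train_df["image_id"]).sum() / (WIDTH * HEIGHT)
plt.figure(figsize=(10, 6))
plt.title("Area % for each image.")
print(f"Min area per image: {area_per_image.min():.2f}%")
print(f"Max area per image: {area_per_image.max():.2f}%")
print(f"Mean area per image: {area_per_image.mean():.2f}%")
print(f"Std area per image: {area_per_image.std():.2f}%")
sns.histplot(area_per_image, kde=True, stat="density")
plt.show()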

> The distribution of % area looks approximately normal (Gaussian)
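
Summing box areas counts overlapping regions more than once, so it can overstate how much of an image is covered. A sketch of an alternative (not the method used above) that paints the boxes onto a boolean mask and measures the covered fraction; the helper name `covered_percent` and the example boxes are hypothetical:

```python
import numpy as np

def covered_percent(boxes, width=1024, height=1024):
    """Percent of the image covered by at least one [x_min, y_min, w, h] box."""
    mask = np.zeros((height, width), dtype=bool)
    for x_min, y_min, w, h in boxes:
        x0, y0 = int(round(x_min)), int(round(y_min))
        x1, y1 = int(round(x_min + w)), int(round(y_min + h))
        # Clip to the image so boxes touching the border stay in range.
        mask[max(y0, 0):min(y1, height), max(x0, 0):min(x1, width)] = True
    return 100.0 * mask.mean()

# Two 100x100 boxes that overlap in a 50x100 strip (illustrative values).
example_boxes = [[0, 0, 100, 100], [50, 0, 100, 100]]
summed = 100.0 * sum(w * h for _, _, w, h in example_boxes) / (1024 * 1024)
union = covered_percent(example_boxes)
print(f"summed: {summed:.3f}%  union: {union:.3f}%")
```

The mask costs one 1024x1024 boolean array per image, which is cheap for a few thousand images.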

In [None]:
# Visualize a few samples from the training dataset
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 10))
count=0
for row in ax:
    for col in row:
        img = plt.imread(f'{os.path.join(DIR, "train", train_df["image_id"].unique()[count])}.jpg')
        col.grid(False)
        col.set_xticks([])
        col.set_yticks([])
        col.imshow(img)
        count += 1
plt.show()

> Extracting the bounding box dimensions into the data frame

In [None]:
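import ast
# Each bbox is a string "[x_min, y_min, width, height]"; parsing it overwrites the
# image-size width/height columns with the box width and height
train_df[['x_min', 'y_min', 'width', 'height']] = pd.DataFrame([ast.literal_eval(x) for x in train_df.bbox.tolist()], index=train_df.index)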
train_df = train_df[['image_id', 'bbox', 'source', 'x_min', 'y_min', 'width', 'height']]
train_df
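
Some detection tools expect corner coordinates (`x_min, y_min, x_max, y_max`) rather than a corner plus width and height. A minimal sketch of the parsing step followed by that conversion, on a toy frame; the `x_max` and `y_max` column names are an assumption for illustration and the box values are made up:

```python
import ast
import pandas as pd

# Toy stand-in for train_df with bbox stored as a string, as in train.csv.
toy_df = pd.DataFrame({
    "image_id": ["a", "b"],
    "bbox": ["[834.0, 222.0, 56.0, 36.0]", "[10.0, 20.0, 30.0, 40.0]"],
})

# Parse each string into four numeric columns.
parsed = pd.DataFrame(
    [ast.literal_eval(b) for b in toy_df["bbox"]],
    columns=["x_min", "y_min", "width", "height"],
    index=toy_df.index,
)
toy_df = toy_df.join(parsed)

# Derive the opposite corner from the top-left corner and the box size.
toy_df["x_max"] = toy_df["x_min"] + toy_df["width"]
toy_df["y_max"] = toy_df["y_min"] + toy_df["height"]
print(toy_df[["x_min", "y_min", "x_max", "y_max"]])
```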

*Let's **visualize** the bounding boxes for the corresponding wheat head images*

In [None]:
def get_bbox(image_id, df, col, color='white'):
    """Draw every bounding box of `image_id` from `df` onto the matplotlib Axes `col`."""
    bboxes = df[df['image_id'] == image_id]
    
    for i in range(len(bboxes)):
        # Create a Rectangle patch
        rect = patches.Rectangle(
            (bboxes['x_min'].iloc[i], bboxes['y_min'].iloc[i]),
            bboxes['width'].iloc[i], 
            bboxes['height'].iloc[i], 
            linewidth=2, 
            edgecolor=color, 
            facecolor='none')

        # Add the patch to the Axes
        col.add_patch(rect)
    

In [None]:
# Visualize a few training samples with their bounding boxes
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 20))
count=0
for row in ax:
    for col in row:
        img_id = train_df["image_id"].unique()[count]
        img = plt.imread(f'{os.path.join(DIR, "train", img_id)}.jpg')
        col.grid(False)
        col.set_xticks([])
        col.set_yticks([])
        get_bbox(img_id, train_df, col, color='red')
        col.imshow(img)
        count += 1
plt.show()
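
Before trusting the overlay on real photos, it can help to check the drawing logic on a blank image with known boxes. A sketch under stated assumptions: the helper `draw_boxes`, the synthetic image, and the box values are all hypothetical, and the `Agg` backend line is only needed when running outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np

def draw_boxes(ax, boxes, color="red"):
    """Add one unfilled rectangle per [x_min, y_min, w, h] box to the Axes."""
    for x_min, y_min, w, h in boxes:
        rect = patches.Rectangle((x_min, y_min), w, h,
                                 linewidth=2, edgecolor=color, facecolor="none")
        ax.add_patch(rect)

# Black 1024x1024 image and two made-up boxes.
fake_img = np.zeros((1024, 1024, 3), dtype=np.uint8)
example_boxes = [[100, 100, 200, 150], [500, 600, 80, 120]]

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(fake_img)
draw_boxes(ax, example_boxes)
ax.set_axis_off()
print(len(ax.patches))
```

Iterating over the box rows directly avoids the repeated `.iloc[i]` lookups, which matters little for four images but adds up when drawing many.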
