**A dataset of cassava plant images annotated with the type of disease affecting each plant.**
* [iCassava 2019 Fine-Grained Visual Categorization Challenge](https://arxiv.org/pdf/1908.02900.pdf) 

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import json
import cv2
import os

**Understanding the training data**
* `train.csv` has two columns: image IDs and labels (0, 1, 2, 3, 4).
* The training set contains 21,397 labeled images.
* Roughly 15,000 test images are used for evaluation.

In [None]:
train_df = pd.read_csv("../input/cassava-leaf-disease-classification/train.csv")
print(train_df.head())
print(f"\nNumber of training images: {train_df.shape[0]}")

**About the labels:**
* This is a multi-class classification problem with labels 0, 1, 2, 3 and 4.
* Labels 0-3 each represent a type of disease; label 4 means a healthy plant leaf.
* The class distribution in the training dataset is imbalanced.
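
One common way to account for an imbalanced class distribution is to weight each class inversely to its frequency. The snippet below is a minimal sketch of that idea, not something this notebook's models are confirmed to use. The function name `inverse_frequency_weights` and the toy label list are hypothetical; in this notebook the labels would come from `train_df.label.tolist()`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        lbl: n_samples / (n_classes * cnt)
        for lbl, cnt in sorted(counts.items())
    }


# Toy labels for illustration only (hypothetical, not the real distribution).
toy_labels = [3, 3, 3, 3, 3, 3, 0, 1, 1, 4]
weights = inverse_frequency_weights(toy_labels)
for lbl, w in weights.items():
    print(f"class {lbl}: weight {w:.3f}")
```

Rare classes receive weights above 1 and frequent classes receive weights below 1, so a weighted loss would penalize mistakes on rare classes more heavily.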

In [None]:
#https://www.kaggle.com/isaienkov/cassava-leaf-disease-classification-data-analysis
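# Count how many of the first 100 training images have each (height, width, channels) shape
train_dir = '/kaggle/input/cassava-leaf-disease-classification/train_images/'
img_size = {}
for filename in os.listdir(train_dir)[:100]:
    img = cv2.imread(os.path.join(train_dir, filename))
    if img is None:  # cv2.imread returns None when a file cannot be read
        continue
    img_size[img.shape] = img_size.get(img.shape, 0) + 1

print(f"Image shapes among the first 100 files (shape: count): {img_size}")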

In [None]:
with open("../input/cassava-leaf-disease-classification/label_num_to_disease_map.json") as f:
    class_dist = json.loads(f.read())
print(class_dist)
sns.countplot(y=train_df.astype(str).replace(class_dist).label)

**About each disease and its sample images**

The four major diseases affecting cassava and their main symptoms are:
*  **Cassava mosaic disease (CMD)** (class-3)
    - The most widespread cassava disease in sub-Saharan Africa.
    - Foliar symptoms: mosaic, mottling, misshapen and twisted leaflets, an overall reduction in the size of leaves and plants, and patches of normal green color mixed with varying proportions of yellow and white, depending on severity.





In [None]:
# helper function for plotting images
# https://www.kaggle.com/ihelon/cassava-leaf-disease-exploratory-data-analysis
def image_view(image_ids, labels):
    """Plot up to six training images in a 2x3 grid, each titled with its label."""
    plt.figure(figsize=(16, 8))
    for i, (img_id, lbl) in enumerate(zip(image_ids, labels)):
        plt.subplot(2, 3, i + 1)
        image = cv2.imread(os.path.join("../input/cassava-leaf-disease-classification/train_images", img_id))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; matplotlib expects RGB
        plt.imshow(image)
        plt.title(f"label: {lbl}")  # show the class label above each image
        plt.axis("off")
    plt.show()


In [None]:
sample = train_df[train_df.label==3].sample(6)
ids = sample.image_id.values
labels = sample.label.values
image_view(ids,labels)

*  **Cassava brown streak disease (CBSD)** (class-1)
    - CBSD is presently the most severe of the cassava diseases.
    - It is vectored by whiteflies and can also be transmitted through infected cuttings.
    - Leaf symptoms consist of a characteristic yellow or necrotic vein banding, which may enlarge and coalesce to form comparatively large yellow patches.


In [None]:
sample = train_df[train_df.label==1].sample(6)
ids = sample.image_id.values
labels = sample.label.values
image_view(ids,labels)

* **Cassava bacterial blight (CBB)** (class-0)
    - A major bacterial disease that is common in moist areas.
    - Leaf symptoms include black leaf spots and blights, angular leaf spots, and premature drying and shedding of leaves due to the wilting of young leaves and severe attack.


In [None]:
sample = train_df[train_df.label==0].sample(6)
ids = sample.image_id.values
labels = sample.label.values
image_view(ids,labels)

* **Cassava green mite (CGM)** (class-2)
    - This disease causes white spotting of leaves, which spreads from small initial spots to cover the entire leaf, causing loss of chlorophyll.
    - Leaves damaged by CGM may also show mottled symptoms, which can be confused with symptoms of CMD.

In [None]:
sample = train_df[train_df.label==2].sample(6)
ids = sample.image_id.values
labels = sample.label.values
image_view(ids,labels)

**Dataset Complexity**
1. The different image backgrounds and scales
2. The time of day the image was acquired
3. Multiple co-occurring diseases on one plant
4. The poor focus of some images

In [None]:
submission = pd.read_csv("../input/cassava-leaf-disease-classification/sample_submission.csv")
print(f"Submission file : \n{submission.head()}")
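
A submission pairs each test image with one predicted label. The sketch below shows one way to build such a file as CSV text. It assumes the submission uses the same `image_id` and `label` column names as `train.csv`, which should be checked against the printed `sample_submission.csv`. The function `build_submission`, the file names, and the predictions are hypothetical placeholders.

```python
import csv
import io


def build_submission(image_ids, predicted_labels):
    """Return CSV text with a header row and one (image_id, label) row per image."""
    if len(image_ids) != len(predicted_labels):
        raise ValueError("image_ids and predicted_labels must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_id", "label"])  # assumed column names
    for image_id, label in zip(image_ids, predicted_labels):
        writer.writerow([image_id, int(label)])
    return buffer.getvalue()


# Hypothetical file names and predictions, for illustration only.
csv_text = build_submission(["img_a.jpg", "img_b.jpg"], [4, 3])
print(csv_text)
```

Writing `csv_text` to a file named `submission.csv` would then produce the file to submit, assuming the column names match.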

[Train and inference with EfficientNet](https://www.kaggle.com/shivanandmn/efficientnet-pytorch-lightning-train-inference) - Notebook-1

[Train and inference with a simple CNN and PyTorch](https://www.kaggle.com/shivanandmn/cnn-pytorch-lightning-beginners-model) - Notebook-2

