<div style='color: green'><center>
    <h1 style='color: red'>Cassava Leaf Analysis </h1>
    Link :  <a href='https://www.kaggle.com/jitshil143/cassava-leaf-analysis-python'>Cassava notebook link</a>  <br/>
    Author  :  <a href='https://www.kaggle.com/jitshil143'>JitShil</a>
    </center>
</div>

<h1 style='color: green'> About the Competition </h1>

As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.

Existing methods of disease detection require farmers to solicit the help of government-funded agricultural experts to visually inspect and diagnose the plants. This process is labor-intensive, in short supply, and costly. As an added challenge, effective solutions for farmers must perform well under significant constraints, since African farmers may only have access to mobile-quality cameras and low-bandwidth connections.
![Cassava Leaf](https://image.shutterstock.com/image-photo/cassava-leaves-my-garden-260nw-1399141463.jpg)

                        Fig. 1: Cassava leaf

### Task
Classify each cassava image into one of four disease categories or a fifth category indicating a healthy leaf.

Class names:

0. CBB (Cassava Bacterial Blight)
1. CBSD (Cassava Brown Streak Disease)
2. CGM (Cassava Green Mottle)
3. CMD (Cassava Mosaic Disease)
4. Healthy 

<center>
<h1 style='color: green'>Work Flow </h1>
</center>

<div>
    <center>
    <h3><a href='#one'>Import Libs.</a></h3>
    <h3><a href='#two'>Working Directory</a></h3>    
    <h3><a href='#three'>Data Visualization</a></h3>
    <h3><a href='#four'>Image segmentation</a></h3>
    <h3><a href='#five'>Image Augmentation</a></h3>
    </center>
</div>




<h1 id='one' style='color: green'> Import Library </h1>
A programming library is code that has already been written and tested and is ready to import and use. Libraries are useful because they greatly reduce the amount of code you have to write yourself.

In [None]:
import os
import pandas as pd
import numpy as np
import seaborn as sb
from PIL import Image
import matplotlib.pyplot as plt
import json
import cv2
from skimage import io, img_as_ubyte
from skimage import util
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from skimage.filters import threshold_multiotsu


<div id ='two'><h1  style='color: green'> Define Working Directory </h1></div>

The working directory of a process is the directory against which relative paths are resolved. The cell below defines the dataset paths relative to the notebook's working directory.

In [None]:
main_dir = '../input/cassava-leaf-disease-classification/'
print(os.listdir(main_dir))  # list the files shipped with the dataset
train_img_path = os.path.join(main_dir, 'train_images')

<h1 id='three' style='color: green'> Data visualization</h1>

Data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and pull insights from. The main goal of data visualization is to make it easier to identify patterns, trends and outliers in large data sets.

## Load  CSV file

CSV (comma-separated values) is a simple text format used to store tabular data, such as a spreadsheet or database table.

In [None]:
dataframe = pd.read_csv(main_dir+'train.csv')

imgs_id = dataframe['image_id'].values
imgs_label = dataframe['label'].values
print('Head of the CSV file \n\n',dataframe.head())
print('\n Length of the image label : ',len(dataframe['label']))
print('\n Length of the image id :', len(dataframe['image_id']))


## Import JSON File

JavaScript Object Notation (JSON) is a standard text-based format for representing structured data. Here it maps each numeric label to its disease name.

In [None]:
# Use a context manager so the file is closed after reading
with open(main_dir + 'label_num_to_disease_map.json') as js:
    real_classes = json.load(js)
real_classes = {int(k):v for k,v in real_classes.items()}
dataframe['class_name'] = dataframe.label.map(real_classes)

print(dataframe['class_name'])                
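
The key conversion in the cell above matters because JSON object keys are always strings, while the `label` column holds integers. A minimal sketch of the same idea, using a made-up two-entry JSON string in place of the real `label_num_to_disease_map.json`:

```python
import json

# Hypothetical stand-in for the contents of label_num_to_disease_map.json.
raw = '{"0": "Cassava Bacterial Blight (CBB)", "4": "Healthy"}'

label_map = json.loads(raw)
# JSON keys arrive as strings ("0", "4"), so convert them to int
# before looking up integer labels.
label_map = {int(k): v for k, v in label_map.items()}

labels = [4, 0, 4]
names = [label_map[label] for label in labels]
print(names)
```

Without the `int(k)` conversion, `dataframe.label.map(...)` would find no matching keys and the new column would be filled with missing values.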
                
                

## Image With Class Label

Using matplotlib, a random sample of nine images is displayed together with their class labels.

In [None]:
def show_image(img_ids, img_classes):
    plt.figure(figsize=(16, 12))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(3, 3, i + 1)
        image = cv2.imread(os.path.join(main_dir, "train_images", img_id))
#         image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        image = image[:, :, ::-1]
        
        plt.imshow(image)
        plt.title(f"Class: {img_class}", fontsize=12)
        plt.axis("off")
    
    plt.show()

In [None]:

new_df = dataframe.sample(9)
img_ids = new_df["image_id"].values
labels = new_df["class_name"].values

show_image(img_ids, labels)

## Histogram
A histogram is a graphical display of data using bars of different heights.

It makes it easier to see how many images each class contains and to compare the classes.

In [None]:
def hist_graph(y):
    fig = plt.figure()
    ax = fig.add_subplot(111)
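    # One bin per class, with edges at -0.5, 0.5, ..., 4.5 so each bar is centred on its tick
    ax.hist(y, bins=np.arange(6) - 0.5, color='blue', linewidth =3)
    plt.title('Number of images in each of the five classes', fontsize=15)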
    plt.xticks(np.arange(5))
    plt.xlabel('Classes')
    plt.ylabel('No. of images')   
    plt.show()   
    

print(real_classes)    
hist_graph( dataframe['label'])
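
The per-class counts behind such a histogram can also be computed directly. The sketch below uses a small made-up label array rather than the real `dataframe['label']` column, and shows that a histogram with one bin per integer class gives the same numbers as a plain count:

```python
import numpy as np

# Toy labels standing in for dataframe['label'] (values 0..4).
labels = np.array([3, 3, 0, 4, 3, 1, 2, 3, 4, 3])

# One count per class; minlength keeps classes with zero images in the result.
counts = np.bincount(labels, minlength=5)
print(counts)  # [1 1 1 5 2]

# Equivalent histogram: bin edges at -0.5, 0.5, ..., 4.5 give exactly
# one bin per integer class.
edges = np.arange(6) - 0.5
hist, _ = np.histogram(labels, bins=edges)
```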

        

<h1 id='four' style='color: green'>Image Segmentation</h1>


Image segmentation assigns each pixel of an image to a class or region. Segmentation techniques are used to isolate the object of interest from the rest of the image so that the object can be analyzed.

### Foreground Extraction (GrabCut)

GrabCut is an image segmentation method based on graph cuts. Starting with a user-specified bounding box around the object to be segmented, the algorithm estimates the color distribution of the target object and that of the background using a Gaussian mixture model.

In [None]:

    
def fg_extrac(img_ids, img_classes):
    plt.figure(figsize=(20, 16))
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2,2, i + 1)
        
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        mask = np.zeros(img.shape[:2],np.uint8)
        bgdModel = np.zeros((1,65),np.float64)
        fgdModel = np.zeros((1,65),np.float64)
        rect = (10,10,750,550)  # (x, y, width, height) of the initial bounding box
        cv2.grabCut(img,mask,rect,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_RECT)
        mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
        img = img*mask2[:,:,np.newaxis]               
            
        plt.imshow(img)
        plt.title(f"Class: {img_class}", fontsize=12)
    plt.show()

In [None]:
df = dataframe[10:14]
img_ids = df["image_id"].values
labels = df["class_name"].values
fg_extrac(img_ids, labels)
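
The `np.where` step in `fg_extrac` turns GrabCut's four-valued mask into a binary one. A minimal sketch of that step on a hand-written 3x3 mask (the mask and the flat grey image are made up for illustration; no GrabCut call is involved):

```python
import numpy as np

# GrabCut labels each pixel 0 (sure background), 1 (sure foreground),
# 2 (probable background) or 3 (probable foreground).
mask = np.array([[0, 2, 3],
                 [2, 1, 3],
                 [0, 0, 1]], dtype=np.uint8)

# Keep sure and probable foreground (1, 3); drop both background labels (0, 2).
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype("uint8")

# Toy 3x3 RGB image. Adding a channel axis to the mask lets it broadcast
# over R, G and B, which zeroes every background pixel.
img = np.full((3, 3, 3), 200, dtype=np.uint8)
fg = img * mask2[:, :, np.newaxis]
```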

### Grayscale

A grayscale image stores a single intensity value per pixel instead of three color channels. Converting to grayscale discards color information while keeping the shape and texture of the leaf.

In [None]:

    
def gray(img_ids, img_classes):
    plt.figure(figsize=(16, 12))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2, 3, i + 1)
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        # io.imread returns RGB, so convert with COLOR_RGB2GRAY (not BGR2GRAY)
        gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    
        plt.imshow(gray, cmap='gray')
        plt.title(f"Class: {img_class}", fontsize=12)
        plt.axis("off")
    
    plt.show()

          

In [None]:
df = dataframe.sample(6)
img_ids = df["image_id"].values
labels = df["class_name"].values

gray(img_ids, labels)
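
For reference, a grayscale value is commonly computed as a weighted sum of the red, green and blue channels. The sketch below uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which is also the formula OpenCV documents for its RGB-to-gray conversion; the three-pixel image is made up for illustration:

```python
import numpy as np

# Toy 1x3 RGB image: pure red, pure green, pure blue.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# BT.601 luma weights: green contributes most, blue least.
weights = np.array([0.299, 0.587, 0.114])

# The matrix product collapses the channel axis: (1, 3, 3) @ (3,) -> (1, 3).
gray_values = np.rint(rgb @ weights).astype(np.uint8)
print(gray_values)  # [[ 76 150  29]]
```

Pure green ends up much brighter than pure blue, which is worth keeping in mind for leaf images that are dominated by green.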

### Morphological transformations

Morphological transformations are simple operations based on the image shape. They are
normally performed on binary images. They need two inputs: the original image and a
structuring element (kernel) that decides the nature of the operation. The cell below applies a morphological gradient with a 5x5 kernel.


In [None]:
    
def thresh(img_ids, img_classes):
    plt.figure(figsize=(20, 16))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2, 3, i + 1)
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        kernel = np.ones((5,5),np.uint8)
        morpo = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
        plt.imshow(morpo)
        plt.title(f"Class: {img_class}", fontsize=12)
        plt.axis("off")
    
    plt.show()

          


In [None]:
thresh(img_ids, labels)
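
A morphological gradient is the difference between the dilation and the erosion of an image, so it lights up object boundaries. The following is a NumPy-only sketch of that definition on a single-channel image, not OpenCV's implementation: the helper name `morph_gradient` is hypothetical, and the edge padding used here may treat image borders differently from `cv2.morphologyEx`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def morph_gradient(img, k=3):
    """Dilation minus erosion with a k x k square structuring element."""
    pad = k // 2
    # Edge padding so the output keeps the input shape.
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))
    dilated = windows.max(axis=(-2, -1))  # brightest pixel in each neighbourhood
    eroded = windows.min(axis=(-2, -1))   # darkest pixel in each neighbourhood
    return dilated - eroded               # never negative, since max >= min

# Toy binary image: a white 3x3 square on a black 7x7 background.
img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 255
grad = morph_gradient(img)
```

On this toy image the gradient is zero in the middle of the square and far from it, and 255 in a band around the square's edge.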

### Skimage

Scikit-image, or skimage, is an open source Python package designed for image processing. Here its `util.invert` function is used to produce the negative of each image.


In [None]:


    
def color_manipulation(img_ids, img_classes):
    plt.figure(figsize=(16, 12))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2,3, i + 1)
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        inverted_img = util.invert(img)
        plt.imshow(inverted_img)
        plt.title(f"Class: {img_class}", fontsize=12)
        plt.axis("off")
    
    plt.show()

          

In [None]:
df = dataframe[2010:2016]
img_ids = df["image_id"].values
labels = df["class_name"].values

color_manipulation(img_ids, labels)
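
For 8-bit images, inverting amounts to subtracting each pixel value from 255. A minimal NumPy sketch of that arithmetic on a made-up three-pixel row (it does not call skimage):

```python
import numpy as np

# Toy row of 8-bit intensities: black, mid-dark, white.
img = np.array([[0, 100, 255]], dtype=np.uint8)

# Negative of a uint8 image: dark becomes bright and vice versa.
inverted = 255 - img
print(inverted)  # [[255 155   0]]
```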

### Threshold

Image thresholding is a simple form of image segmentation. It is a way to create a binary image from a grayscale or full-color image. This is typically done in order to separate "object" or foreground pixels from background pixels to aid in image processing.


In [None]:
 

    
def thresold(img_ids, img_classes):
    plt.figure(figsize=(16, 12))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2,3, i + 1)
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        thresh = cv2.threshold(img,170,255,cv2.THRESH_BINARY)[1]
        
        plt.imshow(thresh)
        plt.title(f"Class: {img_class}", fontsize=12)
        plt.axis("off")
    
    plt.show()

          

In [None]:

thresold(img_ids, labels)
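
Binary thresholding as used above sets every value strictly greater than the threshold to the maximum value and everything else to zero. A NumPy sketch of that rule with the same threshold (170) and maximum (255), on made-up pixel values:

```python
import numpy as np

# Toy row of intensities around the threshold.
img = np.array([[10, 170, 171, 250]], dtype=np.uint8)

# THRESH_BINARY rule: value > thresh -> maxval, otherwise 0.
binary = np.where(img > 170, 255, 0).astype(np.uint8)
print(binary)  # [[  0   0 255 255]]

# On an RGB pixel the rule applies to each channel separately,
# so the result can still contain colour.
pixel = np.array([200, 180, 90], dtype=np.uint8)
pixel_thresh = np.where(pixel > 170, 255, 0).astype(np.uint8)
```

Because the notebook passes a color image, the displayed result is thresholded per channel rather than being a strict black-and-white mask. Converting to grayscale first would give a true binary image.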

## Color Intensity

In [None]:
def color_intensity(img_ids, img_classes):
    plt.figure(figsize=(16, 12))
    
    for i, (img_id, img_class) in enumerate(zip(img_ids, img_classes)):
        plt.subplot(2,3, i + 1)
        src = train_img_path+'/'+img_id
        img= io.imread(src)
        # Overlay the histogram of all values with one histogram per channel
        plt.hist(img.ravel(), bins=256, color='orange')
        plt.hist(img[:, :, 0].ravel(), bins=256, color='red', alpha=0.5)
        plt.hist(img[:, :, 1].ravel(), bins=256, color='Green', alpha=0.5)
        plt.hist(img[:, :, 2].ravel(), bins=256, color='Blue', alpha=0.5)
        plt.xlabel('Intensity Value')
        plt.ylabel('Count')
        plt.legend(['Total', 'Red Channel', 'Green Channel', 'Blue Channel'])
        
        plt.title(f"Class: {img_class}", fontsize=12)
    
    plt.show()

In [None]:
color_intensity(img_ids, labels)
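
The counts plotted above can also be obtained without drawing anything. The sketch below computes per-channel and total intensity histograms with `np.histogram` on a small random image (the image is synthetic, generated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 4x4 RGB image with 8-bit values.
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Integer edges 0..256 give one bin per possible intensity value.
edges = np.arange(257)

channel_hists = {
    name: np.histogram(img[:, :, c], bins=edges)[0]
    for c, name in enumerate(["red", "green", "blue"])
}
# Histogram over all values, regardless of channel.
total_hist = np.histogram(img, bins=edges)[0]
```

The total histogram is the element-wise sum of the three channel histograms, which is why the orange "Total" bars in the plot sit above the per-channel ones.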

# Splitting the image data into training and validation sets


In [None]:


train,val = train_test_split(dataframe, test_size = 0.3, random_state = 42, stratify = dataframe['class_name'])
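
The `stratify` argument keeps the class proportions roughly the same in both splits, which matters when the classes are imbalanced. The idea can be sketched in plain Python as follows. This is an illustration of stratification only, not scikit-learn's algorithm: the helper `stratified_split` and the toy label list are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.3, seed=42):
    """Return (train_idx, val_idx) with roughly equal class ratios in both."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Take the same fraction of every class for validation.
        n_val = round(len(idxs) * test_size)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return train_idx, val_idx

# Toy imbalanced labels: 60% class 3, 20% class 4, 20% class 0.
labels = [3] * 60 + [4] * 20 + [0] * 20
train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
```

With a purely random split, a rare class could end up under-represented in the validation set. Splitting each class separately avoids that.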


<h1 id='five' style='color:green'> Data Augmentation </h1>
Data augmentation refers to techniques that increase the amount of data by adding slightly modified copies of existing data or synthetic data created from it. It acts as a regularizer and helps reduce overfitting when training a machine learning model.

![Data Augmentation](https://www.researchgate.net/profile/Geoff_Nitschke/publication/319210096/figure/fig1/AS:631669694402561@1527613199045/Overview-of-the-Data-Augmentation-DA-methods-evaluated.png)

                        Fig. 2: Images after augmentation
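
Two of the simplest augmentations, horizontal and vertical flips, can be written with NumPy slicing alone. The sketch below is only an illustration of "slightly modified copies"; the helper `random_flip` is hypothetical and is not how Keras implements its generator:

```python
import numpy as np

def random_flip(img, rng):
    """Flip an H x W x C image horizontally and/or vertically at random."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip (mirror left-right)
    if rng.random() < 0.5:
        img = img[::-1, :]  # vertical flip (mirror top-bottom)
    return img

rng = np.random.default_rng(0)
# Toy 2x3 RGB image with distinct values so flips are visible.
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

augmented = random_flip(img, rng)
# Scale 8-bit values into [0, 1], the same idea as rescale = 1/255.0.
rescaled = augmented / 255.0
```

A flip changes where pixels sit but not their values, so the label of the image stays valid while the model sees a new arrangement.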

In [None]:

img_row= 400
img_col=400

train_datagen = ImageDataGenerator(rescale = 1/255.0,
                            rotation_range = 40,
                            width_shift_range = 0.4,
                            height_shift_range = 0.4,
                            shear_range = 0.4,
                            zoom_range = 0.1,
                            horizontal_flip = True,
                            vertical_flip = True,
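                            # featurewise_* statistics need train_datagen.fit(...) on sample data first;
                            # fit is never called here, so these two options are turned off
                            featurewise_center = False,
                            samplewise_center = True,       
                            featurewise_std_normalization= False,
                            samplewise_std_normalization = True,       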
                            fill_mode = 'nearest')

validation_datagen = ImageDataGenerator(rescale=1.0/255)

train_generator = train_datagen.flow_from_dataframe(train,
                                                directory = train_img_path,
                                                x_col = 'image_id',
                                                y_col = 'class_name',
                                                target_size = (img_row,img_col),
                                                color_mode = 'rgb',
                                                class_mode = 'categorical',
                                                interpolation = 'nearest',
                                                shuffle = True,
                                                batch_size = 64, 
                                                )


validation_generator = validation_datagen.flow_from_dataframe(val,
                                                directory = train_img_path,
                                                x_col = 'image_id',
                                                y_col = 'class_name',
                                                target_size = (img_row,img_col),
                                                color_mode = 'rgb',
                                                class_mode = 'categorical',
                                                interpolation = 'nearest',
                                                shuffle = True,
                                                batch_size = 64, 
                                                )


## Train and validation class indices

In [None]:
print('train classes Name and indices: \n',train_generator.class_indices)
print('validation classes Name and indices: \n',validation_generator.class_indices)
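
With `class_mode = 'categorical'`, each class name is mapped to an integer index and each label becomes a one-hot vector. The sketch below mimics that mapping in plain Python, assuming the indices follow the sorted order of the class-name strings, as the Keras documentation describes. The three names are a toy subset, not the full dataset:

```python
# Toy subset of class names, in arbitrary order.
class_names = ["Healthy", "Cassava Mosaic Disease (CMD)", "Cassava Green Mottle (CGM)"]

# Assumed behaviour: indices follow the sorted order of the label strings.
class_indices = {name: i for i, name in enumerate(sorted(set(class_names)))}

def one_hot(name):
    """One-hot vector for a class name, as used by categorical labels."""
    vec = [0] * len(class_indices)
    vec[class_indices[name]] = 1
    return vec

print(class_indices)
print(one_hot("Healthy"))  # [0, 0, 1]
```

Comparing the printed `class_indices` of both generators, as the cell above does, is a quick check that training and validation labels use the same encoding.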

## Cassava leaves after augmentation

In [None]:
for image, label in validation_generator:
    plt.figure(figsize= (16,12))
    print(image.shape)
    for i in range(0,6):
        plt.subplot(2,3,1+i)
        plt.imshow(image[i],  cmap ='gray')    
    plt.show()
    break

# Thank you for reading 


### Work in progress.......