# CASSAVA LEAF DISEASE CLASSIFICATION

As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa grow this starchy root, but viral diseases are major sources of poor yields. With the help of data science, it may be possible to identify common diseases so they can be treated.

source: https://www.kaggle.com/c/cassava-leaf-disease-classification/overview

## Data Preparation

### Stream data using the Kaggle API
Before importing data from Kaggle into Google Colab, download an API token: log in to Kaggle, then go to My Account > Home > Create New API Token. The token is downloaded as kaggle.json, which then needs to be uploaded to the Google Colab hosted runtime.

In [None]:
# upload your 'kaggle.json' to hosted runtime
from google.colab import files
files.upload()

Next, a few setup steps are needed: install the kaggle library with pip, copy the API token into the ~/.kaggle directory, and restrict the token's access permissions.

In [None]:
# Install kaggle library 
!pip install -q kaggle
# Make ".kaggle" directory in root directory
!mkdir -p ~/.kaggle
# Copy the API token to the kaggle directory
!cp kaggle.json ~/.kaggle/
# Check the directory
!ls ~/.kaggle
# Adjust access permissions
!chmod 600 ~/.kaggle/kaggle.json

# Workaround: force a clean reinstall of the kaggle package
!pip uninstall -y kaggle
!pip install kaggle
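The copy and permission steps above can also be done from Python, which makes them easy to rerun. The following is a minimal sketch, not part of the original notebook; the helper name `install_kaggle_token` and its `kaggle_dir` parameter are hypothetical.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, kaggle_dir=None):
    """Copy kaggle.json into kaggle_dir and make it readable by the owner only.

    Hypothetical helper: kaggle_dir defaults to ~/.kaggle, the location used
    by the shell commands in the cell above.
    """
    kaggle_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)   # like: mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(token_path, target)             # like: cp kaggle.json ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)   # like: chmod 600
    return target


# Example call inside Colab (assumes kaggle.json was uploaded to the working directory):
# install_kaggle_token("kaggle.json")
```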

The cell below contains the command to download the data into your hosted directory (Google's server). Basically, it copies the whole dataset from Kaggle's servers to Google's servers.

**Specifically for competitions:** open your kaggle.json and set the Kaggle username and key accordingly.
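Instead of typing the username and key into the notebook by hand, they could be read straight from kaggle.json. This is a small sketch under the assumption that the file holds the two fields `username` and `key`; the function name `load_kaggle_credentials` is hypothetical.

```python
import json
import os


def load_kaggle_credentials(token_path):
    """Read kaggle.json and export its values as environment variables."""
    with open(token_path) as f:
        token = json.load(f)
    # Same variables as in the authentication lines of the download cell
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]


# Example call (assumes kaggle.json is in the working directory):
# load_kaggle_credentials("kaggle.json")
```

This keeps the secret key out of the notebook itself, which matters if the notebook is shared.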

In [None]:
import os

# authentication (optional: fill in the values from your own kaggle.json)
# os.environ["KAGGLE_USERNAME"] = "<your-kaggle-username>"
# os.environ["KAGGLE_KEY"] = "<your-kaggle-api-key>"

# copy the data from Kaggle to the Colab runtime
!mkdir -p downloaded-data
!kaggle competitions download -c cassava-leaf-disease-classification -p downloaded-data

In [None]:
# check the folders inside your directory
!ls
# check what is inside data folder
!ls downloaded-data

In [None]:
# mount the zip archive as a folder (fuse-zip exposes its contents without a full unzip)
!apt-get install -y fuse-zip
!mkdir -p extracted-data
!fuse-zip downloaded-data/cassava-leaf-disease-classification.zip extracted-data
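If fuse-zip is not available, the archive can instead be unpacked with Python's standard `zipfile` module. This is an alternative sketch, not what the notebook does: it writes a full copy of the files to disk rather than mounting the archive. The helper name `extract_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Example call with the paths used in this notebook:
# extract_zip("downloaded-data/cassava-leaf-disease-classification.zip", "extracted-data")
```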

In [None]:
!ls extracted-data

The data has been extracted into the folder "extracted-data".

In [None]:
os.listdir("extracted-data")

### Data cleaning
The training images in 'extracted-data/train_images' are not organized into a separate folder for each class. However, the label of each image can be looked up in the table **'extracted-data/train.csv'**. **To do:** move each image into a folder named after its class.

In [None]:
# read train.csv as the label reference
import pandas as pd
import json

BASE_DIR = "extracted-data"

# read disease classes name in json file
with open(os.path.join(BASE_DIR, "label_num_to_disease_map.json")) as file:
    map_classes = json.load(file)
print(json.dumps(map_classes, indent=4))

# add a class_name column to train_csv (JSON keys are strings, hence astype(str))
train_csv = pd.read_csv(os.path.join(BASE_DIR, "train.csv"))
train_csv["class_name"] = train_csv["label"].astype(str).map(map_classes)

In [None]:
train_csv
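The `astype(str)` call in the mapping step is easy to overlook, so here is a toy illustration of why it is needed. The dictionary below is an assumption about the contents of label_num_to_disease_map.json, and the image names are made up.

```python
import pandas as pd

# Assumed contents of label_num_to_disease_map.json; JSON object keys are always strings
map_classes = {
    "0": "Cassava Bacterial Blight (CBB)",
    "1": "Cassava Brown Streak Disease (CBSD)",
    "2": "Cassava Green Mottle (CGM)",
    "3": "Cassava Mosaic Disease (CMD)",
    "4": "Healthy",
}

# Hypothetical rows standing in for train.csv, where labels are integers
toy = pd.DataFrame({"image_id": ["a.jpg", "b.jpg", "c.jpg"], "label": [3, 0, 4]})

# Integer labels do not match the string keys, so every lookup gives NaN
unmatched = toy["label"].map(map_classes)

# Converting the labels to strings first makes the lookup succeed
matched = toy["label"].astype(str).map(map_classes)
print(matched.tolist())
```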

In [None]:
# extract image_id and label_id to list
image_id = train_csv["image_id"].tolist()
label_id = train_csv["label"].tolist()

In [None]:
# Create a directory for each class
# Note: these are Linux shell commands run on the Google server; -p makes them safe to rerun
!mkdir -p extracted-data/train_images/CBB
!mkdir -p extracted-data/train_images/CBSD
!mkdir -p extracted-data/train_images/CGM
!mkdir -p extracted-data/train_images/CMD
!mkdir -p extracted-data/train_images/Healthy

In [None]:
# check if the folders were created in 'extracted-data/train_images' directory
!ls extracted-data/train_images

In [None]:
# Iterate to move the unorganized images in train_images directory to specific class folders
import shutil

# build the paths of the class folders
TRAIN_DIR = os.path.join(BASE_DIR,"train_images")
TRAIN_DIR_CBB = os.path.join(TRAIN_DIR,"CBB")
TRAIN_DIR_CBSD = os.path.join(TRAIN_DIR,"CBSD")
TRAIN_DIR_CGM = os.path.join(TRAIN_DIR,"CGM")
TRAIN_DIR_CMD = os.path.join(TRAIN_DIR,"CMD")
TRAIN_DIR_Healthy = os.path.join(TRAIN_DIR,"Healthy")

# map each label to its class folder, then move every image file accordingly
CLASS_DIRS = {
    0: TRAIN_DIR_CBB,
    1: TRAIN_DIR_CBSD,
    2: TRAIN_DIR_CGM,
    3: TRAIN_DIR_CMD,
    4: TRAIN_DIR_Healthy,
}
for ImageFileName, ImageLabel in zip(image_id, label_id):
    shutil.move(os.path.join(TRAIN_DIR, ImageFileName), os.path.join(CLASS_DIRS[ImageLabel], ImageFileName))
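The move loop above raises an error if the cell is run a second time, because the files are no longer in train_images. A rerun-safe variant might look like the sketch below. The function name `sort_images_by_class` and the `LABEL_TO_FOLDER` dictionary are hypothetical; the folder names follow the ones created earlier.

```python
import shutil
from pathlib import Path

# Label-to-folder mapping, matching the folder names used in this notebook
LABEL_TO_FOLDER = {0: "CBB", 1: "CBSD", 2: "CGM", 3: "CMD", 4: "Healthy"}


def sort_images_by_class(train_dir, image_ids, labels, label_to_folder=LABEL_TO_FOLDER):
    """Move each image into its class folder and return how many files were moved."""
    train_dir = Path(train_dir)
    moved = 0
    for file_name, label in zip(image_ids, labels):
        source = train_dir / file_name
        if not source.is_file():
            continue  # already moved in an earlier run, or missing
        class_dir = train_dir / label_to_folder[label]
        class_dir.mkdir(exist_ok=True)  # create the class folder on demand
        shutil.move(str(source), str(class_dir / file_name))
        moved += 1
    return moved


# Example call with the lists built earlier in this notebook:
# sort_images_by_class("extracted-data/train_images", image_id, label_id)
```

Returning the number of moved files gives a quick sanity check: the first run should report the number of rows in train.csv, and any later run should report 0.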

In [None]:
# check that 'extracted-data/train_images' now contains only the class folders
!ls extracted-data/train_images
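Listing the directory shows that the folders exist, but not whether every image landed in the right place. One possible check is to count the files in each class folder, as sketched below. The helper name `count_images_per_class` is hypothetical.

```python
from pathlib import Path


def count_images_per_class(train_dir):
    """Return a dict mapping each class folder name to the number of files inside it."""
    counts = {}
    for entry in sorted(Path(train_dir).iterdir()):
        if entry.is_dir():
            counts[entry.name] = sum(1 for f in entry.iterdir() if f.is_file())
    return counts


# Example call with the path used in this notebook:
# count_images_per_class("extracted-data/train_images")
```

The resulting counts can then be compared against `train_csv["label"].value_counts()` to confirm that no image was left behind or misplaced.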