<a href="https://colab.research.google.com/github/mrparamvir/End-to-end-multi-class-Leaf-Classification/blob/master/End_to_end_Multi_class_Leaf_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  🍂🍁 End-to-end Multi-class Leaf Classification
This notebook builds an end-to-end multi-class image classifier using TensorFlow 2.0 and TensorFlow Hub.

## 1. Problem

Identify the species of a plant from an image of its leaf.


## 2. Data

The data we're using is from Kaggle's Leaf Classification competition.

https://www.kaggle.com/c/leaf-classification/data

## 3. Evaluation

Submissions are evaluated on a file containing a predicted probability for each leaf species for every test image.

https://www.kaggle.com/c/leaf-classification/overview/evaluation
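
The notebook does not restate the metric from the evaluation page. Assuming it is the multi-class logarithmic loss, a minimal NumPy sketch of that formula is shown below. The function name `multiclass_log_loss` and the clipping constant `eps` are illustrative choices, not something taken from the competition.

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class (hypothetical helper)."""
    # Clip so that log(0) never occurs, then renormalise each row to sum to 1.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    probs = probs / probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(y_true))
    return float(-np.mean(np.log(probs[rows, np.asarray(y_true)])))

# A uniform guess over 99 species, like the values in the sample submission file.
uniform = np.full((2, 99), 1 / 99)
uniform_loss = multiclass_log_loss([0, 5], uniform)
print(uniform_loss)  # equals ln(99)
```

Under that assumption, a uniform guess scores ln(99), which gives a baseline that any trained model should beat.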

## 4. Features

Some information about the data:
* We're dealing with images (unstructured data) so it's probably best we use deep learning/transfer learning.
* There are 99 species of plants (this means there are 99 different classes).
* There are 990 images in the training set, 10 per species (these images have species labels).
* There are 500+ images in the test set (these images have no species labels, because those are what we want to predict).


In [None]:
# # Unzip the uploaded data into Google Drive
# !unzip "drive/My Drive/Leaf Classification/leaf-classification.zip" -d "drive/My Drive/Leaf Classification"

### Get our workspace ready

* Import TensorFlow 2.x ✅
* Import TensorFlow Hub ✅
* Make sure we're using GPU ✅

In [None]:
# Import necessary tools
import tensorflow as tf
import tensorflow_hub as hub
print("TF version:", tf.__version__)
print("TF Hub version:", hub.__version__)

# Check for GPU availability
print("GPU", "available (YESSSS!!!) 😀" if tf.config.list_physical_devices("GPU") else "not available :(")

TF version: 2.2.0
TF Hub version: 0.8.0
GPU available (YESSSS!!!) 😀


## Getting our data ready (turning into Tensors)

With all machine learning models, our data has to be in numerical format, so that's the first step: turning our images into Tensors (numerical representations).

Let's start by accessing our data and checking out the labels.

In [None]:
# # Unzip the data files in the Leaf Classification folder
# !unzip "drive/My Drive/Leaf Classification/images.zip" -d "drive/My Drive/Leaf Classification"
# !unzip "drive/My Drive/Leaf Classification/train.csv.zip" -d "drive/My Drive/Leaf Classification"
# !unzip "drive/My Drive/Leaf Classification/test.csv.zip" -d "drive/My Drive/Leaf Classification"
# !unzip "drive/My Drive/Leaf Classification/sample_submission.csv.zip" -d "drive/My Drive/Leaf Classification"

In [None]:
import pandas as pd

# Load the training and test tables straight from the zipped CSV files
data = pd.read_csv("drive/My Drive/Leaf Classification/train.csv.zip", index_col=False)
test_data = pd.read_csv("drive/My Drive/Leaf Classification/test.csv.zip", index_col=False)
data.head(2)

Unnamed: 0,id,species,margin1,margin2,margin3,margin4,margin5,margin6,margin7,margin8,margin9,margin10,margin11,margin12,margin13,margin14,margin15,margin16,margin17,margin18,margin19,margin20,margin21,margin22,margin23,margin24,margin25,margin26,margin27,margin28,margin29,margin30,margin31,margin32,margin33,margin34,margin35,margin36,margin37,margin38,...,texture25,texture26,texture27,texture28,texture29,texture30,texture31,texture32,texture33,texture34,texture35,texture36,texture37,texture38,texture39,texture40,texture41,texture42,texture43,texture44,texture45,texture46,texture47,texture48,texture49,texture50,texture51,texture52,texture53,texture54,texture55,texture56,texture57,texture58,texture59,texture60,texture61,texture62,texture63,texture64
0,1,Acer_Opalus,0.007812,0.023438,0.023438,0.003906,0.011719,0.009766,0.027344,0.0,0.001953,0.033203,0.013672,0.019531,0.066406,0.0,0.029297,0.0,0.03125,0.011719,0.0,0.025391,0.023438,0.001953,0.0,0.015625,0.0,0.03125,0.0,0.013672,0.029297,0.015625,0.011719,0.003906,0.025391,0.0,0.001953,0.011719,0.009766,0.041016,...,0.008789,0.015625,0.044922,0.0,0.037109,0.012695,0.02832,0.0,0.019531,0.026367,0.005859,0.0,0.004883,0.016602,0.03418,0.056641,0.006836,0.000977,0.022461,0.037109,0.004883,0.021484,0.035156,0.000977,0.004883,0.015625,0.0,0.0,0.006836,0.037109,0.007812,0.0,0.00293,0.00293,0.035156,0.0,0.0,0.004883,0.0,0.025391
1,2,Pterocarya_Stenoptera,0.005859,0.0,0.03125,0.015625,0.025391,0.001953,0.019531,0.0,0.0,0.007812,0.003906,0.027344,0.023438,0.0,0.033203,0.0,0.009766,0.009766,0.007812,0.007812,0.019531,0.007812,0.0,0.0,0.007812,0.027344,0.003906,0.037109,0.007812,0.048828,0.054688,0.027344,0.003906,0.0,0.0,0.003906,0.013672,0.033203,...,0.050781,0.001953,0.021484,0.003906,0.027344,0.023438,0.0625,0.0,0.038086,0.0,0.019531,0.0,0.001953,0.003906,0.015625,0.004883,0.10449,0.0,0.061523,0.007812,0.008789,0.013672,0.011719,0.001953,0.035156,0.007812,0.0,0.0,0.053711,0.036133,0.000977,0.0,0.0,0.000977,0.023438,0.0,0.0,0.000977,0.039062,0.022461


In [None]:
# Check for null values in the training set
obj = data.isnull().sum()
obj
# for key, value in obj.items():
#   print(key, ":", value)

# Check for null values in the test set (only this last expression is displayed)
obj_2 = test_data.isnull().sum()
obj_2
# for key, value in obj_2.items():
#   print(key, ":", value)

id           0
margin1      0
margin2      0
margin3      0
margin4      0
            ..
texture60    0
texture61    0
texture62    0
texture63    0
texture64    0
Length: 193, dtype: int64
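
The displayed Series is truncated, so not every column's count is visible. One way to condense the check, sketched here on a small made-up frame (`toy_df` and its values are hypothetical, not the competition data), is to reduce the per-column counts to a single total and list only the affected columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data` / `test_data`, with one missing value on purpose.
toy_df = pd.DataFrame({"margin1": [0.1, np.nan], "texture1": [0.2, 0.3]})

# Total number of missing cells across the whole frame.
total_missing = int(toy_df.isnull().sum().sum())

# Names of the columns that contain at least one missing value.
cols_with_missing = toy_df.columns[toy_df.isnull().any()].tolist()

print(total_missing, cols_with_missing)
```

Applied to `data` and `test_data`, a total of zero would confirm that no imputation is needed before modelling.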

In [None]:
# Count how many training samples each species has
obj2 = data['species'].value_counts()
obj2
# for key, value in obj2.items():
#   print(key, ":", value)

Ilex_Cornuta                 10
Fagus_Sylvatica              10
Quercus_Coccinea             10
Tilia_Platyphyllos           10
Celtis_Koraiensis            10
                             ..
Acer_Capillipes              10
Quercus_x_Turneri            10
Quercus_Cerris               10
Quercus_Phillyraeoides       10
Lithocarpus_Cleistocarpus    10
Name: species, Length: 99, dtype: int64

In [None]:
from sklearn.preprocessing import LabelEncoder

# Map each species name to an integer label (classes are sorted alphabetically)
encoder = LabelEncoder()
le = encoder.fit(data.species)
labels = le.transform(data.species)
classes = list(le.classes_)
classes[:2]

['Acer_Capillipes', 'Acer_Circinatum']
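
To make the encoding step concrete, here is a small sketch on a handful of species names. The list `toy_species` and the variable names are illustrative only; the point is that `LabelEncoder` sorts the class names and that `inverse_transform` recovers them from the integer labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical mini label column, in arbitrary order and with a repeat.
toy_species = ["Quercus_Rubra", "Acer_Opalus", "Quercus_Rubra", "Ginkgo_Biloba"]

toy_encoder = LabelEncoder().fit(toy_species)
toy_labels = toy_encoder.transform(toy_species)

print(list(toy_encoder.classes_))  # ['Acer_Opalus', 'Ginkgo_Biloba', 'Quercus_Rubra']
print(toy_labels.tolist())         # [2, 0, 2, 1]

# Integer labels can be mapped back to species names when needed.
toy_names = list(toy_encoder.inverse_transform([0, 2]))
print(toy_names)                   # ['Acer_Opalus', 'Quercus_Rubra']
```

Because the classes are sorted, label `i` corresponds to the `i`-th species name in alphabetical order, which matters later when probability columns are matched to species.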

In [None]:
data=data.drop(['id','species'],axis=1)
test_id=test_data.id
test_data=test_data.drop(['id'],axis=1)

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(data,labels,test_size=.2,shuffle=True,stratify=labels)
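
The `stratify` argument keeps the class proportions identical in both parts of the split. The sketch below shows this on synthetic labels (`toy_X`, `toy_y`, and the fixed `random_state` are assumptions for illustration; the cell above does not set a seed).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 5 hypothetical classes with 10 samples each, mirroring the 10-per-species layout.
toy_y = np.repeat(np.arange(5), 10)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

X_tr, X_val, y_tr, y_val = train_test_split(
    toy_X, toy_y, test_size=0.2, shuffle=True, stratify=toy_y, random_state=0
)

# Every class keeps the same 80/20 ratio.
train_counts = np.bincount(y_tr).tolist()
val_counts = np.bincount(y_val).tolist()
print(train_counts, val_counts)  # [8, 8, 8, 8, 8] [2, 2, 2, 2, 2]
```

With 10 samples per species and `test_size=.2`, the same logic leaves 8 samples of each species for training and 2 for validation.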

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

# Note: despite the variable name `lda`, this is an extremely randomized trees ensemble.
# `min_impurity_split` is omitted because it was removed in scikit-learn 1.0.
lda = ExtraTreesClassifier(bootstrap=False,
                           ccp_alpha=0.0,
                           class_weight=None,
                           criterion='gini',
                           max_depth=60,
                           max_features='sqrt',
                           max_leaf_nodes=None,
                           max_samples=None,
                           min_impurity_decrease=0.0,
                           min_samples_leaf=2,
                           min_samples_split=10,
                           min_weight_fraction_leaf=0.0,
                           n_estimators=195,
                           n_jobs=None, oob_score=False,
                           random_state=6713, verbose=0,
                           warm_start=False)

# Fit on the training split
lda.fit(x_train,y_train)

ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=60, max_features='sqrt',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=2, min_samples_split=10,
                     min_weight_fraction_leaf=0.0, n_estimators=195,
                     n_jobs=None, oob_score=False, random_state=6713, verbose=0,
                     warm_start=False)

In [None]:
lda.score(x_train,y_train), lda.score(x_test,y_test)

(1.0, 0.9696969696969697)
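
`score` reports accuracy, while the submission consists of probabilities. Assuming the competition metric is multi-class log loss, it may be worth checking that quantity on the hold-out split as well. The sketch below does so on synthetic data from `make_classification`; the dataset, the `demo_*` names, and the hyperparameters are placeholders rather than the notebook's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for the leaf features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=3, random_state=0
)
X_a, X_b, y_a, y_b = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

demo_clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_a, y_a)

# Accuracy only looks at the top prediction ...
demo_acc = demo_clf.score(X_b, y_b)
# ... while log loss also penalises over- or under-confident probabilities.
demo_loss = log_loss(y_b, demo_clf.predict_proba(X_b), labels=demo_clf.classes_)

print(f"hold-out accuracy: {demo_acc:.3f}, hold-out log loss: {demo_loss:.3f}")
```

In this notebook the equivalent call would use `lda.predict_proba(x_test)` together with `y_test`.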

In [None]:
predicted=lda.predict_proba(test_data)

sample_df=pd.read_csv('drive/My Drive/Leaf Classification/sample_submission.csv.zip',index_col=False)
sample_df.head(2)

Unnamed: 0,id,Acer_Capillipes,Acer_Circinatum,Acer_Mono,Acer_Opalus,Acer_Palmatum,Acer_Pictum,Acer_Platanoids,Acer_Rubrum,Acer_Rufinerve,Acer_Saccharinum,Alnus_Cordata,Alnus_Maximowiczii,Alnus_Rubra,Alnus_Sieboldiana,Alnus_Viridis,Arundinaria_Simonii,Betula_Austrosinensis,Betula_Pendula,Callicarpa_Bodinieri,Castanea_Sativa,Celtis_Koraiensis,Cercis_Siliquastrum,Cornus_Chinensis,Cornus_Controversa,Cornus_Macrophylla,Cotinus_Coggygria,Crataegus_Monogyna,Cytisus_Battandieri,Eucalyptus_Glaucescens,Eucalyptus_Neglecta,Eucalyptus_Urnigera,Fagus_Sylvatica,Ginkgo_Biloba,Ilex_Aquifolium,Ilex_Cornuta,Liquidambar_Styraciflua,Liriodendron_Tulipifera,Lithocarpus_Cleistocarpus,Lithocarpus_Edulis,...,Quercus_Coccinea,Quercus_Crassifolia,Quercus_Crassipes,Quercus_Dolicholepis,Quercus_Ellipsoidalis,Quercus_Greggii,Quercus_Hartwissiana,Quercus_Ilex,Quercus_Imbricaria,Quercus_Infectoria_sub,Quercus_Kewensis,Quercus_Nigra,Quercus_Palustris,Quercus_Phellos,Quercus_Phillyraeoides,Quercus_Pontica,Quercus_Pubescens,Quercus_Pyrenaica,Quercus_Rhysophylla,Quercus_Rubra,Quercus_Semecarpifolia,Quercus_Shumardii,Quercus_Suber,Quercus_Texana,Quercus_Trojana,Quercus_Variabilis,Quercus_Vulcanica,Quercus_x_Hispanica,Quercus_x_Turneri,Rhododendron_x_Russellianum,Salix_Fragilis,Salix_Intergra,Sorbus_Aria,Tilia_Oliveri,Tilia_Platyphyllos,Tilia_Tomentosa,Ulmus_Bergmanniana,Viburnum_Tinus,Viburnum_x_Rhytidophylloides,Zelkova_Serrata
0,4,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,...,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101
1,7,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,...,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101,0.010101


In [None]:
df_sub=pd.DataFrame(predicted,columns=sample_df.columns[1:])
df_sub.head(2)

Unnamed: 0,Acer_Capillipes,Acer_Circinatum,Acer_Mono,Acer_Opalus,Acer_Palmatum,Acer_Pictum,Acer_Platanoids,Acer_Rubrum,Acer_Rufinerve,Acer_Saccharinum,Alnus_Cordata,Alnus_Maximowiczii,Alnus_Rubra,Alnus_Sieboldiana,Alnus_Viridis,Arundinaria_Simonii,Betula_Austrosinensis,Betula_Pendula,Callicarpa_Bodinieri,Castanea_Sativa,Celtis_Koraiensis,Cercis_Siliquastrum,Cornus_Chinensis,Cornus_Controversa,Cornus_Macrophylla,Cotinus_Coggygria,Crataegus_Monogyna,Cytisus_Battandieri,Eucalyptus_Glaucescens,Eucalyptus_Neglecta,Eucalyptus_Urnigera,Fagus_Sylvatica,Ginkgo_Biloba,Ilex_Aquifolium,Ilex_Cornuta,Liquidambar_Styraciflua,Liriodendron_Tulipifera,Lithocarpus_Cleistocarpus,Lithocarpus_Edulis,Magnolia_Heptapeta,...,Quercus_Coccinea,Quercus_Crassifolia,Quercus_Crassipes,Quercus_Dolicholepis,Quercus_Ellipsoidalis,Quercus_Greggii,Quercus_Hartwissiana,Quercus_Ilex,Quercus_Imbricaria,Quercus_Infectoria_sub,Quercus_Kewensis,Quercus_Nigra,Quercus_Palustris,Quercus_Phellos,Quercus_Phillyraeoides,Quercus_Pontica,Quercus_Pubescens,Quercus_Pyrenaica,Quercus_Rhysophylla,Quercus_Rubra,Quercus_Semecarpifolia,Quercus_Shumardii,Quercus_Suber,Quercus_Texana,Quercus_Trojana,Quercus_Variabilis,Quercus_Vulcanica,Quercus_x_Hispanica,Quercus_x_Turneri,Rhododendron_x_Russellianum,Salix_Fragilis,Salix_Intergra,Sorbus_Aria,Tilia_Oliveri,Tilia_Platyphyllos,Tilia_Tomentosa,Ulmus_Bergmanniana,Viburnum_Tinus,Viburnum_x_Rhytidophylloides,Zelkova_Serrata
0,0.001923,0.000733,0.0,0.000641,0.0,0.012656,0.0,0.0,0.000641,0.0,0.002051,0.0,0.003134,0.0,0.00057,0.0,0.001923,0.0,0.00057,0.0,0.005128,0.010806,0.0,0.0,0.0,0.0,0.002399,0.001374,0.0,0.0,0.0,0.014794,0.001465,0.0,0.0,0.000733,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.00057,0.0,0.091026,0.000641,0.008282,0.0,0.002727,0.00057,0.001026,0.0,0.005682,0.0,0.032027,0.099856,0.0,0.002877,0.03049,0.0,0.000641,0.0,0.0,0.128177,0.026962,0.0,0.001211,0.0,0.014849,0.0,0.0,0.0,0.0,0.004558,0.0,0.0,0.0,0.015582,0.001026
1,0.003663,0.000641,0.012821,0.027462,0.002564,0.005769,0.023917,0.002564,0.005037,0.0,0.0,0.0,0.002442,0.0,0.00057,0.0,0.0,0.002491,0.0,0.049674,0.0,0.000641,0.000733,0.0,0.0,0.0,0.0,0.001709,0.0,0.0,0.001026,0.000733,0.001282,0.0,0.0099,0.0,0.057285,0.003653,0.009402,0.009868,...,0.023028,0.0,0.004109,0.0,0.01198,0.000855,0.00114,0.0,0.001282,0.011679,0.051306,0.0,0.002613,0.0,0.00057,0.0,0.004943,0.040842,0.0,0.001923,0.009483,0.00057,0.00695,0.025539,0.004477,0.006296,0.028079,0.005454,0.02104,0.0,0.005085,0.014843,0.0,0.0,0.0,0.021272,0.007271,0.01964,0.001282,0.004772
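
Taking the column names from the sample submission works only if they are in the same order as the classifier's probability columns, which follow the sorted encoded labels. A hedged sketch of how that assumption could be verified is below; `toy_header`, `toy_probs`, and the three species are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for `le`, `sample_df.columns`, and `predicted`.
toy_encoder = LabelEncoder().fit(["Quercus_Rubra", "Acer_Opalus", "Ginkgo_Biloba"])
toy_header = ["id", "Acer_Opalus", "Ginkgo_Biloba", "Quercus_Rubra"]
toy_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])

# Name the columns from the encoder, which is the order predict_proba uses
# when the model was trained on the encoded labels.
toy_sub = pd.DataFrame(toy_probs, columns=toy_encoder.classes_)

# Confirm the submission header agrees, then reorder explicitly to be safe.
columns_match = list(toy_sub.columns) == toy_header[1:]
toy_sub = toy_sub[toy_header[1:]]

print(columns_match)
print(toy_sub.head(2))
```

Naming the columns from the encoder and then reordering by the submission header avoids silently mislabelling probabilities if the two orders ever differ.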


In [None]:
df_sub1=pd.DataFrame(test_id)
df_sub1.head(2)

Unnamed: 0,id
0,4
1,7


In [None]:
final_sub=pd.concat([df_sub1,df_sub],axis=1)
final_sub.to_csv('drive/My Drive/Leaf Classification/leaf_classification_model_submission_1.csv',index=False)
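
Before uploading, a few quick checks on the assembled frame can catch common problems. The sketch below runs them on a small made-up frame (`toy_ids`, `toy_probs`, and the column names `A`, `B`, `C` are hypothetical). Note that `pd.concat(..., axis=1)` aligns rows by index, so both pieces need matching indexes, as they do here and in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `df_sub1` (ids) and `df_sub` (probabilities).
toy_ids = pd.DataFrame({"id": [4, 7]})
toy_probs = pd.DataFrame([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], columns=["A", "B", "C"])

toy_final = pd.concat([toy_ids, toy_probs], axis=1)

# Each row of probabilities should sum to 1.
rows_sum_to_one = bool(np.allclose(toy_final.drop(columns="id").sum(axis=1), 1.0))
# No NaNs should appear; they would indicate misaligned indexes in the concat.
has_missing = bool(toy_final.isnull().values.any())
# Each test image should appear exactly once.
ids_unique = bool(toy_final["id"].is_unique)

print(rows_sum_to_one, has_missing, ids_unique)
```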