### DOMAIN: 
Botanical Research
### CONTEXT:
University X is conducting research into the characteristics of plants and plant seedlings at various stages of growth. They have already invested in curating sample images. They require an automated solution that can build a classifier capable of determining a plant's species from a photo.
### DATA DESCRIPTION:
The dataset comprises images of 12 plant species.
Source: https://www.kaggle.com/c/plant-seedlings-classification/data.
### PROJECT OBJECTIVE:
To create a classifier capable of determining a plant's species from a photo.


### 1. Import and Understand the data
#### A. Extract ‘plant-seedlings-classification.zip’ into a new folder (unzipped) using Python

In [2]:
pip install opencv-python

Collecting opencv-python
  Downloading opencv_python-4.7.0.68-cp37-abi3-win_amd64.whl (38.2 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.7.0.68
Note: you may need to restart the kernel to use updated packages.


In [1]:
# Import Basic Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import os
import tensorflow as tf
print(tf.__version__) 
import cv2
from glob import glob

2.9.1


In [7]:
# Import ZipFile module
from zipfile import ZipFile
  
# specifying the zip file name
file_name = "C:\\Users\\laxmimanasaa\\Downloads\\plant-seedlings-classification (2).zip"
  
# opening the zip file in READ mode
# (the handle is named zf so it does not shadow Python's built-in zip())
with ZipFile(file_name, 'r') as zf:
    # printing all the contents of the zip file
    zf.printdir()

    # extracting all the files into the 'unzipped' folder
    print('Extracting all the files now...')
    zf.extractall('unzipped')
    print('Done!')

File Name                                             Modified             Size
plant-seedlings-classification/                2021-10-08 11:16:30            0
plant-seedlings-classification/.DS_Store       2021-10-08 11:16:36         6148
__MACOSX/plant-seedlings-classification/._.DS_Store 2021-10-08 11:16:36          120
plant-seedlings-classification/train/          2021-10-05 16:09:24            0
plant-seedlings-classification/train/Cleavers/ 2021-10-05 16:09:26            0
plant-seedlings-classification/train/.DS_Store 2021-10-06 16:30:12        10244
__MACOSX/plant-seedlings-classification/train/._.DS_Store 2021-10-06 16:30:12          120
plant-seedlings-classification/train/Sugar beet/ 2021-10-05 15:12:52            0
plant-seedlings-classification/train/Common Chickweed/ 2021-10-05 15:12:46            0
plant-seedlings-classification/train/Loose Silky-bent/ 2021-10-05 15:12:48            0
plant-seedlings-classification/train/Scentless Mayweed/ 2021-10-05 15:12:50           

plant-seedlings-classification/train/Scentless Mayweed/6d80eac2a.png 2021-10-05 15:49:14        14779
plant-seedlings-classification/train/Scentless Mayweed/866893cf2.png 2021-10-05 15:49:14         8552
plant-seedlings-classification/train/Scentless Mayweed/549c186a8.png 2021-10-05 15:49:14         6231
plant-seedlings-classification/train/Scentless Mayweed/8842741cb.png 2021-10-05 15:49:14         8498
plant-seedlings-classification/train/Scentless Mayweed/b15980d50.png 2021-10-05 15:49:14        35762
plant-seedlings-classification/train/Scentless Mayweed/a0a13a1fe.png 2021-10-05 15:49:14        47403
plant-seedlings-classification/train/Scentless Mayweed/1ed148332.png 2021-10-05 15:49:14        17496
plant-seedlings-classification/train/Scentless Mayweed/9ab3b61db.png 2021-10-05 15:49:14       825803
plant-seedlings-classification/train/Scentless Mayweed/18387e60f.png 2021-10-05 15:49:14         4530
plant-seedlings-classification/train/Scentless Mayweed/8f2534b22.png 2021-10-05 15

plant-seedlings-classification/train/Charlock/4e0cef11d.png 2021-10-05 15:49:08      1171304
plant-seedlings-classification/train/Charlock/2fd604008.png 2021-10-05 15:49:08        39000
plant-seedlings-classification/train/Charlock/1b534df5b.png 2021-10-05 15:49:08       238086
plant-seedlings-classification/train/Charlock/c61d3ee3c.png 2021-10-05 15:49:10       335661
plant-seedlings-classification/train/Charlock/a8e7520de.png 2021-10-05 15:49:10      2338976
plant-seedlings-classification/train/Charlock/d733b32d8.png 2021-10-05 15:49:10       914348
plant-seedlings-classification/train/Charlock/c5cca5955.png 2021-10-05 15:49:10       314121
plant-seedlings-classification/train/Charlock/8b3f0fba7.png 2021-10-05 15:49:08       241803
plant-seedlings-classification/train/Charlock/f340a3378.png 2021-10-05 15:49:10      1934121
plant-seedlings-classification/train/Charlock/0fa930fa9.png 2021-10-05 15:49:08      2588434
plant-seedlings-classification/train/Charlock/ae813adcd.png 2021-10-05

Done!
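
The listing above includes macOS metadata entries (`__MACOSX/...` and `.DS_Store`) that are not part of the dataset. A minimal sketch of an extraction that skips them, assuming the same archive layout; the helper name `extract_without_macos_metadata` and its filter rule are illustrative and not part of the original notebook:

```python
import os
from zipfile import ZipFile


def extract_without_macos_metadata(zip_path, target_dir):
    """Extract a zip archive, skipping __MACOSX/ and .DS_Store entries.

    Hypothetical helper for illustration; returns the extracted member names.
    """
    extracted = []
    with ZipFile(zip_path, 'r') as archive:
        for name in archive.namelist():
            # Zip member names always use '/' as the separator
            parts = name.split('/')
            if parts[0] == '__MACOSX' or parts[-1] == '.DS_Store':
                continue
            archive.extract(name, target_dir)
            extracted.append(name)
    return extracted
```

It could be called as `extract_without_macos_metadata(file_name, 'unzipped')` in place of `extractall`.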


#### B. Map the images from train folder with train labels to form a DataFrame.

In [20]:
# There are 12 classes/categories and folders in the train folder
# Create list of folder names
folders = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent',
              'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']

# The number of folders determines the number of classes/categories for classification
num_folders = len(folders)
num_folders

12
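
The folder names can also be read from disk rather than typed by hand, which avoids spelling mistakes. A minimal sketch, assuming every sub-directory of the train folder is one class; `list_class_folders` is a hypothetical helper name:

```python
import os


def list_class_folders(train_dir):
    """Return the sorted names of the sub-directories of train_dir."""
    return sorted(
        entry for entry in os.listdir(train_dir)
        # Keep directories only; this drops stray files such as .DS_Store
        if os.path.isdir(os.path.join(train_dir, entry))
    )
```

Once the path to the train folder is known, `folders = list_class_folders(train_dir)` would replace the hard-coded list. Sorting keeps the class order stable across runs.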

In [33]:
# Check number of images in each folder/category
data_dir = "C:\\Users\\laxmimanasaa\\Downloads\\plant-seedlings-classification\\plant-seedlings-classification"
train_dir = os.path.join(data_dir, 'train')
for category in folders:
    print('{} {} images'.format(category, len(os.listdir(os.path.join(train_dir, category)))))

Black-grass 263 images
Charlock 390 images
Cleavers 287 images
Common Chickweed 611 images
Common wheat 221 images
Fat Hen 475 images
Loose Silky-bent 654 images
Maize 221 images
Scentless Mayweed 516 images
Shepherds Purse 231 images
Small-flowered Cranesbill 496 images
Sugar beet 385 images
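
The counts above are uneven across species. A short sketch that quantifies the imbalance, using the numbers copied from this output; the variable names are illustrative:

```python
# Image counts per species, copied from the output above
counts = {
    'Black-grass': 263, 'Charlock': 390, 'Cleavers': 287,
    'Common Chickweed': 611, 'Common wheat': 221, 'Fat Hen': 475,
    'Loose Silky-bent': 654, 'Maize': 221, 'Scentless Mayweed': 516,
    'Shepherds Purse': 231, 'Small-flowered Cranesbill': 496,
    'Sugar beet': 385,
}

total = sum(counts.values())
largest = max(counts, key=counts.get)
smallest_count = min(counts.values())
# Ratio between the most and least represented classes
imbalance_ratio = counts[largest] / smallest_count

# Print each class with its share of the training set, largest first
for species, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print('{:<26} {:>4}  {:5.1f}%'.format(species, n, 100 * n / total))
print('Total images:', total)
print('Largest / smallest class: {:.2f}'.format(imbalance_ratio))
```

A ratio well above 1 suggests checking per-class metrics later, not only overall accuracy.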


In [34]:
# Create dataframe
train = []
for category_id, category in enumerate(folders):
    for file in os.listdir(os.path.join(train_dir, category)):
        train.append(['train/{}/{}'.format(category, file), category_id, category])
train = pd.DataFrame(train, columns=['Name of Image', 'Type of Image', 'Actual Image'])
train.head()

                     Name of Image  Type of Image  Actual Image
0  train/Black-grass/0050f38b3.png              0   Black-grass
1  train/Black-grass/0183fdf68.png              0   Black-grass
2  train/Black-grass/0260cffa8.png              0   Black-grass
3  train/Black-grass/05eedce4d.png              0   Black-grass
4  train/Black-grass/075d004bc.png              0   Black-grass


In [35]:
# Check shape of dataframe
train.shape

(4750, 3)

#### C. Write a function that selects n random images and displays each image along with its species.

In [None]:
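# Define function
def PlotNRandomImages(N=5):
    # Keep only the image path and species name, then sample N random rows
    temp = train.drop(['Type of Image'], axis=1)
    temp = temp.sample(N)
    temp = temp.values.tolist()

    for i, j in temp:
        # data_dir has no trailing separator, so join the parts with os.path.join
        img = cv2.imread(os.path.join(data_dir, i))
        # OpenCV loads images as BGR; convert to RGB so matplotlib shows true colours
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        plt.title(j)
        plt.axis('off')
        plt.show()

# Call function
PlotNRandomImages()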
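
`cv2.imread` returns pixel data in BGR channel order, while `plt.imshow` interprets a three-channel array as RGB, so an unconverted image is displayed with red and blue swapped. A tiny sketch of the channel swap on a synthetic one-pixel array (no image file is read here):

```python
import numpy as np

# A 1x1 "image" holding a pure blue pixel in OpenCV's BGR channel order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]
```

For three-channel images this gives the same result as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.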

### 2. Data preprocessing
#### A. Create X & Y from the DataFrame.

In [38]:
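# Load images
# The images sit in per-species subfolders of train/, so the pattern needs a
# wildcard for the folder level; a pattern of train\*.png matches no files
# and yields empty results with shapes ((0,), (0, 0)).
images_path = os.path.join(train_dir, '*', '*.png')
images = glob(images_path)
train_images = []
train_labels = []

for img in images:
    train_images.append(cv2.imread(img))
    # The parent folder name is the species label; os.path handles Windows separators
    train_labels.append(os.path.basename(os.path.dirname(img)))

# The images may differ in size, so hold them in an object array until they are resized
X = np.empty(len(train_images), dtype=object)
for idx, image in enumerate(train_images):
    X[idx] = image
Y = pd.DataFrame(train_labels)

X.shape, Y.shape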
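
Splitting a Windows path on `'/'` does not isolate the species folder, because the separators are backslashes. A minimal sketch of a separator-independent way to read the label from a path; `species_from_path` and the example path are hypothetical:

```python
from pathlib import PureWindowsPath


def species_from_path(image_path):
    """Return the name of the folder that directly contains the image."""
    # PureWindowsPath accepts both '\\' and '/' as separators
    return PureWindowsPath(image_path).parent.name


example = r'C:\data\train\Maize\0a1b2c3d4.png'  # hypothetical path

# Splitting on '/' leaves the whole path in a single piece
print(len(example.split('/')))
print(species_from_path(example))
```

Because the split produces a single element, indexing it with `[-2]` would raise an `IndexError`.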