# Computer Vision: Plants Classification

This dataset is based on the **Plant Seedlings Dataset**, which contains annotated RGB images of approximately 960 unique plants belonging to 12 species at several growth stages, recorded at a resolution of about 10 pixels per mm.

The dataset includes the following species:


|English name|Latin name          |EPPO code|
|:-----------|:-------------------|:---|
|Maize       |Zea mays L.         |ZEAMX|
|Common wheat|Triticum aestivum L.|TRZAX|
|Sugar beet|Beta vulgaris var. altissima|BEAVA|
|Scentless Mayweed|Matricaria perforata Mérat|MATIN|
|Common Chickweed|Stellaria media|STEME|
|Shepherd’s Purse|Capsella bursa-pastoris|CAPBP|
|Cleavers|Galium aparine L.|GALAP|
|Charlock|Sinapis arvensis L.|SINAR|
|Fat Hen|Chenopodium album L.|CHEAL|
|Small-flowered Cranesbill|Geranium pusillum|GERSS|
|Black-grass|Alopecurus myosuroides|ALOMY|
|Loose Silky-bent|Apera spica-venti|APESV|

Your mission, should you choose to accept it, consists of:
- creating a model that classifies the full range of categories as accurately as possible;
- saving the model for further analysis.

If you're caught or killed during the mission, the dataen team will disavow any knowledge of your actions. This notebook will not self-destruct (disappointing, right?). Good luck!
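The notebook imports `pickle`, which is one straightforward way to handle the "save the model" part of the mission. A minimal sketch, assuming the final model is a picklable Python object (as scikit-learn style estimators are; deep-learning frameworks usually provide their own save methods instead). `save_model`, `load_model` and the path `./models/plants_model.pkl` are hypothetical names, and the dictionary below is only a placeholder for a trained classifier.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable model object to `path` (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create ./models/ if it is missing
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load an object previously written by save_model (hypothetical helper)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Placeholder object standing in for a trained classifier
model = {"name": "placeholder", "n_classes": 12}
save_model(model, "./models/plants_model.pkl")
restored = load_model("./models/plants_model.pkl")
print(restored == model)  # True
```

Only unpickle files from sources you trust: `pickle.load` can execute arbitrary code embedded in a malicious file.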


```python
%matplotlib inline

import os
import sys
from time import time
import pickle
import pathlib
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import pandas_profiling  # renamed to ydata_profiling in newer releases
from tqdm.notebook import tqdm  # replaces the deprecated tqdm_notebook import

np.random.seed(42)
```

## 1. Data Preparation
### 1.1 Load data

```python
PLANT_CLASSES = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
                 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed',
                 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']
# Class name -> integer label, and the inverse mapping
CLASSES_DICT_NAMES = {name: k for k, name in enumerate(PLANT_CLASSES)}
CLASSES_DICT_NUM = dict(enumerate(PLANT_CLASSES))

# The three compressed CSV parts that together make up the dataset
DF_PART1 = "./data/plants_part1.gz"
DF_PART2 = "./data/plants_part2.gz"
DF_PART3 = "./data/plants_part3.gz"
RESHAPE_SIZE = (65, 65, 3)  # height, width, RGB channels
RANDOM_STATE = 42

CLASSES_DICT_NAMES
```

```
{'Black-grass': 0,
 'Charlock': 1,
 'Cleavers': 2,
 'Common Chickweed': 3,
 'Common wheat': 4,
 'Fat Hen': 5,
 'Loose Silky-bent': 6,
 'Maize': 7,
 'Scentless Mayweed': 8,
 'Shepherds Purse': 9,
 'Small-flowered Cranesbill': 10,
 'Sugar beet': 11}
```
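The two dictionaries translate between class names and integer labels, which is the form most classifiers expect as targets. A quick sketch of a round trip through both mappings, assuming they are used for exactly that purpose; the three names below are just sample inputs.

```python
import numpy as np

PLANT_CLASSES = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
                 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed',
                 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']
CLASSES_DICT_NAMES = {name: k for k, name in enumerate(PLANT_CLASSES)}
CLASSES_DICT_NUM = dict(enumerate(PLANT_CLASSES))

# The two dictionaries should be exact inverses of each other
assert all(CLASSES_DICT_NAMES[CLASSES_DICT_NUM[k]] == k for k in CLASSES_DICT_NUM)

# Encode a few class names to integer labels, then decode them back
names = ['Maize', 'Black-grass', 'Sugar beet']
codes = np.array([CLASSES_DICT_NAMES[n] for n in names])
decoded = [CLASSES_DICT_NUM[int(c)] for c in codes]
print(codes.tolist())  # [7, 0, 11]
print(decoded == names)  # True
```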

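```python
df_p1 = pd.read_csv(DF_PART1)  # gzip compression is inferred from the .gz extension
df_p2 = pd.read_csv(DF_PART2)
df_p3 = pd.read_csv(DF_PART3)
# ignore_index=True avoids duplicate row indices across the three parts
df = pd.concat([df_p1, df_p2, df_p3], axis=0, ignore_index=True)
print(df.shape)  # only a cell's last expression is displayed, so print explicitly
df.columns[:10]
```

```
Index(['label', 'class', '0', '1', '2', '3', '4', '5', '6', '7'], dtype='object')
```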

We can ignore the `label` column; the `class` column holds the target we must use for classification.

The remaining columns hold the pixel values of each image, flattened into a single row, so we must reshape each row into 65x65x3 to recover the images.
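Since each image is 65 × 65 × 3 = 12675 values, every row should have 12677 columns once `label` and `class` are counted. The sketch below checks that arithmetic on a small synthetic frame with the same layout (the two rows and their pixel values are made up) and shows where a flat column ends up after reshaping. It assumes the pixels were flattened in row-major (C) order with the channel last, which is what a plain `reshape(-1, 65, 65, 3)` implies; if the data had been stored channel-first, the reshaped images would come out scrambled.

```python
import numpy as np
import pandas as pd

RESHAPE_SIZE = (65, 65, 3)
n_pixels = int(np.prod(RESHAPE_SIZE))  # 65 * 65 * 3 = 12675

# Synthetic two-row frame mimicking the layout: 'label', 'class', then '0'..'12674'
rng = np.random.default_rng(42)
pixels = rng.integers(0, 256, size=(2, n_pixels))
fake = pd.DataFrame(pixels, columns=[str(i) for i in range(n_pixels)])
fake.insert(0, 'class', ['Maize', 'Charlock'])
fake.insert(0, 'label', [0, 1])
assert fake.shape[1] == 2 + n_pixels  # 12677 columns in total

images = fake.drop(columns=['label', 'class']).to_numpy().reshape(-1, *RESHAPE_SIZE)
print(images.shape)  # (2, 65, 65, 3)

# In C order, flat column j lands at (row, col, channel) = unravel_index(j, RESHAPE_SIZE)
row, col, ch = np.unravel_index(100, RESHAPE_SIZE)
assert images[0, row, col, ch] == pixels[0, 100]
```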

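```python
df_labels = df[['class']]  # target column, kept as a one-column DataFrame
df = df.drop(columns=['label', 'class'])  # keep only the pixel columns
df_images = df.to_numpy().reshape(-1, *RESHAPE_SIZE)  # shape: (n_images, 65, 65, 3)
```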
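Before training, the class names in `df_labels` have to become integer targets. A minimal sketch, assuming `CLASSES_DICT_NAMES` is the intended encoding; `encode_labels` is a hypothetical helper and the four-row frame is made-up data standing in for the real `df_labels`. The helper refuses unknown names, which would catch, for example, a "Shepherd’s Purse" written with the apostrophe used in the species table instead of the `'Shepherds Purse'` spelling in `PLANT_CLASSES`.

```python
import numpy as np
import pandas as pd

PLANT_CLASSES = ['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
                 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed',
                 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']
CLASSES_DICT_NAMES = {name: k for k, name in enumerate(PLANT_CLASSES)}


def encode_labels(df_labels):
    """Map the 'class' column to integer targets, failing loudly on unknown names."""
    names = df_labels['class']
    unknown = sorted(set(names) - set(CLASSES_DICT_NAMES))
    if unknown:
        raise ValueError(f'Unknown class names: {unknown}')
    return names.map(CLASSES_DICT_NAMES).to_numpy(dtype=np.int64)


# Made-up stand-in for the real df_labels
df_labels = pd.DataFrame({'class': ['Maize', 'Charlock', 'Maize', 'Sugar beet']})
y = encode_labels(df_labels)
counts = np.bincount(y, minlength=len(PLANT_CLASSES))  # number of images per class index
print(y.tolist())  # [7, 1, 7, 11]
print(dict(zip(PLANT_CLASSES, counts.tolist())))
```

Passing `minlength` to `np.bincount` yields a count for all 12 classes, including any that are absent, which makes it a quick check of class balance on the real data.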