[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ishandandekar/Looking-Fruit/blob/main/Looking_Fruit_nbk.ipynb)

# Looking_Fruit

👋 Hello and welcome to the **Looking_Fruit** notebook. In this notebook, I try to replicate the [Fruits-360](https://www.researchgate.net/publication/321475443_Fruit_recognition_from_images_using_deep_learning) research paper, in which the researchers classify images of **131** fruits and vegetables. The data used for these modelling experiments is provided by the paper's authors.

In [None]:
# Check for GPU
!nvidia-smi -L

## Step 0: Defining the problem


**Objective:**  
To classify images of various fruits and vegetables with the best possible F1-score.  

**Files:**
- *Train*: This folder contains one subfolder per fruit/vegetable, named after it. Each subfolder holds images of that fruit/vegetable. This folder is used for training.
- *Test*: This folder has the same layout: one subfolder per fruit/vegetable, each holding images of that fruit/vegetable. This folder is used for testing.

## Step 1: Getting the data
The data used for this project is publicly available on [Kaggle](https://www.kaggle.com/datasets/ishandandekar/fruitimagedataset).

- Use Kaggle's API to download the data into Colab.
- Get utility functions for later use.
- Set up the data paths so the files can be read from Python.


In [None]:
# Getting the helper functions script
!wget https://raw.githubusercontent.com/ishandandekar/Looking-Fruit/main/helper_functions.py

# Get the necessary functions from the python script
from helper_functions import plot_loss_curves, unzip_data

--2022-08-21 18:21:36--  https://raw.githubusercontent.com/ishandandekar/Looking-Fruit/main/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1215 (1.2K) [text/plain]
Saving to: ‘helper_functions.py’


2022-08-21 18:21:36 (32.6 MB/s) - ‘helper_functions.py’ saved [1215/1215]
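
The body of `unzip_data` lives in `helper_functions.py` and is not shown in this notebook. As a rough sketch only, a helper of this kind could be written with the standard `zipfile` module. The name `unzip_archive` and its `target_dir` parameter are hypothetical and are not taken from the helper script.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, target_dir="."):
    """Extract every member of zip_path into target_dir and return the member names.

    Hypothetical stand-in for a helper such as unzip_data; the real
    implementation in helper_functions.py may differ.
    """
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target)
        return archive.namelist()
```

Returning the member names makes it easy to check where the archive placed its top-level folders before hard-coding paths.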



In [None]:
# Install the kaggle library
!pip install -q kaggle

# Upload the Kaggle API keys
from google.colab import files
files.upload()

# Create the config folder (-p avoids an error if the cell is re-run)
!mkdir -p ~/.kaggle

# Copy the json file to the folder
!cp kaggle.json ~/.kaggle

# Change permissions for json to work with the Kaggle API
!chmod 600 ~/.kaggle/kaggle.json

# Download the dataset
!kaggle datasets download -d ishandandekar/fruitimagedataset

# Unzip data
unzip_data('fruitimagedataset.zip')

Saving kaggle.json to kaggle.json
Downloading fruitimagedataset.zip to /content
 97% 385M/398M [00:02<00:00, 124MB/s]
100% 398M/398M [00:02<00:00, 165MB/s]


## Step 2: Know more about the data

- Get the statistics about the data.
- Check if the labels are imbalanced.
- Visualize random samples in data.
- (*If required*) Trim data.
- (*If required*) Preprocess the data.
- Make data processing faster using `ImageDataGenerator`.

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import random
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
# Setting up necessary variables
TRAIN_PATH = '/content/data/train/train'
TEST_PATH = '/content/data/test/test'

In [None]:
# Checking the number of classes
# Each subfolder of TRAIN_PATH is one class; sort for a stable label order
classes = sorted(
    name for name in os.listdir(TRAIN_PATH)
    if os.path.isdir(os.path.join(TRAIN_PATH, name))
)

print(f"Total number of classes to deal with: {len(classes)}")

In [None]:
# Show random classes present in the dataset
list_of_five_random_labels = random.sample(classes, 5)
list_of_five_random_labels

In [None]:
# Number of images of each fruit/vegetable as a pandas.DataFrame

# List to append the count of images
number_of_images_train = []

for label in classes:
    path = f'{TRAIN_PATH}/{label}'
    count = len(os.listdir(path))
    number_of_images_train.append(count)

train_image_count_df = pd.DataFrame({"Label":classes,"Number of Images":number_of_images_train})

# To view first 10 rows of the dataframe
train_image_count_df.head(10)

In [None]:
# Labels with the most and the fewest images
print("Label with the most images:")
print(train_image_count_df.sort_values("Number of Images", ascending=False).head(1))
print("Label with the fewest images:")
print(train_image_count_df.sort_values("Number of Images", ascending=True).head(1))
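
One way to put a number on label imbalance is the ratio between the largest and the smallest class. The sketch below works on a plain mapping of label to image count. The function name `imbalance_ratio` and the sample counts are made up for illustration and are not values from the dataset.

```python
def imbalance_ratio(counts):
    """Return (largest label, smallest label, max/min ratio) for a label -> count mapping."""
    if not counts:
        raise ValueError("counts must not be empty")
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    if counts[smallest] == 0:
        raise ValueError(f"class {smallest!r} has no images")
    return largest, smallest, counts[largest] / counts[smallest]


# Illustrative counts only, not taken from the dataset
example_counts = {"class_a": 400, "class_b": 500, "class_c": 250}
largest, smallest, ratio = imbalance_ratio(example_counts)
print(f"{largest} vs {smallest}: ratio {ratio:.2f}")
```

With the DataFrame built above, the mapping could be obtained with `dict(zip(train_image_count_df["Label"], train_image_count_df["Number of Images"]))`. A ratio close to 1 suggests the classes are roughly balanced.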

In [None]:
# Show random sample from training of a random fruit/vegetable

random_label = random.choice(classes)
sample_path = f'{TRAIN_PATH}/{random_label}'
random_image = random.choice(os.listdir(sample_path))
random_image_path = f'{sample_path}/{random_image}'

img = mpimg.imread(random_image_path)
imgplot = plt.imshow(img)
plt.axis(False)
plt.title(f'{random_label}')
plt.show()

In [None]:
# Creating ImageDataGenerators for better data processing

# Image size has been specified in the research paper
IMAGE_SIZE = (100,100)
BATCH_SIZE = 32

# Augmentation is disabled for now: every transform is set to its "off" value
train_datagen = ImageDataGenerator(width_shift_range=0,
                                   height_shift_range=0,
                                   zoom_range=0,
                                   horizontal_flip=False,
                                   vertical_flip=False)

# Test data is left untouched; the generator is only used for batched loading
test_datagen = ImageDataGenerator()

# flow_from_directory infers labels from the subfolder names,
# so it takes no `labels` argument
train_gen = train_datagen.flow_from_directory(TRAIN_PATH,
                                              target_size=IMAGE_SIZE,
                                              class_mode='sparse',
                                              batch_size=BATCH_SIZE,
                                              shuffle=True,
                                              classes=classes)

test_gen = test_datagen.flow_from_directory(TEST_PATH,
                                            target_size=IMAGE_SIZE,
                                            class_mode='sparse',
                                            batch_size=BATCH_SIZE,
                                            shuffle=False,
                                            classes=classes)
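
With a fixed batch size, a generator yields `ceil(n / batch_size)` batches per epoch, and the last batch may be smaller than the rest. The arithmetic can be sketched as follows. The helper names and the image count of 1000 are illustrative assumptions, not values from the dataset.

```python
import math


def batches_per_epoch(num_images, batch_size=32):
    """Number of batches needed to cover num_images once."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return math.ceil(num_images / batch_size)


def last_batch_size(num_images, batch_size=32):
    """Size of the final (possibly partial) batch in an epoch."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    remainder = num_images % batch_size
    return remainder if remainder else batch_size


# Illustrative image count only
print(batches_per_epoch(1000), last_batch_size(1000))  # 32 batches, last one holds 8 images
```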

## Step 3: Describing modelling experiments

- This notebook contains 7 models built to get the best **f1-score** on the test dataset. These models also include the models made by the researchers themselves.  
- Models to be made:
  1. **Model 0** : A simple model with fully connected multiple Dense layers; this model acts as a baseline.
  1. **Model 1** : 2 pairs of CNN and MaxPool layers with a Flatten layer and Dense layer in the end for classification.
  1. **Model 2** : Multiple CNN layers, MaxPool layers with a Flatten layer and Dense layer in the end; *should get better results from this.*
  1. **Model 3** : Using transfer learning, exploit ResNet model for classification.
  1. **Model 4** : Using transfer learning, exploit EfficientNetBx for classification.
  1. **Model 5** : Use fine-tuned ResNet model for classification.
  1. **Model 6** : Use fine-tuned EfficientNetBx for classification.
- Get classification metrics for each model.
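
The notebook does not state which averaging is used for the F1-score. Assuming macro averaging (each class weighted equally), the metric can be sketched in plain Python as below. The function name `macro_f1` and the toy labels are illustrative only.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores.

    Assumption: macro averaging; a class with no true or predicted
    samples contributes an F1 of 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

In practice, scikit-learn's `f1_score` with `average="macro"` computes the same quantity and is the more convenient choice.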


#### Model 0

#### Model 1

#### Model 2

#### Model 3

#### Model 4

#### Model 5

#### Model 6

## Step 4: Compare results and conclude experiments
- Test each model on the given test dataset.
- Use graphs and matrices to visualize results.
- (*Optional*) Tune hyperparameters of the best model.
- Compare the best model's results with the researchers' best model.
- Export the best model.
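
For the matrix-based visualization mentioned above, a confusion matrix is a common choice. A minimal sketch, assuming integer class indices such as those produced by `class_mode='sparse'`, is shown below. The function name `build_confusion_matrix` and the toy labels are illustrative.

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Return a num_classes x num_classes matrix as nested lists.

    Rows are true classes and columns are predicted classes, so
    matrix[i][j] counts samples of class i predicted as class j.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Toy labels for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
for row in build_confusion_matrix(y_true, y_pred, num_classes=3):
    print(row)
```

Because the test generator is created with `shuffle=False`, its `classes` attribute should line up with the order of the model's predictions, so the true labels and the argmax of the predicted probabilities could be passed in directly. The resulting matrix can then be drawn with `plt.imshow`.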
