## Experiment Introduction

### Experimental Background

Images carry a large amount of information, which usually takes many words to describe clearly. Of all the ways of processing images, classification is the most fundamental. In this experiment, we use a convolutional neural network to recognize flowers photographed in real environments, building the model with Keras (TensorFlow backend), a deep learning framework, to solve the image classification problem.
The images in the dataset are divided into five classes: daisy, tulip, rose, sunflower, and dandelion. The photos are low resolution, about 320x240 pixels. They are not scaled to a single size, and they have different aspect ratios.

### Experimental Environment

- Huawei ModelArts platform
    - Work environment: Multi-Engine 1.0
    - Instance flavor: 8 vCPUs | 32 GiB

- Python 3.6.5
- TensorFlow 1.13.1
- Keras 2.2.4
- Matplotlib 3.2.2
- NumPy 1.18.5
- OpenCV 3.4.1
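The notebook installs packages with pip in its first cells, so the versions actually in use may differ from the list above. A quick way to check is to read each module's `__version__` attribute. This is a minimal sketch; `report_versions` is a hypothetical helper, not part of the original notebook, and it treats any import failure as "not installed".

```python
import importlib


def report_versions(modules=("tensorflow", "keras", "numpy", "matplotlib", "cv2", "sklearn")):
    """Return {module_name: version string} for each module name given.

    Modules that fail to import are reported as 'not installed'; modules
    without a __version__ attribute are reported as 'unknown'.
    """
    versions = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
            versions[name] = str(getattr(mod, "__version__", "unknown"))
        except Exception:  # broken installs can raise more than ImportError
            versions[name] = "not installed"
    return versions
```

Usage: `print(report_versions())` lists the installed version of each package used in this experiment.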

### Purpose of the Experiment

1. Deepen understanding of the process of building neural networks with Keras
2. Explore the impact that unbalanced data can have
3. Learn how to use mature, pretrained models to make efficient use of existing datasets

**<font color='red'>Note: Don't worry if a WARNING or UserWarning appears in the output; it does not affect the results. </font>**


## Import Python Packages

### Import the relevant Python packages and modules

All the packages needed for this experiment are imported here; you can add your own as needed.

In [1]:
!pip install tensorflow==2.10.1 opencv-python matplotlib

Collecting tensorflow==2.10.1
  Using cached tensorflow-2.10.1-cp39-cp39-win_amd64.whl (455.9 MB)
Collecting opencv-python
  Using cached opencv_python-4.7.0.68-cp37-abi3-win_amd64.whl (38.2 MB)
Collecting matplotlib
  Using cached matplotlib-3.6.3-cp39-cp39-win_amd64.whl (7.2 MB)
Collecting h5py>=2.9.0
  Using cached h5py-3.8.0-cp39-cp39-win_amd64.whl (2.6 MB)
Collecting flatbuffers>=2.0
  Using cached flatbuffers-23.1.21-py2.py3-none-any.whl (26 kB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Using cached tensorflow_io_gcs_filesystem-0.30.0-cp39-cp39-win_amd64.whl (1.5 MB)
Collecting keras-preprocessing>=1.1.1
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting gast<=0.4.0,>=0.2.1
  Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting termcolor>=1.1.0
  Using cached termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting google-pasta>=0.1.1
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting keras<2.11,>=2.10.0
  Using cached 

In [2]:
!pip install -U scikit-learn scipy

Collecting scikit-learn
  Using cached scikit_learn-1.2.1-cp39-cp39-win_amd64.whl (8.4 MB)
Collecting scipy
  Using cached scipy-1.10.0-cp39-cp39-win_amd64.whl (42.5 MB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting joblib>=1.1.1
  Using cached joblib-1.2.0-py3-none-any.whl (297 kB)
Installing collected packages: threadpoolctl, scipy, joblib, scikit-learn
Successfully installed joblib-1.2.0 scikit-learn-1.2.1 scipy-1.10.0 threadpoolctl-3.1.0


In [3]:
# layers contains the common network layers.
# optimizers contains the common optimizers.
# Sequential builds a linear (layer-by-layer, start to finish) network.
# Model is the functional-API model class, used to build more complex models.
from keras import layers, optimizers, Sequential, Model
# applications contains pretrained models commonly used for transfer learning.
from keras import applications
# ImageDataGenerator is used for image (data) augmentation.
from keras.preprocessing.image import ImageDataGenerator

# Common packages: glob and os work with files and folders.
import glob
import os
# cv2 = OpenCV
import cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

## Data reading and visualization

### Define parameters related to dataset reading

In [4]:
# Path of the dataset used for training and validation.
# path = '../DL _data/flower_photos/'
path = '../Deep-Learning-Tutorial/DL _data/flower_photos/'
# Images are resized to 128*128*3 (width * height * channels).
w, h, c = 128, 128, 3

# Fixing the seed makes the generated random numbers reproducible: the same seed always yields the same sequence.
# This value is passed to the random_state parameter of train_test_split.
seed = 785
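The effect of `seed` can be seen in isolation: calling `train_test_split` twice with the same `random_state` returns identical splits. A minimal sketch on hypothetical toy arrays standing in for the flower images and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

seed = 785

# Hypothetical toy data: 10 samples with 2 features each, and their labels
x = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])

# Two splits with the same random_state shuffle the data in the same way
x_tr1, x_te1, y_tr1, y_te1 = train_test_split(x, y, test_size=0.3, random_state=seed)
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(x, y, test_size=0.3, random_state=seed)

print(np.array_equal(x_te1, x_te2), np.array_equal(y_te1, y_te2))
```

Without a fixed `random_state`, each run would place different images in the validation set, making results harder to compare across runs.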

### Define the function read_img

Create a function that reads all the image data in a folder and resizes every image to a uniform size.

**Input**: function parameter 'path', the path of the dataset folder

**Output**: returns data, label, flower_dict, image_list_for_plot
- data: ndarray storing the images, data.shape = (image_nums, h, w, c)
- label: ndarray storing the label of each image, label.shape = (image_nums,)
- flower_dict: dictionary mapping label numbers to flower names, e.g. {0: 'daisy', 1: 'dandelion', 2: 'tulips'...}
- image_list_for_plot: list of images used for visualization, with elements of the form (label_name, image); 45 images in total, 9 for each type of flower.

**Hint**: you can use os.listdir, glob.glob, cv2.resize, np.asarray, and similar functions.

**Note**: cv2.imread() loads images in 'BGR' color order; convert them to 'RGB' to make later visualization easier.
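The BGR-to-RGB conversion only reorders the channel axis. A minimal NumPy sketch on a hypothetical two-pixel image; for 3-channel images, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` produces the same values:

```python
import numpy as np

# Hypothetical 1x2 image in OpenCV's BGR order:
# the first pixel is pure blue, the second pixel is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB.
# The slice alone is a view with a negative stride; np.ascontiguousarray
# makes a compact copy, which some OpenCV functions require.
rgb = np.ascontiguousarray(bgr[..., ::-1])

print(rgb[0, 0])  # the blue pixel, now in RGB order
print(rgb[0, 1])  # the red pixel, now in RGB order
```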

In [5]:
def read_img(path):
    # Dictionary mapping a numeric label to a flower name, e.g. {0: 'daisy', ...}
    flower_dict = {}
    # One sub-folder per class; os.path.isdir keeps only directories.
    # sorted() makes the label numbering reproducible across platforms.
    cate = [path + x for x in sorted(os.listdir(path)) if os.path.isdir(path + x)]
    # Lists for the resized images and their numeric labels
    imgs = []
    labels = []
    # (flower_name, RGB image) pairs used later for visualization
    image_list_for_plot = []
    for idx, folder in enumerate(cate):
        counter = 1
        flower_dict[idx] = folder.split('/')[-1]
        # Iterate over every "*.jpg" file in the class folder
        for im in glob.glob(folder + '/*.jpg'):
            img = cv2.imread(im)           # loaded in BGR order
            if img is None:                # skip files OpenCV cannot decode
                continue
            img = cv2.resize(img, (w, h))  # dsize is (width, height)
            imgs.append(img)
            labels.append(idx)
            # Keep the first 9 images of each class, converted to RGB, for plotting
            if counter <= 9:
                image_list_for_plot.append((folder.split('/')[-1], cv2.cvtColor(img, cv2.COLOR_BGR2RGB)))
                counter += 1
    return np.asarray(imgs, np.float32), np.asarray(labels, np.int32), flower_dict, image_list_for_plot
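`image_list_for_plot` holds (label_name, RGB image) pairs, 9 per class, so it fits naturally in a 5 x 9 grid. The helper below is a hypothetical sketch (`plot_flowers` is not part of the original notebook) built from standard Matplotlib calls; it assumes each class contributes exactly 9 images, so that each row shows one flower type.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_flowers(image_list_for_plot, cols=9):
    """Draw (label_name, RGB image) pairs in a grid with `cols` images per row."""
    rows = int(np.ceil(len(image_list_for_plot) / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 1.6, rows * 1.8), squeeze=False)
    # Hide every axis first so unused cells stay blank
    for ax in axes.ravel():
        ax.axis("off")
    # Fill the grid row by row with the images and their class names
    for ax, (name, img) in zip(axes.ravel(), image_list_for_plot):
        ax.imshow(img)  # uint8 RGB images, as stored by read_img
        ax.set_title(name, fontsize=8)
    fig.tight_layout()
    return fig
```

Usage: after `read_img` has run, `plot_flowers(image_list_for_plot)` followed by `plt.show()` displays the grid.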


In [6]:
data, label, flower_dict, image_list_for_plot=read_img(path)                                              
print("shape of data:",data.shape)                                      
print("shape of label:",label.shape)  
print(len(image_list_for_plot))

shape of data: (3373, 128, 128, 3)
shape of label: (3373,)
45
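One of the stated purposes is to explore unbalanced data, so a natural next check is how many images each class contains. A minimal sketch, assuming `label` and `flower_dict` as returned by `read_img`. The "balanced" weights use the common heuristic n_samples / (n_classes * count); this is one possible input for the `class_weight` argument of Keras `Model.fit`, not necessarily what this experiment does later.

```python
import numpy as np


def class_distribution(label, flower_dict):
    """Return {flower_name: image_count} computed from the integer label array."""
    counts = np.bincount(label, minlength=len(flower_dict))
    return {flower_dict[i]: int(n) for i, n in enumerate(counts)}


def balanced_class_weights(label):
    """Weight each class by n_samples / (n_classes * count), so rarer classes weigh more."""
    counts = np.bincount(label)
    n_samples, n_classes = len(label), len(counts)
    return {i: n_samples / (n_classes * n) for i, n in enumerate(counts) if n > 0}
```

Usage: `print(class_distribution(label, flower_dict))` shows the per-class image counts for the dataset read above.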
