
In [None]:

# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = 'rice-image-dataset:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets%2F2049052%2F3399185%2Fbundle%2Farchive.zip%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240514%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240514T144027Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3Db168b73f75a9764fcebeb2995434a7c9c3e451456e6ee08b7d7074d51c154161974779b1cc6b6ee2a2bda7f524c3327e6a9cb58681e167709317dcbb12781d60a22f986c71df5f219b1e51e635c2879a8f2f1a097d2400e57fcee5508c7ac9d113c917e86ac17db7e16aab0604bab4bce1be2013346623ec0bd371392a22cf1b17a661a967beb51937ae2f120525efc55e115e7f26c9af8941494e1c01ed88af4651bb9e367f97a5aa620e076742b28096873f97f4535f6be31350fbe9a58672788503286dc8d51d3f406f15278415ec69f99f396d6220f3f691c95a8707b0cbc9b12c75c551d4b4e7512cf4370584de5864aaf01075a030ffb9da19fb37e2e9'

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
  os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
  pass
try:
  os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
  pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
            if filename.endswith('.zip'):
              with ZipFile(tfile) as zfile:
                zfile.extractall(destination_path)
            else:
              # Bind the archive to a distinct name so the tarfile module is not shadowed
              with tarfile.open(tfile.name) as tar:
                tar.extractall(destination_path)
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')


## <b> <span style='color:#2ae4f5'>|</span> Rice Variety Classification and Quality Evaluation Using Image Analysis </b>


## <b>1 <span style='color:#2ae4f5'>|</span> Introduction </b>
<div style="color:white;display:fill;border-radius:8px;
            background-color:#03112A;font-size:150%;
            letter-spacing:1.0px;background-image: url(https://i.imgur.com/GVd0La1.png)">
    <p style="padding: 8px;color:white;"><b><b><span style='color:#2ae4f5'>1.1 |</span></b> Rice Variety Classification and Quality Evaluation Using Image Analysis </b></p>
</div>

Rice is one of the most widely grown grain crops in the world, and its genetic diversity has produced many varieties. These varieties differ in characteristics such as **texture**, **shape**, and **color**, and these differences make it possible to classify rice seeds and assess their quality.

This project aims to **develop an image analysis system** that automatically identifies and categorizes rice varieties from their visual attributes. Using **machine learning techniques** and **deep neural networks**, it trains a model to classify rice samples into the five target varieties.

The resulting **image analysis model** also relates to the broader fields of **computer vision** and **pattern recognition**. The same approach could be applied to other **grain crops** and **agricultural products** to support **automated classification** and quality evaluation across agricultural domains.

In summary, the **Rice Variety Classification and Quality Evaluation project** uses a dataset of 75,000 rice images to build an image classifier for five distinct rice varieties, with the aim of supporting rice production, seed selection, and computer vision work for agricultural applications. For more information about the dataset, see the following Kaggle link:
https://www.kaggle.com/datasets/muratkokludataset/rice-image-dataset

## <b>2 <span style='color:#2ae4f5'>|</span> Import Libraries </b>

In [None]:
# Import required libraries and tools
import os
from tensorflow import keras
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style= "darkgrid", color_codes = True)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
import warnings
warnings.filterwarnings('ignore')

## <b>3 <span style='color:#2ae4f5'>|</span> Create a dataframe with the Images and Label </b>

In [None]:
dataset_path = '/kaggle/input/rice-image-dataset/Rice_Image_Dataset'

In [None]:
# Collect the image file paths and their labels

from PIL import Image

images = []
labels = []

for subfolder in os.listdir(dataset_path):
    subfolder_path = os.path.join(dataset_path, subfolder)
    if not os.path.isdir(subfolder_path):
        continue

    # Add the path of each image file to the images list
    for image_filename in os.listdir(subfolder_path):
        image_path = os.path.join(subfolder_path, image_filename)
        images.append(image_path)
        # Use the folder name as the label for this image
        labels.append(subfolder)




In [None]:
df = pd.DataFrame({'image': images, 'label':labels})


In [None]:
# Take a look at the DataFrame

# from ydata_profiling import ProfileReport

# df_profile = ProfileReport(df)
# df_profile.to_notebook_iframe()


# Only the last expression of a cell is displayed, so print the head explicitly
print(df.head())
df.describe()

## <b>4 <span style='color:#2ae4f5'>|</span> Visualization of Dataset </b>

In [None]:
# Shape of the image-path column, i.e. the number of image files (not the pixel dimensions)
image_shape = df['image'].shape
image_shape

In [None]:
from matplotlib.gridspec import GridSpec
# Create figure and grid of subplots
fig = plt.figure(figsize=(15, 15))
gs = GridSpec(5, 4, figure=fig)

# Loop through each unique category in the DataFrame
for i, category in enumerate(df['label'].unique()):
    # Get the filepaths for the first four images in the category
    filepaths = df[df['label'] == category]['image'].values[:4]

    # Loop through the filepaths and add an image to each subplot
    for j, filepath in enumerate(filepaths):
        ax = fig.add_subplot(gs[i, j])
        ax.imshow(plt.imread(filepath))
        ax.axis('off')

    # Write the category name to the right of the last image in the row
    ax.text(300, 100, category, fontsize=25, color='darkblue')

plt.show()

In [None]:
# COUNTPLOT

plt.figure(figsize=(10,6))
ax = sns.countplot(data=df, x='label')
ax.set_xlabel("Name of Class")
ax.set_ylabel("The Number of Samples for each class")
plt.xticks(rotation=-45)
plt.show()


## <b>5 <span style='color:#2ae4f5'>|</span> Split Data into Train and Test </b>
**We divided the data into two separate datasets:** the **training dataset** and the **testing dataset**. The training dataset contains **80%** of the data, and the testing dataset contains the remaining **20%**.
We then applied the **LabelEncoder to the labels**, converting the rice type names **'Arborio'**, **'Basmati'**, **'Ipsala'**, **'Jasmine'**, and **'Karacadag'** into integer values that can serve as target variables when training a machine learning model.
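
As a rough illustration of the encoding step, the following plain-Python sketch mimics the mapping that `LabelEncoder` produces: the unique class names are sorted and numbered from 0. The names `label_to_index`, `index_to_label`, and `sample_labels` are hypothetical and used only for this example.

```python
# Plain-Python sketch of a label-to-integer mapping (no scikit-learn needed).
rice_varieties = ['Jasmine', 'Arborio', 'Karacadag', 'Basmati', 'Ipsala']

# LabelEncoder sorts the unique class names before numbering them.
classes = sorted(set(rice_varieties))
label_to_index = {name: index for index, name in enumerate(classes)}
index_to_label = {index: name for name, index in label_to_index.items()}

# Hypothetical labels for four images.
sample_labels = ['Basmati', 'Karacadag', 'Arborio', 'Basmati']
encoded = [label_to_index[name] for name in sample_labels]
decoded = [index_to_label[index] for index in encoded]

print(label_to_index)
print(encoded)   # [1, 4, 0, 1]
print(decoded)
```

Because the mapping depends only on the sorted class names, the same integers are assigned to the training and testing labels as long as both contain all five varieties.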

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df.image, df.label, test_size=0.2, random_state=44)

df_train = pd.DataFrame({'image' : X_train, 'label' : y_train})
df_test = pd.DataFrame({'image' : X_test, 'label' : y_test})

In [None]:
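# Encode the string labels as integers

encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
# Reuse the mapping fitted on the training labels; fit() alone would return the encoder, not the labels
y_test = encoder.transform(y_test)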

## <b>6 <span style='color:#2ae4f5'>|</span> Data Augmentation </b>
To streamline the preprocessing of our images, we took the following steps:
- **We created generators for both the training and testing datasets.** These generators load and preprocess the images in batches during the training and testing phases.

- **To make the training data more diverse, we applied data augmentation specifically to the training dataset.** Augmentation introduces variations in the images by applying random transformations such as rotation, shifting, shearing, and zooming, so the model learns from a wider range of image variations.

- **Additionally, we standardized the image dimensions by resizing them to a uniform size of 60x60 pixels.** This ensures that all images in the dataset have consistent dimensions for the model input.
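
To make the idea of a batch generator concrete, here is a minimal plain-Python sketch. It is not how Keras implements its generators; `batch_generator` and the file names are hypothetical, and the count of 60,000 training images assumes that all 75,000 images were loaded and split 80/20.

```python
import math

def batch_generator(items, batch_size):
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical file names standing in for the 80% training split of 75,000 images.
train_paths = [f'img_{i}.jpg' for i in range(60_000)]
batch_size = 32

# Number of batches the model sees in one pass over the training data.
batches_per_epoch = math.ceil(len(train_paths) / batch_size)
first_batch = next(batch_generator(train_paths, batch_size))

print(batches_per_epoch)   # 1875
print(len(first_batch))    # 32
```

Only one batch is held in memory at a time, which is why a generator is a practical fit for a dataset of this size.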


In [None]:
from keras.preprocessing.image import ImageDataGenerator

image_size = (60, 60)
batch_size = 32

# Augmentation is applied to the training images only
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range=45,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    fill_mode='nearest')

# Test images are only rescaled, so evaluation runs on unmodified images
test_datagen = ImageDataGenerator(rescale = 1./255)

# shuffle takes a boolean; the string 'False' is truthy and would still shuffle
train_generator = train_datagen.flow_from_dataframe(df_train, x_col='image', y_col='label', target_size=image_size,
                                             batch_size=batch_size, class_mode='categorical', shuffle=True)

test_generator = test_datagen.flow_from_dataframe(df_test, x_col='image', y_col='label', target_size=image_size,
                                             batch_size=batch_size, class_mode='categorical', shuffle=False)

## <b>6 <span style='color:#2ae4f5'>|</span> Training Model </b>

In [None]:
from tensorflow.keras import layers
input_shape = (60,60,3)

model = keras.Sequential([layers.Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=input_shape),
                          layers.MaxPool2D(),
                          layers.Conv2D(64, kernel_size=(3,3), activation='relu'),layers.MaxPool2D(),
                          layers.Conv2D(128, kernel_size=(3,3), activation='relu'),layers.MaxPool2D(),
                          layers.Flatten(),
                          layers.Dense(32,activation='relu'),
                          layers.Dense(5, activation='softmax')])
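
The size of each feature map in the network above can be worked out by hand. The sketch below assumes the Keras defaults used in that cell (3x3 kernels, stride 1, `'valid'` padding, and 2x2 max pooling); the helper function names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (any remainder is dropped)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 60, 3
total_params = 0
for filters in (32, 64, 128):
    total_params += conv_params(3, channels, filters)
    size = pool_output_size(conv_output_size(size))
    channels = filters
    print(f'after Conv2D({filters}) + MaxPool2D: {size}x{size}x{channels}')

flattened = size * size * channels
total_params += (flattened + 1) * 32   # Dense(32): weights plus biases
total_params += (32 + 1) * 5           # Dense(5): weights plus biases

print('flattened features:', flattened)        # 3200
print('trainable parameters:', total_params)   # 195845
```

These figures can be compared against the output of `model.summary()`.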

## <b>7 <span style='color:#2ae4f5'>|</span> Compiling & Training </b>

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# fit_generator is deprecated in TensorFlow 2.x; fit accepts generators directly
history = model.fit(train_generator, epochs=3, validation_data=test_generator)
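
For intuition about the `categorical_crossentropy` loss used here, the following sketch computes it by hand for a single image. It is a simplified illustration: Keras additionally clips probabilities for numerical safety and averages the loss over each batch. The scores in `logits` are made up for the example.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]  # shift for numerical stability
    total = sum(exps)
    return [value / total for value in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

# Hypothetical raw scores for one image over the five classes.
logits = [2.0, 0.5, 0.1, 0.1, 0.3]
probabilities = softmax(logits)

target = [1, 0, 0, 0, 0]   # one-hot label: the first class is the true one
loss = categorical_crossentropy(target, probabilities)

print([round(p, 3) for p in probabilities])
print(round(loss, 3))
```

The loss approaches 0 as the probability of the true class approaches 1, and it equals `log(5)` when the model assigns equal probability to all five classes.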

## <b>8 <span style='color:#2ae4f5'>|</span> Evaluate The Model </b>

In [None]:
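# Plot training and test accuracy per epoch

plt.figure(figsize=(10,6))
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='best')
plt.show()


# Plot training and test loss per epoch
plt.figure(figsize=(10,6))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Test'], loc='best')
plt.show()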

In [None]:
# evaluate returns [loss, accuracy], following the loss and metrics passed to compile
metrics = model.evaluate(test_generator)
print('Loss:', metrics[0])

In [None]:
print('Accuracy:', metrics[1])

## <b>9 <span style='color:#2ae4f5'>|</span> Save The Model </b>

In [None]:
# Save the model (TODO: look into how to reuse the model later)
model.save('CNN_model.h5')
print("Model saved successfully!")
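
One possible way to reuse the saved file is sketched below. This is an illustration under assumptions, not a tested part of this notebook: the helper `predict_rice_variety` is hypothetical, `tf.keras.utils.load_img` requires a reasonably recent TensorFlow 2.x release, and the class order is assumed to be alphabetical (it can be checked against `train_generator.class_indices`).

```python
def predict_rice_variety(image_path, model_path='CNN_model.h5'):
    """Load the saved model and return the predicted class name for one image."""
    # Imported inside the function so that TensorFlow is only loaded
    # when a prediction is actually requested.
    import numpy as np
    import tensorflow as tf

    # Assumption: classes were indexed in alphabetical order during training.
    class_names = ['Arborio', 'Basmati', 'Ipsala', 'Jasmine', 'Karacadag']

    model = tf.keras.models.load_model(model_path)
    # Resize to the 60x60 input the network was trained on.
    image = tf.keras.utils.load_img(image_path, target_size=(60, 60))
    pixels = tf.keras.utils.img_to_array(image) / 255.0   # same rescaling as in training
    probabilities = model.predict(np.expand_dims(pixels, axis=0))[0]
    return class_names[int(np.argmax(probabilities))]
```

A call would look like `predict_rice_variety('some_rice_image.jpg')`, where the file name is a placeholder for any image on disk.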

In [None]:
# Reusing the model
# Reference: https://chat.openai.com/share/9f4f2c32-8d5e-4788-bb53-ff94ff648cfc