<a href="https://colab.research.google.com/github/phoenixfin/deeplearning-notebooks/blob/main/food_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Food Image Classification

---

Aditya Firman Ihsan

## Preparation

### Import libraries

In [None]:
import tensorflow as tf
import numpy as np
import os, random
from google.colab import drive
from matplotlib import pyplot as plt
import shutil

In [None]:
tf.__version__

'2.3.0'

### Downloading the dataset


Install the latest version of the module used to download the dataset from [Kaggle](https://kaggle.com).

In [None]:
%%capture
!pip install kaggle --upgrade

Mount Google Drive to access the Kaggle API json file. This file is required to download the dataset directly from Kaggle. Make sure you have created your own personal token on the Kaggle website (open the *My Account* section of your profile page, then click *Create New API Token*) and placed it in Google Drive.

Simply *uncomment* the code block below, run it, follow the link it prints, grant access with your Google account, and paste the verification code you are given.

Alternatively, click the *Mount Drive* icon in the *Files pane* on the left side of this Colab window.


In [None]:
#  drive.mount('/content/drive')

The following code block makes sure the kaggle module reads the token you prepared.

Set the `json_path` variable to the *path* of the folder in Google Drive that holds the API json file.

In [None]:
json_path = "Colab Notebooks/misc/"

os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/My Drive/"+json_path
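
Before downloading, it can help to confirm that the token file is actually in the directory that `KAGGLE_CONFIG_DIR` points to. The sketch below assumes the token keeps its default file name `kaggle.json`; the helper `kaggle_token_path` is a hypothetical name used for illustration, not part of the Kaggle API.

```python
import os

def kaggle_token_path(config_dir):
    """Return the path of kaggle.json inside config_dir, or None if it is missing."""
    candidate = os.path.join(config_dir, "kaggle.json")
    return candidate if os.path.isfile(candidate) else None

# Check the configured directory (falls back to the default ~/.kaggle).
config_dir = os.environ.get("KAGGLE_CONFIG_DIR", os.path.expanduser("~/.kaggle"))
print(kaggle_token_path(config_dir) or "kaggle.json not found in " + config_dir)
```

If the helper prints the "not found" message, the download command will most likely fail to authenticate, so it is worth correcting `json_path` first.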

Download the dataset. This notebook uses the Food-101 image dataset (`kmader/food41` on Kaggle), which contains 101,000 food images grouped into 101 classes.

More information about this dataset is available [here](https://www.kaggle.com/kmader/food41).

In [None]:
!kaggle datasets download -d kmader/food41

Downloading food41.zip to /content
100% 5.29G/5.30G [01:52<00:00, 18.5MB/s]
100% 5.30G/5.30G [01:52<00:00, 50.7MB/s]


Extract the downloaded zip file, then delete it to free disk space. Afterwards, check that an `images` folder appears in the *Files pane* and inspect its contents.

If the folder has not appeared yet, click the `refresh` icon.

In [None]:
%%capture
!unzip \*.zip  && rm *.zip;

In [None]:
data_dir = '/content/images'

# first list the available categories; they serve as the image labels
category = []
for item in os.listdir(data_dir):
    if os.path.isdir(data_dir+'/'+item):
        category.append(item)

4. Split the dataset into training data and validation data with a ratio of 8:2

In [None]:
def create_data_dir(target_dir, subdirs):
    """
    create the dataset directory structure
    """
    def trymakedir(sdir):
        try:
            os.mkdir(sdir)
        except:
            pass
    try:
        trymakedir(target_dir)
        subdir1 = ['train','validation']
        subdir2 = subdirs
        for sd1 in subdir1:
            trymakedir(target_dir+'/'+sd1)
            for sd2 in subdir2:
                subpath = target_dir+'/'+sd1+'/'+sd2
                trymakedir(subpath)
                for file in os.listdir(subpath):
                    os.remove(subpath+'/'+file)
    except OSError:
        pass

def split_data(source, train, validation, split_size):
    """
    copy the image files into train/validation according to the split ratio
    """
    source_list = os.listdir(source)
    randomized = random.sample(source_list, len(source_list))
    filtered = [file for file in randomized if (os.path.getsize(source+file)!=0) ]
    training_num = round(len(filtered)*split_size)    
    for idx, img in enumerate(filtered):
        if idx < training_num:
            shutil.copyfile(source+img, train+img)
        else:
            shutil.copyfile(source+img, validation+img)

def arrange_data(source, category, ratio):
    """
    rearrange the downloaded dataset so that it is ready to be fed
    to the model
    """
    base_dir = '/content/dataset'
    create_data_dir(base_dir, category)

    for cat in category:
        source_dir = source + '/' + cat + '/'
        train_dir = base_dir + '/train/' + cat + '/' 
        validation_dir = base_dir + '/validation/' + cat + '/'
        split_data(source_dir, train_dir, validation_dir, split_size=ratio)      

Now call the function above.

In [None]:
arrange_data(data_dir, category, 0.8)
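
To make the effect of the ratio concrete, the sketch below mirrors the arithmetic inside `split_data`: the first `round(n * split_size)` shuffled files go to training and the rest to validation. The helper name `split_counts` is hypothetical, and the example assumes the 1000 images per class of Food-101 with no zero-byte files filtered out.

```python
def split_counts(num_files, split_size):
    """Return (training, validation) counts using the same rounding as split_data."""
    training_num = round(num_files * split_size)
    return training_num, num_files - training_num

# 1000 images per class with a ratio of 0.8
print(split_counts(1000, 0.8))  # (800, 200)
```

Over 101 classes this gives 80,800 training images and 20,200 validation images.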

In [None]:
input_size = 299
batch_size = 64
num_epoch = 15
learning_rate = 0.001

5. Create generators for the dataset

In [None]:
TRAINING_DIR = "/content/dataset/train"
VALIDATION_DIR = "/content/dataset/validation"

# Standardize every image to zero mean and unit variance.
# Only per-sample normalization is configured here; no augmentation is applied.
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    samplewise_center=True,
    samplewise_std_normalization=True
)

train_generator = train_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(input_size, input_size),
    batch_size=batch_size
)

val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    samplewise_center=True,
    samplewise_std_normalization=True
)

val_generator = val_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(input_size, input_size),
    batch_size=batch_size
)

Found 80800 images belonging to 101 classes.
Found 20200 images belonging to 101 classes.
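
The two `samplewise_*` options standardize each image on its own: the image's mean is subtracted, then the result is divided by its standard deviation plus a small epsilon. A minimal NumPy sketch of that operation follows; the function name `samplewise_standardize` and the random test image are illustrative assumptions.

```python
import numpy as np

def samplewise_standardize(x, eps=1e-6):
    """Shift one image to zero mean and scale it to (approximately) unit variance."""
    x = np.asarray(x, dtype=np.float32)
    x = x - x.mean()
    return x / (x.std() + eps)

# A made-up image with pixel values in [0, 255)
rng = np.random.default_rng(0)
fake_image = rng.uniform(0, 255, size=(299, 299, 3))
out = samplewise_standardize(fake_image)
print(abs(float(out.mean())) < 1e-3, abs(float(out.std()) - 1.0) < 1e-3)
```

Any image passed to the model at prediction time needs the same treatment, otherwise its value range will differ from what the network saw during training.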


In [None]:
import gc
gc.collect()

22

6. Build the model

In [None]:
def set_pretrained_model(model_name='InceptionV3', 
                         lastpool='avg',
                         feature_only=True):
    premodel = getattr(tf.keras.applications, model_name)(
        include_top = not feature_only, 
        input_shape = (input_size, input_size, 3),
        pooling = lastpool, 
    )
    for layer in premodel.layers:
        layer.trainable = False

    x = premodel.output
    if lastpool is None:
        x = tf.keras.layers.AveragePooling2D(pool_size=(8, 8))(x)
        x = tf.keras.layers.Dropout(.2)(x)
        x = tf.keras.layers.Flatten()(x) 

    # x = tf.keras.layers.Dense(512, activation='relu')(x)
    # x = tf.keras.layers.Dropout(0.2)(x)
    # x = tf.keras.layers.BatchNormalization()(x)
    regularizer = tf.keras.regularizers.l2(0.0005)        
    x = tf.keras.layers.Dense(len(category), kernel_regularizer=regularizer, activation='softmax')(x)    

    return tf.keras.models.Model(premodel.input, x)

In [None]:
model = set_pretrained_model(lastpool=None)    
model.summary()

model.compile(loss = 'categorical_crossentropy', 
              optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), 
              metrics=['accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 149, 149, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 149, 149, 32) 96          conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (

In [None]:
incnet = tf.keras.applications.InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
x = incnet.output
x = tf.keras.layers.AveragePooling2D(pool_size=(8, 8))(x)
x = tf.keras.layers.Dropout(.2)(x)
x = tf.keras.layers.Flatten()(x)
output = tf.keras.layers.Dense(101, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2(.0005))(x)

model = tf.keras.models.Model(inputs=incnet.input, outputs=output)
model.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
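
The classification head is small compared with the frozen backbone. As a rough check, the sketch below traces the tensor shapes through the head, assuming the 8×8×2048 feature map that InceptionV3 produces for a 299×299 input, and counts the weights of the final `Dense` layer. The helper `pooled_size` is a hypothetical name.

```python
def pooled_size(size, pool):
    """Output size of pooling with 'valid' padding and stride equal to the pool size."""
    return (size - pool) // pool + 1

feature_h, feature_w, channels = 8, 8, 2048  # assumed InceptionV3 output for 299x299 input
num_classes = 101

h, w = pooled_size(feature_h, 8), pooled_size(feature_w, 8)  # AveragePooling2D((8, 8))
flat = h * w * channels                                      # Flatten
dense_params = flat * num_classes + num_classes              # Dense weights + biases
print((h, w, channels), flat, dense_params)  # (1, 1, 2048) 2048 206949
```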

7. Train the model on the dataset

In [None]:
model.fit(train_generator, 
          epochs=num_epoch, 
          validation_data=val_generator, 
          verbose = 1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x7fc44f91de80>
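
Each epoch iterates once over every generator, so the number of steps follows from the image counts and the batch size. A small sketch of that arithmetic, using the counts reported by `flow_from_directory` above (the helper name `steps_per_epoch` is illustrative):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover all samples; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(80800, 64), steps_per_epoch(20200, 64))  # 1263 316
```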

8. Test the model with new images

In [None]:
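from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()

# flow_from_directory assigns class indices in sorted order of the folder names,
# so build the index -> label lookup from the generator rather than `category`
labels = {idx: name for name, idx in train_generator.class_indices.items()}

for fn in uploaded.keys(): 
    # predicting images
    path = fn
    # resize to the input size the model was built with
    img = image.load_img(path, target_size=(input_size, input_size))
    plt.imshow(img)
    x = image.img_to_array(img)
    # same per-sample standardization as the training generator
    x = (x - x.mean()) / (x.std() + 1e-6)
    images = np.expand_dims(x, axis=0)
    classes = model.predict(images)
    print(fn)
    print(labels[int(np.argmax(classes[0]))])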
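
One detail matters when turning a prediction into a label: `flow_from_directory` numbers the classes in sorted order of the subdirectory names, which need not match the order returned by `os.listdir`. The sketch below illustrates the difference with made-up folder names and a made-up softmax output; `index_to_label` is a hypothetical helper.

```python
import numpy as np

def index_to_label(class_names):
    """Map class index -> name, with indices following the sorted names."""
    return dict(enumerate(sorted(class_names)))

listing = ["pizza", "apple_pie", "sushi"]  # hypothetical os.listdir order
labels = index_to_label(listing)

probs = np.array([0.1, 0.7, 0.2])          # made-up model output
idx = int(np.argmax(probs))
print(labels[idx], "vs", listing[idx])     # pizza vs apple_pie
```

In the notebook itself, `train_generator.class_indices` holds the actual name-to-index mapping and can be inverted in the same way.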

## Building the Model

## Convert with TFLiteConverter

In [None]:
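# Directory for the exported SavedModel (the name is chosen for illustration)
FOOD_SAVED_MODEL = "food_saved_model"
tf.saved_model.save(model, FOOD_SAVED_MODEL)

converter = tf.lite.TFLiteConverter.from_saved_model(FOOD_SAVED_MODEL)
# DEFAULT is the current name of this optimization; OPTIMIZE_FOR_SIZE is deprecated
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)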

Test the TFLite model using the Python Interpreter

In [None]:
# Load TFLite model and allocate tensors.
tflite_model_file = 'converted_model.tflite'
with open(tflite_model_file, 'rb') as fid:
  tflite_model = fid.read()
  
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

In [None]:
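from tqdm import tqdm

# Gather results for 10 validation images.
# The images come from the first batch of `val_generator`; the interpreter
# is invoked on one image at a time.
predictions = []

test_labels, test_imgs = [], []
batch_imgs, batch_labels = next(val_generator)
for img, label in tqdm(zip(batch_imgs[:10], batch_labels[:10]), total=10):
  img = np.expand_dims(img, axis=0).astype(np.float32)  # shape (1, 299, 299, 3)
  interpreter.set_tensor(input_index, img)
  interpreter.invoke()
  predictions.append(interpreter.get_tensor(output_index))
  
  test_labels.append(int(np.argmax(label)))  # one-hot vector -> class index
  test_imgs.append(img)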
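
Once the predictions are collected, a quick way to summarize them is top-1 accuracy: the share of images whose highest-scoring class equals the true label. The sketch below uses made-up arrays in place of real interpreter output, and `top1_accuracy` is a hypothetical helper name.

```python
import numpy as np

def top1_accuracy(predictions, labels):
    """Fraction of samples whose argmax prediction matches the integer label."""
    predicted = [int(np.argmax(p)) for p in predictions]
    correct = sum(int(p == t) for p, t in zip(predicted, labels))
    return correct / len(labels)

# Made-up outputs shaped like the interpreter results, (1, num_classes) each
fake_predictions = [np.array([[0.1, 0.9]]), np.array([[0.8, 0.2]]), np.array([[0.3, 0.7]])]
fake_labels = [1, 0, 0]
print(top1_accuracy(fake_predictions, fake_labels))
```

With only ten images the figure is a sanity check on the converted model rather than a reliable estimate of its accuracy.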

In [None]:
#@title Utility functions for plotting
# Utilities for plotting

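# class names ordered by the index that the training generator assigned to them
class_names = sorted(train_generator.class_indices, key=train_generator.class_indices.get)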

def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
    
  img = np.squeeze(img)

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  print(type(predicted_label), type(true_label))
  if predicted_label == true_label:
    color = 'green'
  else:
    color = 'red'
  
  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)


In [None]:
#@title Visualize the outputs { run: "auto" }
index = 0 #@param {type:"slider", min:0, max:9, step:1}
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(index, predictions, test_labels, test_imgs)
plt.show()