<a href="https://colab.research.google.com/github/ntuanhung/2019B3_Experiments/blob/master/2019_Kuzushiji_B3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# For the 3rd-year student experiment
Kuzushiji recognition is a challenging task in the image classification field for several reasons:
+ Around 4,000 classes.
+ Large within-class variation for many classes.
+ Images are degraded and have different resolutions.
+ Great imbalance among the classes in the database.
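To get a feel for the imbalance, one can count the images per class in a list of file paths. This is only an illustrative sketch: it assumes paths of the form `./kkanji2/<class>/<file>.png` (as written to `kkanji2_train.txt` by the preparation cell below), and the folder names in the example are hypothetical.

```python
from collections import Counter

def class_counts(paths):
    """Count images per class from paths like './kkanji2/<class>/<file>.png'.

    Assumes the class name is the parent directory of each file.
    Empty lines are skipped and trailing newlines are ignored.
    """
    return Counter(p.rstrip("\n").split("/")[-2] for p in paths if p.strip())

def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class size."""
    sizes = counts.values()
    return max(sizes) / min(sizes)

# Hypothetical class folder names, for illustration only
paths = [
    "./kkanji2/U+4E00/a.png",
    "./kkanji2/U+4E00/b.png",
    "./kkanji2/U+4E00/c.png",
    "./kkanji2/U+4E8C/d.png",
]
counts = class_counts(paths)
print(counts.most_common())
print(imbalance_ratio(counts))
```

With the real split file, `class_counts(open("kkanji2_train.txt"))` would give the per-class totals.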

# October, 2019

Try multiple Convolutional Neural Networks (CNNs) to recognize the Japanese historical character images.
  + AlexNet
  + VGGNet (VGG16 or VGG19)
  + ResNet (Residual Network-34, -50, -101, or -152)
  + InceptionNet (v1, v2, v3, or v4)

Objectives:
  + Learn how to train a network with our data and reproduce the results on the KMNIST datasets (implementation).
  + Visualize the training process (plot the train and test error and loss during training) as well as the results (plot misrecognized images).
  + Understand the theory of neural networks and the optimization process.
  + Try data augmentation.
  + Visualize the feature maps.
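For the "plot misrecognized images" objective, the first step is to find which test samples are misclassified. A minimal NumPy sketch, assuming the predictions are softmax probabilities (as produced by the model's final softmax layer) and the labels are one-hot or boolean vectors like those returned by `get_label`; the helper name `misrecognized_indices` is hypothetical.

```python
import numpy as np

def misrecognized_indices(probs, onehot_labels):
    """Return (wrong_indices, predicted, expected) for a batch of predictions.

    probs: (N, C) softmax outputs, e.g. from model.predict(...)
    onehot_labels: (N, C) one-hot or boolean ground-truth labels
    """
    predicted = np.argmax(probs, axis=-1)
    expected = np.argmax(onehot_labels, axis=-1)
    return np.flatnonzero(predicted != expected), predicted, expected

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
labels = np.array([[True, False, False],
                   [False, True, False],
                   [False, True, False]])
wrong, pred, true = misrecognized_indices(probs, labels)
print(wrong)
```

The returned indices can then be used to select the corresponding images and show them with `plt.imshow`, titled with the predicted and expected class names from `labelList`.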
  

In [1]:
#@title Mount Google Drive { vertical-output: true }
## Copy files from Google Drive to the runtime's local disk (commented out; symbolic links are used instead)
%cd /content
from google.colab import drive
drive.mount('/content/drive')
!mkdir -p kmnist
%cd /content/kmnist/

#!cp /content/drive/My\ Drive/ojc/kmnist/*.* /content/kmnist

## Create symbolic links to Google Drive
!ln -s /content/drive/My\ Drive/ojc/kmnist/src /content/kmnist
!ln -s /content/drive/My\ Drive/ojc/kmnist/log /content/kmnist
!ln -s /content/drive/My\ Drive/ojc/kmnist/model /content/kmnist


/content
Mounted at /content/drive
/content/kmnist


In [2]:
#@title Download data { vertical-output: true }
dataset_name = "kanji" #@param ["k10", "k49", "kanji", "other"]

## Run the script to download the selected dataset
!python ./src/download_data.py --dataset $dataset_name

## and verify the result
!ls

## Extract kkanji.tar and count the class folders (one folder per class)
if dataset_name == "kanji":
  !tar -xf kkanji.tar
  !ls ./kkanji2/ -1 | wc -l

Downloading dataset kanji ...
Downloading kkanji.tar - 316.7 MB
100% 316690/316691 [00:47<00:00, 6611.65KB/s]
All dataset files downloaded!
kkanji.tar  log  model	src
3832


In [3]:
#@title Prepare training/testing sets for Kanji { vertical-output: true, output-height: 20 }
import os
import random
folderPath = "./kkanji2/"
kkanji2_train_fout = open("kkanji2_train.txt", "wt")
kkanji2_test_fout = open("kkanji2_test.txt", "wt")
labelList = []
countOnefile = 0
countTrainFiles = 0
countTestFiles = 0

if os.path.exists(folderPath) and os.path.isdir(folderPath):
  for subfolder in os.listdir(folderPath):
    if os.path.isdir(folderPath + subfolder):
      fileList = []
      for filename in os.listdir(folderPath + subfolder):
        if filename.endswith(".png"):
          fileList.append(filename)
      assert(len(fileList) > 0)
      if len(fileList) == 1:
        countOnefile+=1

      labelList.append(subfolder)
      random.shuffle(fileList)
      numTest = max(1, int(0.2 * len(fileList)))
      for idx in range(0, numTest, 1):
        kkanji2_test_fout.write("%s\n"%(folderPath + subfolder + "/" + fileList[idx]))
        countTestFiles += 1

      for idx in range(numTest, len(fileList), 1):
        kkanji2_train_fout.write("%s\n"%(folderPath + subfolder + "/" + fileList[idx]))
        countTrainFiles += 1

kkanji2_train_fout.close()
kkanji2_test_fout.close()

labelList.sort()  # sort in place; sorted(labelList) alone returns a new list that is discarded
with open("kkanji2_label.txt", "wt") as label_fout:
  for label in labelList:
    label_fout.write("%s\n"%(label))

print("Total %d classes in Kanji dataset."%(len(labelList)))
print("Number of classes without training images = %d"%(countOnefile))
print("%d training images and %d testing images."%(countTrainFiles, countTestFiles))

Total 3832 classes in Kanji dataset.
Number of classes without training images = 815
112071 training images and 28353 testing images.
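The split cell keeps `max(1, int(0.2 * n))` images of each class for testing. Because the test share is rounded down but never below one, a class with a single image sends that image to the test set and gets no training images, which is what the `countOnefile` counter reports. A small sketch of the rule (the `split_sizes` helper is hypothetical, not part of the notebook):

```python
def split_sizes(n_images, test_ratio=0.2):
    """Per-class (test, train) sizes: the test share is rounded down, but at least 1."""
    n_test = max(1, int(test_ratio * n_images))
    return n_test, n_images - n_test

for n in (1, 2, 5, 10, 37):
    print(n, split_sizes(n))
```

Classes with 2 to 4 images still keep at least one training image, since only one of their images goes to the test set.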


In [0]:
#@title Try datagenerator { vertical-output: true }
from __future__ import absolute_import, division, print_function, unicode_literals
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
import argparse
import numpy as np
from datetime import datetime
from random import shuffle
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout, Conv2D, MaxPooling2D

## Constants
AUTOTUNE = tf.data.experimental.AUTOTUNE
BATCH_SIZE = 128
MAX_EPOCHES = 20
IMG_HEIGHT, IMG_WIDTH = 64, 64
NUM_CHANNELS = 1
LOG_PATH = "./tensorlog/"
AUGMENTATION_PROB = 0.0

labelList = []
with open("kkanji2_label.txt") as f:
  for line in f.readlines():
    if line.strip()!="":
      labelList.append(line.strip())

def get_label(file_path):
  # convert the path to a list of path components
  parts = tf.strings.split(file_path, '/')
  # The second to last is the class-directory
  return parts[-2] == labelList

def decode_img(img):
  # decode the compressed PNG string to a 3D uint8 tensor (the files are .png)
  img = tf.image.decode_png(img, channels=NUM_CHANNELS)
  # Use `convert_image_dtype` to convert to floats in the [0,1] range.
  img = tf.image.convert_image_dtype(img, tf.float32)
  # resize the image to the desired size.
  return tf.image.resize(img, (IMG_HEIGHT, IMG_WIDTH))

def process_path(filePath):
  label = get_label(filePath)
  # load the raw data from the file as a string
  img = tf.io.read_file(filePath)
  img = decode_img(img)
  return img, label

def prepare_dataset(fileTxt, cache=True, shuffle_buffer_size=1000):
  fileList = []
  with open(fileTxt) as f:
    for line in f.readlines():
      if line.strip()!="":
        fileList.append(line.strip())
  shuffle(fileList) 
  # Parse the list of file path
  data_list = tf.data.Dataset.from_tensor_slices(fileList)
  ds = data_list.map(process_path, num_parallel_calls=AUTOTUNE)

  # This is a small dataset, only load it once, and keep it in memory.
  # use `.cache(filename)` to cache preprocessing work for datasets that don't
  # fit in memory.
  if cache:
    if isinstance(cache, str):
      ds = ds.cache(cache)
    else:
      ds = ds.cache()

  if shuffle_buffer_size>0:
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)

  # Repeat forever
  ds = ds.repeat()
  ds = ds.batch(BATCH_SIZE)
  # `prefetch` lets the dataset fetch batches in the background while the model is training.
  ds = ds.prefetch(buffer_size=AUTOTUNE)
  return fileList, ds

## Simple model
def get_uncompiled_model(input_shape, num_classes):
  inputs = tf.keras.Input(shape=input_shape, name='digits')
  x = Conv2D(32, kernel_size=(3, 3), activation='relu', padding="same",
             input_shape=input_shape, name="conv1")(inputs)
  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv2")(x)
  x = MaxPooling2D(pool_size=(2, 2))(x)

  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv3")(x)
  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv4")(x)
  x = MaxPooling2D(pool_size=(2, 2))(x)

  x = Dropout(0.25)(x)
  x = Flatten()(x)
  x = Dense(128, activation='relu', name="fc10")(x)
  x = Dropout(0.5)(x)
  outputs = Dense(num_classes, activation='softmax', name='predictions')(x)
  model = tf.keras.Model(inputs=inputs, outputs=outputs)
  return model

def get_compiled_model(input_shape, num_classes, networkName="", optimizerName=""):
  model = get_uncompiled_model(input_shape, num_classes)
  # Build model with optimizer
  model.compile(loss=tf.keras.losses.categorical_crossentropy,
                optimizer=tf.keras.optimizers.Adadelta(),
                metrics=['accuracy'])
  return model

if __name__ == "__main__":
  parser = argparse.ArgumentParser(description='kuzushiji with different datasets')
  parser.add_argument('--dataset', dest='dataset', 
                      default="kanji",
                      help='dataset name should be k10/k49/kanji')
  parser.add_argument('--model', dest='model', 
                      default="simpleNet",
                      help='model name should be simpleNet/AlexNet/VGGNet/ResNet/InceptionNet')

  # parse_known_args ignores extra arguments (e.g. those Jupyter passes to the kernel)
  args, _ = parser.parse_known_args()

  networkName = args.model # "simpleNet" "AlexNet" "VGGNet" "ResNet" "InceptionNet"
  optimizerName = "ADEL" #"ADEL" "SGD" "ADAM" "RMSP"
  datasetName = args.dataset
  logFile = "_".join([networkName, optimizerName, datasetName]) + "_" + datetime.now().strftime('%Y.%m.%d_%H.%M.%S')
  logFolder = LOG_PATH + logFile + "/"

  # prepare data generator
  trainFileList, train_ds = prepare_dataset("kkanji2_train.txt", cache=False,
                                            shuffle_buffer_size=len(labelList))
  testFileList, test_ds = prepare_dataset("kkanji2_test.txt", cache=False,
                                          shuffle_buffer_size=-1)
  num_classes = len(labelList)
  print("Total %d categories. Train set with %d images and test set with %d images."
        %(num_classes, len(trainFileList), len(testFileList)))

  if tf.keras.backend.image_data_format() == 'channels_first':
      input_shape = (NUM_CHANNELS, IMG_HEIGHT, IMG_WIDTH)
  else:
      input_shape = (IMG_HEIGHT, IMG_WIDTH, NUM_CHANNELS)
  model = get_compiled_model(input_shape, num_classes)
  model.summary()

  ## Callback functions --> use for multiple purposes
  # save the best model based on validation loss
  fpath = logFolder + 'weights.{epoch:02d}-{loss:.2f}-{accuracy:.2f}-{val_loss:.2f}-{val_accuracy:.2f}.hdf5'
  cp_cb = tf.keras.callbacks.ModelCheckpoint(filepath=fpath, monitor='val_loss', 
                                             verbose=1, save_best_only=True, mode='auto')
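  # early stopping training process
  # (with patience=20 and MAX_EPOCHES=20 it can never trigger; lower patience to use it)
  es_cb = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                           verbose=1, mode='auto')
  # TensorBoard logs (tb_cb is used in the callback list below)
  tb_cb = tf.keras.callbacks.TensorBoard(log_dir=logFolder, histogram_freq=1)

  # save the log information into log file; create the log folder first
  os.makedirs(logFolder, exist_ok=True)
  lg_cb = tf.keras.callbacks.CSVLogger(logFolder + logFile + '.csv')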
  
  # Train and evaluate the model, streaming batches from the tf.data pipelines
  print("Training %s"%logFile)
  model.fit(train_ds, validation_data=test_ds,
            steps_per_epoch=len(trainFileList) // BATCH_SIZE,
            validation_steps=len(testFileList) // BATCH_SIZE,
            callbacks=[cp_cb, es_cb, tb_cb, lg_cb],
            epochs=MAX_EPOCHES, verbose=1)

  print("Evaluating %s"%logFile)
  train_score = model.evaluate(train_ds, verbose=0,
                               steps=len(trainFileList) // BATCH_SIZE)
  test_score = model.evaluate(test_ds, verbose=0,
                              steps=len(testFileList) // BATCH_SIZE)
  print('Train loss:', train_score[0])
  print('Train accuracy:', train_score[1])
  print('Test loss:', test_score[0])
  print('Test accuracy:', test_score[1])
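`get_label` compares the class-folder name with every entry of `labelList`, so the result is a boolean vector that is `True` only at the index of the matching class, i.e. a one-hot label whose order follows `kkanji2_label.txt`. The same idea in plain NumPy, as a sketch with hypothetical class names and a hypothetical helper `one_hot_from_path`:

```python
import numpy as np

# Hypothetical class names standing in for labelList
label_list = ["U+4E00", "U+4E8C", "U+4E09"]

def one_hot_from_path(file_path, labels):
    """NumPy analogue of get_label: compare the parent folder with every class name."""
    class_name = file_path.split("/")[-2]
    # Element-wise comparison gives a boolean vector, True at the matching class
    return np.array(labels) == class_name

vec = one_hot_from_path("./kkanji2/U+4E8C/0001.png", label_list)
print(vec, np.argmax(vec))
```

Converting the boolean vector to float (e.g. `vec.astype(np.float32)`, or `tf.cast` inside the pipeline) makes the label dtype explicit.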

In [25]:
#@title Preliminary experiments on KMNIST  { output-height: 10 }
dataset_name = "kanji" #@param ["k10", "k49", "kanji"]
model_name = "simpleNet" #@param ["simpleNet", "AlexNet", "VGGNet", "ResNet", "InceptionNet", "kNN"]
if model_name == "kNN":
  ## Example of KNN method
  !python ./src/kuzushiji_knn.py --dataset $dataset_name
else:
  ## Training an example of CNN
  if dataset_name == "kanji":
    !python ./src/kuzushiji_cnn_kanji.py --model $model_name
  else:
    !python ./src/kuzushiji_cnn.py --dataset $dataset_name --model $model_name

Fitting KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=-1, n_neighbors=4, p=2,
                     weights='distance')
^C
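The interrupted kNN run above prints scikit-learn's `KNeighborsClassifier` with `n_neighbors=4`, `p=2` (Euclidean distance) and `weights='distance'`, meaning each of the 4 nearest neighbours votes with weight 1/distance. To illustrate what those settings do, here is a small NumPy re-implementation. It is a sketch for illustration, not the contents of `./src/kuzushiji_knn.py` (which are not shown here); it materialises all pairwise distances, so it only suits small arrays, and it approximates scikit-learn's special handling of zero distances with a small `eps`.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=4, eps=1e-12):
    """Distance-weighted k-nearest-neighbour classification with Euclidean distance."""
    # Pairwise Euclidean distances, shape (num_queries, num_train)
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row, idx in zip(dists, nearest):
        weights = 1.0 / (row[idx] + eps)  # closer neighbours get larger votes
        votes = {}
        for label, w in zip(train_y[idx], weights):
            votes[label] = votes.get(label, 0.0) + w
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

# Toy 2-D data: two clusters with labels 0 and 1
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array([0, 0, 1, 1, 1])
query = np.array([[0.2, 0.3], [5.5, 5.5]])
print(knn_predict(train_x, train_y, query))
```

For the image datasets, `train_x` would be the flattened images (one row per image) and `train_y` the class indices.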


In [0]:
#@title Data generator by tensorflow { vertical-output: true }
# change directory to kmnist project
%cd /content/kmnist/
# make sure use tensorflow 2.0
%tensorflow_version 2.x 

from __future__ import absolute_import, division, print_function, unicode_literals
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
import numpy as np
from datetime import datetime
from random import shuffle
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout, Conv2D, MaxPooling2D

## Constants
AUTOTUNE = tf.data.experimental.AUTOTUNE
BATCH_SIZE = 128
MAX_EPOCHES = 20
IMG_HEIGHT, IMG_WIDTH = 64, 64
NUM_CHANNELS = 1
LOG_PATH = "./tensorlog/"
AUGMENTATION_PROB = 0.0

labelList = []
with open("kkanji2_label.txt") as f:
  for line in f.readlines():
    if line.strip()!="":
      labelList.append(line.strip())

import matplotlib.pyplot as plt
def plot_images(dataset, n_images, samples_per_image):
    plt.figure()  # e.g. plt.figure(figsize=(10.0, 10.0)) for a larger grid
    k=0
    for images, labels in dataset.repeat(samples_per_image).batch(n_images):
      label_np = labels.numpy()
      pos = np.argmax(label_np, axis=-1)
      s = " ".join([ labelList[idx] for idx in pos ])
      print(images.shape, labels.shape, s)
      for n in range(k, k+n_images):
        ax = plt.subplot(samples_per_image,n_images,n+1)
        plt.imshow(np.squeeze(images[n-k]), cmap = plt.get_cmap("gray"))
        plt.title(labelList[pos[n-k]])
        plt.axis('off')
      k+=n_images
      if k>=samples_per_image*n_images:
        break
    plt.show()

def rotate(x: tf.Tensor, y: tf.Tensor) -> tf.Tensor:
  """Rotation augmentation
  Args: x: Image
  Returns: Augmented image
  """
  # Rotate 0, 90, 180, 270 degrees
  return tf.image.rot90(x, tf.random.uniform(shape=[], minval=0, maxval=4, dtype=tf.int32)), y

def flip(x: tf.Tensor, y: tf.Tensor) -> tf.Tensor:
    """Flip augmentation
    Args: x: Image to flip
    Returns: Augmented image
    """
    x = tf.image.random_flip_left_right(x)
    x = tf.image.random_flip_up_down(x)
    return x, y

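def color(x: tf.Tensor, y: tf.Tensor) -> tf.Tensor:
    """Color augmentation
    Args: x: Image
    Returns: Augmented image
    Note: random_hue and random_saturation require RGB (3-channel) images,
    so this function cannot be used on the grayscale (NUM_CHANNELS = 1) inputs here.
    """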
    x = tf.image.random_hue(x, 0.08)
    x = tf.image.random_saturation(x, 0.6, 1.6)
    x = tf.image.random_brightness(x, 0.05)
    x = tf.image.random_contrast(x, 0.7, 1.3)
    return x, y

def zoom(x: tf.Tensor, y: tf.Tensor) -> tf.Tensor:
    """Zoom augmentation
    Args: x: Image
    Returns: Augmented image
    """
    # Generate 20 crop settings, ranging from a 1% to 20% crop.
    scales = list(np.arange(0.8, 1.0, 0.01))
    boxes = np.zeros((len(scales), 4))
    for i, scale in enumerate(scales):
        x1 = y1 = 0.5 - (0.5 * scale)
        x2 = y2 = 0.5 + (0.5 * scale)
        boxes[i] = [x1, y1, x2, y2]
    def random_crop(img):
        # Create different crops for an image
        crops = tf.image.crop_and_resize([img], boxes=boxes, 
                                         box_indices=np.zeros(len(scales), dtype=np.int32), 
                                         crop_size=(IMG_HEIGHT, IMG_WIDTH))
        # Return a random crop
        return crops[tf.random.uniform(shape=[], minval=0, maxval=len(scales), dtype=tf.int32)]
    choice = tf.random.uniform(shape=[], minval=0., maxval=1., dtype=tf.float32)
    # Only apply cropping 50% of the time
    return tf.cond(choice < 0.5, lambda: x, lambda: random_crop(x)), y

def get_label(file_path):
  # convert the path to a list of path components
  parts = tf.strings.split(file_path, '/')
  # The second to last is the class-directory
  return parts[-2] == labelList

def decode_img(img):
  # decode the compressed PNG string to a 3D uint8 tensor (the files are .png)
  img = tf.image.decode_png(img, channels=NUM_CHANNELS)
  # Use `convert_image_dtype` to convert to floats in the [0,1] range.
  img = tf.image.convert_image_dtype(img, tf.float32)
  # resize the image to the desired size.
  return tf.image.resize(img, (IMG_HEIGHT, IMG_WIDTH))

def process_path(filePath):
  label = get_label(filePath)
  # load the raw data from the file as a string
  img = tf.io.read_file(filePath)
  img = decode_img(img)
  return img, label

def prepare_dataset(fileTxt, cache=True, shuffle_buffer_size=1000):
  fileList = []
  with open(fileTxt) as f:
    for line in f.readlines():
      if line.strip()!="":
        fileList.append(line.strip())
  shuffle(fileList)
  data_list = tf.data.Dataset.from_tensor_slices(fileList)
  ds = data_list.map(process_path, num_parallel_calls=AUTOTUNE)

  # if preproc_fn.keywords is not None and 'resize' not in preproc_fn.keywords:
  #   assert batch_size == 1, "Batching images must be of the same size"
  # ds = ds.map(preproc_fn, num_parallel_calls=AUTOTUNE)

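  # augmentations (note: they are applied to every dataset built by this
  # function, including the test set)
  augmentations = [rotate, zoom]
  # Add the augmentations to the dataset
  for f in augmentations:
      # Alternative: always apply the augmentation, run 4 jobs in parallel.
      # ds = ds.map(f, num_parallel_calls=4)
      # Apply augmentation f to each image with probability AUGMENTATION_PROB
      ds = ds.map(lambda x, y: tf.cond(tf.random.uniform([], 0, 1) < AUGMENTATION_PROB,
                                       lambda: f(x, y), lambda: (x, y)),
                  num_parallel_calls=AUTOTUNE)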
  # Make sure that the values are still in [0, 1]
  ds = ds.map(lambda x, y: (tf.clip_by_value(x, 0.0, 1.0), y), num_parallel_calls=AUTOTUNE)

  # This is a small dataset, only load it once, and keep it in memory.
  # use `.cache(filename)` to cache preprocessing work for datasets that don't
  # fit in memory.
  if cache:
    if isinstance(cache, str):
      ds = ds.cache(cache)
    else:
      ds = ds.cache()

  if shuffle_buffer_size>0:
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)

  # Repeat forever
  ds = ds.repeat()
  ds = ds.batch(BATCH_SIZE)
  # `prefetch` lets the dataset fetch batches in the background while the model
  # is training.
  ds = ds.prefetch(buffer_size=AUTOTUNE)
  return fileList, ds

## Simple model
def get_uncompiled_model(input_shape, num_classes):
  inputs = tf.keras.Input(shape=input_shape, name='digits')
  x = Conv2D(32, kernel_size=(3, 3), activation='relu', padding="same",
             input_shape=input_shape, name="conv1")(inputs)
  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv2")(x)
  x = MaxPooling2D(pool_size=(2, 2))(x)

  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv3")(x)
  x = Conv2D(64, (3, 3), activation='relu', padding="same", name="conv4")(x)
  x = MaxPooling2D(pool_size=(2, 2))(x)

  # x = Conv2D(128, (3, 3), activation='relu', padding="same", name="conv5")(x)
  # x = Conv2D(128, (3, 3), activation='relu', padding="same", name="conv6")(x)
  # x = MaxPooling2D(pool_size=(2, 2))(x)

  # x = Conv2D(128, (3, 3), activation='relu', padding="same", name="conv7")(x)
  # x = Conv2D(128, (3, 3), activation='relu', padding="same", name="conv8")(x)
  # x = MaxPooling2D(pool_size=(2, 2))(x)

  x = Dropout(0.25)(x)
  x = Flatten()(x)
  # x = Dense(1024, activation='relu', name="fc9")(x)
  # x = Dropout(0.25)(x)
  x = Dense(128, activation='relu', name="fc10")(x)
  x = Dropout(0.5)(x)
  outputs = Dense(num_classes, activation='softmax', name='predictions')(x)
  model = tf.keras.Model(inputs=inputs, outputs=outputs)
  return model

def get_compiled_model(input_shape, num_classes, networkName="", optimizerName=""):
  model = get_uncompiled_model(input_shape, num_classes)
  # Build model with optimizer
  # model.compile(loss=tf.keras.losses.categorical_crossentropy,
  #               optimizer=tf.keras.optimizers.Adadelta(),
  #               metrics=['accuracy'])
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
                loss='categorical_crossentropy',
                metrics=['accuracy'])
  return model

if __name__ == "__main__":
  networkName = "simpleCNN" # "simpleCNN" "AlexNet" "VGGNet" "ResNet" "InceptionNet"
  optimizerName = "RMSP" #"ADEL" "SGD" "ADAM" "RMSP"
  datasetName = "kanji"
  logFile = "_".join([networkName, optimizerName, datasetName]) + "_" + datetime.now().strftime('%Y.%m.%d_%H.%M.%S')
  logFolder = LOG_PATH + logFile + "/"

  # prepare data generator
  trainFileList, train_ds = prepare_dataset("kkanji2_train.txt", cache=False,
                                            shuffle_buffer_size=len(labelList))
  testFileList, test_ds = prepare_dataset("kkanji2_test.txt", cache=False,
                                          shuffle_buffer_size=-1)
  num_classes = len(labelList)
  print("Total %d categories. Train set with %d images and test set with %d images."
        %(num_classes, len(trainFileList), len(testFileList)))
  # for img, lb in train_ds:
  #    print(img.shape, np.argmax(lb.numpy(), axis=-1), np.min(img.numpy()), np.max(img.numpy()))  
  # plot_images(train_ds, n_images=8, samples_per_image=10)

  if tf.keras.backend.image_data_format() == 'channels_first':
      input_shape = (NUM_CHANNELS, IMG_HEIGHT, IMG_WIDTH)
  else:
      input_shape = (IMG_HEIGHT, IMG_WIDTH, NUM_CHANNELS)
  model = get_compiled_model(input_shape, num_classes)
  model.summary()

  # if args.resume !="":
  #   model.load_weights(args.resume)
  #   print("Successfully loaded weights from %s"%(args.resume))

  # Callback functions
  fpath = logFolder + 'weights.{epoch:02d}-{loss:.2f}-{accuracy:.2f}-{val_loss:.2f}-{val_accuracy:.2f}.hdf5'
  cp_cb = tf.keras.callbacks.ModelCheckpoint(filepath=fpath, monitor='val_loss', 
                                             verbose=1, save_best_only=True, mode='auto')
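  # with patience=20 and MAX_EPOCHES=20, early stopping can never trigger
  es_cb = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=20,
                                           verbose=1, mode='auto')
  tb_cb = tf.keras.callbacks.TensorBoard(log_dir=logFolder, histogram_freq=1)
  # make sure the log folder exists before CSVLogger opens its file
  os.makedirs(logFolder, exist_ok=True)
  lg_cb = tf.keras.callbacks.CSVLogger(logFolder + logFile + '.csv')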
  
  # Train and evaluate model on dataset
  print("Training %s"%logFile)
  model.fit(train_ds, validation_data=test_ds,
            steps_per_epoch=len(trainFileList) // BATCH_SIZE,
            validation_steps=len(testFileList) // BATCH_SIZE,
            callbacks=[cp_cb, es_cb, tb_cb, lg_cb],
            epochs=MAX_EPOCHES, verbose=1)

  # save the model architecture as JSON next to the logs
  with open(logFolder + logFile + '.json', 'w') as json_fout:
    json_fout.write(model.to_json())

  print("Evaluating %s"%logFile)
  train_score = model.evaluate(train_ds, verbose=0,
                               steps=len(trainFileList) // BATCH_SIZE)
  test_score = model.evaluate(test_ds, verbose=0,
                              steps=len(testFileList) // BATCH_SIZE)
  print('Train loss:', train_score[0])
  print('Train accuracy:', train_score[1])
  print('Test loss:', test_score[0])
  print('Test accuracy:', test_score[1])

/content/kmnist
TensorFlow 2.x selected.
Total 3832 categories. Train set with 112071 images and test set with 28353 images.
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
digits (InputLayer)          [(None, 64, 64, 1)]       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 64, 64, 32)        320       
_________________________________________________________________
conv2 (Conv2D)               (None, 64, 64, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 32, 32, 64)        0         
_________________________________________________________________
conv3 (Conv2D)               (None, 32, 32, 64)        36928     
_________________________________________________________________
conv4 (Conv2D)               (None, 32, 32, 64)        36928     
__
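The parameter counts in the summary follow from the layer shapes: a `Conv2D` layer with a $k_h \times k_w$ kernel, $C_{in}$ input channels and $F$ filters has $(k_h k_w C_{in} + 1)F$ parameters (the $+1$ is the bias), and a `Dense` layer with $n$ inputs and $u$ units has $(n + 1)u$. A quick check, which also extends the arithmetic to the layers cut off in the printed summary, assuming the model defined in the cell above (`padding="same"`, so only the two 2x2 poolings shrink the 64x64 input to 16x16, and 3832 classes):

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of a Conv2D layer: one kernel per filter, plus biases."""
    return (kernel_h * kernel_w * in_channels + (1 if use_bias else 0)) * filters

def dense_params(in_features, units, use_bias=True):
    """Trainable parameters of a Dense layer: weight matrix plus biases."""
    return (in_features + (1 if use_bias else 0)) * units

print(conv2d_params(3, 3, 1, 32))   # conv1
print(conv2d_params(3, 3, 32, 64))  # conv2
print(conv2d_params(3, 3, 64, 64))  # conv3 and conv4
flat = 16 * 16 * 64                 # Flatten output after two poolings
print(dense_params(flat, 128))      # fc10
print(dense_params(128, 3832))      # predictions
```

Pooling, dropout and flatten layers have no trainable parameters, so almost all of the weights sit in `fc10`.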

In [37]:
#@title Visualize training process by tensorboard
LOG_PATH = "/content/kmnist/tensorlog/simpleCNN_RMSP_kanji_2019.11.02_00.41.22"
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_PATH)
)
# Install
! npm install -g localtunnel

# Tunnel port 6006 (TensorBoard assumed running)
get_ipython().system_raw('lt --port 6006 >> url.txt 2>&1 &')

# Get url
! cat url.txt

[K[?25h/tools/node/bin/lt -> /tools/node/lib/node_modules/localtunnel/bin/lt.js
+ localtunnel@2.0.0
added 35 packages from 21 contributors in 2.497s
your url is: https://tall-skunk-50.localtunnel.me
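Besides TensorBoard, the `CSVLogger` callback (`lg_cb`) writes one row per epoch, which is enough to plot the train and test error and loss curves. A minimal sketch for reading that file, assuming columns named `epoch`, `loss`, `accuracy`, `val_loss` and `val_accuracy` (metric names can differ between TensorFlow versions); `load_history` is a hypothetical helper and the numbers in the example are made up for illustration.

```python
import csv
import io

def load_history(csv_file):
    """Read a CSVLogger file into {column: [values...]} and add error-rate curves."""
    history = {}
    for row in csv.DictReader(csv_file):
        for key, value in row.items():
            history.setdefault(key, []).append(float(value))
    # Error rate = 1 - accuracy, for plotting the train and test error
    for acc_key in ("accuracy", "val_accuracy"):
        if acc_key in history:
            err_key = acc_key.replace("accuracy", "error")
            history[err_key] = [1.0 - a for a in history[acc_key]]
    return history

# Made-up values, only to show the expected file layout
example = io.StringIO(
    "epoch,accuracy,loss,val_accuracy,val_loss\n"
    "0,0.25,4.0,0.30,3.5\n"
    "1,0.50,2.0,0.45,2.6\n"
)
hist = load_history(example)
print(hist["error"], hist["val_error"])
```

With a real log, `with open(logFolder + logFile + '.csv') as f: hist = load_history(f)`, followed by e.g. `plt.plot(hist["epoch"], hist["loss"])` and `plt.plot(hist["epoch"], hist["val_loss"])`.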


In [0]:
#@title Augmentation experiments
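In `prepare_dataset`, each augmentation is wrapped in a `tf.cond` that decides per image whether to apply it. For `AUGMENTATION_PROB` to mean "probability of applying the augmentation", the uniform draw in [0, 1) must be compared with `<`, since it falls below p with probability p. A plain-Python sketch of that rule (the `maybe_augment` helper is hypothetical, not part of the notebook) makes the edge cases easy to check: p = 0 never augments and p = 1 always does.

```python
import random

def maybe_augment(x, augment_fn, prob, rng=random):
    """Apply augment_fn to x with probability `prob`, otherwise return x unchanged."""
    if rng.random() < prob:  # rng.random() is uniform in [0, 1)
        return augment_fn(x)
    return x

def double(v):
    # Stand-in "augmentation" so the effect is easy to detect
    return v * 2

rng = random.Random(0)
applied = sum(maybe_augment(1, double, 0.3, rng) == 2 for _ in range(10_000))
print(applied / 10_000)
```

The TensorFlow counterpart is `tf.cond(tf.random.uniform([], 0, 1) < AUGMENTATION_PROB, lambda: f(x, y), lambda: (x, y))`.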


In [0]:
#@title Visualize feature maps
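Each `Conv2D` filter produces one feature map. In Keras, a common approach is to build a sub-model that outputs an intermediate layer, e.g. `tf.keras.Model(inputs=model.input, outputs=model.get_layer("conv1").output)`, and then show each channel of its prediction with `plt.imshow`. To make explicit what such a map contains, here is a small NumPy sketch of one feature map: a `padding="same"` cross-correlation (Keras does not flip the kernel) followed by ReLU. The helper `feature_map` and the edge-detecting kernel are illustrative assumptions, not learned weights.

```python
import numpy as np

def feature_map(image, kernel, bias=0.0):
    """One 'same'-padded 2D cross-correlation followed by ReLU.

    image: (H, W) grayscale array; kernel: (kh, kw) with odd sizes.
    """
    kh, kw = kernel.shape
    # Zero padding keeps the output the same size as the input
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU, as in activation='relu'

image = np.zeros((5, 5))
image[:, 2] = 1.0  # a vertical stroke
# Responds where the right neighbour is brighter than the left one
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
fmap = feature_map(image, kernel)
print(fmap)
```

The map lights up just left of the stroke and is zero elsewhere; the 32 maps of `conv1` are the same computation with 32 learned kernels.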


### Experiment results
KMNIST with only 10 classes (Kuzushiji-MNIST) is easy to handle, even with a kNN classifier.

From now on, let's try the Hiragana classes (Kuzushiji-49).

### Discussions



# November, 2019