# Project Secondus: Kuzushiji Classification

## What is Kuzushiji?

Kuzushiji is a cursive writing style that was used in Japan for over a thousand years, starting in the 8th century. Japanese children were taught this style of writing until 1900, when Kuzushiji was removed from the school curriculum as modern Japanese print became more popular.

![kuzushiji.jpeg](attachment:kuzushiji.jpeg)


## Challenges? 
Only a small fraction of the population (0.01% of native Japanese speakers) can read Kuzushiji fluently. An AI algorithm that can transcribe Kuzushiji into modern Japanese could therefore make historical documents accessible to far more readers and lead to new discoveries in Japanese history.


## Dataset used: KMNIST
### Link: https://github.com/rois-codh/kmnist

![kmnist.png](attachment:kmnist.png)
KMNIST contains 70,000 28x28 grayscale images (the same format as the original MNIST dataset) spanning 10 classes, one character from each column of hiragana. It is perfectly balanced, with 6,000 training and 1,000 test images per class.
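
The balance claim is easy to check once the labels are loaded by counting samples per class. The sketch below uses a hypothetical helper, `class_counts`, and synthetic labels that mimic the 6,000/1,000 per-class split; with the real data you would pass the integer `y_train` / `y_test` arrays (before one-hot encoding) instead.

```python
import numpy as np

def class_counts(labels, num_classes=10):
    """Hypothetical helper: number of samples per class.

    `labels` must be a 1-D array of integer class ids in [0, num_classes).
    """
    labels = np.asarray(labels)
    return np.bincount(labels, minlength=num_classes)

# Synthetic labels shaped like KMNIST's splits (assumption: stand-ins, not real data)
y_train_demo = np.repeat(np.arange(10), 6000)
y_test_demo = np.repeat(np.arange(10), 1000)

train_counts = class_counts(y_train_demo)
test_counts = class_counts(y_test_demo)

# A split is perfectly balanced when every class has the same count
train_balanced = bool(np.all(train_counts == train_counts[0]))
test_balanced = bool(np.all(test_counts == test_counts[0]))

print(train_counts, train_balanced)
print(test_counts, test_balanced)
print("Total images:", train_counts.sum() + test_counts.sum())
```

`np.bincount` with `minlength` also reports zero counts for any class that is missing entirely, which a plain `np.unique` count would silently skip.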






### 1.1 Importing Necessary Libraries


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from sklearn.manifold import TSNE

# Import TensorFlow and Keras (use tf.keras throughout instead of mixing in standalone Keras)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K

# Import Weights & Biases and start a run in the "secondus" project
import wandb
from wandb.keras import WandbCallback
wandb.init(project="secondus")
```


W&B Run: https://app.wandb.ai/mkiyohara/secondus/runs/fehirigx

### 1.2 Preparing the Dataset

```python
# Input image dimensions
img_rows, img_cols = 28, 28

# Load one array from a KMNIST .npz file (stored under the default key 'arr_0')
def load(f):
    return np.load(f)['arr_0']

# Load the data
x_train = load('kmnist-train-imgs.npz')
x_test = load('kmnist-test-imgs.npz')
y_train = load('kmnist-train-labels.npz')
y_test = load('kmnist-test-labels.npz')

# Scale pixel values to [0, 1] using the training-set maximum
print("Scaling input data...")
max_val = np.max(x_train).astype(np.float32)
print("Max value: " + str(max_val))
x_train = x_train.astype(np.float32) / max_val
x_test = x_test.astype(np.float32) / max_val
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

# Image data layout:
#   channels_last:  [rows][cols][channels], the color channels are on the last axis
#   channels_first: [channels][rows][cols], the color channels are on the first axis
# KMNIST images are grayscale, so add a single channel axis in whichever
# position the Keras backend expects.
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Convert class vectors to binary class matrices (one-hot encoding)
num_classes = len(np.unique(y_train))
print("Number of classes in this dataset: " + str(num_classes))
if num_classes > 2:
    print("One-hot encoding targets...")
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

# Shape of a single sample after adding the channel axis (the model's input shape)
print("Model input shape: " + str(x_train.shape[1:]))
```
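
The preparation steps can be exercised end to end without the KMNIST files. The sketch below runs on small synthetic `uint8` arrays: it writes them with `np.savez` (which stores an unnamed array under the key `arr_0`, the key `load` reads), scales pixels to [0, 1], adds a channel axis in either layout, and one-hot encodes labels with `np.eye`, a NumPy stand-in that should produce the same matrix as `keras.utils.to_categorical` for integer labels. The `prepare` helper and the fake data are illustrative assumptions, not part of the original notebook.

```python
import io
import numpy as np

def load(f):
    # np.savez stores positional (unnamed) arrays as 'arr_0', 'arr_1', ...
    return np.load(f)['arr_0']

def prepare(x, y, num_classes, max_val=None, channels_first=False):
    """Hypothetical helper mirroring the notebook's preprocessing steps."""
    # Scale to [0, 1]; pass the training maximum to reuse it for the test split
    if max_val is None:
        max_val = np.float32(np.max(x))
    x = x.astype(np.float32) / max_val
    # Add a single grayscale channel axis in the requested layout
    n, rows, cols = x.shape
    if channels_first:
        x = x.reshape(n, 1, rows, cols)
    else:
        x = x.reshape(n, rows, cols, 1)
    # One-hot encode: row i of the identity matrix encodes class i
    y_onehot = np.eye(num_classes, dtype=np.float32)[y.astype(np.int64)]
    return x, y_onehot

# Synthetic stand-ins for the kmnist-*.npz files (8 fake images and labels)
rng = np.random.default_rng(0)
fake_imgs = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
fake_labels = np.arange(8) % 10

# Round-trip through in-memory .npz files so load() reads 'arr_0' as it would on disk
buf_x, buf_y = io.BytesIO(), io.BytesIO()
np.savez(buf_x, fake_imgs)
np.savez(buf_y, fake_labels)
buf_x.seek(0)
buf_y.seek(0)

x, y = prepare(load(buf_x), load(buf_y), num_classes=10)
x_cf, _ = prepare(fake_imgs, fake_labels, num_classes=10, channels_first=True)
print(x.shape, x.dtype, y.shape)
print(x_cf.shape)
```

With the real training files, the same calls would be expected to yield arrays of shape (60000, 28, 28, 1) and (60000, 10) in the channels-last layout. Reusing the training maximum for the test split (via `max_val`) keeps both splits on the same scale, as the notebook does.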

### 1.3 Train Model