## Kuzushiji Recognition with the concept of Hand-Written digit recognition

This kernel was created for the [Kuzushiji Recognition](https://www.kaggle.com/c/kuzushiji-recognition/data) competition<br> and is forked from an earlier kernel, [Kuzushiji Visualisation](https://www.kaggle.com/anokas/kuzushiji-visualisation).

**Saa.. Hajimeruu...<br>(eng sub: so.. Lets Begin..)**

This kernel demonstrates my approach to Kuzushiji recognition using the simplest handwritten digit recognition technique, which is the most basic thing we learn when trying to understand optical character recognition.<br>My article on handwritten digit recognition: [Handwritten Digits Recognition](https://medium.com/@basu369victor/handwritten-digits-recognition-d3d383431845). I wrote this article about my very first experience with optical character recognition. 
## **What is Kuzushiji ?**
Kuzushiji is a Japanese cursive script. Many pre-modern documents, whether handwritten or printed, are written in kuzushiji. It is extremely important to get familiar with kuzushiji in order to read pre-modern Japanese texts. Understanding 変体仮名 (hentaigana), the variant (and obsolete) form of modern hiragana, is another essential skill for reading pre-modern Japanese texts. Hiragana is one of the components of the Japanese phonetic lettering system, and with a few exceptions, each sound in the Japanese language is represented by one hiragana character. However, until the Japanese script reform of 1900, each syllable had been written using a variety of hiragana, which originated in manʼyōgana, an ancient writing system that employs Chinese kanji characters to represent the Japanese language. The variant form is now called 変体仮名 (hentaigana).

Source - [Crane(Competition Discussion)](https://www.kaggle.com/c/kuzushiji-recognition/discussion/100951)
![Kuzushiji](https://miro.medium.com/max/1400/1*Y-JaqNDSQMvklXn39KOrkg.jpeg)

Vast portions of Japanese historical documents now cannot be read by most Japanese people. By helping to automate the transcription of kuzushiji we would contribute to unlocking a priceless trove of books and records.

The specific task is to locate and classify each kuzushiji character on a page. While complete bounding boxes are provided for the training set, only a single point within the ground truth bounding box is needed for submissions.

## The most difficult part of this Problem.....
![konosuba](https://i.ytimg.com/vi/rTztN4OYSOk/maxresdefault.jpg)
<br>In my view, the most difficult part of this problem is not the recognition of Kuzushiji but its detection. <br>Training a machine learning or deep learning model to recognize Kuzushiji is not a big deal. In the end, the only thing that matters for recognition is accuracy, and there are many machine learning algorithms and deep neural networks we could use to reach the desired accuracy.<br>But what about detecting Kuzushiji? Detecting Kuzushiji directly on a sheet of paper requires a big brain. It is not about detecting a single character: there are many characters on a single sheet of paper and you have to detect all of them. Choosing an algorithm or technique to detect characters on a sheet of paper requires time, patience and understanding.

## Why did I choose the most basic approach for Kuzushiji Detection?
<br>
It might sound awkward, but I am going to tell the truth.<br><br>
First of all, the word "recognition" is common to both problems, handwritten digit recognition and Kuzushiji recognition. Beyond that, in both problems you need to detect something written in very bad handwriting. In both cases I struggled most with detecting the characters on a sheet of paper; when I learned handwritten digit recognition for the first time, the detection part cost me a lot of time. No matter how many times I tried, it either detected nothing or detected the wrong objects instead of digits. It might sound like I did not have any valid logic for approaching this problem, but that's the truth.
![](https://i.kym-cdn.com/photos/images/original/000/898/121/784.jpg)
<br>
Here I have applied the same detection technique that I used for simple problems like handwritten digit recognition, and I hope you are already familiar with it.
![](https://i.ytimg.com/vi/ur6JY2Hl-MM/hqdefault.jpg)

In [None]:
from PIL import Image, ImageDraw, ImageFont
from os import listdir
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cv2
from skimage.feature import hog
import os
from tqdm import tqdm
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
from keras import backend as K
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.python import keras
from keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPooling2D, BatchNormalization,Input
from keras.models import Model,load_model
from IPython.display import SVG
from keras.callbacks import EarlyStopping,ModelCheckpoint
from keras.utils.vis_utils import model_to_dot
from keras.utils import plot_model
#%matplotlib inline
print(os.listdir("../input/"))

First, in order to visualise the dataset, we need a font that can display the full range of Japanese characters. We're using [Noto Sans](https://en.wikipedia.org/wiki/Noto_fonts), an open source font by Google which can display almost all the characters used in this competition.

In [None]:
fontsize = 50

# From https://www.google.com/get/noto/
!wget -q --show-progress https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip
!unzip -p NotoSansCJKjp-hinted.zip NotoSansCJKjp-Regular.otf > NotoSansCJKjp-Regular.otf
!rm NotoSansCJKjp-hinted.zip

font = ImageFont.truetype('./NotoSansCJKjp-Regular.otf', fontsize, encoding='utf-8')

# Visualising the training data
You'll notice that some of the characters "off to the side" of columns in the text aren't annotated in the training set. These characters are annotations and not part of the main text of the books, so they shouldn't be transcribed by your model.
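
As a minimal sketch of the annotation format: each annotated character in the `labels` column contributes five space-separated fields (Unicode codepoint, `x`, `y`, width, height). The helper name `parse_labels` is hypothetical, and the sample string is made up for illustration rather than taken from `train.csv`.

```python
def parse_labels(label_string):
    """Split a 'codepoint x y w h ...' string into (codepoint, x, y, w, h) tuples."""
    fields = label_string.split()
    if len(fields) % 5 != 0:
        raise ValueError("expected a multiple of 5 fields, got %d" % len(fields))
    boxes = []
    for i in range(0, len(fields), 5):
        codepoint = fields[i]
        # The four remaining fields are integer pixel coordinates and sizes.
        x, y, w, h = (int(v) for v in fields[i + 1:i + 5])
        boxes.append((codepoint, x, y, w, h))
    return boxes

# Made-up example string describing two characters.
sample = "U+306F 1231 3465 133 53 U+304C 275 1652 84 69"
for box in parse_labels(sample):
    print(box)
```

Raising on a field count that is not a multiple of five makes malformed rows visible instead of silently mis-aligning the boxes.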

In [None]:
df_train = pd.read_csv('../input/train.csv')
unicode_map = {codepoint: char for codepoint, char in pd.read_csv('../input/unicode_translation.csv').values}
unicode_map

In [None]:
df_train.isnull().sum()

In [None]:
# This function takes in a filename of an image, and the labels in the string format given in train.csv, and returns an image containing the bounding boxes and characters annotated
def visualize_training_data(image_fn, labels):
    # Convert annotation string to array
    labels = np.array(labels.split(' ')).reshape(-1, 5)
    #print(labels)
    
    # Read image
    imsource = Image.open(image_fn).convert('RGBA')
    bbox_canvas = Image.new('RGBA', imsource.size)
    char_canvas = Image.new('RGBA', imsource.size)
    bbox_draw = ImageDraw.Draw(bbox_canvas) # Separate canvases for boxes and chars so a box doesn't cut off a character
    char_draw = ImageDraw.Draw(char_canvas)

    for codepoint, x, y, w, h in labels:
        x, y, w, h = int(x), int(y), int(w), int(h)
        char = unicode_map[codepoint] # Convert codepoint to actual unicode character

        # Draw bounding box around character, and unicode character next to it
        bbox_draw.rectangle((x, y, x+w, y+h), fill=(255, 255, 255, 0), outline=(255, 0, 0, 255))
        char_draw.text((x + w + fontsize/4, y + h/2 - fontsize), char, fill=(0, 0, 255, 255), font=font)
        Croped_image = imsource.crop((x, y, x+w, y+h))
        plt.figure()
        print(str(unicode_map[codepoint]))
        plt.imshow(Croped_image)
        plt.show()

    imsource = Image.alpha_composite(Image.alpha_composite(imsource, bbox_canvas), char_canvas)
    imsource = imsource.convert("RGB") # Remove alpha for saving in jpg format.
    return np.asarray(imsource)

In [None]:
np.random.seed(1337)

for i in range(1):
    img, labels = df_train.values[np.random.randint(len(df_train))]
    viz = visualize_training_data('../input/train_images/{}.jpg'.format(img), labels)
    
    plt.figure(figsize=(15, 15))
    plt.title(img)
    plt.imshow(viz, interpolation='lanczos')
    plt.show()

In [None]:
def preProcessImage(image):
    #image = np.asarray(image)
    #image = image.resize((300,300))
    #image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    ret,th1 = cv2.threshold(image,155,255,cv2.THRESH_BINARY)
    return th1

**Extract Data** - A function that crops each kuzushiji out of the page image, resizes it to a fixed size (300×300), converts it to grayscale and then applies an inverse binary threshold. <br>
**Simple Thresholding** -  If a pixel value is greater than a threshold value, it is assigned one value (maybe white); otherwise it is assigned another value (maybe black). The function used is cv2.threshold. The first argument is the source image, which should be a grayscale image. The second argument is the threshold value used to classify the pixel values. The third argument is maxVal, the value to be given if the pixel value is more than (sometimes less than) the threshold value. OpenCV provides different styles of thresholding, selected by the fourth parameter of the function.
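
To make the rule concrete, here is a small NumPy sketch of what the binary and inverse-binary modes compute. It is a stand-in written for illustration, not OpenCV's implementation; `binary_threshold` is a hypothetical helper and the tiny array is made up.

```python
import numpy as np

def binary_threshold(gray, thresh, maxval, inverse=False):
    """NumPy version of the rule behind THRESH_BINARY / THRESH_BINARY_INV."""
    above = gray > thresh          # strictly greater than the threshold
    if inverse:
        above = ~above             # the inverse mode swaps the two outputs
    return np.where(above, maxval, 0).astype(np.uint8)

gray = np.array([[10, 155, 156],
                 [200, 90, 255]], dtype=np.uint8)

print(binary_threshold(gray, 155, 255))                 # bright pixels become 255
print(binary_threshold(gray, 155, 255, inverse=True))   # dark pixels become 255
```

With dark ink on light paper, the inverse mode is the one that turns the ink white on a black background.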

In [None]:
# This function reads the first 420 rows of train.csv, crops every annotated character out of its page image, and returns the thresholded 300x300 crops together with their character labels.
def Extract_Data():
    X_=[]
    y_=[]
    # Convert annotation string to array #300
    for img, labels in tqdm(df_train[:420].values):
        try:
            image_fn = '../input/train_images/{}.jpg'.format(img)
            labels = np.array(labels.split(' ')).reshape(-1, 5)
            # Read image
            imsource = Image.open(image_fn).convert('RGBA')
            bbox_canvas = Image.new('RGBA', imsource.size)
            char_canvas = Image.new('RGBA', imsource.size)
            bbox_draw = ImageDraw.Draw(bbox_canvas) # Separate canvases for boxes and chars so a box doesn't cut off a character
            char_draw = ImageDraw.Draw(char_canvas)

            for codepoint, x, y, w, h in labels:
                x, y, w, h = int(x), int(y), int(w), int(h)
                char = unicode_map[codepoint] # Convert codepoint to actual unicode character

                # Draw bounding box around character, and unicode character next to it
                #bbox_draw.rectangle((x-10, y-10, x+10, y+10), fill=(255, 0, 0, 255))
                #char_draw.text((x+25, y-fontsize*(3/4)), char, fill=(255, 0, 0, 255), font=font)
                Croped_image = imsource.crop((x, y, x+w, y+h))
                image = Croped_image.resize((300,300))
                image = np.asarray(image)
                image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
                ret,th1 = cv2.threshold(image,155,255,cv2.THRESH_BINARY_INV)
                X_.append(th1)
                y_.append(str(unicode_map[codepoint]))
        except Exception:
            # Skip pages that cannot be processed (e.g. rows whose labels are missing)
            pass
    X_ = np.array(X_)
    y_ = np.array(y_)

    '''imsource = Image.alpha_composite(Image.alpha_composite(imsource, bbox_canvas), char_canvas)
    imsource = imsource.convert("RGB") '''# Remove alpha for saving in jpg format.
    return X_,y_

In [None]:
XX_,yy_ = Extract_Data()

In [None]:
plt.figure()
plt.imshow(XX_[99])

In [None]:
XX_.shape

In [None]:
unique, counts = np.unique(yy_, return_counts=True)
print(unique, counts )

In [None]:
NoOfClasses = len(unique)
NoOfClasses

In [None]:
IMG_ROWS=300
IMG_COLS=300
def PreProcessData(X,y):
    lb = LabelEncoder()
    y_integer = lb.fit_transform(y)
    out_y = np_utils.to_categorical(y_integer)
    num_images = X.shape[0]
    out_x = X.reshape(num_images, IMG_ROWS, IMG_COLS, 1)
    #out_x = x_shaped_array / 255
    return out_x, out_y

In [None]:
lb = LabelEncoder()
y_integer = lb.fit_transform(yy_)

In [None]:
X,y = PreProcessData(XX_,yy_)

In [None]:
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

In [None]:
K.clear_session()
def Kuzushiji_Classifier(in_):
    model_ = Conv2D(32,(3,3),activation='relu', padding="same")(in_)
    model_ = BatchNormalization()(model_)
    model_ =  Conv2D(32,(3, 3), activation='relu')(model_)
    model_ = BatchNormalization()(model_)
    model_ = Conv2D(32,5,strides=2,padding='same',activation='relu')(model_)
    model_ = MaxPooling2D((2, 2))(model_)
    model_ = BatchNormalization()(model_)
    model_ = Dropout(0.4)(model_)
    model_ = Conv2D(64,(3, 3), strides=2,padding='same', activation='relu')(model_)
    model_ = MaxPooling2D(pool_size=(2, 2))(model_)
    model_ = BatchNormalization()(model_)
    model_ = Conv2D(64, kernel_size=(3, 3), strides=2,padding='same', activation='relu')(model_)
    model_ = Dropout(0.4)(model_)
    model_ = Flatten()(model_)
    model_ = Dense(128, activation='relu')(model_)
    model_ = Dropout(0.4)(model_)
    model_ = Dense(NoOfClasses, activation='softmax')(model_)
    return model_

In [None]:
Input_Sample = Input(shape=(300, 300,1))
Output_ = Kuzushiji_Classifier(Input_Sample)
Model_Enhancer = Model(inputs=Input_Sample, outputs=Output_)

In [None]:
Model_Enhancer.compile(loss = "categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
Model_Enhancer.summary()

In [None]:
checkpointer = ModelCheckpoint('model_Kuzushiji.h5', verbose=0,mode='auto', monitor='val_acc',save_best_only=True)

In [None]:
ModelHistory = Model_Enhancer.fit(X_train, y_train,
          batch_size=100,
          epochs=32,
          verbose=1,callbacks=[checkpointer],
          validation_data=(X_val, y_val))

In [None]:
#Loss Curves
plt.figure(figsize=[20,9])
plt.plot(ModelHistory.history['loss'], 'r')
plt.plot(ModelHistory.history['val_loss'], 'b')
plt.legend(['Training Loss','Validation Loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curves')

In [None]:
#Accuracy Curves
plt.figure(figsize=[20,9])
plt.plot(ModelHistory.history['acc'], 'r')
plt.plot(ModelHistory.history['val_acc'], 'b')
plt.legend(['Training Accuracy','Validation Accuracy'])
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Curves')

According to the accuracy and loss curves, my model might look overfitted. To try to reduce this overfitting, you could make a slight change to the preprocessing. In the **Extract_Data** function, where I have used the code 
> <br>ret,th1 = cv2.threshold(image,155,255,cv2.THRESH_BINARY_INV)<br>

you could use *THRESH_BINARY* instead of *THRESH_BINARY_INV*, which may have a regularizing effect on the model. Note that this flips the polarity of the training crops (dark ink on a white background), so the crops passed to the model at prediction time would need the same polarity.
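
Another common way to limit overfitting is to stop training once the validation loss stops improving. Keras provides an `EarlyStopping` callback for this, which this notebook imports but does not use. The sketch below is plain Python rather than Keras: it only illustrates the stopping rule on a made-up list of validation losses, and `best_epoch_with_patience` is a hypothetical helper.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the validation loss has not
    improved for `patience` consecutive epochs, or at the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: they improve, then start rising again.
val_losses = [2.1, 1.6, 1.3, 1.25, 1.4, 1.5, 1.7, 1.9]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Combined with a checkpoint that keeps only the best weights, a rule like this avoids spending epochs after the validation loss has turned upward.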

In [None]:
Model_ = load_model('model_Kuzushiji.h5')

# Visualising predictions
For the test set, you're only required to predict a single point within each bounding box instead of the entire bounding box (ideally, the centre of the bounding box). It may also be useful to visualise the box centres on the image:
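
A minimal sketch of turning boxes into centre points, assuming boxes are `(codepoint, x, y, w, h)` tuples as in the training annotations. The output layout (`codepoint X Y` triplets separated by spaces) is my understanding of the submission format and should be checked against the competition's data page; `box_centre` and `boxes_to_submission` are hypothetical helpers, and the boxes are made up.

```python
def box_centre(x, y, w, h):
    """Integer centre of a box given its top-left corner and size."""
    return x + w // 2, y + h // 2

def boxes_to_submission(boxes):
    """Join 'codepoint X Y' triplets, one per box, with single spaces."""
    parts = []
    for codepoint, x, y, w, h in boxes:
        cx, cy = box_centre(x, y, w, h)
        parts.append("%s %d %d" % (codepoint, cx, cy))
    return " ".join(parts)

# Made-up boxes for illustration.
boxes = [("U+306F", 100, 200, 40, 60), ("U+304C", 10, 20, 31, 15)]
print(boxes_to_submission(boxes))
```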

## About the Detection technique
<br>We use the OpenCV library for detection. First the image is converted to grayscale, then it is passed through an inverse binary threshold, which is one of the simple thresholding techniques. We then use **findContours** to locate the ink spots on the image. <br>**Contours** can be explained simply as curves joining all the continuous points (along a boundary) that have the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.<br> We do not need to detect every ink spot, because some of them are border lines or just stray marks on the sheet of paper. Therefore we keep only regions whose size (the number of pixels in the region cut out around each contour) is greater than 7000, so that those unnecessary spots and lines are excluded and only the Kuzushiji are detected.
![](https://res.cloudinary.com/teepublic/image/private/s---JKs_-9D--/t_Preview/b_rgb:ffffff,c_limit,f_jpg,h_630,q_90,w_630/v1555995870/production/designs/4697372_2.jpg)
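
The filtering idea can be sketched without OpenCV. The block below is not the `findContours` algorithm: it swaps in a simple 4-connected flood fill on a binary NumPy array to get one bounding box per ink blob, then drops boxes whose area is below a threshold. Filtering on box area is a simplification of the size check described above. The function names, the toy image and the `min_area` value are assumptions for illustration.

```python
from collections import deque
import numpy as np

def blob_boxes(binary):
    """Return (x, y, w, h) for every 4-connected blob of non-zero pixels."""
    height, width = binary.shape
    seen = np.zeros((height, width), dtype=bool)
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if binary[r0, c0] == 0 or seen[r0, c0]:
                continue
            # Flood fill from this pixel, tracking the blob's extent.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            rmin = rmax = r0
            cmin = cmax = c0
            while queue:
                r, c = queue.popleft()
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and binary[nr, nc] != 0 and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            boxes.append((cmin, rmin, cmax - cmin + 1, rmax - rmin + 1))
    return boxes

def keep_large(boxes, min_area):
    """Drop boxes whose width * height is below min_area."""
    return [b for b in boxes if b[2] * b[3] >= min_area]

# Toy "page": one 5x5 character and one single-pixel speck.
page = np.zeros((12, 12), dtype=np.uint8)
page[1:6, 2:7] = 255
page[9, 10] = 255

boxes = blob_boxes(page)
print(boxes)
print(keep_large(boxes, min_area=9))
```

On a real page the threshold has to be tuned to the scan resolution, which is why a fixed value such as 7000 will not suit every image.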

In [None]:
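def VisualizeKuzushiji(imagePath):
    img = cv2.imread(imagePath)  # OpenCV loads images in BGR channel order
    imsource = Image.open(imagePath)#fromarray(img)
    char_draw = ImageDraw.Draw(imsource)
    im_grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Inverse threshold: ink becomes white (255) on a black background
    ret, im_th = cv2.threshold(im_grey, 130, 255, cv2.THRESH_BINARY_INV)
    # Note: findContours returns two values in OpenCV 2.x/4.x but three in 3.x
    ctrs,_ = cv2.findContours(im_th.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rects = [cv2.boundingRect(ctr) for ctr in ctrs]  # each rect is (x, y, w, h)
    Kuzushijis = []
    for rect in rects:
        # Square region of side 1.6 * box height, centred on the bounding box
        leng = int(rect[3] * 1.6)
        # Caution: pt1/pt2 can be negative near the top/left edge, and NumPy
        # slicing then wraps around instead of clipping to the image
        pt1 = int(rect[1] + rect[3]//2 - leng// 2)
        pt2 = int(rect[0] + rect[2]//2 - leng// 2)
        roi = im_th[pt1:pt1+leng, pt2:pt2+leng]
        #bbox_draw.rectangle((rect[0], rect[1], rect[0] + rect[2],rect[1] + rect[3]), fill=(0, 225, 0, 0))
        #print(roi.size)
        # Keep only regions large enough to be a character
        if roi.size>7000:
            # Colour is given in BGR here, so the box is red after the RGB conversion below
            cv2.rectangle(img, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 0, 225), 6)
            roi = cv2.resize(roi, (300,300))
            #roi = cv2.dilate(roi, (3, 3))
            # Re-binarize after resizing; ink stays white, as in the training crops
            ret,th1 = cv2.threshold(roi,155,255,cv2.THRESH_BINARY)
            ProcessImage = th1.reshape(1,IMG_ROWS, IMG_COLS, 1)
            y_pred = Model_.predict(ProcessImage)
            y_class = np.argmax(y_pred,axis=1)  # index of the most probable class
            Kuzushiji = lb.inverse_transform(y_class)
            #print(Kuzushiji[0])
            Kuzushijis.append(str(Kuzushiji[0]))
            char_draw.text((rect[0]+10, rect[1]),str(Kuzushiji[0]), fill=(0,22,225,0), font=font)
            #cv2.putText(img, str(Kuzushiji[0]), (rect[0], rect[1]),font, 2, (0, 255, 255), 3)
    # Convert BGR to RGB so matplotlib shows the page in its true colours
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB),imsource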

In [None]:
img1, imsource1 = VisualizeKuzushiji('../input/train_images/100241706_00014_2.jpg')
plt.figure(figsize=(30,30))
plt.subplot(1,4,1)
plt.title("Detection of Kuzushiji",fontsize=20)
plt.imshow(img1)
plt.subplot(1,4,2)
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource1)

In [None]:
plt.figure(figsize=(30,30))
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource1)

In [None]:
img2, imsource2 = VisualizeKuzushiji('../input/test_images/test_001c37e2.jpg')
plt.figure(figsize=(30,30))
plt.subplot(1,4,1)
plt.title("Detection of Kuzushiji",fontsize=20)
plt.imshow(img2)
plt.subplot(1,4,2)
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource2)

In [None]:
plt.figure(figsize=(30,30))
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource2)

In [None]:
img3, imsource3 = VisualizeKuzushiji('../input/test_images/test_009f58c8.jpg')
plt.figure(figsize=(30,30))
plt.subplot(1,4,1)
plt.title("Detection of Kuzushiji",fontsize=20)
plt.imshow(img3)
plt.subplot(1,4,2)
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource3)

In [None]:
plt.figure(figsize=(30,30))
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource3)

In [None]:
img4, imsource4 = VisualizeKuzushiji('../input/test_images/test_1abdbbfe.jpg')
plt.figure(figsize=(30,30))
plt.subplot(1,4,1)
plt.title("Detection of Kuzushiji",fontsize=20)
plt.imshow(img4)
plt.subplot(1,4,2)
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource4)

In [None]:
plt.figure(figsize=(30,30))
plt.title("Recognition of Kuzushiji",fontsize=20)
plt.imshow(imsource4)

## About the recognition model
![](https://i.imgur.com/Gz0pm57.jpg)
<br>
The recognition model is a convolutional neural network, heavily inspired by the kernel [Classifying Cursive hiragana(崩し字) KMNIST using CNN](https://www.kaggle.com/gpreda/classifying-cursive-hiragana-kmnist-using-cnn). We have used Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten and Dense layers to construct our model. For more detail about each of the layers, you could look them up or visit the above-mentioned kernel.<br>
This model takes a long time to train: 32 epochs took almost 2 hours. When I once ran it for 40 epochs, its accuracy was still increasing at the end.<br>
But the fact is I am a very lazy guy; after waiting for just 32 epochs my condition is like this..
![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRxfyK9-L_xuJyHy3L9rgET1P0KnrYyS79EgQYHhmew9bemE9b7)
So don't expect me to wait for another 10 epochs...
<br>If you have patience then go for it.
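
To see why training is slow, it helps to follow the spatial size of a 300×300 input through the layers of `Kuzushiji_Classifier`. The sketch below uses the standard output-size rules for `'same'` and `'valid'` padding as I understand Keras to apply them; `conv_out` and `pool_out` are hypothetical helpers, and only the layer settings come from the model code.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Output length of a Keras-style convolution along one spatial axis."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling with 'valid' padding."""
    return size // pool

size = 300
size = conv_out(size, 3, padding="same")             # Conv2D(32, 3x3, same)
size = conv_out(size, 3)                             # Conv2D(32, 3x3, valid)
size = conv_out(size, 5, stride=2, padding="same")   # Conv2D(32, 5, strides=2, same)
size = pool_out(size)                                # MaxPooling2D(2x2)
size = conv_out(size, 3, stride=2, padding="same")   # Conv2D(64, 3x3, strides=2, same)
size = pool_out(size)                                # MaxPooling2D(2x2)
size = conv_out(size, 3, stride=2, padding="same")   # Conv2D(64, 3x3, strides=2, same)

flat = size * size * 64              # length of the Flatten output
dense_params = flat * 128 + 128      # weights plus biases of Dense(128)
print(size, flat, dense_params)
```

The first two convolutions run at nearly full 300×300 resolution, so feeding smaller crops would be one way to cut the training time.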

## Future Scopes
* As I mentioned before, training for more epochs may give higher accuracy.
* The parameters or the neural network architecture could be tuned, or different architectures could be tried, for better accuracy.
* I found this kernel very interesting for both detection and recognition of Kuzushiji:
[CenterNet -Keypoint Detector-](https://www.kaggle.com/kmat2019/centernet-keypoint-detector)

## Conclusion
* I have tried my own way of solving this problem, and it is not the best way, so feel free to try your own.
* I tried the basic approach to optical character recognition on this problem, and it worked pretty well and was satisfying.
* When approaching any problem, please do not forget the basics; basic concepts can be the solution to giant and tough problems.

### I hope this Kernel was helpful to you (ﾉ^_^)ﾉ...
### Please Upvote this Kernel if you like it....
![Thank you](https://memestatic.fjcdn.com/pictures/Konosuba+wallpapers+megumin+edition+some+of+these+are+probably_6f11f9_6344521.jpg)