iOS app that uses a convolutional neural network to identify pictures of handwritten Chinese characters
A base model using the LeNet-5 architecture was created with Keras subclassing and pretrained on the MNIST dataset. The architecture consists of a 28x28 input fed into a 5x5 conv layer with 6 filters --> average pooling layer --> 5x5 conv layer with 16 filters --> average pooling layer --> flatten --> FC 120 --> FC 84 --> softmax over 10 classes
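The subclassed model described above can be sketched roughly as follows (a minimal sketch, not the actual project code; the class name and the tanh activations of the original LeNet-5 are assumptions):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class LeNet5(keras.Model):
    """LeNet-5 sketch matching the layer sizes described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = layers.Conv2D(6, kernel_size=5, activation="tanh")   # 28x28x1 -> 24x24x6
        self.pool1 = layers.AveragePooling2D(pool_size=2)                 # -> 12x12x6
        self.conv2 = layers.Conv2D(16, kernel_size=5, activation="tanh")  # -> 8x8x16
        self.pool2 = layers.AveragePooling2D(pool_size=2)                 # -> 4x4x16
        self.flatten = layers.Flatten()                                   # -> 256
        self.fc1 = layers.Dense(120, activation="tanh")
        self.fc2 = layers.Dense(84, activation="tanh")
        self.out = layers.Dense(num_classes, activation="softmax")

    def call(self, x):
        x = self.pool1(self.conv1(x))
        x = self.pool2(self.conv2(x))
        x = self.flatten(x)
        return self.out(self.fc2(self.fc1(x)))
```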
Transfer learning was used to fine-tune the model on a custom image dataset of handwritten Chinese characters. These characters were drawn with pen, pencil, and marker. Fine-tuning involved freezing the convolutional layers and retraining only the last few fully connected layers on the custom dataset. This works because convolutional layers trained on the MNIST handwritten digits learn low-level stroke and edge features that a custom handwritten dataset also shares, while the last few FC layers are more specialized and need to be adapted to the new dataset through fine-tuning.
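The freezing step above can be sketched like this (a sketch, not the project's actual code; the stand-in model and the choice of optimizer/loss are assumptions):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def freeze_conv_layers(model: keras.Model) -> keras.Model:
    """Freeze every non-Dense layer so only the FC head is retrained."""
    for layer in model.layers:
        layer.trainable = isinstance(layer, layers.Dense)
    # Recompile so the new trainable flags take effect for training
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Tiny stand-in model with the same layer types as the LeNet-5 base model
model = keras.Sequential([
    layers.Input((28, 28, 1)),
    layers.Conv2D(6, 5), layers.AveragePooling2D(),
    layers.Conv2D(16, 5), layers.AveragePooling2D(),
    layers.Flatten(),
    layers.Dense(120), layers.Dense(84), layers.Dense(10, activation="softmax"),
])
model = freeze_conv_layers(model)
```

After this, calling `model.fit` on the custom dataset updates only the Dense layers.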
After fine-tuning, the model achieved a training accuracy of 99.31% and a testing accuracy of 82.11%, indicating that overfitting had occurred. This is probably due to the small dataset size; ways to prevent overfitting include adding more data (collecting more samples or using data augmentation), regularization, or neural architecture search. Manual error analysis revealed that the incorrectly classified samples were all drawn with pencil or pen. To make the best use of the pretrained model, these samples were removed, leading to 100% accuracy on the training set and 99% accuracy on the test set after 50 epochs of gradient descent.
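One of the remedies mentioned above, data augmentation, can be sketched with Keras preprocessing layers (the specific rotation/translation/zoom amounts here are illustrative assumptions, not the project's settings):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random augmentation applied on the fly during training to enlarge
# a small handwriting dataset.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.05),          # small rotations, about +/- 18 degrees
    layers.RandomTranslation(0.1, 0.1),   # shift up to 10% in each direction
    layers.RandomZoom(0.1),               # zoom in/out up to 10%
])

batch = tf.zeros((8, 28, 28, 1))          # dummy batch of 28x28 grayscale images
augmented = augment(batch, training=True)
```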
The model was then converted to a CoreML model using the coremltools library, which allows specifying an image input type. When imported into Xcode, the model takes a grayscale 28x28 image as input and outputs a String containing the predicted class label.
The iOS app layout was built using the SwiftUI framework. Upon a button click, users can choose to take a photo with the camera or upload an image from the photo library within the app. Previously, images fed into a CoreML model needed to be resized, grayscaled, and converted to a CVPixelBuffer by hand, which is complicated and can slow down inference. Instead, the Vision framework handles all of this preprocessing. Vision was used to take the input image and preprocess it to fit the input specifications that the CoreML model expects, as well as to take the output of the model and format it as a String to be displayed on the screen.
App demo video: https://drive.google.com/file/d/1CT25tNmgSGQFtWThEVIeexCE5QB7YCCD/view?usp=sharing