The goal of this project is to recognise the 100 most commonly used chinese characters, written in calligraphy in the semi-cursive script. As compared to handwritten chinese characters, chinese calligraphy characters are harder to recognise, even for trained experts because they can be written in a less restrictive way to express the author’s personality and emotions... (read more here)
Sample data from character "总":
lear-gist-python is used to extract GIST features. The installation instruction can be found in the repo.
The other depedencies can be installed by running:
$ pip install -r requirements.txt
notebooks/*
- ipynb files to collect and preprocess the data.train.ipynb
- Demo to train the model. The data directories and model hyperparameters have to be specified in config.yaml.data/cccr/*
- Train, validation and test set.data/common.txt
- The most common chinese characters arranged in descending order, ie. the first 100 lines contain the 100 most commonly used characters.