This project classifies Japanese sentences by how closely they resemble the writing of classical Japanese authors such as Soseki Natsume, Ogai Mori, and Ryunosuke Akutagawa.
Getting data from Aozora-bunko(青空文庫)

I get the training data from Aozora-bunko, a public repository of classic Japanese literary works whose copyright has expired.

To download texts from Aozora-bunko, I made a script. It downloads the zip files of all works by the authors listed in target_author.csv, extracts the zips to Shift-JIS encoded text files, converts them to UTF-8, and finally changes the file format to csv. In the final csv, the text is split into lines at "。", the Japanese full stop, and at line breaks.
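The conversion steps described above can be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual script: the function names `sjis_bytes_to_sentences` and `sentences_to_csv`, the two-column `author,sentence` layout, and the choice to drop the "。" itself are all hypothetical.

```python
import csv
import io
import re


def sjis_bytes_to_sentences(raw: bytes) -> list[str]:
    """Decode Shift-JIS bytes and split the text at "。" and line breaks."""
    # Assumption: undecodable bytes are replaced rather than raising an error.
    text = raw.decode("shift_jis", errors="replace")
    # Split on the Japanese full stop and on any run of line breaks.
    parts = re.split(r"[。\r\n]+", text)
    # Drop empty pieces and surrounding whitespace (including full-width spaces).
    return [p.strip() for p in parts if p.strip()]


def sentences_to_csv(sentences: list[str], author: str) -> str:
    """Render one csv row per sentence (hypothetical author,sentence layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for sentence in sentences:
        writer.writerow([author, sentence])
    return buf.getvalue()
```

Because Python strings are Unicode, writing the returned csv text to a file opened with `encoding="utf-8"` completes the Shift-JIS to UTF-8 conversion.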

Training the model

Following a paper and a blog post on the subject, I use a character-level convolutional neural network to train the classification model for Japanese sentences.
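A character-level network reads each sentence as a sequence of characters rather than words, so the text usually has to be turned into fixed-length integer sequences first. The sketch below shows one common way to do that. It is an assumption for illustration, not necessarily how this project encodes its input: the helper names `build_vocab` and `encode`, the reserved IDs 0 (padding) and 1 (unknown), and the `max_len` parameter are all hypothetical.

```python
def build_vocab(sentences: list[str]) -> dict[str, int]:
    """Map every character seen in the training sentences to an integer ID."""
    chars = sorted({ch for sentence in sentences for ch in sentence})
    # IDs 0 and 1 are reserved for padding and unknown characters (assumption).
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(sentence: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Convert a sentence to exactly max_len character IDs."""
    # Truncate long sentences; unseen characters map to the unknown ID 1.
    ids = [vocab.get(ch, 1) for ch in sentence[:max_len]]
    # Pad short sentences on the right with the padding ID 0.
    return ids + [0] * (max_len - len(ids))
```

Sequences of this shape can then be fed to an embedding layer followed by convolution layers in whichever deep learning framework is used.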

After downloading the texts and converting them to csv, run the training script to generate the classification model.


Once the model has been generated, you are ready to use it.