
Scripts for HiraganatimesCorpus

What's this?

  • These scripts make the corpus machine-friendly


Requirements

  • Python 2.x
  • NLTK, used for sentence splitting

How to use

  • find THE_CORPUS_DIRECTORY -type f | grep TXT | python ./ PREFIX_OF_OUTPUT_FILE -
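The command above pipes a list of file paths into the script. As a rough illustration of that pattern, and not the script's actual code, the sketch below reads one path per line from an input stream. The function name `read_corpus_paths` is hypothetical, and the sketch assumes the trailing `-` means "read the file list from standard input".

```python
import sys


def read_corpus_paths(stream):
    """Return the non-empty paths listed one per line in `stream`.

    Hypothetical helper: `stream` is any iterable of text lines,
    for example sys.stdin when the list comes from `find ... | grep TXT`.
    """
    paths = []
    for line in stream:
        path = line.strip()  # drop the trailing newline and stray spaces
        if path:             # skip blank lines
            paths.append(path)
    return paths


# Typical use inside a script (left as a comment so that importing
# this sketch never blocks on standard input):
#     for path in read_corpus_paths(sys.stdin):
#         ...  # process one corpus file
```

Taking the stream as a parameter, instead of reading `sys.stdin` directly, keeps the helper easy to try with an in-memory list of lines.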

Known problems

  • In the new format (from 2011.01), the number of sentences differs between the two languages

These cases are marked with FIXME in the script.
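To make the problem concrete, here is a minimal sketch of how a mismatch in sentence counts could be detected and flagged. It is an assumption about the general approach, not the script's actual logic: the function name `pair_sentences` and the `"OK"` / `"FIXME"` tags are hypothetical.

```python
def pair_sentences(japanese, english):
    """Pair sentences by position, or flag the block if the counts differ.

    Hypothetical helper: both arguments are lists of already-split
    sentences for the same passage.
    """
    if len(japanese) != len(english):
        # Positional pairing would be unreliable, so keep the whole
        # passage as one unit and mark it for manual review.
        return [("FIXME", "".join(japanese), " ".join(english))]
    # Counts match: pair the i-th Japanese sentence with the i-th English one.
    return [("OK", ja, en) for ja, en in zip(japanese, english)]
```

When the counts match, each pair is emitted separately; otherwise the passage is kept as one flagged record so it can be aligned by hand later.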


License

  • GNU General Public License, version 3
  • Copyright (C) 2013- Yuta Hayashibe