This repository has been archived by the owner. It is now read-only.

Scripts for HiraganatimesCorpus

What's this?

  • These scripts convert the corpus into a machine-friendly format

Requirements

  • Python 2.x
  • NLTK (used for sentence splitting)

How to use

  • find THE_CORPUS_DIRECTORY -type f | grep TXT | python ./converter.py PREFIX_OF_OUTPUT_FILE -
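
For readers who want to check which files the pipeline above would pass to converter.py, the `find ... | grep TXT` step can be approximated in Python. This is only a sketch of the file-selection step: the function name `list_txt_files` is hypothetical and not part of these scripts, and nothing here reflects how converter.py itself reads its input.

```python
import os


def list_txt_files(corpus_dir):
    """Return sorted paths of regular files under corpus_dir whose path contains 'TXT'.

    Hypothetical helper that roughly mirrors `find DIR -type f | grep TXT`.
    """
    matches = []
    for root, _dirs, files in os.walk(corpus_dir):
        for name in files:
            path = os.path.join(root, name)
            # `find -type f` lists regular files only, so skip symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # `grep TXT` matches the substring anywhere in the path,
            # including directory names, and is case-sensitive.
            if "TXT" in path:
                matches.append(path)
    return sorted(matches)
```

Note that, like `grep TXT`, the check is case-sensitive, so a file named `a.txt` is not selected unless one of its parent directories contains `TXT`.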

Known problems

  • In the new format (from 2011.01), the numbers of sentences in the two languages are not equal

These cases are marked with FIXME in the script.
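
A mismatch of this kind can be detected after conversion by comparing the sentence counts of the two sides of each document. The following is a minimal sketch, not code from the script: the function name `find_count_mismatches` and the `(doc_id, side_a, side_b)` input shape are assumptions made for illustration, and the actual output format of converter.py may differ.

```python
def find_count_mismatches(documents):
    """Return (doc_id, count_a, count_b) for documents whose sides differ in length.

    `documents` is assumed to be an iterable of (doc_id, side_a, side_b),
    where side_a and side_b are lists of sentences in the two languages.
    This input shape is hypothetical.
    """
    mismatches = []
    for doc_id, side_a, side_b in documents:
        if len(side_a) != len(side_b):
            mismatches.append((doc_id, len(side_a), len(side_b)))
    return mismatches
```

Documents reported by such a check cannot be aligned sentence by sentence without further work, so they may need to be excluded or aligned by hand.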

License

  • GNU General Public License Version 3
  • Copyright (C) 2013- Yuta Hayashibe
