If you are new to NLP, I don't recommend using this tag. Use the latest version of the master branch instead. I don't accept requests or complaints about this tag. (^O^)
We created the seed file of a neologism dictionary for a POS tagger on 2015-06-23.
The seed file in this tag (v0.0.2) will never be updated.
Therefore, this tag is useful for the following applications.
- Experiments for evaluating research results
- Reproducing the experimental results of others
- Producing morphological analysis results that never change
We created the seed file using the following resources.
- Dump data of Hatena Keyword
- Japanese postal code data (ken_all.lzh)
- A list of station names across Japan
- Entry data of person names (last name / first name)
- Entry data of Unicode emoji
- Entry data of adverbs
- A large number of documents crawled from the Web