n-manas/Corpus-of-Everyday-Japanese-Conversation---Yomichan-Frequency-Dictionary

Script used to create the dictionary: yomichan_fre_dict_from_tsv.py

Source:

Corpus of Everyday Japanese Conversation https://www2.ninjal.ac.jp/conversation/cejc/cejc-wc.html (second zip archive on that page, file 3_cejc_frequencylist_suw_token.xlsx)

Information about Yomichan:

https://github.com/FooSoft/yomichan

Information about the project (in English):

https://www.ninjal.ac.jp/english/research/cr-project/project-3/institute/spoken-language/

Summary from website:

The Corpus of Everyday Japanese Conversation (CEJC) vocabulary and word-count table is based on 200 hours of recorded data (recorded approximately from April 2016 to 2020).

"Our project will develop a large-scale corpus of Japanese everyday conversation in a balanced manner. Since informants record their conversations in everyday situations by themselves, naturally occurring conversations can be collected. To build an empirical foundation for the corpus design, we conducted a survey of ordinary conversational behavior of about 250 adults."

The file includes several rank columns; the overall rank was used to generate this frequency dictionary.
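
For readers who want to see roughly what such a conversion involves, here is a minimal sketch. It is not the contents of yomichan_fre_dict_from_tsv.py. It assumes the spreadsheet has been exported to TSV with a header row, and the column names passed in (`lemma` and `rank` in the usage note) are hypothetical placeholders, not the real headers of the CEJC file. The function name `build_frequency_dictionary` is also made up for illustration. The output follows the Yomichan dictionary layout: an `index.json` plus a `term_meta_bank_1.json` whose entries look like `[term, "freq", value]`.

```python
import csv
import io
import json
import zipfile


def build_frequency_dictionary(tsv_text, title, term_column, rank_column):
    """Return the bytes of a Yomichan frequency dictionary zip.

    term_column and rank_column are header names in the TSV; the caller
    must supply the real ones (the names used here are assumptions).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    entries = []
    for row in reader:
        term = (row.get(term_column) or "").strip()
        rank = (row.get(rank_column) or "").strip()
        # Skip rows with no term or a non-numeric rank.
        if not term or not rank.isdigit():
            continue
        entries.append([term, "freq", int(rank)])

    # Minimal index: dictionary title, schema format, and a revision label.
    index = {"title": title, "format": 3, "revision": "1"}

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.json", json.dumps(index, ensure_ascii=False))
        archive.writestr(
            "term_meta_bank_1.json", json.dumps(entries, ensure_ascii=False)
        )
    return buffer.getvalue()
```

Calling `build_frequency_dictionary(tsv_text, "CEJC", "lemma", "rank")` and writing the returned bytes to a `.zip` file should give an archive that can be imported from Yomichan's dictionary settings. Large lists may need to be split across several `term_meta_bank_N.json` files, which this sketch does not do.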
