Investigate Japanese board #36

pserwylo · 2017-07-22T00:06:38Z

Pretty much exactly the same as #35 (but that is for Chinese). The Japanese UI translation was one of the first to be contributed, so I want to also have a Japanese version of the board, but I don't have enough knowledge of the language to figure out if it is meaningful or how to go about implementing it.

Riotism · 2017-08-02T16:56:48Z

For a Japanese board, I would imagine each tile being a kana. Kana are syllabic characters (playing a role similar to letters) that make up a "word". There are two ways of writing the same kana: hiragana and katakana. There is already a word game that involves linking kana (Shiritori), which uses a similar mechanism. I am not a Japanese speaker, but I definitely see a Japanese board being feasible.

wichmann · 2018-05-10T09:53:36Z

Disclaimer: I do not speak Japanese as my mother tongue. I'm learning Japanese as a hobby and can understand/read it to some degree. But I would be interested to see Lexica with Japanese words.

I agree with @Riotism. IMHO it would be possible and reasonable to use only kana. The two syllabaries (hiragana and katakana) are used for different purposes but are otherwise interchangeable. Therefore you could use just one of them.

To get a Japanese word list, I think you would have to start with a good dictionary and get the readings (kana) for all words, because most Japanese words are written in a script called kanji, which are logograms like the Chinese characters. Then you could convert all kana into one of the syllabaries and use the result as the word list.
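As a rough illustration of the "convert all kana into one of the syllabaries" step, here is a minimal sketch that maps katakana to hiragana by shifting code points. It relies on the fact that the katakana range U+30A1..U+30F6 sits exactly 0x60 above the matching hiragana in Unicode. The function name `to_hiragana` is hypothetical and this is not necessarily how the script linked in this thread does it.

```python
# Distance between the katakana and hiragana blocks in Unicode.
KATAKANA_TO_HIRAGANA_OFFSET = 0x60


def to_hiragana(text: str) -> str:
    """Convert katakana in U+30A1..U+30F6 to hiragana; leave everything else."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - KATAKANA_TO_HIRAGANA_OFFSET))
        else:
            # Other characters, including the long vowel mark U+30FC,
            # are passed through unchanged.
            out.append(ch)
    return "".join(out)
```

Note that the long vowel mark is outside the shifted range, so a sketch like this leaves it in place and it would need separate handling.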

Currently the dictionary most free apps are using is JMdict/EDICT. It can be used under the terms of the Creative Commons licence. I wrote a simple Python script to get the dictionary and create a usable word list from that. Here is the result: https://gist.github.com/wichmann/7912e0f7694ad8fdbd584b94b2e792f0.

pserwylo · 2018-05-18T10:37:56Z

Oh, that is great, thanks so much @wichmann! I've taken your word list, and it does indeed work (I'm working on my fork, on a branch called japanese). Here is my first successful run of my scripts:

Builds a trie data structure from the word list (./gradlew buildDictionary_jp)
Generates a probability distribution to produce boards (./gradlew analyseLanguage_jp). After generating 1000 boards, it produces boards which average about 35 words; the worst board had about 10 words and the best board about 75 words. I'll keep running the algorithm and see if I can improve those statistics and get some better boards.
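For readers unfamiliar with the trie step, the idea can be sketched in a few lines of Python. This is only an illustration using nested dicts; it is not Lexica's actual trie format or the code behind the Gradle task, and the names `build_trie`, `contains`, and `has_prefix` are made up for the example.

```python
END = ""  # end-of-word marker; cannot clash with a one-character key


def build_trie(words):
    """Build a nested-dict trie where each key is one character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def _walk(trie, text):
    """Follow text through the trie; return the final node or None."""
    node = trie
    for ch in text:
        node = node.get(ch)
        if node is None:
            return None
    return node


def contains(trie, word):
    """True if word was inserted as a complete word."""
    node = _walk(trie, word)
    return node is not None and END in node


def has_prefix(trie, prefix):
    """True if at least one inserted word starts with prefix."""
    return _walk(trie, prefix) is not None
```

The prefix check is what makes a trie attractive for a board game: a search along adjacent tiles can stop as soon as the letters so far are not a prefix of any word.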

I will try and prepare a release with it to get further feedback, but before that I'll quickly test:

That the game can actually be played (e.g. it doesn't crash when using Kanji)
Will document the process properly, including your script and the JMdict/EDICT stuff.

@wichmann - Do you mind if I include your script in the ./tools directory? If that is okay, what license may I use? Preferably the GPLv3+ license, but it is of course your choice.

Also, would you be able to provide any feedback on the letter scores I've taken from Wikipedia and added here?

pserwylo · 2018-05-18T11:00:52Z

I've taken the "small letters" and put each one next to what looked (to my naive English-reading eyes) like the larger version of the same letter, giving both the same score:

pserwylo@0ad1bbd

Commit message above explains further.
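The pairing described in this comment could be expressed roughly as follows. This is a sketch only: the small-to-large table lists the common small hiragana, the function name `score_for` is hypothetical, and any score values passed in are placeholders rather than the ones in the commit.

```python
# Small hiragana mapped to their full-size counterparts.
SMALL_TO_LARGE = {
    "ぁ": "あ", "ぃ": "い", "ぅ": "う", "ぇ": "え", "ぉ": "お",
    "っ": "つ", "ゃ": "や", "ゅ": "ゆ", "ょ": "よ", "ゎ": "わ",
}


def score_for(letter, scores):
    """Look up a score, letting a small kana reuse its large kana's score."""
    base = SMALL_TO_LARGE.get(letter, letter)
    return scores[base]
```

With this shape only the full-size kana need an entry in the score table, so a small kana can never drift out of sync with its larger form.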

pserwylo · 2018-05-18T11:38:08Z

Now I've dealt with the diacritics in this commit:

pserwylo@2bd305a

Only a few more characters left:

ゐ
ゑ
を
〜
が
ぎ
ー

Any feedback for these?

pserwylo · 2018-05-18T11:50:09Z

FYI, I'm guessing that the idea in #71 will also be appropriate here, based on the Wikipedia article about Scrabble letters and how they seem to be somewhat normalized (with regard to diacritics). If so, it will probably have to wait until I or someone else is able to implement the necessary changes to the guts of Lexica and how it stores word lists internally.

wichmann · 2018-05-21T08:34:19Z

@pserwylo - Thanks for all your work. Of course you can include my script. As for the license, GPLv3+ is fine by me.

As for the characters left:

"ゐ" and "ゑ" are obsolete hiragana which are not used today, only in old texts. "〜" represents a Japanese tilde, IMHO it is never used in words, only for ranges or special purposes. All words with these three characters can be eliminated from the word list, as there are only a few of those.
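Dropping those words could be done with a small filter like the one below. It is a sketch, not part of the existing script: the name `filter_words` is hypothetical, and because it is not clear from the word list whether the tilde is the wave dash (U+301C) or the fullwidth tilde (U+FF5E), both are excluded as an assumption.

```python
# Obsolete kana plus the two tilde-like characters (assumed, see above).
EXCLUDED_CHARS = {
    "\u3090",  # ゐ, obsolete hiragana "wi"
    "\u3091",  # ゑ, obsolete hiragana "we"
    "\u301c",  # wave dash
    "\uff5e",  # fullwidth tilde
}


def filter_words(words):
    """Keep only the words that contain none of the excluded characters."""
    return [w for w in words if not EXCLUDED_CHARS.intersection(w)]
```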

"を" is used as a grammatical marker ("particle") and in loan words, but usually not in Japanese dictionary words. Mostly, it is present in the word list because the list contains phrases where it serves as a particle.

"が" and "ぎ" are just versions of "か" and "き" with diacritics.
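If it helps, the base kana can be recovered programmatically instead of by hand. Under Unicode NFD normalization, a voiced kana such as "が" decomposes into the base kana followed by the combining voiced sound mark (U+3099), and semi-voiced kana use U+309A, so stripping those marks yields the unmarked letter. The function name `strip_voicing_marks` is made up, and whether to merge the tiles or only share the score is left open here.

```python
import unicodedata

# Combining dakuten (voiced) and handakuten (semi-voiced) marks.
VOICING_MARKS = {"\u3099", "\u309a"}


def strip_voicing_marks(text: str) -> str:
    """Return text with dakuten and handakuten removed from every kana."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ch not in VOICING_MARKS)
```

For example, a score lookup could call this first so that "が" and "ぎ" reuse the scores of "か" and "き".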

"ー" is used as a symbol for a long vowel. It is almost never used with hiragana, only with katakana. My script tries to convert all words to hiragana and it incorrectly leaves these characters in. In hiragana the symbol should be replaced by the vowel which it represents. Maybe there is a better way to make the conversion in the script?!
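One possible approach, sketched below, is to replace each long vowel mark with the vowel of the kana that precedes it. The vowel rows are typed out by hand and the name `expand_long_vowels` is hypothetical. It is also an assumption that simply repeating the preceding vowel gives the spelling wanted for the word list; a mark with no usable preceding kana (for example after "ん" or at the start of a word) is left untouched.

```python
LONG_VOWEL_MARK = "\u30fc"  # ー

# Hiragana grouped by the vowel they end in (small kana included).
VOWEL_ROWS = {
    "あ": "あかさたなはまやらわがざだばぱぁゃゎ",
    "い": "いきしちにひみりぎじぢびぴぃ",
    "う": "うくすつぬふむゆるぐずづぶぷぅゅ",
    "え": "えけせてねへめれげぜでべぺぇ",
    "お": "おこそとのほもよろをごぞどぼぽぉょ",
}
VOWEL_OF = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def expand_long_vowels(word: str) -> str:
    """Replace each long vowel mark with the vowel of the preceding kana."""
    out = []
    for ch in word:
        if ch == LONG_VOWEL_MARK and out and out[-1] in VOWEL_OF:
            out.append(VOWEL_OF[out[-1]])
        else:
            out.append(ch)
    return "".join(out)
```

Running this after the hiragana conversion would, for instance, turn "らーめん" into "らあめん".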

pserwylo · 2021-03-23T15:53:14Z

Closing as a Japanese dictionary has existed for some time. If there are any issues with it, we can always open new issues.

pserwylo mentioned this issue May 20, 2018

Add Japanese dictionary #81

Merged

pserwylo closed this as completed Mar 23, 2021
