Furigana search mode #1

Open
djahandarie opened this issue Jul 4, 2021 · 2 comments


@djahandarie

First off, love the project, this is a wonderful idea.

One reason I use corpora is to find the right reading for a non-dictionary word.

Right now it's hard to use massif for that purpose, so it'd be nice if massif had a checkbox to show only results with furigana. (Something like the percentages on furigana.info would be a bonus, but honestly not that important, because I like to look through all the results individually anyway.)

P.S. I've noticed that sometimes massif doesn't show the furigana for compound words. E.g., search "枯れた魔術師": the originals have furigana on all the hits, but massif doesn't show it.

@rsimmons
Owner

rsimmons commented Jul 4, 2021

Thanks for the feedback!

> One reason I use corpora is to find the right reading for a non-dictionary word.

Ah, words that are likely to have furigana because they are not in a dictionary and natives would also need them? Can you give me a couple examples just for reference/testing?

In this initial version, I coded things quickly and meant to completely strip furigana and then maybe revisit them later and handle them properly. I see now that many are in there, but not all, per your example. So I'll bump that up the list.

And once that's done, I can see adding a checkbox per your suggestion.
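
For reference, here is a minimal sketch of the two behaviours discussed above: stripping furigana for display, and a "furigana only" filter for the checkbox. It assumes the source text carries readings as HTML `<ruby>BASE<rt>READING</rt></ruby>` markup, which may not match how massif actually stores its data. The function names and the sample sentence are hypothetical.

```python
import re

# Assumption: readings are stored as <ruby>BASE<rt>READING</rt></ruby>.
RUBY_RE = re.compile(r"<ruby>(.*?)<rt>(.*?)</rt></ruby>")


def parse_segments(markup):
    """Split markup into (base_text, reading_or_None) segments."""
    segments = []
    pos = 0
    for m in RUBY_RE.finditer(markup):
        if m.start() > pos:
            # Plain text between two ruby elements has no reading.
            segments.append((markup[pos:m.start()], None))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(markup):
        segments.append((markup[pos:], None))
    return segments


def strip_furigana(markup):
    """Return the base text only, with every reading removed."""
    return "".join(base for base, _ in parse_segments(markup))


def query_has_furigana(markup, query):
    """True if an occurrence of query overlaps at least one annotated segment."""
    segments = parse_segments(markup)
    plain = "".join(base for base, _ in segments)
    # Character spans, in the plain text, that carry a reading.
    spans = []
    offset = 0
    for base, reading in segments:
        if reading is not None:
            spans.append((offset, offset + len(base)))
        offset += len(base)
    start = plain.find(query)
    while start != -1:
        end = start + len(query)
        if any(s < end and start < e for s, e in spans):
            return True
        start = plain.find(query, start + 1)
    return False


# Hypothetical sample sentence, for illustration only.
sample = "彼は<ruby>枯<rt>か</rt></ruby>れた<ruby>魔術師<rt>まじゅつし</rt></ruby>だ。"
print(strip_furigana(sample))
print(query_has_furigana(sample, "枯れた魔術師"))
```

Matching the query against the stripped text while tracking annotated spans means a compound such as 枯れた魔術師 still counts as a furigana hit when only part of it is annotated.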

@djahandarie
Author

For some examples, 夜闇 is listed as やあん in the dictionary, but this is often intended to be read as よやみ. 絹服 is unlisted in the dictionary, and furigana.info only shows けんぷく, but this is often read きぬふく. Then you have things like 蛇王, which could be read へびおう or じゃおう, but it'd be interesting to see the distribution. 豹頭 is often read ひょうとう, but it'd be nice to see if anyone ever gives it ひょうあたま. Basically any novel/rare compound is kinda flexible in its reading, and it's useful to be able to look up what authors tend to intend.
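
As a rough illustration of the "distribution" idea, the sketch below tallies the readings attached to one word across a set of sentences. It assumes the hits carry readings as `<ruby>BASE<rt>READING</rt></ruby>` markup with the whole word as a single ruby base. The function name is hypothetical, and the sample sentences are made up for the example, not real corpus results.

```python
import re
from collections import Counter

# Assumption: readings are stored as <ruby>BASE<rt>READING</rt></ruby>.
RUBY_RE = re.compile(r"<ruby>(.*?)<rt>(.*?)</rt></ruby>")


def reading_distribution(sentences, word):
    """Map each reading seen for word to its share of annotated occurrences."""
    counts = Counter(
        reading
        for sentence in sentences
        for base, reading in RUBY_RE.findall(sentence)
        if base == word
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reading: n / total for reading, n in counts.most_common()}


# Made-up sentences, for illustration only.
hits = [
    "<ruby>夜闇<rt>よやみ</rt></ruby>に紛れる。",
    "<ruby>夜闇<rt>よやみ</rt></ruby>の中を歩く。",
    "<ruby>夜闇<rt>やあん</rt></ruby>が迫る。",
]
for reading, share in reading_distribution(hits, "夜闇").items():
    print(f"{reading}: {share:.0%}")
```

Occurrences without furigana are ignored here, so the shares describe only the cases where an author chose to annotate the word.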
