Irregular inflections are still incorrect #802

Closed
dovej opened this issue Sep 25, 2017 · 18 comments

@dovej

dovej commented Sep 25, 2017

See #758
The solution that closed the above bug was to special-case 行く, いい, etc. That's not a good solution, because it fails to catch compounds built on them, like もっていく and かっこいい.

A better solution would be for the inflection panel to take the POS code as an input and correctly handle the irregular ones like vs, vk, adj-ix, v5k-s, etc. The full list of JMDict POS codes should be considered, since there are many obscure irregulars: http://www.edrdg.org/wwwjdic/wwwjdicinf.html#code_tag.
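To illustrate the idea, here is a minimal sketch of keying the rule on the POS code rather than on the word itself. The table `PAST_RULES` and the function `plain_past` are hypothetical names made up for this example, the table covers only the plain past form of two classes, and none of this is Aedict's actual code.

```python
# Hypothetical rule table: POS code -> (characters to strip, ending to append).
# Only the plain past form of two godan -ku classes is covered here.
PAST_RULES = {
    "v5k": (1, "いた"),    # regular -ku verbs, e.g. かく -> かいた
    "v5k-s": (1, "った"),  # iku/yuku special class, e.g. いく -> いった
}


def plain_past(word: str, pos: str) -> str:
    """Build the plain past form by looking the rule up by POS code."""
    strip, ending = PAST_RULES[pos]
    return word[:-strip] + ending


# A compound tagged v5k-s gets the irregular form without being listed by name.
print(plain_past("もっていく", "v5k-s"))
print(plain_past("かく", "v5k"))
```

Because the lookup never compares against a fixed word list, any dictionary entry carrying the irregular POS code picks up the irregular ending.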

@mvysny
Owner

mvysny commented Sep 26, 2017

Unfortunately, I do not have the time to examine the irregularities myself. Could you please post a verb which is not inflected properly, along with its base1, base2, base3, base4, base5, base-te, and base-ta forms? Thanks!

@dovej
Author

dovej commented Sep 26, 2017

I don't think I'm communicating this effectively. This is not an issue of individual verbs; it's an issue of entire classes of verbs and adjectives being handled incorrectly.

You don't need to reinvent the wheel, nor try to fix every irregular individually. EDICT already has a conjugator that works well and covers most non-archaic examples:

http://edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/tip/python/conj.py?style=gitweb
http://edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/tip/pg/data/conj.csv?style=gitweb
http://edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/tip/pg/data/conjo.csv?style=gitweb
http://edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/tip/pg/data/conjo_notes.csv?style=gitweb
http://edrdg.org/~smg/cgi-bin/hgweb-jmdictdb.cgi/file/tip/pg/data/conotes.csv?style=gitweb

Note that the above files need to be saved and opened locally to display the characters correctly.

@mvysny
Owner

mvysny commented Sep 26, 2017

Thanks, but that's precisely it: I unfortunately lack the time and skills to grok Python scripts and figure out how to run them. Also, I can't use them in Android - I'd need to convert them to Java first. Also, Aedict already has an inflector which I'd like to use (it's not public though).

The best way for you to provide me with data is to state the following here:

  1. the verb class in JMDict (say, v5aru), which is not being inflected properly
  2. Verb example, e.g. kudasaru
  3. base1 inflection: kudasara
  4. base2 inflection: kudasai
  5. etc base3, base4, base5, base-te, base-ta

@dovej
Author

dovej commented Sep 26, 2017

I see. I don't have the time to chug through all this right now, and I'm not sure when I will. Does this help?

I've extracted the relevant explanation of the algorithm from the Python code; see info.txt. I took the conjugation CSV and substituted in all the descriptors so they don't need to be looked up; see conjo.xlsx.

Even EDICT is missing a bunch of archaics and some things like v5uru, but it's a start. The spreadsheet contains plenty of conjugations I'm sure Aedict already gets right. The ones that need to be looked at more carefully are:
adj-ix
v1-s
v5aru (I think this is good now)
v5k-s
v5r-i
v5u-s
vk
vs-s (maybe already okay?)
vs-i

info.txt
conjo.xlsx

@mvysny
Owner

mvysny commented Sep 27, 2017

Thanks, that CSV file is pretty nifty. I'll revisit those specials one by one and I'll let you know.

@mvysny
Owner

mvysny commented Sep 27, 2017

Can you please provide how exactly adj-ix should be inflected differently than adj-i? I don't see any difference in conjo.xlsx. Can you please provide an example?

Can you please tell me how かっこいい should be inflected?

@mvysny
Owner

mvysny commented Sep 27, 2017

v1-s: Ichidan verb, kureru special class (くれる). The difference from v1 is that form5 does not end with ro, so it's just くれ!
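A minimal sketch of that difference, assuming "form5" here refers to the plain imperative. The function name `imperative` is hypothetical and this is not the Aedict inflector.

```python
def imperative(word: str, pos: str) -> str:
    """Plain imperative for ichidan verbs, split by POS code."""
    stem = word[:-1]  # drop the final る
    if pos == "v1-s":
        return stem        # kureru class: bare stem, no ろ
    if pos == "v1":
        return stem + "ろ"  # regular ichidan: stem + ろ
    raise ValueError(f"unsupported POS code: {pos}")


print(imperative("くれる", "v1-s"))
print(imperative("たべる", "v1"))
```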

@dovej
Author

dovej commented Sep 27, 2017

Cool!

Check over the algorithm notes at the bottom of info.txt. I'll walk through the adj-ix non-past negative plain form with かっこいい:

  1. stem=1, okuri=くない, euphr=よ, euphk= (empty)
  2. Clearly this is kana. (Honestly, this step can generally be skipped by assuming kana. JMDict is trying to be slick by conjugating する as できる but 為る as 出来る. Nice, but not exactly a critical distinction, and する is the only example that makes it.)
  3a) Skip because euphr is not null.
  3b) Skip because euphk is null.
  3c) stem+1=1+1=2. Remove 2 characters: かっこいい→かっこ, then append euphr: かっこ→かっこよ
  3d) Append okuri: かっこよ→かっこよくない

The only ones that currently have these extra steps are adj-ix, vk, vs-s, and vs-i. So basically いい, くる, and する, but processing by the POS code is necessary because each of these is effectively an entire class of words, with multiple dictionary entries for compounds that conjugate like them.
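The steps above can be sketched roughly as follows. This is my reading of the walk-through, not a copy of conj.py: the function name `construct`, the parameter names, and the kana check on the second-to-last character are assumptions for illustration.

```python
def construct(word: str, stem: int, okuri: str,
              euphr: str = "", euphk: str = "") -> str:
    """Apply one conjugation row (stem, okuri, euphr, euphk) to a word."""
    # Step 2: treat the word as kana if the character before the final
    # one is hiragana; otherwise assume it is a kanji.
    is_kana = "ぁ" <= word[-2] <= "ん"
    # Pick the euphonic replacement matching the script (may be empty).
    euph = euphr if is_kana else euphk
    # Steps 3a-3c: with a replacement, one extra character is removed
    # and the replacement is put in its place.
    if euph:
        stem += 1
    # Step 3d: append the okurigana.
    return word[:-stem] + euph + okuri


# adj-ix non-past negative plain, using the row from the walk-through.
print(construct("かっこいい", stem=1, okuri="くない", euphr="よ"))
```

With both euphr and euphk empty, the function reduces to the ordinary "strip and append" used by the regular classes.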

@mvysny
Owner

mvysny commented Sep 27, 2017

Thanks! v5k-s fixed. Now for this pesky adj-ix

@mvysny
Owner

mvysny commented Sep 27, 2017

Fixed adj-ix

@mvysny
Owner

mvysny commented Sep 27, 2017

Fixed v5r-i

@mvysny
Owner

mvysny commented Sep 28, 2017

v5u-s done

@mvysny
Owner

mvysny commented Sep 28, 2017

As far as I know, vk only applies to "kuru". There is already a special inflection rule for kuru; please let me know if the inflection is not working properly. The same applies to vs-s suru.

@mvysny
Owner

mvysny commented Sep 28, 2017

vs-i is a special verb category ending with suru; special inflections for this group are already in place.

With this I believe that all inflections are in place. Fixed in Aedict 3.44; please reopen if any inflections are off.

@mvysny mvysny closed this as completed Sep 28, 2017
@dovej
Author

dovej commented Sep 28, 2017

3.44 isn't on the Play Store yet, but just to preempt a couple of things:

vk applies to more than just くる. An example is もってくる, which is not correct in 3.43.

vs-s: this one is a doozy. As far as I can tell, the spreadsheet is wrong. I don't understand how WWWJDIC is conjugating this correctly if the code is wrong... Anyway, my understanding is that vs-i applies to する and derivatives like 全うする. vs-s, on the other hand, applies to single-kanji suru verbs like 愛する. These have a special conjugation for certain stems, see:
http://www.guidetojapanese.org/forum/viewtopic.php?pid=17183#p17183
http://nihongo.monash.edu/cgi-bin/wwwjdic?1W%B0%A6%A4%B9%A4%EB_vs-s
http://nihongo.monash.edu/cgi-bin/wwwjdic?1W%C8%B3%A4%B9%A4%EB_vs-s
for examples.
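As a rough illustration of the difference, here is a sketch covering only the plain negative. The table `NEGATIVE_ENDINGS` and the function `plain_negative` are hypothetical names for this example, and the other forms where vs-s diverges are not covered.

```python
# Hypothetical table: what replaces the final する in the plain negative.
NEGATIVE_ENDINGS = {
    "vs-i": "しない",  # する and its compounds
    "vs-s": "さない",  # single-kanji suru verbs such as 愛する
}


def plain_negative(word: str, pos: str) -> str:
    """Swap the trailing する for the negative ending of the given class."""
    if not word.endswith("する"):
        raise ValueError("expected a verb ending in する")
    return word[:-2] + NEGATIVE_ENDINGS[pos]


print(plain_negative("愛する", "vs-s"))
print(plain_negative("全うする", "vs-i"))
```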

I'm trying to find WWWJDIC's actual conjugation code, because clearly it can't be using just the sheet I posted. So far I've come up dry. I'll try emailing the developer.

@mvysny
Owner

mvysny commented Sep 28, 2017

Thanks for letting me know. You're right with vk, this is going to be fixed in Aedict 3.45.

Regarding vs-s: ah, you're right, I'll fix it.

@mvysny mvysny reopened this Sep 28, 2017
@mvysny
Owner

mvysny commented Sep 28, 2017

vs-s: Fixed in Aedict 3.45

@mvysny mvysny closed this as completed Sep 28, 2017
@dovej
Author

dovej commented Sep 30, 2017

For future reference, I've attached the corrected spreadsheet. The JMDict developers have been notified and have fixed the issue with vs-s.

conjo.xlsx
