-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(deinflect): Add logic to handle all inflections of the copula だ #988
Conversation
Add one more conditional for the type of "です"
Codecov Report
@@ Coverage Diff @@
## main #988 +/- ##
==========================================
+ Coverage 78.97% 79.12% +0.15%
==========================================
Files 7 7
Lines 3001 3004 +3
Branches 174 184 +10
==========================================
+ Hits 2370 2377 +7
+ Misses 628 622 -6
- Partials 3 5 +2
Continue to review full report at Codecov.
|
I think the change to make it conjugate as a verb might be too broad since it will cause it to match other verb endings as well: I wasn't thinking it through when I thought we could reuse the existing す verb logic... 🙇🏾 Since です is so irregular I think the complete fix would be to add a new type for To add an entry to deinflect.dat is faily straight forward. It starts with a set of reasons and then each entry has the form of: The first two are encoding the logic we want and the last is a 0 based index into the reasons at the top of the file. The type mask is opaque but it looks to be two values OR'd together (type of inflected word and type of deinflected word) as was clarified in this fork: https://gist.github.com/lynn/07f7ce3c314223d2aca19ec2bb0540cd#file-patch-deinflect-py-L7 I think if we add a new base mask for です then we can easily add a line to deinflect.dat and update your branch a bit to make things work specifically for this case! What do you think? |
I agree! I did think that the case of です and its conjugation is pretty irregular and need to be differentiated from any other verbs or adjectives. However, I am not sure on how to come up with a new typemask and would need some guidance on that. |
No problem! I have some ideas about that. For the type of the inflected word, I think 0x0080 (OTHER) makes sense in this case (though if we added forms like ではない it would be different). For BASE we'll need a new type in the upper half of the mask (imagine > node -p "0x80|0x2000"
8320 So I think a type mask of 8320 should work well for でした ー> です Let me know if you have any other questions. |
BTW, I was reading http://www.japaneseprofessor.com/reference/grammar/conjugations-of-the-japanese-copula/ which made me realize the honorific forms are also lost in rikaikun. I think we'll be able to add several new lines for various copula conjugations once the base logic is in. You could do it at the same time or just focus on でした depending on your time/want. |
Thanks! I will try working on it as much as I can submit another PR when I'm satisfied with my change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that works.
For now, I'll close this PR then and feel free to re-open when ready, or let me know if you have any more questions on the issue. |
Add entries to でした、じゃありません、じゃありませんでした
ではございません, ではございませんでした, でございましょう, でございまして
I think I am satisfied with my changes and ready to re-open. How do I re-open a pull request? |
I added entries and type(past negative) to deinflect.dat so that rikaikun now recognizes: I am not sure on what to do with ございます on its own as I am not sure if it is still considered a copula or not |
Thanks for the update! Looks like we're pretty close. I think maybe things became complicated because the dictionary contains many congugated forms of the copula which you tried to work with directly and thus created congugation mappings to intermediate inflections. It makes sense given that the dictionary contains things like じゃない even though that's the negative form of だ but I think we should maintain consistency and conjugate back to the base form in all cases. This means that we'll show two entries for some words but that's okay in my mind because sometimes these short common phrases have idiomatic meaning as well as the basic grammar meaning. I think not relying on the dictionary containing certain common inflections is also more robust since they might remove the inflected forms in a future update. In that case, it makes sense to add mappings back to だ since it's the polite affirmative form of the copula. We can start with the mappings that don't have any other entry in the dictionary or fill out the entire table. I'll leave a comment with what I mean inline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good with some small comments.
We should also add some tests for the data.ts changes. There aren't tests for that file yet but it should be fairly straightforward to do in the same manner of other files like background.ts.
That being said, it can be tricky writing the first tests for a class, so let me know if you want me to add a commit with some tests this time around.
Co-authored-by: Erek Speed <melink14@gmail.com>
Co-authored-by: Erek Speed <melink14@gmail.com>
fix entry for ではない and じゃない
- Add chai assertion helpers for loose matching of objects inside of arrays. - Add positive and negative cases for new/changed branches.
I think it should be basically done now though it looks like I need to increase the timeout for the new data tests (loading the data is slower in github than on my local computer...) |
Gradually increase if tests are flaky in the cloud.
@all-contributors Add @maawisul for code |
I've put up a pull request to add @maawisul! 🎉 |
Add @maawisul as a contributor for code. This was requested by melink14 [in this comment](#988 (comment))
🎉 This PR is included in version 2.4.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Fixes #89