Fixed false positive for "食べられる" (potential verb) by JapaneseBrokenExpression #880
JapaneseBrokenExpression reports false positives for potential verbs.
It raises an error for correct forms such as "食べられる", "見られる", and "寝られる".
What I did:
About special case
The tokenizer parses "見れる" as a single token.
This may depend on the dictionary used by Kuromoji.
The issue looks similar to takuyaa/kuromoji.js#28.
I think we need to add other special cases, if there are any.
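
To illustrate the idea, here is a minimal sketch of the check with a special-case list. It is not the code in this PR: `Tok`, `SINGLE_TOKEN_CASES`, and `hasRaNuki` are hypothetical names, and the tag layout ("動詞", "一段", and so on) is an assumption about what the tokenizer returns.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch. Tok is a stand-in for a tokenizer result,
// not RedPen's TokenElement.
class RaNukiSketch {
    static final class Tok {
        final String surface;
        final List<String> tags;

        Tok(String surface, String... tags) {
            this.surface = surface;
            this.tags = Arrays.asList(tags);
        }
    }

    // Ra-less forms that the tokenizer is assumed to emit as ONE token.
    static final Set<String> SINGLE_TOKEN_CASES =
            new HashSet<>(Arrays.asList("見れる"));

    static boolean hasRaNuki(List<Tok> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            Tok cur = tokens.get(i);
            // Special case: the whole broken form is a single token.
            if (SINGLE_TOKEN_CASES.contains(cur.surface)) {
                return true;
            }
            // General case: an ichidan verb followed by exactly "れる".
            // An exact match keeps the correct "られる" from being flagged.
            if (i + 1 < tokens.size()
                    && cur.tags.contains("動詞")
                    && cur.tags.contains("一段")
                    && tokens.get(i + 1).surface.equals("れる")) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, "食べ" + "られる" passes, "食べ" + "れる" is flagged, and "見れる" is caught only because it is listed. The list compares surface forms, so an inflected form of a listed word would not match.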
About baseForm of tokens
Using the baseForm of each token in this logic would be the best way.
But to get the baseForm, we would need to change TokenElement and NeologdJapaneseTokenizer.
It looks like other validators do not use the baseForm of tokens, and it is only needed for Japanese.
So in this pull request, I avoided changing them.
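
For reference, this is roughly what a baseForm-based check could look like if the token carried that field. It is only a sketch under assumptions: `Tok`, `BROKEN_BASE_FORMS`, and `matchesSpecialCase` are hypothetical, and I assume the tokenizer would report "見れる" as the base form of an inflected surface such as "見れ".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch. Tok carries a baseForm field that
// RedPen's TokenElement does not expose in this PR.
class BaseFormSketch {
    static final class Tok {
        final String surface;
        final String baseForm;

        Tok(String surface, String baseForm) {
            this.surface = surface;
            this.baseForm = baseForm;
        }
    }

    // Special cases listed once, in dictionary form.
    static final Set<String> BROKEN_BASE_FORMS =
            new HashSet<>(Arrays.asList("見れる"));

    // Matching on baseForm would cover inflected surfaces as well,
    // as long as they share the listed base form.
    static boolean matchesSpecialCase(Tok token) {
        return BROKEN_BASE_FORMS.contains(token.baseForm);
    }
}
```

The benefit would be that each special case is listed once instead of once per conjugation, at the cost of touching the shared token classes.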
Screenshots
Before
After
Note
I'm not good at Java, so feel free to change my code and syntax as you like.