[en] Chunker improvement #7044

Closed
MikeUnwalla opened this issue Aug 24, 2022 · 7 comments

Comments

@MikeUnwalla
Contributor

Is it possible to change the behaviour of the chunker?

When a singular noun phrase contains a noun that can be read as plural, the chunker incorrectly assigns the chunk tag E-NP-plural to the last word. Some examples:

The aircraft maintenance manager is in the hangar.
Does your box converter operate correctly?
The happy sheep is grazing in the field.
The shiny new aircraft was on the runway.
I’d like a fish pie and a kilo of sprouts.

(Only a very small proportion of plural noun phrases end with a singular noun. For example, ‘sergeants major’: https://www.merriam-webster.com/dictionary/sergeants%20major)
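The requested behaviour can be illustrated with a toy sketch. It is not LanguageTool's chunker code: the function name `chunk_noun_phrase`, the Penn Treebank style tags (`NN`, `NNS`), the `B-`/`I-` prefixes (assumed by analogy with the `E-NP-plural` tag above), and the rule that an ambiguous head counts as singular are all assumptions made for illustration. The idea is that the number of the phrase comes from the last (head) noun only, so a possibly plural modifier such as "aircraft" does not make the phrase plural.

```python
from typing import List, Set, Tuple

# Each token is (word, set of possible POS tags).
# Assumed tags: DT = determiner, JJ = adjective, NN = singular noun, NNS = plural noun.
Token = Tuple[str, Set[str]]


def chunk_noun_phrase(tokens: List[Token]) -> List[str]:
    """Label one noun phrase, taking grammatical number from the head noun only."""
    if not tokens:
        return []
    head_tags = tokens[-1][1]
    # Simplification for this sketch: the phrase is plural only when the head
    # noun has a plural reading and no singular reading.
    is_plural = "NNS" in head_tags and "NN" not in head_tags
    number = "plural" if is_plural else "singular"
    labels = []
    last = len(tokens) - 1
    for i in range(len(tokens)):
        if i == last:
            prefix = "E"  # end of the noun phrase
        elif i == 0:
            prefix = "B"  # beginning of the noun phrase
        else:
            prefix = "I"  # inside the noun phrase
        labels.append(f"{prefix}-NP-{number}")
    return labels


phrase = [
    ("The", {"DT"}),
    ("aircraft", {"NN", "NNS"}),  # same form for singular and plural
    ("maintenance", {"NN"}),
    ("manager", {"NN"}),
]
print(chunk_noun_phrase(phrase))
# ['B-NP-singular', 'I-NP-singular', 'I-NP-singular', 'E-NP-singular']
```

A real chunker would also need context (determiner, verb agreement) to resolve heads such as "sheep" or "aircraft", which this sketch simply treats as singular.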

@danielnaber
Member

I have made a change locally. The change wasn't difficult, but at least 12 tests break now. These seem to be cases where the chunker is wrong and has always been, but now it causes an error. One example:

Adults average in length; the largest recorded specimen weighed 2.65 kg.

"Adults average" is considered a noun phrase, and now it's considered a singular noun phrase, breaking ADJECTIVE_IN_ATTRIBUTE. I won't have time to fix these cases. @MikeUnwalla, if you're interested in working on those cases, I could commit my changes to a branch. Of course, more issues might show up after those 12 tests are fixed...

@MikeUnwalla MikeUnwalla self-assigned this Aug 25, 2022
@MikeUnwalla
Contributor Author

Hi Daniel,

Yes, I will work on these cases.

I will probably need help with setting up a branch.

Another type of chunker error is where there are different readings for upper-case text and lower-case text:

Sudden operation of the gyro can cause unwanted movement of the horizontal stabilizer.
SUDDEN OPERATION OF THE GYRO CAN CAUSE UNWANTED MOVEMENT OF THE HORIZONTAL STABILIZER.
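One way to look for this kind of error is to chunk a sentence twice, once as written and once in upper case, and list the words whose chunk tags differ. The sketch below assumes a hypothetical `chunk` callable that maps a list of words to a list of chunk tags; `toy_chunk` is only a stand-in with a case-sensitive lookup, and the `O` tag for "outside any chunk" is likewise an assumption. Neither reflects LanguageTool's actual API or output.

```python
from typing import Callable, List, Tuple

# Hypothetical chunker interface: a list of words in, one chunk tag per word out.
ChunkFn = Callable[[List[str]], List[str]]


def case_divergences(sentence: str, chunk: ChunkFn) -> List[Tuple[str, str, str]]:
    """Return (word, tag as written, tag in upper case) where the two tags differ."""
    words = sentence.split()
    original = chunk(words)
    upper = chunk([w.upper() for w in words])
    return [(w, a, b) for w, a, b in zip(words, original, upper) if a != b]


def toy_chunk(words: List[str]) -> List[str]:
    """Stand-in chunker: a case-sensitive lookup, so upper-case words are missed."""
    lexicon = {"movement": "E-NP-singular"}
    return [lexicon.get(w, "O") for w in words]


print(case_divergences("unwanted movement", toy_chunk))
# [('movement', 'E-NP-singular', 'O')]
```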


danielnaber added a commit that referenced this issue Aug 25, 2022
@danielnaber
Member

The work-in-progress branch (with tests that don't work) is now available here: https://github.com/languagetool-org/languagetool/tree/issue-7044-chunker

MikeUnwalla added a commit that referenced this issue Aug 29, 2022
@MikeUnwalla
Contributor Author

@languagetool-org/developers, if @danielnaber is out of the office today, can one of you please look at #7050? Many thanks. I think I may not have used GitHub correctly, and I am not sure what to do.

danielnaber added a commit that referenced this issue Sep 2, 2022
danielnaber added a commit that referenced this issue Sep 2, 2022
@danielnaber
Member

danielnaber commented Sep 11, 2022

PR has been merged now.

@MikeUnwalla
Contributor Author

@danielnaber, thank you.

I know about more chunker errors. Do you want them? If yes, I will make a new issue.

@danielnaber
Member

> I know about more chunker errors. Do you want them? If yes, I will make a new issue.

We cannot really fix them, but maybe we can add antipatterns, so please send them.
