[en] Incorrect tagging of proper nouns in disambiguation.xml (NNP_DER_SPIEGEL and others) #7224

Open
MikeUnwalla opened this issue Oct 8, 2022 · 0 comments

I think that DER spiegel is a German newspaper.

The Java rule EN_SPECIFIC_CASE finds the casing error, but 'DER spiegel' as typed is not a proper noun and should not get the postags NNP + NNP.

Who or what is the Seven NATion Army?

NNP_SEVEN_NATION_ARMY gives the incorrectly cased 'NATion' a postag.

THE CITY OF THOUSAND OAKS IS IN CALIFORNIA.

NNP_THOUSAND_OAKS does not give NNP + NNP to 'THOUSAND OAKS'.

Suggestion: move most multiword proper nouns from disambiguation.xml to multiwords.txt. The logic of multiwords.txt makes sure that text with incorrect typography does not get a postag and that upper-case text does get a postag.
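
To make the expected behaviour concrete, here is a minimal Python sketch of the casing check described above. It is not LanguageTool's actual code: the function name `accepts_casing` is hypothetical, and the rule it encodes (accept the entry as listed, or the entry fully upper-cased, and nothing else) is only my reading of how multiwords.txt behaves.

```python
def accepts_casing(text: str, entry: str) -> bool:
    """Return True if `text` should receive the postag of `entry`.

    Hypothetical helper: an assumed model of the multiwords.txt casing
    logic, not the real LanguageTool implementation.
    """
    # Case 1: the text matches the entry exactly as listed.
    if text == entry:
        return True
    # Case 2: the text is the entry written fully in upper case,
    # as in a sentence typed in capitals.
    if text == entry.upper():
        return True
    # Anything else (mixed or wrong casing) gets no postag.
    return False


# The three examples from this issue, checked against their entries.
examples = [
    ("DER spiegel", "Der Spiegel"),
    ("Seven NATion Army", "Seven Nation Army"),
    ("THOUSAND OAKS", "Thousand Oaks"),
]
for text, entry in examples:
    print(f"{text!r}: {accepts_casing(text, entry)}")
```

Under this assumed rule, 'DER spiegel' and 'Seven NATion Army' get no postag, and 'THOUSAND OAKS' does.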
