Plurals of words -ness #16

Open
jaumeortola opened this issue May 20, 2024 · 4 comments

@jaumeortola
Member

The dictionary contains ~3500 words like this:

acuteness=acuteness/NN:U,acutenesses/NNS=all

I guess the plural form -nesses is very infrequent. Is it okay to have the plurals in the dictionary? For all -ness words?
Is this the desired tagging?
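
To make the question concrete, here is a minimal sketch of how such entries could be picked out automatically. It assumes every line follows the `lemma=form/TAG,form/TAG=scope` shape of the single example above; the real dictionary format may have variants this does not handle, and the function names `parse_entry` and `has_uncountable_ness_plural` are made up for illustration.

```python
def parse_entry(line):
    """Split 'lemma=form/TAG,form/TAG=scope' into its parts (assumed format)."""
    lemma, forms, scope = line.strip().split("=")
    # Each form looks like 'acutenesses/NNS'; split on the first slash only.
    tagged = [tuple(item.split("/", 1)) for item in forms.split(",")]
    return lemma, tagged, scope


def has_uncountable_ness_plural(line):
    """True if a -ness lemma is tagged uncountable (NN:U) but also lists a plural (NNS)."""
    lemma, tagged, _ = parse_entry(line)
    tags = {tag for _, tag in tagged}
    return lemma.endswith("ness") and "NN:U" in tags and "NNS" in tags


entry = "acuteness=acuteness/NN:U,acutenesses/NNS=all"
print(parse_entry(entry))
print(has_uncountable_ness_plural(entry))
```

Running a check like this over the whole dictionary should surface the ~3500 entries in question, provided the format assumption holds.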

@AzadehSafakish
Collaborator

> I guess the plural form -nesses is very infrequent. Is it okay to have the plurals in the dictionary? For all -ness words?

If they're tagged with NN:U, I would say no.
But, there might be some that actually are countable (like 'harness/harnesses'). I wouldn't remove the plural forms of countable nouns.

Do you have a list I can quickly review?

@jaumeortola
Member Author

Here are the lists: one from "clean" (tagged and in all spelling dicts) and one from "pending".
ness-pending.txt
ness-clean.txt

@jaumeortola
Member Author

Looking into a dictionary of frequencies, these are the only plurals in the first 400,000 word forms:

businesses
witnesses
illnesses
weaknesses
eyewitnesses
harnesses
likenesses
thicknesses
sicknesses
agribusinesses
lionesses
governesses
canonesses
kindnesses
wildernesses
highnesses
deaconesses
consciousnesses
jeunesses (French?)
eye-witnesses
finesses
sadnesses

@AzadehSafakish
Collaborator

> Looking into a dictionary of frequencies, these are the only plurals in the first 400,000 word forms:

Makes sense. Almost all of them are countable too.
I took a quick look at the previous lists, and I think most of those words can be discarded.
Keeping only the words posted above is probably a safe bet, but it would be good to ensure that any derivatives of those words are included as well: businesses/agribusinesses/ecobusinesses, consciousnesses/pseudo-consciousnesses, etc. (those might not be real words; I just made them up for the sake of example).
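
A rough sketch of that filtering rule, in case it helps: plurals from the frequency list are kept outright, anything that merely ends in one of them is flagged as a possible derivative for manual review, and everything else is dropped. The `classify` function and its labels are hypothetical, not part of any existing tooling. Note that suffix matching over-generates (for example, "stillnesses" ends in "illnesses"), which is why possible derivatives are only flagged rather than kept automatically.

```python
# Plurals attested in the first 400,000 word forms, as posted in this thread.
ATTESTED = {
    "businesses", "witnesses", "illnesses", "weaknesses", "eyewitnesses",
    "harnesses", "likenesses", "thicknesses", "sicknesses", "agribusinesses",
    "lionesses", "governesses", "canonesses", "kindnesses", "wildernesses",
    "highnesses", "deaconesses", "consciousnesses",
    "jeunesses",  # flagged in the thread as possibly French
    "eye-witnesses", "finesses", "sadnesses",
}


def classify(plural):
    """Return 'keep', 'review' (possible derivative), or 'drop' for a -nesses plural."""
    if plural in ATTESTED:
        return "keep"
    # Possible derivative, e.g. 'ecobusinesses' ends in 'businesses'.
    if any(plural.endswith(base) for base in ATTESTED):
        return "review"
    return "drop"


for word in ["businesses", "ecobusinesses", "stillnesses", "acutenesses"]:
    print(word, classify(word))
```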
