
[pt] Add words to spellcheck #8705

Merged: 5 commits merged on May 26, 2023

@p-goulart (Collaborator) commented May 25, 2023

All of these occurred 15+ times in w20.

  • I've skipped Profa and Drª, as I think we need to work out a better solution here using the abbreviation rules we already have (languagetooler-gmbh/languagetool-premium#5732)
  • I've noticed hahaha, and I'm certain we can add some kind of smart-ish 'laughing' rule that ignores stuff like `(h[ea])+h?`. Just a thought: it's annoying having to manually ignore laughter, so this could be a style thing (languagetooler-gmbh/languagetool-premium#5732);
  • the 'adequa' -> 'adéqua' thing is straight up prescriptivist, but not in a cute way; IMO this should be a style choice, with adequa available in all non-ultra-formal registers? (languagetooler-gmbh/languagetool-premium#5734)
  • okay is weird, but I'm not sure where we stand on that type of thing; do we suggest correcting it to OK? I personally write 'okay' in pt-BR, but I might just have been abroad too long (not a huge issue; let's treat it as a foreign word for now);
  • what in the universe is the PoS of initialisms like LGBT? they're used almost exclusively adjectivally, but they're not quite adjectives... they're not exactly adjectival phrases, either.
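The laughter pattern from the second bullet is easy to try out. The sketch below is a hypothetical speller pre-filter, not an existing LanguageTool rule; it assumes tokens arrive without surrounding punctuation and that matching should be case-insensitive. Note that, as written, the pattern also accepts a lone "ha".

```python
import re

# Pattern from the comment above: one or more "ha"/"he" syllables, optionally
# followed by a trailing "h" (e.g. "haha", "hehehe", "hahah").
# re.IGNORECASE is an assumption so that "Hahaha" and "HAHA" also match.
LAUGHTER = re.compile(r"(?:h[ea])+h?", re.IGNORECASE)


def is_laughter(token: str) -> bool:
    """Return True if the whole token looks like written laughter."""
    return LAUGHTER.fullmatch(token) is not None


def tokens_to_spellcheck(tokens: list[str]) -> list[str]:
    """Hypothetical pre-filter: drop laughter tokens before dictionary lookup."""
    return [t for t in tokens if not is_laughter(t)]


if __name__ == "__main__":
    # prints ['que', 'engraçado']
    print(tokens_to_spellcheck(["hahaha", "que", "Hehehe", "engraçado", "hahah"]))
```

Using `fullmatch` rather than `search` keeps ordinary words that merely contain "ha" (e.g. "chave") from being skipped.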

 - there are many questions.
@susanaboatto (Collaborator) commented May 25, 2023

  1. Prof, Dr: Yes, we can add the abbreviation inconsistencies to our list of issues on the feedback board;
  2. hahaha: That would probably be necessary if we were to add hahaha to the added.txt file. As of now, I believe only ha is in the dictionary. Again, we can add this to the feedback board as a to-do task.
  3. adequa/adéqua: add an issue to the feedback board, so we can tackle this when creating style rules. But it's okay to add adéqua to the tagset, as it is not incorrect.
  4. okay: personally, I think it sounds like a foreign borrowing (estrangeirismo), but we can add it to the dictionary and make a formality/shorten_it/foreign_words rule in the style file if we notice it's causing problems.
  5. LGBT: It functions as an adjective, so we tag it as a common adjective → AQ0CS0 // AQ0CP0 (LGBTs), or else unwanted rules will be triggered. But we should also tag those as nouns. We can do the same for other variants (LGBTQ, LGBTQIA+, etc.).
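Point 5 could translate into tagger dictionary lines like the ones in the diff excerpt further down (word form, lemma, tag). The sketch below produces only the adjective entries suggested above (AQ0CS0 for the singular, AQ0CP0 for the `-s` plural). The tab separator, the use of the singular as lemma, and the helper name `adjective_entries` are assumptions; the noun entries are left out because no noun tags were specified.

```python
# Hypothetical helper: builds added.txt-style lines (form, lemma, tag) for
# initialisms tagged as common-gender adjectives, as suggested above.
# Assumptions: columns are tab-separated and the plural simply appends "s".
SINGULAR_TAG = "AQ0CS0"  # qualifying adjective, common gender, singular
PLURAL_TAG = "AQ0CP0"    # qualifying adjective, common gender, plural


def adjective_entries(initialism: str) -> list[str]:
    """Return the singular and plural dictionary lines for one initialism."""
    return [
        f"{initialism}\t{initialism}\t{SINGULAR_TAG}",
        f"{initialism}s\t{initialism}\t{PLURAL_TAG}",
    ]


if __name__ == "__main__":
    for variant in ["LGBT", "LGBTQ"]:
        for line in adjective_entries(variant):
            print(line)
```

Variants ending in a symbol (e.g. LGBTQIA+) probably need their plural handled by hand rather than by appending "s".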

@susanaboatto (Collaborator) left a comment

👀

@p-goulart (Collaborator, Author)

also re: adéqua, the form that's marked as incorrect is actually adequa, which is super 🤯 to me; adéqua is currently the only correct form (and the one suggested to users)

PINs PIN NCMP000
switch switch NCMS000
Switch switch NCMS000
SWITCH switch NCMS000
Member

This triple version of words (lower case, first upper case, all upper case) shouldn't be necessary. It will probably create problems in the synthesis of suggestions.
Is anything not working as expected if these three versions are missing? The speller? Then add the three words to spelling.txt, but not to added.txt.
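The redundancy described here can be spotted mechanically. A hypothetical check, assuming whitespace-separated added.txt-style lines (form, lemma, tag) as in the excerpt above: group entries by lowercased form, lemma, and tag, and report every group that appears in more than one casing. The function name is illustrative only.

```python
from collections import defaultdict


def case_variant_groups(lines: list[str]) -> dict[tuple[str, str, str], list[str]]:
    """Group entries that differ only in the casing of the word form.

    Each line is assumed to hold three whitespace-separated columns:
    word form, lemma, POS tag. Blank lines and '#' comments are skipped.
    """
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        form, lemma, tag = line.split()
        groups[(form.lower(), lemma, tag)].append(form)
    # Keep only keys that were seen with more than one distinct casing.
    return {key: forms for key, forms in groups.items() if len(set(forms)) > 1}


if __name__ == "__main__":
    excerpt = [
        "PINs PIN NCMP000",
        "switch switch NCMS000",
        "Switch switch NCMS000",
        "SWITCH switch NCMS000",
    ]
    for (_, lemma, tag), forms in case_variant_groups(excerpt).items():
        print(f"{lemma} {tag}: {forms}")
```

On the excerpt, only the switch/Switch/SWITCH triple is reported; PINs is left alone because it appears in a single casing.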

Collaborator (Author)

Right, they're added to the hunspell files, but IMO they shouldn't even be required there, which is already something I brought up with @susanaboatto when she was describing to me what was to be done here. I'll describe it in an issue. If this variation causes issues in added.txt, then removed they shall be.

Collaborator

Thanks for noticing this @jaumeortola!

Member

This would eliminate the need to add capitalized and all upper case words to speller.txt. https://github.com/languagetooler-gmbh/languagetool-premium/issues/4703
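One common way to get there is sketched below, without knowing what the linked issue actually proposes: the speller stores only the lowercase form and, for a title-case or all-caps token, retries the lookup with the lowercased form. The function name `is_known` and the plain set standing in for the speller dictionary are hypothetical.

```python
def is_known(token: str, dictionary: set[str]) -> bool:
    """Return True if the token is in the dictionary, directly or via case folding.

    Only lowercase entries need to be stored: a title-case ("Switch") or
    all-caps ("SWITCH") token falls back to its lowercase form. Other mixed
    casings such as "sWitch" get no fallback and are still rejected.
    """
    if token in dictionary:
        return True
    if token.istitle() or token.isupper():
        return token.lower() in dictionary
    return False


if __name__ == "__main__":
    words = {"switch", "PINs"}
    for t in ["switch", "Switch", "SWITCH", "sWitch", "PINs"]:
        print(t, is_known(t, words))
```

Restricting the fallback to title case and all caps keeps genuinely miscapitalized forms flagged, while entries that are inherently mixed-case (like PINs) still need to be listed as-is.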

@susanaboatto merged commit 697bb08 into languagetool-org:master on May 26, 2023
2 checks passed
@p-goulart deleted the pt/dict/w20 branch on June 1, 2023 17:14

3 participants