[pt] multiwords.txt adding new words — 2024-03-24 #10433

Closed
marcoagpinto opened this issue Mar 24, 2024 · 3 comments

@marcoagpinto
Member

Heya, @p-goulart @susanaboatto @maphjo @danielnaber

May I add words to multiwords.txt before the release date around the end of the month?

There are several nouns and proper names that appear as typos or as separate words in my thesis.

Also, what is the difference in the postags of adding an _ (underscore) after it or not adding one, in that file?

Thanks!

@marcoagpinto
Member Author

marcoagpinto commented Mar 24, 2024

Heya,

Here is the list of words I wanted to add.

The list was larger, but some words already had morphological information.

However, it is strange that some appear as typos or as separate words, for example:

Helmuth von Moltke
John von Neumann

With the latest nightly and LibreOffice, “von” appears as a typo, and so does “Helmuth”.

Note that some of these words may be too technical, but I would love to hear your feedback:

NOUNS:
desvio padrão
nevoeiro de guerra
tactical military shooter
tactical military shooters


PROPER NAMES:
Biologia Evolucionária
Convenções de Genebra
Entropia de Carvalho Rodrigues
Entropia de Carvalho-Rodrigues
Exército Republicano Irlandês
Lei dos Grandes Números
Métrica da Soma dos Quadrados
Métrica do Espaço-Tempo
Pirâmide Cognitiva
Projeto Echelon
República Francesa
Revolução Francesa
Teatro de Operações
Teorema do Macaco Infinito
Teoria da Confirmação
Teoria do Caos
Teoria dos Sistemas Complexos
Tio Patinhas
União Soviética



ORGANIZATIONS:
Crescente Vermelho
Máfia Italiana
Tríade Chinesa

@p-goulart
Collaborator

I would add the proper names to spelling-global.txt, rather than the PT-specific multiwords.

As for everything else, there should be no issue with adding anything to multiwords.txt, feel free to make a PR!
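
For anyone preparing such a PR, here is a minimal sketch of how the candidate phrases could be turned into entries. It assumes each multiwords.txt line is a phrase, a tab, and a POS tag; that layout should be checked against the existing file first. The helper name `format_multiword_entries` and the tag value `TAG` are hypothetical placeholders, not real LanguageTool names.

```python
def format_multiword_entries(phrases, tag, existing=()):
    """Return 'phrase<TAB>tag' lines, skipping phrases that are already listed.

    Assumption: one entry per line, phrase and tag separated by a tab.
    """
    # Phrases already present in the file (text before the first tab).
    seen = {line.split("\t")[0].strip() for line in existing if line.strip()}
    lines = []
    for phrase in phrases:
        normalized = " ".join(phrase.split())  # collapse stray whitespace
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        lines.append(f"{normalized}\t{tag}")
    return lines


# "TAG" is a placeholder; use the tag that similar entries in the file carry.
nouns = ["desvio padrão", "nevoeiro de guerra"]
for entry in format_multiword_entries(nouns, "TAG"):
    print(entry)
```

Passing the current file contents as `existing` avoids re-adding phrases that already have an entry.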

@p-goulart
Collaborator

> Also, what is the difference in the postags of adding an _ (underscore) after it or not adding one, in that file?

This is there to prevent certain matches in the XML rules. Occasionally, especially with the multi-token prepositional phrases, it is important to make the distinction. (That is my understanding, at least, based on what I've seen... it was done before my time here started.)
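
To illustrate the idea, here is a rough sketch of why a trailing underscore can keep a rule from matching. It is not LanguageTool's actual matcher; it only assumes that a rule compares its POS-tag pattern against the whole tag. The function `tag_matches` and the tag value `NCMS000` are examples chosen for this sketch.

```python
import re


def tag_matches(pattern, postag):
    """Return True only if the pattern matches the entire POS tag."""
    return re.fullmatch(pattern, postag) is not None


# A rule that asks for exactly this tag matches the plain entry...
print(tag_matches("NCMS000", "NCMS000"))   # True
# ...but not the same tag with a trailing underscore.
print(tag_matches("NCMS000", "NCMS000_"))  # False
# A broader regular expression would still accept the underscore variant.
print(tag_matches("NC.+", "NCMS000_"))     # True
```

Under that assumption, the underscore variant stays out of reach of exact-tag rules while remaining available to rules written with a wider pattern.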
