Portuguese - pre+ post+ reform #96

Closed
marcoagpinto opened this issue Apr 3, 2014 · 33 comments

Comments

@marcoagpinto
Member

With the new feature Daniel added, which lets you select variants under a major language:
"English" > "US" "AU" "NZ" "GB" "CA" "ZA".

This made me think about adding support for Portuguese pre-reform and post-reform.

I guess I can start working on Portuguese pre-agreement and post-agreement support.

It should appear in the combo box as:
Portuguese -> PT-PRE
Portuguese -> PT-POS

I was wondering if in the next version we could do something about this.

For example, these are the needed files:

  1. grammar.xml (works for both)
  2. compounds.txt (pre-reform)
  3. compounds.txt (post-reform)
  4. dictionary (pre-reform)
  5. dictionary (post-reform)

As you all know, 2) is pre-reform, so far with around 3500 words. The problem is that the compound rules changed with the reform, which means I need to create another file with post-reform compound words. I will try to purchase a post-reform Portuguese dictionary soon, but the supermarket will probably only have it in stock when the school year begins (months from now).

4) and 5) can be found on the Minho University site:
    4: http://natura.di.uminho.pt/download/sources/Dictionaries/openoffice/Pre-AO/
    5: http://natura.di.uminho.pt/download/sources/Dictionaries/openoffice/Pos-AO/

I am sharing the dictionary files taken from Minho University, on my Dropbox:
PT-PRE: https://dl.dropboxusercontent.com/u/30674540/oo4x-pt-PT-preao-14.4.1.1.oxt.zip
PT-POS: https://dl.dropboxusercontent.com/u/30674540/oo4x-pt-PT-posao-14.1.1.1.oxt.zip
They are both dated from yesterday.

Could someone also create the .txt for the compound words post-agreement?

I looked in the supermarket and they do have a post-agreement dictionary, but it is "2013" and I can wait a couple of months or so for the "2014" to be released... meanwhile I can use the Priberam site to get some compound words post-agreement. Microsoft Office 2010 uses Priberam.

The grammar.xml works for both, so no need to create another file.

@marcoagpinto
Member Author

Microsoft Word 2010:

[Screenshot: pre-post-reform-20140315]

@danielnaber
Member

Should Brazilian Portuguese use the existing PortugueseCompoundRule (i.e. pre-reform)?

@marcoagpinto
Member Author

I am not sure, Daniel.

Could you ask the Brazilian person who is translating on Transifex?

Thanks!

Kind regards,
Marco A.G.Pinto

@danielnaber
Member

Work has started on this. compounds.txt is now called pre-reform-compounds.txt, and post-reform-compounds.txt is also available (but still empty).

@danielnaber
Member

Marco, could you provide a short sentence for testing, i.e. a sentence that has an error when pre-reform is set but not with post-reform (or vice versa)?

@marcoagpinto
Member Author

Daniel,

I am not sure if the messages I send to GitHub arrive there since I
don't receive a copy.

Here is an example:
pre-reform:
"Actualmente sou director do curso de arquitectura"

post-reform:
"Atualmente sou diretor do curso de arquitetura"

Does this help?

Kind regards,
Marco A.G.Pinto

@danielnaber
Member

Thanks. Yes, all your messages arrive.

@danielnaber
Member

There's a problem: in LibreOffice 4.1 and 4.1, I can only see Portuguese for Angola, Portugal, and Brazil. We'll cause a lot of confusion if LT supports different variants (pre/post) but you cannot set them in LibreOffice. Are there any plans for LibreOffice to support both variants?

@marcoagpinto
Member Author

Well, the trick is to change it manually in the LT settings in AOO/LO.

AOO/LO only accepts one pt_PT dictionary at a time... in other words, it
means that if I want to write in pre I need to have only the pre
dictionary installed... if I want the post I need to have only that one
installed.

AOO/LO doesn't know if we are using pre or post, it just uses the pt_PT
dictionary installed.

Years ago I wrote to AOO's mailing list suggesting an M$ Office approach
(like in the screenshot I posted in LT)... I wanted to repost, but people
are so busy with the AOO 4.1 release that they won't pay any attention to
it... I even sent a private e-mail to three guys from Apache regarding
an update of the English dictionaries (I am the maintainer) but they
didn't reply... this means they must be really busy.

I hope this helps!

Kind regards,


@danielnaber
Member

"AOO/LO doesn't know if we are using pre or post" - and thus LT won't know which compound rule to use. You might want to contact the LO people, maybe they are more responsive.

@marcoagpinto
Member Author

Daniel,

Why can't we have a "pre" and a "post" entry in the "Your mother tongue:" combo box
under Tools > Language Tool > configuration?

:(


@danielnaber
Member

Because that's the wrong place... that field is only used for the false friend rule and we'd make everything very complicated if we also use it for something else.

@marcoagpinto
Member Author

:((((((

I will try to e-mail AOO and LO after AOO 4.1 is released.

How do I subscribe to the LO mailing list?

PS-> Having two compound files is good. I will start working on the post
one soon, even if it is not used yet. Then, once we have a way of using both
pre+post, it will come in handy.

Thanks!


danielnaber added a commit that referenced this issue Apr 21, 2014

@rffontenelle

Hey, Brazilian Portuguese translator here!

Daniel, pt_BR faced some changes with this reform, just like pt-PT. So I believe the same pre-/post-reform situation applies to pt_BR (e.g. MS Office 2010 has the same Portuguese mode for pt_BR).

@danielnaber
Member

Closing this report, as it seems there's nothing we can do until the situation in LO/OO changes. Feel free to re-open when that changes.

@TiagoSantos81
Contributor

LO now ships with post-AO dictionaries by default. Some issues reported by users come from the disagreement between spell checking and the compound word suggestions made by LanguageTool. Changing the defaults solves this issue.

@marcoagpinto
Member Author

A couple of years ago or so, I mentioned the idea of having a "Language Specific Settings".

For pt_PT, people would be able to switch between the pre- and post-reform compounds. It is no big deal, since one just needs to toggle between two compounds files.

I also mentioned the need for a suggestion to put foreign and Latin words in italics. That is easy too, since the words could be kept in a .txt file.

The last thing would be an option to convert from pre to post and vice versa, also driven by a .txt file with lines of the form pre + TAB + post.

I don't believe all this is hard to do and maybe someone who knows how to code in Java could implement it.
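
A minimal sketch of the "pre + TAB + post" file idea, assuming one word pair per line and exact, case-sensitive matching. The class and method names (`ReformConverter`, `parse`, `invert`, `convert`) are hypothetical and not part of LanguageTool; the word pairs are taken from the test sentences earlier in this thread. Skipping `#` comment lines is an assumed convention.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReformConverter {

    // Parses lines of the form "pre<TAB>post" into a lookup table.
    // Blank lines and lines starting with '#' are skipped (assumed convention).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> preToPost = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\t");
            if (parts.length == 2) {
                preToPost.put(parts[0], parts[1]);
            }
        }
        return preToPost;
    }

    // Builds the reverse table (post -> pre) for converting in the other direction.
    static Map<String, String> invert(Map<String, String> table) {
        Map<String, String> inverted = new LinkedHashMap<>();
        table.forEach((pre, post) -> inverted.put(post, pre));
        return inverted;
    }

    // Replaces each space-separated word found in the table; other words are kept.
    // Exact match only: capitalized or punctuated forms would need their own entries.
    static String convert(String sentence, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(table.getOrDefault(word, word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> preToPost = parse(List.of(
            "Actualmente\tAtualmente",
            "director\tdiretor",
            "arquitectura\tarquitetura"));
        // prints: Atualmente sou diretor do curso de arquitetura
        System.out.println(convert("Actualmente sou director do curso de arquitectura", preToPost));
        // prints: Actualmente sou director do curso de arquitectura
        System.out.println(convert("Atualmente sou diretor do curso de arquitetura", invert(preToPost)));
    }
}
```

The reverse table is only safe if no two pre-reform words map to the same post-reform word; a real word list would need to be checked for that.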

marcoagpinto reopened this Nov 1, 2016

@TiagoSantos81
Contributor

TiagoSantos81 commented Nov 1, 2016

I have read this post. The idea is great, but it is difficult to implement.
In LibreOffice the bundled dictionaries are post-agreement, so LT's default behaviour should match. Most users neither change settings nor search for more appropriate options.

"I don't believe all this is hard to do and maybe someone who knows how to code in Java could implement it."

Somebody would have already done it, if it was that easy.

I do not know how to do it (maybe I will add learning Java to my November TODO list /sarc/), but what I have done in my local build, as I mentioned before, is just change the default references like this:

-import org.languagetool.rules.pt.PreReformPortugueseCompoundRule;
+import org.languagetool.rules.pt.PostReformPortugueseCompoundRule;

and this:

-            new PreReformPortugueseCompoundRule(messages),
+            new PostReformPortugueseCompoundRule(messages),

Even an empty compound.txt is more useful than false positives all over the place. But we will not lose the benefits of this feature, since today I pushed a populated hyphenator. See:

f201908

If you agree, tomorrow I will push the file changes needed to change the preset, and later you and I will try to implement your toggle idea. Is this a reasonable compromise?

@marcoagpinto
Member Author

Yes, it is a good idea :)

@TiagoSantos81
Contributor

TiagoSantos81 commented Nov 1, 2016

Great! Then tomorrow I will run the compile tests again and push the changes.
Breaking the build once a day is more than enough... /sarc

@TiagoSantos81
Contributor

TiagoSantos81 commented Nov 2, 2016

@danielnaber
Daniel, sorry for bringing you into this topic again.
Before pushing the change today, I would like to confirm that you agree with changing the default to PostReform, as I mentioned earlier. I tested everything in LibreOffice and have not noticed side effects so far.
Since the release date is still far off, this is also a good time to test whether it works well. In the worst case, we revert to the former default behaviour.

On the future solution: I re-read the discussion and understand that an interface toggle just for Portuguese is off-limits.
Future solutions may revolve around adding Angola and Mozambique as new variants that use PreReformPortuguese. In the list the names could change to:

Portuguese - Angola (Pre-AO)
Portuguese - Brasil (Post-AO)
Portuguese - Moçambique (Pre-AO)
Portuguese - Portugal (Post-AO)

So far, the free dictionaries for those variants are also the ones from Universidade do Minho. This would appease all the Portuguese who still want PreReform spell checking from LanguageTool.
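
The variant list above could be sketched as a simple lookup from variant code to compounds file. This is only an illustration: the class `PortugueseVariants`, the enum `Reform`, and the method `compoundsFileFor` are hypothetical names, not LanguageTool code. The file names are the ones mentioned earlier in this thread, and falling back to post-reform for unknown codes is an assumption.

```java
import java.util.Map;

public class PortugueseVariants {

    enum Reform { PRE_AO, POST_AO }

    // Mapping taken from the proposed list: Angola and Mozambique stay pre-AO,
    // Brazil and Portugal use post-AO.
    static final Map<String, Reform> REFORM_BY_VARIANT = Map.of(
        "pt-AO", Reform.PRE_AO,
        "pt-BR", Reform.POST_AO,
        "pt-MZ", Reform.PRE_AO,
        "pt-PT", Reform.POST_AO);

    // Returns the compounds file for a variant code.
    // Unknown codes fall back to post-reform (an assumption of this sketch).
    static String compoundsFileFor(String variantCode) {
        Reform reform = REFORM_BY_VARIANT.getOrDefault(variantCode, Reform.POST_AO);
        return reform == Reform.PRE_AO
            ? "pre-reform-compounds.txt"
            : "post-reform-compounds.txt";
    }

    public static void main(String[] args) {
        for (String code : new String[] {"pt-AO", "pt-BR", "pt-MZ", "pt-PT"}) {
            System.out.println(code + " -> " + compoundsFileFor(code));
        }
    }
}
```

Tying the reform to the country code avoids a Portuguese-only interface toggle, at the cost of leaving no way to select pre-AO spelling for pt-PT itself.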

@danielnaber
Member

I'm fine with this change. Did you run all tests with mvn clean test? testrules.sh will only test the rules (i.e. the XML), but for code changes, the whole test suite should run.

@TiagoSantos81
Contributor

Thank you for the prompt reply, Daniel.
So far I have only run ./build.sh languagetool-standalone package -DskipTests, and it went fine.
I will now run mvn clean test and build the LO extension again.

TiagoSantos81 added a commit that referenced this issue Nov 9, 2016
* compound rules moved to relevant locales (AO, MZ, PT)
* PT has all compound rules active by default
* closes issue #96

@TiagoSantos81
Contributor

Future solutions may revolve around adding Angola and Mozambique as new variants that use PreReformPortuguese. In the list the names could change to:

Portuguese - Angola (Pre-AO)
Portuguese - Brasil (Post-AO)
Portuguese - Moçambique (Pre-AO)
Portuguese - Portugal (Post-AO)

Implemented in:
7c6b766
16d6652
1cca40f
7e9bd9c

@marcoagpinto
Please verify that it is working as announced and close this issue when you see fit.

@marcoagpinto
Member Author

Tiago, I will check it tonight, after the nightly is released.

@TiagoSantos81
Contributor

TiagoSantos81 commented Nov 9, 2016

There is no hurry.
Also:
701a794

@marcoagpinto
Member Author

@TiagoSantos81
I was just testing the stand-alone tool.

Great work you have done!

I was wondering if you could also add the name of the country in the other "Portuguese" variants?

This is most noticeable in the "mother tongue" list, where there are no flags and one just sees "Portuguese" several times.

Also, is it possible to add a Portuguese pre and post agreement (two identical PT flags with the agreement type between brackets, just like it appears in Thunderbird)?

Thanks!

@TiagoSantos81
Contributor

"I was wondering if you could also add the name of the country in the other 'Portuguese' variants?"

You can already test these changes by compiling from source.
MessageBundle_pt_PT.properties
and
701a794
Later today I will check whether Transifex already has these strings to translate.

"Also, is it possible to add a Portuguese pre and post agreement (two identical PT flags with the agreement type between brackets, just like it appears in Thunderbird)?"

The dictionaries are the same for all pre-AO countries.
There is no country code for pre-AO pt-PT, so this could create bugs related to language recognition in LibreOffice. Until we know of a better solution, this is not feasible.

TiagoSantos81 referenced this issue Nov 11, 2016
* not as elegant as the suggestion on c55232c but still a major
legibility improvement
	- may be reworked later.
* TODO reused to identify scientific names as general exceptions to
spellchecking
* TODO new rule to advise italicizing scientific species names

@TiagoSantos81
Contributor

I verified that the translated messages are updated and correct on Transifex. They will be fixed in the next sync.

@danielnaber
The last sync did not update the pt-AO and pt-MZ strings; they appear commented out. This is very minor, but please check it when possible.

@danielnaber
Member

@TiagoSantos81 the updates work like this:

  • git (English messages) -> Transifex: automatically, once a day
  • Transifex (translated messages) -> git: manually

The proper solution would be to switch to a better i18n software like https://weblate.org, but that would also take time. Anyway, I've triggered the manual update now.

@TiagoSantos81
Contributor

@danielnaber
Although the update uncommented PT-AO and PT-MZ, the strings for these variants still appear only as "Portuguese" in my build. I purged the build with mvn clean test, but I keep getting the same results.
This regression appeared after the first Transifex update.
All strings added in MessagesBundle_pt.properties are identical to 7c6b766. On that commit, the changes appear properly in the pt-PT language list.
Does something else need to be updated?

@danielnaber
Member

danielnaber commented Nov 19, 2016

There are PT-AO, PT-CV, and PT-MZ - they all need to have lowercase pt (e.g. pt-AO).
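
As an illustration of the casing issue only: the Java standard library normalizes language tags to lowercase language and uppercase region, so a mismatch like this can be detected mechanically. The class `VariantCodeCase` and the method `normalize` are hypothetical names; whether LanguageTool's message lookup relies on `java.util.Locale` in this way is not stated in this thread.

```java
import java.util.Locale;

public class VariantCodeCase {

    // Normalizes a language tag to the conventional casing:
    // lowercase language, uppercase region (e.g. "PT-AO" becomes "pt-AO").
    static String normalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"PT-AO", "PT-CV", "PT-MZ"}) {
            String fixed = normalize(tag);
            // A key that changes under normalization was written with the wrong case.
            System.out.println(tag + " -> " + fixed + (tag.equals(fixed) ? "" : " (needs fixing)"));
        }
    }
}
```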

@TiagoSantos81
Contributor

After looking at that too many times, I was unable to spot it. Many thanks! I pushed the fix moments ago.

f-knorr pushed a commit that referenced this issue Nov 22, 2016
* compound rules moved to relevant locales (AO, MZ, PT)
* PT has all compound rules active by default
* closes issue #96