Separate PO and Gettext into separate packages #215
Comments
I actually think Gettext could use a new .po parser |
Agreed on a new parser but that is just part of the work. We can probably re-use everything else. I think Andrea considered a nimble_parsec-based PO parser but I am not sure of the status. Contributions are definitely welcome! I am also completely happy to go in the direction proposed here. My suggestion is to start adding the new API directly to this repository. We will introduce new modules, move the API over, and once we are happy with the results, we extract it to a separate project. By then, I think we should also create an elixir-gettext org but no need to worry about it for now. What do you think? |
I believe I have a functional .po parser based on nimble_parsec already. I
have to look for it.
|
Thanks for the feedback, I'll start work on the functional separation first (decoupling PO and Gettext). Then aim to integrate a new PO parser. And finally package separation, possibly aligned with a new As José says, the parsing part is only a bit of what |
As promised, the gettext parser: https://github.com/tmbb/mezzofanti/blob/master/lib/mezzofanti/gettext/gettext_parser.ex Tests here: https://github.com/tmbb/mezzofanti/blob/master/test/mezzofanti/gettext/gettext_parser_test.exs (the parser parses the file into a list of EDIT: It turns out I had already mentioned the parser in this thread: #89 (comment) |
Fantastic! @tmbb would it be an option to swap in the new parser in place of the current parser but keeping the same return API? Then we can migrate it in steps and make sure nothing is lost on the way. Then the next step would be to replace the structs from Gettext.Translation to PO.Translation and similar. Btw, what should we call the project? Is |
Sure, it's just a question of changing this function: https://github.com/tmbb/mezzofanti/blob/master/lib/mezzofanti/gettext/gettext_parser.ex#L28-L31
Expo? |
OTOH, if the |
@kipcole9 How would you deal with the fact that ICU makes directives such as msgid_plural "%d pages read."
msgstr[0] "Eine Seite gelesen wurde."
msgstr[1] "%d Seiten gelesen wurden." useless? Should my |
Parser in its own repo: https://github.com/tmbb/ex_po |
I like the name Expo! One word though. :D
Regarding the structs, we can already change the structs in Gettext. I just
think doing the changes inside gettext first is the easiest and simplest
way to avoid breaking changes.
--
José Valim
www.plataformatec.com.br
Founder and Director of R&D
|
In ICU messages, you encode the singular and the plural in the same string, as well as grammatical gender and other things. For example, look at the example here: http://userguide.icu-project.org/formatparse/messages |
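To make the contrast concrete, here is a minimal sketch of the same message in both shapes. The `PluralSketch` module and its map layout are hypothetical and only for illustration; they are not the structs used by Gettext or ex_cldr. The singular `msgid` is assumed, since the thread only shows `msgid_plural`. The ICU strings use standard MessageFormat plural syntax.

```elixir
defmodule PluralSketch do
  # Hypothetical shapes, for illustration only.

  # PO style: one msgstr per plural form, indexed by the form number.
  def po_entry do
    %{
      # Assumed singular msgid; the thread only shows msgid_plural.
      msgid: "One page read.",
      msgid_plural: "%d pages read.",
      msgstr: %{0 => "Eine Seite gelesen wurde.", 1 => "%d Seiten gelesen wurden."}
    }
  end

  # ICU style: plural selection lives inside the message itself,
  # so a single msgid/msgstr pair is enough.
  def icu_entry do
    %{
      msgid: "{count, plural, one {One page read.} other {# pages read.}}",
      msgstr: "{count, plural, one {Eine Seite gelesen wurde.} other {# Seiten gelesen wurden.}}"
    }
  end

  # With PO, the caller picks the form; this uses the Germanic rule (n != 1).
  def po_translate(entry, n) do
    form = if n == 1, do: 0, else: 1
    String.replace(entry.msgstr[form], "%d", Integer.to_string(n))
  end
end
```

With the ICU shape there is nothing left for `msgid_plural` and `msgstr[n]` to do, which is the point raised above.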
I've created some translation structs inside my |
Sounds good to me! Can you please send a PR for us to see what it would look like? Btw, what do you think about naming it |
My thinking has been to separate the storage/serialization layer from the translation data layer, and both from the message formatting and translation layer. Based on my reading of this thread, that would seem to be acceptable? It would look like:

Translation data layer
The translation data layer is the structure of translations in an Elixir data structure. In the current

Data storage/serialization layer
The layer is responsible for taking the Elixir structs, ie

Message formatting and translation
This is the part of the stack that presents a translation API to the developer, does actual translation, message formatting and so on. It also informs, typically at compile time, the translation data layer about new messages that need to be updated, stored or deleted.

Impact on current code
Gettext would preserve its API to the developer and by default it would use the PO storage layer ( Other libs, as I plan for

Configuration
The implication here is that for a given Gettext user there is no change required. It should be possible to configure an alternative storage engine in the future.

Mix tasks
Today the

Next steps
Have I captured this correctly? |
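One way to picture the storage layer in the split described above is as a behaviour with swappable implementations. This is only a sketch under assumptions: `StorageSketch`, `InMemoryStorage`, and the callback names are hypothetical and do not exist in Gettext or Expo. The in-memory engine stands in for PO files, a database, or any other backend.

```elixir
defmodule StorageSketch do
  # Hypothetical behaviour: these callbacks are not part of Gettext or Expo.
  @callback load(locale :: String.t(), domain :: String.t()) ::
              {:ok, [map()]} | {:error, term()}
  @callback store(locale :: String.t(), domain :: String.t(), translations :: [map()]) ::
              :ok | {:error, term()}
end

defmodule InMemoryStorage do
  # Toy storage engine backed by an Agent, standing in for PO files, a DB, etc.
  @behaviour StorageSketch

  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  @impl true
  def load(locale, domain) do
    case Agent.get(__MODULE__, &Map.fetch(&1, {locale, domain})) do
      {:ok, translations} -> {:ok, translations}
      :error -> {:error, :not_found}
    end
  end

  @impl true
  def store(locale, domain, translations) do
    Agent.update(__MODULE__, &Map.put(&1, {locale, domain}, translations))
  end
end
```

The formatting layer would then only ever call `load/2` and `store/3`, so replacing the default engine would not touch the developer-facing API.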
I think this is a good plan but I would like to see it done in steps. I
would prefer to see we move to Expo before we start changing storage and so
on. Otherwise we will be facing a thousand lines of code PR which would
slow everything down. :)
|
Sounds great!
|
Some topics could be moved to the Expo repo.
…On Sun, 8 Sep 2019 at 11:31, Kip Cole ***@***.***> wrote:
Make sense José. Should the Expo conversation move to a separate issue to
keep that focus? This would help discussion of potential resolutions to
#210 and #89 as well.
|
I have created an elixir-gettext organization with gettext in there. You have been invited to it. I recommend changing |
Btw, it seems GitHub did not send emails for the invitation but you can accept it here: https://github.com/orgs/elixir-gettext |
I've found my muse and followed through on the There's a few known issues, but I think it's almost ready to be used. (Any feedback for the library or help with the remaining 3 issues would be very welcome.) @josevalim If you're interested, I would like to move this repo into the gettext orga. |
@maennchen and others: it's been 3 years and I lost all context. What's the benefit of Expo over the current parser? Asking from a place of not remembering anything 😄 |
@whatyouhide Sure, multiple reasons:
This is by far not the only way to approach things. I would also provide PRs to reintegrate the improvements into gettext (although I would prefer separate packages). |
I have an arm injury, which makes my feedback quite limited, but 👍 from me. The code looks great and I would love to see a migration towards expo. Any remaining question from your side @whatyouhide? |
Oh, I wish you a speedy recovery then @josevalim! If everyone is in agreement, it would be great if I could get the permissions so that I can migrate the repository over to this orga. Either of these will work:
|
@maennchen I won't have time to do this this week, but I'll go with option 1. early next week. In the meantime, before we integrate Expo into Gettext, I'd really love to see some benchmarks on the parsing of nimble_parsec vs yecc. Do we also have examples of error messages and if those get better, worse, or are equivalent? |
I've invited you as an admin to the repository. Feel free to move it :) I've created some benchmarks: elixir-gettext/expo#21 The error messages seem fine, but not awesome to me so far: https://github.com/jshmrtn/expo/blob/main/test/expo/parser/po_test.exs |
Okay, if I’m interpreting the benchmarks right, it seems like the Expo implementation is about 2x slower. The error messages seem fine, right, but not an improvement over the current ones at a first glance. Which brings me to the point: let's keep the yecc-based parser currently in Gettext and port it exactly to Expo before moving Gettext to use Expo, agreed? I think the first step in extracting a dependency from a library as widely used and field-tested as Gettext is to extract the code as is, so as not to introduce the chance of anything else changing. Then, improvements can be made iteratively on the extracted library. |
@whatyouhide Ok, I'm on it. |
@whatyouhide I've started to look at the yecc-based code a bit more closely. Based on that, I do not come to the same conclusion as you:

Performance
I've spent a few minutes disabling features that are not in Since doing that, the comparison of performance is a bit more level between the two: elixir-gettext/expo#21 (comment) While the yecc-based parser is still slightly faster, it is by far not twice as fast. I don't think that performance should be the primary deciding factor since parsing normally happens at compile-time and the difference with some improvements is negligible. If performance is really that important, then the new Considering that, I propose to focus on two other factors to decide which implementation should be used: Maintainability and Developer Experience (User Side)

Developer Experience (Library User)
The only thing I can see improving the developer experience for users of this library is error messages. The error messages (which can still be improved further with some tweaking) are better IMHO. One example:

parsed string:
msgid
msgstr "foo"

Output yecc parser:
Output nimble based parser:

I personally prefer the nimble based approach since it gives more detail:
Maintainability
Therefore, I propose to work with the nimble-based approach. |
Two more factors: How big are the parser modules on disk once compiled? Is nimble_parsec faster if we use the |
Here you can see all the changes I did for the performance tests: https://github.com/elixir-gettext/expo/compare/performance_comparisor The
Compiled sizes:
$ du -h _build/prod/lib/expo/ebin/*.beam
# ...
9.0K _build/prod/lib/expo/ebin/Elixir.Expo.Parser.Mo.beam
185K _build/prod/lib/expo/ebin/Elixir.Expo.Parser.Po.beam
# ...
$ du -h _build/prod/lib/gettext/ebin/*.beam
13K _build/prod/lib/gettext/ebin/Elixir.Gettext.PO.beam
9.0K _build/prod/lib/gettext/ebin/Elixir.Gettext.PO.Parser.beam
# ...
9.0K _build/prod/lib/gettext/ebin/Elixir.Gettext.PO.Tokenizer.beam
# ...
21K _build/prod/lib/gettext/ebin/gettext_po_parser.beam
# ... |
@maennchen the thing I am most worried about is this: Gettext's yecc parser has worked reliably in production for 6-7 years now and is thoroughly field tested. None of the reasons mentioned above are enough for me to ignore that honestly. I’m happy to eventually move to a new parser, but I would like to do that gradually (start with just MOs?). The focus of this issue (and this effort IMO) is to have a separate package for PO/MO handling so that users that don't need the whole of Gettext can depend on that. Which parser to use is not really part of that conversation, right? Which is why I would argue to just rip code straight out of Gettext as much as possible and solve this issue first. |
@whatyouhide If we're using Yecc, I will need your help finishing this up. I can't get the full feature set to work: elixir-gettext/expo#25 |
@maennchen is Expo adding any features that don't exist today in Gettext? |
@whatyouhide Yes, the library's goal is to support the complete PO format. That includes the following things: Previous message id / plural
Obsolete translations
|
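For readers unfamiliar with these two features: in the GNU PO format, lines starting with `#|` carry the previous msgid of an entry, and lines starting with `#~` mark obsolete entries. The sketch below classifies single lines by those prefixes. `PoLineSketch` is a hypothetical helper for illustration; it is not how Gettext or Expo tokenize PO files.

```elixir
defmodule PoLineSketch do
  # Hypothetical helper: classifies one PO line by its prefix.
  # Clause order matters: the specific markers must come before plain "#".
  def classify("#| " <> rest), do: {:previous, rest}
  def classify("#~ " <> rest), do: {:obsolete, rest}
  def classify("#, " <> rest), do: {:flags, String.split(rest, ", ")}
  def classify("#" <> rest), do: {:comment, String.trim_leading(rest)}
  def classify(line), do: {:entry, line}
end
```

A parser that treats every `#` line as a plain comment loses the first two cases, which is the gap described above.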
(I already did the data structures, tests and tokenizer, just can’t get yecc to comply. With the nimble based approach, that was already implemented.) |
@maennchen I’m having trouble following what the changes are, because the PR you linked is adding yecc to Expo which means the diff is useless 😄 Any chance you could backport the changes for previous msgid and obsolete translations to Gettext first so that we know what to look for and know how to help out? |
Btw, I realize that this experience can be frustrating and time consuming. However, as library authors, we have the responsibility to avoid breaking people's code by making rushed changes to a library as widely used as Gettext. In libraries like these, we need to be careful about making as few changes at a time as possible, if that makes sense. I hope it's not causing too much trouble 😌 And to be clear: this work is welcome, necessary, and fantastic!! Thanks for all the collaboration and help 💟 |
@whatyouhide I split the PR in two:
Is that good enough or do I need to propose it to gettext directly? I would prefer to tackle #210 only after we integrated expo and not before. |
@maennchen no, doing it in Expo is alright, don't worry. What's the issue that you're seeing? Also, by looking at the tokenizer changes it looks like you're tokenizing these "special comments" differently. I wonder if that wouldn't be better done after tokenizing + parsing, in |
@whatyouhide Follow up here so that we don't spam everyone: elixir-gettext/expo#30 (comment) |
@josevalim @maennchen is there anything left to do to close this? |
I am working on an ICU Message formatter and localisation processor as the (maybe) last part of ex_cldr. One requirement is of course to store and manage translations, and PO files are clearly the de facto global standard.
I would like to propose splitting Gettext into a PO package that looks after storage, parsing, merging and extracting. That way different localisation packages can leverage the same infrastructure, which is supported by the format tag in a PO file. As you know, today Gettext saves all messages with the format-elixir tag.
What would need to change:
1. API and usage for current Gettext users would need to remain identical for backwards compatibility.
2. The extractor would need to be told how to tag messages so they aren’t all tagged as format-elixir.
3. The backend interface would be normalised. Instead of BackendModule.__gettext__(:priv), for example, it would be better to have BackendModule.__po__(:priv) in order to be more specific and not tied to the Gettext API alone.
4. Compiling PO files would compile messages based upon the message format. Perhaps a behaviour would be introduced so that any message format could be compiled. Instead of use Gettext, .... it would become use PO, icu_format: MyMessageFormatter, elixir_format: Gettext or some such. Then use Gettext, ... would also include use PO, ....
5. Lastly, this may open up the opportunity to have different storage models, like DB, or :persistent_term, or even a web service that can replace PO files while still retaining the well-known format.
I’m fine to work on this. It's better than creating another PO file parser/merger/extractor, as @tmbb has already had to do.
Thoughts?
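The behaviour idea in the proposal could look roughly like the sketch below, where the format flag on a PO entry selects a formatter module. Everything here is an assumption for illustration: `FormatterSketch`, `ElixirFormatSketch`, `DispatchSketch`, and the `format/2` callback are hypothetical names, not existing Gettext or ex_cldr APIs. The toy formatter only handles Gettext-style `%{name}` interpolation with string keys.

```elixir
defmodule FormatterSketch do
  # Hypothetical behaviour; neither the name nor the callback exists in Gettext.
  @callback format(message :: String.t(), bindings :: map()) :: String.t()
end

defmodule ElixirFormatSketch do
  # Toy formatter for "%{name}"-style interpolation.
  @behaviour FormatterSketch

  @impl true
  def format(message, bindings) do
    Regex.replace(~r/%\{(\w+)\}/, message, fn whole, key ->
      case Map.fetch(bindings, key) do
        {:ok, value} -> to_string(value)
        # Leave unknown placeholders untouched.
        :error -> whole
      end
    end)
  end
end

defmodule DispatchSketch do
  # Picks the first flag that has a registered formatter, mirroring the
  # `icu_format: ..., elixir_format: ...` options suggested in the proposal.
  # Falls back to the raw message when no flag matches.
  def format(flags, message, bindings, formatters) do
    Enum.find_value(flags, message, fn flag ->
      case Map.fetch(formatters, flag) do
        {:ok, module} -> module.format(message, bindings)
        :error -> nil
      end
    end)
  end
end
```

An ICU formatter would be a second module implementing the same callback, registered under its own flag.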