Add Obsolete / Previous Translations to Yecc Parser #30

Merged 1 commit on Jul 19, 2022

Conversation

maennchen
Member

No description provided.

@maennchen maennchen added the enhancement New feature or request label Apr 6, 2022
@maennchen maennchen self-assigned this Apr 6, 2022
@maennchen maennchen mentioned this pull request Apr 6, 2022
3 tasks
@coveralls

coveralls commented Apr 6, 2022

Pull Request Test Coverage Report for Build 79807df4fb6c61ff6e1b043f47faf828cd08fb68-PR-30

Details

  • 15 of 15 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 99.76%

Totals Coverage Status
Change from base Build 0be8c6394dfc90a0311d201c14c6b391b57bdc26-PR-25: 0.07%
Covered Lines: 416
Relevant Lines: 417

💛 - Coveralls

@maennchen
Member Author

maennchen commented Apr 6, 2022

@whatyouhide
Follow up from elixir-gettext/gettext#215 (comment)

Tokenizer

I decided to handle it in the tokenizer because:

  1. Obsolete translations should still be detected as translations. If they were tokenized as plain comments, they would be attached to the next translation instead.
  2. Both obsolete and previous entries can span multiple lines, so they need to be treated like their regular counterparts.

If I parsed them manually after the yecc step, I would end up replicating a large part of what yecc already does.
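
As a rough illustration of the two points above, the marker can be stripped from the front of a line so that the remainder goes through the same tokenizing path as a regular line. This is a sketch only: the module name, `split_marker/1`, and the atoms it returns are hypothetical and are not the tokenizer code from this PR.

```erlang
-module(po_marker_sketch).
-export([split_marker/1]).

%% Hypothetical helper: splits a PO line into the markers found at its start
%% and the remaining text. The remainder can then be tokenized like a regular
%% line, so multi-line strings need no separate handling.
-spec split_marker(binary()) -> {[obsolete | previous], binary()}.
split_marker(<<"#~| ", Rest/binary>>) -> {[obsolete, previous], Rest};
split_marker(<<"#~ ", Rest/binary>>)  -> {[obsolete], Rest};
split_marker(<<"#| ", Rest/binary>>)  -> {[previous], Rest};
%% Anything else, including ordinary "# ..." comments, is left untouched.
split_marker(Line)                    -> {[], Line}.
```

With this shape, a line such as `#~ msgid "hello"` yields the `obsolete` marker plus `msgid "hello"`, while a plain comment line comes back unchanged and is still handled as a comment.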

Yecc Issues

No matter how I adapt the yrl grammar, I can't get it to read the tokens the way I want. I always get a "syntax error before: ..." error.

One Attempt (only for obsolete)
Nonterminals grammar translations translation pluralizations pluralization
             strings comments maybe_msgctxt maybe_obsolete_msgctxt obsolete_strings
             obsolete_pluralizations obsolete_pluralization.
Terminals str msgid msgid_plural msgctxt msgstr plural_form comment obsolete.
Rootsymbol grammar.

grammar ->
  translations : '$1'.

% A series of translations. It can be just comments (which are discarded and can
% be empty anyways) or comments followed by a translation followed by other
% translations; in the latter case, comments are attached to the translation
% that follows them.
translations ->
  comments : [{comments, '$1'}].
translations ->
  comments translation translations : [add_comments_to_translation('$2', '$1')|'$3'].

% TODO: Parse previous / obsolete
translation ->
  maybe_msgctxt msgid strings msgstr strings : {translation, #{
    comments       => [],
    msgctxt        => '$1',
    msgid          => '$3',
    msgstr         => '$5',
    po_source_line => extract_line('$2')
  }}.
translation ->
  maybe_obsolete_msgctxt obsolete msgid obsolete_strings obsolete msgstr obsolete_strings : {translation, #{
    comments       => [],
    msgctxt        => '$1',
    msgid          => '$4',
    msgstr         => '$7',
    po_source_line => extract_line('$3')
  }}.
translation ->
  maybe_msgctxt msgid strings msgid_plural strings pluralizations : {plural_translation, #{
    comments       => [],
    msgctxt        => '$1',
    msgid          => '$3',
    msgid_plural   => '$5',
    msgstr         => plural_forms_map_from_list('$6'),
    po_source_line => extract_line('$2')
  }}.
translation ->
  maybe_obsolete_msgctxt obsolete msgid obsolete_strings obsolete msgid_plural obsolete_strings obsolete_pluralizations : {plural_translation, #{
    comments       => [],
    msgctxt        => '$1',
    msgid          => '$4',
    msgid_plural   => '$7',
    msgstr         => plural_forms_map_from_list('$8'),
    po_source_line => extract_line('$3')
  }}.

pluralizations ->
  pluralization : ['$1'].
pluralizations ->
  pluralization pluralizations : ['$1'|'$2'].

pluralization ->
  msgstr plural_form strings : {'$2', '$3'}.

obsolete_pluralizations ->
  obsolete_pluralization : ['$1'].
obsolete_pluralizations ->
  obsolete_pluralization obsolete_pluralizations : ['$1'|'$2'].

obsolete_pluralization ->
  obsolete msgstr plural_form obsolete_strings : {'$3', '$4'}.

strings ->
  str : [extract_simple_token('$1')].
strings ->
  str strings : [extract_simple_token('$1')|'$2'].

obsolete_strings ->
  str : [extract_simple_token('$1')].
obsolete_strings ->
  str obsolete obsolete_strings : [extract_simple_token('$1')|'$3'].

comments ->
  '$empty' : [].
comments ->
  comment comments : [extract_simple_token('$1')|'$2'].

maybe_msgctxt ->
  '$empty' : nil.
maybe_msgctxt ->
  msgctxt strings : '$2'.

maybe_obsolete_msgctxt ->
  '$empty' : nil.
maybe_obsolete_msgctxt ->
  obsolete msgctxt strings : '$3'.

Erlang code.

extract_simple_token({_Token, _Line, Value}) ->
  Value.

extract_line({_Token, Line}) ->
  Line.

plural_forms_map_from_list(Pluralizations) ->
  Tuples = lists:map(fun extract_plural_form/1, Pluralizations),
  maps:from_list(Tuples).

extract_plural_form({{plural_form, _Line, PluralForm}, String}) ->
  {PluralForm, String}.

add_comments_to_translation({TranslationType, Translation}, Comments) ->
  {TranslationType, maps:put(comments, Comments, Translation)}.

Error with that attempt: {:error, {:parse_error, "syntax error before: msgid", 2}}

(The example here is one of a thousand. I tried rearranging things many times with no better result.)
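
One possible explanation for this error, not verified in this thread: when an `obsolete` token starts a translation, the parser has to choose between reducing `maybe_obsolete_msgctxt -> '$empty'` and shifting `obsolete` as the start of `obsolete msgctxt strings`, with only one token of lookahead. Yecc resolves shift/reduce conflicts in favor of shifting when no precedences are declared, so it would then expect `msgctxt` and fail on `msgid`. The sketch below shows one way around that, assuming the tokenizer folds the marker into the keyword token. The terminals `obsolete_msgctxt`, `obsolete_msgid` and `obsolete_msgstr` and the `obsolete` map key are hypothetical names, and only singular translations are covered.

```erlang
% Sketch only. The tokenizer is assumed to emit obsolete_msgctxt,
% obsolete_msgid and obsolete_msgstr for "#~ msgctxt", "#~ msgid" and
% "#~ msgstr", and plain str tokens for "#~" continuation strings.
% The obsolete key in the second map is a hypothetical addition.
Nonterminals grammar translations translation strings comments
             maybe_msgctxt maybe_obsolete_msgctxt.
Terminals str msgid msgctxt msgstr comment
          obsolete_msgctxt obsolete_msgid obsolete_msgstr.
Rootsymbol grammar.

grammar ->
  translations : '$1'.

translations ->
  comments : [{comments, '$1'}].
translations ->
  comments translation translations : [add_comments_to_translation('$2', '$1')|'$3'].

translation ->
  maybe_msgctxt msgid strings msgstr strings : {translation, #{
    comments       => [],
    msgctxt        => '$1',
    msgid          => '$3',
    msgstr         => '$5',
    po_source_line => extract_line('$2')
  }}.
% An obsolete translation now starts with obsolete_msgctxt or obsolete_msgid,
% so a single token of lookahead is enough to pick this rule.
translation ->
  maybe_obsolete_msgctxt obsolete_msgid strings obsolete_msgstr strings : {translation, #{
    comments       => [],
    obsolete       => true,
    msgctxt        => '$1',
    msgid          => '$3',
    msgstr         => '$5',
    po_source_line => extract_line('$2')
  }}.

strings ->
  str : [extract_simple_token('$1')].
strings ->
  str strings : [extract_simple_token('$1')|'$2'].

comments ->
  '$empty' : [].
comments ->
  comment comments : [extract_simple_token('$1')|'$2'].

maybe_msgctxt ->
  '$empty' : nil.
maybe_msgctxt ->
  msgctxt strings : '$2'.

maybe_obsolete_msgctxt ->
  '$empty' : nil.
maybe_obsolete_msgctxt ->
  obsolete_msgctxt strings : '$2'.

Erlang code.

extract_simple_token({_Token, _Line, Value}) ->
  Value.

extract_line({_Token, Line}) ->
  Line.

add_comments_to_translation({TranslationType, Translation}, Comments) ->
  {TranslationType, maps:put(comments, Comments, Translation)}.
```

The trade-off is that the tokenizer carries more of the obsolete handling, while the grammar keeps the same shape as the regular rules. This sketch has not been checked against the tokenizer in this PR.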

@maennchen
Member Author

@whatyouhide I've some free time on my hands in the next few days. I would appreciate it if you had the time to have a closer look so that I could make some progress on this. (Not trying to stress you, I just wanted you to know that I could use the time more efficiently if you had some time to spare for me.)

I'll also start working on the gettext integration since there it shouldn't matter which parser is used.

@maennchen maennchen marked this pull request as ready for review July 19, 2022 13:31
@maennchen maennchen merged commit 1127ce3 into replace_parser_yecc Jul 19, 2022
@maennchen maennchen deleted the complete_features branch July 19, 2022 13:35
maennchen added a commit that referenced this pull request Jul 19, 2022
* Switch to Yecc Based Parser

* Add Obsolete / Previous Translations to Yecc Parser (#30)