Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

more restrictive parsing of PluralForm and SelectOrdinalForm #96

Conversation

vogelsgesang
Copy link
Contributor

As mentioned in #95 (comment) plural and selectordinal could be used as a replacement for select.

This pull request removes this (unintentional) flexibility by adjusting the parser grammar to only accept the known PluralForm keys.

I had to touch some unrelated test cases which were relying on the lax parsing behaviour.

Only the valid PluralFrom keys ("one", "few", ...) are
accepted by the parser.
@eemeli
Copy link
Member

eemeli commented Apr 1, 2015

I agree that in practice requiring the plural keys to be from the CLDR defaults would be a good thing, but that's not actually a part of the standard, which defines the following syntax for pluralStyle:

pluralStyle = [offsetValue] (selector '{' message '}')+
offsetValue = "offset:" number
selector = explicitValue | keyword
explicitValue = '=' number  // adjacent, no white space in between
keyword = [^[[:Pattern_Syntax:][:Pattern_White_Space:]]]+

As messageformat.js uses make-plural for the plurals generation, the keys will always be from the default ICU/CLDR set as long as a user doesn't provide their own, but at least technically that's not a requirement.

Perhaps use of such nonstandard keys should trigger a warning of some sort, or we should add some sort of "strict" parser mode?

@vogelsgesang
Copy link
Contributor Author

Ok, I admit that I did not have a look at the standard first.

A strict mode sounds like a better solution to me. If you want, I can implement this strict mode. But I am not sure about the API: should a flag be added to Messageformat.compile, to the MessageFormat constructor or at a different place?

@eemeli
Copy link
Member

eemeli commented Apr 2, 2015

If implemented this way, it should probably be a property of the MessageFormat instance, something with trinary state: ignore/warn/error. Probably implemented by checking somewhere around here for a match against the default categories -- I don't think there's an easy way of customizing the PEG.js parser for this, is there?

On the other hand, when we have an instantiated object, we know the pluralization rules for its language, which include of course the various categories that specific language implements. What we could do is use those as the expected pluralization keys and then console.warn if compile is called with a plural that includes a category that'll never be called. Doing that could make this configuration-free, which would be nice.

Atm the pluralization categories are not revealed by make-plural, but we could fix that too. I'm in the middle of refactoring it into ES6, so now would be a good time to introduce any additional breaking changes.

@vogelsgesang
Copy link
Contributor Author

In my opinion, automatically recognizing the available pluralization categories sounds like a great feature. Hence, I would appreciate it if make-plural reveals them. Maybe this functionality should be guarded by an additional flag, so browsers which are just needing the pure plural function don't need to load additional meta informations

It is possible to adjust the PEG.js grammar for doing the checks against the known list of pluralization categories. The corresponding parser for a binary ignore/error state would look like:

pluralKey
  = i:id & {return !options.pluralKeys || options.pluralKeys.indexOf(i) >= 0}
    { return i; }
  / "=" d:digits { return d; }

And the parser would be called with an additional parameter:

_parse(msg, {pluralKeys: ["one", "few", "other"})

In order to implement a ternary error/warn/ignore behavior we should be able to use a callback:

pluralKey
  = i:id & {return !options.pluralKeyFunc || options.pluralKeyFunc(i, line, column)}
    { return i; }
  / "=" d:digits { return d; }

and pass a callback function as parameter to the parser:

_parse(msg, {pluralKeyFunc: function(key, line, col) {
   if(behavior == "ignore" || pluralKeyIsValid(key)) return true;
   if(behavior == "error") return false;
   if(behavior == "warning") {
      issue warning;
   }
   return true;
})

The main benefit from this solution would be concise error messages: We can get error messages which contain the exact line number and column.

@eemeli
Copy link
Member

eemeli commented Apr 4, 2015

I like where this is going. :) The callbal func form is probably a better way of doing it, as it gives more flexibility to both notify the user while not breaking the spec. I'd call it something like pluralKeyCheck though.

I'm going to take a look at the best way of implementing this in make-plural, and will update here once that's done.

@eemeli
Copy link
Member

eemeli commented Apr 4, 2015

Make-plural 3.0.0-rc4 now includes the categories; we'll want to do require('make-plural/pluralCategories') to get them. I've also updated the messageformat package.json dependencies in 22ed78c so they'll fetch that version from npm.

@vogelsgesang
Copy link
Contributor Author

Thanks! I tried to implement the callback variant but met an obstacle: In order to decide, if a pluralization category is valid, I must know if the pluralKey is part of an plural or an selectordinal but I don't know a way to get this context information from within the pluralKey grammar rule.
My current attempt is online at https://github.com/vogelsgesang/messageformat.js/tree/4e17c6e8b23adb17b96ad0a0ed00426db276b954.

There would be an alternative way to store the current parser position.

Which solution do you consider best? Duplicating some rules (pluralForms -> selectordinalForms, pluralKey -> selectordinalKey, ...), storing the parser position using the way I used for messageformat.codemirror, or do you know a third option which I missed?

@eemeli
Copy link
Member

eemeli commented Apr 9, 2015

I think the pos implementation would be a bit intrusive; using a function or a variable set in the initializer is likely a better choice. That'd make the parser less functional, sure, but it does seem a bit silly to duplicate pluralFormatPattern/pluralForm/pluralKey.

@eemeli eemeli mentioned this pull request Apr 16, 2016
@eemeli
Copy link
Member

eemeli commented Apr 17, 2016

@vogelsgesang, you might be interested in taking a look at #138, which includes changes implementing what's suggested here as well as in #105. As a part of that PR, I've separated the parser into its own repo, at messageformat/parser. You may be interested to check out its contributors listing.

@vogelsgesang
Copy link
Contributor Author

Wow, it's amazing how much process you made since I last looked into this project.
Thanks for including me into the contributors list!
Unfortunately, I am no longer working on my previous project which used messageformat.js and I am currently not using it in any of my projects.
But as soon as I need to build something for an international audience again, I will come back to this project.
Keep up the great work!

@eemeli eemeli closed this Apr 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants