more restrictive parsing of PluralForm and SelectOrdinalForm #96
Only the valid PluralForm keys ("one", "few", ...) are accepted by the parser.
I agree that in practice requiring the plural keys to be from the CLDR defaults would be a good thing, but that's not actually a part of the standard, which defines the following syntax for
As messageformat.js uses make-plural to generate its plural functions, the keys will always come from the default ICU/CLDR set as long as a user doesn't provide their own, but, at least technically, that's not a requirement. Perhaps use of such nonstandard keys should trigger a warning of some sort, or we should add a "strict" parser mode?
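To make the difference between "allowed by the spec" and "from the CLDR defaults" concrete, here is a minimal sketch of the check such a warning or strict mode might run. `CLDR_PLURAL_CATEGORIES`, `isExplicitSelector` and `isDefaultPluralKey` are hypothetical names, not messageformat.js API; the list itself is the six CLDR plural categories.

```javascript
// The six plural categories defined by CLDR. Keys produced via make-plural
// come from this set unless the user supplies their own plural functions.
const CLDR_PLURAL_CATEGORIES = ["zero", "one", "two", "few", "many", "other"];

// Explicit selectors such as "=0" are always allowed by the ICU syntax
// (this sketch only recognizes integer values).
function isExplicitSelector(key) {
  return /^=\d+$/.test(key);
}

// True if a hypothetical warning/"strict" mode would accept the key silently.
function isDefaultPluralKey(key) {
  return isExplicitSelector(key) || CLDR_PLURAL_CATEGORIES.includes(key);
}

isDefaultPluralKey("few");  // true
isDefaultPluralKey("=0");   // true
isDefaultPluralKey("male"); // false: legal per the spec, but worth a warning
```

A key like "male" would pass the standard's keyword syntax, which is why this reads better as a warning or opt-in mode than as a hard parse error.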
Ok, I admit that I did not look at the standard first. A strict mode sounds like a better solution to me. If you want, I can implement it. But I am not sure about the API: should a flag be added to
If implemented this way, it should probably be a property of the MessageFormat instance, something with a ternary state: ignore/warn/error. It would probably be implemented by checking somewhere around here for a match against the default categories; I don't think there's an easy way of customizing the PEG.js parser for this, is there?

On the other hand, when we have an instantiated object, we know the pluralization rules for its language, which of course include the categories that specific language implements. What we could do is use those as the expected pluralization keys and then

At the moment the pluralization categories are not exposed by make-plural, but we could fix that too. I'm in the middle of refactoring it into ES6, so now would be a good time to introduce any additional breaking changes.
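A minimal sketch of the per-locale, ternary idea, under loudly stated assumptions: the standard `Intl.PluralRules` API (ECMA-402) stands in for the category data make-plural did not yet expose, and `localePluralKeys`, `checkPluralKey` and the `mode` values are hypothetical, not messageformat.js API.

```javascript
// Intl.PluralRules stands in for make-plural's category data here.
// `type` is "cardinal" (for plural) or "ordinal" (for selectordinal).
function localePluralKeys(locale, type) {
  return new Intl.PluralRules(locale, { type }).resolvedOptions().pluralCategories;
}

// Hypothetical ternary `mode`: "ignore", "warn" or "error".
function checkPluralKey(key, locale, type, mode, warn = console.warn) {
  // "ignore" skips the check; explicit "=N" selectors are valid in every locale.
  if (mode === "ignore" || /^=\d+$/.test(key)) return;
  if (localePluralKeys(locale, type).includes(key)) return;
  const msg = `"${key}" is not a ${type} plural category for locale "${locale}"`;
  if (mode === "error") throw new Error(msg);
  warn(msg); // "warn": report the key but keep going
}

// English cardinals only have "one" and "other", but English ordinals also use "few" (3rd).
checkPluralKey("few", "en", "ordinal", "error"); // passes
checkPluralKey("few", "en", "cardinal", "warn"); // logs a warning
```

Using the instance's locale rather than the full CLDR list catches keys like "few" in English cardinal plurals, which the default-set check would let through.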
In my opinion, automatically recognizing the available pluralization categories sounds like a great feature, so I would appreciate it if make-plural exposed them. Maybe this functionality should be guarded by an additional flag, so that browsers that only need the plain plural function don't have to load the additional metadata.

It is possible to adjust the PEG.js grammar to check keys against the known list of pluralization categories. The corresponding parser for a binary ignore/error state would look like:
And the parser would be called with an additional parameter:

_parse(msg, {pluralKeys: ["one", "few", "other"]})

In order to implement a ternary error/warn/ignore behavior, we should be able to use a callback:
and pass a callback function as a parameter to the parser:

_parse(msg, {pluralKeyFunc: function(key, line, col) {
  // `behavior` ("ignore" | "warning" | "error") and pluralKeyIsValid() come from the surrounding scope
  if (behavior === "ignore" || pluralKeyIsValid(key)) return true;
  if (behavior === "error") return false;
  if (behavior === "warning") {
    console.warn("Unexpected plural key '" + key + "' at line " + line + ", column " + col);
  }
  return true;
}});

The main benefit of this solution would be concise error messages: we can get error messages that contain the exact line number and column.
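The callback above relies on `behavior` and `pluralKeyIsValid` from the surrounding scope. A minimal sketch that closes over both, assuming a hypothetical factory `makePluralKeyFunc` with a `report` hook (neither is messageformat.js API; `_parse` and `pluralKeyFunc` are only the names proposed in this thread):

```javascript
// Builds a pluralKeyFunc-style callback: (key, line, col) => boolean.
// validKeys: accepted categories; behavior: "ignore" | "warning" | "error".
function makePluralKeyFunc(validKeys, behavior, report = console.warn) {
  return function pluralKeyFunc(key, line, col) {
    // Explicit "=N" selectors are always accepted.
    const isValid = /^=\d+$/.test(key) || validKeys.includes(key);
    if (behavior === "ignore" || isValid) return true;
    // Returning false would let the parser raise an error at line:col.
    if (behavior === "error") return false;
    // Any other behavior is treated as "warning": accept, but say where.
    report(`line ${line}, column ${col}: unexpected plural key "${key}"`);
    return true;
  };
}

const check = makePluralKeyFunc(["one", "few", "other"], "warning");
check("male", 3, 14); // logs: line 3, column 14: unexpected plural key "male"
```

Keeping the decision in one closure means the parser only needs to know about a single boolean-returning hook, while the caller decides how warnings are surfaced.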
I like where this is going. :) The callback func form is probably a better way of doing it, as it gives more flexibility: we can notify the user without breaking the spec. I'd call it something like

I'm going to take a look at the best way of implementing this in make-plural, and will update here once that's done.
Make-plural 3.0.0-rc4 now includes the categories; we'll want to do
Thanks! I tried to implement the callback variant but ran into an obstacle: in order to decide whether a pluralization category is valid, I must know whether the pluralKey is part of an

There would be an alternative way to store the current parser position. Which solution do you consider best? Duplicating some rules (
I think the
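The obstacle with the callback approach is one of context: a key can only be validated once it is known which construct it belongs to. One alternative, not necessarily what was eventually adopted, is to validate after parsing, when each key's parent is already known. The AST shape below and the name `collectInvalidKeys` are hypothetical and do not reflect messageformat's actual parser output.

```javascript
// Hypothetical AST: an array of strings and argument nodes shaped like
// { type: "plural" | "selectordinal" | "select", arg: "n", cases: { key: Token[] } }.
// Because validation runs after parsing, each key is checked with its parent's
// type already known, so the grammar does not need to track that context.
function collectInvalidKeys(tokens, categories, out = []) {
  for (const token of tokens) {
    if (typeof token === "string") continue;
    // No entry (e.g. for "select") means any key is acceptable.
    const allowed = categories[token.type];
    for (const [key, body] of Object.entries(token.cases)) {
      const isExplicit = /^=\d+$/.test(key);
      if (allowed && !isExplicit && !allowed.includes(key)) {
        out.push({ type: token.type, arg: token.arg, key });
      }
      collectInvalidKeys(body, categories, out); // cases may nest further plurals
    }
  }
  return out;
}

const englishCategories = {
  plural: ["one", "other"],
  selectordinal: ["one", "two", "few", "other"],
};
const ast = [
  { type: "select", arg: "gender", cases: {
    male: [{ type: "plural", arg: "n", cases: { one: ["a"], lots: ["b"], other: ["c"] } }],
    other: ["x"],
  } },
];
collectInvalidKeys(ast, englishCategories); // → [{ type: "plural", arg: "n", key: "lots" }]
```

The trade-off is that errors are reported after the whole message has been parsed, so the AST nodes would need to carry line/column information to keep the precise error messages discussed above.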
@vogelsgesang, you might be interested in taking a look at #138, which includes changes implementing what's suggested here as well as in #105. As a part of that PR, I've separated the parser into its own repo, at messageformat/parser. You may be interested in checking out its contributors listing.
Wow, it's amazing how much progress you've made since I last looked into this project.
As mentioned in #95 (comment), plural and selectordinal could be used as a replacement for select. This pull request removes this (unintentional) flexibility by adjusting the parser grammar to only accept the known PluralForm keys.
I had to touch some unrelated test cases which were relying on the lax parsing behaviour.