Skip to content

Conversation

@stasm
Copy link
Contributor

@stasm stasm commented Mar 6, 2018

Migrate empty translations as {""}. See https://bugzil.la/1441942.

Empty plural variants are also kept as {""} to preserve as much of the original intent as possible. Dropping them could result in different behavior when the default variant is displayed instead.

This doesn't address migrating leading and trailing whitespace in non-whitespace-only legacy translations. See https://bugzil.la/1374246.

See #54 (comment).

@stasm stasm force-pushed the empty-plurals branch 3 times, most recently from 4d09c13 to af760a7 Compare March 7, 2018 13:58
@stasm stasm requested a review from Pike March 7, 2018 14:04
@stasm
Copy link
Contributor Author

stasm commented Mar 7, 2018

I refactored all of test_plural.py to use generic terms like One, Few and Many rather than actual translations. Some tests were duplicates so I removed them.

I also changed the behavior of PLURALS for legacy translations which define only a single variant or no variants (key=). Due to changes from 104f885 most of the time they used to result in a SelectExpression with two variants, one taken from the legacy translation and the other one copied to provide the default. I changed it to not insert the SelectionExpression at all.

@stasm
Copy link
Contributor Author

stasm commented Mar 7, 2018

I considered dropping empty legacy variants in PLURALS but in the end I decided against it. It's functionally different to define a variant as {""} and to drop it. The latter scenario results in the default variant being displayed which is different from the behavior defined by the legacy translation.

It would also require more code and tests to properly handle cases where all variants are empty and presumably need to be dropped, or when the default variant is empty, etc.

@stasm
Copy link
Contributor Author

stasm commented Mar 7, 2018

While this PR adds support for migrating empty and all-whitespace values, leading and trailing whitespace in non-whitespace-only legacy translations are still not supported. That's bug 1374246. I have an idea for how to fix it; at the same time I think it's much lower priority than this PR and I didn't want to make it any more complex.

@Pike
Copy link
Contributor

Pike commented Mar 7, 2018

I considered dropping empty legacy variants in PLURALS but in the end I decided against it. It's functionally different to define a variant as {""} and to drop it. The latter scenario results in the default variant being displayed which is different from the behavior defined by the legacy translation.

@flodolo, is this what we want? From what I've seen so far, and from what I've read in conversations in bugs, it seems that localizers read ;; to be "no valid choice in this context"?

Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good in general, and I like the redo of the tests, too.

I think we should merge bra and { "ket" }, though.

Also having an open question on flod on how to deal with empty variants.

self.assertEqual(
evaluate(self, msg).to_json(),
ftl_message_to_json('''
combined = Hello, world!{""}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trailing {""} is not what we want.

# And remove empty ones.
if len(text.value) > 0:
pruned.append(text)
def join_adjacent_elements(elements):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should adjoin text elements and text expressions alike. That way, we don't end up with stray {""} in the output (and thus in the tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with doing this but I think this is opposite to the direction the fix to handle leading/trailing whitespace would want to go. Do you want to deprioritize or maybe even wontfix bug 1374246?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't leading/trailing just be trailing pass in pattern_of, cutting leading and trailing whitespace off of TextElement into a Placeable, or prefixing with an empty placeable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it also needs more handling in CONCAT so that CONCAT(COPY(), COPY()) doesn't produce things like Foo{" "}Bar when the first COPY has trailing whitespace (or the second one has leading whitespace). I'd like to tackle it in a separate PR.

I'll implement your original suggestion to join TextElements and StringExpressions alike. This might result in TextElement(" Foo") in some rare cases. The serializer currently serializes this with the whitespace (although I'd like to change the behavior in the future and throw: bug 1397233) but the parses ignores it, so in the end, the whitespace is lost.

This is actually the status quo for translations with leading or trailing whitespace, like key = \u0020Foo. I'll unskip two tests to make it explicit. Implementing your suggestion now will prepare us to solve the leading/trailing whitespace issue uniformly in all transforms, regardless of whether it was part of the legacy translation or got concatenated from another legacy translation.

@flodolo
Copy link
Contributor

flodolo commented Mar 7, 2018

@flodolo, is this what we want? From what I've seen so far, and from what I've read in conversations in bugs, it seems that localizers read ;; to be "no valid choice in this context"?

Even English does that: we have strings starting with ";" because the message is only used for more than one element (e.g. warning on closing multiple tabs).

I think it's better to not create an empty variant, and fall back to the default. E.g.

tabs.closeWarningMultiple = ;You are about to close #1 tabs. Are you sure you want to continue?

Should be migrated to

tabs-close-warning = 
        { $tabCount ->            
           *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
        }

More than

tabs-close-warning = 
        { $tabCount ->            
            [one] {""}
           *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
        }

Also, DevTools counter example: if they use English, you'd be creating a bunch of empty variants. With the other approach, you would be creating the first variant available, and fall back to the last one. Both are less than ideal, but the latter is definitely better.

@stasm
Copy link
Contributor Author

stasm commented Mar 7, 2018

tabs-close-warning = 
    { $tabCount ->            
        [one] {""}
       *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
    }

I understand that this is not ideal, but it's still a valid FTL and a good translation. OTOH, dropping the empty variant will require handling many more edge-cases. I suggest to either not do it at all or file a follow-up. I don't think I'll have time to work on it before I leave on PTO late next week.

@stasm
Copy link
Contributor Author

stasm commented Mar 7, 2018

I'm trying to find an acceptable version of this PR that I can land tomorrow. Right now, the migration code breaks badly on empty values so I think there's still benefit to landing this soon, even with some limitations.

@stasm
Copy link
Contributor Author

stasm commented Mar 8, 2018

The morning me thinks that maybe it wouldn't be so hard to add after all. I'll try to do it after breakfast. If it takes more than an hour, let's move it into a follow-up bug.

@stasm
Copy link
Contributor Author

stasm commented Mar 8, 2018

It took me a bit longer, although admittedly the feature of dropping plurals did take around an hour to implement. I spent the rest of the day chasing edge-cases related to text normalization. The end result of this rabbit hole exploration is that I've fixed bug 1374246 :)

@stasm stasm force-pushed the empty-plurals branch 2 times, most recently from 3c5cebf to 4b70239 Compare March 8, 2018 16:21
@stasm stasm requested a review from Pike March 8, 2018 16:21
@stasm
Copy link
Contributor Author

stasm commented Mar 8, 2018

@Pike: my plan is to land this as three commits; I'll squash the first two together. I'm keeping them separate for now to make the review easier. The first commit is what you reviewed previously.

Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, I like how this turned out.

I've got a few nits, the only real question is whether we need a Transform.pattern_of in the case of the single pattern case.

match = re.search(regex, element.value)
if match:
whitespace = match.group('whitespace')
empty_expr = FTL.Placeable(FTL.StringExpression(whitespace))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be white_expr, as it's not empty?

And yes, I still tend to prefer {""} foo to {" "}foo, in which case this would be an empty_expr ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply placeable?

else:
pruned.extend(elems)
return pruned
return None, element
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have this method return lists of variable length ...


if isinstance(elements[0], FTL.TextElement):
ws, text = extract_whitespace(re_leading, elements[0])
elements[:1] = [ws, text]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and just inject that list here ....


if isinstance(elements[-1], FTL.TextElement):
ws, text = extract_whitespace(re_trailing, elements[-1])
elements[-1:] = [text, ws]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and here ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always returning a 2-tuple makes it easy to switch the order in this particular case here. Note that extract_whitespace doesn't know if it's processing the leading or the trailing whitespace.

element
for element in elements
if element is not None
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and then just return elements here.

def pattern_of(*elements):
elements = Transform.flatten_elements(elements)
elements = Transform.normalize_text_content(elements)
elements = Transform.preserve_whitespace(elements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just sugar. The first elements is a generator, and then it switches to a list. I'm not opposed to the generator, but preserve_whitespace requires elements to actually be a list, so this feels a bit brittle?

Maybe just a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How about I rename flatten_elements to chain_elements which is closer to itertools.chain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also move flatten_elements, normalize_text_content and preserve_whitespace to regular functions defined at the top of the file. I only really care about Transform.pattern_of being part of the official API because it may be useful in custom foreach functions passed to PLURALS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it's better to keep it all namespaced under Transform after all? @Pike do you have an opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Insert a joke about renaming to smoosh_elements.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I have a better idea. I'll push in a few.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8f9bb30 is a refactor of pattern_of. I'm afraid rebasing this into the other three commits will be hard, so I might just land it as an independent commit.

keys_and_variants = zip(keys, variants)
keys_and_variants.sort(key=lambda (k, v): self.DEFAULT_ORDER.index(k))
last_key, last_variant = keys_and_variants[-1]
# Match keys to legacy forms in order they are defined in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar nit, forms in the order, add the the.

# variant. We don't need to insert a SelectExpression for them.
if len(pairs) == 1:
_, only_form = pairs[0]
return evaluate(ctx, self.foreach(only_form))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a pattern_of?

Copy link
Contributor Author

@stasm stasm Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it doesn't. My motivation was that with it, all* transforms return via pattern_of. It makes the code easier to follow for me and also keeps the text normalization logic in one place.

* I noticed one code path in PLURALS which doesn't; I'll fix it.

stasm added 4 commits March 9, 2018 12:15
Empty plural variants are also kept as {""} to preserve as much of the original
intent as possible. This doesn't address migrating leading and trailing
whitespace in non-whitespace-only legacy translations (bug 1374246).
Leadning and trailing whitespace is encoded as {""}, e.g.:

    key = {"    "}Foo
Rather than define variant as {""}, remove them from the migrated result.
Move some of the functions that pattern_of calls out of Transform and inline
others.
@stasm stasm merged commit 6e14e77 into projectfluent:master Mar 9, 2018
@stasm stasm deleted the empty-plurals branch March 9, 2018 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants