Migrate empty translations #54

stasm · 2018-03-06T14:27:25Z

Migrate empty translations as {""}. See https://bugzil.la/1441942.

~~Empty plural variants are also kept as {""} to preserve as much of the original intent as possible. Dropping them could result in different behavior when the default variant is displayed instead.~~

~~This doesn't address migrating leading and trailing whitespace in non-whitespace-only legacy translations. See https://bugzil.la/1374246.~~

See #54 (comment).

stasm · 2018-03-07T14:13:24Z

I refactored all of test_plural.py to use generic terms like One, Few and Many rather than actual translations. Some tests were duplicates so I removed them.

I also changed the behavior of PLURALS for legacy translations which define only a single variant or no variants (key=). Due to the changes from 104f885, most of the time they used to result in a SelectExpression with two variants: one taken from the legacy translation and the other copied to provide the default. I changed it to not insert the SelectExpression at all.
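A minimal sketch of the single-form shortcut described above. It is not the PR's code: `Text`, `Variant`, `Select` and `migrate_plural` are hypothetical stand-ins for the FTL AST and the PLURALS transform, and legacy forms are assumed to be separated by `;`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Variant:
    key: str
    value: Text
    default: bool = False


@dataclass
class Select:
    selector: str
    variants: List[Variant]


def migrate_plural(legacy: str, keys: List[str],
                   selector: str = "num") -> Union[Text, Select]:
    """Map legacy plural forms to plural category keys (hypothetical helper)."""
    forms = legacy.split(";")
    # zip() stops at the shorter sequence, so a single legacy form
    # (or an empty value, which splits into [""]) yields exactly one pair.
    pairs = list(zip(keys, forms))

    if len(pairs) == 1:
        # Only one form is defined: return it directly instead of
        # wrapping it in a select expression with a copied default.
        _, only_form = pairs[0]
        return Text(only_form)

    variants = [Variant(key, Text(form)) for key, form in pairs]
    variants[-1].default = True  # the last variant acts as the default
    return Select(selector, variants)
```

With these assumptions, `migrate_plural("One", ["one", "other"])` returns a plain `Text`, while `migrate_plural("One;Many", ["one", "other"])` still returns a `Select` with two variants.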

stasm · 2018-03-07T14:15:34Z

I considered dropping empty legacy variants in PLURALS but in the end I decided against it. It's functionally different to define a variant as {""} and to drop it. The latter scenario results in the default variant being displayed which is different from the behavior defined by the legacy translation.

It would also require more code and tests to properly handle cases where all variants are empty and presumably need to be dropped, or when the default variant is empty, etc.

stasm · 2018-03-07T14:17:48Z

While this PR adds support for migrating empty and all-whitespace values, leading and trailing whitespace in non-whitespace-only legacy translations is still not supported. That's bug 1374246. I have an idea for how to fix it; at the same time I think it's much lower priority than this PR and I didn't want to make it any more complex.

Pike · 2018-03-07T17:04:23Z

> I considered dropping empty legacy variants in PLURALS but in the end I decided against it. It's functionally different to define a variant as {""} and to drop it. The latter scenario results in the default variant being displayed which is different from the behavior defined by the legacy translation.

@flodolo, is this what we want? From what I've seen so far, and from what I've read in conversations in bugs, it seems that localizers read ;; to be "no valid choice in this context"?

Pike

This looks good in general, and I like the redo of the tests, too.

I think we should merge bra and { "ket" }, though.

I also have an open question for flod on how to deal with empty variants.

Pike · 2018-03-07T16:48:27Z

tests/migrate/test_concat.py

+        self.assertEqual(
+            evaluate(self, msg).to_json(),
+            ftl_message_to_json('''
+                combined = Hello, world!{""}


I think the trailing {""} is not what we want.

Pike · 2018-03-07T16:50:20Z

fluent/migrate/transforms.py

-                # And remove empty ones.
-                if len(text.value) > 0:
-                    pruned.append(text)
+    def join_adjacent_elements(elements):


I think we should adjoin text elements and text expressions alike. That way, we don't end up with stray {""} in the output (and thus in the tests).

I'm OK with doing this but I think this is opposite to the direction the fix to handle leading/trailing whitespace would want to go. Do you want to deprioritize or maybe even wontfix bug 1374246?

Wouldn't leading/trailing just be a trailing pass in pattern_of, cutting leading and trailing whitespace off of TextElement into a Placeable, or prefixing with an empty placeable?

Yes, but it also needs more handling in CONCAT so that CONCAT(COPY(), COPY()) doesn't produce things like Foo{" "}Bar when the first COPY has trailing whitespace (or the second one has leading whitespace). I'd like to tackle it in a separate PR.

I'll implement your original suggestion to join TextElements and StringExpressions alike. This might result in TextElement(" Foo") in some rare cases. The serializer currently serializes this with the whitespace (although I'd like to change the behavior in the future and throw: bug 1397233) but the parser ignores it, so in the end, the whitespace is lost.

This is actually the status quo for translations with leading or trailing whitespace, like key = \u0020Foo. I'll unskip two tests to make it explicit. Implementing your suggestion now will prepare us to solve the leading/trailing whitespace issue uniformly in all transforms, regardless of whether it was part of the legacy translation or got concatenated from another legacy translation.
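To make the suggestion in this thread concrete, here is a rough sketch of joining text elements and string expressions alike. The classes are simplified stand-ins for the FTL AST nodes, and `join_adjacent` is a hypothetical name; the actual `join_adjacent_elements` in transforms.py may well differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextElement:
    value: str


@dataclass
class StringExpression:
    """Stand-in for a string placeable such as {""} or {" "}."""
    value: str


@dataclass
class Reference:
    """Stand-in for any other placeable, e.g. { $var }."""
    name: str


Element = Union[TextElement, StringExpression, Reference]


def join_adjacent(elements: List[Element]) -> List[Element]:
    """Merge runs of text-like elements into single TextElements."""
    joined: List[Element] = []
    for element in elements:
        text_like = isinstance(element, (TextElement, StringExpression))
        if text_like and joined and isinstance(joined[-1], TextElement):
            # Extend the run that is already being collected.
            joined[-1] = TextElement(joined[-1].value + element.value)
        elif text_like:
            # Start a new run; string expressions become plain text.
            joined.append(TextElement(element.value))
        else:
            joined.append(element)
    return joined
```

Under these assumptions, `Hello, world!` followed by `{""}` collapses into a single `TextElement("Hello, world!")`, so no stray `{""}` is left in the output. Leading or trailing whitespace that ends up inside the merged text still needs the separate handling discussed above.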

flodolo · 2018-03-07T17:56:45Z

> @flodolo, is this what we want? From what I've seen so far, and from what I've read in conversations in bugs, it seems that localizers read ;; to be "no valid choice in this context"?

Even English does that: we have strings starting with ";" because the message is only used for more than one element (e.g. warning on closing multiple tabs).

I think it's better to not create an empty variant, and fall back to the default. E.g.

tabs.closeWarningMultiple = ;You are about to close #1 tabs. Are you sure you want to continue?

Should be migrated to

tabs-close-warning = 
        { $tabCount ->            
           *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
        }

Rather than

tabs-close-warning = 
        { $tabCount ->            
            [one] {""}
           *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
        }

Also, DevTools counter example: if they use English, you'd be creating a bunch of empty variants. With the other approach, you would be creating the first variant available, and fall back to the last one. Both are less than ideal, but the latter is definitely better.
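The preference expressed here, dropping empty legacy forms so that they fall back to the default, can be sketched as follows. This is only an illustration under stated assumptions: `non_empty_forms` is a hypothetical helper, the plural category keys are supplied by the caller, only strictly empty forms are dropped, and replacing `#1` with `{ $tabCount }` is left out.

```python
from typing import List, Tuple


def non_empty_forms(legacy: str, keys: List[str]) -> List[Tuple[str, str]]:
    """Pair plural keys with legacy forms, skipping empty forms."""
    forms = legacy.split(";")
    # Keys are matched to forms in the order the forms are defined.
    return [(key, form) for key, form in zip(keys, forms) if form != ""]


pairs = non_empty_forms(
    ";You are about to close #1 tabs. Are you sure you want to continue?",
    ["one", "other"],
)
# Only the "other" form survives, so it would become the default variant.
print(pairs)
```

If every form is empty the list is empty too, which is one of the edge cases a real implementation has to decide on.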

stasm · 2018-03-07T18:18:56Z

tabs-close-warning = 
    { $tabCount ->            
        [one] {""}
       *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
    }

I understand that this is not ideal, but it's still valid FTL and a good translation. OTOH, dropping the empty variant will require handling many more edge-cases. I suggest either not doing it at all or filing a follow-up. I don't think I'll have time to work on it before I leave on PTO late next week.

stasm · 2018-03-07T18:24:48Z

I'm trying to find an acceptable version of this PR that I can land tomorrow. Right now, the migration code breaks badly on empty values so I think there's still benefit to landing this soon, even with some limitations.

stasm · 2018-03-08T07:02:58Z

The morning me thinks that maybe it wouldn't be so hard to add after all. I'll try to do it after breakfast. If it takes more than an hour, let's move it into a follow-up bug.

stasm · 2018-03-08T16:13:38Z

It took me a bit longer, although admittedly the feature of dropping plurals did take around an hour to implement. I spent the rest of the day chasing edge-cases related to text normalization. The end result of this rabbit hole exploration is that I've fixed bug 1374246 :)

stasm · 2018-03-08T16:23:13Z

@Pike: my plan is to land this as three commits; I'll squash the first two together. I'm keeping them separate for now to make the review easier. The first commit is what you reviewed previously.

Pike

Sweet, I like how this turned out.

I've got a few nits; the only real question is whether we need a Transform.pattern_of in the single-pattern case.

Pike · 2018-03-08T17:41:58Z

fluent/migrate/transforms.py

+            match = re.search(regex, element.value)
+            if match:
+                whitespace = match.group('whitespace')
+                empty_expr = FTL.Placeable(FTL.StringExpression(whitespace))


Can this be white_expr, as it's not empty?

And yes, I still tend to prefer {""} foo to {" "}foo, in which case this would be an empty_expr ;-)

Or simply placeable?

Pike · 2018-03-08T17:42:51Z

fluent/migrate/transforms.py

            else:
-                pruned.extend(elems)
-        return pruned
+                return None, element


I'd prefer to have this method return lists of variable length ...

Pike · 2018-03-08T17:43:13Z

fluent/migrate/transforms.py

+
+        if isinstance(elements[0], FTL.TextElement):
+            ws, text = extract_whitespace(re_leading, elements[0])
+            elements[:1] = [ws, text]


... and just inject that list here ....

Pike · 2018-03-08T17:43:22Z

fluent/migrate/transforms.py

+
+        if isinstance(elements[-1], FTL.TextElement):
+            ws, text = extract_whitespace(re_trailing, elements[-1])
+            elements[-1:] = [text, ws]


... and here ...

Always returning a 2-tuple makes it easy to switch the order in this particular case here. Note that extract_whitespace doesn't know if it's processing the leading or the trailing whitespace.

Pike · 2018-03-08T17:43:57Z

fluent/migrate/transforms.py

+            element
+            for element in elements
+            if element is not None
+        ]


... and then just return elements here.
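For readers following this thread, here is a sketch of the 2-tuple variant being defended. It is an approximation: elements are modelled as `(kind, value)` tuples instead of FTL AST nodes, and the two regular expressions are assumptions, since the diff only shows that the pattern has a named `whitespace` group.

```python
import re
from typing import List, Optional, Tuple

Element = Tuple[str, str]  # ("text", value) or ("placeable", value)

# Assumed patterns; only the "whitespace" group name comes from the diff.
RE_LEADING = re.compile(r"^(?P<whitespace>\s+)(?P<text>.*)$", re.DOTALL)
RE_TRAILING = re.compile(r"^(?P<text>.*?)(?P<whitespace>\s+)$", re.DOTALL)


def extract_whitespace(regex, value: str) -> Tuple[Optional[Element], Element]:
    """Always return (whitespace placeable or None, remaining text)."""
    match = regex.search(value)
    if match:
        placeable = ("placeable", match.group("whitespace"))
        return placeable, ("text", match.group("text"))
    return None, ("text", value)


def preserve_whitespace(elements: List[Element]) -> List[Element]:
    elements = list(elements)  # slicing below requires a real list

    if elements and elements[0][0] == "text":
        ws, text = extract_whitespace(RE_LEADING, elements[0][1])
        elements[:1] = [ws, text]

    if elements and elements[-1][0] == "text":
        ws, text = extract_whitespace(RE_TRAILING, elements[-1][1])
        # Same helper, but the caller swaps the order for the trailing case.
        elements[-1:] = [text, ws]

    # Drop the None markers and any text emptied by the extraction.
    return [el for el in elements if el is not None and el != ("text", "")]
```

Because the helper does not know whether it is handling the leading or the trailing end, the fixed-size return value lets the caller decide where the placeable goes. The price is the final filtering pass, which is what a variable-length list would avoid.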

Pike · 2018-03-08T17:49:35Z

fluent/migrate/transforms.py

+    def pattern_of(*elements):
+        elements = Transform.flatten_elements(elements)
+        elements = Transform.normalize_text_content(elements)
+        elements = Transform.preserve_whitespace(elements)


Just sugar. The first elements is a generator, and then it switches to a list. I'm not opposed to the generator, but preserve_whitespace requires elements to actually be a list, so this feels a bit brittle?

Maybe just a comment?

Good point. How about I rename flatten_elements to chain_elements which is closer to itertools.chain?

I'll also move flatten_elements, normalize_text_content and preserve_whitespace to regular functions defined at the top of the file. I only really care about Transform.pattern_of being part of the official API because it may be useful in custom foreach functions passed to PLURALS.

Or maybe it's better to keep it all namespaced under Transform after all? @Pike do you have an opinion?

(Insert a joke about renaming to smoosh_elements.)

Ah, I think I have a better idea. I'll push in a few.

8f9bb30 is a refactor of pattern_of. I'm afraid rebasing this into the other three commits will be hard, so I might just land it as an independent commit.
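The brittleness raised at the start of this thread is easy to reproduce in isolation. The sketch below is not the refactor from 8f9bb30: `chain_elements` borrows the name floated above but is written from scratch here, and `pattern_of` is reduced to the one step that matters for this point.

```python
def chain_elements(elements):
    """Lazily flatten nested lists and tuples of elements (hypothetical)."""
    for element in elements:
        if isinstance(element, (list, tuple)):
            yield from chain_elements(element)
        else:
            yield element


def pattern_of(*elements):
    # A generator cannot be indexed, so code that later looks at
    # elements[0] or elements[-1] needs the result materialized first.
    return list(chain_elements(elements))


flat = pattern_of("a", ["b", ["c"]], "d")
first, last = flat[0], flat[-1]  # works only because flat is a list
```

Indexing the generator returned by `chain_elements` directly would raise a `TypeError`, which is why the hand-over from generator to list deserves at least a comment.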

Pike · 2018-03-08T17:53:16Z

fluent/migrate/transforms.py

-        keys_and_variants = zip(keys, variants)
-        keys_and_variants.sort(key=lambda (k, v): self.DEFAULT_ORDER.index(k))
-        last_key, last_variant = keys_and_variants[-1]
+        # Match keys to legacy forms in order they are defined in


Grammar nit: "forms in the order"; add the "the".

Pike · 2018-03-08T17:54:45Z

fluent/migrate/transforms.py

+        # variant. We don't need to insert a SelectExpression for them.
+        if len(pairs) == 1:
+            _, only_form = pairs[0]
+            return evaluate(ctx, self.foreach(only_form))


Does this need a pattern_of?

Technically, it doesn't. My motivation was that with it, all* transforms return via pattern_of. It makes the code easier to follow for me and also keeps the text normalization logic in one place.

* I noticed one code path in PLURALS which doesn't; I'll fix it.

Empty plural variants are also kept as {""} to preserve as much of the original intent as possible. This doesn't address migrating leading and trailing whitespace in non-whitespace-only legacy translations (bug 1374246).


stasm force-pushed the empty-plurals branch 3 times, most recently from 4d09c13 to af760a7 · March 7, 2018 13:58

stasm requested a review from Pike March 7, 2018 14:04

Pike suggested changes Mar 7, 2018


stasm force-pushed the empty-plurals branch 2 times, most recently from 3c5cebf to 4b70239 · March 8, 2018 16:21

stasm requested a review from Pike March 8, 2018 16:21

Pike approved these changes Mar 8, 2018


Pike approved these changes Mar 9, 2018


stasm added 4 commits March 9, 2018 12:15

Bug 1374246 - Support leading and trailing whitespace in migrations

bca5d3e

Leading and trailing whitespace is encoded as {""}, e.g.: key = {" "}Foo

Drop plural variants if legacy forms are empty

43a1309

Rather than define variant as {""}, remove them from the migrated result.

Refactor Transform.pattern_of

d450b1e

Move some of the functions that pattern_of calls out of Transform and inline others.

stasm force-pushed the empty-plurals branch from 8f9bb30 to d450b1e · March 9, 2018 11:18

stasm merged commit 6e14e77 into projectfluent:master Mar 9, 2018

stasm deleted the empty-plurals branch March 9, 2018 11:21
