Migrate empty translations #54
4d09c13 to
af760a7
Compare
|
I refactored all of I also changed the behavior of |
|
I considered dropping empty legacy variants in It would also require more code and tests to properly handle cases where all variants are empty and presumably need to be dropped, or when the default variant is empty, etc. |
|
While this PR adds support for migrating empty and all-whitespace values, leading and trailing whitespace in non-whitespace-only legacy translations is still not supported. That's bug 1374246. I have an idea for how to fix it; at the same time I think it's much lower priority than this PR and I didn't want to make it any more complex. |
@flodolo, is this what we want? From what I've seen so far, and from what I've read in conversations in bugs, it seems that localizers read |
Pike
left a comment
This looks good in general, and I like the redo of the tests, too.
I think we should merge bra and { "ket" }, though.
I also have an open question for flod on how to deal with empty variants.
tests/migrate/test_concat.py
Outdated
| self.assertEqual( | ||
| evaluate(self, msg).to_json(), | ||
| ftl_message_to_json(''' | ||
| combined = Hello, world!{""} |
I think the trailing {""} is not what we want.
fluent/migrate/transforms.py
Outdated
| # And remove empty ones. | ||
| if len(text.value) > 0: | ||
| pruned.append(text) | ||
| def join_adjacent_elements(elements): |
I think we should adjoin text elements and text expressions alike. That way, we don't end up with stray {""} in the output (and thus in the tests).
I'm OK with doing this but I think this is opposite to the direction the fix to handle leading/trailing whitespace would want to go. Do you want to deprioritize or maybe even wontfix bug 1374246?
Wouldn't leading/trailing just be a trailing pass in pattern_of, cutting leading and trailing whitespace off of a TextElement into a Placeable, or prefixing with an empty placeable?
Yes, but it also needs more handling in CONCAT so that CONCAT(COPY(), COPY()) doesn't produce things like Foo{" "}Bar when the first COPY has trailing whitespace (or the second one has leading whitespace). I'd like to tackle it in a separate PR.
I'll implement your original suggestion to join TextElements and StringExpressions alike. This might result in TextElement(" Foo") in some rare cases. The serializer currently serializes this with the whitespace (although I'd like to change the behavior in the future and throw: bug 1397233) but the parser ignores it, so in the end, the whitespace is lost.
This is actually the status quo for translations with leading or trailing whitespace, like key = \u0020Foo. I'll unskip two tests to make it explicit. Implementing your suggestion now will prepare us to solve the leading/trailing whitespace issue uniformly in all transforms, regardless of whether it was part of the legacy translation or got concatenated from another legacy translation.
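For readers following along, the idea of joining TextElements and StringExpressions alike can be sketched roughly as below. This is only an illustration under stated assumptions: the three classes are minimal stand-ins for the FTL AST nodes, and `text_of` and `join_adjacent_text` are hypothetical names, not the functions in fluent/migrate/transforms.py. The sketch also does not cover the case where the whole value is empty.

```python
from dataclasses import dataclass


# Minimal stand-ins for the FTL AST nodes (assumption: the real classes
# live in the FTL module and carry more fields).
@dataclass
class TextElement:
    value: str


@dataclass
class StringExpression:
    value: str


@dataclass
class Placeable:
    expression: object


def text_of(element):
    """Return the literal text of an element, or None if it is not literal."""
    if isinstance(element, TextElement):
        return element.value
    if isinstance(element, Placeable) and isinstance(
            element.expression, StringExpression):
        return element.expression.value
    return None


def join_adjacent_text(elements):
    """Merge runs of literal text (plain or quoted) into single TextElements."""
    joined = []
    for element in elements:
        text = text_of(element)
        if text is None:
            # Not literal text (e.g. a variable placeable): keep as-is.
            joined.append(element)
        elif joined and isinstance(joined[-1], TextElement):
            joined[-1] = TextElement(joined[-1].value + text)
        else:
            joined.append(TextElement(text))
    # Drop text elements that are still empty after merging.
    return [
        e for e in joined
        if not (isinstance(e, TextElement) and e.value == "")
    ]
```

With this approach a trailing `{""}` disappears into the neighbouring text, while text that follows a non-literal placeable can keep its leading space, which is the `TextElement(" Foo")` situation mentioned above.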
Even English does that: we have strings starting with ";" because the message is only used for more than one element (e.g. warning on closing multiple tabs). I think it's better to not create an empty variant, and fall back to the default. E.g.
tabs.closeWarningMultiple = ;You are about to close #1 tabs. Are you sure you want to continue?
should be migrated to
tabs-close-warning =
    { $tabCount ->
       *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
    }
rather than
tabs-close-warning =
    { $tabCount ->
        [one] {""}
       *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
    }
Also, DevTools counter example: if they use English, you'd be creating a bunch of empty variants. With the other approach, you would be creating the first variant available, and fall back to the last one. Both are less than ideal, but the latter is definitely better. |
tabs-close-warning =
    { $tabCount ->
        [one] {""}
       *[other] You are about to close { $tabCount } tabs. Are you sure you want to continue?
    }
I understand that this is not ideal, but it's still valid FTL and a good translation. OTOH, dropping the empty variant will require handling many more edge cases. I suggest either not doing it at all or filing a follow-up. I don't think I'll have time to work on it before I leave on PTO late next week. |
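To make the edge cases concrete, here is a rough sketch of what dropping empty variants involves. It works on plain `(key, text)` pairs rather than FTL nodes, `drop_empty_variants` is a hypothetical helper, and promoting the last remaining variant to default is an assumption made for illustration, not necessarily what the PR ends up doing.

```python
def drop_empty_variants(variants, default_key):
    """Remove variants whose text is empty.

    `variants` is an ordered list of (key, text) pairs. Returns a tuple of
    (remaining variants, key of the default variant). The default key is
    None when nothing is left to select on.
    """
    kept = [(key, text) for key, text in variants if text != ""]
    if not kept:
        # Edge case: every variant was empty, so there is no select
        # expression to build at all.
        return [], None
    keys = [key for key, _ in kept]
    if default_key not in keys:
        # Edge case: the default variant itself was empty. Assumption:
        # fall back to the last remaining variant.
        default_key = keys[-1]
    return kept, default_key
```

For the English example above, `[("one", ""), ("other", "You are about to close ...")]` with `"other"` as default keeps only the `other` variant, at which point a select expression is no longer strictly needed.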
|
I'm trying to find an acceptable version of this PR that I can land tomorrow. Right now, the migration code breaks badly on empty values so I think there's still benefit to landing this soon, even with some limitations. |
|
The morning me thinks that maybe it wouldn't be so hard to add after all. I'll try to do it after breakfast. If it takes more than an hour, let's move it into a follow-up bug. |
|
It took me a bit longer, although admittedly the feature of dropping plurals did take around an hour to implement. I spent the rest of the day chasing edge-cases related to text normalization. The end result of this rabbit hole exploration is that I've fixed bug 1374246 :) |
3c5cebf to
4b70239
Compare
|
@Pike: my plan is to land this as three commits; I'll squash the first two together. I'm keeping them separate for now to make the review easier. The first commit is what you reviewed previously. |
Pike
left a comment
Sweet, I like how this turned out.
I've got a few nits, the only real question is whether we need a Transform.pattern_of in the case of the single pattern case.
fluent/migrate/transforms.py
Outdated
| match = re.search(regex, element.value) | ||
| if match: | ||
| whitespace = match.group('whitespace') | ||
| empty_expr = FTL.Placeable(FTL.StringExpression(whitespace)) |
Can this be white_expr, as it's not empty?
And yes, I still tend to prefer {""} foo to {" "}foo, in which case this would be an empty_expr ;-)
Or simply placeable?
fluent/migrate/transforms.py
Outdated
| else: | ||
| pruned.extend(elems) | ||
| return pruned | ||
| return None, element |
I'd prefer to have this method return lists of variable length ...
fluent/migrate/transforms.py
Outdated
|
|
||
| if isinstance(elements[0], FTL.TextElement): | ||
| ws, text = extract_whitespace(re_leading, elements[0]) | ||
| elements[:1] = [ws, text] |
... and just inject that list here ....
fluent/migrate/transforms.py
Outdated
|
|
||
| if isinstance(elements[-1], FTL.TextElement): | ||
| ws, text = extract_whitespace(re_trailing, elements[-1]) | ||
| elements[-1:] = [text, ws] |
... and here ...
Always returning a 2-tuple makes it easy to switch the order in this particular case here. Note that extract_whitespace doesn't know if it's processing the leading or the trailing whitespace.
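A rough, self-contained sketch of the 2-tuple approach follows, pieced together from the diff excerpts in this thread. The classes are minimal stand-ins for the FTL nodes, and the two regular expressions are assumptions (the actual patterns are not shown in the review).

```python
import re
from dataclasses import dataclass


# Minimal stand-ins for the FTL AST nodes.
@dataclass
class TextElement:
    value: str


@dataclass
class StringExpression:
    value: str


@dataclass
class Placeable:
    expression: object


# Assumed patterns; both expose a named group called "whitespace".
RE_LEADING = re.compile(r'^(?P<whitespace>\s+)')
RE_TRAILING = re.compile(r'(?P<whitespace>\s+)$')


def extract_whitespace(regex, element):
    """Split matched whitespace off a TextElement.

    Always returns a (placeable, text) 2-tuple; either item may be None.
    The function does not know whether it handles the leading or the
    trailing side, so the caller decides the order.
    """
    match = regex.search(element.value)
    if not match:
        return None, element
    placeable = Placeable(StringExpression(match.group('whitespace')))
    rest = element.value[:match.start()] + element.value[match.end():]
    if rest == "":
        # The element was whitespace only.
        return placeable, None
    return placeable, TextElement(rest)


def preserve_whitespace(elements):
    """Wrap leading and trailing whitespace of a pattern in placeables."""
    elements = list(elements)
    if elements and isinstance(elements[0], TextElement):
        ws, text = extract_whitespace(RE_LEADING, elements[0])
        elements[:1] = [ws, text]
    if elements and isinstance(elements[-1], TextElement):
        ws, text = extract_whitespace(RE_TRAILING, elements[-1])
        # Same helper, swapped order for the trailing side.
        elements[-1:] = [text, ws]
    return [element for element in elements if element is not None]
```

The fixed-size return value is what lets the caller write `[ws, text]` on the leading side and `[text, ws]` on the trailing side; the cost is the final pass that filters out `None`.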
fluent/migrate/transforms.py
Outdated
| element | ||
| for element in elements | ||
| if element is not None | ||
| ] |
... and then just return elements here.
fluent/migrate/transforms.py
Outdated
| def pattern_of(*elements): | ||
| elements = Transform.flatten_elements(elements) | ||
| elements = Transform.normalize_text_content(elements) | ||
| elements = Transform.preserve_whitespace(elements) |
Just sugar. The first elements is a generator, and then it switches to a list. I'm not opposed to the generator, but preserve_whitespace requires elements to actually be a list, so this feels a bit brittle?
Maybe just a comment?
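The brittleness can be illustrated with a small sketch. `chain_elements` here is a hypothetical recursive flattener, assumed to behave like the generator under discussion; the actual implementation is not shown in this thread.

```python
def chain_elements(elements):
    """Lazily flatten nested lists and tuples of elements (a generator)."""
    for element in elements:
        if isinstance(element, (list, tuple)):
            yield from chain_elements(element)
        else:
            yield element


def first_and_last(elements):
    """A consumer that indexes into its input, so it needs a real list."""
    # Materialize first: indexing a generator raises TypeError.
    elements = list(elements)
    return elements[0], elements[-1]
```

Any step that uses `elements[0]` or `elements[-1]` has to receive a list, so either the producer returns one or the consumer converts explicitly as above.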
Good point. How about I rename flatten_elements to chain_elements which is closer to itertools.chain?
I'll also move flatten_elements, normalize_text_content and preserve_whitespace to regular functions defined at the top of the file. I only really care about Transform.pattern_of being part of the official API because it may be useful in custom foreach functions passed to PLURALS.
Or maybe it's better to keep it all namespaced under Transform after all? @Pike do you have an opinion?
(Insert a joke about renaming to smoosh_elements.)
Ah, I think I have a better idea. I'll push in a few.
8f9bb30 is a refactor of pattern_of. I'm afraid rebasing this into the other three commits will be hard, so I might just land it as an independent commit.
fluent/migrate/transforms.py
Outdated
| keys_and_variants = zip(keys, variants) | ||
| keys_and_variants.sort(key=lambda (k, v): self.DEFAULT_ORDER.index(k)) | ||
| last_key, last_variant = keys_and_variants[-1] | ||
| # Match keys to legacy forms in order they are defined in |
grammar nit, forms in the order, add the the.
fluent/migrate/transforms.py
Outdated
| # variant. We don't need to insert a SelectExpression for them. | ||
| if len(pairs) == 1: | ||
| _, only_form = pairs[0] | ||
| return evaluate(ctx, self.foreach(only_form)) |
Does this need a pattern_of?
Technically, it doesn't. My motivation was that with it, all* transforms return via pattern_of. It makes the code easier to follow for me and also keeps the text normalization logic in one place.
* I noticed one code path in PLURALS which doesn't; I'll fix it.
Empty plural variants are also kept as {""} to preserve as much of the original
intent as possible. This doesn't address migrating leading and trailing
whitespace in non-whitespace-only legacy translations (bug 1374246).
Leading and trailing whitespace is encoded as {""}, e.g.:
key = {" "}Foo
Rather than define empty variants as {""}, remove them from the migrated result.
Move some of the functions that pattern_of calls out of Transform and inline others.
Migrate empty translations as {""}. See https://bugzil.la/1441942.
Empty plural variants are also kept as {""} to preserve as much of the original intent as possible. Dropping them could result in different behavior when the default variant is displayed instead. This doesn't address migrating leading and trailing whitespace in non-whitespace-only legacy translations. See https://bugzil.la/1374246. See #54 (comment).