Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve forward-compatibility of string extraction w.r.t previous trains #80

Closed
rfk opened this issue Oct 19, 2015 · 15 comments
Closed

Improve forward-compatibility of string extraction w.r.t previous trains #80

rfk opened this issue Oct 19, 2015 · 15 comments

Comments

@rfk
Copy link
Member

@rfk rfk commented Oct 19, 2015

IIUC, we're currently in a situation where we can't extract strings for the upcoming train until the previous train has shipped to production. If we do so, there's a risk that strings required by the previous train will get removed or overwritten.

Let's change the process the ensure forward-compatibility with previous trains. The simplest option is probably just to keep all old strings around forever, at least to start with. We can see how much of a problem that might cause in practice and implement e.g. some sort of pruning scheme at a later time.

@rfk rfk added this to the FxA-0: quality milestone Oct 19, 2015
@rfk
Copy link
Member Author

@rfk rfk commented Oct 19, 2015

(/cc @shane-tomlinson based on discussions last week; please feel free to close this out if you've got more context captured elsewhere)

@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Oct 19, 2015

@shane-tomlinson and @vladikoff, also maybe @zaach to talk about this !

@rfk rfk self-assigned this Oct 27, 2015
@rfk
Copy link
Member Author

@rfk rfk commented Oct 27, 2015

@shane-tomlinson says we may be able to use "compendium" feature from here: https://www.mankier.com/1/msgmerge#-C

@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Nov 2, 2015

@vladikoff and @shane-tomlinson will figure this out :D

@vladikoff vladikoff self-assigned this Nov 2, 2015
@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented Nov 4, 2015

@jrgm, can you refresh my memory, why do strings need to be forward compatible?

@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Nov 4, 2015

@zaach or @rfk , Shane and I talked about this and we got confused. Could you explain this:

there's a risk that strings required by the previous train will get removed or overwritten.

Is it enough to just tag the string versions (choose SHA)? Instead of using HEAD?

We are not sure what the problem is exactly.
Please help 🍰

@rfk
Copy link
Member Author

@rfk rfk commented Nov 4, 2015

Two reasons IIRC.

First, it just makes more work for @jrgm, since he has to carefully select "the latest sha before the new strings were cut" rather than just picking up the latest sha. The more our process depends on human beings doing the right thing, the more opportunity for things to go wrong. (no offense @jrgm ;-)

Second, it reduces the window in which translators can have their strings pick up for deploy. Imagine the following sequence of events:

  • french translations for train-50 get merged
  • we cut the strings for train-51
  • german translations for train-50 get merged
  • train-50 is deployed

Without forward-compatibility, we may not be able to ship the new german translations with train-50.

@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented Nov 5, 2015

First, it just makes more work for @jrgm, since he has to carefully select "the latest sha before the new strings were cut" rather than just picking up the latest sha. The more our process depends on human beings doing the right thing, the more opportunity for things to go wrong. (no offense @jrgm ;-)

And this is where git tags would be useful.

Without forward-compatibility, we may not be able to ship the new german translations with train-50.

I think there was a more pressing reason, like emails were sent with uninterpolated variables, but I do not remember.

I'm feel crotchety w.r.t. prioritizing this work, and I'm not totally buying the arguments. I can see this turn into 3+ days of work.

Our process is such that steps 2 and 4 are should not happen in that order, train-51 strings are cut after train-50 is deployed unless @jrgm says doing otherwise is OK. This reduces flexibility, but it has generally worked well. The process falls down with late deploys, for example, if a train deploys a week late and cutting strings after the deployment would only give the l10n crew 1 weekend to translate.

The worst that should happen is English strings are used, with the appropriate variables interpolated. If English strings are/were sent with uninterpolated variables, I view that as the most pressing issue that should be fixed before future proofing. My reasoning is based on a shaky premise, input about the original problem would be helpful.

@rfk
Copy link
Member Author

@rfk rfk commented Nov 6, 2015

I'm feel crotchety w.r.t. prioritizing this work [...] I can see this turn into 3+ days of work.

Fair, I don't think we realistically have three full days to spare on this work this train.

So AFAIK what you state above is true, the worst that should happen is some English strings are used, but used properly. We can avoid this by ensuring that train-X uses the latest sha from before the train-X+1 strings were cut. And this bug is about making that process less fragile, reducing the need for coordination and the risk of short translation windows, and generally making us all less scared of this procedure. (Although I think we're all a bit less scared after having talked it through).

@vladikoff vladikoff removed their assignment Nov 11, 2015
@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Nov 11, 2015

After talking to @shane-tomlinson we have a plan to move unused (commented out ) strings into some sort of an archived file. If those strings were translated in the past we should be able to restore them from that archived file.

@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented Nov 11, 2015

My thoughts are still hazy, but I think this can work like so:

Keep two sets of translations:

  1. "current translations"
  2. "all translations that have ever happened"

This process can all happen on string extraction before sending the .po files to the l10n community.

  1. Add any translations in the "current translations" that are not in "all translations" to "all translations." If there are any translation mismatches in "all translations", overwrite the version in "all translations" with the version from "current translations."
  2. Extract the new strings. Check each untranslated string. If there is a match for the untranslated string in "all translations", automatically import the translation from "all translations" back to "current translations."

"all translations" could either be a JSON file, or it could be a .PO file that is used as a "compendium" whenever merging strings w/ msgmerge.

@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented Nov 13, 2015

Now that I've written my thoughts on how to make forward compatibility possible, I'm moving this back to the backlog.

@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Nov 20, 2015

Talking to @zaach the "2 sets of translation" approach will probably not work.

We talked about few ideas, one of them is keeping the strings that we delete in "strings.js" file. That way they did not get extract.

@shane-tomlinson
Copy link
Member

@shane-tomlinson shane-tomlinson commented Dec 7, 2015

Talking to @zaach the "2 sets of translation" approach will probably not work.

For my own (and any other passers-by who care) edification, what problems did you identify?

@vladikoff
Copy link
Contributor

@vladikoff vladikoff commented Nov 8, 2016

Closing this for now

there's a risk that strings required by the previous train will get removed or overwritten.

We are on a good schedule for extraction now and not that many strings get removed...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants