Improve forward-compatibility of string extraction w.r.t previous trains #80
Comments
|
(/cc @shane-tomlinson based on discussions last week; please feel free to close this out if you've got more context captured elsewhere) |
|
@shane-tomlinson and @vladikoff, also maybe @zaach to talk about this ! |
|
@shane-tomlinson says we may be able to use "compendium" feature from here: https://www.mankier.com/1/msgmerge#-C |
|
@vladikoff and @shane-tomlinson will figure this out :D |
|
@jrgm, can you refresh my memory, why do strings need to be forward compatible? |
|
@zaach or @rfk , Shane and I talked about this and we got confused. Could you explain this:
Is it enough to just tag the string versions (choose SHA)? Instead of using HEAD? We are not sure what the problem is exactly. |
|
Two reasons IIRC. First, it just makes more work for @jrgm, since he has to carefully select "the latest sha before the new strings were cut" rather than just picking up the latest sha. The more our process depends on human beings doing the right thing, the more opportunity for things to go wrong. (no offense @jrgm ;-) Second, it reduces the window in which translators can have their strings picked up for deploy. Imagine the following sequence of events:
Without forward-compatibility, we may not be able to ship the new german translations with train-50. |
And this is where git tags would be useful.
I think there was a more pressing reason, like emails were sent with uninterpolated variables, but I do not remember. I'm feeling crotchety w.r.t. prioritizing this work, and I'm not totally buying the arguments. I can see this turning into 3+ days of work. Our process is such that steps 2 and 4 should not happen in that order: train-51 strings are cut after train-50 is deployed unless @jrgm says doing otherwise is OK. This reduces flexibility, but it has generally worked well. The process falls down with late deploys, for example, if a train deploys a week late and cutting strings after the deployment would only give the l10n crew 1 weekend to translate. The worst that should happen is English strings are used, with the appropriate variables interpolated. If English strings are/were sent with uninterpolated variables, I view that as the most pressing issue that should be fixed before future proofing. My reasoning is based on a shaky premise; input about the original problem would be helpful. |
Fair, I don't think we realistically have three full days to spare on this work this train. So AFAIK what you state above is true, the worst that should happen is some English strings are used, but used properly. We can avoid this by ensuring that train-X uses the latest sha from before the train-X+1 strings were cut. And this bug is about making that process less fragile, reducing the need for coordination and the risk of short translation windows, and generally making us all less scared of this procedure. (Although I think we're all a bit less scared after having talked it through). |
|
After talking to @shane-tomlinson we have a plan to move unused (commented-out) strings into some sort of archive file. If those strings were translated in the past, we should be able to restore them from that archive file. |
|
My thoughts are still hazy, but I think this can work like so: Keep two sets of translations:
This process can all happen on string extraction before sending the .po files to the l10n community.
"all translations" could either be a JSON file, or it could be a .PO file that is used as a "compendium" whenever merging strings w/ msgmerge. |
|
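A rough sketch of the "all translations" idea as a JSON-style archive, in Python for brevity. Everything here is an assumption for illustration: the function names `update_archive` and `fill_from_archive`, the flat msgid-to-msgstr mapping, and the sample German strings are all made up, and a real implementation would work on .po files (for example through a msgmerge compendium) rather than plain dicts.

```python
import json


def update_archive(archive, translated):
    """Fold every non-empty translation into the archive.

    Entries are only added or overwritten, never deleted, so a string
    that disappears from the current train keeps its translation.
    """
    merged = dict(archive)
    for msgid, msgstr in translated.items():
        if msgstr:
            merged[msgid] = msgstr
    return merged


def fill_from_archive(extracted_ids, archive):
    """Build the active catalog for a train from freshly extracted msgids.

    Known strings reuse the archived translation; unknown strings get an
    empty msgstr, which means "still needs translating".
    """
    return {msgid: archive.get(msgid, "") for msgid in extracted_ids}


# Hypothetical data: translations known after train-50.
archive = {"Sign in": "Anmelden", "Sign up": "Registrieren"}

# train-51 drops "Sign up" and adds "Create account".
active_51 = fill_from_archive(["Sign in", "Create account"], archive)

# train-50 can still be rebuilt, because "Sign up" was never removed.
active_50 = fill_from_archive(["Sign in", "Sign up"], archive)

# Once translators deliver the new string, fold it back into the archive.
archive = update_archive(archive, {"Create account": "Konto erstellen"})

print(json.dumps(archive, ensure_ascii=False, indent=2))
```

The point of the split is that the archive only grows, while the active set sent to the l10n community contains just what the current extraction found.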
Now that I've written my thoughts on how to make forward compatibility possible, I'm moving this back to the backlog. |
|
Talking to @zaach, the "2 sets of translations" approach will probably not work. We talked about a few ideas; one of them is keeping the strings that we delete in a "strings.js" file. That way they do not get extracted.
For my own (and any other passers-by who care) edification, what problems did you identify? |
|
Closing this for now
We are on a good schedule for extraction now and not that many strings get removed... |

IIUC, we're currently in a situation where we can't extract strings for the upcoming train until the previous train has shipped to production. If we do so, there's a risk that strings required by the previous train will get removed or overwritten.
Let's change the process to ensure forward-compatibility with previous trains. The simplest option is probably just to keep all old strings around forever, at least to start with. We can see how much of a problem that might cause in practice and implement e.g. some sort of pruning scheme at a later time.
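To make the "keep everything, prune later" option concrete, here is a minimal Python sketch. It assumes a hypothetical catalog that records, for each msgid, the last train in which extraction saw it; `record_extraction`, `prune`, and the `keep_trains` parameter are invented names, not part of any existing tooling.

```python
def record_extraction(catalog, extracted_ids, train):
    """Mark each extracted msgid as last seen in `train`.

    Strings missing from this extraction are left untouched, so older
    trains can still find them.
    """
    updated = dict(catalog)
    for msgid in extracted_ids:
        updated[msgid] = train
    return updated


def prune(catalog, current_train, keep_trains):
    """Optional later step: drop strings unseen for more than `keep_trains` trains."""
    return {
        msgid: last_seen
        for msgid, last_seen in catalog.items()
        if current_train - last_seen <= keep_trains
    }


# Hypothetical example: train-51 removes "Sign up" and adds "Create account".
catalog = record_extraction({}, ["Sign in", "Sign up"], train=50)
catalog = record_extraction(catalog, ["Sign in", "Create account"], train=51)

# "Sign up" is still present (last seen in train 50), so train-50 stays safe.
still_safe = prune(catalog, current_train=51, keep_trains=2)

# Much later, a pruning pass could retire it.
pruned = prune(catalog, current_train=53, keep_trains=2)
```

With no pruning pass at all, this reduces to "keep all old strings around forever"; the last-seen train is only there so a pruning rule can be added without changing the data format.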