Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull from English stream overwrites some translated files like if there weren't conflict #348

Closed
joaquinelio opened this issue Aug 12, 2020 · 20 comments

Comments

@joaquinelio
Copy link
Member

Pulling from English upstream
tries to OVERWRITE SOME traslated files.
No idea why, NO conflicts shown.

I have 3 remotes, origin, upstrem (es) enstream (en)
merge upstream, ok

then

merge enstream, lots of conflicts, ok...
but some files are replaced by english version with no conflict at all

example
D:\js\es.javascript.info\1-js\99-js-misc\04-reference-type\3-why-this\task.md
D:\js\es.javascript.info\1-js\99-js-misc\04-reference-type\3-why-this\solution.md

task.md , ok - (cant remember, either goes staged or shows conflict, but it IS OK).
solution.md - goes straight to the merge like no 3way was there, Spanish is lost.

Can't say way

@joaquinelio
Copy link
Member Author

joaquinelio commented Aug 14, 2020

Pulling from English upstream tries to OVERWRITE SOME traslated files. NO conflicts shown.
but some files are replaced by english version with no conflict at all

@joaquinelio I think I saw it too once, but don't remember what's causing it. I'd like to check out an example. @iliakan

I am working on MY fork, so I have 2 upstreams

enstream https://github.com/javascript-tutorial/en.javascript.info.git (fetch)
enstream https://github.com/javascript-tutorial/en.javascript.info.git (push)
origin https://github.com/joaquinelio/es.javascript.info.git (fetch)
origin https://github.com/joaquinelio/es.javascript.info.git (push)
upstream https://github.com/javascript-tutorial/es.javascript.info.git (fetch)
upstream https://github.com/javascript-tutorial/es.javascript.info.git (push)

fetch, then

merge upstream/master (spanish)
1 github hist
2 gitlens enstream

merge enstream/master (english) (on my master, dont mind)

I am willing to deal with 300+ files... but not this: (task: ok, solution: overwritten)

3 task is ok -conflicts
4 solution not ok  direct merge
5 merged changes -  added by them

so, merge reset

I coudnt help them about cherrypicking from EN, but the job is done so I put myself out of the way.
The problem still is, I think, it is not a diverge branch but a new file... with no history.
Git should see the conflict anyway. It doesnt.

I didnt try "ours", I dont think it would work as git considers there is no conflict to work with

I think the difference between task and sol is: task has "new" commits (from where? EN-EN?) and shows conflict in the line commited alone
yet I coudnt figure why solution does no checks at all.

I couldn teach myself more, I couldn't help.
But I dont think gitlens + manually picking is the right way. It is too hard.

@iliakan
Copy link
Member

iliakan commented Aug 14, 2020

After looking into the history, I guess git lost track of the content due to renames.

That's why it considers the English version to be a full rewrite.

P.S. The command to show log with changes and follow renames is:

git log -p --follow 1-js/99-js-misc/04-reference-type/3-why-this/solution.md

@joaquinelio
Copy link
Member Author

We agreed the "guess why"

The problem still is, I think, it is not a diverge branch but a new file... with no history.

Still a bug I think, the full rewrite should only be done if they had the same, unchanged seed.

So. How to fix.
It'll be harder, not only conflicts but carefully search all the losses and get the spanish files and re-rewrite them...
How.
If I cut&paste the files, I'll lose (and steal) my mates credits.

Besides, One time job? Is it a real fix? I dont know how it works.

I was trying because future syncs would be far easier to do.

@iliakan
Copy link
Member

iliakan commented Aug 14, 2020

Surely that's important. And, as we can see, git doesn't help here. We need an alternative mechanism of merging/updating from upstream.

@joaquinelio
Copy link
Member Author

I'm too new in git...
But I still see it as a git bug; two different files ,same location:

if: one is unchanged, the other is newer (commits ahead), newest wins
if: the two have changed AND there are not conflicting lines... smart merge (? not sure)
if: the two have changed AND there are conflicted lines, ask the HUMAN to do the dirty job
if: the two are completely unrelated, no stamp in common, it should ask the human too.

What gives me hope is the inconsistency of the behavior. sol.md is overwritten, task.md is not. So, bug.
You could put a little change in sol.md to test this... not a fix, just a test. No idea about the meaning of it, so forget it.

I can ask Linus... 🤣🤣🤣

@iliakan
Copy link
Member

iliakan commented Aug 15, 2020

@joaquinelio Here's the story.

TL;DR: the file was deleted in one commit, and then added independently in the next one, so git merge doesn't consider this a rename and can't merge right.

TL;DR 2: Please make upstream merges in 1 commit, to avoid such a split.

TL;DR 3: The recipe is at the end.

=======

When Git merges a file, the 3-way merge has 3 sources.

  1. the file on "our" branch.
  2. the file on "their" branch.
  3. the "merge base" file - taken at the commit where branches diverged.

Let's get the merge base:

/js/es.javascript.info master 
 git merge-base master enstream/master
74e6955587c79199fea1d11af83805efc3e1b657

That's the commit where the Spanish master (I'm on the "es" repo) and enstream/master diverged.

Now let's see what was the file at that time:

/js/es.javascript.info master ⇣
❯ git show 74e6955587c79199fea1d11af83805efc3e1b657:1-js/99-js-misc/04-reference-type/3-why-this/solution.md
fatal: path '1-js/99-js-misc/04-reference-type/3-why-this/solution.md' exists on disk, but not in '74e6955587c79199fea1d11af83805efc3e1b657'

Nothing! There was no such file.

But Git doesn't give up yet, it tries to find if the file was renamed.

If a new file is more than 50% (the number can be configured) similar with an another one that was deleted, then it's considered a rename. So git merge can find the original file and do the 3-way merge.

Although here, if we trace the file history, we can find two commits:

  1. 7730834510925be094888de62c3db1b90473321 - the file 1-js/99-js-misc/04-reference-type/3-why-this/solution.md is created.
  2. 1681ca90c9ba2baa68f21edb3974cf1690ea05fd- the file 1-js/04-object-basics/04-object-methods/3-why-this/solution.md is deleted.

That's the command that I used to track it:

git log -p --follow 1-js/99-js-misc/04-reference-type/3-why-this/solution.md

My guess is that git merge fails to detect the original file, because the rename is split between 2 commits as "delete" → "create".

Here's what it shows in the git merge enstream/master log:

CONFLICT (rename/delete): 1-js/04-object-basics/04-object-methods/3-why-this/solution.md deleted in HEAD 
  and renamed to 1-js/99-js-misc/04-reference-type/3-why-this/solution.md in enstream/master. 
  Version enstream/master of 1-js/99-js-misc/04-reference-type/3-why-this/solution.md left in tree.

That means, it considers the file to be:

  • deleted in one branch (Spanish master, file was created and deleted in different commits),
  • renamed in English branch (it can detect rename correctly there, as it was just a rename)
  • so it takes the English version (renamed one) into the result.

Recipe

Two possible ways to fix the situation.

  1. Watch the log for rename/delete conflicts and resolve them manually (take a look at the file versions in both branches, using github or git show).
  2. Make git merge less smart about renames:
git merge -X no-renames enstream/master

The latter will take both Spanish and English versions into the merged file. Maybe better for rename/delete conflicts, but maybe worse for cases when renames should be handled automatically.

I'd suggest to take the 1st way.

Thanks for the reading, I hope it helps =)

@joaquinelio
Copy link
Member Author

joaquinelio commented Aug 15, 2020

That's why you showed me " --follow " sorry I was too dumb to follow. Still hard to get.

Could you take a look to the next lines and watch for oddities ? I'll try to stay focused.

So:
I quit considering it as a bug. You want git smart, it solved that way. I follow you, I'll leave it smart.

Git lost track because of:
delete/add in different commits, instead a single rename in an unique commit.
It's odd, but that's what happened. I logged 90 of them.

Then 240 "regular" conflicted files to fix. In a single merge.

After fixing, merges will be safe provided future renames are in the same commit.

Then FIX...
Do you mean in the pending merge?
Overwrite the overwritten, this should be seen as a new mod over the ENG file. Is that so? This will loose spanish commits history. And credits.

I want to leave the repo with a short, clean and safe process for EASY updating. Already exists: "pull".
The longer we avoid it the more the manual work will be. And errors.

I'll skip the weekend to take a breath. Any suggestion...

Edit:
It is not that odd. It wasnt just renames, it was path renames. That was wrong, but it makes sense. Different articles, different fixes, different times.
So, if we rely renaming to "git pull", it wont happen again.

@iliakan
Copy link
Member

iliakan commented Aug 16, 2020

The history is already screwed up by these multi-commit renames. Btw, for git "only name rename" and "path rename" is the same thing - just a rename.

Internally, a rename is deletion/creation of a same/very similar file, I guess, if done in 1 commit, git tracks it better.

So you can modify the files properly and commit.

P.S. Another option is not to use git merge at all, but just get a diff, like

git diff master...enstream/master '*.md'
// shows what's new in enstream/master since the branches diverged
// three dots "..." in the command, not two
// shows only differences in md files, *.png/svg files can be just copied over

This shows all the changes, you can integrate them without any merge and be free of git artifacts.

Looks like there was no sync for 1.5 years, that's why there are many changes. Many of them are renames/minor fixes though.

@joaquinelio
Copy link
Member Author

joaquinelio commented Aug 16, 2020

Btw, for git "only name rename" and "path rename" is the same thing - just a rename.

Got that. I just wanted to understand what happened. Knowing the "why" helps to avoid bad things in the future.
Still, "Many of them are renames/minor fixes though." different commits, cant say why.

Another option is not to use git merge at all, but just get a diff, like

You mean to stay this way? Correct me:
I'll loose the magic of pull:
The diff will show all the changes from the base commit (and forever)
A healthy pull will show only diffs affecting only EN commits since the last healthy pull. --(ugh, I wish I wrote it right)

Magic. I don't get how commit stamps work.
When we do a pull, we get a conflict in the ENG altered lines only, leaving the rest of Spanish text unaltered.

I wish I knew before. We were, we are, a hard working team doing our best with what we had.

Fixing.
I got a log, I can extract a list.
Can I make a cherrypick merge of the state of specific files rather than commits?
does it worth it, the fix? I may be wrong about the magic.

@iliakan
Copy link
Member

iliakan commented Aug 16, 2020

  1. You don't use "git merge" command at all.
  2. diff master...enstream/master shows all commits on enstream from where the branches diverged. That is: the new stuff only.

I'm not sure what kind of "pull magic" you're talking about, that you're going to loose, so I can't answer well enough, sorry.

  • The git pull is git fetch plus git merge.
  • The diff command shows from the last successful merge, that's what you need.

So what I'm suggesting is that you manually examine the diff, update the Spanish version. Without any merges.

@joaquinelio
Copy link
Member Author

Sorry if I bother you again,
I just want to understand if what you are suggesting is a real fix.

  1. You don't use "git merge" command at all.

ever?

  1. diff master...enstream/master shows all commits on enstream from where the branches diverged. That is: the new stuff only.

new stuff only... from 1,5 years ago

and then the same next diff. 1,6 years...

I'm not sure what kind of "pull magic" you're talking about, that you're going to loose, so I can't answer well enough, sorry.

that's important.
If we keep updating "Without any merges", diff will be harder and harder with time.
I would leave the project with a dirty updating process
pull magic is,
if I fix merges now,
next merge will show only the very last updates from ENG

  • The git pull is git fetch plus git merge.

I know that, sorry I added the new word
I mean fetch+merge magic

  • The diff command shows from the last successful merge, that's what you need.
    So what I'm suggesting is that you manually examine the diff, update the Spanish version. Without any merges.

update with the help of the diff,
like it has been done until now. We can do that.
and If anyone tries a pull in the future, it will ruin the repo because there is no warning.
(git magic can revert it, ok, no permanent damage)

so The real question here is
you suggest we keep doing the same?

I dont wanna bother you, I just wanted to help.
If I cant understand then I'd better stay doing reviews and leave the sync to others.

...The very hard way.

@joaquinelio
Copy link
Member Author

This magic:
Two completely different files,
Still the only conflict is the one line recently modified,
ready for a quick fix

image

@iliakan
Copy link
Member

iliakan commented Aug 16, 2020

Try -X no-renames ?

@iliakan
Copy link
Member

iliakan commented Aug 16, 2020

Indeed so, rename detection works if the rename can be traced by git.

And the line is detected correctly if line spacings are the same.

@joaquinelio
Copy link
Member Author

oh, the one you suggested not to. : )
sorry, I didnt study it.
yet

@joaquinelio
Copy link
Member Author

Yes, we keep line numbers and spacing in translations
And the magic only works if I fix merge I think
so close...

I need to sleep
I'll look at it.

@joaquinelio
Copy link
Member Author

-norename, 406 file changed up today.
395 files to fix
No one would want to review that massive commit anyway, Need to split somehow.
Cherrypicking seems the worst solution.

Can I edit a merge? drop changes (like I could do if they were not from merge but mine)? So the resolve/commit are done on a reduce set of files.

To be clear on magic.
It's a commit history magic. In a perfect world, with a single change:
while a diff would report a 100% diff, EN-ES
a merge would point to a single file, single line as a conflict to resolve.

@iliakan
Copy link
Member

iliakan commented Aug 20, 2020

I suppose you could merge partially, e.g. up to the state like 50 commits ago, and then continue merging... Never tried that though.

@joaquinelio
Copy link
Member Author

That... seems possible. I would never do that with real code. This is just text, no version conflicts expected.
50 commits? Too short I guess.
I'll become n historian then.
Later maybe

@joaquinelio
Copy link
Member Author

It worked
400+ files reviewed, hope few to none errors.
Then merged Spanish stream,
Then merged English stream again, normal merge with NO issues found,
7 files with conflicts and 10 automerged, (needed to review too, may have new english inserted on spanish)
Then PR.
The odd, PR claims only 75 files changed... The number I lazily marked as "ours" to solve conflicts.
I expected fewer than 400 but I would swear it must be two hundred at least plus those 75. It must be wrong.

EDIT: 75 marked "ours" wont be modified, 75 solved manually (instead of 200) seems possible, the number 75 just a coincidence.

Anyway, no matter if PR survives or not, if it doesnt get a review, if it has a review then rejected because it is too bugged...

... the issue is solved. (with --no-rename just once)
Thanks for the schooling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants