New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge: reload index before git_merge #4407

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@Etiene

Etiene commented Nov 11, 2017

When the index in memory diverges from the index in disk git_merge
aborts with GIT_ECONFLICT. More details about this are found in the
issue #4203, which this PR attempts to fix.

A further discussion is needed wether this behaviour is actually desired.
(Suppose someone actually is writing something to the index in memory
and a concurrent task is merging. A possible argument is that the index
should be written to disk then.)

Closes #4203

@pks-t

Some stylistic nits. I won't comment about that regarding whats the right thing to do here before having a look at it with fresh eyes tomorrow.

Show outdated Hide outdated src/merge.c Outdated
Show outdated Hide outdated src/merge.c Outdated
Show outdated Hide outdated tests/merge/workdir/simple.c Outdated
@ethomson

This comment has been minimized.

Show comment
Hide comment
@ethomson

ethomson Nov 11, 2017

Member

One note: @pks-t and I have been discussing correct behavior here and whether we should always reload the index from disk. (Like checkout does.)

It's worth noting that merge's behavior is to create a new index (that represents the merge results) and check that out, setting it as the new index.

During discussion, we were concerned that somebody may want to git_index_add an entry and expect it to be preserved even though they had not done a git_index_write. This change might remove that functionality.

After further thoughts, it does not. AFAICT, there is no case where somebody could git_index_add something, and not write the changes to the index, and have those preserved. That's because the on-disk index is overwritten with the merge results index. So unsaved index changes are lost (today) and this does nothing to change that behavior one way or another.

Member

ethomson commented Nov 11, 2017

One note: @pks-t and I have been discussing correct behavior here and whether we should always reload the index from disk. (Like checkout does.)

It's worth noting that merge's behavior is to create a new index (that represents the merge results) and check that out, setting it as the new index.

During discussion, we were concerned that somebody may want to git_index_add an entry and expect it to be preserved even though they had not done a git_index_write. This change might remove that functionality.

After further thoughts, it does not. AFAICT, there is no case where somebody could git_index_add something, and not write the changes to the index, and have those preserved. That's because the on-disk index is overwritten with the merge results index. So unsaved index changes are lost (today) and this does nothing to change that behavior one way or another.

Show outdated Hide outdated src/merge.c Outdated
@pks-t

This comment has been minimized.

Show comment
Hide comment
@pks-t

pks-t Nov 11, 2017

Member

@ethomson Unfortunately, I don't think that's actually true. First, there's the GIT_CHECKOUT_DONT_WRITE_INDEX flag, which gets evaluated by the index writer. Second, it's not really about the writing but about the reading of the index from disk. If we have a modified index, the indexwriter will simply use that, and necessarily so. If there are changes which are not written to disk, then currently the index writer will not re-read that from the index but will simply do a git_repository_index__weakptr and get the currently modified one.

So re-reading the index from disk does change behaviour in some cases. Whether those cases are cases are sane or not is an entirely different topic, though.

Member

pks-t commented Nov 11, 2017

@ethomson Unfortunately, I don't think that's actually true. First, there's the GIT_CHECKOUT_DONT_WRITE_INDEX flag, which gets evaluated by the index writer. Second, it's not really about the writing but about the reading of the index from disk. If we have a modified index, the indexwriter will simply use that, and necessarily so. If there are changes which are not written to disk, then currently the index writer will not re-read that from the index but will simply do a git_repository_index__weakptr and get the currently modified one.

So re-reading the index from disk does change behaviour in some cases. Whether those cases are cases are sane or not is an entirely different topic, though.

Etiene and others added some commits Nov 11, 2017

tests: add test case for index reloads on merge
Adds a test case for the issue #4203, when diverging indexes
on memory and disk cause git merge to abort with GIT_ECONFLICT
merge: reload index before git_merge
If the index in memory is different from the index on the disk,
previously merge would abort with GIT_ECONFLICT.
Reload the index before merging to fix this.

Fixes #4203
merge: add error handling for index reload
Cleans up should git_repository_index or git_index_read fail
@Etiene

This comment has been minimized.

Show comment
Hide comment
@Etiene

Etiene Nov 11, 2017

@pks-t please re-review the code style? :)

@ethomson is this where you wanted it moved? :)

Concerning the desired behaviour, let me know when the council has made a decision ;)

Etiene commented Nov 11, 2017

@pks-t please re-review the code style? :)

@ethomson is this where you wanted it moved? :)

Concerning the desired behaviour, let me know when the council has made a decision ;)

@ethomson

This comment has been minimized.

Show comment
Hide comment
@ethomson

ethomson Nov 11, 2017

Member

So re-reading the index from disk does change behaviour in some cases. Whether those cases are cases are sane or not is an entirely different topic, though.

Correct. I was totally mistaken about that.

I'm going to introduce a test during index refresh paths that ensures that we do not have unwritten index entries (ie someone has called git_index_add but not git_index_write) to ensure that we do not lose their changes and can refresh safely.

Member

ethomson commented Nov 11, 2017

So re-reading the index from disk does change behaviour in some cases. Whether those cases are cases are sane or not is an entirely different topic, though.

Correct. I was totally mistaken about that.

I'm going to introduce a test during index refresh paths that ensures that we do not have unwritten index entries (ie someone has called git_index_add but not git_index_write) to ensure that we do not lose their changes and can refresh safely.

@Etiene

This comment has been minimized.

Show comment
Hide comment
@Etiene

Etiene Nov 11, 2017

Summary of future improvements, then, based on IRL discussion:

  • erring on dirty index
  • still reloading on stale

Etiene commented Nov 11, 2017

Summary of future improvements, then, based on IRL discussion:

  • erring on dirty index
  • still reloading on stale

@pks-t pks-t added the bloomberg label Nov 12, 2017

@ethomson

This comment has been minimized.

Show comment
Hide comment
@ethomson

ethomson Nov 17, 2017

Member

I've been working on adding a guard to the index so that you can't run a command that reloads the index when the index is "dirty" - which is to say that you've made changes via git_index_add and friends but you haven't saved those changes (with git_index_write) or aborted them (with git_index_read).

Regrettably, we have a bunch of tests that believe that they can work in the old way and not flush the index to disk. Sensibly, because why should they? That would just slow our test runs down.

I've mostly got everything fixed here, but there are still a few tests failing. I'm chipping away at these, and once that's done, I think we'll be good to merge this.

Member

ethomson commented Nov 17, 2017

I've been working on adding a guard to the index so that you can't run a command that reloads the index when the index is "dirty" - which is to say that you've made changes via git_index_add and friends but you haven't saved those changes (with git_index_write) or aborted them (with git_index_read).

Regrettably, we have a bunch of tests that believe that they can work in the old way and not flush the index to disk. Sensibly, because why should they? That would just slow our test runs down.

I've mostly got everything fixed here, but there are still a few tests failing. I'm chipping away at these, and once that's done, I think we'll be good to merge this.

@Etiene

This comment has been minimized.

Show comment
Hide comment
@Etiene

Etiene Nov 17, 2017

@ethomson wonderful! Do you need help in that?

Etiene commented Nov 17, 2017

@ethomson wonderful! Do you need help in that?

@ethomson

This comment has been minimized.

Show comment
Hide comment
@ethomson

ethomson Jun 30, 2018

Member

Now that #4536 has landed, I think we can move forward with this. I have one code review point. And we should add some data to the changelog, something like:

Merge now reloads the index before proceeding, to ensure that it has the latest changes that were made by other clients. In the unlikely event that you have made changes to the index and not written them before calling git_merge, those changes may now be lost. You should ensure that your changes are saved by calling git_index_write. If you would like an error in the case where your index entries would be overwritten, you can set the GIT_OPT_ENABLE_UNSAVED_INDEX_SAFETY option.

Member

ethomson commented Jun 30, 2018

Now that #4536 has landed, I think we can move forward with this. I have one code review point. And we should add some data to the changelog, something like:

Merge now reloads the index before proceeding, to ensure that it has the latest changes that were made by other clients. In the unlikely event that you have made changes to the index and not written them before calling git_merge, those changes may now be lost. You should ensure that your changes are saved by calling git_index_write. If you would like an error in the case where your index entries would be overwritten, you can set the GIT_OPT_ENABLE_UNSAVED_INDEX_SAFETY option.

@@ -3255,6 +3255,10 @@ int git_merge(
&checkout_strategy)) < 0)
goto done;
if ((error = git_repository_index(&index, repo) < 0) ||
(error = git_index_read(index, 0) < 0))

This comment has been minimized.

@ethomson

ethomson Jun 30, 2018

Member

We're leaking this index now. I would encourage us to use a separate variable here, eg:

if ((error = git_repository_index(&repo_index, repo) < 0) ||
    (error = git_index_read(repo_index, 0) < 0))

And be sure to git_index_free this repo_index.

@ethomson

ethomson Jun 30, 2018

Member

We're leaking this index now. I would encourage us to use a separate variable here, eg:

if ((error = git_repository_index(&repo_index, repo) < 0) ||
    (error = git_index_read(repo_index, 0) < 0))

And be sure to git_index_free this repo_index.

@ethomson

This comment has been minimized.

Show comment
Hide comment
@ethomson

ethomson Oct 19, 2018

Member

@Etiene do you have interest in finishing this up? Or would you like me to fix those two last little bits?

Member

ethomson commented Oct 19, 2018

@Etiene do you have interest in finishing this up? Or would you like me to fix those two last little bits?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment