Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a "dirty" state to the index when it has unsaved changes #4536

Merged
merged 13 commits into from
Jun 30, 2018

Conversation

ethomson
Copy link
Member

Add a new "dirty" state to the index when it has unsaved changes.

At present, you can load the index and make changes to it without saving those changes. However various functions in the library behave different when the repository's index has unsaved changes. For example, checkout will reload the index (by default), but merge will not.

This asymmetry is obviously a problem.

It was noted during an attempt to standardize this behavior that some people may be relying on the prior behavior that merge does not reload the index.

The discussion from this was that we should add a warning in the unlikely event that someone does rely on this behavior. This PR adds a "dirty" state to the PR - now if a consumer tries to act on the index in a way that changes would be discarded, they are warned. As a result, consumers must explicitly either commit their changes (using git_index_write) or discard their changes (through any of a variety of mechanisms, including git_index_read) before executing a function that will otherwise mutate the index (eg git_checkout, git_merge, etc).

The first few commits in this PR update tests that are incorrect regardless of the changes to add these restrictions and include a "dirty" state to the index. The remainder add this change and update several tests to explicitly commit the index before further mutating it.

This should (finally) pave the way for #4407.

@ethomson
Copy link
Member Author

Rebased.

@ethomson
Copy link
Member Author

ethomson commented May 9, 2018

Any thoughts on this before I 🚢 or is it self-evident?

Copy link
Contributor

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few high-level comments, as I don't feel competent enough with The Index to really know what is going on (apart from the must-not-lose-staged-changes part of it 😉).

src/index.c Outdated
@@ -668,6 +674,9 @@ int git_index_read(git_index *index, int force)
if (!error)
git_futils_filestamp_set(&index->stamp, &stamp);

if (!error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe merge this with the if (!error) above ?

@@ -527,6 +529,7 @@ int git_index_clear(git_index *index)

assert(index);

index->dirty = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure, but this seems the inverse of the commit's message rationale ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. Clearing the index here without saving naturally has to leave the index dirty. I agree though that the commit message is confusing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; I think that originally I thought that clearing the index would make it not dirty and updated that later, but neglected the commit message. I've fixed it.

pks-t
pks-t previously requested changes May 18, 2018
Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my late feedback 🙄

@@ -25,6 +25,42 @@ void test_index_names__cleanup(void)
cl_git_sandbox_cleanup();
}

static void index_add_main(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message you talk of "This test", but this commit doesn't do anything but add a new function which is never being called. The actual caller is only being added in the last commit of this series, which is a confusing when reading commits one by one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, one more thing: I don't really get what "main" is referring to in this context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; this got jumbled while rebasing. I fixed it, and I also changed the name to add_conflicts. Regarding "main", I was referring to the "main index" (as opposed to the other sections like the tree cache) but I updated it to avoid confusion.

const char **conflict;
size_t i;

for (i = 0; i < 3; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i < ARRAY_SIZE(paths)?

@@ -2397,20 +2397,20 @@ static int checkout_data_init(
GIT_DIR_MODE, GIT_MKDIR_VERIFY_DIR)) < 0)
goto cleanup;

if ((error = git_repository_index(&data->index, data->repo)) < 0)
goto cleanup;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, previously we were not loading the index into data->index when GIT_CHECKOUT_NO_REFRESH was set. This leaves me wondering about line 2441, where we unconditionally dereference data->index. Was this code broken previously and would segfault in case GIT_CHECKOUT_NO_REFRESH was set or is the index already loaded at this point in time, resulting in a memory leak due to an additional reference to the index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would have dereferenced that NULL index. the NO_REFRESH bit is really not a great pattern to use, so I'm not surprised that nobody's ever noticed this. However, I added a test to make sure that we don't regress again.

src/stash.c Outdated
@@ -159,7 +164,37 @@ struct stash_update_rules {
bool include_ignored;
};

/* similar to git_index_add_bypath but able to operate on any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-line comments should start with an empty /* line and upper-case

src/stash.c Outdated
int error;

if (!git_repository_is_bare(repo) &&
(error = git_repository_index__weakptr(&repo_index, repo)) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base. The first one aligned with a tab

if (!git_repository_is_bare(repo) &&
	(error = git_repository_index__weakptr(&repo_index, repo)) < 0)
	return error;

and the same thing aligned with four spaces instead

if (!git_repository_is_bare(repo) &&
    (error = git_repository_index__weakptr(&repo_index, repo)) < 0)
	return error;

I'd say the second one is superior, as you can clearly distinguish conditionals from the body. Especially when there are multiple more than two lines in a conditional, this becomes rather valuable and makes code a lot easier to read, in my opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base

I... guess? I agree that we shouldn't have two different styles, and yes, some things have slipped in here because it really doesn't matter that much (more on that below). But it feels like this was a solved decision. We took git's style, which, from a cursory grep, is typically indenting these with tabs. And we typically indent with tabs.

It seems like anything that's slipped in has done just that - slipped in.

I'd say the second one is superior, as you can clearly distinguish conditionals from the body. Especially when there are multiple more than two lines in a conditional, this becomes rather valuable and makes code a lot easier to read, in my opinion.

I agree, but that's why I have my tabs set to four spaces.

But for the broader point: it's incredibly frustrating for contributors when they have to wait several weeks for a code review and then have that code review consist primarily of complaints about whitespace. (This one was open three months ago! Three months is a long time, and yes, I realize that the complaints are not only about whitespace.)

Does style matter? Sure. But they don't matter nearly enough to stop the velocity of the project, or to turn people off from contributing.

Can you imagine waiting weeks to get a code review then it's picking nits at whitespace changes? This is not the sign of a healthy community, and that person will probably never contribute again.

We need to fix this when it comes to reviewing pull requests. This is not a complaint at you, individually; I do mean we, the project.

As for the whitespace issues: if you really care about this, then add an automated step to the CI builds and give people something easy that they can run locally. Otherwise, we need to relax, IMHO.

Copy link
Member

@pks-t pks-t May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base
I... guess? I agree that we shouldn't have two different styles, and yes, some things have slipped in here because it really doesn't matter that much (more on that below). But it feels like this was a solved decision. We took git's style, which, from a cursory grep, is typically indenting these with tabs. And we typically indent with tabs.

It seems like anything that's slipped in has done just that - slipped in.

I might not have brought my intention along crlearly enough, but this was not a request for change but instead a clarification for what our agreed-upon style is. I had asked that specific question about how to indent multiline conditionals in the past and never got an answer (or at least I couldn't recall). So while I feel like the latter option is superior, I don't care which of both styles we use as long as it is consistent.

I agree, but that's why I have my tabs set to four spaces.

Weird. I always thought the projects recommendation was eight spaces, and that's why I always had a specific setting for libgit2 to display as eight spaces. But in fact, we do recommend to set it to four spaces.

But for the broader point: it's incredibly frustrating for contributors when they have to wait several weeks for a code review and then have that code review consist primarily of complaints about whitespace. (This one was open three months ago! Three months is a long time, and yes, I realize that the complaints are not only about whitespace.)

I know. And that's why I said in the past that I only add style-related critic in case there are other complaints about a PR.

Does style matter? Sure. But they don't matter nearly enough to stop the velocity of the project, or to turn people off from contributing.

Can you imagine waiting weeks to get a code review then it's picking nits at whitespace changes? This is not the sign of a healthy community, and that person will probably never contribute again.
We need to fix this when it comes to reviewing pull requests. This is not a complaint at you, individually; I do mean we, the project.

As for the whitespace issues: if you really care about this, then add an automated step to the CI builds and give people something easy that they can run locally. Otherwise, we need to relax, IMHO.

I'm definitely in favor of having an automated job to do this. While I agree that style shouldn't be the main concern, I think it in fact is important for a project. I've seen enough projects where I wanted to contribute to, but after taking a quick peek into the implementation immediately bailed out as the code was so inconsistently formatted that I had problems to see what was going on. So it in fact is a double-edged sword: you need to care enough about style to not let the code base deteriorate and get hard to read, but you shouldn't be as strict to drive away contributors.

As you mentioned, I think the best way forward would be to have an automated job that does this immediately after a PR was submitted. If properly implemented, this would help both roles: the contributor can quickly fix up stuff while he is still familiar with the topic, and the maintainer can focus on the important parts.

src/index.c Outdated
@@ -3185,6 +3212,7 @@ static int git_index_read_iterator(

clear_uptodate(index);

index->dirty = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? I cannot see us saving the index to disk here, so I'd consider it dirty

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I've changed it.

@@ -3605,6 +3616,7 @@ int git_indexwriter_commit(git_indexwriter *writer)
return -1;
}

writer->index->dirty = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it belongs to the previous commit, where you introduce the dirty bit?

assert_status_entrycount(g_repo, 1);
cl_git_pass(git_checkout_tree(g_repo, obj, &g_opts));

cl_git_pass(git_reset(g_repo, obj, GIT_RESET_HARD, NULL));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this change what we're actually testing? Previously, we've checked that git_checkout_tree resets the mode. Now we check whether git_reset resets the mode.

@@ -289,6 +290,10 @@ void test_checkout_typechange__checkout_with_conflicts(void)
cl_assert_equal_i(cts.updates, 0);
cl_assert_equal_i(cts.ignored, 0);

cl_git_pass(git_repository_index(&index, g_repo));
cl_git_pass(git_index_read(index, 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These interspersed changes of git_index_read leave me a bit uneasy. Does it mean that users of libgit2 may now face places where they have to add git_index_read or face errors otherwise? If so, we at least have to provide an update guide detailing in which exact situations this is now necessary, and even then people will complain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question. This affects what happens when you are replacing the contents of the index (eg, checkout, merge) and you have unsaved changes, then you will need to save them (git_index_write) or revert them (git_index_read). If you do not, this new code path will raise an error.

I don't think that this is a common scenario (except in our tests, since they play fast and loose with things). But there is a very valid case that we would be breaking: checking out with the FORCE flag or a reset with the HARD flag when you have unsaved changes in the index. At the moment, you can do:

git_index_add(...);
/* no save */
git_reset(..., GIT_RESET_HARD);

And your unsaved changes are ignored - the index is updated to match the checkout / reset target. With this patch, this will be an error.

Increasingly, I think this is incorrect and that we should allow git_checkout / git_reset to operate when forcing. I think that this would fix most of the intentional use cases of this sort of behavior for people who were in a bad place but reset their way out of it, knowingly.

I'd also be happy to add a configurable option here for people who need an escape hatch to opt-in to the prior behavior. It's also possible that this is too heavy handed and we should keep the default as-is.

I'll follow up a commit that updates the reset / checkout behavior to allow forcing in this instance.

@@ -276,6 +276,7 @@ void test_cherrypick_workdir__conflict_use_ours(void)
opts.merge_opts.file_favor = GIT_MERGE_FILE_FAVOR_OURS;

cl_git_pass(git_reset(repo, (git_object *)head, GIT_RESET_HARD, NULL));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated whitespace change

@ethomson
Copy link
Member Author

I added a change so that when checking out with FORCE, we do not halt on a dirty index. This seems like a very correct change that is very much in keeping with the way arrbee had originally designed and specified GIT_CHECKOUT_FORCE, and how I would have expected things to work. That also simplifies the tests so that they now don't need to do the git_index_read dance to discard their changes.

@ethomson
Copy link
Member Author

So this leaves one realistic use case that we're blocking here now, which is GIT_CHECKOUT_SAFE with a dirty index. Previously we would have silently allowed this (obliterating the unsaved changes in the index), now we error on this. I doubt anybody was doing this before, but I appreciate your concern.

Proposal: add a git_libgit2_opt that controls this behavior, default to the current behavior (silently discarding dirty indexes before safe checkout and friends), document that we'll upgrade this to a warning or error in the future?

@pks-t
Copy link
Member

pks-t commented Jun 22, 2018

Proposal: add a git_libgit2_opt that controls this behavior, default to the current behavior (silently discarding dirty indexes before safe checkout and friends), document that we'll upgrade this to a warning or error in the future?

That sounds like a perfectly reasonable plan. I'm sold

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new option I'm feeling a lot safer. I like how we start to be a lot more reasonable when it comes to interface and behaviour changes. Thanks!

@@ -55,6 +55,7 @@ typedef enum {
GIT_ITEROVER = -31, /**< Signals end of iteration with iterator */
GIT_RETRY = -32, /**< Internal only */
GIT_EMISMATCH = -33, /**< Hashsum mismatch in object */
GIT_EINDEXDIRTY = -34, /**< Unsaved changes in the index would be overwritten */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time I'm fast to review, so I'm allowed to nag at the whitespace error here :P

* > Ensure that there are no unsaved changes in the index before
* > beginning any operation that reloads the index from disk (eg,
* > checkout). If there are unsaved changes, the instruction will
* > fail.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe mention that this does not apply when forcing operations?

index but you have not written the index to disk.

At present, this defaults to off, but we intend to enable this more
broadly in the future, as a warning or error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we urge consumers to review how they handle dirty indices and then enable the option?

We add entries into the main index to correspond with the NAME entries
that we're going to test.  NAME entries store the results of conflicts
occuring with rename detection during merge, and they must correspond to
conflicts in the index.

This test was mistakenly adding regular entries.  The checkout
validation failed, since it requires NAME entries to correspond to
high-stage (conflict) entries.  Correct the test to actually create
conflicts.
The index::reuc tests must test that the checkout itself succeeds,
otherwise subsequent tests are not valid.

In fact, the checkouts were failing because when checking out `SAFE`,
they cannot update the files that are in conflict.  Change the checkout
level to `FORCE` to ensure that they get updated correctly.
When running `git_index_add_all`, we should write the index to disk so
that we can re-read it safely during status.
Always set the `index` in the `checkout_data`, even in the case that we
are not reloading the index.  Other functionality in checkout examines
the index (for example: determining whether the workdir is modified) and
we need it even in the (uncommon) case that we are not reloading.
@ethomson
Copy link
Member Author

Cool. I've updated this with your suggestions, @pks-t. Thanks!

Copy link
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't have much to add to this PR. I didn't see anything that catched my eye in particular. I've been briefly asking myself whether git_index_read_safely should be exposed to users or whether it should be the default of git_index_read with force set to false. But to be honest I'm not particularly well versed in this area to have a real grasp of that.

cl_git_pass(git_repository_open(&other_repo, git_repository_workdir(g_repo)));
cl_git_pass(git_repository_index(&other_index, other_repo));

cl_git_pass(git_oid_fromstr(&entry.id, "1385f264afb75a56a5bec74243be9b367ba4ca08"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case you're re-rolling: whitespace issue

@pks-t pks-t dismissed their stale review June 29, 2018 12:25

Comments were adressed

Add tests that ensure that we re-read the on-disk image by default
during checkout, but when the `GIT_CHECKOUT_NO_REFRESH` option is
specified, we do _not_ re-read the index.
Don't manipulate the repository's index during stash; instead,
manipulate a temporary index and check it out.

This allows us to use the checkout mechanism to update the workdir and
the repository's index, and allows checkout to use its common mechanisms
to write data and handle errors.
Teach the index when it is "dirty", and has unsaved changes.  Consider
the index dirty whenever a caller has added or removed an entry from the
main index, REUC or NAME section, including when the index is completely
cleared.  Similarly, consider the index _not_ dirty immediately after it
is written, or when it is read from the on-disk index.

This allows us to ensure that unsaved changes are not lost when we
automatically refresh the index.
Test that any changes to the index will mark the index as dirty.  Also
ensure that when we initialize a new index, read the index contents
from disk, or write the index contents to disk that we reset the dirty
flag to zero.  Further ensure that an unforced read with dirty contents
(when the on-disk index has not changed) does _not_ reset the dirty
flag as we have not updated the contents of our index and our unsaved
contents remain intact.
Now that the index has a "dirty" state, where it has changes that have
not yet been committed or rolled back, our tests need to be adapted to
actually commit or rollback the changes instead of assuming that the
index can be operated on in its indeterminate state.
If the index is dirty, allow `GIT_CHECKOUT_FORCE` to obliterate unsaved
changes.  This is in keeping with its name and description.
When the index is dirty, return GIT_EINDEXDIRTY so that consumers can
identify the exact problem programatically.
Add the `GIT_OPT_ENABLE_UNSAVED_INDEX_SAFETY` option, which will cause
commands that reload the on-disk index to fail if the current
`git_index` has changed that have not been saved.  This will prevent
users from - for example - adding a file to the index then calling a
function like `git_checkout` and having that file be silently removed
from the index since it was re-read from disk.

Now calls that would re-read the index will fail if the index is
"dirty", meaning changes have been made to it but have not been written.
Users can either `git_index_read` to discard those changes explicitly,
or `git_index_write` to write them.
@ethomson
Copy link
Member Author

I've been briefly asking myself whether git_index_read_safely should be exposed to users or whether it should be the default of git_index_read with force set to false.

I don't think so. But I'll continue to give this some thought and do some testing with this functionality enabled before we make change changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants