-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a "dirty" state to the index when it has unsaved changes #4536
Conversation
a07ddf9
to
1767023
Compare
1767023
to
934486c
Compare
Rebased. |
Any thoughts on this before I 🚢 or is it self-evident? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few high-level comments, as I don't feel competent enough with The Index to really know what is going on (apart from the must-not-lose-staged-changes part of it 😉).
src/index.c
Outdated
@@ -668,6 +674,9 @@ int git_index_read(git_index *index, int force) | |||
if (!error) | |||
git_futils_filestamp_set(&index->stamp, &stamp); | |||
|
|||
if (!error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe merge this with the if (!error)
above ?
@@ -527,6 +529,7 @@ int git_index_clear(git_index *index) | |||
|
|||
assert(index); | |||
|
|||
index->dirty = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure, but this seems the inverse of the commit's message rationale ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well. Clearing the index here without saving naturally has to leave the index dirty. I agree though that the commit message is confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; I think that originally I thought that clearing the index would make it not dirty and updated that later, but neglected the commit message. I've fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my late feedback 🙄
tests/index/names.c
Outdated
@@ -25,6 +25,42 @@ void test_index_names__cleanup(void) | |||
cl_git_sandbox_cleanup(); | |||
} | |||
|
|||
static void index_add_main(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the commit message you talk of "This test", but this commit doesn't do anything but add a new function which is never being called. The actual caller is only being added in the last commit of this series, which is a confusing when reading commits one by one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, one more thing: I don't really get what "main" is referring to in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; this got jumbled while rebasing. I fixed it, and I also changed the name to add_conflicts
. Regarding "main", I was referring to the "main index" (as opposed to the other sections like the tree cache) but I updated it to avoid confusion.
tests/index/names.c
Outdated
const char **conflict; | ||
size_t i; | ||
|
||
for (i = 0; i < 3; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i < ARRAY_SIZE(paths)
?
@@ -2397,20 +2397,20 @@ static int checkout_data_init( | |||
GIT_DIR_MODE, GIT_MKDIR_VERIFY_DIR)) < 0) | |||
goto cleanup; | |||
|
|||
if ((error = git_repository_index(&data->index, data->repo)) < 0) | |||
goto cleanup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, previously we were not loading the index into data->index
when GIT_CHECKOUT_NO_REFRESH
was set. This leaves me wondering about line 2441, where we unconditionally dereference data->index
. Was this code broken previously and would segfault in case GIT_CHECKOUT_NO_REFRESH
was set or is the index already loaded at this point in time, resulting in a memory leak due to an additional reference to the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would have dereferenced that NULL
index. the NO_REFRESH
bit is really not a great pattern to use, so I'm not surprised that nobody's ever noticed this. However, I added a test to make sure that we don't regress again.
src/stash.c
Outdated
@@ -159,7 +164,37 @@ struct stash_update_rules { | |||
bool include_ignored; | |||
}; | |||
|
|||
/* similar to git_index_add_bypath but able to operate on any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi-line comments should start with an empty /*
line and upper-case
src/stash.c
Outdated
int error; | ||
|
||
if (!git_repository_is_bare(repo) && | ||
(error = git_repository_index__weakptr(&repo_index, repo)) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base. The first one aligned with a tab
if (!git_repository_is_bare(repo) &&
(error = git_repository_index__weakptr(&repo_index, repo)) < 0)
return error;
and the same thing aligned with four spaces instead
if (!git_repository_is_bare(repo) &&
(error = git_repository_index__weakptr(&repo_index, repo)) < 0)
return error;
I'd say the second one is superior, as you can clearly distinguish conditionals from the body. Especially when there are multiple more than two lines in a conditional, this becomes rather valuable and makes code a lot easier to read, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base
I... guess? I agree that we shouldn't have two different styles, and yes, some things have slipped in here because it really doesn't matter that much (more on that below). But it feels like this was a solved decision. We took git's style, which, from a cursory grep, is typically indenting these with tabs. And we typically indent with tabs.
It seems like anything that's slipped in has done just that - slipped in.
I'd say the second one is superior, as you can clearly distinguish conditionals from the body. Especially when there are multiple more than two lines in a conditional, this becomes rather valuable and makes code a lot easier to read, in my opinion.
I agree, but that's why I have my tabs set to four spaces.
But for the broader point: it's incredibly frustrating for contributors when they have to wait several weeks for a code review and then have that code review consist primarily of complaints about whitespace. (This one was open three months ago! Three months is a long time, and yes, I realize that the complaints are not only about whitespace.)
Does style matter? Sure. But they don't matter nearly enough to stop the velocity of the project, or to turn people off from contributing.
Can you imagine waiting weeks to get a code review then it's picking nits at whitespace changes? This is not the sign of a healthy community, and that person will probably never contribute again.
We need to fix this when it comes to reviewing pull requests. This is not a complaint at you, individually; I do mean we, the project.
As for the whitespace issues: if you really care about this, then add an automated step to the CI builds and give people something easy that they can run locally. Otherwise, we need to relax, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to finally settle on a common style when it comes to multi-line conditionals. We currently have two different styles in our code base
I... guess? I agree that we shouldn't have two different styles, and yes, some things have slipped in here because it really doesn't matter that much (more on that below). But it feels like this was a solved decision. We took git's style, which, from a cursory grep, is typically indenting these with tabs. And we typically indent with tabs.It seems like anything that's slipped in has done just that - slipped in.
I might not have brought my intention along crlearly enough, but this was not a request for change but instead a clarification for what our agreed-upon style is. I had asked that specific question about how to indent multiline conditionals in the past and never got an answer (or at least I couldn't recall). So while I feel like the latter option is superior, I don't care which of both styles we use as long as it is consistent.
I agree, but that's why I have my tabs set to four spaces.
Weird. I always thought the projects recommendation was eight spaces, and that's why I always had a specific setting for libgit2 to display as eight spaces. But in fact, we do recommend to set it to four spaces.
But for the broader point: it's incredibly frustrating for contributors when they have to wait several weeks for a code review and then have that code review consist primarily of complaints about whitespace. (This one was open three months ago! Three months is a long time, and yes, I realize that the complaints are not only about whitespace.)
I know. And that's why I said in the past that I only add style-related critic in case there are other complaints about a PR.
Does style matter? Sure. But they don't matter nearly enough to stop the velocity of the project, or to turn people off from contributing.
Can you imagine waiting weeks to get a code review then it's picking nits at whitespace changes? This is not the sign of a healthy community, and that person will probably never contribute again.
We need to fix this when it comes to reviewing pull requests. This is not a complaint at you, individually; I do mean we, the project.As for the whitespace issues: if you really care about this, then add an automated step to the CI builds and give people something easy that they can run locally. Otherwise, we need to relax, IMHO.
I'm definitely in favor of having an automated job to do this. While I agree that style shouldn't be the main concern, I think it in fact is important for a project. I've seen enough projects where I wanted to contribute to, but after taking a quick peek into the implementation immediately bailed out as the code was so inconsistently formatted that I had problems to see what was going on. So it in fact is a double-edged sword: you need to care enough about style to not let the code base deteriorate and get hard to read, but you shouldn't be as strict to drive away contributors.
As you mentioned, I think the best way forward would be to have an automated job that does this immediately after a PR was submitted. If properly implemented, this would help both roles: the contributor can quickly fix up stuff while he is still familiar with the topic, and the maintainer can focus on the important parts.
src/index.c
Outdated
@@ -3185,6 +3212,7 @@ static int git_index_read_iterator( | |||
|
|||
clear_uptodate(index); | |||
|
|||
index->dirty = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? I cannot see us saving the index to disk here, so I'd consider it dirty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I've changed it.
@@ -3605,6 +3616,7 @@ int git_indexwriter_commit(git_indexwriter *writer) | |||
return -1; | |||
} | |||
|
|||
writer->index->dirty = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it belongs to the previous commit, where you introduce the dirty bit?
tests/checkout/tree.c
Outdated
assert_status_entrycount(g_repo, 1); | ||
cl_git_pass(git_checkout_tree(g_repo, obj, &g_opts)); | ||
|
||
cl_git_pass(git_reset(g_repo, obj, GIT_RESET_HARD, NULL)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this change what we're actually testing? Previously, we've checked that git_checkout_tree
resets the mode. Now we check whether git_reset
resets the mode.
tests/checkout/typechange.c
Outdated
@@ -289,6 +290,10 @@ void test_checkout_typechange__checkout_with_conflicts(void) | |||
cl_assert_equal_i(cts.updates, 0); | |||
cl_assert_equal_i(cts.ignored, 0); | |||
|
|||
cl_git_pass(git_repository_index(&index, g_repo)); | |||
cl_git_pass(git_index_read(index, 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These interspersed changes of git_index_read
leave me a bit uneasy. Does it mean that users of libgit2 may now face places where they have to add git_index_read
or face errors otherwise? If so, we at least have to provide an update guide detailing in which exact situations this is now necessary, and even then people will complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. This affects what happens when you are replacing the contents of the index (eg, checkout, merge) and you have unsaved changes, then you will need to save them (git_index_write
) or revert them (git_index_read
). If you do not, this new code path will raise an error.
I don't think that this is a common scenario (except in our tests, since they play fast and loose with things). But there is a very valid case that we would be breaking: checking out with the FORCE
flag or a reset with the HARD
flag when you have unsaved changes in the index. At the moment, you can do:
git_index_add(...);
/* no save */
git_reset(..., GIT_RESET_HARD);
And your unsaved changes are ignored - the index is updated to match the checkout / reset target. With this patch, this will be an error.
Increasingly, I think this is incorrect and that we should allow git_checkout
/ git_reset
to operate when forcing. I think that this would fix most of the intentional use cases of this sort of behavior for people who were in a bad place but reset
their way out of it, knowingly.
I'd also be happy to add a configurable option here for people who need an escape hatch to opt-in to the prior behavior. It's also possible that this is too heavy handed and we should keep the default as-is.
I'll follow up a commit that updates the reset
/ checkout
behavior to allow forcing in this instance.
tests/cherrypick/workdir.c
Outdated
@@ -276,6 +276,7 @@ void test_cherrypick_workdir__conflict_use_ours(void) | |||
opts.merge_opts.file_favor = GIT_MERGE_FILE_FAVOR_OURS; | |||
|
|||
cl_git_pass(git_reset(repo, (git_object *)head, GIT_RESET_HARD, NULL)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated whitespace change
934486c
to
5c8a730
Compare
I added a change so that when checking out with |
So this leaves one realistic use case that we're blocking here now, which is Proposal: add a |
That sounds like a perfectly reasonable plan. I'm sold |
750c659
to
9ef37a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new option I'm feeling a lot safer. I like how we start to be a lot more reasonable when it comes to interface and behaviour changes. Thanks!
include/git2/errors.h
Outdated
@@ -55,6 +55,7 @@ typedef enum { | |||
GIT_ITEROVER = -31, /**< Signals end of iteration with iterator */ | |||
GIT_RETRY = -32, /**< Internal only */ | |||
GIT_EMISMATCH = -33, /**< Hashsum mismatch in object */ | |||
GIT_EINDEXDIRTY = -34, /**< Unsaved changes in the index would be overwritten */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This time I'm fast to review, so I'm allowed to nag at the whitespace error here :P
include/git2/common.h
Outdated
* > Ensure that there are no unsaved changes in the index before | ||
* > beginning any operation that reloads the index from disk (eg, | ||
* > checkout). If there are unsaved changes, the instruction will | ||
* > fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe mention that this does not apply when forcing operations?
docs/changelog.md
Outdated
index but you have not written the index to disk. | ||
|
||
At present, this defaults to off, but we intend to enable this more | ||
broadly in the future, as a warning or error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we urge consumers to review how they handle dirty indices and then enable the option?
9ef37a4
to
d619cce
Compare
We add entries into the main index to correspond with the NAME entries that we're going to test. NAME entries store the results of conflicts occuring with rename detection during merge, and they must correspond to conflicts in the index. This test was mistakenly adding regular entries. The checkout validation failed, since it requires NAME entries to correspond to high-stage (conflict) entries. Correct the test to actually create conflicts.
The index::reuc tests must test that the checkout itself succeeds, otherwise subsequent tests are not valid. In fact, the checkouts were failing because when checking out `SAFE`, they cannot update the files that are in conflict. Change the checkout level to `FORCE` to ensure that they get updated correctly.
When running `git_index_add_all`, we should write the index to disk so that we can re-read it safely during status.
Always set the `index` in the `checkout_data`, even in the case that we are not reloading the index. Other functionality in checkout examines the index (for example: determining whether the workdir is modified) and we need it even in the (uncommon) case that we are not reloading.
d619cce
to
a670454
Compare
Cool. I've updated this with your suggestions, @pks-t. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't have much to add to this PR. I didn't see anything that catched my eye in particular. I've been briefly asking myself whether git_index_read_safely
should be exposed to users or whether it should be the default of git_index_read
with force
set to false
. But to be honest I'm not particularly well versed in this area to have a real grasp of that.
tests/checkout/tree.c
Outdated
cl_git_pass(git_repository_open(&other_repo, git_repository_workdir(g_repo))); | ||
cl_git_pass(git_repository_index(&other_index, other_repo)); | ||
|
||
cl_git_pass(git_oid_fromstr(&entry.id, "1385f264afb75a56a5bec74243be9b367ba4ca08")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case you're re-rolling: whitespace issue
Add tests that ensure that we re-read the on-disk image by default during checkout, but when the `GIT_CHECKOUT_NO_REFRESH` option is specified, we do _not_ re-read the index.
Don't manipulate the repository's index during stash; instead, manipulate a temporary index and check it out. This allows us to use the checkout mechanism to update the workdir and the repository's index, and allows checkout to use its common mechanisms to write data and handle errors.
Teach the index when it is "dirty", and has unsaved changes. Consider the index dirty whenever a caller has added or removed an entry from the main index, REUC or NAME section, including when the index is completely cleared. Similarly, consider the index _not_ dirty immediately after it is written, or when it is read from the on-disk index. This allows us to ensure that unsaved changes are not lost when we automatically refresh the index.
Test that any changes to the index will mark the index as dirty. Also ensure that when we initialize a new index, read the index contents from disk, or write the index contents to disk that we reset the dirty flag to zero. Further ensure that an unforced read with dirty contents (when the on-disk index has not changed) does _not_ reset the dirty flag as we have not updated the contents of our index and our unsaved contents remain intact.
Now that the index has a "dirty" state, where it has changes that have not yet been committed or rolled back, our tests need to be adapted to actually commit or rollback the changes instead of assuming that the index can be operated on in its indeterminate state.
If the index is dirty, allow `GIT_CHECKOUT_FORCE` to obliterate unsaved changes. This is in keeping with its name and description.
When the index is dirty, return GIT_EINDEXDIRTY so that consumers can identify the exact problem programatically.
Add the `GIT_OPT_ENABLE_UNSAVED_INDEX_SAFETY` option, which will cause commands that reload the on-disk index to fail if the current `git_index` has changed that have not been saved. This will prevent users from - for example - adding a file to the index then calling a function like `git_checkout` and having that file be silently removed from the index since it was re-read from disk. Now calls that would re-read the index will fail if the index is "dirty", meaning changes have been made to it but have not been written. Users can either `git_index_read` to discard those changes explicitly, or `git_index_write` to write them.
a670454
to
243d40d
Compare
I don't think so. But I'll continue to give this some thought and do some testing with this functionality enabled before we make change changes. |
Add a new "dirty" state to the index when it has unsaved changes.
At present, you can load the index and make changes to it without saving those changes. However various functions in the library behave different when the repository's index has unsaved changes. For example, checkout will reload the index (by default), but merge will not.
This asymmetry is obviously a problem.
It was noted during an attempt to standardize this behavior that some people may be relying on the prior behavior that merge does not reload the index.
The discussion from this was that we should add a warning in the unlikely event that someone does rely on this behavior. This PR adds a "dirty" state to the PR - now if a consumer tries to act on the index in a way that changes would be discarded, they are warned. As a result, consumers must explicitly either commit their changes (using
git_index_write
) or discard their changes (through any of a variety of mechanisms, includinggit_index_read
) before executing a function that will otherwise mutate the index (eggit_checkout
,git_merge
, etc).The first few commits in this PR update tests that are incorrect regardless of the changes to add these restrictions and include a "dirty" state to the index. The remainder add this change and update several tests to explicitly commit the index before further mutating it.
This should (finally) pave the way for #4407.