Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove more index compatibility macros #830

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Dec 31, 2020

UPDATE: this is now based on ag/merge-strategies-in-c to avoid conflicts in 'seen'. The changes in builtin/rm.c still conflict with mt/rm-sparse-checkout, but that branch seems to be waiting for a clearer plan on some corner cases. I thought about ejecting it, but 'rm' still uses ce_match_stat(), so just dropping the patch gives less of a final stake at the end of the series. (I'm still open to it, if necessary.)

I noticed that Duy's project around USE_THE_INDEX_COMPATIBILITY_MACROS has been on pause for a while. Here is my attempt to continue that project a little.

I started going through the builtins that still use cache_name_pos() and the first was easy: mv and rm.

Then I hit update-index and it was a bit bigger.

My strategy for update-index was to create static globals "repo" and "istate" that point to the_repository and the_index, respectively. Then, I was able to remove macros one-by-one without changing method prototypes within the file. Then, these static globals were also removed by systematically updating the local method prototypes, plus some fancy structure stuff for the option parsing callbacks.

I had started trying to keep everything local to the method signatures, but I hit a snag when reaching the command-line parsing callbacks, which I could not modify their call signature. At that point, I had something that was already much more complicated than what I present now. Outside of the first update-index commit, everything was a mechanical find/replace.

In total, this allows us to remove four of the compatibility macros because they are no longer used.

Updates in V3

  • Methods that know about the 'repo' pointer no longer also have an 'istate' pointer and instead prefer 'repo->index'

  • This includes the callback_data struct which only has a 'repo' member, no 'istate'.

Thanks,
-Stolee

Cc: pclouds@gmail.com
Cc: gitster@pobox.com
cc: Elijah Newren newren@gmail.com
cc: Eric Sunshine sunshine@sunshineco.com
cc: Alban Gruin alban.gruin@gmail.com
cc: Derrick Stolee stolee@gmail.com

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 1, 2021

Submitted as pull.830.git.1609506428.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-830/derrickstolee/index-compatibility-1-v1

To fetch this version to local tag pr-830/derrickstolee/index-compatibility-1-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-830/derrickstolee/index-compatibility-1-v1

builtin/update-index.c Outdated Show resolved Hide resolved
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 1, 2021

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 1, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, Jan 1, 2021 at 5:10 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> I noticed that Duy's project around USE_THE_INDEX_COMPATIBILITY_MACROS has
> been on pause for a while. Here is my attempt to continue that project a
> little.
>
> I started going through the builtins that still use cache_name_pos() and the
> first few were easy: merge-inex, mv, rm.
>
> Then I hit update-index and it was a bit bigger. It's included here as well.
>
> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.
>
> I had started trying to keep everything local to the method signatures, but
> I hit a snag when reaching the command-line parsing callbacks, which I could
> not modify their call signature. At that point, I had something that was
> already much more complicated than what I present now. Outside of the first
> update-index commit, everything was a mechanical find/replace.
>
> In total, this allows us to remove four of the compatibility macros because
> they are no longer used.

This series is divided nicely in a way that makes review easy.  I've
made some of these same types of changes elsewhere, and the whole
series is really just a long sequence of mechanical changes (plus a
case or two of fixing formatting on a line you were changing anyway,
such as adding spaces around an operator).  I think the work to
continue dropping the implicit dependency on the_index is helpful.

I only found two minor suggestions for improving the commit messages;
the patches all look good to me.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 2, 2021

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Fri, Jan 1, 2021 at 8:09 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.
>
> I had started trying to keep everything local to the method signatures, but
> I hit a snag when reaching the command-line parsing callbacks, which I could
> not modify their call signature. [...]

You should be able to do this, not by modifying the callback
signature, but by taking advantage of the `extra` member of `struct
option` which is available to callback functions or arbitrary use. If
you need to access the index in a callback, then assign a `struct
index_state *` to `extra`; likewise assign a `struct repository *` to
`extra` to access the repository. If you need access to both the index
and the repository, then just store the repository in `extra` since
the repository has an `index` field.

You won't be able to use any of the canned OPT_FOO() macros to
initialize an entry in the update-index.c `options[]` array which
needs `extra`-initialization since the macros don't let you specify
`extra`, but you can easily bypass the macro and initialize the
`struct option` manually. (After all, the macros exist for
convenience; they are not a hard requirement.)

Within the callback, extract the `repository` or `index_state` as you
would any other field. For instance:

    const struct repository *repo = opt->extra;

This should allow you to get rid of the globals introduced by patch
[4/12] (assuming passing the index and repo arguments around
everywhere doesn't get overly hairy).

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 2, 2021

User Eric Sunshine <sunshine@sunshineco.com> has been added to the cc: list.

@@ -1,23 +1,23 @@
#define USE_THE_INDEX_COMPATIBILITY_MACROS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Alban Gruin wrote (reply to this):

Hi Derrick,

Le 01/01/2021 à 14:06, Derrick Stolee via GitGitGadget a écrit :
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Replace uses of the old macros for the_index and instead pass around a
> 'struct index_state' pointer. This allows dropping the compatibility
> flag.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

I already libified builtin/merge-index.c in ag/merge-strategies-in-c,
and such dropped the_index.  I modified merge_entry(), merge_one_path()
and merge_all() to take a callback, itself taking a repository.  As
such, in my series, these functions take a `struct repository *' instead
of an index state.

I'm not sure how we should proceed with our respective patches.

Cheers,
Alban

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/3/2021 6:31 PM, Alban Gruin wrote:
> Hi Derrick,
> 
> Le 01/01/2021 à 14:06, Derrick Stolee via GitGitGadget a écrit :
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Replace uses of the old macros for the_index and instead pass around a
>> 'struct index_state' pointer. This allows dropping the compatibility
>> flag.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> 
> I already libified builtin/merge-index.c in ag/merge-strategies-in-c,
> and such dropped the_index.  I modified merge_entry(), merge_one_path()
> and merge_all() to take a callback, itself taking a repository.  As
> such, in my series, these functions take a `struct repository *' instead
> of an index state.
> 
> I'm not sure how we should proceed with our respective patches.

Hi Alban,

Sorry I didn't realize that. I'll drop this patch. Thanks for letting
me know!

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 3, 2021

User Alban Gruin <alban.gruin@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2021

User Derrick Stolee <stolee@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/2/2021 1:12 AM, Eric Sunshine wrote:
> On Fri, Jan 1, 2021 at 8:09 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> My strategy for update-index was to create static globals "repo" and
>> "istate" that point to the_repository and the_index, respectively. Then, I
>> was able to remove macros one-by-one without changing method prototypes
>> within the file.
>>
>> I had started trying to keep everything local to the method signatures, but
>> I hit a snag when reaching the command-line parsing callbacks, which I could
>> not modify their call signature. [...]
> 
> You should be able to do this, not by modifying the callback
> signature, but by taking advantage of the `extra` member of `struct
> option` which is available to callback functions or arbitrary use. If
> you need to access the index in a callback, then assign a `struct
> index_state *` to `extra`; likewise assign a `struct repository *` to
> `extra` to access the repository. If you need access to both the index
> and the repository, then just store the repository in `extra` since
> the repository has an `index` field.
> 
> You won't be able to use any of the canned OPT_FOO() macros to
> initialize an entry in the update-index.c `options[]` array which
> needs `extra`-initialization since the macros don't let you specify
> `extra`, but you can easily bypass the macro and initialize the
> `struct option` manually. (After all, the macros exist for
> convenience; they are not a hard requirement.)
> 
> Within the callback, extract the `repository` or `index_state` as you
> would any other field. For instance:
> 
>     const struct repository *repo = opt->extra;

Yes, this is definitely the way to make it possible.

> This should allow you to get rid of the globals introduced by patch
> [4/12] (assuming passing the index and repo arguments around
> everywhere doesn't get overly hairy).

My attempts just getting to the point of hitting these callbacks was
already making me frustrated with how complicated the code became with
that approach.

Perhaps now that I've removed the compatibility macros, it would be
easier to insert the method parameters since most of the lines that
need to change would be method prototypes and the calls to those methods
(plus the callback function details).

Is that a valuable effort? I could give it a try, but I want to be sure
that adjusting all of those helper methods in the builtin would indeed
have valuable improvements over the static globals used here.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2021

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Sun, Jan 3, 2021 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
> On 1/2/2021 1:12 AM, Eric Sunshine wrote:
> > This should allow you to get rid of the globals introduced by patch
> > [4/12] (assuming passing the index and repo arguments around
> > everywhere doesn't get overly hairy).
>
> Perhaps now that I've removed the compatibility macros, it would be
> easier to insert the method parameters since most of the lines that
> need to change would be method prototypes and the calls to those methods
> (plus the callback function details).
>
> Is that a valuable effort? I could give it a try, but I want to be sure
> that adjusting all of those helper methods in the builtin would indeed
> have valuable improvements over the static globals used here.

My impression was that the goal of the earlier work was to pass the
index and repository to each function specifically to avoid tying the
function to a particular index or repository. This helps in cases in
which client code needs to operate on a different index or repository
(for instance, a submodule). Generally speaking, making the index and
repository file-static rather than global does not help reach that
goal since functions are still tied to state which is not local to the
function itself.

Would the extra effort be valuable in this particular case? I'm not
familiar with this code, but given that `update-index` is a builtin,
such effort may not be too meaningful. If, however, any of the code
from `buildin/update-index.c` ever gets "libified" and moved into the
core library, then that would be a good time to update the functions
to take those values as arguments rather than relying on file-static
or globals. But that's not something that this series necessarily
needs to do; the task can wait until the code needs to be shared by
other modules, I would think.

@derrickstolee derrickstolee force-pushed the index-compatibility-1 branch 2 times, most recently from c65b4a8 to 9d8e48b Compare January 5, 2021 04:19
The mv builtin uses the compatibility macros to interact with the index.
Update these to use modern methods referring to a 'struct index_state'
pointer. Several helper methods need to be updated to consider such a
pointer, but the modifications are rudimentary.

Two macros can be deleted from cache.h because these are the last uses.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The rm builtin still uses the antiquated compatibility macros for
interacting with the index. Update these to the more modern uses by
passing around a 'struct index_state' pointer.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee changed the base branch from master to ag/merge-strategies-in-c January 5, 2021 04:28
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2021

Submitted as pull.830.v2.git.1609821783.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-830/derrickstolee/index-compatibility-1-v2

To fetch this version to local tag pr-830/derrickstolee/index-compatibility-1-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-830/derrickstolee/index-compatibility-1-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/4/2021 1:22 AM, Eric Sunshine wrote:
> On Sun, Jan 3, 2021 at 8:01 PM Derrick Stolee <stolee@gmail.com> wrote:
>> On 1/2/2021 1:12 AM, Eric Sunshine wrote:
>>> This should allow you to get rid of the globals introduced by patch
>>> [4/12] (assuming passing the index and repo arguments around
>>> everywhere doesn't get overly hairy).
>>
>> Perhaps now that I've removed the compatibility macros, it would be
>> easier to insert the method parameters since most of the lines that
>> need to change would be method prototypes and the calls to those methods
>> (plus the callback function details).
>>
>> Is that a valuable effort? I could give it a try, but I want to be sure
>> that adjusting all of those helper methods in the builtin would indeed
>> have valuable improvements over the static globals used here.
> 
> My impression was that the goal of the earlier work was to pass the
> index and repository to each function specifically to avoid tying the
> function to a particular index or repository. This helps in cases in
> which client code needs to operate on a different index or repository
> (for instance, a submodule). Generally speaking, making the index and
> repository file-static rather than global does not help reach that
> goal since functions are still tied to state which is not local to the
> function itself.
> 
> Would the extra effort be valuable in this particular case? I'm not
> familiar with this code, but given that `update-index` is a builtin,
> such effort may not be too meaningful. If, however, any of the code
> from `buildin/update-index.c` ever gets "libified" and moved into the
> core library, then that would be a good time to update the functions
> to take those values as arguments rather than relying on file-static
> or globals. But that's not something that this series necessarily
> needs to do; the task can wait until the code needs to be shared by
> other modules, I would think.

I tried again tonight, and it started getting messy, but then I
realized that I could group the callbacks that need the repo and
index to use a common struct that holds the other parameters they
need. It's still a bigger patch than I'd like, but it is more
reasonable.

v2 is incoming with my attempt at this.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 6, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> My strategy for update-index was to create static globals "repo" and
> "istate" that point to the_repository and the_index, respectively. Then, I
> was able to remove macros one-by-one without changing method prototypes
> within the file.

Knee-jerk reaction: swapping one pair of global with another?  Would
that give us enough upside?  It may allow some codepaths involved to
work on an in-core index instance that is different from the_index,
but does not make them reentrant.

Do we now have callers that actually pass an in-core index instance
that is different from the_index, and more importantly, that fail
loudly if the codepaths involved in this conversion forgets to
update some accesses to the_index not to the specified one?

If not, ... 

> In total, this allows us to remove four of the compatibility macros because
> they are no longer used.

... a conversion like this, removing the use of the compatibility
macros for the sake of removing them, invites future headaches by
leaving untested code churn behind with potential bugs that will
only get discovered after somebody actually starts making calls
with the non-default in-core index instances.

I've come to know the competence of you well enough to trust your
patches like patches from other proficient, prolific and prominent
contributors (I won't name names, but you know who you are), but we
are all human and are prone to introduce bugs.

That's all my knee-jerk impression before actually reading the
series through, though.  I'll certainloy know more after reading
them.

Thanks.

> Derrick Stolee (12):
>   merge-index: drop index compatibility macros
>   mv: remove index compatibility macros
>   rm: remove compatilibity macros
>   update-index: drop the_index, the_repository
>   update-index: use istate->cache over active_cache
>   update-index: use index->cache_nr over active_nr
>   update-index: use istate->cache_changed
>   update-index: use index_name_pos() over cache_name_pos()
>   update-index: use remove_file_from_index()
>   update-index: use add_index_entry()
>   update-index: replace several compatibility macros
>   update-index: remove ce_match_stat(), all macros
>
>  Documentation/technical/racy-git.txt |   6 +-
>  builtin/merge-index.c                |  33 +++---
>  builtin/mv.c                         |  42 ++++----
>  builtin/rm.c                         |  56 ++++++-----
>  builtin/update-index.c               | 145 ++++++++++++++-------------
>  cache.h                              |   4 -
>  6 files changed, 149 insertions(+), 137 deletions(-)
>
>
> base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-830%2Fderrickstolee%2Findex-compatibility-1-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-830/derrickstolee/index-compatibility-1-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/830

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 6, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/5/2021 10:55 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> My strategy for update-index was to create static globals "repo" and
>> "istate" that point to the_repository and the_index, respectively. Then, I
>> was able to remove macros one-by-one without changing method prototypes
>> within the file.
> 
> Knee-jerk reaction: swapping one pair of global with another?  Would
> that give us enough upside?  It may allow some codepaths involved to
> work on an in-core index instance that is different from the_index,
> but does not make them reentrant.

My intention was to reduce the use of globals in libgit.a while keeping
with existing patterns of static globals in the builtin code. While
this can be thought of "module variables" instead of true globals, they
aren't exactly desirable. In v2, these static globals are temporary to
the series and are completely removed by the end.

The new patch sequence can hopefully be seen as "this preprocessor
macro was expanded" and then "static globals are replaced with
method parameters" which are pretty straightforward.

> Do we now have callers that actually pass an in-core index instance
> that is different from the_index, and more importantly, that fail
> loudly if the codepaths involved in this conversion forgets to
> update some accesses to the_index not to the specified one?
> 
> If not, ... 
> 
>> In total, this allows us to remove four of the compatibility macros because
>> they are no longer used.
> 
> ... a conversion like this, removing the use of the compatibility
> macros for the sake of removing them, invites future headaches by
> leaving untested code churn behind with potential bugs that will
> only get discovered after somebody actually starts making calls
> with the non-default in-core index instances.

Perhaps I had misunderstood the state of the conversion project. I
thought that the full conversion was just paused because Duy moved
on to other things. I thought it might be valuable to pick up the
baton while also thinking about the space.

If this is _not_ a valuable project to continue, then I can hold
off for now.

Unfortunately, we'll never know if everything is safe from assuming
the_index until the macro itself is gone. It helps that libgit.a
doesn't use it at 

> I've come to know the competence of you well enough to trust your
> patches like patches from other proficient, prolific and prominent
> contributors (I won't name names, but you know who you are), but we
> are all human and are prone to introduce bugs.

That means a lot, thanks. And yes, I'm well aware that bugs can be
introduced. I've added my share.

> That's all my knee-jerk impression before actually reading the
> series through, though.  I'll certainloy know more after reading
> them.

I look forward to your thoughts.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 6, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <stolee@gmail.com> writes:

> My intention was to reduce the use of globals in libgit.a while keeping
> with existing patterns of static globals in the builtin code.

The above (without having thought about it too hard or long enough
yet) sounds a sensible guideline to go by.  Thanks for a helpful
hint to keep in mind while reading the series.

> While
> this can be thought of "module variables" instead of true globals, they
> aren't exactly desirable. In v2, these static globals are temporary to
> the series and are completely removed by the end.

;-)

@@ -3,7 +3,6 @@
*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Jan 4, 2021 at 11:43 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In order to remove index compatibility macros cleanly, we relied upon
> static globals 'repo' and 'istate' to be pointers to the_repository and
> the_index, respectively. We remove these static globals inside the
> option parsing callbacks, which are the final uses in update-index.
>
> The callbacks cannot change their method signature, so we must use the
> value member of 'struct option', assigned in the array of option macros.
> There are several callback methods that require at least one of 'repo'
> and 'istate', but they use a variety of different data types for the
> callback value.
>
> Unify these callback methods to use a consistent 'struct callback_data'
> that contains 'repo' and 'istate', ready to use. This takes the place of
> the previous 'struct refresh_params' which served only to group the
> 'flags' and 'has_errors' ints. We also collect other one-off settings,
> but only those that require access to the index or repository in their
> operation.

Makes sense. The patch itself is necessarily a bit noisy, but there's
nothing particularly complicated in that noise.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> @@ -784,19 +784,21 @@ static int do_reupdate(struct repository *repo,
> -struct refresh_params {
> +struct callback_data {
> +       struct repository *repo;
> +       struct index_state *istate;
> +
>         unsigned int flags;
> -       int *has_errors;
> +       unsigned int has_errors;
> +       unsigned nul_term_line;
> +       unsigned read_from_stdin;
>  };

The only mildly unexpected thing here is that `has_errors` is now a
simple value rather than a pointer to a value, but you handle that
easily enough by always accessing `has_error` directly from the
structure, even within the function in which `has_error` used to be a
local variable. Fine.

> @@ -818,7 +820,7 @@ static int really_refresh_callback(const struct option *opt,
>  static int chmod_callback(const struct option *opt,
> -                               const char *arg, int unset)
> +                         const char *arg, int unset)
> @@ -829,11 +831,12 @@ static int chmod_callback(const struct option *opt,
>  static int resolve_undo_clear_callback(const struct option *opt,
> -                               const char *arg, int unset)
> +                                      const char *arg, int unset)

A couple drive-by indentation fixes. Okay.

> @@ -1098,8 +1103,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> -       istate = repo->index;
> +       cd.repo = repo;
> +       cd.istate = istate = repo->index;

Will there ever be a case in which `cd.istate` will be different from
`cd.repo->index`? If not, then we could get by with having only
`cd.repo`; callers requiring access to `istate` can fetch it from
`cd.repo`. If, on the other hand, `cd.istate` can be different from
`cd.repo->istate` -- or if that might become a possibility in the
future -- then having `cd.istate` makes sense. Not a big deal, though.
Just generally curious about it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/7/2021 12:09 AM, Eric Sunshine wrote:
> On Mon, Jan 4, 2021 at 11:43 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> @@ -1098,8 +1103,13 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>> -       istate = repo->index;
>> +       cd.repo = repo;
>> +       cd.istate = istate = repo->index;
> 
> Will there ever be a case in which `cd.istate` will be different from
> `cd.repo->index`? If not, then we could get by with having only
> `cd.repo`; callers requiring access to `istate` can fetch it from
> `cd.repo`. If, on the other hand, `cd.istate` can be different from
> `cd.repo->istate` -- or if that might become a possibility in the
> future -- then having `cd.istate` makes sense. Not a big deal, though.
> Just generally curious about it.
 
I don't believe that 'istate' and 'repo->index' will ever be
different in this file. This includes the members of the
callback_data struct, but also the method parameters throughout.

Mostly, this could be seen as an artifact of how we got here:

1. References to the_index or other compatibility macros were
   converted to use the static global 'istate'.

2. References to the static global 'istate' were replaced with
   method parameters for everything except these callbacks.

3. These callbacks were updated to use 'cd.istate' instead of
   the (now defunct) static global 'istate'.

It could be possible to replace all references to 'istate' with
'repo->index' but the patches get slightly more messy. I also
think the code looks messier, but you do make a good point that
there is no concrete reason to separate the two.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Thu, Jan 7, 2021 at 6:19 AM Derrick Stolee <stolee@gmail.com> wrote:
> On 1/7/2021 12:09 AM, Eric Sunshine wrote:
> > Will there ever be a case in which `cd.istate` will be different from
> > `cd.repo->index`? If not, then we could get by with having only
> > `cd.repo`; callers requiring access to `istate` can fetch it from
> > `cd.repo`. If, on the other hand, `cd.istate` can be different from
> > `cd.repo->istate` -- or if that might become a possibility in the
> > future -- then having `cd.istate` makes sense. Not a big deal, though.
> > Just generally curious about it.
>
> I don't believe that 'istate' and 'repo->index' will ever be
> different in this file. This includes the members of the
> callback_data struct, but also the method parameters throughout.
>
> It could be possible to replace all references to 'istate' with
> 'repo->index' but the patches get slightly more messy. I also
> think the code looks messier, but you do make a good point that
> there is no concrete reason to separate the two.

I agree that it would make the code a bit noisier (to read) if
`istate` is eliminated from the callback structure, however, even
though I didn't originally feel strongly one way or the other about
having both `repo` and `istate` in the structure, I'm now leaning more
toward seeing `istate` eliminated. My one (big) concern with `istate`
is that it confuses readers into wondering whether `istate` and
`repo->istate` will ever be different. One way to avoid such confusion
would be to leave a comment in the code stating that the two values
will always be the same. The other way, of course, is to eliminate
`istate` from the structure altogether. I don't want to make more work
for you, but the more I think about it, the more I feel that removing
`istate` is the sensible thing to do. (And it doesn't require an extra
patch -- it can just be how this patch is crafted -- without ever
introducing `istate` to the structure in the first place.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Eric Sunshine <sunshine@sunshineco.com> writes:

>> It could be possible to replace all references to 'istate' with
>> 'repo->index' but the patches get slightly more messy. I also
>> think the code looks messier, but you do make a good point that
>> there is no concrete reason to separate the two.
>
> I agree that it would make the code a bit noisier (to read) if
> `istate` is eliminated from the callback structure, however, even
> though I didn't originally feel strongly one way or the other about
> having both `repo` and `istate` in the structure, I'm now leaning more
> toward seeing `istate` eliminated. My one (big) concern with `istate`
> is that it confuses readers into wondering whether `istate` and
> `repo->istate` will ever be different.

Some applications may want to work on more than one in-core index at
the same time (perhaps a merge strategy may want to keep a copy of
the original index and update a second copy with the result of the
merge), and it may be useful for such applications if 'repo->istate'
does not have to be the in-core index instance to be worked on.  So
things that go in libgit.a may want to allow such distinction.

But what goes in builtin/ is a different story.  As long as this
application has no need for such a feature and will always work on
the primary in-core index, prepared for the in-core repository
structure for convenience, it may not worth it to support such a
feature that no callers benefit from.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/7/2021 2:57 PM, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
>>> It could be possible to replace all references to 'istate' with
>>> 'repo->index' but the patches get slightly more messy. I also
>>> think the code looks messier, but you do make a good point that
>>> there is no concrete reason to separate the two.
>>
>> I agree that it would make the code a bit noisier (to read) if
>> `istate` is eliminated from the callback structure, however, even
>> though I didn't originally feel strongly one way or the other about
>> having both `repo` and `istate` in the structure, I'm now leaning more
>> toward seeing `istate` eliminated. My one (big) concern with `istate`
>> is that it confuses readers into wondering whether `istate` and
>> `repo->istate` will ever be different.
> 
> Some applications may want to work on more than one in-core index at
> the same time (perhaps a merge strategy may want to keep a copy of
> the original index and update a second copy with the result of the
> merge), and it may be useful for such applications if 'repo->istate'
> does not have to be the in-core index instance to be worked on.  So
> things that go in libgit.a may want to allow such distinction.
> 
> But what goes in builtin/ is a different story.  As long as this
> application has no need for such a feature and will always work on
> the primary in-core index, prepared for the in-core repository
> structure for convenience, it may not worth it to support such a
> feature that no callers benefit from.

I'll try to restructure my patches to do the following order:

1. replace compatibility macros with static global references, except
   i. use 'istate' in the methods that don't need a repository.
  ii. use 'repo->index' in the methods that need a repository.

2. replace static globals with method parameters.
   i. drop 'istate' static global with method parameters. Methods that
      have a repo will pass 'repo->index' to these methods.
  ii. drop 'repo' static global with method parameters.

3. replace static globals in callback methods using 'repo->index',
   where 'repo' is a member of the callback_data struct.

That should keep the structure as presented in v2 while also avoiding
this question of "can istate differ from repo->index?"

Thanks,
-Stolee

To reduce the need for the index compatibility macros, we will replace
their uses in update-index mechanically. This is the most interesting
change, which creates global "repo" and "istate" pointers. The macros
that expand to use the_index can then be mechanically replaced by
references to the istate pointer.

We will be careful to use "repo->index" over "istate" whenever repo is
needed by a method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Also use repo->index over istate, when possible.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Also use "repo->index" over "istate" when possible.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2021

This branch is now known as ds/update-index.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 9, 2021

This patch series was integrated into seen via git@d9b4954.

@gitgitgadget gitgitgadget bot added the seen label Jan 9, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

This patch series was integrated into seen via git@f13eaa8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

This patch series was integrated into seen via git@8c2d7b3.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

This patch series was integrated into seen via git@d2e8dfc.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

This patch series was integrated into seen via git@1347110.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> UPDATE: this is now based on ag/merge-strategies-in-c to avoid conflicts in
> 'seen'. The changes in builtin/rm.c still conflict with
> mt/rm-sparse-checkout, but that branch seems to be waiting for a clearer
> plan on some corner cases. I thought about ejecting it, but 'rm' still uses
> ce_match_stat(), so just dropping the patch gives less of a final stake at
> the end of the series. (I'm still open to it, if necessary.)

I haven't read this latest iteration myself yet beyond the cover
letter, but tonight's 'seen' has this queued near its tip.  I expect
it would either stay there or occasionally ejected, until the base
topic solidifies a bit more.

>  * Methods that know about the 'repo' pointer no longer also have an
>    'istate' pointer and instead prefer 'repo->index'
>
>  * This includes the callback_data struct which only has a 'repo' member, no
>    'istate'.

OK.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Sun, Jan 10, 2021 at 2:03 AM Junio C Hamano <gitster@pobox.com> wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > UPDATE: this is now based on ag/merge-strategies-in-c to avoid conflicts in
> > 'seen'. The changes in builtin/rm.c still conflict with
> > mt/rm-sparse-checkout, but that branch seems to be waiting for a clearer
> > plan on some corner cases. I thought about ejecting it, but 'rm' still uses
> > ce_match_stat(), so just dropping the patch gives less of a final stake at
> > the end of the series. (I'm still open to it, if necessary.)
>
> I haven't read this latest iteration myself yet beyond the cover
> letter, but tonight's 'seen' has this queued near its tip.  I expect
> it would either stay there or occasionally ejected, until the base
> topic solidifies a bit more.
>
> >  * Methods that know about the 'repo' pointer no longer also have an
> >    'istate' pointer and instead prefer 'repo->index'
> >
> >  * This includes the callback_data struct which only has a 'repo' member, no
> >    'istate'.
>
> OK.

I looked this version of the series over and did not find anything
else about which to comment.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 10, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/10/2021 2:03 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> UPDATE: this is now based on ag/merge-strategies-in-c to avoid conflicts in
>> 'seen'. The changes in builtin/rm.c still conflict with
>> mt/rm-sparse-checkout, but that branch seems to be waiting for a clearer
>> plan on some corner cases. I thought about ejecting it, but 'rm' still uses
>> ce_match_stat(), so just dropping the patch gives less of a final stake at
>> the end of the series. (I'm still open to it, if necessary.)
> 
> I haven't read this latest iteration myself yet beyond the cover
> letter, but tonight's 'seen' has this queued near its tip.  I expect
> it would either stay there or occasionally ejected, until the base
> topic solidifies a bit more.

Thanks. I'll continue to watch that topic and provide review as
new versions come out.

-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 12, 2021

This patch series was integrated into seen via git@4182c5b.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 12, 2021

This patch series was integrated into seen via git@13d8b6f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 12, 2021

This patch series was integrated into seen via git@cde3228.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2021

This patch series was integrated into seen via git@bf74db8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2021

This patch series was integrated into seen via git@e4878a0.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 13, 2021

This patch series was integrated into seen via git@91defb9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 14, 2021

This patch series was integrated into seen via git@b8d1e44.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2021

This patch series was integrated into seen via git@ab41db6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2021

This patch series was integrated into seen via git@b8f05fd.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 15, 2021

This patch series was integrated into seen via git@1a9c87e.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2021

This patch series was integrated into seen via git@a5b9558.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2021

This patch series was integrated into seen via git@b1bdc6f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 21, 2021

This patch series was integrated into seen via git@d726151.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2021

This patch series was integrated into seen via git@74b7b0a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2021

This patch series was integrated into seen via git@8f33792.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2021

This patch series was integrated into seen via git@74b06e1.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2021

This patch series was integrated into seen via git@e6b2e66.

@derrickstolee
Copy link
Author

closing to unblock other work.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 26, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 1/10/2021 2:03 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> UPDATE: this is now based on ag/merge-strategies-in-c to avoid conflicts in
>> 'seen'. The changes in builtin/rm.c still conflict with
>> mt/rm-sparse-checkout, but that branch seems to be waiting for a clearer
>> plan on some corner cases. I thought about ejecting it, but 'rm' still uses
>> ce_match_stat(), so just dropping the patch gives less of a final stake at
>> the end of the series. (I'm still open to it, if necessary.)
> 
> I haven't read this latest iteration myself yet beyond the cover
> letter, but tonight's 'seen' has this queued near its tip.  I expect
> it would either stay there or occasionally ejected, until the base
> topic solidifies a bit more.

Junio,

Please drop this series for now. I'll be introducing a new series soon
that will collide with it and this is a lower priority.

I'll probably come back to revisit removing these macros, but I'll do
so one builtin at a time when others are not modifying them at the
same time.

Thanks,
-Stolee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant