Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create stronger guard rails on replace refs #1537

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented May 26, 2023

(This series is based on tb/pack-bitmap-traversal-with-boundary due to wanting to modify prepare_repo_settings() in a similar way.)

The replace-refs can be ignored via the core.useReplaceRefs=false config setting. This setting is possible to miss in some Git commands if they do not load default config at the appropriate time. See [1] for a recent example of this.

[1] https://lore.kernel.org/git/pull.1530.git.1683745654800.gitgitgadget@gmail.com/

This series aims to avoid this kind of error from happening in the future. The idea is to encapsulate the setting in such a way that we can guarantee that config has been checked before using the in-memory value.

Further, we must be careful that some Git commands want to disable replace refs unconditionally, as if GIT_NO_REPLACE_REFS was enabled in the environment.

The approach taken here is to split the global into two different sources. First, read_replace_refs is kept (but moved to replace-objects.c scope) and reflects whether or not the feature is permitted by the environment and the current command. Second, a new value is added to repo-settings and this is checked after using prepare_repo_settings() to guarantee the config has been read.

This presents a potential behavior change, in that now core.useReplaceRefs is specific to each in-memory repository instead of applying the superproject value to all submodules. I could not find a Git command that has multiple in-memory repositories and follows OIDs to object contents, so I'm not sure how to demonstrate it in a test.

Here is the breakdown of the series:

  • Patch 1 creates disable_replace_refs() to encapsulate the global disabling of the feature.
  • Patch 2 creates replace_refs_enabled() to check if the feature is enabled (with respect to a given repository).
  • Patch 3 creates the value in repo-settings as well as ensures that the repo settings have been prepared before accessing the value within replace_refs_enabled(). A test is added to demonstrate how the config value is now scoped on a per-repository basis.

Updates in v3

Thanks for the review on v2!

  • The removal of the global from environment.c is delayed to patch 3 because config.c still assigns the value in patch 2.
  • The comment for the member in the repo_settings struct is modified for better grammar.

Updates in v2

Thanks for the careful review on v1!

  • disable_replace_refs() now replaces "read_replace_refs = 0" in the exact same line to avoid possible behavior change.
  • Stale comments, include headers, and commit messages are updated to include the latest status.
  • Patch 3 contains a test of the repo-scoped value using 'git grep'.

Thanks,
-Stolee

cc: vdye@github.com
cc: me@ttaylorr.com
cc: newren@gmail.com
cc: gitster@pobox.com
cc: Jeff King peff@peff.net
cc: René Scharfe l.s.r@web.de

@derrickstolee derrickstolee self-assigned this May 26, 2023
@derrickstolee derrickstolee force-pushed the replace-refs-safety branch 4 times, most recently from 5776a4f to 481a81a Compare May 26, 2023 18:41
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented May 26, 2023

Submitted as pull.1537.git.1685126617.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1537/derrickstolee/replace-refs-safety-v1

To fetch this version to local tag pr-1537/derrickstolee/replace-refs-safety-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1537/derrickstolee/replace-refs-safety-v1

@@ -805,7 +805,7 @@ static int batch_objects(struct batch_options *opt)
if (repo_has_promisor_remote(the_repository))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, May 26, 2023 at 11:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <derrickstolee@github.com>
>
> Several builtins depend on being able to disable the replace references
> so we actually operate on each object individually. These currently do
> so by directly mutating the 'read_replace_refs' global.
>
> A future change will move this global into a different place, so it will
> be necessary to change all of these lines. However, we can simplify that
> transition by abstracting the purpose of these global assignments with a
> method call.
>
> We will never scope this to an in-memory repository as we want to make
> sure that we never use replace refs throughout the life of the process
> if this method is called.

I'm confused; doesn't the 3rd patch do exactly what this paragraph
says you'll never do?

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/cat-file.c       | 2 +-
>  builtin/commit-graph.c   | 5 ++---
>  builtin/fsck.c           | 2 +-
>  builtin/index-pack.c     | 2 +-
>  builtin/pack-objects.c   | 3 +--
>  builtin/prune.c          | 3 ++-
>  builtin/replace.c        | 3 ++-
>  builtin/unpack-objects.c | 3 +--
>  builtin/upload-pack.c    | 3 ++-
>  environment.c            | 2 +-
>  git.c                    | 2 +-
>  replace-object.c         | 5 +++++
>  replace-object.h         | 8 ++++++++
>  repo-settings.c          | 1 +
>  14 files changed, 29 insertions(+), 15 deletions(-)
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 0bafc14e6c0..27f070267a4 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -805,7 +805,7 @@ static int batch_objects(struct batch_options *opt)
>                 if (repo_has_promisor_remote(the_repository))
>                         warning("This repository uses promisor remotes. Some objects may not be loaded.");
>
> -               read_replace_refs = 0;
> +               disable_replace_refs();
>
>                 cb.opt = opt;
>                 cb.expand = &data;
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index a3d00fa232b..639c9ca8b91 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -122,7 +122,6 @@ static int graph_verify(int argc, const char **argv, const char *prefix)
>         return ret;
>  }
>
> -extern int read_replace_refs;
>  static struct commit_graph_opts write_opts;
>
>  static int write_option_parse_split(const struct option *opt, const char *arg,
> @@ -323,13 +322,13 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>         struct option *options = parse_options_concat(builtin_commit_graph_options, common_opts);
>
>         git_config(git_default_config, NULL);
> -
> -       read_replace_refs = 0;
>         save_commit_buffer = 0;
>
>         argc = parse_options(argc, argv, prefix, options,
>                              builtin_commit_graph_usage, 0);
>         FREE_AND_NULL(options);
>
> +       disable_replace_refs();
> +

In this place and several others in the file, you opt to not just
replace the assignment with a function call, but move the action line
to later in the file.  In some cases, much later.

I don't think it hurts things, but it certainly makes me wonder why it
was moved.  Did it break for some reason when called earlier?  (Is
there something trickier about this patch than I expected?)

>         return fn(argc, argv, prefix);
>  }
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 2cd461b84c1..8a2d7afc83a 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -927,7 +927,6 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>         fetch_if_missing = 0;
>
>         errors_found = 0;
> -       read_replace_refs = 0;
>         save_commit_buffer = 0;
>
>         argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0);
> @@ -953,6 +952,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
>
>         git_config(git_fsck_config, &fsck_obj_options);
>         prepare_repo_settings(the_repository);
> +       disable_replace_refs();
>
>         if (connectivity_only) {
>                 for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index bb67e166559..d0d8067510b 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1752,7 +1752,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage(index_pack_usage);
>
> -       read_replace_refs = 0;
> +       disable_replace_refs();
>         fsck_options.walk = mark_link;
>
>         reset_pack_idx_option(&opts);
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index a5b466839ba..55635bdf4b4 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -4284,9 +4284,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>         if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
>                 BUG("too many dfs states, increase OE_DFS_STATE_BITS");
>
> -       read_replace_refs = 0;
> -
>         sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
> +       disable_replace_refs();
>         if (the_repository->gitdir) {
>                 prepare_repo_settings(the_repository);
>                 if (sparse < 0)
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 5dc9b207200..a8f3848c3a3 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -164,7 +164,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>
>         expire = TIME_MAX;
>         save_commit_buffer = 0;
> -       read_replace_refs = 0;
>         repo_init_revisions(the_repository, &revs, prefix);
>
>         argc = parse_options(argc, argv, prefix, options, prune_usage, 0);
> @@ -172,6 +171,8 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>         if (repository_format_precious_objects)
>                 die(_("cannot prune in a precious-objects repo"));
>
> +       disable_replace_refs();
> +
>         while (argc--) {
>                 struct object_id oid;
>                 const char *name = *argv++;
> diff --git a/builtin/replace.c b/builtin/replace.c
> index 981f1894436..6c6f0b3ed01 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -566,11 +566,12 @@ int cmd_replace(int argc, const char **argv, const char *prefix)
>                 OPT_END()
>         };
>
> -       read_replace_refs = 0;
>         git_config(git_default_config, NULL);
>
>         argc = parse_options(argc, argv, prefix, options, git_replace_usage, 0);
>
> +       disable_replace_refs();
> +
>         if (!cmdmode)
>                 cmdmode = argc ? MODE_REPLACE : MODE_LIST;
>
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 2c52c3a741f..3f5f6719405 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -609,9 +609,8 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix UNUSED)
>         int i;
>         struct object_id oid;
>
> -       read_replace_refs = 0;
> -
>         git_config(git_default_config, NULL);
> +       disable_replace_refs();
>
>         quiet = !isatty(2);
>
> diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
> index beb9dd08610..6fc9a8feab0 100644
> --- a/builtin/upload-pack.c
> +++ b/builtin/upload-pack.c
> @@ -36,7 +36,6 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
>         };
>
>         packet_trace_identity("upload-pack");
> -       read_replace_refs = 0;
>
>         argc = parse_options(argc, argv, prefix, options, upload_pack_usage, 0);
>
> @@ -50,6 +49,8 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
>         if (!enter_repo(dir, strict))
>                 die("'%s' does not appear to be a git repository", dir);
>
> +       disable_replace_refs();
> +
>         switch (determine_protocol_version_server()) {
>         case protocol_v2:
>                 if (advertise_refs)
> diff --git a/environment.c b/environment.c
> index 8a96997539a..3b4d87c322f 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -185,7 +185,7 @@ void setup_git_env(const char *git_dir)
>         strvec_clear(&to_free);
>
>         if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT))
> -               read_replace_refs = 0;
> +               disable_replace_refs();
>         replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
>         git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
>                                                           : "refs/replace/");
> diff --git a/git.c b/git.c
> index 45899be8265..3252d4c7661 100644
> --- a/git.c
> +++ b/git.c
> @@ -185,7 +185,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>                         if (envchanged)
>                                 *envchanged = 1;
>                 } else if (!strcmp(cmd, "--no-replace-objects")) {
> -                       read_replace_refs = 0;
> +                       disable_replace_refs();
>                         setenv(NO_REPLACE_OBJECTS_ENVIRONMENT, "1", 1);
>                         if (envchanged)
>                                 *envchanged = 1;
> diff --git a/replace-object.c b/replace-object.c
> index e98825d5852..ceec81c940c 100644
> --- a/replace-object.c
> +++ b/replace-object.c
> @@ -84,3 +84,8 @@ const struct object_id *do_lookup_replace_object(struct repository *r,
>         }
>         die(_("replace depth too high for object %s"), oid_to_hex(oid));
>  }
> +
> +void disable_replace_refs(void)
> +{
> +       read_replace_refs = 0;
> +}
> diff --git a/replace-object.h b/replace-object.h
> index 500482b02b3..7786d4152b0 100644
> --- a/replace-object.h
> +++ b/replace-object.h
> @@ -48,4 +48,12 @@ static inline const struct object_id *lookup_replace_object(struct repository *r
>         return do_lookup_replace_object(r, oid);
>  }
>
> +/*
> + * Some commands override config and environment settings for using
> + * replace references. Use this method to disable the setting and ensure
> + * those other settings will not override this choice. This applies
> + * globally to all in-process repositories.
> + */
> +void disable_replace_refs(void);
> +
>  #endif /* REPLACE_OBJECT_H */
> diff --git a/repo-settings.c b/repo-settings.c
> index 7b566d729d0..1df0320bf33 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -3,6 +3,7 @@
>  #include "repository.h"
>  #include "midx.h"
>  #include "compat/fsmonitor/fsm-listen.h"
> +#include "environment.h"

Why?  There are no other changes in this file, so I don't see why
you'd need another include.

>
>  static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
>                           int def)
> --
> gitgitgadget

I think the patch is probably fine, but I saw a few things that made
me wonder if I had missed something important, highlighted above.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/31/2023 12:41 AM, Elijah Newren wrote:
> On Fri, May 26, 2023 at 11:43 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:

>> We will never scope this to an in-memory repository as we want to make
>> sure that we never use replace refs throughout the life of the process
>> if this method is called.
> 
> I'm confused; doesn't the 3rd patch do exactly what this paragraph
> says you'll never do?

You mentioned in another reply that you figured it out, but for the sake
of anyone reading here: we _add_ a repo-scoped version for the config,
but we need this globally-scoped one for process-wide disabling the
feature. This could be said more clearly.
 
>> @@ -323,13 +322,13 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>>         struct option *options = parse_options_concat(builtin_commit_graph_options, common_opts);
>>
>>         git_config(git_default_config, NULL);
>> -
>> -       read_replace_refs = 0;
>>         save_commit_buffer = 0;
>>
>>         argc = parse_options(argc, argv, prefix, options,
>>                              builtin_commit_graph_usage, 0);
>>         FREE_AND_NULL(options);
>>
>> +       disable_replace_refs();
>> +
> 
> In this place and several others in the file, you opt to not just
> replace the assignment with a function call, but move the action line
> to later in the file.  In some cases, much later.
> 
> I don't think it hurts things, but it certainly makes me wonder why it
> was moved.  Did it break for some reason when called earlier?  (Is
> there something trickier about this patch than I expected?)

Generally, I decided to move it after option-parsing, so it wouldn't
be called if we are hitting an option-parse error.

However, these moves were only important for a draft version where
I had not separated the global and local scopes, so calling the method
would also load config.

In this version of the patch, this is not needed at all, and I could
do an in-line replacement. Thanks!

>> diff --git a/repo-settings.c b/repo-settings.c
>> index 7b566d729d0..1df0320bf33 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -3,6 +3,7 @@
>>  #include "repository.h"
>>  #include "midx.h"
>>  #include "compat/fsmonitor/fsm-listen.h"
>> +#include "environment.h"
> 
> Why?  There are no other changes in this file, so I don't see why
> you'd need another include.

Thanks. This is a leftover from a previous version of the patch.
 
>>
>>  static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
>>                           int def)
>> --
>> gitgitgadget
> 
> I think the patch is probably fine, but I saw a few things that made
> me wonder if I had missed something important, highlighted above.

Thank you for pointing them out, as the things that brought you
confusion are cruft from an earlier version but are no longer
valuable in this version.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, May 31, 2023 at 09:37:10AM -0400, Derrick Stolee wrote:

> > In this place and several others in the file, you opt to not just
> > replace the assignment with a function call, but move the action line
> > to later in the file.  In some cases, much later.
> > 
> > I don't think it hurts things, but it certainly makes me wonder why it
> > was moved.  Did it break for some reason when called earlier?  (Is
> > there something trickier about this patch than I expected?)
> 
> Generally, I decided to move it after option-parsing, so it wouldn't
> be called if we are hitting an option-parse error.

Playing devil's advocate: would option parsing ever access an object? I
think in most cases the answer is no, but I could imagine it happening
for some special cases (e.g., update-index uses callbacks to act on
options as we parse them, since order is important).

So I think as a general principle it makes sense for commands to set
this flag as early as possible.

> However, these moves were only important for a draft version where
> I had not separated the global and local scopes, so calling the method
> would also load config.
> 
> In this version of the patch, this is not needed at all, and I could
> do an in-line replacement. Thanks!

It sounds like you were going to switch the locations back anyway, but
maybe the above gives an extra motivation. :)

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes:

>> Generally, I decided to move it after option-parsing, so it wouldn't
>> be called if we are hitting an option-parse error.
>
> Playing devil's advocate: would option parsing ever access an object? I
> think in most cases the answer is no, but I could imagine it happening
> for some special cases (e.g., update-index uses callbacks to act on
> options as we parse them, since order is important).
>
> So I think as a general principle it makes sense for commands to set
> this flag as early as possible.

I agree with the "devil's advocate" above; indeed my suggestion to
follow-on work that is enabled by introducing this function, i.e. we
can ensure that the objects already instantiated when the call is
made are not affected by the replace mechanism, was exactly for such
a "we already accessed some objects via the replace mechanism and
then try to close the door of the barn afterwards with this call"
case.

Indeed, I think "git branch --no-merged 0369cf" resolves the object
name down to a commit object in parse_opt_merge_filter() before
parse_options() call returns.

Yes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Jun 03, 2023 at 09:28:13AM +0900, Junio C Hamano wrote:

> I agree with the "devil's advocate" above; indeed my suggestion to
> follow-on work that is enabled by introducing this function, i.e. we
> can ensure that the objects already instantiated when the call is
> made are not affected by the replace mechanism, was exactly for such
> a "we already accessed some objects via the replace mechanism and
> then try to close the door of the barn afterwards with this call"
> case.
> 
> Indeed, I think "git branch --no-merged 0369cf" resolves the object
> name down to a commit object in parse_opt_merge_filter() before
> parse_options() call returns.
> 
> Yes.

Ah, very good example. Yes, I'd think it would be appropriate to BUG()
if disable_replace_refs() is called and anybody has looked up an object
already.

-Peff

@gitgitgadget
Copy link

gitgitgadget bot commented May 31, 2023

On the Git mailing list, Elijah Newren wrote (reply to this):

On Fri, May 26, 2023 at 11:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> (This series is based on tb/pack-bitmap-traversal-with-boundary due to
> wanting to modify prepare_repo_settings() in a similar way.)
>
> The replace-refs can be ignored via the core.useReplaceRefs=false config
> setting. This setting is possible to miss in some Git commands if they do
> not load default config at the appropriate time. See [1] for a recent
> example of this.
>
> [1]
> https://lore.kernel.org/git/pull.1530.git.1683745654800.gitgitgadget@gmail.com/
>
> This series aims to avoid this kind of error from happening in the future.
> The idea is to encapsulate the setting in such a way that we can guarantee
> that config has been checked before using the in-memory value.
>
> Further, we must be careful that some Git commands want to disable replace
> refs unconditionally, as if GIT_NO_REPLACE_REFS was enabled in the
> environment.
>
> The approach taken here is to split the global into two different sources.
> First, read_replace_refs is kept (but moved to replace-objects.c scope) and
> reflects whether or not the feature is permitted by the environment and the
> current command. Second, a new value is added to repo-settings and this is
> checked after using prepare_repo_settings() to guarantee the config has been
> read.
>
> This presents a potential behavior change, in that now core.useReplaceRefs
> is specific to each in-memory repository instead of applying the
> superproject value to all submodules. I could not find a Git command that
> has multiple in-memory repositories and follows OIDs to object contents, so
> I'm not sure how to demonstrate it in a test.
>
> Here is the breakdown of the series:
>
>  * Patch 1 creates disable_replace_refs() to encapsulate the global
>    disabling of the feature.
>  * Patch 2 creates replace_refs_enabled() to check if the feature is enabled
>    (with respect to a given repository). This is a thin wrapper of the
>    global at this point, but does allow us to remove it from environment.h.
>  * Patch 3 creates the value in repo-settings as well as ensures that the
>    repo settings have been prepared before accessing the value within
>    replace_refs_enabled().

Thanks for implementing this.  I had a few questions on the first
patch (though I think one of them was answered by noting that you have
both a global and a repository setting for the flag), but otherwise it
looks good.

Several builtins depend on being able to disable the replace references
so we actually operate on each object individually. These currently do
so by directly mutating the 'read_replace_refs' global.

A future change will move this global into a different place, so it will
be necessary to change all of these lines. However, we can simplify that
transition by abstracting the purpose of these global assignments with a
method call.

We will need to keep this read_replace_refs global forever, as we want
to make sure that we never use replace refs throughout the life of the
process if this method is called. Future changes may present a
repository-scoped version of the variable to represent that repository's
core.useReplaceRefs config value, but a zero-valued read_replace_refs
will always override such a setting.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this), regarding 56544ab (outdated):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> Several builtins depend on being able to disable the replace references
> so we actually operate on each object individually. These currently do
> so by directly mutating the 'read_replace_refs' global.
>
> A future change will move this global into a different place, so it will
> be necessary to change all of these lines. However, we can simplify that
> transition by abstracting the purpose of these global assignments with a
> method call.
>
> We will never scope this to an in-memory repository as we want to make
> sure that we never use replace refs throughout the life of the process
> if this method is called.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

It will naturally be outside the scope of the series, but this
change will allow us to add a sanity check to make sure that nobody
has read objects that would have been affected by these replace refs
before the disable call was made, which is another reason to welcome
this change.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

This branch is now known as ds/disable-replace-refs.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

This patch series was integrated into seen via git@10c1b0f.

@gitgitgadget gitgitgadget bot added the seen label Jun 1, 2023
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

There was a status update in the "New Topics" section about the branch ds/disable-replace-refs on the Git mailing list:

Introduce a mechanism to disable replace refs globally and per
repository.

Will merge to 'next'???
What's the doneness of the base topic?
source: <pull.1537.git.1685126617.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Victoria Dye wrote (reply to this), regarding 56544ab (outdated):

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Several builtins depend on being able to disable the replace references
> so we actually operate on each object individually. These currently do
> so by directly mutating the 'read_replace_refs' global.
> 
> A future change will move this global into a different place, so it will
> be necessary to change all of these lines. However, we can simplify that
> transition by abstracting the purpose of these global assignments with a
> method call.
> 
> We will never scope this to an in-memory repository as we want to make
> sure that we never use replace refs throughout the life of the process
> if this method is called.
> 

Although unfortunate (it would be nice to remove the global), this makes
sense. Disabling replace refs needs to be process-wide, and manually
propagating a repository setting to other repositories would be awkward
and prone to error.

All of my questions on this patch ("why were the 'disable_replace_refs()'
calls added later in the function than the original 'read_replace_refs =
0'?" and "why was '#include "environment.h"' added in 'repo-settings.c'?")
were asked [1] and answered [2] already. Beyond those two points, this patch
looks good!

[1] https://lore.kernel.org/git/CABPp-BFzA0yVecHK1DEGMpAhewm7oyqEim7BCw7-DTKpUzWnpw@mail.gmail.com/
[2] https://lore.kernel.org/git/ae89feda-0a76-29d7-14ce-662214414638@github.com/

> +/*
> + * Some commands override config and environment settings for using
> + * replace references. Use this method to disable the setting and ensure
> + * those other settings will not override this choice. This applies
> + * globally to all in-process repositories.
> + */
> +void disable_replace_refs(void);
> +

Thanks for including the function documentation. It concisely explains the
purpose of 'disable_replace_refs()' and helps clarify how replace refs are
treated in Git. 

>  #endif /* REPLACE_OBJECT_H */

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Victoria Dye wrote (reply to this), regarding 5fc2f92 (outdated):

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The 'read_replace_objects' constant is initialized by git_default_config
> (if core.useReplaceRefs is disabled) and within setup_git_env (if
> GIT_NO_REPLACE_OBJECTS) is set. To ensure that this variable cannot be
> set accidentally in other places, wrap it in a replace_refs_enabled()
> method.
> 
> This allows us to remove the global from being recognized outside of
> replace-objects.c.
> 
> Further, a future change will help prevent the variable from being read
> before it is initialized. Centralizing its access is an important first
> step.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  commit-graph.c   |  4 +---
>  environment.c    |  1 -
>  log-tree.c       |  2 +-
>  replace-object.c |  7 +++++++
>  replace-object.h | 15 ++++++++++++++-
>  5 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 43558b4d9b0..95873317bf7 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -203,14 +203,12 @@ static struct commit_graph *alloc_commit_graph(void)
>  	return g;
>  }
>  
> -extern int read_replace_refs;
> -
>  static int commit_graph_compatible(struct repository *r)
>  {
>  	if (!r->gitdir)
>  		return 0;
>  
> -	if (read_replace_refs) {
> +	if (replace_refs_enabled(r)) {
>  		prepare_replace_object(r);
>  		if (hashmap_get_size(&r->objects->replace_map->map))
>  			return 0;

This and the other 'read_replace_refs' -> 'replace_refs_enabled()'
replacements all look good. Although we're not using the 'struct repository'
argument yet, I see that it'll be used in patch 3 - adding the (unused) arg
here helps avoid the extra churn there.

> diff --git a/replace-object.c b/replace-object.c
> index ceec81c940c..cf91c3ba456 100644
> --- a/replace-object.c
> +++ b/replace-object.c
> @@ -85,7 +85,14 @@ const struct object_id *do_lookup_replace_object(struct repository *r,
>  	die(_("replace depth too high for object %s"), oid_to_hex(oid));
>  }
>  
> +static int read_replace_refs = 1;
> +
>  void disable_replace_refs(void)
>  {
>  	read_replace_refs = 0;
>  }
> +
> +int replace_refs_enabled(struct repository *r)
> +{
> +	return read_replace_refs;
> +}
> diff --git a/replace-object.h b/replace-object.h
> index 7786d4152b0..b141075023e 100644
> --- a/replace-object.h
> +++ b/replace-object.h
> @@ -27,6 +27,19 @@ void prepare_replace_object(struct repository *r);
>  const struct object_id *do_lookup_replace_object(struct repository *r,
>  						 const struct object_id *oid);
>  
> +
> +/*
> + * Some commands disable replace-refs unconditionally, and otherwise each
> + * repository could alter the core.useReplaceRefs config value.
> + *
> + * Return 1 if and only if all of the following are true:
> + *
> + *  a. disable_replace_refs() has not been called.
> + *  b. GIT_NO_REPLACE_OBJECTS is unset or zero.
> + *  c. the given repository does not have core.useReplaceRefs=false.
> + */
> +int replace_refs_enabled(struct repository *r);

Since the purpose of this function is to access global state, would
'environment.[c|h]' be a more appropriate place for it (and
'disable_replace_refs()', for that matter)? There's also some precedent;
'set_shared_repository()' and 'get_shared_repository()' have a very similar
design to what you've added here.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Victoria Dye wrote (reply to this), regarding 481a81a (outdated):

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The 'read_replace_refs' global specifies whether or not we should
> respect the references of the form 'refs/replace/<oid>' to replace which
> object we look up when asking for '<oid>'. This global has caused issues
> when it is not initialized properly, such as in b6551feadfd (merge-tree:
> load default git config, 2023-05-10).
> 
> To make this more robust, move its config-based initialization out of
> git_default_config and into prepare_repo_settings(). This provides a
> repository-scoped version of the 'read_replace_refs' global.

As you noted in [1], this could be clearer. I think the most confusing part
is referring to it as "a repository-scoped version of the...global" because
it implies that the global and repo-scoped setting do the same thing/take
the same precedence (when, in reality, if replace refs are disabled
globally, the config doesn't do anything). Maybe something like this would
make that clearer?

"This provides a repository-scoped configuration that's only used if replace
refs are not already disabled process-wide with the global
'read_replace_refs'."

[1] https://lore.kernel.org/git/ae89feda-0a76-29d7-14ce-662214414638@github.com/

> 
> The global still has its purpose: it is disabled process-wide by the
> GIT_NO_REPLACE_OBJECTS environment variable or by a call to
> disable_replace_refs() in some specific Git commands.
> 
> Since we already encapsulated the use of the constant inside
> replace_refs_enabled(), we can perform the initialization inside that
> method, if necessary. This solves the problem of forgetting to check the
> config, as we will check it before returning this value.
> 
> There is an interesting behavior change possible here: we now have a
> repository-scoped understanding of this config value. Thus, if there was
> a command that recurses into submodules and might follow replace refs,
> then it would now respect the core.useReplaceRefs config value in each
> repository.
> 
> Unfortunately, the existing processes that recurse into submodules do
> not appear to follow object IDs to their contents, so this behavior
> change is not visible in the current implementation. It is something
> valuable for future behavior changes.

AFAIK, the only '--recurse-submodules' commands that recurse in-process are
'ls-files' and 'grep'. However, 'grep' does call 'parse_object_or_die()',
which (further down in the call stack) calls 'lookup_replace_object()'.
Maybe I'm misreading and the replaced object isn't actually used, but could
'git grep --recurse-submodules' be used to test this?

> @@ -94,5 +94,14 @@ void disable_replace_refs(void)
>  
>  int replace_refs_enabled(struct repository *r)
>  {
> -	return read_replace_refs;
> +	if (!read_replace_refs)
> +		return 0;
> +
> +	if (r->gitdir) {
> +		prepare_repo_settings(r);
> +		return r->settings.read_replace_refs;
> +	}
> +
> +	/* repository has no objects or refs. */
> +	return 0;
>  }

This implementation matches the intent outlined in this patch/the cover
letter:

- if replace refs are disabled process-wide, always return 0
- if the gitdir is present, return the value of 'core.usereplacerefs'
- if there's no gitdir, there's no repository set up (and therefore no
  config to read/objects to replace), so return 0

I was a bit unsure about whether 'r->gitdir' was the right check to make,
but it's consistent with other gates to 'prepare_repo_settings()' (e.g.
those added in 059fda19021 (checkout/fetch/pull/pack-objects: allow `-h`
outside a repository, 2022-02-08)), so I'm happy with it.

> diff --git a/repo-settings.c b/repo-settings.c
> index 1df0320bf33..5a7c990300d 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -68,6 +68,7 @@ void prepare_repo_settings(struct repository *r)
>  	repo_cfg_bool(r, "pack.usebitmapboundarytraversal",
>  		      &r->settings.pack_use_bitmap_boundary_traversal,
>  		      r->settings.pack_use_bitmap_boundary_traversal);
> +	repo_cfg_bool(r, "core.usereplacerefs", &r->settings.read_replace_refs, 1);

This defaults to enabling replace refs, consistent with the (intended)
behavior prior to this series. Good!

>  
>  	/*
>  	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
> diff --git a/repository.h b/repository.h
> index c42f7ab6bdc..13fefa540bc 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -39,6 +39,14 @@ struct repo_settings {
>  	int pack_read_reverse_index;
>  	int pack_use_bitmap_boundary_traversal;
>  
> +	/*
> +	 * Do replace refs need to be checked this run?  This variable is
> +	 * initialized to true unless --no-replace-object is used or
> +	 * $GIT_NO_REPLACE_OBJECTS is set, but is set to false by some
> +	 * commands that do not want replace references to be active.
> +	 */
> +	int read_replace_refs;

I don't think this comment is accurate anymore, since the repo-scoped
'read_replace_refs' value is determined *only* by the 'core.usereplacerefs'
config. It's 'replace_refs_enabled()' that makes the overall determination
(taking into account 'GIT_NO_REPLACE_OBJECTS'/'--no-replace-object').

> +
>  	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
>  
>  	int index_version;

@derrickstolee derrickstolee changed the base branch from seen to tb/pack-bitmap-traversal-with-boundary June 1, 2023 17:01
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

User Jeff King <peff@peff.net> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Derrick Stolee wrote (reply to this), regarding 5fc2f92 (outdated):

On 6/1/2023 12:35 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> diff --git a/replace-object.h b/replace-object.h
>> index 7786d4152b0..b141075023e 100644
>> --- a/replace-object.h
>> +++ b/replace-object.h
>> @@ -27,6 +27,19 @@ void prepare_replace_object(struct repository *r);
>>  const struct object_id *do_lookup_replace_object(struct repository *r,
>>  						 const struct object_id *oid);
>>  
>> +
>> +/*
>> + * Some commands disable replace-refs unconditionally, and otherwise each
>> + * repository could alter the core.useReplaceRefs config value.
>> + *
>> + * Return 1 if and only if all of the following are true:
>> + *
>> + *  a. disable_replace_refs() has not been called.
>> + *  b. GIT_NO_REPLACE_OBJECTS is unset or zero.
>> + *  c. the given repository does not have core.useReplaceRefs=false.
>> + */
>> +int replace_refs_enabled(struct repository *r);
> 
> Since the purpose of this function is to access global state, would
> 'environment.[c|h]' be a more appropriate place for it (and
> 'disable_replace_refs()', for that matter)? There's also some precedent;
> 'set_shared_repository()' and 'get_shared_repository()' have a very similar
> design to what you've added here.
 
That's an interesting idea that I had not considered. My vague sense
is that it is worth isolating the functionality to this header instead
of lumping it into the giant environment.h header, but I've CC'd
Elijah (who is leading a lot of this header organization stuff) to see
if he has an opinion on this matter.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 1, 2023

On the Git mailing list, Derrick Stolee wrote (reply to this), regarding 481a81a (outdated):

On 6/1/2023 12:36 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> Unfortunately, the existing processes that recurse into submodules do
>> not appear to follow object IDs to their contents, so this behavior
>> change is not visible in the current implementation. It is something
>> valuable for future behavior changes.
> 
> AFAIK, the only '--recurse-submodules' commands that recurse in-process are
> 'ls-files' and 'grep'. However, 'grep' does call 'parse_object_or_die()',
> which (further down in the call stack) calls 'lookup_replace_object()'.
> Maybe I'm misreading and the replaced object isn't actually used, but could
> 'git grep --recurse-submodules' be used to test this?

You're right. I was laser-focused on 'ls-files', but it shouldn't be hard
to construct an example where 'git grep --recurse-submodules' would show
different behavior when this config is toggled.

>> +	/*
>> +	 * Do replace refs need to be checked this run?  This variable is
>> +	 * initialized to true unless --no-replace-object is used or
>> +	 * $GIT_NO_REPLACE_OBJECTS is set, but is set to false by some
>> +	 * commands that do not want replace references to be active.
>> +	 */
>> +	int read_replace_refs;
> 
> I don't think this comment is accurate anymore, since the repo-scoped
> 'read_replace_refs' value is determined *only* by the 'core.usereplacerefs'
> config. It's 'replace_refs_enabled()' that makes the overall determination
> (taking into account 'GIT_NO_REPLACE_OBJECTS'/'--no-replace-object').

Thank you for catching my mistake here. I'll be sure to update it in v2.

Thanks,
-Stolee

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 2, 2023

Submitted as pull.1537.v2.git.1685716157.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1537/derrickstolee/replace-refs-safety-v2

To fetch this version to local tag pr-1537/derrickstolee/replace-refs-safety-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1537/derrickstolee/replace-refs-safety-v2

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 3, 2023

On the Git mailing list, Elijah Newren wrote (reply to this), regarding 5fc2f92 (outdated):

On Thu, Jun 1, 2023 at 12:50 PM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 6/1/2023 12:35 PM, Victoria Dye wrote:
> > Derrick Stolee via GitGitGadget wrote:
> >> From: Derrick Stolee <derrickstolee@github.com>
>
> >> diff --git a/replace-object.h b/replace-object.h
> >> index 7786d4152b0..b141075023e 100644
> >> --- a/replace-object.h
> >> +++ b/replace-object.h
> >> @@ -27,6 +27,19 @@ void prepare_replace_object(struct repository *r);
> >>  const struct object_id *do_lookup_replace_object(struct repository *r,
> >>                                               const struct object_id *oid);
> >>
> >> +
> >> +/*
> >> + * Some commands disable replace-refs unconditionally, and otherwise each
> >> + * repository could alter the core.useReplaceRefs config value.
> >> + *
> >> + * Return 1 if and only if all of the following are true:
> >> + *
> >> + *  a. disable_replace_refs() has not been called.
> >> + *  b. GIT_NO_REPLACE_OBJECTS is unset or zero.
> >> + *  c. the given repository does not have core.useReplaceRefs=false.
> >> + */
> >> +int replace_refs_enabled(struct repository *r);
> >
> > Since the purpose of this function is to access global state, would
> > 'environment.[c|h]' be a more appropriate place for it (and
> > 'disable_replace_refs()', for that matter)? There's also some precedent;
> > 'set_shared_repository()' and 'get_shared_repository()' have a very similar
> > design to what you've added here.
>
> That's an interesting idea that I had not considered. My vague sense
> is that it is worth isolating the functionality to this header instead
> of lumping it into the giant environment.h header, but I've CC'd
> Elijah (who is leading a lot of this header organization stuff) to see
> if he has an opinion on this matter.

I haven't really formed much of an opinion on the sea of globals in
environment.h and elsewhere beyond "I sure wish we didn't have so many
globals".  Maybe I should have an opinion on it, but there was plenty
to clean up without worrying about all of those.  :-)

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 3, 2023

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi,

On Fri, Jun 2, 2023 at 7:29 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
[...]
> Updates in v2
> =============
>
> Thanks for the careful review on v1!
>
>  * disable_replace_refs() now replaces "read_replace_refs = 0" in the exact
>    same line to avoid possible behavior change.
>  * Stale comments, include headers, and commit messages are updated to
>    include the latest status.
>  * Patch 3 contains a test of the repo-scoped value using 'git grep'.

This version looks good to me; thanks for working on it!

(As a sidenote, it might be nice to call out Victoria's questions
about possibly moving stuff to environment.h[1], so that if others
have opinions on the matter they can comment on it.  I don't have one
at this time.)

[1] Last paragraph of
https://lore.kernel.org/git/49ea603b-ebbd-4a14-e0dd-07078e56de0a@github.com/

@@ -203,14 +203,12 @@ static struct commit_graph *alloc_commit_graph(void)
return g;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, René Scharfe wrote (reply to this):

Am 02.06.23 um 16:29 schrieb Derrick Stolee via GitGitGadget:
> diff --git a/replace-object.c b/replace-object.c
> index ceec81c940c..cf91c3ba456 100644
> --- a/replace-object.c
> +++ b/replace-object.c
> @@ -85,7 +85,14 @@ const struct object_id *do_lookup_replace_object(struct repository *r,
>  	die(_("replace depth too high for object %s"), oid_to_hex(oid));
>  }
>
> +static int read_replace_refs = 1;
> +

This breaks compilation:

   replace-object.c:88:12: error: static declaration of 'read_replace_refs' follows non-static declaration
   static int read_replace_refs = 1;
              ^
   ./replace-object.h:14:12: note: previous declaration is here
   extern int read_replace_refs;
              ^

And this variable is still referenced in two more places outside this
file, which won't work now that it has become static (file-scoped):

   $ git grep read_replace_refs
   builtin/commit-graph.c:extern int read_replace_refs;
   config.c:               read_replace_refs = git_config_bool(var, value);
   replace-object.c: * references, regardless of the value of read_replace_refs.
   replace-object.c:static int read_replace_refs = 1;
   replace-object.c:       read_replace_refs = 0;
   replace-object.c:       return read_replace_refs;
   replace-object.h:extern int read_replace_refs;

Perhaps postpone adding "static" to patch 3?

>  void disable_replace_refs(void)
>  {
>  	read_replace_refs = 0;
>  }
> +
> +int replace_refs_enabled(struct repository *r)
> +{
> +	return read_replace_refs;
> +}
> diff --git a/replace-object.h b/replace-object.h
> index 7786d4152b0..b141075023e 100644
> --- a/replace-object.h
> +++ b/replace-object.h
> @@ -27,6 +27,19 @@ void prepare_replace_object(struct repository *r);
>  const struct object_id *do_lookup_replace_object(struct repository *r,
>  						 const struct object_id *oid);
>
> +
> +/*
> + * Some commands disable replace-refs unconditionally, and otherwise each
> + * repository could alter the core.useReplaceRefs config value.
> + *
> + * Return 1 if and only if all of the following are true:
> + *
> + *  a. disable_replace_refs() has not been called.
> + *  b. GIT_NO_REPLACE_OBJECTS is unset or zero.
> + *  c. the given repository does not have core.useReplaceRefs=false.
> + */
> +int replace_refs_enabled(struct repository *r);
> +
>  /*
>   * If object sha1 should be replaced, return the replacement object's
>   * name (replaced recursively, if necessary).  The return value is
> @@ -41,7 +54,7 @@ const struct object_id *do_lookup_replace_object(struct repository *r,
>  static inline const struct object_id *lookup_replace_object(struct repository *r,
>  							    const struct object_id *oid)
>  {
> -	if (!read_replace_refs ||
> +	if (!replace_refs_enabled(r) ||
>  	    (r->objects->replace_map_initialized &&
>  	     r->objects->replace_map->map.tablesize == 0))
>  		return oid;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 6/3/2023 2:22 AM, René Scharfe wrote:
> Am 02.06.23 um 16:29 schrieb Derrick Stolee via GitGitGadget:
>> diff --git a/replace-object.c b/replace-object.c
>> index ceec81c940c..cf91c3ba456 100644
>> --- a/replace-object.c
>> +++ b/replace-object.c
>> @@ -85,7 +85,14 @@ const struct object_id *do_lookup_replace_object(struct repository *r,
>>  	die(_("replace depth too high for object %s"), oid_to_hex(oid));
>>  }
>>
>> +static int read_replace_refs = 1;
>> +
> 
> This breaks compilation:
> 
>    replace-object.c:88:12: error: static declaration of 'read_replace_refs' follows non-static declaration
>    static int read_replace_refs = 1;
>               ^
>    ./replace-object.h:14:12: note: previous declaration is here
>    extern int read_replace_refs;
>               ^

Thanks for finding this issue within the patch. The removal of the global
should have happened in this patch, but I missed it when adjusting things
for this version.

> And this variable is still referenced in two more places outside this
> file, which won't work now that it has become static (file-scoped):
> 
>    $ git grep read_replace_refs
>    builtin/commit-graph.c:extern int read_replace_refs;
>    config.c:               read_replace_refs = git_config_bool(var, value);
>    replace-object.c: * references, regardless of the value of read_replace_refs.
>    replace-object.c:static int read_replace_refs = 1;
>    replace-object.c:       read_replace_refs = 0;
>    replace-object.c:       return read_replace_refs;
>    replace-object.h:extern int read_replace_refs;
> 
> Perhaps postpone adding "static" to patch 3?

You're right that this won't work unless we fix the config-parsing code
already here, so I should move the introduction of the static global until
patch 3, as you suggest.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 3, 2023

User René Scharfe <l.s.r@web.de> has been added to the cc: list.

The 'read_replace_objects' constant is initialized by git_default_config
(if core.useReplaceRefs is disabled) and within setup_git_env (if
GIT_NO_REPLACE_OBJECTS) is set. To ensure that this variable cannot be
set accidentally in other places, wrap it in a replace_refs_enabled()
method.

Since we still assign this global in config.c, we are not able to remove
the global scope of this variable and make it a static within
replace-object.c. This will happen in a later change which will also
prevent the variable from being read before it is initialized.

Centralizing read access to the variable is an important first step.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2023

On the Git mailing list, Derrick Stolee wrote (reply to this), regarding 5fc2f92 (outdated):

On 6/2/2023 9:47 PM, Elijah Newren wrote:
> On Thu, Jun 1, 2023 at 12:50 PM Derrick Stolee <derrickstolee@github.com> wrote:
>>
>> On 6/1/2023 12:35 PM, Victoria Dye wrote:
>>> Derrick Stolee via GitGitGadget wrote:
>>>> From: Derrick Stolee <derrickstolee@github.com>
>>
>>>> diff --git a/replace-object.h b/replace-object.h
>>>> index 7786d4152b0..b141075023e 100644
>>>> --- a/replace-object.h
>>>> +++ b/replace-object.h
>>>> @@ -27,6 +27,19 @@ void prepare_replace_object(struct repository *r);
>>>>  const struct object_id *do_lookup_replace_object(struct repository *r,
>>>>                                               const struct object_id *oid);
>>>>
>>>> +
>>>> +/*
>>>> + * Some commands disable replace-refs unconditionally, and otherwise each
>>>> + * repository could alter the core.useReplaceRefs config value.
>>>> + *
>>>> + * Return 1 if and only if all of the following are true:
>>>> + *
>>>> + *  a. disable_replace_refs() has not been called.
>>>> + *  b. GIT_NO_REPLACE_OBJECTS is unset or zero.
>>>> + *  c. the given repository does not have core.useReplaceRefs=false.
>>>> + */
>>>> +int replace_refs_enabled(struct repository *r);
>>>
>>> Since the purpose of this function is to access global state, would
>>> 'environment.[c|h]' be a more appropriate place for it (and
>>> 'disable_replace_refs()', for that matter)? There's also some precedent;
>>> 'set_shared_repository()' and 'get_shared_repository()' have a very similar
>>> design to what you've added here.
>>
>> That's an interesting idea that I had not considered. My vague sense
>> is that it is worth isolating the functionality to this header instead
>> of lumping it into the giant environment.h header, but I've CC'd
>> Elijah (who is leading a lot of this header organization stuff) to see
>> if he has an opinion on this matter.
> 
> I haven't really formed much of an opinion on the sea of globals in
> environment.h and elsewhere beyond "I sure wish we didn't have so many
> globals".  Maybe I should have an opinion on it, but there was plenty
> to clean up without worrying about all of those.  :-)

Thanks for chiming in (even with "I haven't thought about it too much").

Thinking back on this with some time since the initial question, I think
the grouping "global state" into environment.h isn't the right goal.
Using replace-object.h collects all _functionality related to the feature_
in a single place, and it just so happens to include some global state due
to the needs of the feature.

I plan to keep these methods in replace-object.h. With that, we have only
20 files that include that, as opposed to 141 including environment.h.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 5, 2023

On the Git mailing list, Victoria Dye wrote (reply to this), regarding 4c7dbeb (outdated):

Derrick Stolee via GitGitGadget wrote:
> diff --git a/repository.h b/repository.h
> index c42f7ab6bdc..5315e0852e3 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -39,6 +39,15 @@ struct repo_settings {
>  	int pack_read_reverse_index;
>  	int pack_use_bitmap_boundary_traversal;
>  
> +	/*
> +	 * Has this repository have core.useReplaceRefs=true (on by

s/Has/Does(?)

It's not a huge deal, but if you were planning to re-roll anyway due to [1],
this should be easy enough to include.

[1] https://lore.kernel.org/git/a1967c58-48c5-6ae0-f60a-2d8c107f8f66@web.de/

> +	 * default)? This provides a repository-scoped version of this
> +	 * config, though it could be disabled process-wide via some Git
> +	 * builtins or the --no-replace-objects option. See
> +	 * replace_refs_enabled() for more details.
> +	 */
> +	int read_replace_refs;
> +
>  	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
>  
>  	int index_version;
> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index 8143817b19e..d37c83b4640 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -594,4 +594,44 @@ test_expect_success 'grep partially-cloned submodule' '
>  	)
>  '
>  
> +test_expect_success 'check scope of core.useReplaceRefs' '
> +	git init base &&
> +	git init base/sub &&
> +
> +	echo A >base/a &&
> +	echo B >base/b &&
> +	echo C >base/sub/c &&
> +	echo D >base/sub/d &&
> +
> +	git -C base/sub add c d &&
> +	git -C base/sub commit -m "Add files" &&
> +
> +	git -C base submodule add ./sub &&
> +	git -C base add a b sub &&
> +	git -C base commit -m "Add files and submodule" &&
> +
> +	A=$(git -C base rev-parse HEAD:a) &&
> +	B=$(git -C base rev-parse HEAD:b) &&
> +	C=$(git -C base/sub rev-parse HEAD:c) &&
> +	D=$(git -C base/sub rev-parse HEAD:d) &&
> +
> +	git -C base replace $A $B &&
> +	git -C base/sub replace $C $D &&
> +
> +	test_must_fail git -C base grep --cached --recurse-submodules A &&
> +	test_must_fail git -C base grep --cached --recurse-submodules C &&

First, we verify that replace refs are enabled on both the superproject and
submodule by default (if they are, 'a' has been replaced with 'b' and 'c'
has been replaced with 'd')...

> +
> +	git -C base config core.useReplaceRefs false &&
> +	git -C base grep --recurse-submodules A &&
> +	test_must_fail git -C base grep --cached --recurse-submodules C &&

...then, if replace refs are disabled in the superproject (but not the
submodule), 'b' no longer replaces 'a' but 'd' still replaces 'c'...

> +
> +	git -C base/sub config core.useReplaceRefs false &&
> +	git -C base grep --cached --recurse-submodules A &&
> +	git -C base grep --cached --recurse-submodules C &&

...but once replace refs are disabled in the submodule, 'd' no longer
replaces 'c', and 'b' still doesn't replace 'a'...

> +
> +	git -C base config --unset core.useReplaceRefs &&
> +	test_must_fail git -C base grep --cached --recurse-submodules A &&
> +	git -C base grep --cached --recurse-submodules C

...and, finally, if we restore the default state in the superproject
(replace refs enabled) but not the submodule, 'b' once again replaces 'a',
but 'd' still doesn't replace 'c'.

Thanks for adding this test! It clearly and thoroughly exercises the updated
config behavior.

> +'
> +
>  test_done

The 'read_replace_refs' global specifies whether or not we should
respect the references of the form 'refs/replace/<oid>' to replace which
object we look up when asking for '<oid>'. This global has caused issues
when it is not initialized properly, such as in b6551fe (merge-tree:
load default git config, 2023-05-10).

To make this more robust, move its config-based initialization out of
git_default_config and into prepare_repo_settings(). This provides a
repository-scoped version of the 'read_replace_refs' global.

The global still has its purpose: it is disabled process-wide by the
GIT_NO_REPLACE_OBJECTS environment variable or by a call to
disable_replace_refs() in some specific Git commands.

Since we already encapsulated the use of the constant inside
replace_refs_enabled(), we can perform the initialization inside that
method, if necessary. This solves the problem of forgetting to check the
config, as we will check it before returning this value.

Due to this encapsulation, the global can move to be static within
replace-object.c.

There is an interesting behavior change possible here: we now have a
repository-scoped understanding of this config value. Thus, if there was
a command that recurses into submodules and might follow replace refs,
then it would now respect the core.useReplaceRefs config value in each
repository.

'git grep --recurse-submodules' is such a command that recurses into
submodules in-process. We can demonstrate the granularity of this config
value via a test in t7814.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 6, 2023

There was a status update in the "Cooking" section about the branch ds/disable-replace-refs on the Git mailing list:

Introduce a mechanism to disable replace refs globally and per
repository.

Will merge to 'next'.
source: <pull.1537.v2.git.1685716157.gitgitgadget@gmail.com>

@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 6, 2023

Submitted as pull.1537.v3.git.1686057877.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1537/derrickstolee/replace-refs-safety-v3

To fetch this version to local tag pr-1537/derrickstolee/replace-refs-safety-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1537/derrickstolee/replace-refs-safety-v3

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 6, 2023

On the Git mailing list, Victoria Dye wrote (reply to this):

Derrick Stolee via GitGitGadget wrote:
> Updates in v3
> =============
> 
> Thanks for the review on v2!
> 
>  * The removal of the global from environment.c is delayed to patch 3
>    because config.c still assigns the value in patch 2.
>  * The comment for the member in the repo_settings struct is modified for
>    better grammar.
> 

The changes in this version were small (and easy to review via range-diff);
everything looks good to me. Thanks for the updates!

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 12, 2023

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Updates in v3
> =============
>
> Thanks for the review on v2!
>
>  * The removal of the global from environment.c is delayed to patch 3
>    because config.c still assigns the value in patch 2.
>  * The comment for the member in the repo_settings struct is modified for
>    better grammar.

Thanks.  Will queue.  Let's merge it down to 'next'.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 12, 2023

This patch series was integrated into seen via git@ef11198.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 13, 2023

This patch series was integrated into seen via git@7f5adc6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 13, 2023

There was a status update in the "Cooking" section about the branch ds/disable-replace-refs on the Git mailing list:

Introduce a mechanism to disable replace refs globally and per
repository.

Will merge to 'next'.
source: <pull.1537.v3.git.1686057877.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 13, 2023

This patch series was integrated into seen via git@2c8616b.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2023

This patch series was integrated into seen via git@2057d2b.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2023

This patch series was integrated into seen via git@b6f9b74.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2023

This patch series was integrated into next via git@82ba5a6.

@gitgitgadget gitgitgadget bot added the next label Jun 14, 2023
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2023

There was a status update in the "Cooking" section about the branch ds/disable-replace-refs on the Git mailing list:

Introduce a mechanism to disable replace refs globally and per
repository.

Will merge to 'master'.
source: <pull.1537.v3.git.1686057877.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2023

This patch series was integrated into seen via git@c3e0386.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2023

This patch series was integrated into next via git@26c3a34.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant