Skip to content

Conversation

dscho
Copy link
Member

@dscho dscho commented Feb 9, 2020

It would probably make sense to apply this to maint, too, as it will cause CI failures even if there is nothing actionable to be done to really fix this on Git's side.

A recent update in the Linux VM images used by Azure Pipelines surfaced
a new problem in the "Documentation" job. Apparently, this warning
appears 396 times on `stderr` when running `make doc`:

/usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb:10: warning: constant Gem::ConfigMap is deprecated

This problem was already reported to the `rubygems` project via
rubygems/rubygems#3068.

As there is nothing Git can do about this warning, and as the
"Documentation" job reports this warning as a failure, let's just
silence it and move on.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the workaround-for-rubygems-using-deprecated-component branch from c74fa46 to be59f18 Compare February 9, 2020 21:23
@dscho
Copy link
Member Author

dscho commented Feb 9, 2020

/submit

@gitgitgadget-git
Copy link

@gitgitgadget-git
Copy link

On the Git mailing list, Taylor Blau wrote (reply to this):

Hi Johannes,

On Sun, Feb 09, 2020 at 10:36:16PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> A recent update in the Linux VM images used by Azure Pipelines surfaced
> a new problem in the "Documentation" job. Apparently, this warning
> appears 396 times on `stderr` when running `make doc`:
>
> /usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb:10: warning: constant Gem::ConfigMap is deprecated
>
> This problem was already reported to the `rubygems` project via
> https://github.com/rubygems/rubygems/issues/3068.
>
> As there is nothing Git can do about this warning, and as the
> "Documentation" job reports this warning as a failure, let's just
> silence it and move on.

Thanks for explaining, and for taking time to silence errors such as
these that we can't do anything about. Everything as explained makes
good sense to me.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Fix CI/PR failures in the "Documentation" job
>
>     It would probably make sense to apply this to maint, too, as it will
>     cause CI failures even if there is nothing actionable to be done to
>     really fix this on Git's side.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-707%2Fdscho%2Fworkaround-for-rubygems-using-deprecated-component-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-707/dscho/workaround-for-rubygems-using-deprecated-component-v1
> Pull-Request: https://github.com/git/git/pull/707
>
>  ci/test-documentation.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/ci/test-documentation.sh b/ci/test-documentation.sh
> index b3e76ef863..de41888430 100755
> --- a/ci/test-documentation.sh
> +++ b/ci/test-documentation.sh
> @@ -7,6 +7,7 @@
>
>  filter_log () {
>  	sed -e '/^GIT_VERSION = /d' \
> +	    -e "/constant Gem::ConfigMap is deprecated/d" \
>  	    -e '/^    \* new asciidoc flags$/d' \
>  	    -e '/stripped namespace before processing/d' \
>  	    -e '/Attributed.*IDs for element/d' \
>
> base-commit: d0654dc308b0ba76dd8ed7bbb33c8d8f7aacd783
> --
> gitgitgadget

Thanks,
Taylor

@gitgitgadget-git
Copy link

This branch is now known as js/ci-squelch-doc-warning.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via db280cc.

@gitgitgadget-git gitgitgadget-git bot added the pu label Feb 10, 2020
@gitgitgadget-git
Copy link

This patch series was integrated into pu via 2c51ec0.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via c47d377.

@gitgitgadget-git
Copy link

This patch series was integrated into next via 4fec291.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via 0de2d14.

@gitgitgadget-git
Copy link

This patch series was integrated into next via 0de2d14.

@gitgitgadget-git
Copy link

This patch series was integrated into master via 0de2d14.

@gitgitgadget-git
Copy link

Closed via 0de2d14.

@dscho dscho deleted the workaround-for-rubygems-using-deprecated-component branch February 13, 2020 09:15
derrickstolee pushed a commit to derrickstolee/git that referenced this pull request Aug 5, 2025
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.

While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.

In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.

I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).

In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.

As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:

1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.

2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.

There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).

The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.

One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.

Alternative options considered included:

* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.

* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.

Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
derrickstolee pushed a commit to derrickstolee/git that referenced this pull request Sep 2, 2025
The --path-walk option in 'git pack-objects' is implied by the
pack.usePathWalk=true config value. This is intended to help the
packfile generation within 'git push' specifically.

While this config does enable the path-walk feature, it does not lead
the expected levels of compression in the cases it was designed to
handle. This is due to the default implication of the --reuse-delta
option as well as auto-GC.

In the performance tests used to evaluate the --path-walk option, such
as those in p5313, the --no-reuse-delta option is used to ensure that
deltas are recomputed according to the new object walk. However, it was
assumed (I assumed this) that when the objects were loose from
client-side operations that better deltas would be computed during this
operation. This wasn't confirmed because the test process used data that
was fetched from real repositories and thus existed in packed form only.

I was able to confirm that this does not reproduce when the objects to
push are loose. Careful use of making the pushed commit unreachable and
loosening the objects via 'git repack -Ad' helps to confirm my
suspicions here. Independent of this change, I'm pushing for these
pipeline agents to set 'gc.auto=0' before creating their Git objects. In
the current setup, the repo is adding objects and then incrementally
repacking them and ending up with bad cross-path deltas. This approach
can help scenarios where that makes sense, but will not cover all of our
users without them choosing to opt-in to background maintenance (and
even then, an incremental repack could cost them efficiency).

In order to make sure we are getting the intended compression in 'git
push', this change makes the --path-walk option imply --no-reuse-delta
when the --reuse-delta option is not provided.

As far as I can tell, the main motivation for implying the --reuse-delta
option by default is two-fold:

1. The code in send-pack.c that executes 'git pack-objects' is ignorant
of whether the current process is a client pushing to a remote or a
remote sending a fetch or clone to a client.

2. For servers, it is critical that they trust the previously computed
deltas whenever possible, or they could overload their CPU resources.

There's also the side that most servers use repacking logic that will
replace any bad deltas that are sent by clients (or at least, that's the
hope; we've seen that repacks can also pick bad deltas).

The --path-walk option at the moment is not compatible with reachability
bitmaps, so is not planned to be used by Git servers. Thus, we can
reasonably assume (for now) that the --path-walk option is assuming a
client-side scenario, either a push or a repack. The repack option will
be explicit about the --reuse-delta option or not.

One thing to be careful about is background maintenance, which uses a
list of objects instead of refs, so we condition this on the case where
the --path-walk option will be effective by checking that the --revs
option was provided.

Alternative options considered included:

* Adding _another_ config ('pack.reuseDelta=false') to opt-in to this
choice. However, we already have pack.usePathWalk=true as an opt-in to
"do the right thing to make my data small" as far as our internal users
are concerned.

* Modify the chain between builtin/push.c, transport.c, and
builtin/send-pack.c to communicate that we are in "push" mode, not
within a fetch or clone. However, this seemed like overkill. It may be
beneficial in the future to pass through a mode like this, but it does
not meet the bar for the immediate need.

Reviewers, please see git-for-windows#5171 for the baseline
implementation of this feature within Git for Windows and thus
microsoft/git. This feature is still under review upstream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant