-
Notifications
You must be signed in to change notification settings - Fork 26.7k
Fix CI/PR failures in the "Documentation" job #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CI/PR failures in the "Documentation" job #707
Conversation
dc3135a
to
c74fa46
Compare
A recent update in the Linux VM images used by Azure Pipelines surfaced a new problem in the "Documentation" job. Apparently, this warning appears 396 times on `stderr` when running `make doc`: /usr/lib/ruby/vendor_ruby/rubygems/defaults/operating_system.rb:10: warning: constant Gem::ConfigMap is deprecated This problem was already reported to the `rubygems` project via rubygems/rubygems#3068. As there is nothing Git can do about this warning, and as the "Documentation" job reports this warning as a failure, let's just silence it and move on. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
c74fa46
to
be59f18
Compare
/submit |
On the Git mailing list, Taylor Blau wrote (reply to this):
|
This branch is now known as |
This patch series was integrated into pu via db280cc. |
This patch series was integrated into pu via 2c51ec0. |
This patch series was integrated into pu via c47d377. |
This patch series was integrated into next via 4fec291. |
This patch series was integrated into pu via 0de2d14. |
This patch series was integrated into next via 0de2d14. |
This patch series was integrated into master via 0de2d14. |
Closed via 0de2d14. |
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included: * Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned. * Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need. Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included: * Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned. * Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need. Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
It would probably make sense to apply this to
maint
, too, as it will cause CI failures even if there is nothing actionable to be done to really fix this on Git's side.