Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some more style fixes for the make-stash-a-builtin branch #1

Merged
merged 11 commits into from
Jan 18, 2016

Conversation

dscho
Copy link

@dscho dscho commented Jan 18, 2016

No description provided.

Use spaces around '=' and '==', after 'if', ',' and ';'.
Git's source files all end in newline characters because that is what
vim does anyways.
Repeat upstream's `!strcmp()` idiom.
Break up cuddled lines by introducing empty lines.
Do not use "type* ", but rather "type *", as upstream does.
Git's source code tends to use meaningful parameter names even in
function declarations, as they serve as documentation, too.
Avoid C90/C++ style comments. Git's source code tries to use only
old school comments.
@dscho
Copy link
Author

dscho commented Jan 18, 2016

For convenience, I prepared this branch so that git rebase -i --autosquash would collapse all of these changes into the appropriate commit. Unfortunately, the commit subjects are now really not very helpful.

@rimrul
Copy link
Owner

rimrul commented Jan 18, 2016

I can just merge this, run git rebase -i --autosquash and push it again?

@rimrul rimrul merged this pull request into rimrul:make-stash-a-builtin Jan 18, 2016
@rimrul
Copy link
Owner

rimrul commented Jan 18, 2016

Turns out I could.

@dscho dscho deleted the make-stash-a-builtin branch January 19, 2016 06:16
rimrul pushed a commit that referenced this pull request Jul 2, 2021
ibuf can be reused for multiple iterations of the loop. Specifically:
deflate() overwrites s.avail_in to show how much of the input buffer
has not been processed yet - and sometimes leaves 'avail_in > 0', in
which case ibuf will be processed again during the loop's subsequent
iteration.

But if we declare ibuf within the loop, then (in theory) we get a new
(and uninitialised) buffer for every iteration. In practice, my compiler
seems to resue the same buffer - meaning that this code does work - but
it doesn't seem safe to rely on this behaviour. MSAN correctly catches
this issue - as soon as we hit the 's.avail_in > 0' condition, we end up
reading from what seems to be uninitialised memory.

Therefore, we move ibuf out of the loop, making this reuse safe.

See MSAN output from t1050-large below - the interesting part is the
ibuf creation at the end, although there's a lot of indirection before
we reach the read from unitialised memory:

==11294==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7f75db58fb1c in crc32_little crc32.c:283:9
    #1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
    #2 0x7f75db59668c in crc32 crc32.c:242:12
    git-for-windows#3 0x8c94f8 in hashwrite csum-file.c:101:15
    git-for-windows#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
    git-for-windows#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#7 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#8 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#9 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#10 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#11 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#12 0x4a99c3 in add_files add.c:458:7
    git-for-windows#13 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#14 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#15 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#16 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#17 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#18 0x7974da in main common-main.c:52:11
    git-for-windows#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    git-for-windows#20 0x421bd9 in _start start.S:120

  Uninitialized value was stored to memory at
    #0 0x7f75db58fa6b in crc32_little crc32.c:283:9
    #1 0x7f75db58d5b3 in crc32_z crc32.c:220:20
    #2 0x7f75db59668c in crc32 crc32.c:242:12
    git-for-windows#3 0x8c94f8 in hashwrite csum-file.c:101:15
    git-for-windows#4 0x825faf in stream_to_pack bulk-checkin.c:154:5
    git-for-windows#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#7 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#8 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#9 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#10 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#11 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#12 0x4a99c3 in add_files add.c:458:7
    git-for-windows#13 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#14 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#15 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#16 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#17 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#18 0x7974da in main common-main.c:52:11
    git-for-windows#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5c2011 in flush_pending deflate.c:746:5
    #2 0x7f75db5cafa0 in deflate_stored deflate.c:1815:9
    git-for-windows#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    git-for-windows#4 0xd80b7f in git_deflate zlib.c:244:12
    git-for-windows#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    git-for-windows#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#8 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#9 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#10 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#11 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#12 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#13 0x4a99c3 in add_files add.c:458:7
    git-for-windows#14 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#15 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#16 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#17 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#18 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#19 0x7974da in main common-main.c:52:11

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db644241 in _tr_stored_block trees.c:873:5
    #2 0x7f75db5cad7c in deflate_stored deflate.c:1813:9
    git-for-windows#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    git-for-windows#4 0xd80b7f in git_deflate zlib.c:244:12
    git-for-windows#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    git-for-windows#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#8 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#9 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#10 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#11 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#12 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#13 0x4a99c3 in add_files add.c:458:7
    git-for-windows#14 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#15 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#16 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#17 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#18 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#19 0x7974da in main common-main.c:52:11

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5c8fcf in deflate_stored deflate.c:1783:9
    #2 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    git-for-windows#3 0xd80b7f in git_deflate zlib.c:244:12
    git-for-windows#4 0x825dff in stream_to_pack bulk-checkin.c:140:12
    git-for-windows#5 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#6 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#7 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#8 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#9 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#10 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#11 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#12 0x4a99c3 in add_files add.c:458:7
    git-for-windows#13 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#14 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#15 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#16 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#17 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#18 0x7974da in main common-main.c:52:11
    git-for-windows#19 0x7f75da66f349 in __libc_start_main (/lib64/libc.so.6+0x24349)

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy msan_interceptors.cpp:1558:3
    #1 0x7f75db5ea545 in read_buf deflate.c:1181:5
    #2 0x7f75db5c97f7 in deflate_stored deflate.c:1791:9
    git-for-windows#3 0x7f75db5bb7d2 in deflate deflate.c:1005:34
    git-for-windows#4 0xd80b7f in git_deflate zlib.c:244:12
    git-for-windows#5 0x825dff in stream_to_pack bulk-checkin.c:140:12
    git-for-windows#6 0x82467b in deflate_to_pack bulk-checkin.c:225:8
    git-for-windows#7 0x823ff1 in index_bulk_checkin bulk-checkin.c:264:15
    git-for-windows#8 0xa7cff2 in index_stream object-file.c:2234:9
    git-for-windows#9 0xa7bff7 in index_fd object-file.c:2256:9
    git-for-windows#10 0xa7d22d in index_path object-file.c:2274:7
    git-for-windows#11 0xb3c8c9 in add_to_index read-cache.c:802:7
    git-for-windows#12 0xb3e039 in add_file_to_index read-cache.c:835:9
    git-for-windows#13 0x4a99c3 in add_files add.c:458:7
    git-for-windows#14 0x4a7276 in cmd_add add.c:670:18
    git-for-windows#15 0x4a1e76 in run_builtin git.c:461:11
    git-for-windows#16 0x49e1e7 in handle_builtin git.c:714:3
    git-for-windows#17 0x4a0c08 in run_argv git.c:781:4
    git-for-windows#18 0x49d5a8 in cmd_main git.c:912:19
    git-for-windows#19 0x7974da in main common-main.c:52:11

  Uninitialized value was created by an allocation of 'ibuf' in the stack frame of function 'stream_to_pack'
    #0 0x825710 in stream_to_pack bulk-checkin.c:101

SUMMARY: MemorySanitizer: use-of-uninitialized-value crc32.c:283:9 in crc32_little
Exiting

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
cache_entry contains an object_id, and compare_ce_content() would
include that field when calling memcmp on a subset of the cache_entry.
Depending on which hashing algorithm is being used, only part of
object_id.hash is actually being used, therefore including it in a
memcmp() is incorrect. Instead we choose to exclude the object_id when
calling memcmp(), and call oideq() separately.

This issue was found when running t1700-split-index with MSAN, see MSAN
output below (on my machine, offset 76 corresponds to 4 bytes after the
start of object_id.hash).

Uninitialized bytes in MemcmpInterceptorCommon at offset 76 inside [0x7f60e7c00118, 92)
==27914==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4524ee in memcmp /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:873:10
    #1 0xc867ae in compare_ce_content /home/ahunt/git/git/split-index.c:208:8
    #2 0xc859fb in prepare_to_write_split_index /home/ahunt/git/git/split-index.c:336:9
    git-for-windows#3 0xb4bbca in write_split_index /home/ahunt/git/git/read-cache.c:3107:2
    git-for-windows#4 0xb42b4d in write_locked_index /home/ahunt/git/git/read-cache.c:3295:8
    git-for-windows#5 0x638058 in try_merge_strategy /home/ahunt/git/git/builtin/merge.c:758:7
    git-for-windows#6 0x63057f in cmd_merge /home/ahunt/git/git/builtin/merge.c:1663:9
    git-for-windows#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
    git-for-windows#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
    git-for-windows#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
    git-for-windows#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
    git-for-windows#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
    git-for-windows#12 0x7f60e928e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    git-for-windows#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

  Uninitialized value was stored to memory at
    #0 0x447eb9 in __msan_memcpy /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0xb4d1e6 in dup_cache_entry /home/ahunt/git/git/read-cache.c:3457:2
    #2 0xd214fa in add_entry /home/ahunt/git/git/unpack-trees.c:215:18
    git-for-windows#3 0xd1fae0 in keep_entry /home/ahunt/git/git/unpack-trees.c:2276:2
    git-for-windows#4 0xd1ff9e in twoway_merge /home/ahunt/git/git/unpack-trees.c:2504:11
    git-for-windows#5 0xd27028 in call_unpack_fn /home/ahunt/git/git/unpack-trees.c:593:12
    git-for-windows#6 0xd2443d in unpack_nondirectories /home/ahunt/git/git/unpack-trees.c:1106:12
    git-for-windows#7 0xd19435 in unpack_callback /home/ahunt/git/git/unpack-trees.c:1306:6
    git-for-windows#8 0xd0d7ff in traverse_trees /home/ahunt/git/git/tree-walk.c:532:17
    git-for-windows#9 0xd1773a in unpack_trees /home/ahunt/git/git/unpack-trees.c:1683:9
    git-for-windows#10 0xdc6370 in checkout /home/ahunt/git/git/merge-ort.c:3590:8
    git-for-windows#11 0xdc51c3 in merge_switch_to_result /home/ahunt/git/git/merge-ort.c:3728:7
    git-for-windows#12 0xa195a9 in merge_ort_recursive /home/ahunt/git/git/merge-ort-wrappers.c:58:2
    git-for-windows#13 0x637fff in try_merge_strategy /home/ahunt/git/git/builtin/merge.c:751:12
    git-for-windows#14 0x63057f in cmd_merge /home/ahunt/git/git/builtin/merge.c:1663:9
    git-for-windows#15 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
    git-for-windows#16 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
    git-for-windows#17 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
    git-for-windows#18 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
    git-for-windows#19 0x7974da in main /home/ahunt/git/git/common-main.c:52:11

  Uninitialized value was created by a heap allocation
    #0 0x44e73d in malloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/msan_interceptors.cpp:901:3
    #1 0xd592f6 in do_xmalloc /home/ahunt/git/git/wrapper.c:41:8
    #2 0xd59248 in xmalloc /home/ahunt/git/git/wrapper.c:62:9
    git-for-windows#3 0xa17088 in mem_pool_alloc_block /home/ahunt/git/git/mem-pool.c:22:6
    git-for-windows#4 0xa16f78 in mem_pool_init /home/ahunt/git/git/mem-pool.c:44:3
    git-for-windows#5 0xb481b8 in load_all_cache_entries /home/ahunt/git/git/read-cache.c
    git-for-windows#6 0xb44d40 in do_read_index /home/ahunt/git/git/read-cache.c:2298:17
    git-for-windows#7 0xb48a1b in read_index_from /home/ahunt/git/git/read-cache.c:2389:8
    git-for-windows#8 0xbd5a0b in repo_read_index /home/ahunt/git/git/repository.c:276:8
    git-for-windows#9 0xb4bcaf in repo_read_index_unmerged /home/ahunt/git/git/read-cache.c:3326:2
    git-for-windows#10 0x62ed26 in cmd_merge /home/ahunt/git/git/builtin/merge.c:1362:6
    git-for-windows#11 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
    git-for-windows#12 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
    git-for-windows#13 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
    git-for-windows#14 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
    git-for-windows#15 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
    git-for-windows#16 0x7f60e928e349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:873:10 in memcmp
Exiting

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
…ints

report_result() sends a struct to the parent process, but that struct
would contain uninitialised padding bytes. Running this code under MSAN
rightly triggers a warning - but we don't particularly care about this
warning because we control the receiving code, and we therefore know
that those padding bytes won't be read on the receiving end.

We could simply suppress this warning under MSAN with the approporiate
ifdef'd attributes, but a less intrusive solution is to 0-initialise the
struct, which guarantees that the padding will also be initialised.

Interestingly, in the error-case branch, we only try to copy the first
two members of pc_item_result, by copying only PC_ITEM_RESULT_BASE_SIZE
bytes. However PC_ITEM_RESULT_BASE_SIZE is defined as
'offsetof(the_last_member)', which means that we're copying padding bytes
after the end of the second last member. We could avoid doing this by
redefining PC_ITEM_RESULT_BASE_SIZE as
'offsetof(second_last_member) + sizeof(second_last_member)', but there's
no huge benefit to doing so (and this patch silences the MSAN warning in
this scenario either way).

MSAN output from t2080 (partially interleaved due to the
parallel work :) ):

Uninitialized bytes in __interceptor_write at offset 12 inside [0x7fff37d83408, 160)
==23279==WARNING: MemorySanitizer: use-of-uninitialized-value
Uninitialized bytes in __interceptor_write at offset 12 inside [0x7ffdb8a07ec8, 160)
==23280==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0xd5ac28 in xwrite /home/ahunt/git/git/wrapper.c:256:8
    #1 0xd5b327 in write_in_full /home/ahunt/git/git/wrapper.c:311:21
    #2 0xb0a8c4 in do_packet_write /home/ahunt/git/git/pkt-line.c:221:6
    git-for-windows#3 0xb0a5fd in packet_write /home/ahunt/git/git/pkt-line.c:242:6
    git-for-windows#4 0x4f7441 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:69:2
    git-for-windows#5 0x4f6be6 in worker_loop /home/ahunt/git/git/builtin/checkout--worker.c:100:3
    git-for-windows#6 0x4f68d3 in cmd_checkout__worker /home/ahunt/git/git/builtin/checkout--worker.c:143:2
    git-for-windows#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
    git-for-windows#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
    git-for-windows#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
    git-for-windows#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
    git-for-windows#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
    git-for-windows#12 0x7f8778114349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    git-for-windows#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

  Uninitialized value was created by an allocation of 'res' in the stack frame of function 'report_result'
    #0 0x4f72c0 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:55

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/ahunt/git/git/wrapper.c:256:8 in xwrite
Exiting
    #0 0xd5ac28 in xwrite /home/ahunt/git/git/wrapper.c:256:8
    #1 0xd5b327 in write_in_full /home/ahunt/git/git/wrapper.c:311:21
    #2 0xb0a8c4 in do_packet_write /home/ahunt/git/git/pkt-line.c:221:6
    git-for-windows#3 0xb0a5fd in packet_write /home/ahunt/git/git/pkt-line.c:242:6
    git-for-windows#4 0x4f7441 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:69:2
    git-for-windows#5 0x4f6be6 in worker_loop /home/ahunt/git/git/builtin/checkout--worker.c:100:3
    git-for-windows#6 0x4f68d3 in cmd_checkout__worker /home/ahunt/git/git/builtin/checkout--worker.c:143:2
    git-for-windows#7 0x4a1e76 in run_builtin /home/ahunt/git/git/git.c:461:11
    git-for-windows#8 0x49e1e7 in handle_builtin /home/ahunt/git/git/git.c:714:3
    git-for-windows#9 0x4a0c08 in run_argv /home/ahunt/git/git/git.c:781:4
    git-for-windows#10 0x49d5a8 in cmd_main /home/ahunt/git/git/git.c:912:19
    git-for-windows#11 0x7974da in main /home/ahunt/git/git/common-main.c:52:11
    git-for-windows#12 0x7f2749a0e349 in __libc_start_main (/lib64/libc.so.6+0x24349)
    git-for-windows#13 0x421bd9 in _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

  Uninitialized value was created by an allocation of 'res' in the stack frame of function 'report_result'
    #0 0x4f72c0 in report_result /home/ahunt/git/git/builtin/checkout--worker.c:55

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/ahunt/git/git/wrapper.c:256:8 in xwrite

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem finding
all the other lines in the diff that match the current line involves a
fuzzy lookup so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 88% and simplifies the code.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
  Range (min … max):    9.917 s … 10.037 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
  Range (min … max):    1.214 s …  1.226 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
Calling xdiff_compare_lines() directly rather than using a function
pointer from the hash map reduces the time very slightly but more
importantly it will allow us to easily combine pmb_advance_or_null()
and pmb_advance_or_null_multi_match() in the next commit.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
  Range (min … max):    1.130 s …  1.141 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
  Range (min … max):    1.114 s …  1.121 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
As libxdiff does not have a whitespace flag to ignore the indentation
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This is filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks which improves the performance by 14% and paves the
way for the elimination of string comparisons in the next commit.

This change slightly increases the runtime of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the changes in the next commit there
is no measurable benefit.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.116 s ±  0.005 s    [User: 1.057 s, System: 0.056 s]
  Range (min … max):    1.109 s …  1.123 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.216 s ±  0.005 s    [User: 1.155 s, System: 0.059 s]
  Range (min … max):    1.206 s …  1.223 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
  Range (min … max):    1.140 s …  1.154 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
  Range (min … max):    1.043 s …  1.056 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jul 2, 2021
Taking inspiration from xdl_classify_record() assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line) but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons
which involve chasing a pointer with a simple integer comparison.  On
a large diff this commit reduces the time to run 'diff --color-moved'
by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%.

Compared to master the time to run 'git log --patch --color-moved' is
increased by 2% and 'git log --patch
--color-moved-ws=allow-indentation-change' in reduced by 14%. These
timings were performed on an i5-7200U, on an i5-3470 both commands are
faster than master. The small speed decrease on commit sized diffs is
unfortunate but I think it is small enough to be worth it for the
gains on larger diffs.

Large diff before this change:
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
  Range (min … max):    1.140 s …  1.154 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
  Range (min … max):    1.043 s …  1.056 s    10 runs

Large diff after this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
  Range (min … max):   758.0 ms … 767.0 ms    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
  Range (min … max):   829.2 ms … 835.1 ms    10 runs

Small diffs on master
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
  Range (min … max):    1.566 s …  1.571 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
  Range (min … max):    1.857 s …  1.881 s    10 runs

Small diffs after this change
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
  Range (min … max):    1.591 s …  1.601 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
  Range (min … max):    1.601 s …  1.622 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
origin starts off pointing to somewhere within line, which is owned by
the caller. Later we might allocate a new string using xmemdupz() or
xstrfmt(). To avoid leaking these new strings, we introduce a to_free
pointer - which allows us to safely free the newly allocated string when
we're done (we cannot just free origin directly as it might still be
pointing to line).

LSAN output from t0090:

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa71f49 in do_xmalloc wrapper.c:41:8
    #2 0xa720b0 in do_xmallocz wrapper.c:75:8
    git-for-windows#3 0xa720b0 in xmallocz wrapper.c:83:9
    git-for-windows#4 0xa720b0 in xmemdupz wrapper.c:99:16
    git-for-windows#5 0x8092ba in handle_line fmt-merge-msg.c:187:23
    git-for-windows#6 0x8092ba in fmt_merge_msg fmt-merge-msg.c:666:7
    git-for-windows#7 0x5ce2e6 in prepare_merge_message builtin/merge.c:1119:2
    git-for-windows#8 0x5ce2e6 in collect_parents builtin/merge.c:1215:3
    git-for-windows#9 0x5c9c1e in cmd_merge builtin/merge.c:1454:16
    git-for-windows#10 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#11 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#12 0x4cb01c in run_argv git.c:818:4
    git-for-windows#13 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#14 0x6b3fad in main common-main.c:52:11
    git-for-windows#15 0x7fb929620349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
realpath is only populated if we execute the git_work_tree_initialized
block. However that block also causes us to return early, meaning we
never actually release the strbuf in the case where we populated it.
Therefore we move all strbuf related code into the block to guarantee
that we can't leak it.

LSAN output from t0095:

Direct leak of 129 byte(s) in 1 object(s) allocated from:
    #0 0x49a9b9 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0x78f585 in xrealloc wrapper.c:126:8
    #2 0x713ff4 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x713ff4 in strbuf_getcwd strbuf.c:597:3
    git-for-windows#4 0x4f0c18 in strbuf_realpath_1 abspath.c:99:7
    git-for-windows#5 0x5ae4a4 in set_git_work_tree environment.c:259:3
    git-for-windows#6 0x6fdd8a in setup_discovered_git_dir setup.c:931:2
    git-for-windows#7 0x6fdd8a in setup_git_directory_gently setup.c:1235:12
    git-for-windows#8 0x4cb50d in get_bloom_filter_for_commit t/helper/test-bloom.c:41:2
    git-for-windows#9 0x4cb50d in cmd__bloom t/helper/test-bloom.c:95:3
    git-for-windows#10 0x4caa1f in cmd_main t/helper/test-tool.c:124:11
    git-for-windows#11 0x4caded in main common-main.c:52:11
    git-for-windows#12 0x7f0869f02349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 129 byte(s) leaked in 1 allocation(s).

It looks like this leak has existed since realpath was first added to
set_git_work_tree() in:
  3d7747e (real_path: remove unsafe API, 2020-03-10)

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
relative_url() populates sb. In the normal return path, its buffer is
detached using strbuf_detach(). However the early return path does
nothing with sb, which means that sb's memory is leaked - therefore
we add a release to avoid this leak.

The reset is also only necessary for the normal return path, hence we
move it down to after the early-return to avoid unnecessary work.

LSAN output from t0060:

Direct leak of 121 byte(s) in 1 object(s) allocated from:
    #0 0x7f31246f28b0 in realloc (/usr/lib64/libasan.so.4+0xdc8b0)
    #1 0x98d7d6 in xrealloc wrapper.c:126
    #2 0x909a60 in strbuf_grow strbuf.c:98
    git-for-windows#3 0x90bf00 in strbuf_vaddf strbuf.c:401
    git-for-windows#4 0x90c321 in strbuf_addf strbuf.c:335
    git-for-windows#5 0x5cb78d in relative_url builtin/submodule--helper.c:182
    git-for-windows#6 0x5cbe46 in resolve_relative_url_test builtin/submodule--helper.c:248
    git-for-windows#7 0x410dcd in run_builtin git.c:475
    git-for-windows#8 0x410dcd in handle_builtin git.c:729
    git-for-windows#9 0x414087 in run_argv git.c:818
    git-for-windows#10 0x414087 in cmd_main git.c:949
    git-for-windows#11 0x40e9ec in main common-main.c:52
    git-for-windows#12 0x7f3123c41349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 121 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
cmd_for_each_repo() copies argv into args (a strvec), which is later
passed into run_command_on_repo(), which in turn copies that strvec onto
the end of child.args. The initial copy is unnecessary (we never modify
args). We therefore choose to just pass argv directly into
run_command_on_repo(), which lets us avoid the copy and fixes the leak.

LSAN output from t0068:

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x7f63bd4ab8b0 in realloc (/usr/lib64/libasan.so.4+0xdc8b0)
    #1 0x98d7e6 in xrealloc wrapper.c:126
    #2 0x916914 in strvec_push_nodup strvec.c:19
    git-for-windows#3 0x916a6e in strvec_push strvec.c:26
    git-for-windows#4 0x4be4eb in cmd_for_each_repo builtin/for-each-repo.c:49
    git-for-windows#5 0x410dcd in run_builtin git.c:475
    git-for-windows#6 0x410dcd in handle_builtin git.c:729
    git-for-windows#7 0x414087 in run_argv git.c:818
    git-for-windows#8 0x414087 in cmd_main git.c:949
    git-for-windows#9 0x40e9ec in main common-main.c:52
    git-for-windows#10 0x7f63bc9fa349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 22 byte(s) in 2 object(s) allocated from:
    #0 0x7f63bd445e30 in __interceptor_strdup (/usr/lib64/libasan.so.4+0x76e30)
    #1 0x98d698 in xstrdup wrapper.c:29
    #2 0x916a63 in strvec_push strvec.c:26
    git-for-windows#3 0x4be4eb in cmd_for_each_repo builtin/for-each-repo.c:49
    git-for-windows#4 0x410dcd in run_builtin git.c:475
    git-for-windows#5 0x410dcd in handle_builtin git.c:729
    git-for-windows#6 0x414087 in run_argv git.c:818
    git-for-windows#7 0x414087 in cmd_main git.c:949
    git-for-windows#8 0x40e9ec in main common-main.c:52
    git-for-windows#9 0x7f63bc9fa349 in __libc_start_main (/lib64/libc.so.6+0x24349)

See also discussion about the original implementation below - this code
appears to have evolved from a callback explaining the double-strvec-copy
pattern, but there's no strong reason to keep that now:
  https://lore.kernel.org/git/68bbeca5-314b-08ee-ef36-040e3f3814e9@gmail.com/

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
old_dir/new_dir are free()'d at the end of update_dir_rename_counts,
however if we return early we'll never free those strings. Therefore
we should move all new allocations after the possible early return,
avoiding a leak.

This seems like a fairly recent leak, that started happening since the
early-return was added in:
  1ad69eb (diffcore-rename: compute dir_rename_counts in stages, 2021-02-27)

LSAN output from t0022:

Direct leak of 7 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa71e48 in xstrdup wrapper.c:29:14
    #2 0x7db9c7 in update_dir_rename_counts diffcore-rename.c:464:12
    git-for-windows#3 0x7db6ae in find_renames diffcore-rename.c:1062:3
    git-for-windows#4 0x7d76c3 in diffcore_rename_extended diffcore-rename.c:1472:18
    git-for-windows#5 0x7b4cfc in diffcore_std diff.c:6705:4
    git-for-windows#6 0x855e46 in log_tree_diff_flush log-tree.c:846:2
    git-for-windows#7 0x856574 in log_tree_diff log-tree.c:955:3
    git-for-windows#8 0x856574 in log_tree_commit log-tree.c:986:10
    git-for-windows#9 0x9a9c67 in print_commit_summary sequencer.c:1329:7
    git-for-windows#10 0x52e623 in cmd_commit builtin/commit.c:1862:3
    git-for-windows#11 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#12 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#13 0x4cb01c in run_argv git.c:818:4
    git-for-windows#14 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#15 0x6b3f3d in main common-main.c:52:11
    git-for-windows#16 0x7fe397c7a349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 7 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa71e48 in xstrdup wrapper.c:29:14
    #2 0x7db9bc in update_dir_rename_counts diffcore-rename.c:463:12
    git-for-windows#3 0x7db6ae in find_renames diffcore-rename.c:1062:3
    git-for-windows#4 0x7d76c3 in diffcore_rename_extended diffcore-rename.c:1472:18
    git-for-windows#5 0x7b4cfc in diffcore_std diff.c:6705:4
    git-for-windows#6 0x855e46 in log_tree_diff_flush log-tree.c:846:2
    git-for-windows#7 0x856574 in log_tree_diff log-tree.c:955:3
    git-for-windows#8 0x856574 in log_tree_commit log-tree.c:986:10
    git-for-windows#9 0x9a9c67 in print_commit_summary sequencer.c:1329:7
    git-for-windows#10 0x52e623 in cmd_commit builtin/commit.c:1862:3
    git-for-windows#11 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#12 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#13 0x4cb01c in run_argv git.c:818:4
    git-for-windows#14 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#15 0x6b3f3d in main common-main.c:52:11
    git-for-windows#16 0x7fe397c7a349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 14 byte(s) leaked in 2 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
u.head is populated using resolve_refdup(), which returns a newly
allocated string - hence we also need to free() it.

Found while running t0041 with LSAN:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa8be98 in xstrdup wrapper.c:29:14
    #2 0x9481db in head_atom_parser ref-filter.c:549:17
    git-for-windows#3 0x9408c7 in parse_ref_filter_atom ref-filter.c:703:30
    git-for-windows#4 0x9400e3 in verify_ref_format ref-filter.c:974:8
    git-for-windows#5 0x4f9e8b in print_ref_list builtin/branch.c:439:6
    git-for-windows#6 0x4f9e8b in cmd_branch builtin/branch.c:757:3
    git-for-windows#7 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#8 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#9 0x4cb01c in run_argv git.c:818:4
    git-for-windows#10 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#11 0x6bdc2d in main common-main.c:52:11
    git-for-windows#12 0x7f96edf86349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
repo_diff_setup() calls through to diff.c's static prep_parse_options(),
which in  turn allocates a new array into diff_opts.parseopts.
diff_setup_done() is responsible for freeing that array, and has the
benefit of verifying diff_opts too - hence we add a call to
diff_setup_done() to avoid leaking parseopts.

Output from the leak as found while running t0090 with LSAN:

Direct leak of 7120 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa8bf89 in do_xmalloc wrapper.c:41:8
    #2 0x7a7bae in prep_parse_options diff.c:5636:2
    git-for-windows#3 0x7a7bae in repo_diff_setup diff.c:4611:2
    git-for-windows#4 0x93716c in repo_index_has_changes read-cache.c:2518:3
    git-for-windows#5 0x872233 in unclean merge-ort-wrappers.c:12:14
    git-for-windows#6 0x872233 in merge_ort_recursive merge-ort-wrappers.c:53:6
    git-for-windows#7 0x5d5b11 in try_merge_strategy builtin/merge.c:752:12
    git-for-windows#8 0x5d0b6b in cmd_merge builtin/merge.c:1666:9
    git-for-windows#9 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#10 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#11 0x4cb01c in run_argv git.c:818:4
    git-for-windows#12 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#13 0x6bdc2d in main common-main.c:52:11
    git-for-windows#14 0x7f551eb51349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 7120 byte(s) leaked in 1 allocation(s)

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
apply_multi_file_filter and async_query_available_blobs both query
subprocess output using subprocess_read_status, which writes data into
the identically named filter_status strbuf. We add a strbuf_release to
avoid leaking their contents.

Leak output seen when running t0021 with LSAN:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c2b5 in xrealloc wrapper.c:126:8
    #2 0x9ff99d in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x9ff99d in strbuf_addbuf strbuf.c:304:2
    git-for-windows#4 0xa101d6 in subprocess_read_status sub-process.c:45:5
    git-for-windows#5 0x77793c in apply_multi_file_filter convert.c:886:8
    git-for-windows#6 0x77793c in apply_filter convert.c:1042:10
    git-for-windows#7 0x77a0b5 in convert_to_git_filter_fd convert.c:1492:7
    git-for-windows#8 0x8b48cd in index_stream_convert_blob object-file.c:2156:2
    git-for-windows#9 0x8b48cd in index_fd object-file.c:2248:9
    git-for-windows#10 0x597411 in hash_fd builtin/hash-object.c:43:9
    git-for-windows#11 0x596be1 in hash_object builtin/hash-object.c:59:2
    git-for-windows#12 0x596be1 in cmd_hash_object builtin/hash-object.c:153:3
    git-for-windows#13 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#14 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#15 0x4cb01c in run_argv git.c:818:4
    git-for-windows#16 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#17 0x6bdc2d in main common-main.c:52:11
    git-for-windows#18 0x7f42acf79349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).

Direct leak of 120 byte(s) in 5 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c295 in xrealloc wrapper.c:126:8
    #2 0x9ff97d in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x9ff97d in strbuf_addbuf strbuf.c:304:2
    git-for-windows#4 0xa101b6 in subprocess_read_status sub-process.c:45:5
    git-for-windows#5 0x775c73 in async_query_available_blobs convert.c:960:8
    git-for-windows#6 0x80029d in finish_delayed_checkout entry.c:183:9
    git-for-windows#7 0xa65d1e in check_updates unpack-trees.c:493:10
    git-for-windows#8 0xa5f469 in unpack_trees unpack-trees.c:1747:8
    git-for-windows#9 0x525971 in checkout builtin/clone.c:815:6
    git-for-windows#10 0x525971 in cmd_clone builtin/clone.c:1409:8
    git-for-windows#11 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#12 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#13 0x4cb01c in run_argv git.c:818:4
    git-for-windows#14 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#15 0x6bdc2d in main common-main.c:52:11
    git-for-windows#16 0x7fa253fce349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 120 byte(s) leaked in 5 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
These leaks all happen at the end of cmd_mv, hence don't matter in any
way. But we still fix the easy ones and squash the rest to get us closer
to being able to run tests without leaks.

LSAN output from t0050:

Direct leak of 384 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa0a7e1 in add_entry string-list.c:44:2
    git-for-windows#3 0xa0a7e1 in string_list_insert string-list.c:58:14
    git-for-windows#4 0x5dac03 in cmd_mv builtin/mv.c:248:4
    git-for-windows#5 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#6 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#7 0x4cb01c in run_argv git.c:818:4
    git-for-windows#8 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#9 0x6bd9ad in main common-main.c:52:11
    git-for-windows#10 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa8bd09 in do_xmalloc wrapper.c:41:8
    #2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
    git-for-windows#3 0x5da575 in cmd_mv builtin/mv.c:158:14
    git-for-windows#4 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#5 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#6 0x4cb01c in run_argv git.c:818:4
    git-for-windows#7 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#8 0x6bd9ad in main common-main.c:52:11
    git-for-windows#9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa8bd09 in do_xmalloc wrapper.c:41:8
    #2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
    git-for-windows#3 0x5da4e4 in cmd_mv builtin/mv.c:148:11
    git-for-windows#4 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#5 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#6 0x4cb01c in run_argv git.c:818:4
    git-for-windows#7 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#8 0x6bd9ad in main common-main.c:52:11
    git-for-windows#9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0xa8c119 in xcalloc wrapper.c:140:8
    #2 0x5da585 in cmd_mv builtin/mv.c:159:22
    git-for-windows#3 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#4 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#5 0x4cb01c in run_argv git.c:818:4
    git-for-windows#6 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#7 0x6bd9ad in main common-main.c:52:11
    git-for-windows#8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0xa8c119 in xcalloc wrapper.c:140:8
    #2 0x5da4f8 in cmd_mv builtin/mv.c:149:10
    git-for-windows#3 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#4 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#5 0x4cb01c in run_argv git.c:818:4
    git-for-windows#6 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#7 0x6bd9ad in main common-main.c:52:11
    git-for-windows#8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 65 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa00226 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0xa00226 in strbuf_vaddf strbuf.c:394:3
    git-for-windows#4 0xa065c7 in xstrvfmt strbuf.c:981:2
    git-for-windows#5 0xa065c7 in xstrfmt strbuf.c:991:8
    git-for-windows#6 0x9e7ce7 in prefix_path_gently setup.c:115:15
    git-for-windows#7 0x9e7fa6 in prefix_path setup.c:128:12
    git-for-windows#8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
    git-for-windows#9 0x5da575 in cmd_mv builtin/mv.c:158:14
    git-for-windows#10 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#11 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#12 0x4cb01c in run_argv git.c:818:4
    git-for-windows#13 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#14 0x6bd9ad in main common-main.c:52:11
    git-for-windows#15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 65 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa00226 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0xa00226 in strbuf_vaddf strbuf.c:394:3
    git-for-windows#4 0xa065c7 in xstrvfmt strbuf.c:981:2
    git-for-windows#5 0xa065c7 in xstrfmt strbuf.c:991:8
    git-for-windows#6 0x9e7ce7 in prefix_path_gently setup.c:115:15
    git-for-windows#7 0x9e7fa6 in prefix_path setup.c:128:12
    git-for-windows#8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
    git-for-windows#9 0x5da4e4 in cmd_mv builtin/mv.c:148:11
    git-for-windows#10 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#11 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#12 0x4cb01c in run_argv git.c:818:4
    git-for-windows#13 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#14 0x6bd9ad in main common-main.c:52:11
    git-for-windows#15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 558 byte(s) leaked in 7 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
merge_name() calls dwim_ref(), which allocates a new string into
found_ref. Therefore add a free() to avoid leaking found_ref.

LSAN output from t0021:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa8beb8 in xstrdup wrapper.c:29:14
    #2 0x954054 in expand_ref refs.c:671:12
    git-for-windows#3 0x953cb6 in repo_dwim_ref refs.c:644:22
    git-for-windows#4 0x5d3759 in dwim_ref refs.h:162:9
    git-for-windows#5 0x5d3759 in merge_name builtin/merge.c:517:6
    git-for-windows#6 0x5d3759 in collect_parents builtin/merge.c:1214:5
    git-for-windows#7 0x5cf60d in cmd_merge builtin/merge.c:1458:16
    git-for-windows#8 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#9 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#10 0x4cb01c in run_argv git.c:818:4
    git-for-windows#11 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#12 0x6bdbfd in main common-main.c:52:11
    git-for-windows#13 0x7f0430502349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
- cmd_rebase populates rebase_options.strategy with newly allocated
  strings, hence we need to free those strings at the end of cmd_rebase
  to avoid a leak.
- In some cases: get_replay_opts() is called, which prepares replay_opts
  using data from rebase_options. We used to simply copy the pointer
  from rebase_options.strategy,  however that would now result in a
  double-free because sequencer_remove_state() is eventually used to
  free replay_opts.strategy. To avoid this we xstrdup() strategy when
  adding it to replay_opts.

The original leak happens because we always populate
rebase_options.strategy, but we don't always enter the path that calls
get_replay_opts() and later sequencer_remove_state() - in  other words
we'd always allocate a new string into rebase_options.strategy but
only sometimes did we free it. We now make sure that rebase_options
and replay_opts both own their own copies of strategy, and each copy
is free'd independently.

This was first seen when running t0021 with LSAN, but t2012 helped catch
the fact that we can't just free(options.strategy) at the end of
cmd_rebase (as that can cause a double-free). LSAN output from t0021:

LSAN output from t0021:

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa71eb8 in xstrdup wrapper.c:29:14
    #2 0x61b1cc in cmd_rebase builtin/rebase.c:1779:22
    git-for-windows#3 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#4 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#5 0x4cb01c in run_argv git.c:818:4
    git-for-windows#6 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#7 0x6b3fad in main common-main.c:52:11
    git-for-windows#8 0x7f267b512349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
setup_unpack_trees_porcelain() populates various fields on
unpack_tree_opts, we need to call clear_unpack_trees_porcelain() to
avoid leaking them. Specifically, we used to leak
unpack_tree_opts.msgs_to_free.

We have to do this in leave_reset_head because there are multiple
scenarios where unpack_tree_opts has already been configured, followed
by a 'goto leave_reset_head'. But we can also 'goto leave_reset_head'
prior to having initialised unpack_tree_opts via memset(..., 0, ...).
Therefore we also move unpack_tree_opts initialisation to the start of
reset_head(), and convert it to use brace initialisation - which
guarantees that we can never clear an uninitialised unpack_tree_opts.
clear_unpack_tree_opts() is always safe to call as long as
unpack_tree_opts is at least zero-initialised, i.e. it does not depend
on a previous call to setup_unpack_trees_porcelain().

LSAN output from t0021:

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9f7861 in strvec_push_nodup strvec.c:19:2
    git-for-windows#3 0x9f7861 in strvec_pushf strvec.c:39:2
    git-for-windows#4 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
    git-for-windows#5 0x97e011 in reset_head reset.c:53:2
    git-for-windows#6 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    git-for-windows#7 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#8 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#9 0x4cb01c in run_argv git.c:818:4
    git-for-windows#10 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#11 0x6b3f3d in main common-main.c:52:11
    git-for-windows#12 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 147 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    git-for-windows#4 0x9f7774 in strvec_pushf strvec.c:36:2
    git-for-windows#5 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
    git-for-windows#6 0x97e011 in reset_head reset.c:53:2
    git-for-windows#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    git-for-windows#8 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#9 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#10 0x4cb01c in run_argv git.c:818:4
    git-for-windows#11 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#12 0x6b3f3d in main common-main.c:52:11
    git-for-windows#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 134 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    git-for-windows#4 0x9f7774 in strvec_pushf strvec.c:36:2
    git-for-windows#5 0xa43fe4 in setup_unpack_trees_porcelain unpack-trees.c:168:3
    git-for-windows#6 0x97e011 in reset_head reset.c:53:2
    git-for-windows#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    git-for-windows#8 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#9 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#10 0x4cb01c in run_argv git.c:818:4
    git-for-windows#11 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#12 0x6b3f3d in main common-main.c:52:11
    git-for-windows#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 130 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    git-for-windows#3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    git-for-windows#4 0x9f7774 in strvec_pushf strvec.c:36:2
    git-for-windows#5 0xa43f20 in setup_unpack_trees_porcelain unpack-trees.c:150:3
    git-for-windows#6 0x97e011 in reset_head reset.c:53:2
    git-for-windows#7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    git-for-windows#8 0x4ce83e in run_builtin git.c:475:11
    git-for-windows#9 0x4ccafe in handle_builtin git.c:729:3
    git-for-windows#10 0x4cb01c in run_argv git.c:818:4
    git-for-windows#11 0x4cb01c in cmd_main git.c:949:19
    git-for-windows#12 0x6b3f3d in main common-main.c:52:11
    git-for-windows#13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 603 byte(s) leaked in 4 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
When doing reference negotiation, git-fetch-pack(1) is loading all refs
from disk in order to determine which commits it has in common with the
remote repository. This can be quite expensive in repositories with many
references though: in a real-world repository with around 2.2 million
refs, fetching a single commit by its ID takes around 44 seconds.

Dominating the loading time is decompression and parsing of the objects
which are referenced by commits. Given the fact that we only care about
commits (or tags which can be peeled to one) in this context, there is
thus an easy performance win by switching the parsing logic to make use
of the commit graph in case we have one available. Like this, we avoid
hitting the object database to parse these commits but instead only load
them from the commit-graph. This results in a significant performance
boost when executing git-fetch in said repository with 2.2 million refs:

    Benchmark #1: HEAD~: git fetch $remote $commit
      Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
      Range (min … max):   43.565 s … 44.577 s    10 runs

    Benchmark #2: HEAD: git fetch $remote $commit
      Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
      Range (min … max):   18.629 s … 20.454 s    10 runs

    Summary
      'HEAD: git fetch $remote $commit' ran
        2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.

Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).

Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.

Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:

    Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
      Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
      Range (min … max):    7.543 s …  7.742 s    10 runs

    Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
      Range (min … max):    4.909 s …  5.048 s    10 runs

    Summary
      'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
        1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
     are visible to clients.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
When queueing up references for the revision walk, `handle_one_ref()`
will resolve the reference's object ID via `get_reference()` and then
queue the ID as pending object via `add_pending_oid()`. But given that
`add_pending_oid()` is only a thin wrapper around `add_pending_object()`
which fist calls `get_reference()`, we effectively resolve the reference
twice and thus duplicate some of the work.

Fix the issue by instead calling `add_pending_object()` directly, which
takes the already-resolved object as input. In a repository with lots of
refs, this translates into a near 10% speedup:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      5.015 s ±  0.038 s    [User: 4.698 s, System: 0.316 s]
      Range (min … max):    4.970 s …  5.089 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.606 s ±  0.029 s    [User: 4.260 s, System: 0.345 s]
      Range (min … max):    4.565 s …  4.657 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Sep 12, 2021
When queueing references in git-rev-list(1), we try to optimize parsing
of commits via the commit-graph. To do so, we first look up the object's
type, and if it is a commit we call `repo_parse_commit()` instead of
`parse_object()`. This is quite inefficient though given that we're
always uncompressing the object header in order to determine the type.
Instead, we can opportunistically search the commit-graph for the object
ID: in case it's found, we know it's a commit and can directly fill in
the commit object without having to uncompress the object header.

Expose a new function `lookup_commit_in_graph()`, which tries to find a
commit in the commit-graph by ID, and convert `get_reference()` to use
this function. This provides a big performance win in cases where we
load references in a repository with lots of references pointing to
commits. The following has been executed in a real-world repository with
about 2.2 million refs:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.458 s ±  0.044 s    [User: 4.115 s, System: 0.342 s]
      Range (min … max):    4.409 s …  4.534 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      3.089 s ±  0.015 s    [User: 2.768 s, System: 0.321 s]
      Range (min … max):    3.061 s …  3.105 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jan 8, 2022
The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.

Furthermore, use sort -u instead of sort | uniq to remove yet another
process.

[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:

$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):     371.3 ms ±  64.2 ms    [User: 430.4 ms, System: 72.5 ms]
  Range (min … max):   320.5 ms … 517.7 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):     489.9 ms ± 185.4 ms    [User: 724.7 ms, System: 141.3 ms]
  Range (min … max):   346.0 ms … 885.3 ms    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
    1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Jan 8, 2022
Change the "define_categories()" and "define_category_names()" functions
to take the already-parsed output of "category_list()" as an argument,
which brings our number of passes over "command-list.txt" from three
to two.

Then have "category_list()" itself take the output of "command_list()"
as an argument, bringing the number of times we parse the file to one.

Compared to the pre-image this speeds us up quite a bit:

    $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
    $ hyperfine --warmup 10 -L v ,.old 'sh generate-cmdlist.sh{v} command-list.txt'
    Benchmark #1: sh generate-cmdlist.sh command-list.txt
      Time (mean ± σ):      22.9 ms ±   0.3 ms    [User: 15.8 ms, System: 9.6 ms]
      Range (min … max):    22.5 ms …  24.0 ms    125 runs

    Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
      Time (mean ± σ):      30.1 ms ±   0.4 ms    [User: 24.4 ms, System: 17.5 ms]
      Range (min … max):    29.5 ms …  32.3 ms    96 runs

    Summary
      'sh generate-cmdlist.sh command-list.txt' ran
        1.32 ± 0.02 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Mar 26, 2022
When fetching packfiles, we write a bunch of lockfiles for the packfiles
we're writing into the repository. In order to not leave behind any
cruft in case we exit or receive a signal, we register both an exit
handler as well as signal handlers for common signals like SIGINT. These
handlers will then unlink the locks and free the data structure tracking
them. We have observed a deadlock in this logic though:

    (gdb) bt
    #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
    #1  0x00007f4932bea2cd in _int_free (av=0x7f4932f2eb20 <main_arena>, p=0x3e3e4200, have_lock=0) at malloc.c:3969
    #2  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    git-for-windows#3  0x0000000000662ab1 in string_list_clear ()
    git-for-windows#4  0x000000000044f5bc in unlock_pack_on_signal ()
    git-for-windows#5  <signal handler called>
    git-for-windows#6  _int_free (av=0x7f4932f2eb20 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4024
    git-for-windows#7  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
    git-for-windows#8  0x000000000065afd5 in strbuf_release ()
    git-for-windows#9  0x000000000066ddb9 in delete_tempfile ()
    git-for-windows#10 0x0000000000610d0b in files_transaction_cleanup.isra ()
    git-for-windows#11 0x0000000000611718 in files_transaction_abort ()
    git-for-windows#12 0x000000000060d2ef in ref_transaction_abort ()
    git-for-windows#13 0x000000000060d441 in ref_transaction_prepare ()
    git-for-windows#14 0x000000000060e0b5 in ref_transaction_commit ()
    git-for-windows#15 0x00000000004511c2 in fetch_and_consume_refs ()
    git-for-windows#16 0x000000000045279a in cmd_fetch ()
    git-for-windows#17 0x0000000000407c48 in handle_builtin ()
    git-for-windows#18 0x0000000000408df2 in cmd_main ()
    git-for-windows#19 0x00000000004078b5 in main ()

The process was killed with a signal, which caused the signal handler to
kick in and try free the data structures after we have unlinked the
locks. It then deadlocks while calling free(3P).

The root cause of this is that it is not allowed to call certain
functions in async-signal handlers, as specified by signal-safety(7).
Next to most I/O functions, this list of disallowed functions also
includes memory-handling functions like malloc(3P) and free(3P) because
they may not be reentrant. As a result, if we execute such functions in
the signal handler, then they may operate on inconistent state and fail
in unexpected ways.

Fix this bug by not calling non-async-signal-safe functions when running
in the signal handler. We're about to re-raise the signal anyway and
will thus exit, so it's not much of a problem to keep the string list of
lockfiles untouched. Note that it's fine though to call unlink(2), so
we'll still clean up the lockfiles correctly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Mar 26, 2022
Add "fast_unwind_on_malloc=0" to LSAN_OPTIONS to get more meaningful
stack traces from LSAN. This isn't required under ASAN which will emit
traces such as this one for a leak in "t/t0006-date.sh":

    $ ASAN_OPTIONS=detect_leaks=1 ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x488b94 in strdup (t/helper/test-tool+0x488b94)
        #1 0x9444a4 in xstrdup wrapper.c:29:14
        #2 0x5995fa in parse_date_format date.c:991:24
        git-for-windows#3 0x4d2056 in show_dates t/helper/test-date.c:39:2
        git-for-windows#4 0x4d174a in cmd__date t/helper/test-date.c:116:3
        git-for-windows#5 0x4cce89 in cmd_main t/helper/test-tool.c:127:11
        git-for-windows#6 0x4cd1e3 in main common-main.c:52:11
        git-for-windows#7 0x7fef3c695e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        git-for-windows#8 0x422b09 in _start (t/helper/test-tool+0x422b09)

    SUMMARY: AddressSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Whereas LSAN would emit this instead:

    $ ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f2be1d614aa in strdup string/strdup.c:42:15

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Now we'll instead git this sensible stack trace under
LSAN. I.e. almost the same one (but starting with "malloc", as is
usual for LSAN) as under ASAN:

    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f012af5c4aa in strdup string/strdup.c:42:15
        #2 0x5cb164 in xstrdup wrapper.c:29:14
        git-for-windows#3 0x495ee9 in parse_date_format date.c:991:24
        git-for-windows#4 0x453aac in show_dates t/helper/test-date.c:39:2
        git-for-windows#5 0x453782 in cmd__date t/helper/test-date.c:116:3
        git-for-windows#6 0x451d95 in cmd_main t/helper/test-tool.c:127:11
        git-for-windows#7 0x451f1e in main common-main.c:52:11
        git-for-windows#8 0x7f012aef5e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        git-for-windows#9 0x42e0a9 in _start (t/helper/test-tool+0x42e0a9)

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

As the option name suggests this does make things slower, e.g. for
t0001-init.sh we're around 10% slower:

    $ hyperfine -L v 0,1 'LSAN_OPTIONS=fast_unwind_on_malloc={v} make T=t0001-init.sh' -r 3
    Benchmark 1: LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh
      Time (mean ± σ):      2.135 s ±  0.015 s    [User: 1.951 s, System: 0.554 s]
      Range (min … max):    2.122 s …  2.152 s    3 runs

    Benchmark 2: LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh
      Time (mean ± σ):      1.981 s ±  0.055 s    [User: 1.769 s, System: 0.488 s]
      Range (min … max):    1.941 s …  2.044 s    3 runs

    Summary
      'LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh' ran
        1.08 ± 0.03 times faster than 'LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh'

I think that's more than worth it to get the more meaningful stack
traces, we can always provide LSAN_OPTIONS=fast_unwind_on_malloc=0 for
one-off "fast" runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Mar 26, 2022
Add a failing test which demonstrates a regression in
a18d66c ("diff.c: free "buf" in diff_words_flush()", 2022-03-04),
the regression is discussed in detail in the subsequent commit. With
it running `git show --word-diff --color-moved` with SANITIZE=address
would emit:

	==31191==ERROR: AddressSanitizer: attempting double-free on 0x617000021100 in thread T0:
	    #0 0x49f0a2 in free (git+0x49f0a2)
	    #1 0x9b0e4d in diff_words_flush diff.c:2153:3
	    #2 0x9aed5d in fn_out_consume diff.c:2354:3
	    git-for-windows#3 0xe092ab in consume_one xdiff-interface.c:43:9
	    git-for-windows#4 0xe072eb in xdiff_outf xdiff-interface.c:76:10
	    git-for-windows#5 0xec7014 in xdl_emit_diffrec xdiff/xutils.c:53:6
	    [...]

	0x617000021100 is located 0 bytes inside of 768-byte region [0x617000021100,0x617000021400)
	freed by thread T0 here:
	    #0 0x49f0a2 in free (git+0x49f0a2)
	    [...(same stacktrace)...]

	previously allocated by thread T0 here:
	    #0 0x49f603 in __interceptor_realloc (git+0x49f603)
	    #1 0xde4da4 in xrealloc wrapper.c:126:8
	    #2 0x995dc5 in append_emitted_diff_symbol diff.c:794:2
	    git-for-windows#3 0x96c44a in emit_diff_symbol diff.c:1527:3
	    [...]

This was not caught by the test suite because we test `diff
--word-diff --color-moved` only so far.

Therefore, add a test for `show`, too.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
When "read_strategy_opts()" is called we may have populated the
"opts->strategy" before, so we'll need to free() it to avoid leaking
memory.

We populate it before because we cal get_replay_opts() from within
"rebase.c" with an already populated "opts", which we then copy. Then
if we're doing a "rebase -i" the sequencer API itself will promptly
clobber our alloc'd version of it with its own.

If this code is changed to do, instead of the added free() here a:

	if (opts->strategy)
		opts->strategy = xstrdup("another leak");

We get a couple of stacktraces from -fsanitize=leak showing how we
ended up clobbering the already allocated value, i.e.:

	Direct leak of 6 byte(s) in 1 object(s) allocated from:
	    #0 0x7f2e8cd45545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
	    #1 0x7f2e8cb0fcaa in __GI___strdup string/strdup.c:42
	    #2 0x6c4778 in xstrdup wrapper.c:39
	    git-for-windows#3 0x66bcb8 in read_strategy_opts sequencer.c:2902
	    git-for-windows#4 0x66bf7b in read_populate_opts sequencer.c:2969
	    git-for-windows#5 0x6723f9 in sequencer_continue sequencer.c:5063
	    git-for-windows#6 0x4a4f74 in run_sequencer_rebase builtin/rebase.c:348
	    git-for-windows#7 0x4a64c8 in run_specific_rebase builtin/rebase.c:753
	    git-for-windows#8 0x4a9b8b in cmd_rebase builtin/rebase.c:1824
	    git-for-windows#9 0x407a32 in run_builtin git.c:466
	    git-for-windows#10 0x407e0a in handle_builtin git.c:721
	    git-for-windows#11 0x40803d in run_argv git.c:788
	    git-for-windows#12 0x40850f in cmd_main git.c:923
	    git-for-windows#13 0x4eee79 in main common-main.c:57
	    git-for-windows#14 0x7f2e8ca9f209 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
	    git-for-windows#15 0x7f2e8ca9f2bb in __libc_start_main_impl ../csu/libc-start.c:389
	    git-for-windows#16 0x405fd0 in _start (git+0x405fd0)

	Direct leak of 4 byte(s) in 1 object(s) allocated from:
	    #0 0x7f2e8cd45545 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
	    #1 0x7f2e8cb0fcaa in __GI___strdup string/strdup.c:42
	    #2 0x6c4778 in xstrdup wrapper.c:39
	    git-for-windows#3 0x4a3c31 in xstrdup_or_null git-compat-util.h:1169
	    git-for-windows#4 0x4a447a in get_replay_opts builtin/rebase.c:163
	    git-for-windows#5 0x4a4f5b in run_sequencer_rebase builtin/rebase.c:346
	    git-for-windows#6 0x4a64c8 in run_specific_rebase builtin/rebase.c:753
	    git-for-windows#7 0x4a9b8b in cmd_rebase builtin/rebase.c:1824
	    git-for-windows#8 0x407a32 in run_builtin git.c:466
	    git-for-windows#9 0x407e0a in handle_builtin git.c:721
	    git-for-windows#10 0x40803d in run_argv git.c:788
	    git-for-windows#11 0x40850f in cmd_main git.c:923
	    git-for-windows#12 0x4eee79 in main common-main.c:57
	    git-for-windows#13 0x7f2e8ca9f209 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
	    git-for-windows#14 0x7f2e8ca9f2bb in __libc_start_main_impl ../csu/libc-start.c:389
	    git-for-windows#15 0x405fd0 in _start (git+0x405fd0)

This can be seen in e.g. the 4th test of
"t3404-rebase-interactive.sh".

In the larger picture the ownership of the "struct replay_opts" is
quite a mess, e.g. in this case rebase.c's static "get_replay_opts()"
function partially creates it, but nothing in rebase.c will free()
it. The structure is "mostly owned" by the sequencer API, but it also
expects to get these partially populated versions of it.

It would be better to have rebase keep track of what it allocated, and
free() that, and to pass that as a "const" to the sequencer API, which
would copy what it needs to its own version, and to free() that.

But doing so is a much larger change, and however messy the ownership
boundary is here is consistent with what we're doing already, so let's
just free() this to fix the leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
There is an out-of-bounds read possible when parsing gitattributes that
have an attribute that is 2^31+1 bytes long. This is caused due to an
integer overflow when we assign the result of strlen(3P) to an `int`,
where we use the wrapped-around value in a subsequent call to
memcpy(3P). The following code reproduces the issue:

    blob=$(perl -e 'print "a" x 2147483649 . " attr"' | git hash-object -w --stdin)
    git update-index --add --cacheinfo 100644,$blob,.gitattributes
    git check-attr --all file

    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0)
    ==8451==The signal is caused by a READ memory access.
        #0 0x7f94f1f8f082  (/usr/lib/libc.so.6+0x176082)
        #1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752
        #2 0x560e190f7f26 in parse_attr_line attr.c:375
        git-for-windows#3 0x560e190f9663 in handle_attr_line attr.c:660
        git-for-windows#4 0x560e190f9ddd in read_attr_from_index attr.c:769
        git-for-windows#5 0x560e190f9f14 in read_attr attr.c:797
        git-for-windows#6 0x560e190fa24e in bootstrap_attr_stack attr.c:867
        git-for-windows#7 0x560e190fa4a5 in prepare_attr_stack attr.c:902
        git-for-windows#8 0x560e190fb5dc in collect_some_attrs attr.c:1097
        git-for-windows#9 0x560e190fb93f in git_all_attrs attr.c:1128
        git-for-windows#10 0x560e18e6136e in check_attr builtin/check-attr.c:67
        git-for-windows#11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183
        git-for-windows#12 0x560e18e15993 in run_builtin git.c:466
        git-for-windows#13 0x560e18e16397 in handle_builtin git.c:721
        git-for-windows#14 0x560e18e16b2b in run_argv git.c:788
        git-for-windows#15 0x560e18e17991 in cmd_main git.c:926
        git-for-windows#16 0x560e190ae2bd in main common-main.c:57
        git-for-windows#17 0x7f94f1e3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115

    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082)
    ==8451==ABORTING

Fix this bug by converting the variable to a `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
It is possible to trigger an integer overflow when parsing attribute
names when there are more than 2^31 of them for a single pattern. This
can either lead to us dying due to trying to request too many bytes:

     blob=$(perl -e 'print "f" . " a=" x 2147483649' | git hash-object -w --stdin)
     git update-index --add --cacheinfo 100644,$blob,.gitattributes
     git attr-check --all file

    =================================================================
    ==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
        #0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
        #1 0x5563a0a1e3d3 in xcalloc wrapper.c:150
        #2 0x5563a058d005 in parse_attr_line attr.c:384
        git-for-windows#3 0x5563a058e661 in handle_attr_line attr.c:660
        git-for-windows#4 0x5563a058eddb in read_attr_from_index attr.c:769
        git-for-windows#5 0x5563a058ef12 in read_attr attr.c:797
        git-for-windows#6 0x5563a058f24c in bootstrap_attr_stack attr.c:867
        git-for-windows#7 0x5563a058f4a3 in prepare_attr_stack attr.c:902
        git-for-windows#8 0x5563a05905da in collect_some_attrs attr.c:1097
        git-for-windows#9 0x5563a059093d in git_all_attrs attr.c:1128
        git-for-windows#10 0x5563a02f636e in check_attr builtin/check-attr.c:67
        git-for-windows#11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183
        git-for-windows#12 0x5563a02aa993 in run_builtin git.c:466
        git-for-windows#13 0x5563a02ab397 in handle_builtin git.c:721
        git-for-windows#14 0x5563a02abb2b in run_argv git.c:788
        git-for-windows#15 0x5563a02ac991 in cmd_main git.c:926
        git-for-windows#16 0x5563a05432bd in main common-main.c:57
        git-for-windows#17 0x7fd3ef82228f  (/usr/lib/libc.so.6+0x2328f)

    ==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1
    SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc
    ==1022==ABORTING

Or, much worse, it can lead to an out-of-bounds write because we
underallocate and then memcpy(3P) into an array:

    perl -e '
        print "A " . "\rh="x2000000000;
        print "\rh="x2000000000;
        print "\rh="x294967294 . "\n"
    ' >.gitattributes
    git add .gitattributes
    git commit -am "evil attributes"

    $ git clone --quiet /path/to/repo
    =================================================================
    ==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58
    WRITE of size 8 at 0x602000002550 thread T0
        #0 0x5555559884d4 in parse_attr_line attr.c:393
        #1 0x5555559884d4 in handle_attr_line attr.c:660
        #2 0x555555988902 in read_attr_from_index attr.c:784
        git-for-windows#3 0x555555988902 in read_attr_from_index attr.c:747
        git-for-windows#4 0x555555988a1d in read_attr attr.c:800
        git-for-windows#5 0x555555989b0c in bootstrap_attr_stack attr.c:882
        git-for-windows#6 0x555555989b0c in prepare_attr_stack attr.c:917
        git-for-windows#7 0x555555989b0c in collect_some_attrs attr.c:1112
        git-for-windows#8 0x55555598b141 in git_check_attr attr.c:1126
        git-for-windows#9 0x555555a13004 in convert_attrs convert.c:1311
        git-for-windows#10 0x555555a95e04 in checkout_entry_ca entry.c:553
        git-for-windows#11 0x555555d58bf6 in checkout_entry entry.h:42
        git-for-windows#12 0x555555d58bf6 in check_updates unpack-trees.c:480
        git-for-windows#13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        git-for-windows#14 0x555555785ab7 in checkout builtin/clone.c:724
        git-for-windows#15 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        git-for-windows#16 0x55555572443c in run_builtin git.c:466
        git-for-windows#17 0x55555572443c in handle_builtin git.c:721
        git-for-windows#18 0x555555727872 in run_argv git.c:788
        git-for-windows#19 0x555555727872 in cmd_main git.c:926
        git-for-windows#20 0x555555721fa0 in main common-main.c:57
        git-for-windows#21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
        git-for-windows#22 0x555555723f39 in _start (git+0x1cff39)

    0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here:
        #0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x555555d7fff7 in xcalloc wrapper.c:150
        #2 0x55555598815f in parse_attr_line attr.c:384
        git-for-windows#3 0x55555598815f in handle_attr_line attr.c:660
        git-for-windows#4 0x555555988902 in read_attr_from_index attr.c:784
        git-for-windows#5 0x555555988902 in read_attr_from_index attr.c:747
        git-for-windows#6 0x555555988a1d in read_attr attr.c:800
        git-for-windows#7 0x555555989b0c in bootstrap_attr_stack attr.c:882
        git-for-windows#8 0x555555989b0c in prepare_attr_stack attr.c:917
        git-for-windows#9 0x555555989b0c in collect_some_attrs attr.c:1112
        git-for-windows#10 0x55555598b141 in git_check_attr attr.c:1126
        git-for-windows#11 0x555555a13004 in convert_attrs convert.c:1311
        git-for-windows#12 0x555555a95e04 in checkout_entry_ca entry.c:553
        git-for-windows#13 0x555555d58bf6 in checkout_entry entry.h:42
        git-for-windows#14 0x555555d58bf6 in check_updates unpack-trees.c:480
        git-for-windows#15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        git-for-windows#16 0x555555785ab7 in checkout builtin/clone.c:724
        git-for-windows#17 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        git-for-windows#18 0x55555572443c in run_builtin git.c:466
        git-for-windows#19 0x55555572443c in handle_builtin git.c:721
        git-for-windows#20 0x555555727872 in run_argv git.c:788
        git-for-windows#21 0x555555727872 in cmd_main git.c:926
        git-for-windows#22 0x555555721fa0 in main common-main.c:57
        git-for-windows#23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308

    SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line
    Shadow bytes around the buggy address:
      0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00
      0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa
      0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa
      0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02
      0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03
    =>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa
      0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
      Shadow gap:              cc
    ==15062==ABORTING

Fix this bug by using `size_t` instead to count the number of attributes
so that this value cannot reasonably overflow without running out of
memory before already.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
When using a padding specifier in the pretty format passed to git-log(1)
we need to calculate the string length in several places. These string
lengths are stored in `int`s though, which means that these can easily
overflow when the input lengths exceeds 2GB. This can ultimately lead to
an out-of-bounds write when these are used in a call to memcpy(3P):

        ==8340==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1ec62f97fe at pc 0x7f2127e5f427 bp 0x7ffd3bd63de0 sp 0x7ffd3bd63588
    WRITE of size 1 at 0x7f1ec62f97fe thread T0
        #0 0x7f2127e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5628e96aa605 in format_and_pad_commit pretty.c:1762
        #2 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        git-for-windows#3 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        git-for-windows#4 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        git-for-windows#5 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        git-for-windows#6 0x5628e95a44c8 in show_log log-tree.c:781
        git-for-windows#7 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        git-for-windows#8 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#9 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        git-for-windows#10 0x5628e922f1a2 in cmd_log builtin/log.c:883
        git-for-windows#11 0x5628e9106993 in run_builtin git.c:466
        git-for-windows#12 0x5628e9107397 in handle_builtin git.c:721
        git-for-windows#13 0x5628e9107b07 in run_argv git.c:788
        git-for-windows#14 0x5628e91088a7 in cmd_main git.c:923
        git-for-windows#15 0x5628e939d682 in main common-main.c:57
        git-for-windows#16 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#17 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#18 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    0x7f1ec62f97fe is located 2 bytes to the left of 4831838265-byte region [0x7f1ec62f9800,0x7f1fe62f9839)
    allocated by thread T0 here:
        #0 0x7f2127ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5628e98774d4 in xrealloc wrapper.c:136
        #2 0x5628e97cb01c in strbuf_grow strbuf.c:99
        git-for-windows#3 0x5628e97ccd42 in strbuf_addchars strbuf.c:327
        git-for-windows#4 0x5628e96aa55c in format_and_pad_commit pretty.c:1761
        git-for-windows#5 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        git-for-windows#6 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        git-for-windows#7 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        git-for-windows#8 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        git-for-windows#9 0x5628e95a44c8 in show_log log-tree.c:781
        git-for-windows#10 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        git-for-windows#11 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#12 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        git-for-windows#13 0x5628e922f1a2 in cmd_log builtin/log.c:883
        git-for-windows#14 0x5628e9106993 in run_builtin git.c:466
        git-for-windows#15 0x5628e9107397 in handle_builtin git.c:721
        git-for-windows#16 0x5628e9107b07 in run_argv git.c:788
        git-for-windows#17 0x5628e91088a7 in cmd_main git.c:923
        git-for-windows#18 0x5628e939d682 in main common-main.c:57
        git-for-windows#19 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#20 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#21 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0fe458c572a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x0fe458c572f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
      0x0fe458c57300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==8340==ABORTING

The pretty format can also be used in `git archive` operations via the
`export-subst` attribute. So this is what in our opinion makes this a
critical issue in the context of Git forges which allow to download an
archive of user supplied Git repositories.

Fix this vulnerability by using `size_t` instead of `int` to track the
string lengths. Add tests which detect this vulnerability when Git is
compiled with the address sanitizer.

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Original-patch-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Modified-by: Taylor  Blau <me@ttalorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
With the `%>>(<N>)` pretty formatter, you can ask git-log(1) et al to
steal spaces. To do so we need to look ahead of the next token to see
whether there are spaces there. This loop takes into account ANSI
sequences that end with an `m`, and if it finds any it will skip them
until it finds the first space. While doing so it does not take into
account the buffer's limits though and easily does an out-of-bounds
read.

Add a test that hits this behaviour. While we don't have an easy way to
verify this, the test causes the following failure when run with
`SANITIZE=address`:

    ==37941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000baf at pc 0x55ba6f88e0d0 bp 0x7ffc84c50d20 sp 0x7ffc84c50d10
    READ of size 1 at 0x603000000baf thread T0
        #0 0x55ba6f88e0cf in format_and_pad_commit pretty.c:1712
        #1 0x55ba6f88e7b4 in format_commit_item pretty.c:1801
        #2 0x55ba6f9b1ae4 in strbuf_expand strbuf.c:429
        git-for-windows#3 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        git-for-windows#4 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        git-for-windows#5 0x55ba6f7884c8 in show_log log-tree.c:781
        git-for-windows#6 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        git-for-windows#7 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#8 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        git-for-windows#9 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        git-for-windows#10 0x55ba6f2ea993 in run_builtin git.c:466
        git-for-windows#11 0x55ba6f2eb397 in handle_builtin git.c:721
        git-for-windows#12 0x55ba6f2ebb07 in run_argv git.c:788
        git-for-windows#13 0x55ba6f2ec8a7 in cmd_main git.c:923
        git-for-windows#14 0x55ba6f581682 in main common-main.c:57
        git-for-windows#15 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#16 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#17 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000000baf is located 1 bytes to the left of 24-byte region [0x603000000bb0,0x603000000bc8)
    allocated by thread T0 here:
        #0 0x7f2d08ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x55ba6fa5b494 in xrealloc wrapper.c:136
        #2 0x55ba6f9aefdc in strbuf_grow strbuf.c:99
        git-for-windows#3 0x55ba6f9b0a06 in strbuf_add strbuf.c:298
        git-for-windows#4 0x55ba6f9b1a25 in strbuf_expand strbuf.c:418
        git-for-windows#5 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        git-for-windows#6 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        git-for-windows#7 0x55ba6f7884c8 in show_log log-tree.c:781
        git-for-windows#8 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        git-for-windows#9 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#10 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        git-for-windows#11 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        git-for-windows#12 0x55ba6f2ea993 in run_builtin git.c:466
        git-for-windows#13 0x55ba6f2eb397 in handle_builtin git.c:721
        git-for-windows#14 0x55ba6f2ebb07 in run_argv git.c:788
        git-for-windows#15 0x55ba6f2ec8a7 in cmd_main git.c:923
        git-for-windows#16 0x55ba6f581682 in main common-main.c:57
        git-for-windows#17 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#18 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#19 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow pretty.c:1712 in format_and_pad_commit
    Shadow bytes around the buggy address:
      0x0c067fff8120: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
      0x0c067fff8130: fd fd fa fa fd fd fd fd fa fa fd fd fd fa fa fa
      0x0c067fff8140: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff8150: fa fa fd fd fd fd fa fa 00 00 00 fa fa fa fd fd
      0x0c067fff8160: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
    =>0x0c067fff8170: fd fd fd fa fa[fa]00 00 00 fa fa fa 00 00 00 fa
      0x0c067fff8180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb

Luckily enough, this would only cause us to copy the out-of-bounds data
into the formatted commit in case we really had an ANSI sequence
preceding our buffer. So this bug likely has no security consequences.

Fix it regardless by not traversing past the buffer's start.

Reported-by: Patrick Steinhardt <ps@pks.im>
Reported-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
An out-of-bounds read can be triggered when parsing an incomplete
padding format string passed via `--pretty=format` or in Git archives
when files are marked with the `export-subst` gitattribute.

This bug exists since we have introduced support for truncating output
via the `trunc` keyword a7f01c6 (pretty: support truncating in %>, %<
and %><, 2013-04-19). Before this commit, we used to find the end of the
formatting string by using strchr(3P). This function returns a `NULL`
pointer in case the character in question wasn't found. The subsequent
check whether any character was found thus simply checked the returned
pointer. After the commit we switched to strcspn(3P) though, which only
returns the offset to the first found character or to the trailing NUL
byte. As the end pointer is now computed by adding the offset to the
start pointer it won't be `NULL` anymore, and as a consequence the check
doesn't do anything anymore.

The out-of-bounds data that is being read can in fact end up in the
formatted string. As a consequence, it is possible to leak memory
contents either by calling git-log(1) or via git-archive(1) when any of
the archived files is marked with the `export-subst` gitattribute.

    ==10888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000398 at pc 0x7f0356047cb2 bp 0x7fff3ffb95d0 sp 0x7fff3ffb8d78
    READ of size 1 at 0x602000000398 thread T0
        #0 0x7f0356047cb1 in __interceptor_strchrnul /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725
        #1 0x563b7cec9a43 in strbuf_expand strbuf.c:417
        #2 0x563b7cda7060 in repo_format_commit_message pretty.c:1869
        git-for-windows#3 0x563b7cda8d0f in pretty_print_commit pretty.c:2161
        git-for-windows#4 0x563b7cca04c8 in show_log log-tree.c:781
        git-for-windows#5 0x563b7cca36ba in log_tree_commit log-tree.c:1117
        git-for-windows#6 0x563b7c927ed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#7 0x563b7c92835b in cmd_log_walk builtin/log.c:549
        git-for-windows#8 0x563b7c92b1a2 in cmd_log builtin/log.c:883
        git-for-windows#9 0x563b7c802993 in run_builtin git.c:466
        git-for-windows#10 0x563b7c803397 in handle_builtin git.c:721
        git-for-windows#11 0x563b7c803b07 in run_argv git.c:788
        git-for-windows#12 0x563b7c8048a7 in cmd_main git.c:923
        git-for-windows#13 0x563b7ca99682 in main common-main.c:57
        git-for-windows#14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    0x602000000398 is located 0 bytes to the right of 8-byte region [0x602000000390,0x602000000398)
    allocated by thread T0 here:
        #0 0x7f0356072faa in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:439
        #1 0x563b7cf7317c in xstrdup wrapper.c:39
        #2 0x563b7cd9a06a in save_user_format pretty.c:40
        git-for-windows#3 0x563b7cd9b3e5 in get_commit_format pretty.c:173
        git-for-windows#4 0x563b7ce54ea0 in handle_revision_opt revision.c:2456
        git-for-windows#5 0x563b7ce597c9 in setup_revisions revision.c:2850
        git-for-windows#6 0x563b7c9269e0 in cmd_log_init_finish builtin/log.c:269
        git-for-windows#7 0x563b7c927362 in cmd_log_init builtin/log.c:348
        git-for-windows#8 0x563b7c92b193 in cmd_log builtin/log.c:882
        git-for-windows#9 0x563b7c802993 in run_builtin git.c:466
        git-for-windows#10 0x563b7c803397 in handle_builtin git.c:721
        git-for-windows#11 0x563b7c803b07 in run_argv git.c:788
        git-for-windows#12 0x563b7c8048a7 in cmd_main git.c:923
        git-for-windows#13 0x563b7ca99682 in main common-main.c:57
        git-for-windows#14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 in __interceptor_strchrnul
    Shadow bytes around the buggy address:
      0x0c047fff8020: fa fa fd fd fa fa 00 06 fa fa 05 fa fa fa fd fd
      0x0c047fff8030: fa fa 00 02 fa fa 06 fa fa fa 05 fa fa fa fd fd
      0x0c047fff8040: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa 00 00
      0x0c047fff8050: fa fa 00 01 fa fa fd fd fa fa 00 00 fa fa 00 01
      0x0c047fff8060: fa fa 00 06 fa fa 00 06 fa fa 05 fa fa fa 05 fa
    =>0x0c047fff8070: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd
      0x0c047fff8080: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa fd fa
      0x0c047fff8090: fa fa fd fd fa fa 00 00 fa fa fa fa fa fa fa fa
      0x0c047fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==10888==ABORTING

Fix this bug by checking whether `end` points at the trailing NUL byte.
Add a test which catches this out-of-bounds read and which demonstrates
that we used to write out-of-bounds data into the formatted message.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Original-patch-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Feb 20, 2023
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.

This can lead to an out-of-bounds write with `--pretty=format:%<1)%B`
and a commit message that is 2^31+1 bytes long:

    =================================================================
    ==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
    WRITE of size 2147483649 at 0x603000001168 thread T0
        #0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
        #2 0x5612bbb1087a in format_commit_item pretty.c:1801
        git-for-windows#3 0x5612bbc33bab in strbuf_expand strbuf.c:429
        git-for-windows#4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        git-for-windows#5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        git-for-windows#6 0x5612bba0a4d5 in show_log log-tree.c:781
        git-for-windows#7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        git-for-windows#8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        git-for-windows#10 0x5612bb6951a2 in cmd_log builtin/log.c:883
        git-for-windows#11 0x5612bb56c993 in run_builtin git.c:466
        git-for-windows#12 0x5612bb56d397 in handle_builtin git.c:721
        git-for-windows#13 0x5612bb56db07 in run_argv git.c:788
        git-for-windows#14 0x5612bb56e8a7 in cmd_main git.c:923
        git-for-windows#15 0x5612bb803682 in main common-main.c:57
        git-for-windows#16 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)
        git-for-windows#17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        git-for-windows#18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
    allocated by thread T0 here:
        #0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5612bbcdd556 in xrealloc wrapper.c:136
        #2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
        git-for-windows#3 0x5612bbc32acd in strbuf_add strbuf.c:298
        git-for-windows#4 0x5612bbc33aec in strbuf_expand strbuf.c:418
        git-for-windows#5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        git-for-windows#6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        git-for-windows#7 0x5612bba0a4d5 in show_log log-tree.c:781
        git-for-windows#8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        git-for-windows#9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        git-for-windows#10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        git-for-windows#11 0x5612bb6951a2 in cmd_log builtin/log.c:883
        git-for-windows#12 0x5612bb56c993 in run_builtin git.c:466
        git-for-windows#13 0x5612bb56d397 in handle_builtin git.c:721
        git-for-windows#14 0x5612bb56db07 in run_argv git.c:788
        git-for-windows#15 0x5612bb56e8a7 in cmd_main git.c:923
        git-for-windows#16 0x5612bb803682 in main common-main.c:57
        git-for-windows#17 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
      0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
      0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
      0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
    =>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
      0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==26009==ABORTING

Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.

Add a test that would have previously caused us to crash.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Oct 18, 2023
When "update-index --unresolve $path" cannot find the resolve-undo
record for the path the user requested to unresolve, it stuffs the
blobs from HEAD and MERGE_HEAD to stage #2 and stage git-for-windows#3 as a
fallback.  For this reason, the operation does not even start unless
both "HEAD" and "MERGE_HEAD" exist.

This is suboptimal in a few ways:

 * It does not recreate stage #1.  Even though it is a correct
   design decision not to do so (because it is impossible to
   recreate in general cases, without knowing how we got there,
   including what merge strategy was used), it is much less useful
   not to have that information in the index.

 * It limits the "unresolve" operation only during a conflicted "git
   merge" and nothing else.  Other operations like "rebase",
   "cherry-pick", and "switch -m" may result in conflicts, and the
   user may want to unresolve the conflict that they incorrectly
   resolved in order to redo the resolution, but the fallback would
   not kick in.

 * Most importantly, the entire "unresolve" operation is disabled
   after a conflicted merge is committed and MERGE_HEAD is removed,
   even though the index has perfectly usable resolve-undo records.

By lazily reading the HEAD and MERGE_HEAD only when we need to go to
the fallback codepath, we will allow cases where resolve-undo
records are available (which is 100% of the time, unless the user is
reading from an index file created by Git more than 10 years ago) to
proceed even after a conflicted merge was committed, during other
mergy operations that do not use MERGE_HEAD, or after the result of
such mergy operations has been committed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
rimrul pushed a commit that referenced this pull request Oct 18, 2023
When t5583-push-branches.sh was originally introduced via 425b4d7
(push: introduce '--branches' option, 2023-05-06), it was not leak-free.
In fact, the test did not even run correctly until 022fbb6 (t5583:
fix shebang line, 2023-05-12), but after applying that patch, we see a
failure at t5583.8:

    ==2529087==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 384 byte(s) in 1 object(s) allocated from:
        #0 0x7fb536330986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x55e07606cbf9 in xrealloc wrapper.c:140
        #2 0x55e075fb6cb3 in prio_queue_put prio-queue.c:42
        git-for-windows#3 0x55e075ec81cb in get_reachable_subset commit-reach.c:917
        git-for-windows#4 0x55e075fe9cce in add_missing_tags remote.c:1518
        git-for-windows#5 0x55e075fea1e4 in match_push_refs remote.c:1665
        git-for-windows#6 0x55e076050a8e in transport_push transport.c:1378
        git-for-windows#7 0x55e075e2eb74 in push_with_options builtin/push.c:401
        git-for-windows#8 0x55e075e2edb0 in do_push builtin/push.c:458
        git-for-windows#9 0x55e075e2ff7a in cmd_push builtin/push.c:702
        git-for-windows#10 0x55e075d8aaf0 in run_builtin git.c:452
        git-for-windows#11 0x55e075d8af08 in handle_builtin git.c:706
        git-for-windows#12 0x55e075d8b12c in run_argv git.c:770
        git-for-windows#13 0x55e075d8b6a0 in cmd_main git.c:905
        git-for-windows#14 0x55e075e81f07 in main common-main.c:60
        git-for-windows#15 0x7fb5360ab6c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
        git-for-windows#16 0x7fb5360ab784 in __libc_start_main_impl ../csu/libc-start.c:360
        git-for-windows#17 0x55e075d88f40 in _start (git+0x1ff40) (BuildId: 38ad998b85a535e786129979443630d025ec2453)

    SUMMARY: LeakSanitizer: 384 byte(s) leaked in 1 allocation(s).

This leak was addressed independently via 68b5117 (commit-reach: fix
memory leak in get_reachable_subset(), 2023-06-03), which makes t5583
leak-free.

But t5583 was not in the tree when 68b5117 was written, and the two
only met after the latter was merged back in via 693bde4 (Merge
branch 'mh/commit-reach-get-reachable-plug-leak', 2023-06-20).

At that point, t5583 was leak-free. Let's mark it as such accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants