Git can't read huge files out of a pack #2065

Closed
1 task done
filcab opened this issue Feb 11, 2019 · 4 comments

Comments

@filcab

filcab commented Feb 11, 2019

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.20.1.windows.1
cpu: x86_64
built from commit: 7c9fbc07db0e2939b36095df45864b8cda19b64f
sizeof-long: 4
sizeof-size_t: 8
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.17134.523]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: VIM
Custom Editor Path:
Path Option: CmdTools
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFCommitAsIs
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Enabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Nope.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

bash

$ mkdir git-repo

$ cd git-repo/

$ git init .
Initialized empty Git repository in .../git-repo/.git/

$ dd if=/dev/zero of=test4G bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.5895 s, 1.7 GB/s

$ git add test4G

$ git commit -m hello
[master (root-commit) 636cdb5] hello
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 test4G

$ git ls-tree $(git log -1 --pretty="%T" HEAD)
100644 blob 451971a31ea5a207a10b391df2d5949910133565    test4G

$ git show 451971a31ea5a207a10b391df2d5949910133565 | wc -c
error: bad object header
fatal: packed object 451971a31ea5a207a10b391df2d5949910133565 (stored in .git/objects/pack/pack-43e2c696ce675c3ed09d82deeed262b870b6f27b.pack) is corrupt
0
  • What did you expect to occur after running these commands?

No errors when running git show $hash

  • What actually happened instead?

Errors

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

No need.

  • Additional notes:
    Opening that folder with WSL's git (git version 2.17.1) works, as expected. It looks like the problem occurs only when reading the packfile, not when writing it. The packfile is not corrupt:
$ git show 451971a31ea5a207a10b391df2d5949910133565 | wc -c
4294967296

This seems to be related to code like (in packfile.c):

int unpack_object_header(struct packed_git *p,
			 struct pack_window **w_curs,
			 off_t *curpos,
			 unsigned long *sizep)
{
	unsigned char *base;
	unsigned long left;
	unsigned long used;
	enum object_type type;

	/* use_pack() assures us we have [base, base + 20) available
	 * as a range that we can look at.  (Its actually the hash
	 * size that is assured.)  With our object header encoding
	 * the maximum deflated object size is 2^137, which is just
	 * insane, so we know won't exceed what we have been given.
	 */
	base = use_pack(p, w_curs, *curpos, &left);
	used = unpack_object_header_buffer(base, left, &type, sizep);
	if (!used) {
		type = OBJ_BAD;
	} else
		*curpos += used;

	return type;
}

curpos is an off_t, which should be fine. But used is only an unsigned long, which is 32 bits wide on 64-bit Windows (the LLP64 data model, see https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models ).
Changing left and used to off_t (and then changing all callees) should fix this. Is it acceptable to add a test that generates a 4 GiB file when it runs?
I can try to fix this if I manage to get a build of git working on Windows. Should I then submit a patch to this project or to the main git project? (Other platforms might have this problem, but I'm not aware of any. Still, it's probably more correct to use off_t in this code.)
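
To make the size limit concrete, here is a minimal sketch of a pack-style object header (a 3-bit type and a variable-length size: 4 bits in the first byte, 7 bits in each continuation byte). It is not git's code: `encode_header`, `decode_header` and the `size_bits` parameter are hypothetical names introduced for illustration, and the assumption is that the reader refuses to shift past the width of its size type. Under that assumption, a 4 GiB size needs bit 32, so a reader with a 32-bit size type gives up while a 64-bit one succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a pack-style header for (type, size) into buf.
 * Returns the number of bytes written. */
static size_t encode_header(unsigned type, uint64_t size,
			    unsigned char *buf, size_t cap)
{
	size_t n = 0;
	unsigned char c = (unsigned char)(((type & 7) << 4) | (size & 15));

	size >>= 4;
	while (size) {
		assert(n < cap);
		buf[n++] = c | 0x80;	/* continuation bit: more size bytes follow */
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
	}
	assert(n < cap);
	buf[n++] = c;
	return n;
}

/* Hypothetical helper: decode a header while pretending the size type is
 * size_bits wide (32 models a 32-bit unsigned long, 64 a 64-bit one).
 * Returns the number of bytes consumed, or 0 if the header is truncated
 * or the size does not fit in size_bits. */
static size_t decode_header(const unsigned char *buf, size_t len,
			    unsigned size_bits, unsigned *type,
			    uint64_t *sizep)
{
	size_t used = 0;
	unsigned shift = 4;
	uint64_t size;
	unsigned char c;

	assert(size_bits <= 64);	/* the sketch accumulates into a uint64_t */
	if (len == 0)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;
	size = c & 15;
	while (c & 0x80) {
		/* Refuse to read bits that the modeled size type cannot hold. */
		if (len <= used || size_bits <= shift)
			return 0;
		c = buf[used++];
		size += (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
}
```

Encoding a blob (type 3) of 4294967296 bytes with `encode_header` and decoding it with `size_bits` set to 32 returns 0, while `size_bits` set to 64 recovers the full size. A size of 4294967295 still decodes in the 32-bit case, which matches the observation that only files of 4 GiB and larger are affected.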

Thank you,
Filipe

@filcab
Author

filcab commented Feb 11, 2019

Unfortunately, it looks like git assumes that unsigned long is an acceptable type for offsets into packs and similar values. Changing the type may involve changing many more functions than expected. I'll try to create a localized change, which would fix this bug but might leave a few related ones unfixed.

@PhilipOakley

Have a look at https://public-inbox.org/git/994568940.109648.1548957557643@ox.hosteurope.de/ and link in with Thomas.

The problem appears to be in the different type conversions on the different platforms during the zlib decode test.

@t-b

t-b commented Feb 12, 2019

Hi @filcab,

Yes, that is a known, current limitation in git. I'm planning to get it fixed at some point, but I don't have an ETA. If you want to help get that done earlier, feel free to contact me via my email address in the thread @PhilipOakley linked above.

My current WIP and not-yet-working branch is at gitgitgadget#115.

@dscho
Member

dscho commented Jan 1, 2020

There is also #2179; let's move the discussion there.

@dscho dscho closed this as completed Jan 1, 2020