Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large file support on 64bit Windows #115


Copy link

t-b commented Jan 31, 2019

No description provided.

kblees and others added 30 commits Sep 8, 2013
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <>
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <>
There is a problem in the way 9ac3f0e (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) initializes that
mutex in the `packing_data` struct. The problem manifests in a
segmentation fault on Windows, when a mutex (AKA critical section) is
accessed without being initialized. (With pthreads, you apparently do
not really have to initialize them?)

This was reported in git-for-windows#1839.

Signed-off-by: Johannes Schindelin <>
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.

On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.

Signed-off-by: Karsten Blees <>
Use a suffciently large buffer to strip the trailing slash.

Signed-off-by: Karsten Blees <>
Signed-off-by: Johannes Schindelin <>
MSys2's strace facility is very useful for debugging... With this patch,
the bash will be executed through strace if the environment variable
GIT_STRACE_COMMANDS is set, which comes in real handy when investigating
issues in the test suite.

Also support passing a path to a log file via GIT_STRACE_COMMANDS to
force Git to call strace.exe with the `-o <path>` argument, i.e. to log
into a file rather than print the log directly.

That comes in handy when the output would otherwise misinterpreted by a
calling process as part of Git's output.

Note: the values "1", "yes" or "true" are *not* specifying paths, but
tell Git to let strace.exe log directly to the console.

Signed-off-by: Johannes Schindelin <>
On Windows >= Vista, not having an application manifest with a
requestedExecutionLevel can cause several kinds of confusing behavior.

The first and more obvious behavior is "Installer Detection", where
Windows sometimes decides (by looking at things like the file name and
even sequences of bytes within the executable) that an executable is an
installer and should run elevated (causing the well-known popup dialog
to appear). In Git's context, subcommands such as "git patch-id" or "git
update-index" fall prey to this behavior.

The second and more confusing behavior is "File Virtualization". It
means that when files are written without having write permission, it
does not fail (as expected), but they are instead redirected to
somewhere else. When the files are read, the original contents are
returned, though, not the ones that were just written somewhere else.
Even more confusing, not all write accesses are redirected; Trying to
write to write-protected .exe files, for example, will fail instead of

In addition to being unwanted behavior, File Virtualization causes
dramatic slowdowns in Git (see for instance

There are two ways to prevent those two behaviors: Either you embed an
application manifest within all your executables, or you add an external
manifest (a file with the same name followed by .manifest) to all your
executables. Since Git's builtins are hardlinked (or copied), it is
simpler and more robust to embed a manifest.

A recent enough MSVC compiler should already embed a working internal
manifest, but for MinGW you have to do so by hand.

Very lightly tested on Wine, where like on Windows XP it should not make
any difference.

  - New UAC Technologies for Windows Vista
  - Create and Embed an Application Manifest (UAC)

[js: simplified the embedding dramatically by reusing Git for Windows'
existing Windows resource file, removed the optional (and dubious)
processorArchitecture attribute of the manifest's assemblyIdentity

Signed-off-by: Cesar Eduardo Barros <>
Signed-off-by: Johannes Schindelin <>
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
msysgit#122 (comment)

[jes: adjusted test number to avoid conflicts, added support for
chdir(), etc]

Thanks-to: Martin W. Kirst <>
Thanks-to: Doug Kelly <>
Signed-off-by: Karsten Blees <>
Original-test-by: Andrey Rogozhnikov <>
Signed-off-by: Stepan Kasal <>
Signed-off-by: Johannes Schindelin <>
As suggested privately to Brendan Forster by some unnamed person
(suggestion for the future: use the public mailing list, or even the
public GitHub issue tracker, that is a much better place to offer such
suggestions), we should make use of gcc's stack smashing protector that
helps detect stack buffer overruns early.

Rather than using -fstack-protector, we use -fstack-protector-strong
because it strikes a better balance between how much code is affected
and the performance impact.

In a local test (time git log --grep=is -p), best of 5 timings went from
23.009s to 22.997s (i.e. the performance impact was *well* lost in the

This fixes git-for-windows#501

Signed-off-by: Johannes Schindelin <>
The `git_terminal_prompt()` function expects the terminal window to be
attached to a Win32 Console. However, this is not the case with terminal
windows other than `cmd.exe`'s, e.g. with MSys2's own `mintty`.

Non-cmd terminals such as `mintty` still have to have a Win32 Console
to be proper console programs, but have to hide the Win32 Console to
be able to provide more flexibility (such as being resizeable not only
vertically but also horizontally). By writing to that Win32 Console,
`git_terminal_prompt()` manages only to send the prompt to nowhere and
to wait for input from a Console to which the user has no access.

This commit introduces a function specifically to support `mintty` -- or
other terminals that are compatible with MSys2's `/dev/tty` emulation. We
use the `TERM` environment variable as an indicator for that: if the value
starts with "xterm" (such as `mintty`'s "xterm_256color"), we prefer to
let `xterm_prompt()` handle the user interaction.

The most prominent user of `git_terminal_prompt()` is certainly
`git-remote-https.exe`. It is an interesting use case because both
`stdin` and `stdout` are redirected when Git calls said executable, yet
it still wants to access the terminal.

When running inside a `mintty`, the terminal is not accessible to the
`git-remote-https.exe` program, though, because it is a MinGW program
and the `mintty` terminal is not backed by a Win32 console.

To solve that problem, we simply call out to the shell -- which is an
*MSys2* program and can therefore access `/dev/tty`.

Helped-by: nalla <>
Signed-off-by: Karsten Blees <>
Signed-off-by: Johannes Schindelin <>
A '+' is not a valid part of a filename with Windows file systems (it is
reserved because the '+' operator meant file concatenation back in the
DOS days).

Let's just not use it.

Signed-off-by: Johannes Schindelin <>
HOME initialization was historically duplicated in many different places,
including /etc/profile, launch scripts such as git-bash.vbs and gitk.cmd,
and (although slightly broken) in the git-wrapper.

Even unrelated projects such as GitExtensions and TortoiseGit need to
implement the same logic to be able to call git directly.

Initialize HOME in git's own startup code so that we can eventually retire
all the duplicate initialization code.

Signed-off-by: Karsten Blees <>
Git on native Windows exclusively uses UTF-8 for console output (both with
mintty and native console windows). Gettext uses setlocale() to determine
the output encoding for translated text, however, MSVCRT's setlocale()
doesn't support UTF-8. As a result, translated text is encoded in system
encoding (GetAPC()), and non-ASCII chars are mangled in console output.

Use gettext's bind_textdomain_codeset() to force the encoding to UTF-8 on
native Windows.

In this developers' setup, HAVE_LIBCHARSET_H is apparently defined, but
we *really* want to override the locale_charset() here.

Signed-off-by: Karsten Blees <>
For performance reasons `stdout` is not unbuffered by default. That leads
to problems if after printing to `stdout` a read on `stdin` is performed.

For that reason interactive commands like `git clean -i` do not function
properly anymore if the `stdout` is not flushed by `fflush(stdout)` before
trying to read from `stdin`.

In the case of `git clean -i` all reads on `stdin` were preceded by a
`fflush(stdout)` call.

Signed-off-by: nalla <>
Accessing the Windows console through the special CONIN$ / CONOUT$ devices
doesn't work properly for non-ASCII usernames an passwords.

It also doesn't work for terminal emulators that hide the native console
window (such as mintty), and 'TERM=xterm*' is not necessarily a reliable
indicator for such terminals.

The new shell_prompt() function, on the other hand, works fine for both
MSys1 and MSys2, in native console windows as well as mintty, and properly
supports Unicode. It just needs bash on the path (for 'read -s', which is

On Windows, try to use the shell to read from the terminal. If that fails
with ENOENT (i.e. bash was not found), use CONIN/OUT as fallback.

Note: To test this, create a UTF-8 credential file with non-ASCII chars,
e.g. in git-bash: 'echo url=http://tä > cred.txt'. Then in git-cmd,
'git credential fill <cred.txt' works (shell version), while calling git
without the git-wrapper (i.e. 'mingw64\bin\git credential fill <cred.txt')
mangles non-ASCII chars in both console output and input.

Signed-off-by: Karsten Blees <>
With the recent update in efee955 (gpg-interface: check gpg signature
creation status, 2016-06-17), we ask GPG to send all status updates to
stderr, and then catch the stderr in an strbuf.

But GPG might fail, and send error messages to stderr. And we simply
do not show them to the user.

Even worse: this swallows any interactive prompt for a passphrase. And
detaches stderr from the tty so that the passphrase cannot be read.

So while the first problem could be fixed (by printing the captured
stderr upon error), the second problem cannot be easily fixed, and
presents a major regression.

So let's just revert commit efee955.

This fixes git-for-windows#871

Cc: Michael J Gruber <>
Signed-off-by: Johannes Schindelin <>
We introduced helper macros to simplify loading functions dynamically.
Might just as well use them.

Signed-off-by: Johannes Schindelin <>
A change between versions 2.4.1 and 2.6.0 of the MSYS2 runtime modified
how Cygwin's runtime (and hence Git for Windows' MSYS2 runtime
derivative) handles locales: d16a56306d (Consolidate wctomb/mbtowc calls
for POSIX-1.2008, 2016-07-20).

An unintended side-effect is that "cold-calling" into the POSIX
emulation will start with a locale based on the current code page,
something that Git for Windows is very ill-prepared for, as it expects
to be able to pass a command-line containing non-ASCII characters to the
shell without having those characters munged.

One symptom of this behavior: when `git clone` or `git fetch` shell out
to call `git-upload-pack` with a path that contains non-ASCII
characters, the shell tried to interpret the entire command-line
(including command-line parameters) as executable path, which obviously
must fail.

This fixes git-for-windows#1036

Signed-off-by: Johannes Schindelin <>
We should not actually expect the first `attrib.exe` in the PATH to
be the one we are looking for. Or that it is in the PATH, for that

Signed-off-by: Johannes Schindelin <>
When running an external diff from, say, a diff tool, it is safe to
assume that we want to write the files in question. On Windows, that
means that there cannot be any other process holding an open handle to
said files.

So let's make sure that `git diff` itself is not holding any open handle
to the files in question.

This fixes git-for-windows#1315

Signed-off-by: Johannes Schindelin <>
This way the libraries get properly installed into the "site_perl"
directory and we just have to move them out of the "mingw" directory.

Signed-off-by: Sebastian Schuberth <>
There is a really useful debugging technique developed by Sverre
Rabbelier that inserts "bash &&" somewhere in the test scripts, letting
the developer interact at given points with the current state.

Another debugging technique, used a lot by this here coder, is to run
certain executables via gdb by guarding a "gdb -args" call in

Both techniques were disabled by 781f76b(test-lib: redirect stdin of

Let's reinstate the ability to run an interactive shell by making the
redirection optional: setting the TEST_NO_REDIRECT environment variable
will skip the redirection.

Signed-off-by: Johannes Schindelin <>
POSIX-to-Windows path mangling would make it fail.

Signed-off-by: Johannes Schindelin <>
As of a couple of weeks ago, t9116 hangs sometimes -- but not always! --
when being run in the Git for Windows SDK.

The issue seems to be related to redirection via a pipe, but it is really
hard to diagnose, what with git.exe (a non-MSYS2 program) calling a Perl
script (which is executed by an MSYS2 Perl), piping into another MSYS2

As hunting time is scarce these days, simply work around this for now and
leave the real diagnosis and resolution for later.

Signed-off-by: Johannes Schindelin <>
Just like the workaround we added for t9116, t9001.83 hangs sometimes --
but not always! -- when being run in the Git for Windows SDK.

The issue seems to be related to redirection via a pipe, but it is really
hard to diagnose, what with git.exe (a non-MSYS2 program) calling a Perl
script (which is executed by an MSYS2 Perl), piping into another MSYS2

As hunting time is scarce these days, simply work around this for now and
leave the real diagnosis and resolution for later.

Signed-off-by: Johannes Schindelin <>
These patches are probably no longer necessary. Make sure before
dropping them, though.

Signed-off-by: Johannes Schindelin <>

strbuf_readlink() calls readlink() twice if the hint argument specifies the
exact size of the link target (e.g. by passing stat.st_size as returned by
lstat()). This is necessary because 'readlink(..., hint) == hint' could
mean that the buffer was too small.

Use hint + 1 as buffer size to prevent this.

Signed-off-by: Karsten Blees <>
Signed-off-by: Karsten Blees <>
strbuf_readlink() refuses to read link targets that exceed PATH_MAX (even
if a sufficient size was specified by the caller).

As some platforms support longer paths, remove this restriction (similar
to strbuf_getcwd()).

Signed-off-by: Karsten Blees <>
dscho and others added 11 commits Dec 27, 2018
It is a bit ridiculous to spin up a full-blown Perl instance (especially
on Windows, where that means spinning up a full POSIX emulation layer,
AKA the MSYS2 runtime) just to tell how large a given file is.

So let's just use the test-tool to do that job instead.

Signed-off-by: Johannes Schindelin <>
Make sure to write the .xml in UTF-8 encoding.

We also need to make sure that invalid UTF-8 encoding is turned into
valid UTF-8 (using the Replacement Character, \uFFFD) because t9902's
trace contains such invalid byte sequences, and the task that uploads
the test results would refuse to do anything if it was asked to parse an
.xml file with invalid UTF-8 in it.

Signed-off-by: Johannes Schindelin <>
… failure

Let's use the new `test-tool path-utils file-size` command in the new
JUnit XML feature.

Signed-off-by: Johannes Schindelin <>
Add the comment that we added to the patch series intended for upstream

Signed-off-by: Johannes Schindelin <>
The fact that Git's test suite is implemented in Unix shell script that
is as portable as we can muster, combined with the fact that Unix shell
scripting is foreign to Windows (and therefore has to be emulated),
results in pretty abysmal speed of the test suite on that platform, for
pretty much no other reason than that language choice.

For comparison: while the Linux build & test is typically done within
about 8 minutes, the Windows build & test typically lasts about 80
minutes in Azure Pipelines.

To help with that, let's use the Azure Pipeline feature where you can
parallelize jobs, make jobs depend on each other, and pass artifacts
between them.

The tests are distributed using the following heuristic: listing all
test scripts ordered by size in descending order (as a cheap way to
estimate the overall run time), every Nth script is run (where N is the
total number of parallel jobs), starting at the index corresponding to
the parallel job. This slicing is performed by a new function that is
added to the `test-tool`.

To optimize the overall runtime of the entire Pipeline, we need to move
the Windows jobs to the beginning (otherwise there would be a very
decent chance for the Pipeline to be run only the Windows build, while
all the parallel Windows test jobs wait for this single one).

We use Azure Pipelines Artifacts for both the minimal Git for Windows
SDK as well as the built executables, as deduplication and caching close
to the agents makes that really fast. For comparison: while downloading
and unpacking the minimal Git for Windows SDK via PowerShell takes only
one minute (down from anywhere between 2.5 to 7 when using a shallow
clone), uploading it as Pipeline Artifact takes less than 30s and
downloading and unpacking less than 20s (sometimes even as little as
only twelve seconds).

Signed-off-by: Johannes Schindelin <>
Improve Azure Pipelines support
The NtQueryDirectoryFile() call works best (and on Windows 8.1 and
earlier, it works *only*) with buffer sizes up to 64kB. Which is 32k
wide characters, so let's use that as our buffer size.

Signed-off-by: Johannes Schindelin <>
Fix a network drive related bug in the new, fast FSCache
Signed-off-by: Martin Koegler <>
Signed-off-by: Junio C Hamano <>
Signed-off-by: Torsten Bögershausen <>
Helped-by: SZEDER Gábor <>
Signed-off-by: Ramsay Jones <>
Signed-off-by: Junio C Hamano <>
Currently the length of data which is stored in memory is stored
in "unsigned long" at many places in the code base.
This is OK when both "unsigned long" and size_t are 32 bits,
(and is OK when both are 64 bits).
On a 64 bit Windows system am "unsigned long" is 32 bit, and
that may be too short to measure the size of objects in memory,
a size_t is the natural choice.

Improve the code base in "small steps", as small as possible.
The smallest step seems to be much bigger than expected.
See this code-snippet from convert.c:
        const char *ret;
        unsigned long sz;
        void *data = read_blob_data_from_index(istate, path, &sz);
        ret = gather_convert_stats_ascii(data, sz);

The corrected version looks like this:
        const char *ret;
        size_t sz;
        void *data = read_blob_data_from_index(istate, path, &sz);
        ret = gather_convert_stats_ascii(data, sz);

However, when the Git code base is compiled with a compiler that
complains that "unsigned long" is different from size_t, we end
up in this huge patch, before the code base cleanly compiles.

Signed-off-by: Torsten Bögershausen <>
Signed-off-by: Thomas Braun <>
@t-b t-b force-pushed the t-b:tb.190104_0635_convert_size_t_only_git_master_181124_mk_size_t branch from adca490 to 488bfb0 Jan 31, 2019
t-b added 5 commits Jan 31, 2019
Signed-off-by: Thomas Braun <>
Signed-off-by: Thomas Braun <>
@t-b t-b force-pushed the t-b:tb.190104_0635_convert_size_t_only_git_master_181124_mk_size_t branch from 488bfb0 to 01fe65a Feb 1, 2019
@t-b t-b mentioned this pull request Feb 12, 2019
1 of 1 task complete
cache.h Show resolved Hide resolved
@dscho dscho mentioned this pull request Mar 1, 2019
1 of 1 task complete
@t-b t-b closed this Mar 7, 2019
@t-b t-b deleted the t-b:tb.190104_0635_convert_size_t_only_git_master_181124_mk_size_t branch Mar 7, 2019

This comment has been minimized.

Copy link

t-b commented Mar 7, 2019


This comment has been minimized.

Copy link

PhilipOakley commented Mar 21, 2019

My latest version is a

The key points I have found are:

  1. The typecast comments were 'wrong' but well intentioned, see the code.
  2. The zlib changes in my code appear to work, even if I drop the ZLIB_BUF_MAX size down to a 1Mb, while reducing the added file to a middling size.
  3. The deflateBound function is only uInt (32bit), so we need to disable it and use the #define.
  4. the crc32() function also needs fixing (again only uInt(32bit) for len bytes) - many places - so we will need a wrapper.
  5. added a test function to show the zlib compiler flags that tell us what sizes the types are within the library in use.
  6. The test had a mistaken option --verify, corrected.
  7. At the moment the test has been extended(drawn out),
  8. and, the initial 'git add' isn't getting past the verify-pack, and I can't tell which one is in error.

** NEED A LINUX PASSING TEST RESULT FOR COMPARISON** at the 'git add' stage to see if the 'pack that the large file generates is correct.. Any help/suggestions?



This comment has been minimized.

Copy link

t-b commented Mar 21, 2019

@t-b Nice finds. The branch in #115 (comment) has a commit which has a zipped trash directory from a linux box.


This comment has been minimized.

Copy link

PhilipOakley commented Mar 22, 2019

@t-b Thanks. I also got WSl going - I had the impression that there were 32 bit problems ther but they don't appear to have surfaced.

I did find that, relative to my code, the WSL pack after the initial git add file was 5 bytes longer than my GfW code, as reported by the WSL git verify-pack -v. However my GfW verify-pack code rejects both!

I will compare to your zipped trash directory to add an extra reference point. Many thanks.


This comment has been minimized.

Copy link

dscho commented on builtin/index-pack.c in 7557481 Mar 29, 2019

@tboegi this is incomplete: both size and c need to be of type size_t, as the value will actually be read into c. So if that variable is too small to hold the size, then the size can be as wide as it wants, and it will still have the incorrect size.

This comment has been minimized.

Copy link

tboegi replied Mar 29, 2019

c is in the range 0..255 and will be promoted to size_t when needed.
I just changed c to be an unsigned char, and the test suite passes.

This comment has been minimized.

Copy link

dscho replied Apr 1, 2019

Hmm. But unsigned long should be plenty capable of holding values between 0..0xff...

This comment has been minimized.

Copy link

dscho replied Apr 1, 2019

I just changed c to be an unsigned char, and the test suite passes.

BTW the test suite does not seem to have a test that catches the problem reported in git-for-windows#1848... so I have not much confidence that changing it to unsigned char would not re-introduce the bug...


This comment has been minimized.

Copy link

dscho commented on builtin/unpack-objects.c in 7557481 Mar 29, 2019

@tboegi this is incomplete (same as in index-pack.c): both size and c need to be of type size_t, as the value will actually be read into c. So if that variable is too small to hold the size, then the size can be as wide as it wants, and it will still have the incorrect size.

With these two fixes, I can clone the MCVE provided by @ribalda in git-for-windows#1848

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
You can’t perform that action at this time.