Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cin.seekg(0) erroneously returns success in 2.41.0 when cin is a Git Bash pipe. Worked in all earlier versions. #4464

Closed
1 task done
MarkCallow opened this issue Jun 12, 2023 · 8 comments

Comments

@MarkCallow
Copy link

  • I was not able to find an open or closed issue matching what I'm seeing

This issue is about a problem with pipes in the Bash / MingW(?) / MSYS(?) included with 2.41.0. Is this the right place to report such an issue? If not, where should I report it and how to I find the relevant version information to included with the report?

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.41.0.windows.1
cpu: x86_64
built from commit: ff94e79c4724635915dbb3d4ba38f6bb91528260
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.22621.1702]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
> type "$env:USERPROFILE\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: CustomEditor
Custom Editor Path: "C:\Program Files\Vim\vim90\gvim.exe"
Default Branch Option: main
Path Option: Cmd
SSH Option: OpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Merge
Use Credential Manager: Enabled
Performance Tweaks FSCache: Enabled
Enable Symlinks: Enabled
Enable Pseudo Console Support: Enabled
Enable FSMonitor: Disabled

  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

No. I've seen it on my own machine and on 2 different CI runners.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Bash and PowerShell.

$ cat color_grid_uastc_zstd.ktx2 | ktx2check

I have not yet been able to create a simple reproducer

  • What did you expect to occur after running these commands?

I expected ktx2check to tell me the file was valid.

  • What actually happened instead?

ktx2check told me the file did not have the expected byte count of image data and then told me it wasn't a KTX file.

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

This is not repo related.

What this is really all about

Now the, in this instance not very helpful, boilerplate is out of the way here are the gory details.

Pipes between bash processes are not working correctly. New in 2.41.0. All worked fine in Git for Windows 2.40.1.windows.1 and all previous that I've used. The problem appeared in our GitHub workflows builds on the VS2019 and VS 2022 runners after Git was updated to 2.41.0.

A simple cat foo | cat > bar works fine. bar ends up identical to foo. It is more complicated than that. My application ktx2check is doing this

// Read entire pipe content into buffer as seek does not work on this cin

std::stringstream buffer;
buffer << std::cin.rdbuf();
std::istream* isp = &buffer;

Reads after this all work fine. Once the app has read everything up to payload data it does

off_t dataStart = (off_t)(isp->tellg());
isp->seekg(0, ios_base::end);
if (ctx.inp->fail())
    addIssue(logger::eFatal, IOError.FileSeekEndFailure,
             strerror(errno));
off_t dataEnd = (off_t)(isp->tellg());
dataSizeInFile = dataEnd - dataStart;

The tellg result shows the size is significantly less than the actual file data. 43k less in a 170k file. It is seemingly being truncated somewhere.

Later the app does

   isp->seekg(0); 
   std::streambuf* _streambuf = (isp->rdbuf());

and starts reading from _streambuf. All data read from the streambuf is gibberish leading to the not a KTX file message.

The application code makes no distinction between a pipe and stdin redirection from a file. It just uses std::cin. stdin redirection still works.

I won't have time to try to create a minimal reproducer for several more days. Sorry for that. I am hoping somebody else will recognize the symptoms.

@dscho
Copy link
Member

dscho commented Jun 12, 2023

This might be a fallout from git-for-windows/msys2-runtime@0e357e2.

Looking forward to that minimal reproducer so that we can know for sure.

@MarkCallow
Copy link
Author

MarkCallow commented Jun 15, 2023

Here is a minimal reproducer.

First is a bash script that finds the size of the target file and runs the pipe, sending the pipe-reader program the size of file to be used for comparison.

#!/bin/bash

FILENAME=$1
FILESIZE=$(stat -c%s "$FILENAME")   # Linux, Git 4 Windows
#FILESIZE=$(stat -f%z "$FILENAME")   # macOS, BSD
echo "Size of $FILENAME = $FILESIZE bytes."

cat $FILENAME | ./test-pipe.exe $FILESIZE

and a C++ program that reads from the pipe into a stringstream buffer with an istream

#include <iostream>
#include <sstream>

#include <assert.h>
#include <stdlib.h>

#if defined(_WIN32)
  #include <corecrt_io.h>
  #include <fcntl.h>
  #include <windows.h>
#endif

using namespace std;

int main(int argc, char* argv[])
{
    size_t filesize = atoi(argv[1]);

    istream* isp;
    stringstream buffer;

#if defined(_WIN32)
    /* Set "stdin" to have binary mode */
    (void)_setmode( _fileno( stdin ), _O_BINARY );
#endif
    // Can we seek in this cin?
    cin.seekg(0);
    if (cin.fail()) {
        // Read entire file into a stringstream so we can seek.
        buffer << std::cin.rdbuf();
        isp = &buffer;
    } else {
        isp = &cin;
    }

    uint8_t dbuf[80];
    isp->read((char*)dbuf, sizeof(dbuf));

    off_t dataStart = (off_t)(isp->tellg());
    assert(dataStart == sizeof(dbuf) && "In pointer not at expected offset");

    isp->seekg(0, ios_base::end);
    if (isp->fail()) {
        cerr << "Seek to end failed" << endl;
        exit(1);
    }
    off_t dataSizeInFile = (off_t)(isp->tellg());
    if (dataSizeInFile < 0) {
        cerr << "File tell failure" << endl;
        exit(1);
    }
    if (dataSizeInFile != filesize) {
        cerr << "End of pipe(" << dataSizeInFile << ") != filesize(" << filesize << ")." << endl;
        exit(1);
    }
    cout << "All good." << endl;
    return 0;
}

Compile the C++ program in a "developer PowerShell" or any shell that has Visual Studio stuff in PATH and environment

cl /Fe:test-pipe.exe /EHsc test-pipe.cc

It probably does not matter which compiler. I chose cl because that is what we use on the program where this issue first appeared.

Run the script in Git Bash

$ ./test-pipe ../tests/testimages/color_grid_uastc_zstd.ktx2
Size of ../tests/testimages/color_grid_uastc_zstd.ktx2 = 170512 bytes.
End of pipe(126976) != filesize(170512).

Note that there is no testing of the arguments in either script or c++ program. The former expects one argument, the path to the file to cat. The latter expects a single argument that it runs atoi on. It does not check the result.

@StefanStojanovic
Copy link

Hey @dscho, is there any update on the status of this issue?

@naweed2486
Copy link

$ ./test-pipe ../tests/testimages/color_grid_uastc_zstd.ktx2
Size of ../tests/testimages/color_grid_uastc_zstd.ktx2 = 170512 bytes.
End of pipe(126976) != filesize(170512).

@dscho
Copy link
Member

dscho commented Aug 9, 2023

@MarkCallow thank you for the reproducer. I adapted this minimally to compile with g++ -Wall -static -g -o test-pipe.exe test-pipe.cxx in Git for Windows' SDK:

test-pipe.cxx
#include <iostream>
#include <sstream>

#include <assert.h>
#include <stdlib.h>

#if defined(_WIN32)
  #ifdef __MSVC__
    #include <corecrt_io.h>
  #else
    #include <cstdint>
  #endif
  #include <fcntl.h>
  #include <windows.h>
#endif

using namespace std;

int main(int argc, char* argv[])
{
    off_t filesize = (off_t)strtoull(argv[1], NULL, 10);

    istream* isp;
    stringstream buffer;

#if defined(_WIN32)
    /* Set "stdin" to have binary mode */
    (void)_setmode( _fileno( stdin ), _O_BINARY );
#endif
    // Can we seek in this cin?
    cin.seekg(0);
    if (cin.fail()) {
        // Read entire file into a stringstream so we can seek.
        buffer << std::cin.rdbuf();
        isp = &buffer;
    } else {
        isp = &cin;
    }

    uint8_t dbuf[80];
    isp->read((char*)dbuf, sizeof(dbuf));

    off_t dataStart = (off_t)(isp->tellg());
    assert(dataStart == sizeof(dbuf) && "In pointer not at expected offset");

    isp->seekg(0, ios_base::end);
    if (isp->fail()) {
        cerr << "Seek to end failed" << endl;
        exit(1);
    }
    off_t dataSizeInFile = (off_t)(isp->tellg());
    if (dataSizeInFile < 0) {
        cerr << "File tell failure" << endl;
        exit(1);
    }
    if (dataSizeInFile != filesize) {
        cerr << "End of pipe(" << dataSizeInFile << ") != filesize(" << filesize << ")." << endl;
        exit(1);
    }
    cout << "All good." << endl;
    return 0;
}

The good news is that it reproduces even with vanilla Cygwin. The bad news that it reproduces even with their cygwin-3_4-branch, which is the release train on which MSYS2 and Git for Windows still are. The even worse news is that it reproduces with Cygwin's runtime as built of their main branch, at revision cygwin/cygwin@b9e867d.

Since I lack the bandwidth to pursue this further (and unfortunately, I suspect that the pseudo Console code is responsible for this bug even if pipes have exactly nothing at all to do with pseudo Consoles, and that code is really, really obtuse and there's nothing I can do about it), I would like to suggest taking this to the Cygwin project who are in a much better position to fix this bug. See https://cygwin.com/problems.html how to report bugs there.

@dscho dscho closed this as not planned Won't fix, can't repro, duplicate, stale Aug 9, 2023
@MarkCallow
Copy link
Author

I've reported the issue to the Cygwin project which has lead to an interesting discussion. One question that has arisen is whether the failure has been seen when the pipe reader is a Cygwin program. @dscho how did you run the tests on cygwin. Did you compile test-pipe.cxx with Cgywin's g++?

@dscho
Copy link
Member

dscho commented Aug 15, 2023

I've reported the issue to the Cygwin project which has lead to an interesting discussion. One question that has arisen is whether the failure has been seen when the pipe reader is a Cygwin program. @dscho how did you run the tests on cygwin. Did you compile test-pipe.cxx with Cgywin's g++?

No, I specifically compiled it using mingw-w64 g++.

This is definitely a bug when calling non-Cygwin consumers.

@MarkCallow
Copy link
Author

MarkCallow commented Aug 19, 2023

In the discussion on the cygwin@cygwin.com mailing list we have identified the exact cause of this issue.

The trigger is the cin.seekg(0) test in the reproducer used to decide whether to buffer the contents of stdin. Windows shells set the FILE_SYNCHRONOUS_IO_NONALERT option when creating pipes. Cygwin since 3.4.x does the same thing, a change which affects anything dependent on it including Git for Windows 2.41.0+ and MSYS2. When this option is set, cin.seekg(0) erroneously returns success so the reproducer does not buffer the data and uses seekg to attempt to determine the size of the data.

This can be worked around by always buffering stdin data on Windows.

I will change the issue title to match the understanding.

@MarkCallow MarkCallow changed the title Git Bash pipes are truncating data in 2.41.0. Worked in all earlier versions. cin.seekg(0) erroneously returns success in 2.41.0 when cin is a Git Bash pipe. Worked in all earlier versions. Aug 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants