git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) #853

olsen232 · 2023-05-31T07:49:48Z

git-lfs fetch can be called by kart clone, kart checkout, kart lfs+ fetch when working with tile-based datasets (point clouds, rasters). Of these kart clone is the most reliable way to reproduce since the lfs-cache is guaranteed to be empty at this point.

The crash is characterised by these two lines:
fatal error: signal_recv: inconsistent state at the top of the crash
Error: There was a problem with git-lfs fetch: Command '['git-lfs', 'fetch', 'origin', '<GIT_SHA>']' returned non-zero exit status 2. at the bottom of the crash

Re-running the command often succeeds.
Re-running the command with KART_USE_HELPER=0 always succeeds.
Since this git-lfs command shouldn't need to worry about if it's connected to a tty, or need to read from stdin, it's not clear what's going wrong with helper mode that's breaking git-lfs.

The text was updated successfully, but these errors were encountered:

olsen232 · 2023-05-31T08:12:17Z

This seems to pretty reliably toggle between a good-state and a bad state.
After the crash, it moves it into a good state: the next command will succeed.
The successful next command moves it into a bad state: the next command will fail.

Any kart command toggles this between good and bad state: it is not necessary that the command calls git-lfs. So, for instance, it is possible to reliably run kart clone without it ever crashing provided after each one you run kart --version or kart status or something that is guaranteed not to fail, and do so an odd number of times.

Killing the kart-helper process moves it into a bad state: the next command will always fail.

This is definitely related to #854: kart --help never works with helper mode, but the way in which kart --help fails can also tell you which state you are in. If kart --help has no output, you are moving into a bad state: the next command will fail. If kart --help has the output sh: 1: Bad file descriptor then you are moving into a good state: the next command will succeed.

olsen232 · 2023-06-01T03:14:56Z

Definitely something similar is happening on linux, where using the helper causes subprocesses to fail but only exactly half the time.

I was struggling to exercise this bug properly on my linux VM: man wasn't working, my SSH config wasn't working - so I wrote an ext-run script. The following script works half the time for me, on macos or linux:

import subprocess

def main(ctx, args):
    subprocess.check_call(["pwd"])

From here I discovered that if you explicitly set the following: stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr - then it works 100% of the time. This is a bit like a workaround I discovered earlier: when running the git-lfs subprocess, if you set stdin to dev-null when making the subprocess call, it will always work - but now it is clearer why.

Ordinarily, setting stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr would have no effect when running a subprocess - those are the default values. However, in helper mode, the values of sys.stdin, sys.stdout, sys.stderr change often. Presumably, failing to set the keyword-args means they take some default value, but not the latest, correct value. Or to put it another way: Python's "default-stdin-when-calling-a-subprocess" ends up out of sync with "sys.stdin", and sys.stdin has the value we want.
I still don't know how these values being out-of-sync results in a 2-state cycle of things being more or less broken.

Basically the fix is setting these values everywhere. This also (I think, I need to do further testing) even fixes interactive editors - the tty-capability of these streams is still working. But, I have some decisions about how to fix them:

if I don't expects a subprocess to listen for input, should I just give it stdin=/dev/null? Or should I give it sys.stdin, in case in rare circumstances it needs to interact with the user. Probably the second.
Should I write my own subprocess module that delegates to the existing subprocess module, but always sets stdin, stdout, and stderr to the appropriate values, so that to verify all of Kart is correct I just have to make sure I never import the wrong subprocess module? Probably I should: the only obstacle is that Kart already has some half-hearted wrappers on the subprocess module, which I need to look through to see how they fit in.

olsen232 mentioned this issue May 31, 2023

man-style kart --help doesn't work with helper mode (macos) #854

Closed

olsen232 changed the title ~~git-lfs fetch fails ~50% of the time when using helper mode (macos)~~ git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) Jun 1, 2023

olsen232 mentioned this issue Jun 2, 2023

Explicitly connect stdin, stdout, stderr #856

Merged

3 tasks

olsen232 added a commit that referenced this issue Jun 2, 2023

Update CHANGELOG with #853

f639c5a

olsen232 closed this as completed in #856 Jun 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) #853

git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) #853

olsen232 commented May 31, 2023 •

edited

Loading

olsen232 commented May 31, 2023

olsen232 commented Jun 1, 2023 •

edited

Loading

git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) #853

git-lfs fetch fails ~50% of the time when using helper mode (macos, linux) #853

Comments

olsen232 commented May 31, 2023 • edited Loading

olsen232 commented May 31, 2023

olsen232 commented Jun 1, 2023 • edited Loading

olsen232 commented May 31, 2023 •

edited

Loading

olsen232 commented Jun 1, 2023 •

edited

Loading