Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fish 3.0 does not run under Cygwin #5426

Closed
zanchey opened this Issue Dec 21, 2018 · 26 comments

Comments

Projects
None yet
8 participants
@zanchey
Copy link
Member

zanchey commented Dec 21, 2018

The latest build from the Integration_3.0.0 (d6e315d) and master (50fbc36) fails to launch, and fails the tests at the Ctrl-C cancellation by hanging indefinitely, under Cygwin.

Running fish -d5 leaves me with an infinitely-looping output that finishes up like this:

<4> fish: proc::read_try('status current-command')
<4> fish: io_buffer_t::read: blocking read on fd 5
<4> fish: Created pipe using fds 5 and 6
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/fish_title.fish
<4> fish: Created pipe using fds 7 and 8
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/fish_title.fish
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/__fish_pwd.fish
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/fish_title.fish
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/__fish_pwd.fish
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/__fish_pwd.fish
<4> fish: Created pipe using fds 9 and 10
<5> fish: intern /home/zanchey/src/fish-shell/share/functions/__fish_pwd.fish
<5> fish: path_get_path( 'uname' )
<4> fish: env_export_arr() recalc
<4> fish: Fork #1, pid 5984: spawn external command '/usr/bin/uname' from '/home/zanchey/src/fish-shell/share/functions/__fish_pwd.fish'
<3> fish: Created job 5 from command 'uname' with pgrp 5384
<4> fish: Start job 5, gid 5384 (uname), UNCOMPLETED, NON-INTERACTIVE
<4> fish: select_try on fd 9
<4> fish: select_try hit timeout
<4> fish: select_try: no fds returned valid data within the timeout
<4> fish: select_try on fd 9
<4> fish: proc::read_try('uname')
<5> fish: Skipping wait on incomplete job 4 (source ...)
<5> fish: Skipping wait on incomplete job 1 (fish_title ...)
<4> fish: select_try on fd 9
<4> fish: select_try hit timeout
<4> fish: select_try: no fds returned valid data within the timeout
<5> fish: Skipping wait on incomplete job 4 (source ...)
<5> fish: Skipping wait on incomplete job 1 (fish_title ...)

(repeats from select_try on fd 9 forever)

Getting builds on Cygwin involves an astonishing amount of yak-shaving for me (recent searches include "freebsd nfs tuning", "roaming profile gpo samba 4", "ipv6 preferred address", and "tomcat log"), so I don't even want to think what getting a debugging build is going to involve. @mqudsi, any thoughts?

@zanchey zanchey added this to the fish-3.0 milestone Dec 21, 2018

@andrew-schulman

This comment has been minimized.

Copy link
Contributor

andrew-schulman commented Dec 26, 2018

FWIW the same problem happens in the 32- and 64-bit builds.

@zanchey

This comment has been minimized.

Copy link
Member Author

zanchey commented Dec 27, 2018

Having thought about this some more, I'm not going to block 3.0.0 for this. Cygwin users will be no worse off than they are today ("use 2.7.1") and it will get a new release to other platforms' users sooner. We could look at a 3.0.1 if we can get it fixed well before 3.1. (sorry @andrew-schulman - not much of a Christmas present!)

@andrew-schulman

This comment has been minimized.

Copy link
Contributor

andrew-schulman commented Dec 27, 2018

S'ok. I don't disagree. When I can find some time, I'll do some debugging of my own, or ask on the Cygwin list.

@zanchey zanchey modified the milestones: fish-3.0, fish-future Dec 28, 2018

@CodeMonkeyKevin

This comment was marked as resolved.

Copy link

CodeMonkeyKevin commented Dec 28, 2018

Having the same issue on MacOS 10.13.

@zanchey

This comment was marked as resolved.

Copy link
Member Author

zanchey commented Dec 28, 2018

@CodeMonkeyKevin, I'm surprised to hear that as we did lots more testing on macOS than Cygwin - I'm wondering if it's definitely the same issue. Are you able to run fish with fish -d5 and confirm that it displays the same output as above?

@CodeMonkeyKevin

This comment was marked as resolved.

Copy link

CodeMonkeyKevin commented Dec 29, 2018

@zanchey looks like it was a different issue. An issue with VCS ohm plugin.

@amzyang

This comment has been minimized.

Copy link

amzyang commented Dec 30, 2018

The same issue here. When cd into a git repo directory, it blocks.

<4> fish: select_try: no fds returned valid data within the timeout
<5> fish: Skipping wait on incomplete job 6 (__fish_git_prompt_show_upstream ...)
<5> fish: Skipping wait on incomplete job 3 (__fish_git_prompt ...)
<5> fish: Skipping wait on incomplete job 1 (fish_prompt ...)
<4> fish: select_try on fd 9

Reproduce:

set __fish_git_prompt_showupstream 'auto' or some other value.


    command git config -z --get-regexp '^(svn-remote\..*\.url|bash\.showupstream)$' 2>/dev/null | while read -lz key value
        switch $key
            case bash.showupstream
                set show_upstream $value
                test -n "$show_upstream"
                or return
            case svn-remote.'*'.url
                set svn_remote $svn_remote $value
                # Avoid adding \| to the beginning to avoid needing #?? later
                if test -n "$svn_url_pattern"
                    set svn_url_pattern $svn_url_pattern"|$value"
                else
                    set svn_url_pattern $value
                end
                set upstream svn+git # default upstream is SVN if available, else git

                # Save the config key (without .url) for later use
                set -l remote_prefix (string replace -r '\.url$' '' -- $key)
                set svn_prefix $svn_prefix $remote_prefix
        end
    end
@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 30, 2018

@amzyang: Which OS is this on?

@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 30, 2018

@amzyang: I have reproduced your particular issue.

The issue is this bit from your dotfiles:

        env GIT_TERMINAL_PROMPT=0 GIT_SSH_COMMAND="ssh -o BatchMode=yes" git -c gc.auto=0 fetch -t 2>/dev/null >/dev/null &
        disown

inside a function triggered --on-variable PWD, together with some other jobs that happen to happen if the git prompt shows the upstream.

Apparently that means we miss the select timeout and then get into an endless loop somehow (if I ctrl-c this quickly, it quits. If I wait a couple of seconds it doesn't).

@mqudsi did a bunch of changes to our job handling, that might cause this.

I'm not entirely sure it's the same thing as the cygwin stuff.

@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 30, 2018

Simplifying a bit:

fish -c 'sleep 10 &; disown; set -g __fish_git_prompt_showupstream auto; __fish_git_prompt'

triggers it inside a git repo.

I need to try to simplify the git-prompt bit still.

Anyway, it seems that the presence of a disowned job can mess this up.

Enabling "full" job control (i.e. adding status job-control full before the __fish_git_prompt - it doesn't have to be before the sleep 10 &) seems to fix it.

faho added a commit that referenced this issue Dec 30, 2018

Don't wait for disowned pgids if they are special
If a job is disowned that, for some reason, has a pgid that is special
to waitpid, like 0 (process with pgid of the calling process), -1 (any
process), or our actual pgid, that would lead to us waiting for too
many processes when we later try to reap the disowned processes (to
stop zombies from appearing).

And that means we'd snag away the processes we actually do want to
wait for, which would end with us in a waiting loop.

This is tough to reproduce, the easiest I've found was

    fish -ic 'sleep 5 &; disown; set -g __fish_git_prompt_showupstream auto; __fish_git_prompt'

in a git repo.

What we do is to not allow special pgids in the disowned_pids list.
That means we might leave a zombie around (though we probably wait on
0 somewhere), but that's preferable to infinitely looping.

See #5426.
@faho

This comment has been minimized.

Copy link
Member

faho commented Dec 30, 2018

@amzyang: Your issue should be fixed in 4a3ac6e.

The underlying problem was that we disownd a process via its pgid, which was one of those special values that caused us to reap the wrong process, which caused us to keep waiting for a process that was long gone.

@zanchey, @andrew-schulman: It'd be nice if either of you could check that commit to see if it's the same issue. I don't think so, but you never know.

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Dec 31, 2018

@zanchey is there a Cygwin VM image or EC2 AMI I could test this under? I have extremely painful memories about how long it takes to get Cygwin installed and how uncontained a Cygwin installation is.

I changed the behavior of how waiting within the context of a function works in 259cf02, and although it may have some impact on this issue I am doubtful that it would make a real difference as we always keep enumerating jobs/processes and calling waitpid(2), even after a job is reaped (just without suspending CPU if fg jobs are present).

@zanchey

This comment has been minimized.

Copy link
Member Author

zanchey commented Dec 31, 2018

Still failing with 3.0.0-54-ga615151d on Cygwin. I'll see what I can do about getting you access to a VM, although getting Cygwin installed is fairly straightforward these days.

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Dec 31, 2018

I'll try that the next chance I get to hack away at fish.

@amzyang

This comment has been minimized.

Copy link

amzyang commented Jan 1, 2019

@amzyang: Your issue should be fixed in 4a3ac6e.

Thanks. The master branch works for me now.

@mqudsi mqudsi changed the title 3.0b1 fails tests, hangs on startup Fish 3.0 does not run under Cygwin Jan 2, 2019

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Jan 2, 2019

@zanchey @andrew-schulman I just commited d1913f0 to master which gets fish running again under Cygwin. Make sure to build via cmake.

It seems to have been caused by some bugs in Cygwin's process management, specifically with regards to job reaping, that we did not previously run into with the past naive job control implementation.

The low-level tests still fail but I have to run and I did not get a chance to see why (none of them reported an error, for what it's worth).

@orthoxerox

This comment has been minimized.

Copy link

orthoxerox commented Jan 9, 2019

FYI, it hangs under MSYS2 as well, and setting it up for a debug build is much easier than Cygwin.

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Jan 9, 2019

As I said, I already committed a fix for this but I am waiting on feedback. Did you build master or 3.0?

@orthoxerox

This comment has been minimized.

Copy link

orthoxerox commented Jan 10, 2019

MSYS2 uses the following commands to build its version of fish, so I doubt a cmake fix will work as-is:

patch -p1 -i ${srcdir}/01-msysize.patch
autoreconf -ivf
./configure --prefix=/usr --with-doxygen --sysconfdir=/etc
make
make DESTDIR="${pkgdir}/" install

I hard-defined CYGWIN in common.h, and make install finished with a bunch of fork errors I failed to copy, but fish 3.0.0-124-gc15a702f-dirty itself seems to work fine.

@andrew-schulman

This comment has been minimized.

Copy link
Contributor

andrew-schulman commented Jan 11, 2019

This problem is fixed for me now, building from master. Thanks!

Installation to a different destination directory doesn't work right, but that's a separate issue.

@zanchey zanchey modified the milestones: fish-future, fish 3.1.0 Jan 12, 2019

@zanchey zanchey closed this Jan 12, 2019

@zanchey zanchey referenced this issue Jan 12, 2019

Closed

fish 3.0.1? #5520

7 of 7 tasks complete

@orthoxerox orthoxerox referenced this issue Jan 14, 2019

Closed

fish broken #1551

@zanchey zanchey modified the milestones: fish 3.1.0, fish 3.0.1 Jan 18, 2019

@ridiculousfish

This comment has been minimized.

Copy link
Member

ridiculousfish commented Jan 22, 2019

Reopening pending a pick/fix for 3.0.1

@zanchey

This comment has been minimized.

Copy link
Member Author

zanchey commented Jan 22, 2019

The fix should probably just use #ifdef __CYGWIN__ rather than adding to the CMake configuration checks.

mqudsi added a commit that referenced this issue Jan 22, 2019

faho added a commit that referenced this issue Jan 23, 2019

Don't wait for disowned pgids if they are special
If a job is disowned that, for some reason, has a pgid that is special
to waitpid, like 0 (process with pgid of the calling process), -1 (any
process), or our actual pgid, that would lead to us waiting for too
many processes when we later try to reap the disowned processes (to
stop zombies from appearing).

And that means we'd snag away the processes we actually do want to
wait for, which would end with us in a waiting loop.

This is tough to reproduce, the easiest I've found was

    fish -ic 'sleep 5 &; disown; set -g __fish_git_prompt_showupstream auto; __fish_git_prompt'

in a git repo.

What we do is to not allow special pgids in the disowned_pids list.
That means we might leave a zombie around (though we probably wait on
0 somewhere), but that's preferable to infinitely looping.

See #5426.
@faho

This comment has been minimized.

Copy link
Member

faho commented Jan 23, 2019

I have cherry-picked 4a3ac6e, which is a slightly different issue.

I'm assuming we can just simplify the CYGWIN detection of d1913f0 for the actual issue?

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Jan 23, 2019

@faho that one + 462cb60 is it.

@faho

This comment has been minimized.

Copy link
Member

faho commented Jan 23, 2019

@mqudsi: I get a conflict if I try to cherry-pick these, and I'm not sure which part to take.

So should I simply add a

#ifdef __CYGWIN__
wait_by_process = true
#endif 

(sorry for the trailing space, github refuses to not autocomplete issue numbers after a #)

or would you prefer to resolve it?

@mqudsi

This comment has been minimized.

Copy link
Contributor

mqudsi commented Jan 23, 2019

No problem. Cherry picked for fish 3.0.1 as 91ac0f1 and a5ef1e3.

@mqudsi mqudsi closed this Jan 23, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.