run fish in docker image will spawn a zombie python process #7183

Closed
sheeaza opened this issue Jul 7, 2020 · 13 comments · Fixed by #7189
Labels
bug Something that's not working as intended

@sheeaza

sheeaza commented Jul 7, 2020

fish version: 3.1.0
docker image: ubuntu 20.04
docker run command: docker run -it --rm --init <image id>
[screenshot: top output]

However, when I run fish on the host Linux system, there is no such issue. Is this a Docker issue or a fish issue? Or can you explain the zombie python process?

@faho
Member

faho commented Jul 7, 2020

This is fish running the completion generator. It happens on first start, when it creates a ~/.local/share/fish/generated_completions directory; if that directory already exists, it won't run again.

It should just complete after a few minutes.

@faho faho added the question label Jul 7, 2020
@sheeaza
Author

sheeaza commented Jul 8, 2020

  1. It's strange that the zombie python process always stays there until the parent fish process exits.
  2. Each fish shell spawns its own zombie python process.
  3. In Docker there is no ~/.local/share/fish/generated_completions directory, but on the normal host OS there is. How should I debug why the directory is missing in Docker?

@zanchey
Member

zanchey commented Jul 8, 2020

Yes, I don't think this is expected.

Zombie processes in Docker are usually the result of running the container without a full init process, though your top output above appears ok. I would be interested in the output of pstree (if you have it installed) or ps -o pid,ppid,state,user,cmd if not.
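If neither pstree nor ps is installed in a slim image, roughly the same information can be read straight from /proc. The following is a minimal Python sketch assuming a Linux /proc filesystem; the helper names `read_stat` and `find_zombies` are made up for illustration. It lists each zombie together with its parent PID, which is the detail that separates an orphan (parent is PID 1, so reaping is init's job) from a child whose still-running parent isn't reaping it.

```python
import os


def read_stat(pid):
    """Return (state, ppid, comm) from /proc/<pid>/stat, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # comm sits in parentheses and may contain spaces or ')', so locate it by
    # the first '(' and the last ')' instead of splitting on whitespace.
    comm = data[data.index("(") + 1:data.rindex(")")]
    fields = data[data.rindex(")") + 2:].split()
    return fields[0], int(fields[1]), comm


def find_zombies():
    """Return (pid, ppid, comm) for every process currently in state 'Z'."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        info = read_stat(int(entry))
        if info is not None and info[0] == "Z":
            zombies.append((int(entry), info[1], info[2]))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"pid={pid} ppid={ppid} state=Z cmd={comm}")
```

If fish is the process failing to reap, the ppid shown for the python3 zombie would be the fish process rather than PID 1.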

@sheeaza
Author

sheeaza commented Jul 8, 2020

I found that if I run mkdir ~/.local/share/fish/generated_completions, newly launched fish shells don't spawn a zombie python process. I will post the pstree output later.

I have used Docker's --init option to provide an init process.

@faho
Member

faho commented Jul 8, 2020

It is expected that fish runs the completion generator.

It is not expected that it doesn't finish. It should just take a couple of minutes (worst case - it takes 6s for me with ~3000 man pages but that's on a fairly quick ssd) and then exit.

@mqudsi
Contributor

mqudsi commented Jul 8, 2020

A zombie is different from a background process or a process that hasn't finished: it has finished, but it isn't being reaped (which is to be expected in the absence of an init process). Since it didn't finish successfully, this happens in every session rather than just the first. Perhaps broken permissions or broken Docker path mappings are preventing it from completing successfully (which is different from not completing)?

I would delete that folder you just created and manually run the completion script in a session to see what errors it throws/why it's not able to generate the completions.
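The zombie/unreaped distinction can be observed directly: a child that has already exited stays in the process table in state Z, holding its exit status, until its parent collects it with waitpid(). This is a minimal Python sketch assuming Linux (it reads /proc/<pid>/stat); `zombie_lifecycle` is a hypothetical name, and the child's exit status of 2 is just an illustrative nonzero value standing in for the failed python3 run.

```python
import os
import time


def proc_state(pid):
    """One-letter state from /proc/<pid>/stat, or None once the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    return data[data.rindex(")") + 2]


def zombie_lifecycle(exit_code=2):
    """Fork a child that exits at once; observe it before and after reaping.

    Returns (state_before_wait, exit_status, state_after_wait)."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with a failure status, standing in for a
        # python3 process that could not open its script.
        os._exit(exit_code)
    # Parent: deliberately don't wait yet; poll until the child has exited.
    state = proc_state(pid)
    for _ in range(500):
        if state == "Z":
            break
        time.sleep(0.01)
        state = proc_state(pid)
    # Reaping: waitpid() collects the status and frees the process-table slot.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return state, code, proc_state(pid)


if __name__ == "__main__":
    print(zombie_lifecycle())
```

The process is "done" the moment it shows state Z; everything after that is the parent's responsibility.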

@sheeaza
Author

sheeaza commented Jul 9, 2020

[screenshot: pstree output]

# fish_update_completions
python3: can't open file '/usr/share/fish/tools/create_manpage_completions.py': [Errno 2] No such file or directory

The pstree output is in the screenshot above. I ran the completions command and got the output shown; it seems the fish package is missing some Python scripts?
==================
Update: create_manpage_completions.py is in the fish-tools package, which the Docker image did not install (it only installed fish). Also, Docker images usually leave out man pages and other miscellaneous files, since images should be small.

        if not test -d $__fish_user_data_dir/generated_completions
            # Generating completions from man pages needs python (see issue #3588).

            # We cannot simply do `fish_update_completions &` because it is a function.
            # We cannot do `eval` since it is a function.
            # We don't want to call `fish -c` since that is unnecessary and sources config.fish again.
            # Hence we'll call python directly.
            # c_m_p.py should work with any python version.
            set -l update_args -B $__fish_data_dir/tools/create_manpage_completions.py --manpath --cleanup-in '~/.config/fish/completions' --cleanup-in '~/.config/fish/generated_completions'
            if set -l python (__fish_anypython)
                # Run python directly in the background and swallow all output
                $python $update_args >/dev/null 2>&1 &
                # Then disown the job so that it continues to run in case of an early exit (#6269)
                disown >/dev/null 2>&1
            end
        end

The code that creates the zombie is in __fish_config_interactive.fish, quoted above.
So the conclusion:
If neither the generated_completions directory nor create_manpage_completions.py exists, fish spawns a zombie python process on startup.

@faho
Member

faho commented Jul 9, 2020

Ah, okay.

So the conclusion is:

  • fish starts python and disowns it so you don't get notified when it ends, reparenting it to PID 1
  • python doesn't manage to actually start the script because your distro broke it, so it exits
  • you don't have an init system that does its minimum job (reaping children), so python stays a zombie

Tbh I'm tempted to call this a distro bug, but since it's the combination with Docker that makes this annoying, they probably wouldn't want to handle it either.

There's little we can do about python turning into a zombie when it does something (get an actual init system please), but we can check if the script exists so it doesn't happen again and again and again.
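The proposed check amounts to one extra condition before launching python. The real change belongs in __fish_config_interactive.fish; this is only a Python sketch of the decision, with `should_start_generator` as a hypothetical name and its two arguments standing in for `$__fish_user_data_dir` and `$__fish_data_dir`.

```python
import os


def should_start_generator(user_data_dir, data_dir):
    """Hypothetical guard: run the man-page completion generator only when
    completions haven't been generated yet AND the script is installed."""
    generated = os.path.join(user_data_dir, "generated_completions")
    script = os.path.join(data_dir, "tools", "create_manpage_completions.py")
    # Without the second condition, a missing script (e.g. fish-tools not
    # installed) means python starts, fails at once, and does so every session.
    return not os.path.isdir(generated) and os.path.isfile(script)
```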

@faho faho closed this as completed in a40c82d Jul 9, 2020
@zanchey
Copy link
Member

zanchey commented Jul 10, 2020

I don't think that's what's happening. PID 1 is only supposed to clean up orphaned zombie processes (whose parent has exited), but these processes have a parent, fish, that is not reaping them properly. docker-init is an "actual" init as far as this is concerned.
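Reparenting only happens when the parent exits: at that point its children are handed to PID 1 (or to a "child subreaper" process, if one is configured). This is a minimal Python sketch assuming Linux/POSIX semantics; `reparent_demo` is a hypothetical name. A middle process starts a grandchild and exits, and the grandchild reports that its parent PID has changed. While the original parent is still alive, as fish is here, no such handover happens, so init never gets the chance to reap.

```python
import os
import time


def reparent_demo(timeout=5.0):
    """Return (middle_pid, grandchild_ppid), where grandchild_ppid is the parent
    the grandchild reports after the middle process has exited."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: start a grandchild, then exit without waiting for it.
        os.close(r)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the original parent is gone, then report
            # which process is our parent now.
            deadline = time.monotonic() + timeout
            while os.getppid() == my_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(w, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)
    os.close(w)
    # The middle process is our own child, so reaping it is our job.
    os.waitpid(middle, 0)
    chunks = []
    while True:
        chunk = os.read(r, 64)
        if not chunk:  # EOF: the grandchild has written and exited
            break
        chunks.append(chunk)
    os.close(r)
    return middle, int(b"".join(chunks))


if __name__ == "__main__":
    middle, new_parent = reparent_demo()
    print(f"middle={middle} grandchild's parent is now {new_parent}")
```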

@faho
Copy link
Member

faho commented Jul 10, 2020

fish - that is not reaping them properly

We disown them, which means we entirely forget about them. They should be reaped by init after we exit; I was under the impression that wasn't happening. If it is, there is no problem (but not starting useless python processes is still better).

@zanchey
Copy link
Member

zanchey commented Jul 10, 2020

Disowning removes them from the job list, but (because of #5342) fish keeps a list of disowned processes and occasionally calls waitpid. fish has not exited yet, which is why they haven't been reparented to init.

You can produce zombies by running sleep 3 &; disown, then not touching that terminal: the handler only runs every few commands. Note that this is not what's happening above, as far as I can tell; other commands have been run which should have triggered the handler.
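A rough model of the mechanism described here: disowned IDs are remembered after the job itself is forgotten, and a non-blocking `waitpid(..., WNOHANG)` pass runs every so often, so anything that exits between passes sits as a zombie until the next one. This is a Python sketch for illustration only; `DisownedReaper` is hypothetical and is not fish's actual data structure. Negative IDs follow waitpid()'s own convention of meaning "any child in process group -id".

```python
import os


class DisownedReaper:
    """Hypothetical model of a shell's disowned-process list.

    IDs stay here after the job itself is forgotten; reap() is meant to be
    called every so often (e.g. every few commands)."""

    def __init__(self):
        self.ids = []

    def add_pid(self, pid):
        self.ids.append(pid)

    def add_pgid(self, pgid):
        self.ids.append(-pgid)

    def reap(self):
        """One non-blocking pass; returns the PIDs collected on this pass."""
        reaped, still_waiting = [], []
        for wid in self.ids:
            try:
                pid, _status = os.waitpid(wid, os.WNOHANG)
            except ChildProcessError:
                continue  # ECHILD: nothing left to wait for under this ID
            if pid == 0:
                still_waiting.append(wid)  # exists but hasn't exited yet
                continue
            reaped.append(pid)
            if wid < 0:
                # A process group may hold more children; keep polling it
                # until waitpid() reports ECHILD.
                still_waiting.append(wid)
        self.ids = still_waiting
        return reaped
```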

However, it looks like I can reproduce this by putting sleep 1 &; disown into my config.fish. I can't get a debug build at present so can't quite tell what's going on, but I wonder if the disowned jobs are never put into the disowned list before going interactive?

@zanchey
Copy link
Member

zanchey commented Jul 10, 2020

(also, bash calls waitpid when receiving SIGCHLD, which might be an easier way of dealing with zombie children)
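The SIGCHLD approach can be sketched as follows: on each signal, loop over `waitpid(-1, WNOHANG)` until nothing more has exited, since several child exits can be coalesced into a single SIGCHLD. This is a minimal Python sketch assuming POSIX signals, not bash's or fish's code; `on_sigchld`, `install_sigchld_reaper` and the `reaped` list are hypothetical names.

```python
import os
import signal

reaped = []


def on_sigchld(signum, frame):
    # Several exits can be coalesced into one SIGCHLD, so keep collecting
    # until waitpid() says there is nothing more to reap right now.
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet
        reaped.append(pid)


def install_sigchld_reaper():
    signal.signal(signal.SIGCHLD, on_sigchld)
```

One caveat for a real shell: `waitpid(-1, ...)` collects every child, including those belonging to jobs the shell still tracks, so the handler would also need to record their statuses rather than just discarding them.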

@zanchey
Copy link
Member

zanchey commented Jul 10, 2020

Oh no! PGID strikes again!

disown's disown_job doesn't add PIDs to the s_disowned_pids list (for which the comment is a house of lies; it will never be anything but PGIDs). It passes the PGID to add_disowned_pgid, but because the process group for all jobs started in config.fish is fish's process group, it immediately gets dropped on the floor.

I think the answer here is to make remove_disowned_jobs add the PIDs to s_disowned_pids, but getting the mechanics right (if both PID and PGID are in the list, then not waiting on IDs that don't exist) is proving a bit tricky.
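The bug and the eventual fix can be modelled as two versions of the "which IDs do we wait on later" decision. A child started without job control inherits the parent's process group, so its PGID equals the shell's own and a PGID-only version records nothing; the fixed version falls back to the individual PIDs, as the commit message later puts it ("add either the PGID or all the PIDs of the job"). Python sketch for illustration; both functions are hypothetical models of the behaviour described in this thread, not fish's C++ code.

```python
import os


def wait_ids_pgid_only(job_pgid, job_pids, shell_pgid):
    """Model of the behaviour described above: only the job's PGID is
    recorded, and it is dropped when it equals the shell's own group."""
    if job_pgid == shell_pgid:
        return []  # nothing will ever wait on these processes
    return [-job_pgid]  # waitpid(-pgid) waits on any child in that group


def wait_ids_fixed(job_pgid, job_pids, shell_pgid):
    """Model of the fix: use the PGID when the job has its own group,
    otherwise fall back to every PID in the job."""
    if job_pgid != shell_pgid:
        return [-job_pgid]
    return list(job_pids)
```

A forked child that never calls setpgid() stays in the parent's process group, so for it the first function returns an empty list and the second returns the child's PID.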

@zanchey zanchey added bug Something that's not working as intended and removed question labels Jul 10, 2020
@zanchey zanchey added this to the fish-future milestone Jul 10, 2020
@zanchey zanchey reopened this Jul 10, 2020
zanchey added a commit to zanchey/fish-shell that referenced this issue Jul 10, 2020
add_disowned_pgid skips jobs that have a PGID equal to the running
process. However, this includes processes started in config.fish or when
job control is turned off, so they never get waited on.

Instead, add all the PIDs of the job to the list of disowned PIDs/PGIDs
as well.

Fixes fish-shell#7183.
zanchey added a commit to zanchey/fish-shell that referenced this issue Jul 14, 2020
add_disowned_pgid skipped jobs that have a PGID equal to the running
process. However, this includes processes started in config.fish or when
job control is turned off, so they never get waited on.

Instead, refactor this function to add_disowned_job, and add either the PGID or
all the PIDs of the job to the list of disowned PIDs/PGIDs.

Fixes fish-shell#7183.
zanchey added a commit to zanchey/fish-shell that referenced this issue Jul 16, 2020
mqudsi pushed a commit to zanchey/fish-shell that referenced this issue Jul 26, 2020
mqudsi added a commit that referenced this issue Jul 26, 2020
Disown PIDs as well as PGIDs

Closes #7183
@zanchey zanchey modified the milestones: fish-future, fish 3.2.0 Jul 26, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2020
4 participants