Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parent exit() vs. child setsid() race #27

Closed
dannf opened this issue Mar 18, 2016 · 1 comment · Fixed by #55
Closed

parent exit() vs. child setsid() race #27

dannf opened this issue Mar 18, 2016 · 1 comment · Fixed by #55
Labels

Comments

@dannf
Copy link

dannf commented Mar 18, 2016

Forwarding from http://bugs.launchpad.net/bugs/1558967
== Comment: #21 - Hendrik Brueckner - 2016-03-16 06:44:09 ==
Package: libfuse2
Version: 2.9.4-1ubuntu2

The cmsfs-fuse program is used to transfer files from a CMSFS dasd (on z/VM) to Linux. The procedure is to mount, copy files, umount. All commands are issued from within an application over an SSH connection.

The problem is that the copy intermittently fails with "Transport endpoint is not connected". The procedure is as follows:

#mount cmsfs
sudo /usr/bin/cmsfs-fuse /dev/dasdb /usr/wave/wavedisk

copy file

/bin/cp -f /usr/wave/wavedisk/WAVEDATA.SCRIPT /usr/wave/wavedata
/bin/cp: cannot stat '/usr/wave/wavedisk/WAVEDATA.SCRIPT': Transport endpoint is not connected
#umount
umount /usr/wave/wavedisk

Because the application uses JSCH to issue the commands, I worked on a non-Java reproducer using SSH.

The problem can be easily re-created with ssh as follows:

root@r3559004:# ssh -t root@localhost "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
Connection to localhost closed.
root@r3559004:
# ls /CMSFS
ls: cannot access '/CMSFS': Transport endpoint is not connected

Problem analysis will follow but not that is not specific to cmsfs-fuse; the problem might also occur with other fuse file systems that are mounted through an SSH connection.

== Comment: #23 - Hendrik Brueckner - 2016-03-16 07:07:30 ==
After debugging and some code review on the libfuse library, I think that
we identified the root cause. As suggested, the problem is not related
to cmsfs-fuse directly.

The cmsfs-fuse main program calls into the libfuse library() using the
fuse_main() function. The fuse_main() function later calls the
fuse_daemonize() to fork the daemon process to handle the fuse file
system I/O.

The fuse_daemonize() look at follows:

180 int fuse_daemonize(int foreground)
181 {
182 if (!foreground) {
183 int nullfd;
184
185 /*
186 * demonize current process by forking it and killing the
187 * parent. This makes current process as a child of 'init'.
188 */
189 switch(fork()) {
190 case -1:
191 perror("fuse_daemonize: fork");
192 return -1;
193 case 0:
194 break;
195 default:
196 _exit(0);
197 }
198
199 if (setsid() == -1) {
200 perror("fuse_daemonize: setsid");
201 return -1;
202 }
203
204 (void) chdir("/");
205
206 nullfd = open("/dev/null", O_RDWR, 0);
207 if (nullfd != -1) {
208 (void) dup2(nullfd, 0);
209 (void) dup2(nullfd, 1);
210 (void) dup2(nullfd, 2);
211 if (nullfd > 2)
212 close(nullfd);
213 }
214 }
215 return 0;
216 }

The fuse_daemonize() function calls fork() as usual. The child proceeds with setsid() and then redirecting its file descriptors to /dev/null etc. The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race. This is seen with an SSH connection. The SSH command "ssh -t root@localhost "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS" calls the cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, the SSH command receives that its command has been executed and closes the connection, that means, it closes the master side of the pseudo-terminal. This causes a HUP signal being sent to the process group on the pseudo-terminal. The child might not have completed the setsid() call and hence becomes terminated. Note that fuse sets up its signal handler later after fuse_daemonize() has complete.

Even if the child has the chance to disassociate from it's parent process group to become it's own process group with setsid(), the child still has the pseudo-terminal opened as stdin, stdout, and stderr. So the pseudo-terminal still behave as controlling terminal and might cause a SIGHUP to be issued at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse daemon process) has completed its processing, that means, has become its own process group with setsid() and closed any file descriptors pointing to the pseudo-terminal.

For example, using a pipe as follows could solve the problem:

The parent waits on the pipe, then exits:

read(waiter[0], &completed, sizeof(completed));
_exit(0);

The child signals its completion (after redirecting its file descriptors) with:
completed = 1;
write(waiter[1], &completed, sizeof(completed));

== Comment: #24 - Gerald Schaefer - 2016-03-16 08:18:20 ==
The race can also be triggered w/o ssh, by using "setsid -c", and I can also reproduce it w/o cmsfs-fuse but with sshfs:

root@s3545003:# setsid -c sshfs geraldsc@tuxmaker: sshfs/
root@s3545003:
# ls sshfs
ls: cannot access 'sshfs': Transport endpoint is not connected

@Nikratio Nikratio added the bug label Mar 18, 2016
@Nikratio
Copy link
Contributor

Thanks for the thorough analysis! Pull requests to implement this child-parent synchronization are welcome. As a stopgap measure for cmfs, I'd suggest to run the file system in the foreground (-f option to fuse_main).

hbrueckner added a commit to hbrueckner/libfuse that referenced this issue Jun 16, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: libfuse#27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Nikratio pushed a commit that referenced this issue Jun 20, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: #27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Nikratio pushed a commit that referenced this issue Jun 20, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: #27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Nikratio pushed a commit that referenced this issue Jun 20, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: #27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Nikratio pushed a commit that referenced this issue Jun 20, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: #27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Nikratio pushed a commit that referenced this issue Jun 20, 2016
Mounting a FUSE file system remotely using SSH in combination with
pseudo-terminal allocation (-t), results in "Transport endpoint is
not connected" errors when trying to access the file system contents.

For example:

  # ssh -t root@localhost  "cmsfs-fuse /dev/disk/by-path/ccw-0.0.0190 /CMSFS"
  Connection to localhost closed.
  # ls /CMSFS
  ls: cannot access '/CMSFS': Transport endpoint is not connected

The cmsfs-fuse main program (which can also be any other FUSE file
system) calls into the fuse_main() libfuse library function.
The fuse_main() function later calls fuse_daemonize() to fork the
daemon process to handle the FUSE file system I/O.

The fuse_daemonize() function calls fork() as usual.  The child
proceeds with setsid() and then redirecting its file descriptors
to /dev/null etc.  The parent process, simply exits.

The child's functions and the parent's exit creates a subtle race.
This is seen with an SSH connection.  The SSH command above calls
cmsfs-fuse on an allocated pseudo-terminal device (-t option).

If the parent exits, SSH receives the command completion and closes
the connection, that means, it closes the master side of the
pseudo-terminal.  This causes a HUP signal being sent to the process
group on the pseudo-terminal.  At this point in time, the child might
not have completed the setsid() call and, hence, becomes terminated.
Note that fuse daemon sets up its signal handlers after fuse_daemonize()
has completed.

Even if the child has the chance to disassociate from its parent process
group to become it's own process group with setsid(), the child still
has the pseudo-terminal opened as stdin, stdout, and stderr.  So the
pseudo-terminal still behave as controlling terminal and might cause a
SIGHUP at closing the the master side.

To solve the problem, the parent has to wait until the child (the fuse
daemon process) has completed its processing, that means, has become
its own process group with setsid() and closed any file descriptors
pointing to the pseudo-terminal.

Closes: #27

Reported-by: Ofer Baruch <oferba@il.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
spectral54 pushed a commit to spectral54/libfuse that referenced this issue Sep 14, 2022
When initializing a workspace that shares its working copy with a Git
repo (i.e. `jj init --git-repo=.`), we import refs and HEAD when
creating the `WorkspaceCommandHelper` (as we do for all commands when
the working copy is shared). That makes the explicit import we do in
`cmd_init()` unnecessary. It also makes the checkout of HEAD I added
for the fix of libfuse#102 unnecessary. More importantly, as @yuja reported
in libfuse#177, it makes the command crash (at least if the repo is small
enough that the two checkouts happen within a second). I think the
problem is that the second checkout tries to create the same commit
except that the Change ID is different (the problem is not the
predecessors as I speculated in the issue tracker). The fix is to
simply avoid doing the redundant work. We still need a proper fix for
libfuse#27 eventually.

Closes libfuse#177.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants