Pipe-based Context Switching - slave write failed: Broken pipe #1

Closed
GoogleCodeExporter opened this issue May 7, 2015 · 14 comments · Fixed by #39

@GoogleCodeExporter

What steps will reproduce the problem?
1. Execute the test harness with ./Run
2. Notice that while the Pipe-based Context Switching test is running, it
fails with the error message: "slave write failed: Broken pipe; aborting"
3. This only happens on certain platforms, as I don't see this occur on all
hardware/software configurations. But I have seen it happen on certain
Ubuntu releases.

I'm currently seeing this on an Ubuntu Karmic derivative with the 2.6.30
kernel, i686, running on an ASUS Eee PC with an Intel Atom N280 CPU @ 1.66GHz.

Original issue reported on code.google.com by kdlu...@gmail.com on 3 Nov 2009 at 12:00

@GoogleCodeExporter
Author

Did you find a solution to this problem?  I'm experiencing it too, but only 
when run from the python subprocess module.

uname:  Linux brasil 2.6.28-18-generic #60-Ubuntu SMP Fri Mar 12 04:40:52 UTC 
2010 i686 GNU/Linux
cpu: GenuineIntel Pentium(R) Dual-Core CPU       T4200  @ 2.00GHz
distro: xubuntu jaunty
python: 2.6.2

Here's a code snippet showing how I'm running it:

        p = subprocess.Popen(cmd, shell=True)
        retval = p.wait()

and cmd looks like this:

/usr/local/bench/unixbench-5.1.2/Run context1 > 
/usr/local/bench/archive/2010-06-07_08:53:55/context1.out 2> 
/usr/local/bench/archive/2010-06-07_08:53:55/context1.err

stderr looks like this:

**********************************************
Run: "Pipe-based Context Switching": slave write failed: Broken pipe; aborting



Thanks!
Jacob

Original comment by jaco...@gmail.com on 7 Jun 2010 at 11:41
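
One thing that may be relevant to the Python case, although the thread does not confirm it as the cause here: CPython sets SIGPIPE to "ignore" at startup, and an ignored disposition is inherited by child processes. Python 2.6's subprocess module had no option to undo that, whereas Python 3 resets SIGPIPE in the child by default through the restore_signals argument. A minimal sketch, assuming Python 3 on a POSIX system with sh available; SELF_SIGPIPE and run_child are hypothetical names used only for illustration:

```python
import signal
import subprocess

# Probe command: the shell sends SIGPIPE to itself, then exits 0 if it survived.
SELF_SIGPIPE = ["sh", "-c", "kill -s PIPE $$; exit 0"]


def run_child(restore_signals):
    """Run the probe; a negative return value means the child died from that signal."""
    proc = subprocess.Popen(SELF_SIGPIPE, restore_signals=restore_signals)
    return proc.wait()


def describe(status):
    """Turn an exit status from run_child() into a short human-readable string."""
    if status < 0:
        return "killed by " + signal.Signals(-status).name
    return "exited with status %d" % status
```

Calling describe(run_child(False)) shows what a child sees when it inherits the interpreter's ignored SIGPIPE (as under Python 2), and describe(run_child(True)) shows the Python 3 default, where the child starts with the default disposition.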

@GoogleCodeExporter
Author

No, I haven't found a solution in UnixBench or in Python. As a workaround, I 
wrote a small Java launcher. I package the Java source below with UnixBench, 
compile it in the same step that compiles UnixBench, and then call it from 
Python with subprocess.Popen() using the command java runbench (runbench is 
my Java file name). Modify the String[] args command to run UnixBench the way 
you want; this should work, assuming you have Java installed.

runbench.java:
==================================================
import java.io.*;
import java.util.*;


class StreamGobbler extends Thread {

  InputStream is;
  String type;

  StreamGobbler(InputStream is, String type) {
    this.is = is;
    this.type = type;
  }

  public void run() {
    try {
      InputStreamReader isr = new InputStreamReader(is);
      BufferedReader br = new BufferedReader(isr);
      String line=null;
      while((line = br.readLine()) != null)
        System.out.println(type + ">" + line);
    } catch (IOException ioe) {
      ioe.printStackTrace();
    }
  }
}

public class runbench extends Thread {

  static void runbench() {
    try {
      Runtime runtime = Runtime.getRuntime();
      String[] args = new String[] {"bash", "-c", "./Run -c 16"};
      Process p = runtime.exec(args);
      StreamGobbler errorGobbler = new StreamGobbler(p.getErrorStream(), "ERROR");
      StreamGobbler outputGobbler = new StreamGobbler(p.getInputStream(), "OUTPUT");
      errorGobbler.start();
      outputGobbler.start();
      int exitVal = p.waitFor();
      System.out.println("ExitValue: " + exitVal);
    } catch (Throwable t) {
      t.printStackTrace();
    }
  }

  public static void main(String[] args) throws IOException {
    runbench();
  }
}
===============================================================

Original comment by kdlu...@gmail.com on 7 Jun 2010 at 11:50

@gstrauss
Collaborator

These are probably spurious errors at the end of the timing period when there is a race for the alarm to expire and deliver SIGALRM to the process (both in parent and child, who might still be trying to write() to the other). The solution is probably to ignore SIGPIPE and not to error out if write() fails with EPIPE.
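
To illustrate the behaviour this relies on: once SIGPIPE is ignored, a write to a pipe that no longer has a reader does not kill the process; the write fails with EPIPE, which the caller can treat as "the other side has finished". A minimal sketch in Python rather than the C of context1.c, assuming a POSIX system and relying on CPython ignoring SIGPIPE at startup; write_to_closed_pipe is a hypothetical helper name:

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is already closed; return the errno seen."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # no reader is left, as when the peer has already exited
    try:
        os.write(write_end, b"x")
    except OSError as exc:
        # With SIGPIPE ignored, the failure surfaces as an error code
        # instead of terminating the process.
        return exc.errno
    finally:
        os.close(write_end)
    return None


def is_end_of_run(err):
    """EPIPE at the end of the timing period is expected, not a failure."""
    return err == errno.EPIPE
```

Here is_end_of_run(write_to_closed_pipe()) is the check a benchmark loop would make before deciding whether to report results or abort.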

gstrauss added a commit to gstrauss/byte-unixbench that referenced this issue Sep 18, 2016
Addresses "slave write failed: Broken pipe; aborting"

There are two processes that are alternating reading and writing
a sequence number of sizeof(unsigned long) size, which is 4 bytes
on 32-bit ILP32 ABI and 8 bytes on 64-bit LP64 ABI.  The read/write
passing of incrementing sequence number occurs in infinite loop
until an alarm signals each process.  There is a race condition
where a signal delivered to one process might close the pipes while
the second process was still attempting to read or write from the
pipes, and before the second process was interrupted with SIGALRM.

This patch fixes the race condition that occurs at the end of the
test run, after the first SIGALRM is delivered.

This patch does not address the paranoid possibility that read() or
write() of 4 or 8 bytes might theoretically be a partial read() or
write(), but that is extremely unlikely except in the case of a signal
being delivered, and the only signal expected is SIGALRM, and the
processing of SIGALRM by report() function does not return.  (This
patch adds code to ignore SIGPIPE, so SIGALRM is the only expected
signal.)

github: fixes kdlucas#1
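
The race described in this commit message can be sketched outside of UnixBench. The following is an illustrative Python model, not the code from context1.c: two processes bounce an 8-byte counter over a pair of pipes, and at the end the parent closes its pipe ends while one counter is still in flight. The child treats both EOF on read and EPIPE on write as a normal end of run. The function name ping_pong and its duration parameter are assumptions made for this sketch; it assumes a POSIX system with os.fork().

```python
import os
import struct
import time


def ping_pong(duration=0.2):
    """Bounce a counter between parent and child.

    Returns (round_trips, child_exited_cleanly).
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: echo each counter back until the parent goes away
        status = 1
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            while True:
                data = os.read(to_child_r, 8)
                if len(data) < 8:
                    break  # EOF: the parent closed its write end
                os.write(to_parent_w, data)
            status = 0
        except BrokenPipeError:
            status = 0  # EPIPE: the parent stopped reading first; not an error
        finally:
            os._exit(status)
    # parent
    os.close(to_child_r)
    os.close(to_parent_w)
    deadline = time.monotonic() + duration
    count = 0
    while time.monotonic() < deadline:
        os.write(to_child_w, struct.pack("Q", count))
        (echoed,) = struct.unpack("Q", os.read(to_parent_r, 8))
        assert echoed == count
        count += 1
    # Model the timer expiring mid-exchange: send one more counter, then
    # close both pipe ends without waiting for the echo.
    os.write(to_child_w, struct.pack("Q", count))
    os.close(to_parent_r)
    os.close(to_child_w)
    _, wait_status = os.waitpid(pid, 0)
    clean = os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
    return count, clean
```

Whether the child's final write succeeds or fails with EPIPE depends on scheduling, which is the point: either outcome has to be accepted as the end of the run.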
@introlive

I faced the same issue several weeks ago with the newest versions of UnixBench and the kernel, and tried to find a solution for it.
I was running UnixBench from a self-written systemd service on Amazon Linux 2 (kernel 5.10), and it randomly reported the same errors as mentioned in this thread.

After a long troubleshooting session, I finally found the root cause:
When running unixbench under systemd or cron, there is a default configuration [IgnoreSIGPIPE=true] which caused the pipe context switching to be unable to terminate the pipe normally, which led to this error.

Solution:
Adding [IgnoreSIGPIPE=false] to corresponding systemd configuration file.

It works perfectly in my case; I hope it helps more people.
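
For reference, a sketch of where that setting would go. IgnoreSIGPIPE= is a [Service] option in systemd and defaults to true; the unit file name, description, and the Run path below are hypothetical and should be adapted to the actual installation:

```ini
# /etc/systemd/system/unixbench.service  (hypothetical unit name and paths)
[Unit]
Description=Run UnixBench

[Service]
Type=oneshot
WorkingDirectory=/opt/UnixBench
ExecStart=/opt/UnixBench/Run
# systemd defaults to IgnoreSIGPIPE=true; this leaves SIGPIPE at its
# default disposition for the benchmark processes.
IgnoreSIGPIPE=false
```

After editing a unit file, systemctl daemon-reload is needed before the change takes effect.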

@gstrauss
Collaborator

@introlive thanks for that note.

I'll look to add some code to reset the signal for SIGPIPE, just in case it was inherited with disposition SIG_IGN

@gstrauss
Collaborator

@introlive if you're running the latest UnixBench from this repo, it includes #39, which I committed in 2016. That patch intentionally sets signal(SIGPIPE, SIG_IGN); and checks for errno == EPIPE.

Would you please provide more details including exactly what error message you receive? (Saying "same issue" is vague.)
"... which caused the pipe context switching to be unable to terminate the pipe normally, which led to this error." What is "this error" to which you are referring?

Is the program exiting early for you? I can see that the code could handle EINTR better, but the program is not expecting signals besides running as quickly as it can until it receives SIGALRM from the alarm() that it set.

At the moment, assuming you are running the latest UnixBench code from this git repo, I do not see how or why "Adding [IgnoreSIGPIPE=false] to corresponding systemd configuration file." will make any difference, since SIGPIPE is already explicitly ignored inside src/context1.c.

@introlive

@gstrauss
When I said 'same issue/this error', I was referring to the subject, "Pipe-based Context Switching - slave write failed: Broken pipe". When I tried to run UnixBench from a systemd script, this message appeared in the log and UnixBench exited after that.

UnixBench runs smoothly every time when I use bash, but randomly fails with the "Pipe-based Context Switching - slave write failed: Broken pipe" error when running under systemd.

In systemd, the default behavior is IgnoreSIGPIPE=true if we don't define it, so in my case it's mainly a systemd issue rather than a UnixBench one. But since the error message is exactly the same as this subject, I raised it here for reference.
systemd-cron/systemd-cron#38

@gstrauss
Collaborator

You can see signal(SIGPIPE, SIG_IGN); at https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/context1.c#L59

SIG_IGN signal disposition should be inherited by the child process. Is that not happening for you when running under systemd? Ignoring SIGPIPE must be happening, or else the program would exit due to SIGPIPE were SIGPIPE not caught and then handled or ignored.

==> Since you are getting the error "slave write failed: Broken pipe", the program did not die due to SIGPIPE; SIGPIPE was ignored.

https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/context1.c#L114

                        if ((ret = write(p2[1], (char *)&iter, sizeof(iter))) != sizeof(check)) {
                                if ((ret == -1) && (errno == EPIPE)) {
                                        alarm(0);
                                        report(); /* does not return */
                                }
                                if ((ret == -1) && (errno != 0) && (errno != EINTR))
                                        perror("slave write failed");
                                exit(1);
                        }

"Broken pipe" is EPIPE. You're getting the error "slave write failed: Broken pipe" (which is EPIPE), so ret == -1. Why is the condition directly above not calling report() and exiting?

Would you modify the code
perror("slave write failed");
to instead be
fprintf(stderr, "slave write failed %d != EPIPE (%d)\n", errno, EPIPE);
and then recompile unixbench and reproduce the error?

@introlive

introlive commented Apr 23, 2023

@gstrauss
Sorry for the delayed response; I only just had the chance to run the test you recommended. Below is the result after I edited context1.c, recompiled UnixBench, and removed the line [IgnoreSIGPIPE=false] from the systemd service file. The issue reoccurred.

########################################################
Pipe-based Context Switching -- 2 copies
==> "/opt/UnixBench/pgms/context1" 10 2>&1 >> "/opt/UnixBench/results/ip-******.ec2.internal-2023-04-23-03.log"

#### Pass 1


# COUNT0: 1477877
# COUNT1: 1
# COUNT2: lps
# elapsed: 10.006547
# pid: 13748
# status: 0

# COUNT0: 1476783
# COUNT1: 1
# COUNT2: lps
# ERROR: slave write failed 32 != EPIPE (32)
# elapsed: 10.004814
# pid: 13749
# status: 0

After this error, "slave write failed 32 != EPIPE (32)", UnixBench exited immediately and did not proceed with the remaining tests.

@gstrauss
Collaborator

Are you compiling UnixBench on the same exact machine on which you are running it? If not, you should.

If you are not doing so, then you should modify the Makefile to avoid using -march=native -mtune=native.

Try compiling with OPTON = -O3 -ffast-math and with no other additions to optimizer flags.

@introlive

@gstrauss I compiled it on the same box that runs it, using the default make command without any additional flags.

@gstrauss
Collaborator

> @gstrauss I compiled it on the same box that runs it, using the default make command without any additional flags.

make OPTON="-O3 -ffast-math"

@introlive

introlive commented Apr 26, 2023

@gstrauss Almost the same output, but this time the error appeared in Pass 2.

########################################################
Pipe-based Context Switching -- 2 copies
==> "/opt/UnixBench/pgms/context1" 10 2>&1 >> "/opt/UnixBench/results/ip-******.ec2.internal-2023-04-26-01.log"

#### Pass 1


# COUNT0: 1465693
# COUNT1: 1
# COUNT2: lps
# elapsed: 10.004290
# pid: 9844
# status: 0

# COUNT0: 1473294
# COUNT1: 1
# COUNT2: lps
# elapsed: 10.007370
# pid: 9845
# status: 0

#### Pass 2


# COUNT0: 1473899
# COUNT1: 1
# COUNT2: lps
# elapsed: 10.005248
# pid: 9859
# status: 0

# COUNT0: 1483661
# COUNT1: 1
# COUNT2: lps
# ERROR: slave write failed 32 != EPIPE (32)
# elapsed: 10.007729
# pid: 9860
# status: 0

@gstrauss
Collaborator

make OPTON="-O0" (that is a capital 'O' followed by a zero '0')

# ERROR: slave write failed 32 != EPIPE (32)
That code should not be reached, since 32 == 32. The program should have exited in the call to report() in the if block directly above that one in the code.
