Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Caught second signal incorrectly logging SIGTERM (15) for SIGPIPE #13067

Closed
wants to merge 1 commit into from

Conversation

shweshi
Copy link

@shweshi shweshi commented May 13, 2022

Description

This PR fixes the issue of incorrectly showing SIGTERM signal for SIGPIPE as well for Caught second signal in agent.

Problem Statement.

Here is the background of the issue:

If graceful shutdown is enabled for the agent. Agent will start the graceful shutdown on receiving SIGTERM.

{"@level":"info","@message":"Caught","@module":"agent","@timestamp":"2022-05-12T00:04:16.712683Z","signal":15}

While shutting down if it receives a SIGPIPE due to broken pipe.

{"@level":"error","@message":"failed to flush response","@module":"agent.server.raft","@timestamp":"2022-05-12T00:04:21.942772Z","error":"write tcp 192.168.55.0:8300-\u003e192.168.55.2:40833: write: broken pipe"}

It prints the following.

{"@level":"info","@message":"Caught second signal, Exiting","@module":"agent","@timestamp":"2022-05-12T00:04:21.943055Z","signal":15}

As the signal show in 15 (SIGTERM) which is incorrect and leads to confusion. The log is not printing the correct signal recieved.

Proposed Solution

The current code to handle the second signal

	case <-signalCh:
		c.logger.Info("Caught second signal, Exiting", "signal", sig)
		return 1

Here it is catching the new signal a second signal which is actually a broken pipe.
But the logger is printing the sig variable.
Sig variable is actually set in line number 287. So when printing here it takes up the old value.
The old sig variable value is 15 due to sigterm.
The new value won't be set and it will print the 15 and creating confusion.

In order to fix the problem we can catch the new signal and create a new variable secondSignal and assign the value to it. While logging use the secondSignal.

	// Create a new variable which will hold the new sigterm and print it in the logs.
	case s := <-signalCh:
		var secondSignal os.Signal  // Declare a new sig variable.
		secondSignal = s			// Assign the caught sigterm signal.
		c.logger.Info("Caught second signal, Exiting", "signal", secondSignal) // Print the caught signal.
		return 1

Thanks.

Testing & Reproduction steps

  • Start the consul cluster.
  • Enable graceful shutdown for consul agents.
  • Stop one of the consul agent and it will start graceful shutdown. When broken pipe error happens the logs shows incorrect signal 15 it should be 13 for SIGPIPE.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern
  • checklist folder consulted

… (13)

### Introduction
This PR fixes the issue of incorrectly showing SIGTERM signal for SIGPIPE as well for Caught second signal in agent.

### Problem Statement.
Here is the background of the issue:

If graceful shutdown is enabled for the agent. Agent will start the graceful shutdown on receiving SIGTERM. 
```
{"@Level":"info","@message":"Caught","@module":"agent","@timestamp":"2022-05-12T00:04:16.712683Z","signal":15}
```
While shutting down if it receives a SIGPIPE due to broken pipe.
```
{"@Level":"error","@message":"failed to flush response","@module":"agent.server.raft","@timestamp":"2022-05-12T00:04:21.942772Z","error":"write tcp 192.168.55.0:8300-\u003e192.168.55.2:40833: write: broken pipe"}
```

It prints the following.
```
{"@Level":"info","@message":"Caught second signal, Exiting","@module":"agent","@timestamp":"2022-05-12T00:04:21.943055Z","signal":15}
```

As the signal show in 15 (SIGTERM) which is incorrect and leads to confusion. The log is not printing the correct signal recieved.

### Proposed Solution
The current code to handle the second signal
```
			case <-signalCh:
				c.logger.Info("Caught second signal, Exiting", "signal", sig)
				return 1
```
Here it is catching the new signal a second signal which is actually a broken pipe.
But the logger is printing the sig variable.
Sig variable is actually set in line number 287. So when printing here it takes up the old value.
The old sig variable value is 15 due to sigterm.
The new value won't be set and it will print the 15 and creating confusion.

In order to fix the problem we can catch the new signal and create a new variable secondSignal and assign the value to it. While logging use the secondSignal.
```
			// Create a new variable which will hold the new sigterm and print it in the logs.
			case s := <-signalCh:
				var sig2 os.Signal  // Declare a new sig variable.
				sig2 = s			// Assign the caught sigterm signal.
				c.logger.Info("Caught second signal, Exiting", "signal", sig2) // Print the caught signal.
				return 1
```

Thanks.
@hashicorp-cla
Copy link

hashicorp-cla commented May 13, 2022

CLA assistant check
All committers have signed the CLA.

@Amier3
Copy link
Contributor

Amier3 commented May 24, 2022

Hey @shweshi

Thanks for getting to the root of that issue & the associated PR. Would you be open to adding a test case for this in command/agent/agent_test.go ?

@shweshi
Copy link
Author

shweshi commented Jun 29, 2022

@Amier3 Hey, can you share some ref on how to go about writing tests case for this use case. I checked the existing tests cases and couldn't find a tests case for agent graceful shutdown or tests case for the code related to os signal processing.

As a general idea what i have thought is to:

  1. Start a test agent
  2. Send a SIGPIPE to running test agent
  3. Assert that proper log is printed and agent is shutdown.

A rough code example:

func TestSIGPIPE(t *testing.T) {
	a := agent.StartTestAgent(t, agent.TestAgent{
		HCL: `
			leave_on_terminate = true // enable graceful exit
		`,
	})

	defer a.Shutdown()

        // Send a sigpipe signal to agent

}

@github-actions
Copy link

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

@github-actions github-actions bot added the meta/stale Automatically flagged for inactivity by stalebot label Aug 29, 2022
@github-actions
Copy link

Closing due to inactivity. If you feel this was a mistake or you wish to re-open at any time in the future, please leave a comment and it will be re-surfaced for the maintainers to review.

@github-actions github-actions bot closed this Sep 29, 2022
@thefallentree
Copy link

Hi, can we resurrect this? it seems we do have a issue here

@shweshi
Copy link
Author

shweshi commented Nov 3, 2022

@thefallentree looks like the PR is closed now. This issue is still there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta/stale Automatically flagged for inactivity by stalebot
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants