Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drainer: exit with non-zero code for some errors #1012

Merged
merged 6 commits into from
Nov 22, 2020

Conversation

csuzhangxc
Copy link
Member

What problem does this PR solve?

fix #1010

fix #1011

What is changed and how it works?

exit with non-zero code if s.syncer.Start or s.heartbeat returned errors.

Check List

Tests

Code changes

  • Has exported function/method change

Side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

Release note

  • exit drainer process with non-zero code if upstream PD is down or apply DDL/DML to downstream failed

@csuzhangxc
Copy link
Member Author

/run-all-tests

1 similar comment
@csuzhangxc
Copy link
Member Author

/run-all-tests

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@@ -287,6 +292,7 @@ func (s *Server) Start() error {
defer func() { go s.Close() }()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's race here and

	go func() {
		err2 := <-bs.Errors()
		log.Error("drainer reported error", zap.Error(err2))
		bs.Close()
		os.Exit(2)
	}()

both close server and if later one gets called later it may return and exit before the first one finishes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just make Server.Start() return the the error of s.syncer.Start() if it returns error

@@ -269,6 +273,7 @@ func (s *Server) Start() error {
go func() {
for err := range errc {
log.Error("send heart failed", zap.Error(err))
s.reportError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems no need to quit once it failed to update the statue here, the replication can still work.
how about only failed and quit it failed to update status for a long time?

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

2 similar comments
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc
Copy link
Member Author

@july2993 PTAL again

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@csuzhangxc csuzhangxc merged commit 83984f4 into pingcap:master Nov 22, 2020
@csuzhangxc csuzhangxc deleted the drainer-failure branch November 22, 2020 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants