Skip to content

Conversation

@mariomac
Copy link

Some nodes, such as the GRPC+Protobuf server, are not very friendly to shutdown signals from Kubernetes, and may take up to 1 minute before Kubernetes eventually kills the pods. This makes undeployments and redeployments to be very slow.

This PR captures the SIGTERM signals and stops the main goroutine when they are received. The pipeline runs in background so it is automatically stopped when the main goroutine ends.

@mariomac mariomac requested review from KalmanMeth and eranra April 22, 2022 12:52
@codecov-commenter
Copy link

Codecov Report

Merging #188 (717de58) into main (6eea033) will decrease coverage by 0.10%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #188      +/-   ##
==========================================
- Coverage   58.48%   58.38%   -0.11%     
==========================================
  Files          58       58              
  Lines        3365     3371       +6     
==========================================
  Hits         1968     1968              
- Misses       1268     1274       +6     
  Partials      129      129              
Flag Coverage Δ
unittests 58.38% <0.00%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/flowlogs-pipeline/main.go 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6eea033...717de58. Read the comment docs.

@eranra
Copy link
Collaborator

eranra commented Apr 24, 2022

@mariomac thanks for adding this PR ... there is a centralized process to exit all threads in FLP ... the code is here: https://github.com/netobserv/flowlogs-pipeline/blob/main/pkg/pipeline/utils/exit.go#L46 can you use the RegisterExitChannel function here https://github.com/netobserv/flowlogs-pipeline/blob/main/pkg/pipeline/utils/exit.go#L34 so that when we get SIGTERM signal the pipeline will exit (as well as all other threads in the FLP process). This should be very similar to what you added just using that centralized process exit code

}()

// Subscribe to signals for terminating the program.
signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be integrated to work with exiting all the other threads that exit when SIGINT or SIGTERM are received. See SetupElegantExit() in flowlogs-pipeline/pkg/pipeline/utils/exit.go

@mariomac
Copy link
Author

mariomac commented Apr 25, 2022

Thanks for the info, @eranra & @KalmanMeth ! I didn't realized this elegant shutdown mechanism.

I adapted the GRPC ingester to work with it and also changed the mechanism itself to use the standard practices in Golang:

Instead of a buffered bool channel, I created an unbuffered struct{} channel (the empty type, as the actual value is not imporant).

Also, the common practice when you want to use a channel as a barrier is not to send a message and read it, but close it. When you try to read a closed channel, the read operation will continue. This actually allows to share a single channel that is closed once and will allow the rest of instances to block until it is closed.

In this PR, I am still using one single channel for ingester, but in a later PR we can share the same channel for all the instances that want the elegant shutdown.

Copy link
Collaborator

@eranra eranra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mariomac ... thanks for improving the mechanism ... very nice and clean ;-)

@mariomac mariomac merged commit 43ca3a2 into netobserv:main Apr 25, 2022
@mariomac mariomac deleted the shutdown branch April 25, 2022 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants