-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for removal of agent configs #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go is still kinda foreign to me. Won't be able to a good review until I actually start contributing to this, but nothing obvious stood out.
Just left some random comments that I'll look at when online again or will chat to you.
EnsureRunning(ctx context.Context) | ||
// EnsureRunningState is called after installation of an agent and after each call to ProcessConfig | ||
// It must ensure the agent process is running if configs and executable are available | ||
// It must also ensure that that the process is stopped if no configuration remains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. By agent, I assume you mean telegraf or something?
It seems fair enough, the only thing I wonder is mostly for Windows cases, SCOM often monitors for processes stopping and will alert automatically. I believe this is unexpected service terminations rather than a clean stop, but maybe worth keeping in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. One job of Envoy is to be an "agent manager" where telegraf and filebeat are the two agents supported so far.
This comment is referring to how the Envoy watches the agent's child process. The SCOM scenario is not quite related, but still an interesting one that we should implement with a general "process monitor".
return false | ||
} | ||
_, ok := err.(*noAppliedConfigsError) | ||
return ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to ask about this. Does err.(*noAppliedConfigsError)
only return true of false? seems odd if it does, and seems odd that ok
is the variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a type assertion which can result in either one value (and panic, if the wrong type) or two values where the second is a boolean indicating if the type assertion was "ok" (hence the naming convention).
Go is at least consistent in that the map indexing operator also supports an optional, second boolean result value and the convention is "ok" there too.
@@ -113,7 +128,7 @@ func downloadExtractTarGz(outputPath, url string, exePath string) error { | |||
|
|||
_, err = io.Copy(file, tarReader) | |||
if err != nil { | |||
file.Close() | |||
_ = file.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have to assign a variable anytime the function returns something?
is this just a best practice thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntelliJ was flagging this as a warning because the returned error was not being considered. Since errors on closing files tend to be uninteresting, I took the cheesy solution of explicitly ignoring the returned value with a blank identifier.
// Returns true if the configuration was applied | ||
func handleContentConfigurationOp(op *telemetry_edge.ConfigurationOp, configInstancePath string) bool { | ||
switch op.GetType() { | ||
case telemetry_edge.ConfigurationOp_CREATE, telemetry_edge.ConfigurationOp_MODIFY: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this how indentation is supposed to work with go switch statements? do we actually use a linter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than a linter I'd like for us to enable the "gofmt" option on commit. With that said, yeah, apparently this is the official indentation style for switch statements, but not what I personally prefer :).
case telemetry_edge.ConfigurationOp_REMOVE: | ||
err := os.Remove(configInstancePath) | ||
if err != nil { | ||
if os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check what this does. is it seeing if the error returned is a does not exist error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Since error
in Go is an opaque interface type the convention seems to be providing help functions like that one to test for the type of error. It's a little awkward but a consistent idiom.
@jjbuchan if you get a chance, a quick re-review of this would be good |
@jjbuchan sorry for the re-review request earlier...now it's really ready for a re-review |
What
Implements https://jira.rax.io/browse/SALUS-61 by supporting the "remove" operation within a configuration instruction initiated by the Ambassador.
How
Refactored the code in the filebeat and telegraf agent code into the new
handleContentConfigurationOp
function inagents.go
. The handling of the remove operation was also added into the combined method, which in turn removes the files named for the ID of the agent config.The existing
EnsureRunning
function ofSpecificAgentRunner
was renamed toEnsureRunningState
and generalized to support both starting and stopping of agents. Agents are stopped when no configs remain for the agent to run. FYI some agents, like telegraf, actually fail with an error when run without any input plugins configured.I also refactored some handling of the agent process out from the specific agent handling code and into
CommandHandler
. That was mainly to allow for further mocking of that interface during unit tests ofProcessConfig
andEnsureRunningState
rather than relying on executing real processes, etc.How to test
Why
n/a
TODO
none