-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor of the Agent and the source loading #6632
Conversation
1f813a1
to
c88b4f6
Compare
logstash-core/lib/logstash/errors.rb
Outdated
class ConfigLoadingError < Error; end | ||
class InvalidSourceLoaderSettingError < Error; end | ||
class PipelineActionError < Error; end | ||
class NonReloadablePipelineError < StandardError; end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will clean these errors, I do not use them anymore.
@@ -0,0 +1,402 @@ | |||
# encoding: utf-8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be split into smaller file maybe?
Also for the reviewer pay attention to the |
I see already one failure for the |
I've pushed a fix to help with the flaky test. |
337f0ea
to
c942213
Compare
begin | ||
logger.debug("Executing action", :action => action) | ||
action_result = action.execute(@pipelines) | ||
converge_result.add(action, action_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could probably add a logger statement here when the action fail
. I have mixed feeling if we should go with debug
or error
,
I tried running a simple pipeline and got an error:
|
I tested how auto reload reacts to a broken configuration, and the logging isn't very useful: I started logstash with a good configuration:
and after a few events I changed it to:
And the output is:
|
Adding a comment to the end of the config breaks the loading of the pipeline:
on logstash 5.2.1:
on this PR:
|
reload metrics seem to be broken:
while on 5.2.1:
|
It seems most pipeline actions are assuming the main pipeline id instead of the configured one. I changed the
Same during shutdown:
|
it seems to be possible to start a pipeline that is not reloadable with -r
in logstash 5.2.1:
in this PR:
|
the --config.debug flag is now ignored even when using --log.level=debug, which means there is no way to see the configuration that logstash is loading. Example run in this gist |
Thanks for catching all theses issues, I will check the suite to see if are covering them. |
The reload metrics they seems to work on my ends, what configuration were you using? {
"last_error": null,
"successes": 0,
"last_success_timestamp": null,
"last_failure_timestamp": null,
"failures": 0
} {
"last_error": null,
"successes": 1,
"last_success_timestamp": "2017-03-14T21:37:27.788Z",
"last_failure_timestamp": null,
"failures": 0
} also failures also update the metrics. {
"last_error": {
"message": "Something is wrong with your configuration.",
"backtrace": [
"/Users/ph/es/logstash/logstash-core/lib/logstash/config/mixin.rb:130:in `config_init'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/outputs/base.rb:63:in `initialize'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/output_delegator_strategies/shared.rb:3:in `initialize'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/output_delegator.rb:19:in `initialize'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/pipeline.rb:95:in `plugin'",
"(eval):12:in `initialize'",
"org/jruby/RubyKernel.java:1079:in `eval'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/pipeline.rb:64:in `initialize'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/pipeline_action/reload.rb:30:in `execute'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:292:in `converge_state'",
"org/jruby/RubyArray.java:1613:in `each'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:277:in `converge_state'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:150:in `converge_state_and_update'",
"org/jruby/ext/thread/Mutex.java:149:in `synchronize'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:148:in `converge_state_and_update'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:101:in `execute'",
"org/jruby/RubyProc.java:281:in `call'",
"/Users/ph/es/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/interval.rb:18:in `interval'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/agent.rb:90:in `execute'",
"/Users/ph/es/logstash/logstash-core/lib/logstash/runner.rb:297:in `execute'",
"org/jruby/RubyProc.java:281:in `call'",
"/Users/ph/es/logstash/vendor/bundle/jruby/1.9/gems/stud-0.0.22/lib/stud/task.rb:24:in `initialize'"
]
},
"successes": 1,
"last_success_timestamp": "2017-03-14T21:37:27.788Z",
"last_failure_timestamp": "2017-03-14T21:38:00.693Z",
"failures": 1
} |
For that, until we have the changes for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay - test pass for me.
Minor comments aside, Outstanding work PH!!
LGTM
logger.debug("Count not fetch the configuration to converge, will retry", :message => results.error, :retrying_in => @reload_interval) | ||
return | ||
else | ||
raise "Count not fetch the configuration, message: #{results.error}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "Could not ..." and line137 too?
java_version = LogStash::Util::JavaVersion.version | ||
|
||
if LogStash::Util::JavaVersion.bad_java_version?(java_version) | ||
raise LogStash::BootstrapCheckError, "Java version 1.8.0 or later is required. (You are running: #{java_version})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that we should use a function like LogStash::Util::JavaVersion.good_java_version
to not hard code the "Java version 1.8.0 or later..." here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the actual code that raised the exection inside the JavaVersion module and call validate_java_version!
instead, this make all the logic encapsulated in the same file and the bootstrap_check is just a thin wrapper around that.
class DefaultConfig | ||
def self.check(settings) | ||
if settings.get("config.string").nil? && settings.get("path.config").nil? | ||
raise LogStash::BootstrapCheckError, I18n.t("logstash.runner.missing-configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will Joao's comment be affected if we use pipelines.yml
?
end | ||
|
||
def <=>(other) | ||
(ORDERING.index(self.class) <=> ORDERING.index(other.class)).nonzero? || self.pipeline_id <=> other.pipeline_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to return -1, 0, 1. e.g.
(ORDERING.index(self.class) <=> ORDERING.index(other.class)).nonzero? ? self.pipeline_id <=> other.pipeline_id : 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't need to check the pipeline_id
, since we cannot execute 2 differents action for the same pipeline_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take that back, it make cleaner debug.
5d23803
to
3b62df2
Compare
@guyboertje @jsvd I've pushed the changes from the review and rebased the PR with master. |
are the integration ci failures related to the bug you saw with sudo? |
@jsvd Not at all, I think they are in part related to #6754, I am stress testing them locally on a linux vm. For now, I don't think we have a big problem in this PR. Yesterday, the PR was green then I've pushed some small changes and it when back to red, the changes shouldn't have any effect on the failling tests. |
LGTM, squash and merge 🤗 🚢 🏆 |
As a note, I am able to get some of the integration failures locally on a linux container. |
@jsvd I take my last comment back, one of the integration test is failing for a good reason.
This behavior is a bit strange if you take 5.X, (/tmp/foobar doesn't exist)
The behavior in this branch is: You provided settings that don't work and we fails. Since we target 6.0, WDYT? I more into make it fails, but I wonder if there any negative side effect. What's strange, this test work in some run. :( |
I think it's pretty easy to say that it should fail in the case of I'm ok with failing if there are no files even if -e is present 👍 |
@jsvd @ph that's not gonna work when we install via packages. We've seen in the past that users install LS via deb/rpm and proceed to start it from the binary install location (and not using service scripts). When this happens,
This test encapsulated the behavior I described ^^ |
Can you check what's the behavior if |
I've fixed the -e -f, in a8b4307 |
dce4931
to
4ca2603
Compare
I've rebased this monstrous PR, I have run the rats test on my linux box over the weekend and seems to only get the intermittent failures that you already working on fixing @jsvd. |
This PR introduces majors changes inside Logstash, to help build future features like supporting multiple pipeline or java execution. The previous implementation of the agent made the class hard to refactor or add new feature to it, because it needed to know about too much internal of the pipeline and how the configuration existed. The changes includes: - Externalize the loading of the configuration file using a `SourceLoader` - The source loader can support multiple sources and will aggregate them. - We keep some metadata about the original file so the LIR can give better feedback. - The Agent now ask the `SourceLoader` to know which configuration need to be run - The Agent now uses a converge state strategy to handle: start, reload, stop - Each actions executed on the pipeline are now extracted into their own classes to help with migration to a new pipeline execution. - The pipeline now has a start method that handle the tread - Better out of the box support for multiple pipeline (monitoring) - Refactor of the spec of the agent
3e22fe4
to
f139489
Compare
Pier-Hugues Pellerin merged this into the following branches!
|
Note for reviewers:
This PR includes both the changes of the source loading to make it a bit more flexible and to allow universal plugin to add new source.
ref: #6514 and #6620
There is a lot a new code for the following reasons:
agent_spec
test.Our current implementation of the Agent in Logstash does a bit too much:
This PR attempt to clean up a bit of the execution model and how things are linked together, (3 -> 7)
In this code, the agent never interact directly with the configuration file, instead it uses a
SourceLoader
class. For the Agent, the SourceLoader is the source of truth.if its not returned by the loader, the agent should not run it.
Testing Strategy
I tried to do minimum changes to the current
agent_spec
file, but I've decided to create 2 other spec with using less mocking and threading.Scenario with auto-reload on
[:main, :heavy_work, :foobar]
[]
StateResolver
how to converge to the required pipelines, the following actions are scheduled:[:heavy_work(config_hash changed), :foobar]
[:main, :heavy_work, :foobar]
StateResolver
how to converge to the required pipelines, the following actions are scheduled:A normal scenario without auto reload
[:main, :heavy_work, :foobar]
[]
StateResolver
how to converge to the required pipelines, the following actions are scheduled:Both of theses scenario use the same path, the only difference is the refresh call when auto reload is on.
Each for the
Action
are encapsulated into their own class for easier testing and swapping of implementation, the first implementationtarget the current ruby pipeline, but this will allow us to experiment with a java pipeline and have a different interface.
This change allow us to test without using any threads in the
agent_spec
or in theaction
.