-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supervision trees and test coverage #178
Conversation
5929227
to
b999ae0
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #178 +/- ##
===========================================
+ Coverage 47.64% 73.72% +26.08%
===========================================
Files 23 29 +6
Lines 976 1043 +67
===========================================
+ Hits 465 769 +304
+ Misses 511 274 -237 ☔ View full report in Codecov by Sentry. |
0b5ee86
to
fb78f93
Compare
fb78f93
to
c51b111
Compare
9799274
to
01eee4c
Compare
This serves two purposes: first, no process is left unsupervised, and second, solves the potential bottleneck a single 'gen_event' process could become at handling all the notifications.
01eee4c
to
0c9c96c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice PR, and it's cool to see that coverage has increased. but there are few things that must be addressed.
669259a
to
1fc070d
Compare
1fc070d
to
db869dc
Compare
4cccc76
to
200ff48
Compare
test/controller_SUITE.erl
Outdated
@@ -32,7 +32,9 @@ all_tests() -> | |||
stop_running_scenario_with_no_users_immediately_terminates, | |||
stop_running_scenario_with_users_stays_in_finished, | |||
stop_running_scenario_with_users_eventually_terminates, | |||
scenario_with_state_and_crashing_in_terminate_run_fine | |||
scenario_with_state_and_crashing_in_terminate_run_fine, | |||
scenario_without_start_never_runs_users, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please suggest some better name
amoc_telemetry:execute([coordinator, TelemetryEvent], #{count => 1}, #{name => Name}). | ||
|
||
-spec order_plan([normalized_coordination_item()]) -> [normalized_coordination_item()]. | ||
order_plan(Items) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a test for this function.
I must say that the version with lists:partition/2
was more verbose.
The main idea of this function is that it defines an order of execution, the rest code must ensure that the order is kept and not broken. all other comments are irrelevant here. internal implementation can change, but execution order must remain as defined here. also, with the latest changes we no longer rely on the childrens' order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we were doing before was respective the relative order given by the user between non-all
elements and all
elements, and simply moving the all
items to the end. It is the same effect now, so this new code sets an equivalent order that it was already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I know that it's the same, but lists:partition/2 version was easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, reverted then 👌🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the comment for order_plan/1 function:
The main idea of this function is that it defines an order of execution, the rest code must ensure that the order is kept and not broken. all other comments are irrelevant here. internal implementation can change, but execution order must remain as defined here. also, with the latest changes we no longer rely on the children's order.
I would suggest a comment like this:
This function defines the execution order of the events that must be processed synchronously (on reset/timeout/stop). we need to ensure that 'all' items are executed after 'non-all'.
Note that '{coordinate, _}' events are exceptional and their processing is done asynchronously.
The order of All plan items must be preserved as in the original plan, the same applies to NonAll items.
telemetry:attach_many(?HANDLER, TelemetryEvents, TelemetryHandler, ?CONFIG). | ||
|
||
stop() -> | ||
meck:reset(?HANDLER). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite a confusing name for the function. there should be a separate stop function and a separate reset function.
|
||
end_per_suite(_) -> | ||
application:stop(amoc), | ||
telemetry_helpers:stop(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently telemetry_helpers:stop()
doesn't unload meck module, also we want to reset meck module on init_per_testcase to ensure that test cases do not affect each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to. Test cases are running in parallel here, and also, when checking history we're filtering for the testcase name.
test/amoc_coordinator_SUITE.erl
Outdated
meck:unload(). | ||
|
||
init_per_testcase(_, Config) -> | ||
meck:reset(?MOCK_MOD), | ||
meck:reset(?TELEMETRY_HANDLER), | ||
telemetry_helpers:stop(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be telemetry_helpers:reset()
application:stop(telemetry), | ||
Sup = ?config(sup, Config), | ||
Sup ! terminate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be nice to call telemetry_helpers:stop() here, even if we do meck:unload() later
test/throttle_SUITE.erl
Outdated
Rate =:= maps:get(rate, Metadata, undefined) andalso | ||
Interval =:= maps:get(interval, Metadata, undefined) | ||
end, | ||
lists:any(LowRateEvent, TelemetryEvents). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asserting is missing
test/throttle_SUITE.erl
Outdated
TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]), | ||
LowRateEvent = fun({EventName, Measurements, Metadata}) -> | ||
Name =:= EventName andalso | ||
1 =:= maps:get(Count, Measurements, 1) andalso |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we checking here equality to the default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking that the key exists in the map, which wtf there's a maps:is_key/2
, sorry xD
test/throttle_SUITE.erl
Outdated
%% Helpers | ||
assert_telemetry_event(Name, Count, Throttle, Rate, Interval) -> | ||
TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]), | ||
LowRateEvent = fun({EventName, Measurements, Metadata}) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename into IsLowRateEventFn
interval := 100, | ||
delay_between_executions := 50}, | ||
State), | ||
assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 2, 100). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'error'
atom is supplied as a Count
parameter, this doesn't seem correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, that 'error' value for Count looks like a mistake at first glance. I would rather rename Count into a LogLevel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it to Measurement for consistency with the map.
interval := 1, | ||
delay_between_executions := 10}, | ||
State), | ||
assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 1, 1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'error'
atom is supplied as a Count
parameter, this doesn't seem correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
4b363e2
to
6b17632
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR is really awsome.
This complex PR has a very important purpose: introspection.
I want to be able to visualise, using for example observer or Livebook's supervision trees Explorer, the architecture of my load test. All coordinators and throttle processes are always supervised, named, and can easily be found.
It also serves a very small purpose of performance, as the coordinator for example is now not a single gen_event with a list of handlers to run sequentially, but, a pool of servers each one with its own handler, that can run in parallel.
Coordinators now have a top-level dynamic supervisor, where upon
amoc_coordinator:start/N
, it is requested to start a new pool of coordinators, which are one per handler plus an extra one for timeouts. All these are properly supervised using aone_to_one
static spec and can be restarted automatically as well as viewed using introspection.Throttles now have a top-level static supervisor supervising the controller, the pg-group, and a dynamic pooler:
amoc_throttle:start/N
now requests the pooler to start a pool with the given specs, which is here now a static supervisor with the specs of the requested workers, which again will be restarted automatically and can be viewed using introspection.I also raised code coverage to these two subsystems using the occasion that I'm touching them :)