Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supervision trees and test coverage #178

Merged
merged 23 commits into from
Jan 15, 2024
Merged

Supervision trees and test coverage #178

merged 23 commits into from
Jan 15, 2024

Conversation

NelsonVides
Copy link
Collaborator

@NelsonVides NelsonVides commented Dec 31, 2023

This complex PR has a very important purpose: introspection.

I want to be able to visualise, using for example observer or Livebook's supervision trees Explorer, the architecture of my load test. All coordinators and throttle processes are always supervised, named, and can easily be found.

It also serves a very small purpose of performance, as the coordinator for example is now not a single gen_event with a list of handlers to run sequentially, but, a pool of servers each one with its own handler, that can run in parallel.


Coordinators now have a top-level dynamic supervisor, where upon amoc_coordinator:start/N, it is requested to start a new pool of coordinators, which are one per handler plus an extra one for timeouts. All these are properly supervised using a one_to_one static spec and can be restarted automatically as well as viewed using introspection.

Throttles now have a top-level static supervisor supervising the controller, the pg-group, and a dynamic pooler: amoc_throttle:start/N now requests the pooler to start a pool with the given specs, which is here now a static supervisor with the specs of the requested workers, which again will be restarted automatically and can be viewed using introspection.


I also raised code coverage to these two subsystems using the occasion that I'm touching them :)

@NelsonVides NelsonVides force-pushed the supervision_trees branch 6 times, most recently from 5929227 to b999ae0 Compare December 31, 2023 09:55
@codecov-commenter
Copy link

codecov-commenter commented Dec 31, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (33cce56) 47.64% compared to head (6b17632) 73.72%.

Files Patch % Lines
src/throttle/amoc_throttle_controller.erl 87.50% 5 Missing ⚠️
src/coordinator/amoc_coordinator_sup.erl 88.46% 3 Missing ⚠️
src/coordinator/amoc_coordinator.erl 95.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #178       +/-   ##
===========================================
+ Coverage   47.64%   73.72%   +26.08%     
===========================================
  Files          23       29        +6     
  Lines         976     1043       +67     
===========================================
+ Hits          465      769      +304     
+ Misses        511      274      -237     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@NelsonVides NelsonVides force-pushed the supervision_trees branch 2 times, most recently from 0b5ee86 to fb78f93 Compare December 31, 2023 13:41
@NelsonVides NelsonVides changed the title Supervision trees Supervision trees and test coverage Dec 31, 2023
@NelsonVides NelsonVides marked this pull request as ready for review December 31, 2023 13:51
@NelsonVides NelsonVides force-pushed the supervision_trees branch 4 times, most recently from 9799274 to 01eee4c Compare January 3, 2024 17:15
This serves two purposes: first, no process is left unsupervised, and second,
solves the potential bottleneck a single 'gen_event' process could become at
handling all the notifications.
Copy link
Collaborator

@DenysGonchar DenysGonchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice PR, and it's cool to see that coverage has increased. but there are few things that must be addressed.

test/amoc_SUITE.erl Outdated Show resolved Hide resolved
test/amoc_SUITE.erl Outdated Show resolved Hide resolved
test/amoc_SUITE.erl Outdated Show resolved Hide resolved
test/amoc_SUITE.erl Outdated Show resolved Hide resolved
test/controller_SUITE.erl Outdated Show resolved Hide resolved
test/throttle_SUITE.erl Show resolved Hide resolved
test/throttle_SUITE.erl Outdated Show resolved Hide resolved
test/throttle_SUITE.erl Outdated Show resolved Hide resolved
test/throttle_SUITE.erl Outdated Show resolved Hide resolved
test/throttle_SUITE.erl Outdated Show resolved Hide resolved
@NelsonVides NelsonVides force-pushed the supervision_trees branch 4 times, most recently from 669259a to 1fc070d Compare January 13, 2024 16:13
@@ -32,7 +32,9 @@ all_tests() ->
stop_running_scenario_with_no_users_immediately_terminates,
stop_running_scenario_with_users_stays_in_finished,
stop_running_scenario_with_users_eventually_terminates,
scenario_with_state_and_crashing_in_terminate_run_fine
scenario_with_state_and_crashing_in_terminate_run_fine,
scenario_without_start_never_runs_users,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please suggest some better name

amoc_telemetry:execute([coordinator, TelemetryEvent], #{count => 1}, #{name => Name}).

-spec order_plan([normalized_coordination_item()]) -> [normalized_coordination_item()].
order_plan(Items) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a test for this function.
I must say that the version with lists:partition/2 was more verbose.

The main idea of this function is that it defines an order of execution, the rest code must ensure that the order is kept and not broken. all other comments are irrelevant here. internal implementation can change, but execution order must remain as defined here. also, with the latest changes we no longer rely on the childrens' order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we were doing before was respective the relative order given by the user between non-all elements and all elements, and simply moving the all items to the end. It is the same effect now, so this new code sets an equivalent order that it was already.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I know that it's the same, but lists:partition/2 version was easier to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, reverted then 👌🏽

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update the comment for order_plan/1 function:

The main idea of this function is that it defines an order of execution, the rest code must ensure that the order is kept and not broken. all other comments are irrelevant here. internal implementation can change, but execution order must remain as defined here. also, with the latest changes we no longer rely on the children's order.

I would suggest a comment like this:

This function defines the execution order of the events that must be processed synchronously (on reset/timeout/stop). we need to ensure that 'all' items are executed after 'non-all'.
Note that '{coordinate, _}' events are exceptional and their processing is done asynchronously.
The order of All plan items must be preserved as in the original plan, the same applies to NonAll items.

telemetry:attach_many(?HANDLER, TelemetryEvents, TelemetryHandler, ?CONFIG).

stop() ->
meck:reset(?HANDLER).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite a confusing name for the function. there should be a separate stop function and a separate reset function.


end_per_suite(_) ->
application:stop(amoc),
telemetry_helpers:stop(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently telemetry_helpers:stop() doesn't unload meck module, also we want to reset meck module on init_per_testcase to ensure that test cases do not affect each other.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to. Test cases are running in parallel here, and also, when checking history we're filtering for the testcase name.

meck:unload().

init_per_testcase(_, Config) ->
meck:reset(?MOCK_MOD),
meck:reset(?TELEMETRY_HANDLER),
telemetry_helpers:stop(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be telemetry_helpers:reset()

application:stop(telemetry),
Sup = ?config(sup, Config),
Sup ! terminate,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be nice to call telemetry_helpers:stop() here, even if we do meck:unload() later

Rate =:= maps:get(rate, Metadata, undefined) andalso
Interval =:= maps:get(interval, Metadata, undefined)
end,
lists:any(LowRateEvent, TelemetryEvents).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserting is missing

TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]),
LowRateEvent = fun({EventName, Measurements, Metadata}) ->
Name =:= EventName andalso
1 =:= maps:get(Count, Measurements, 1) andalso
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we checking here equality to the default value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are checking that the key exists in the map, which wtf there's a maps:is_key/2, sorry xD

%% Helpers
assert_telemetry_event(Name, Count, Throttle, Rate, Interval) ->
TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]),
LowRateEvent = fun({EventName, Measurements, Metadata}) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rename into IsLowRateEventFn

interval := 100,
delay_between_executions := 50},
State),
assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 2, 100).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'error' atom is supplied as a Count parameter, this doesn't seem correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, that 'error' value for Count looks like a mistake at first glance. I would rather rename Count into a LogLevel.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to Measurement for consistency with the map.

interval := 1,
delay_between_executions := 10},
State),
assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 1, 1).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'error' atom is supplied as a Count parameter, this doesn't seem correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Copy link
Collaborator

@DenysGonchar DenysGonchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is really awsome.

@DenysGonchar DenysGonchar merged commit 55a86b4 into master Jan 15, 2024
6 checks passed
@DenysGonchar DenysGonchar deleted the supervision_trees branch January 15, 2024 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants