Supervision trees and test coverage #178

NelsonVides · 2023-12-31T09:23:34Z

This complex PR has a very important purpose: introspection.

I want to be able to visualise the architecture of my load test, using for example observer or Livebook's supervision tree explorer. All coordinator and throttle processes are now always supervised and named, and can easily be found.

It also serves a smaller performance purpose: the coordinator, for example, is no longer a single gen_event with a list of handlers run sequentially, but a pool of servers, each with its own handler, that can run in parallel.

Coordinators now have a top-level dynamic supervisor. Upon amoc_coordinator:start/N, it is asked to start a new pool of coordinators: one per handler plus an extra one for timeouts. All of these are properly supervised under a static one_for_one spec, so they can be restarted automatically and viewed using introspection.
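As a rough illustration of the shape described above, here is a minimal sketch of a dynamic top-level supervisor that starts one statically specified one_for_one pool per plan, with one worker per handler plus one for timeouts. Everything in it is an assumption made for illustration: the module name coordinator_sup_sketch, the use of simple_one_for_one for the dynamic part, and the placeholder worker loop are hypothetical and are not taken from the amoc sources.

```erlang
-module(coordinator_sup_sketch).
-behaviour(supervisor).

%% Hypothetical sketch, not the amoc implementation.
-export([start_link/0, start_pool/2, init/1]).
-export([start_pool_sup/2, start_worker/1, worker_loop/0]).

%% Top-level dynamic supervisor, registered so it is easy to find in observer.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, top).

%% Ask the top-level supervisor for a new pool; the extra arguments are
%% appended to the child start arguments (simple_one_for_one semantics).
start_pool(Name, Handlers) ->
    supervisor:start_child(?MODULE, [Name, Handlers]).

start_pool_sup(Name, Handlers) ->
    supervisor:start_link(?MODULE, {pool, Name, Handlers}).

init(top) ->
    Flags = #{strategy => simple_one_for_one},
    Child = #{id => pool,
              start => {?MODULE, start_pool_sup, []},
              type => supervisor,
              restart => temporary},
    {ok, {Flags, [Child]}};
init({pool, Name, Handlers}) ->
    %% Static one_for_one spec: one worker per handler plus one for timeouts.
    Flags = #{strategy => one_for_one},
    Ids = [timeout | lists:seq(1, length(Handlers))],
    Children = [#{id => {Name, Id}, start => {?MODULE, start_worker, [Id]}}
                || Id <- Ids],
    {ok, {Flags, Children}}.

%% Placeholder worker: a real one would be a gen_server wrapping a handler.
start_worker(_Id) ->
    {ok, proc_lib:spawn_link(?MODULE, worker_loop, [])}.

worker_loop() ->
    receive stop -> ok end.
```

With this sketch, starting a pool for two handlers yields three supervised children, all visible through supervisor:which_children/1 or observer.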

Throttles now have a top-level static supervisor supervising the controller, the pg group, and a dynamic pooler. amoc_throttle:start/N now asks the pooler to start a pool with the given specs; the pool is itself a static supervisor built from the specs of the requested workers, which again are restarted automatically and can be viewed using introspection.

I also took the occasion of touching these two subsystems to raise their code coverage :)

codecov-commenter · 2023-12-31T09:57:58Z

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (33cce56) 47.64% compared to head (6b17632) 73.72%.

Files	Patch %	Lines
src/throttle/amoc_throttle_controller.erl	87.50%	5 Missing ⚠️
src/coordinator/amoc_coordinator_sup.erl	88.46%	3 Missing ⚠️
src/coordinator/amoc_coordinator.erl	95.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #178       +/-   ##
===========================================
+ Coverage   47.64%   73.72%   +26.08%     
===========================================
  Files          23       29        +6     
  Lines         976     1043       +67     
===========================================
+ Hits          465      769      +304     
+ Misses        511      274      -237



DenysGonchar

This is a nice PR, and it's cool to see that coverage has increased, but there are a few things that must be addressed.

DenysGonchar · 2024-01-14T00:18:58Z

test/controller_SUITE.erl

@@ -32,7 +32,9 @@ all_tests() ->
     stop_running_scenario_with_no_users_immediately_terminates,
     stop_running_scenario_with_users_stays_in_finished,
     stop_running_scenario_with_users_eventually_terminates,
-     scenario_with_state_and_crashing_in_terminate_run_fine
+     scenario_with_state_and_crashing_in_terminate_run_fine,
+     scenario_without_start_never_runs_users,


please suggest some better name

DenysGonchar · 2024-01-14T00:31:10Z

src/coordinator/amoc_coordinator.erl

+    amoc_telemetry:execute([coordinator, TelemetryEvent], #{count => 1}, #{name => Name}).
+
+-spec order_plan([normalized_coordination_item()]) -> [normalized_coordination_item()].
+order_plan(Items) ->


please add a test for this function.
I must say that the version with lists:partition/2 was more verbose.

What we were doing before was respecting the relative order given by the user among the non-all elements and among the all elements, and simply moving the all items to the end. The effect is the same now, so this new code produces an order equivalent to the previous one.

yeah, I know that it's the same, but lists:partition/2 version was easier to read.

All right, reverted then 👌🏽

please update the comment for order_plan/1 function:

The main idea of this function is that it defines an order of execution; the rest of the code must ensure that the order is kept and not broken. All other comments are irrelevant here. The internal implementation can change, but the execution order must remain as defined here. Also, with the latest changes we no longer rely on the children's order.

I would suggest a comment like this:

This function defines the execution order of the events that must be processed synchronously (on reset/timeout/stop). We need to ensure that 'all' items are executed after 'non-all' items. Note that '{coordinate, _}' events are exceptional and their processing is done asynchronously. The order of All plan items must be preserved as in the original plan; the same applies to NonAll items.
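A minimal sketch of the ordering discussed in this thread, using lists:partition/2. It assumes that a plan item is a tuple whose first element is either a user count or the atom all; that shape, the module name, and the item() type are assumptions for illustration, not the actual definition of normalized_coordination_item().

```erlang
-module(order_plan_sketch).
-export([order_plan/1]).

%% Hypothetical item shape: {NumberOfUsers | all, Actions}.
-type item() :: {pos_integer() | all, term()}.

%% Move 'all' items after the non-'all' ones. lists:partition/2 keeps the
%% relative order inside each group, so the user's ordering is preserved
%% among 'all' items and among non-'all' items.
-spec order_plan([item()]) -> [item()].
order_plan(Items) ->
    {AllItems, NonAllItems} =
        lists:partition(fun({N, _Actions}) -> N =:= all end, Items),
    NonAllItems ++ AllItems.
```

Because the function is pure, a test only has to compare an input plan against the expected reordered list.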

DenysGonchar · 2024-01-14T00:42:12Z

test/telemetry_helpers.erl

+    telemetry:attach_many(?HANDLER, TelemetryEvents, TelemetryHandler, ?CONFIG).
+
+stop() ->
+    meck:reset(?HANDLER).


This is quite a confusing name for the function. There should be a separate stop function and a separate reset function.

DenysGonchar · 2024-01-14T00:48:56Z

test/throttle_SUITE.erl

+
+end_per_suite(_) ->
+    application:stop(amoc),
+    telemetry_helpers:stop(),


Currently telemetry_helpers:stop() doesn't unload the meck module. Also, we want to reset the meck module in init_per_testcase to ensure that test cases do not affect each other.

We shouldn't need to. Test cases are running in parallel here, and also, when checking history we're filtering for the testcase name.

DenysGonchar · 2024-01-14T00:51:59Z

test/amoc_coordinator_SUITE.erl

    meck:unload().

 init_per_testcase(_, Config) ->
    meck:reset(?MOCK_MOD),
-    meck:reset(?TELEMETRY_HANDLER),
+    telemetry_helpers:stop(),


must be telemetry_helpers:reset()

DenysGonchar · 2024-01-14T00:53:30Z

test/amoc_coordinator_SUITE.erl

    application:stop(telemetry),
+    Sup = ?config(sup, Config),
+    Sup ! terminate,


It would be nice to call telemetry_helpers:stop() here, even if we do meck:unload() later.

DenysGonchar · 2024-01-14T00:56:17Z

test/throttle_SUITE.erl

+                           Rate =:= maps:get(rate, Metadata, undefined) andalso
+                           Interval =:= maps:get(interval, Metadata, undefined)
+                   end,
+    lists:any(LowRateEvent, TelemetryEvents).


The assertion is missing.
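To illustrate the point, a helper that only returns the result of lists:any/2 never fails the test case; the boolean has to be asserted. The sketch below uses the ?assert macro from the stdlib assert.hrl header. The module name, the reduced argument list, and the event tuple shape are assumptions for illustration and do not reproduce the real helper in throttle_SUITE.

```erlang
-module(assert_any_sketch).
-include_lib("stdlib/include/assert.hrl").
-export([assert_event/3]).

%% Hypothetical helper: Events is a list of {EventName, Measurements, Metadata}.
%% The ?assert wrapper is what makes a missing event fail the test case,
%% instead of silently returning false.
assert_event(Name, Rate, Events) ->
    IsMatchingEventFn =
        fun({EventName, _Measurements, Metadata}) ->
                Name =:= EventName andalso
                Rate =:= maps:get(rate, Metadata, undefined)
        end,
    ?assert(lists:any(IsMatchingEventFn, Events)).
```

When no event matches, ?assert raises an error whose reason is tagged with the atom assert, which Common Test reports as a test failure.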

DenysGonchar · 2024-01-14T00:59:42Z

test/throttle_SUITE.erl

+    TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]),
+    LowRateEvent = fun({EventName, Measurements, Metadata}) ->
+                           Name =:= EventName andalso
+                           1 =:= maps:get(Count, Measurements, 1) andalso


Are we checking equality to the default value here?

We are checking that the key exists in the map, which wtf there's a maps:is_key/2, sorry xD
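The difference raised here can be shown in a small sketch: comparing against the same value that is passed as the default to maps:get/3 also succeeds when the key is absent, so it does not prove that the key exists, whereas maps:is_key/2 does. The module and function names below are hypothetical.

```erlang
-module(map_key_check_sketch).
-export([matches_with_default/2, matches_strict/2]).

%% True when the key maps to 1, but also when the key is missing,
%% because the default value equals the expected value.
matches_with_default(Key, Map) ->
    1 =:= maps:get(Key, Map, 1).

%% True only when the key is present and maps to 1.
matches_strict(Key, Map) ->
    maps:is_key(Key, Map) andalso 1 =:= maps:get(Key, Map).
```

The two functions agree whenever the key is present and differ only for a map that lacks the key.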

DenysGonchar · 2024-01-14T01:01:22Z

test/throttle_SUITE.erl

+%% Helpers
+assert_telemetry_event(Name, Count, Throttle, Rate, Interval) ->
+    TelemetryEvents = telemetry_helpers:get_calls([amoc, throttle]),
+    LowRateEvent = fun({EventName, Measurements, Metadata}) ->


please rename into IsLowRateEventFn

DenysGonchar · 2024-01-14T01:05:06Z

test/throttle_SUITE.erl

+                   interval := 100,
+                   delay_between_executions := 50},
+                 State),
+    assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 2, 100).


'error' atom is supplied as a Count parameter, this doesn't seem correct.

It is: https://hexdocs.pm/amoc/telemetry.html#throttle-process-internals

I mean, that 'error' value for Count looks like a mistake at first glance. I would rather rename Count into a LogLevel.

Renamed it to Measurement for consistency with the map.

DenysGonchar · 2024-01-14T01:05:16Z

test/throttle_SUITE.erl

+                   interval := 1,
+                   delay_between_executions := 10},
+                 State),
+    assert_telemetry_event([amoc, throttle, process], error, ?FUNCTION_NAME, 1, 1).


'error' atom is supplied as a Count parameter, this doesn't seem correct.

DenysGonchar

This PR is really awesome.

NelsonVides force-pushed the supervision_trees branch 6 times, most recently from 5929227 to b999ae0 Compare December 31, 2023 09:55

NelsonVides force-pushed the supervision_trees branch 2 times, most recently from 0b5ee86 to fb78f93 Compare December 31, 2023 13:41

NelsonVides changed the title ~~Supervision trees~~ Supervision trees and test coverage Dec 31, 2023

NelsonVides force-pushed the supervision_trees branch from fb78f93 to c51b111 Compare December 31, 2023 13:45

NelsonVides marked this pull request as ready for review December 31, 2023 13:51

NelsonVides requested a review from DenysGonchar December 31, 2023 13:51

NelsonVides force-pushed the supervision_trees branch 4 times, most recently from 9799274 to 01eee4c Compare January 3, 2024 17:15

NelsonVides added 4 commits January 4, 2024 13:46

Shorten directories names

d13b8f0

Ensure coordinators are supervised and pooled

44b946d

This serves two purposes: first, no process is left unsupervised, and second, solves the potential bottleneck a single 'gen_event' process could become at handling all the notifications.

Ensure throttlers are supervised and pooled

4b9ef78

Increase test coverage

0c9c96c

NelsonVides force-pushed the supervision_trees branch from 01eee4c to 0c9c96c Compare January 4, 2024 12:47

DenysGonchar requested changes Jan 13, 2024

Rework behaviour to fail scenario if callbacks are not exported

23daec3

NelsonVides force-pushed the supervision_trees branch 4 times, most recently from 669259a to 1fc070d Compare January 13, 2024 16:13

NelsonVides commented Jan 13, 2024

NelsonVides force-pushed the supervision_trees branch from 1fc070d to db869dc Compare January 13, 2024 16:52

NelsonVides added 9 commits January 14, 2024 00:36

Encapsulate coordinator order logic

27e81a2

Extract telemetry test helpers

4fa95d6

Improve test quality of throttle tests

de1b5ef

Clean throttle process indentation

f10a653

Ensure coordinator and throttle supervisors kill the whole node on failure

1ae2e73

Fix change_rate return value

379244d

Encapsulate PG_SCOPE in amoc_throttle_process

5365725

Update copyright

7e18e79

Test changing plan twice is rejected

200ff48

NelsonVides force-pushed the supervision_trees branch from 4cccc76 to 200ff48 Compare January 13, 2024 23:36

DenysGonchar reviewed Jan 14, 2024

Apply review

b1701ef

NelsonVides requested a review from DenysGonchar January 14, 2024 08:52

NelsonVides added 2 commits January 15, 2024 08:58

Revert to partition instead of order for coordinator plan

daed582

Rename count parameter in telemetry test helper

6b17632

NelsonVides force-pushed the supervision_trees branch from 4b363e2 to 6b17632 Compare January 15, 2024 07:58

DenysGonchar approved these changes Jan 15, 2024

DenysGonchar merged commit 55a86b4 into master Jan 15, 2024
6 checks passed

DenysGonchar deleted the supervision_trees branch January 15, 2024 08:02

NelsonVides mentioned this pull request Feb 22, 2024

Throttles/rework run and sending logic #180

Merged
