Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference implementation for EEP 56 #4638

Merged
merged 13 commits into from Apr 16, 2021
Merged

Reference implementation for EEP 56 #4638

merged 13 commits into from Apr 16, 2021

Conversation

juhlig
Copy link
Contributor

@juhlig juhlig commented Mar 17, 2021

This PR reflects the current state of and will be updated with EEP 56.

As the EEP has not settled down fully, this PR contains no additional tests and documentation (yet).

@HansN HansN added enhancement priority:high team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI labels Mar 18, 2021
@HansN HansN removed the testing currently being tested, tag is used by OTP internal CI label Mar 19, 2021
@juhlig
Copy link
Contributor Author

juhlig commented Mar 23, 2021

Last commit adds test cases:

  • auto-shutdown with transient significant children
  • auto-shutdown with temporary significant children
  • (no) auto-shutdown when a significant child is a bystander, ie is restarted because of a sibling's death
  • escalation of auto-shutdown; a (top) supervisor has another supervisor as a significant child, and this (child) supervisor has a significant worker as a child; the test ensures that the exit of the worker child causes an auto-shutdown of the (child) supervisor, and this in turn causes an auto-shutdown of the (top) supervisor
  • checks for the map form of child specs to ensure that the new significant flag is validated
  • check for the new auto_shutdown sup flag to ensure that it is validated

Checks for nonsensical combinations, like having significant => true in the child specs of an auto_shutdown => never supervisor or having significant => true in combination with restart => permanent are currently not tested because it is in our opinion not fully settled if those should be allowed and ignored, or forbidden altogether. A question posted by @Maria-12648430 on the EEPs mailing list yielded only a single reply, which was in favor of forbidding this.

@juhlig
Copy link
Contributor Author

juhlig commented Mar 24, 2021

Latest commit refines and augments test cases.

@IngelaAndin
Copy link
Contributor

I and @HansN feel positive about the way this turning out. As we are waiting for OTP-24 RC2 we can not test it in the builds just yet. However I have run the tests myself with cover and I got some uncovered lines.

do_auto_shutdown(_Child, State=#state{auto_shutdown = never}) ->
--
735 | 4020 | {ok, State};
736 |   | do_auto_shutdown(Child, State) when not ?is_significant(Child)->
737 | 1 | {ok, State};
738 |   | do_auto_shutdown(_Child, State=#state{auto_shutdown = any_significant}) ->
739 | 7 | {shutdown, State};
740 |   | do_auto_shutdown(_Child, State=#state{auto_shutdown = all_significant})
741 |   | when ?is_simple(State) ->
742 | :-( | case dyn_size(State) of
743 |   | 0 ->
744 | :-( | {shutdown, State};
745 |   | _ ->
746 | :-( | {ok, State}
747 |   | end;
748 |   | do_auto_shutdown(_Child, State=#state{auto_shutdown = all_significant}) ->
749 | 4 | case
750 |   | children_any(
751 |   | fun
752 |   | (_, #child{pid = undefined}) ->
753 | 3 | false;
754 |   | (_, #child{significant = true}) ->
755 | 2 | true;
756 |   | (_, _) ->
757 | :-( | false
758 |   | end,
759 |   | State#state.children
760 |   | )
761 |   | of
762 |   | true ->
763 | 2 | {ok, State};
764 |   | false ->
765 | 2 | {shutdown, State}
766 |   | end.

@juhlig
Copy link
Contributor Author

juhlig commented Mar 26, 2021

Hi @IngelaAndin,

I and @HansN feel positive about the way this turning out.

Great =D

However I have run the tests myself with cover and I got some uncovered lines.

Yes, the test suite is not 100% finished yet, I'm extending it and fill the gaps as time permits.

@Maria-12648430
Copy link
Contributor

@IngelaAndin

I and @HansN feel positive about the way this turning out.

Nice :) I'm quite pleased about how this is rolling along, too.

I have been told that an upcoming OTB meeting will look into this. Other things aside, we would appreciate a decision (or preference or something) about the issue of allowing (ignoring) or forbidding option combinations that make no sense, like restart => permanent and significant => true, or auto_shutdown => never and significant => true.
The current state of this PR takes the simpler approach to allow those. A question posted on the eeps list yielded only one answer (by @okeuday, I believe) and that one was in favor of forbidding. If they were to be forbidden we will need more tests, and it needs to be mentioned in the documentation (which we haven't started on yet).

@IngelaAndin
Copy link
Contributor

@Maria-12648430 we will take it up on the meeting (next week).

@Maria-12648430
Copy link
Contributor

@IngelaAndin thanks :)

@IngelaAndin IngelaAndin added the testing currently being tested, tag is used by OTP internal CI label Mar 26, 2021
@juhlig
Copy link
Contributor Author

juhlig commented Mar 26, 2021

@IngelaAndin I augmented the existing and added a new test, and the lines you pointed out are now covered. Depending on the decision concerning pointless option combinations, more tests will be added later to ensure the desired behavior. Also, there should be tests for upgrading, but that also depends, to a degree, on how pointless combinations are to be treated, so these will be added later, too.

@IngelaAndin
Copy link
Contributor

Hi, thanks for the update. I will enable the test again. (The are automatically disabled if you push something new).

I can also report that your branch was breaking the test case upgrade_supervisor in the release_handler_SUITE of the sasl application and supervisor_incorrect_return in behaviour_SUITE of the dialyzer application. So you need to look into that.

@juhlig
Copy link
Contributor Author

juhlig commented Mar 29, 2021

Last commit fixes the two failing tests pointed out by @IngelaAndin, and adds tests for upgrading auto_shutdown sup flags and significant child spec flags.

@juhlig
Copy link
Contributor Author

juhlig commented Mar 30, 2021

Last commit handles invalid combinations of auto_shutdown and significant values.

@HansN HansN self-assigned this Mar 31, 2021
@HansN
Copy link
Contributor

HansN commented Mar 31, 2021

There was an OTB yesterday that decided that:

  • We are positive to EEP-0056 and want to try to get it into OTP 24 in May and the release candidate OTP-24rc3 (mid April).
  • We want supported combinations of parameters to be as limited as possible. I.e. only support the obvious use cases (no corner cases).
  • Attempts to use unsupported combinations should result in error during setup.
  • Documentation of the supported combinations is important to have before introduction.

Please also change relevant parts in the supervisor documentation in "Design Principles". The file path is
$ERL_TOP/system/doc/design_principles/sup_princ.xml
and the URI at erlang.org is:
https://erlang.org/doc/design_principles/sup_princ.html

@Maria-12648430
Copy link
Contributor

  • We are positive to EEP-0056 and want to try to get it into OTP 24 in May

Great 🥰

and the release candidate OTP-24rc3 (mid April).

That should be ample time, because this...

  • We want supported combinations of parameters to be as limited as possible. I.e. only support the obvious use cases (no corner cases).
  • Attempts to use unsupported combinations should result in error during setup.

... has already been done in @juhlig's last commit from yesterday, and this...

  • Documentation of the supported combinations is important to have before introduction.

Please also change relevant parts in the supervisor documentation in "Design Principles". The file path is
$ERL_TOP/system/doc/design_principles/sup_princ.xml
and the URI at erlang.org is:
https://erlang.org/doc/design_principles/sup_princ.html

... is what I'm starting on right now :)
I'll probably finish the documentation early next week. The implementation itself is ready for a review.

@Maria-12648430
Copy link
Contributor

@garazdawi my last commit addresses all the documentation changes you requested. Or rather, I did my best to address them ;)

The only thing I didn't provide are examples for how an automatic shutdown takes place. That would require some illustration with images I think, I'll leave that for later. Is there some sort of guideline for images in the documentation that I should know of?

@garazdawi
Copy link
Contributor

Is there some sort of guideline for images in the documentation that I should know of?

We don't really have a guideline as such, but recently we have used Dia to draw things and then set up a makefile rule that can create the .png. The .png needs to be committed into git as we don't want to depend on imagemagic in order to create the docs.

@Maria-12648430
Copy link
Contributor

We don't really have a guideline as such, but recently we have used Dia to draw things and then set up a makefile rule that can create the .png. The .png needs to be committed into git as we don't want to depend on imagemagic in order to create the docs.

I can look into that next week I think.

Anyway, unless there are new change requests, we're now pretty much done here, right?

@HansN
Copy link
Contributor

HansN commented Apr 15, 2021

I have added 912d1fe to the nightly tests now. If there are not any errors and no one complains, I'll merge this PR tomorrow and we could open a new one with the items left:

  1. The error message in supervisor:validSignificant/3 (needs more thinking/discussions)
  2. Examples for how an automatic shutdown takes place.

Have I forgot anything?
Is @garazdawi satisfied with the fixes committed so far?

Copy link
Contributor

@garazdawi garazdawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can merge this. I do however think that we can do a better job in the documentation, especially the documentation in the "OTP Design Principles".

In the "Stopping a Child Process" section I think it would be beneficial to include a small description about who can terminate a child (i.e. an external entity or the child itself) with a description of how you go about it for both scenarios. Maybe there should be two subsections here?

The "Supervisor Behaviour" chapter is not meant to be a restatement of what you can find in the supervisor reference manual, but rather a guide about how to use the features, when you should use them, and things to keep in mind.

lib/stdlib/doc/src/supervisor.xml Outdated Show resolved Hide resolved
Co-authored-by: Lukas Larsson <garazdawi@gmail.com>
@Maria-12648430
Copy link
Contributor

Yes, we can merge this. I do however think that we can do a better job in the documentation, especially the documentation in the "OTP Design Principles".

In the "Stopping a Child Process" section I think it would be beneficial to include a small description about who can terminate a child (i.e. an external entity or the child itself) with a description of how you go about it for both scenarios. Maybe there should be two subsections here?

The "Supervisor Behaviour" chapter is not meant to be a restatement of what you can find in the supervisor reference manual, but rather a guide about how to use the features, when you should use them, and things to keep in mind.

Sorryyyy, I'm doing the best I can 😢😢😢
j/k 😉

But seriously, I find it surprisingly difficult to fit new things like this into the existing document :( In the end, it probably boils down to a bigger restructuring/rewrite project for the chapter, something that requires quite some time and effort and many review cycles.

@HansN
Copy link
Contributor

HansN commented Apr 15, 2021

I think that re-writing documentation parts that are more of a general kind is out of scope for a "simple" add-a-feature PR like this. But I agree that the supervisor Design Principles would benefit from a general brush-up, but that is a separate task in my opinion.

@Maria-12648430
Copy link
Contributor

@HansN my point exactly :)

@IngelaAndin
Copy link
Contributor

IngelaAndin commented Apr 15, 2021

Just had time to catch up on this. I agree with @HansN that we should merge this and make sure to make a new PR that enhances the error messages in time for OTP-24, but I really like to see the implementation tested in RC3. I think that clear error messages are way more important than backwards compatible error messages. Error reasons are many times specified as term() for this purpose. In the ssl application we have some errors that are specified as {alert_atom(), term()} as in those case we know that the error will always result in a TLS alert that you might want to match on. But normal error reasons are for debugging and should not be matched on.

While I agree with @garazdawi that it is important with good docs I also agree with @hans that we can not really expect @Maria-12648430 to solve all our legacy problems with the existing documentation as part of this PR. Hopfully @Maria-12648430 would be interested to collaborate to make further improvements, as this will also benefit the new feature added here.

@garazdawi
Copy link
Contributor

It was not my intention to say that I expected a rewrite of the entire "Supervisor Behaviour" chapter. My wording was apparently poor as all three of you seem to have interpreted it as such. My apologies.

This PR adds new functionality around how supervisors terminate and how child processes can trigger such termination. As such I think that it would be reasonable to take a larger look at those specific sections in the "Supervisor Behaviour" chapter.

@garazdawi
Copy link
Contributor

garazdawi commented Apr 15, 2021

As an example of what I am missing right now is a discussion under "Stopping a Child Process" of why it is a bad idea to call supervisor:terminate_child/2 function from within the child to be terminated.

@IngelaAndin
Copy link
Contributor

@garazdawi I then think that those smaller document enhancements could come with the improved error messages for OTP-24. But I still would like us to merge the code before RC3 so that we get as much regression test as possible ;)

@garazdawi
Copy link
Contributor

@garazdawi I then think that those smaller document enhancements could come with the improved error messages for OTP-24. But I still would like us to merge the code before RC3 so that we get as much regression test as possible ;)

Yes, I agree.

@HansN HansN merged commit 07cced1 into erlang:master Apr 16, 2021
@HansN
Copy link
Contributor

HansN commented Apr 16, 2021

A big thank you for the PR!

@lhoguin
Copy link
Contributor

lhoguin commented Apr 16, 2021

Congratulations to everyone involved! It's a big one!

@Maria-12648430
Copy link
Contributor

Good morning everyone :)

It was not my intention to say that I expected a rewrite of the entire "Supervisor Behaviour" chapter. My wording was apparently poor as all three of you seem to have interpreted it as such. My apologies.

@garazdawi or maybe my wording was bad first ^^; When I brought up rewriting, it was not because I assumed that you wanted me to do it right here and now. What I really wanted to say is that I can't make my additions fit nicely into the existing chapter and a rewrite (or brush-up, as @HansN put it) might be the best way to make it nice and consistent again.

Hopfully @Maria-12648430 would be interested to collaborate to make further improvements, as this will also benefit the new feature added here.

@IngelaAndin Sure, I'll do my best :) Deadline for documentation is end of april, I believe?

@Maria-12648430
Copy link
Contributor

Oh, yay, merged =D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement priority:high team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants