Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broker: refactoring / cleanup #2182

Closed
chu11 opened this issue Jun 11, 2019 · 8 comments
Closed

broker: refactoring / cleanup #2182

chu11 opened this issue Jun 11, 2019 · 8 comments

Comments

@chu11
Copy link
Member

chu11 commented Jun 11, 2019

While working on #1036 (PR #2181), some of the broker code was a little tricky and could use some cleanup/refactoring. I can't exactly see the best way to do it, but just writing down some thoughts/ideas before I forget them. I will add more as I come up on it

  1. I was hoping to do something like this for PR broker: go through main cleanup path, do not call log_err_exit() #2181:
if (create_A () < 0)
   goto cleanup_A;
if (create_B () < 0)
  goto cleanup_B;

...

cleanup_B:
  destroy_B();
cleanup_A:
  destroy_A();

but there are a lot of interdependencies between things in the broker which made this hard. While some of this is probably unavoidable, I think some could be made simpler by combining the create and init functions into one, or removing stop or unregister or similar functions and put them into destroy.

For example, I don't think:

    if (!(ctx.publisher = publisher_create ()))
        oom ();

and then later

    if (publisher_set_flux (ctx.publisher, ctx.h) < 0)
        log_err_exit ("publisher_set_flux");
    if (publisher_set_sender (ctx.publisher, "handle_event",
                              (publisher_send_f)handle_event, &ctx) < 0)
        log_err_exit ("publisher_set_sender");

is necessary. It could just be a single publisher_create() call that gets initialized with the right stuff. Not sure how many can be done this way. I think attrs_create() and init_attrs() can probably be combined. And probably heartbeat_set_flux() and heartbeat_stop() isn't necessary either.

  1. There is some ordering to calls that is important that is not initially obvious. The "register_attrs" calls are in different locations for example (one due to t2008-althash.t errors #1063). I'm not entirely sure what could be done here. Perhaps internally in the attr library, it could recognize settings via the command line having "precedence", thus the ordering of the "register_attrs" calls is no longer important.

  2. the sigwatchers list and signal handling could probably be abstracted away into a "class" (maybe the signal masking too).

  3. the shutdown_grace is initialized at a random point in the middle after overlay setup. Perhaps could be abstracted away inside the shutdown "class" (combined with part 1 above, where it's initialized at creation of the shutdown object) (Edit: Generically, move initialization / settings into the "objects". runlevel rc config another possible example). (Edit, shutdown_grace specifically handled in broker: various cleanup + refactoring #2194).

@chu11
Copy link
Member Author

chu11 commented Jun 11, 2019

  1. remove lingering log_err_exit()/log_msg_exit()/log_errn_exit() and xzmalloc()/xstrdup()/xasprintf() calls and oom() calls.

@chu11
Copy link
Member Author

chu11 commented Jun 13, 2019

  1. with many / most cleanup now going through the broker main(), perhaps various callback functions can be moved into the respective "objects". Any object dependencies can be set accordingly (e.g. create a new function runlevel_set_overlay(), overlay_set_attrs()). Again, this may be helped via part 1 above, if all "objects" are init-ed at a single time.

@chu11
Copy link
Member Author

chu11 commented Jun 14, 2019

  1. There is inconsistency on using / passing the broker rank around (e.g. modhash_set_rank() vs flux_get_rank()). I think the broker (generally speaking) should pass around the rank so it is more clear what objects require the rank and therefore must be initialized / started AFTER the overlay has been setup. As a fallout, create_dummyattrs() can probably be removed. (Edit: OR the fact it can be retrieved via flux_t *h is good abstraction? I did an experimental patch doing what I wrote here and I'm not sure it was a net win).

@chu11
Copy link
Member Author

chu11 commented Jun 14, 2019

  1. some "objects" like ping and exec are setup via an initialize routine (e.g. ping_initialize()) and create a "ping context" which is stored in the flux handle. The destruction of the "ping context" is ultimately destroyed when the flux handle is closed. This can be difficulty to follow since other objects like heartbeat or hello must be destroyed via a specific call (e.g. heartbeat_destroy()). Could be made consistent to remove this confusion.

@chu11
Copy link
Member Author

chu11 commented Jun 14, 2019

  1. heartbeat object is both a sender and receiver, but sending a heartbeat is only on rank 0. Perhaps object should be split up.

  2. module object stores heartbeat object, but only uses heartbeat object to get heartbeat epoch. Could get via heartbeat-epoch attribute instead, thus avoiding dependency on heartbeat object.

@chu11
Copy link
Member Author

chu11 commented Jun 19, 2019

  1. In the few cases that dependency of calls matters (e.g. a register attrs call comes after overlay setup), can split up register attrs calls into pre/post calls, e.g. register_attrs_post_overlay_setup()

  2. I think there are several circumstances to remove callbacks and/or put them into specific locations to eliminate some of the "where is this called" and ordering of function call requirements. One that comes to mind is create_broker_rundir() could be passed to boot_pmi() and boot_config() directly and could be called after overlay_init() is called. This would remove overlay_set_init_callback() and remove the non-obvious requirement that overlay_set_init_callback() has to be called before boot_pmi()/boot_config() are called.

@chu11 chu11 closed this as completed Jun 19, 2019
@chu11 chu11 reopened this Jun 19, 2019
@grondo
Copy link
Contributor

grondo commented Feb 15, 2022

A lot of work has been done in the broker over the intervening years, my guess a lot of these specific examples and the more generic issue of "refactor / cleanup the broker" no longer applies. Can we close this issue? Specific cleanup or refactoring issues could be introduced in individual issues if they still apply.

@chu11
Copy link
Member Author

chu11 commented Feb 15, 2022

yeah, lets close.

@chu11 chu11 closed this as completed Feb 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants