Create alternate bootstrap method to support system instance startup. #1236

Closed
morrone opened this Issue Oct 12, 2017 · 6 comments

@morrone
Contributor

commented Oct 12, 2017

When we start up the flux system instances, they will need to be able to bootstrap their network without the help of an enclosing instance that offers PMI. So we'll need an alternate, non-PMI bootstrap mechanism. The rough plan to get there is:

  1. Refactor and clean up the code to allow parameterized selection of a bootstrap mechanism.
  2. Add an alternate bootstrap mechanism.
@garlick

Member

commented Oct 13, 2017

It would be more useful IMHO to describe the proposed new bootstrap method here, then the refactoring necessary to get there will be more clear.

@morrone

Contributor Author

commented Oct 13, 2017

I think that making the overlay network startup code more modular and configurable doesn't really require a plan for the next bootstrap mechanism. That work will benefit any future bootstrap mechanism.
And by doing that work, I'm developing a more thorough understanding of the code and the issues involved, which will allow me to develop the plan for the first new alternative bootstrap mechanism.

@garlick

Member

commented Oct 13, 2017

I'm not expecting to be reviewing a lot of changes to the broker as a result of this activity, just so we're on the same page.

There's significant cleanup and refactoring needed in the broker to accomplish some of the things we need for S4, and I suspect based on our past interactions that we are not on the same page about what those things are or how to get there. Therefore, I'm asking you to state a more precise goal like "load overlay parameters rank, size, fanout, and TBON peer URIs from config file instead of PMI" and then chart a course to get there that involves a minimum of broker refactoring.

I don't think we want to be arguing about the design of these other high level items in PRs. That will waste a lot of energy. That's where we will end up if the changes you propose are not fairly targeted. Am I making sense?
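
For readers following along, the overlay parameters named in the example goal (rank, size, fanout) can be pictured with a small sketch. It assumes the TBON is a complete k-ary tree rooted at rank 0, which this thread does not state, and the struct and function names are hypothetical, not taken from the broker source. Peer URIs are left out to keep it short.

```c
#include <stdint.h>

/* Hypothetical container for the overlay parameters named in the comment. */
struct overlay_params {
    uint32_t rank;    /* this broker's rank */
    uint32_t size;    /* total number of brokers */
    uint32_t fanout;  /* k: maximum children per broker */
};

/* Parent rank in a complete k-ary tree rooted at rank 0 (an assumption).
 * Returns -1 for the root or for a zero fanout. */
int64_t overlay_parent (const struct overlay_params *p)
{
    if (p->rank == 0 || p->fanout == 0)
        return -1;
    return (int64_t)((p->rank - 1) / p->fanout);
}

/* Number of children this rank has, given the instance size. */
uint32_t overlay_child_count (const struct overlay_params *p)
{
    uint64_t first = (uint64_t)p->rank * p->fanout + 1;  /* lowest child rank */
    uint64_t last = first + p->fanout - 1;               /* highest possible child */

    if (p->fanout == 0 || first >= p->size)
        return 0;
    if (last >= p->size)
        last = p->size - 1;  /* clip to ranks that actually exist */
    return (uint32_t)(last - first + 1);
}
```

With rank 3, size 8, and fanout 2, `overlay_parent` returns 1 and `overlay_child_count` returns 1 (the only child is rank 7). Under that layout, a config file would only need to supply these three values plus the peer URIs.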

@morrone

Contributor Author

commented Nov 21, 2017

I am setting this aside while I work on higher priority flux-sched tasks.

The next planned step was to break up boot_pmi() into multiple functions, some of which would be hooks into the particular bootstrap mechanism (pmi, config file, etc.).
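
One way to picture the split described above is a table of hooks that each bootstrap mechanism fills in, plus a generic driver that calls them in order. This is only an illustrative sketch: every name here (`boot_method`, `boot_generic`, the `fixed_*` stand-in mechanism) is hypothetical, and nothing in it reflects how boot_pmi() is actually structured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of a bootstrap, independent of the mechanism used. */
struct boot_info {
    int rank;
    int size;
    char parent_uri[256];
};

/* Hypothetical hook table; each mechanism (pmi, config file, ...) supplies one. */
struct boot_method {
    const char *name;
    int (*init) (void *ctx);
    int (*get_rank_size) (void *ctx, int *rank, int *size);
    int (*get_parent_uri) (void *ctx, int rank, char *buf, size_t len);
    void (*fini) (void *ctx);
};

/* Stand-in mechanism that returns fixed values, for illustration only. */
struct fixed_ctx { int rank; int size; const char *parent_uri; };

static int fixed_init (void *ctx) { (void)ctx; return 0; }
static int fixed_get_rank_size (void *ctx, int *rank, int *size)
{
    struct fixed_ctx *c = ctx;
    *rank = c->rank;
    *size = c->size;
    return 0;
}
static int fixed_get_parent_uri (void *ctx, int rank, char *buf, size_t len)
{
    struct fixed_ctx *c = ctx;
    (void)rank;
    if (strlen (c->parent_uri) >= len)
        return -1;  /* URI would not fit in the caller's buffer */
    strcpy (buf, c->parent_uri);
    return 0;
}
static void fixed_fini (void *ctx) { (void)ctx; }

static const struct boot_method boot_methods[] = {
    { "fixed", fixed_init, fixed_get_rank_size, fixed_get_parent_uri, fixed_fini },
};

/* Parameterized selection: look a mechanism up by name, NULL if unknown. */
const struct boot_method *boot_method_lookup (const char *name)
{
    size_t n = sizeof (boot_methods) / sizeof (boot_methods[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp (boot_methods[i].name, name) == 0)
            return &boot_methods[i];
    return NULL;
}

/* Generic driver: mechanism-independent steps live here, hooks do the rest. */
int boot_generic (const struct boot_method *m, void *ctx, struct boot_info *info)
{
    int rc = -1;

    if (m->init (ctx) < 0)
        return -1;
    if (m->get_rank_size (ctx, &info->rank, &info->size) < 0)
        goto done;
    info->parent_uri[0] = '\0';  /* rank 0 has no parent */
    if (info->rank > 0
        && m->get_parent_uri (ctx, info->rank, info->parent_uri,
                              sizeof (info->parent_uri)) < 0)
        goto done;
    rc = 0;
done:
    m->fini (ctx);
    return rc;
}
```

A caller would pick a mechanism with `boot_method_lookup ("fixed")` and pass it to `boot_generic ()` along with that mechanism's context. A PMI-backed or config-file-backed entry would be added to the table the same way.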

@morrone morrone removed their assignment Nov 21, 2017

@garlick

Member

commented Jan 12, 2018

I was thinking of bringing over the TOML config class from flux-security and using it to create a new bootstrap method which establishes a static mapping of ranks to TBON endpoints in a config file.

With that in place, we could bootstrap a system instance on a cluster, starting brokers with systemd. There would of course be the initial caveats that all nodes would have to come up before bootstrapping is complete, and that any nodes going down would cause hangs. But it might be a good step forward anyway.

I'm taking a small Raspberry Pi cluster with me to Tahoe next week and thought I might have a go at this. @morrone if you have thoughts beyond what you wrote above about refactoring for multiple bootstrap methods, please share. I'll probably try to keep the refactoring to a minimum in an initial PR.
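
The static mapping of ranks to TBON endpoints proposed in this comment might look roughly like the table below once the config file has been parsed. This is a sketch under assumptions: the host names, the port, the `rank_endpoint` struct, and the k-ary tree layout used to find a parent are all hypothetical, and the TOML parsing step itself is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of the config: array index == broker rank. */
struct rank_endpoint {
    const char *host;  /* host name the broker runs on */
    const char *uri;   /* endpoint that peers connect to */
};

/* Example values only; a real table would come from the config file. */
const struct rank_endpoint example_cluster[] = {
    { "pi0", "tcp://pi0:8050" },
    { "pi1", "tcp://pi1:8050" },
    { "pi2", "tcp://pi2:8050" },
    { "pi3", "tcp://pi3:8050" },
};

/* A broker finds its own rank by locating its host name in the table.
 * Returns -1 if the host is not listed. */
int endpoint_rank_of (const struct rank_endpoint *tab, size_t size,
                      const char *host)
{
    for (size_t i = 0; i < size; i++)
        if (strcmp (tab[i].host, host) == 0)
            return (int)i;
    return -1;
}

/* URI of the parent, assuming a k-ary tree rooted at rank 0.
 * Returns NULL for rank 0 or for out-of-range arguments. */
const char *endpoint_parent_uri (const struct rank_endpoint *tab, size_t size,
                                 size_t rank, size_t fanout)
{
    if (rank == 0 || rank >= size || fanout == 0)
        return NULL;
    return tab[(rank - 1) / fanout].uri;
}
```

With a fanout of 2, the broker on `pi3` would resolve to rank 3 and connect to `tcp://pi1:8050`. Because the table is fixed, a broker has no alternative peer if its parent is down, which matches the caveat about hangs mentioned above.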

@morrone

Contributor Author

commented Jan 17, 2018

Nothing more from me at the moment.
