Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early loop detector: dependency loop remover from cache #12

Closed
wants to merge 14 commits into from

Conversation

xaionaro
Copy link

Related to: https://bugs.gentoo.org/show_bug.cgi?id=391945

This detector just calculates loops, solves weak loops (with at least one use or after dependency), and print information about all found loops (including unsolvable).

Solving is just breaking the loop with removing one or few use/after dependencies.

This detector cannot solve loops fully based on "need" dependencies. To prevent hang up in this case, it's need to use the later loop detector (#13) or solve it manually (sysadmin will be notified about any loop).

Presentation in PDF: https://github.com/xaionaro/documentation/raw/master/openrc/earlyloopdetector/early-loop-detection.pdf

@qnikst
Copy link
Contributor

qnikst commented Jan 23, 2014

Can I double Williams suggestion to squash commits, it will be much easier to review and understand them. Moreover it will make project history clean and maintainable.

Please try give a try to git rebase -i. If you want I may ping youon irc tomorrow with concrete questions and instructions. Thanks!

@xaionaro
Copy link
Author

@williamh, how can I print errors outside of librc? It requires to change it's API/ABI to pass back error strings. Is that ok?

@xaionaro
Copy link
Author

@qnikst, I've squashed all commits.

@qnikst
Copy link
Contributor

qnikst commented Jan 23, 2014

Thanks. But all commits was definitely an overkill.

@xaionaro
Copy link
Author

@qnikst, I can break to a few commits. But I don't know how, because this commit presents only one feature and logically cannot be posed by parts.

@qnikst
Copy link
Contributor

qnikst commented Jan 23, 2014

Then everything is ok, thanks.

@xaionaro
Copy link
Author

@xaionaro
Copy link
Author

I hope @heroxbd will pass this patch into Debian's OpenRC. Also I hope that such test will convince Gentoo Developers about the patch efficiency :)

@xaionaro
Copy link
Author

I've updated the solver. The presentation is expected soon.

I hope this's the last fix.

@xaionaro
Copy link
Author

19:00:56 < heroxbd>  * Solving the loop by breaking single u> rmnologin.
19:00:56 < heroxbd>  * Solving the loop by breaking single a> rmnologin.
Added option "rc_loopsolver_enable" option to be able to enable or disable
the early dependency loop solver.
@williamh
Copy link
Contributor

Here are my thoughts about taking this upstream.

I see a lot of brace nesting that is not part of control structures like if/while/for/etc, why is that done?

Adding the rc.conf switches the way they are implemented breaks the public API by changing the call to rc_deptree_update. Also, I don't know of a reason you would want this to be configurable. I recommend always having it on.

Librc code uses fprintf(stderr, ...) to print some messages, so you can use that to print out warnings for the dependency issues; you don't need to include einfo.h

I am not familiar with search.h, so is it available in environments other than glibc? What about on the bsds and Minix?

@xaionaro
Copy link
Author

This is a lot of code to add to librc-depend.c. Is it possible to put most of it in a separate file, e.g. librc-depsolver.c and have the code in librc-depend.c call the function(s) it needs to call?

Yes. Ok, you right. I'll do it.

I see a lot of brace nesting that is not part of control structures like if/while/for/etc, why is that done?

It's just a way to:

  • Create temporary variables (only for this block).
  • To separate code for logical blocks. It makes code more readable as for me.

I can remove it. It doesn't matter. It's just my vision of good coding practice :).

Adding the rc.conf switches the way they are implemented breaks the public API by changing the call to rc_deptree_update. Also, I don't know of a reason you would want this to be configurable. I recommend always having it on.

You mean option rc_loopsolver_warnings? This feature was requested by zigo or @heroxbd, I don't remember. Seems, this issue should be discussed on #openrc.

Librc code uses fprintf(stderr, ...) to print some messages, so you can use that to print out warnings for the dependency issues; you don't need to include einfo.h

Ok. :)

I am not familiar with search.h, so is it available in environments other than glibc? What about on the bsds and Minix?

It's "SVr4, POSIX.1-2001" according to the manpage of "man 3 tsearch".

@williamh
Copy link
Contributor

I am referring to rc_loopsolver_enable and rc_loopsolver_warnings. Dependency issues can be severe enough that they can lock up your boot process if rc_parallel=yes, so I pretty strongly believe that this functionality should always be on so people get the warnings when the cache is built.

@williamh
Copy link
Contributor

One more thing I thought about.

Please do not automatically break any dependencies. Instead, warn about them. I would rather let users break dependencies themselves than make the decision about how to break them.

@heroxbd
Copy link
Contributor

heroxbd commented Nov 24, 2014

FYI, rc_loopsolver_warnings is introduced when there are too many loops in Debian LSB scripts around $all dependency. The boot console gets cluttered. It was a request from zigo.

@williamh
Copy link
Contributor

williamh commented Dec 4, 2014

Here are more suggestions for cleaning up this code:

  • CHAINS_CHECK_SIZE should not be #define'd; macros like this make code much harder to debug later.
  • Please remove the {} blocks that only exist to scope variables; it is fine to have variables local at the function level.
  • Please split the configurability into a separate pull request. That is the piece which breaks the public API.
  • Please do not call ewarn and eerror from librc code. Instead, just use fprintf(stderr, ...).

@williamh
Copy link
Contributor

williamh commented Dec 4, 2014

There is another option we could consider besides breaking loops
What do you think about marking all services involved in a dependency loop as failed rather than trying to break the loop? that seems a bit more reasonable than trying to start parts of the loop.

@xaionaro
Copy link
Author

xaionaro commented Dec 6, 2014

What do you think about marking all services involved in a dependency loop as failed rather than trying to break the loop? that seems a bit more reasonable than trying to start parts of the loop.

Hm. It's harder. The patchs works only on stage of preparing deps cache. How to fail a service on this stage?

It's required to add a code into the boot stage to recheck loops and fail services. I can do that in this year.

However as for me we should leave possibility to solve loops automatically. I can make it configurable (no breaks / break only "uses" / break "uses" and "afters").

@williamh
Copy link
Contributor

williamh commented Dec 6, 2014

Hmm, let's not worry about trying to throw out the loops then.

WRT breaking loops automatically, I'm still not quite with you, because breaking loops will definitely leave things in an inconsistent state.

If a has after b and b has after a for example, there is no possible way to determine whether a or b should start first, so if you start a you could be wrong, and vice versa.

The same is true for any other loop.

@xaionaro
Copy link
Author

xaionaro commented Dec 7, 2014

If a has after b and b has after a for example, there is no possible way to determine whether a or b should start first, so if you start a you could be wrong, and vice versa.

I understand examples. Can you give a specific (concrete?) example when the loop solver will harm?

The same is true for any other loop.

Disagree. Usually (in my experience) loops contains all dep types. So the "use" dependency will be broken, while "after" and "need" will be kept.

WRT breaking loops automatically, I'm still not quite with you

I think our discussion about this issue is not constructive. We both didn't changed our minds not a jot. May be the language barrier doesn't allow me to clearly explain my vision. Or may be I'm just to stupid to understand your vision. IDK.

@xaionaro
Copy link
Author

xaionaro commented Dec 7, 2014

P.S.: I think you just should decide to remove the loop solver and don't waste your time on me. Honestly. :)

I mean it's obvious that you known what is good for OpenRC much-much better, anyway. And you shouldn't waste you time on to proving something to somebody. I'll respect any dicision.

@williamh
Copy link
Contributor

williamh commented Dec 8, 2014

I don't feel like I'm wasting time; we both agree that there is definitely a problem that needs to be fixed, so hang in there with me. :-)

Have you been able to put together the text description of how your code works? That may make it make more sense to me.

William

@williamh
Copy link
Contributor

@xaionaro:
Can you meet with myself or @dwfreed sometime on irc to chat again about this code? I'm not sure where we left it. We both agree that there is a problem we need to solve, so let's get together again and chat about it. :-)

@heroxbd
Copy link
Contributor

heroxbd commented Nov 10, 2015

@xaionaro, Hey Dimitry, long time no see. MIssed you a lot. Glad to tell that we have OpenRC 0.18.3 in Debian. Your updates has resolved all the build issues on ppc64el and kFreeBSD.

@williamh, William, sorry for being idle for long. The point of breaking a loop is not only for Debian, but to make OpenRC more robust against configuration mistakes. Such a mistake as dependency loop can be fetal (unable to boot). Dimitry has made the point that the loop solver breaks the weakest dependence in the loop, which makes sense.

The only concern for me is the code complexity, which I think, yes, it should be addressed cooperatively.

@xaionaro
Copy link
Author

Hello. Sorry that I still didn't fix the coding style of the patch as @williamh requested long time ago. I just have no time. It's quite bad years in my university right now (partly due to geopolitical problems), so I have to work a lot and more. :(

@stalkerg
Copy link

I suppose we should do something with this pull request.

@williamh
Copy link
Contributor

I'm against trying to remove loops behind the sceens; we have no way of knowing for sure what the relationships of the services should be. I would rather just warn about them and allow users to fix them.

@stalkerg
Copy link

stalkerg commented Mar 12, 2017

I would rather just warn about them and allow users to fix them.

For me it's ok but I worry about "rc_parallel=no" by default. It's not normal for modern init system.
OpenRC must push users to write strict and right rules.

For my opinion, you can reject this pull request.

@xaionaro
Copy link
Author

xaionaro commented Mar 13, 2017

I can add an option to not break loops automatically, today if required.

On the one hand this patch was written for Debian and it works there well enough, so as for me mission accomplished. On the other hand, I'd like to be able to say people that OpenRC can safely boot in parallel mode (and do not hang), so I'd like to see this patch be merged to the upstream code.

@stalkerg
Copy link

I can add an option to not break loops automatically, today if required.

At least it's good functional. But I don't know what thinking @williamh about it.

we have no way of knowing for sure what the relationships of the services should be.

If init system don't know about relationships then who knows? My main goal enables parallel by default.
What should we do for that?

@williamh
Copy link
Contributor

@xaionaro I don't want to break loops at all, so please just have this patch print error messages if loops are detected; I think loops indicate bugs which should be fixed by the init script authors. Also, please rebase the patch on current master. I'm definitely interested in this, so I'll take a look once it is updated.

@stalkerg What I meant above is, if we get into a deadlock situation, there is not really a smart way to resolve it automatically. We don't really know what services do, so we can't really decide which service to start first to break the deadlock. I think it is a better choice to warn that there is a loop when we are building the deptree but let the script author take care of it.

@stalkerg
Copy link

Maybe it's destiny but I have catch deadlock with parallel=on after upgrade to 0.24.

@williamh
Copy link
Contributor

I am uncomfortable with this pr because we should warn admins about loops and make them resolve the loops themselves rather than randomly drop dependencies. If you want to do that, feel free to update this and re-open it. Also, please avoid breaking the API. Thanks much.

@williamh williamh closed this Jul 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants