
blog update

1 parent 0e6a2f9 commit 0ce2ff2e7ea44fe621c83091d4458c6b396522a9 @qnikst committed Aug 8, 2013
Showing with 1,297 additions and 34 deletions.
  1. +170 −0 drafts/1.html
  2. +5 −5 index.html
  3. +4 −0 posts.html
  4. +186 −0 posts/2013-08-08-openrc-supervision-using-cgroups.html
  5. +120 −27 rss.xml
  6. +56 −0 tags/OpenRC.html
  7. +132 −0 tags/OpenRC.xml
  8. +4 −0 tags/gentoo.html
  9. +120 −1 tags/gentoo.xml
  10. +4 −0 tags/linux.html
  11. +120 −1 tags/linux.xml
  12. +56 −0 tags/programming.html
  13. +132 −0 tags/programming.xml
  14. +56 −0 tags/supervision.html
  15. +132 −0 tags/supervision.xml
@@ -30,6 +30,10 @@
Recent posts
<ul>
<li>
+ <a href="./posts/2013-08-08-openrc-supervision-using-cgroups.html">Supervision in pure OpenRC using cgroup subsystem.</a>
+ - <em>July 31, 2013</em> - by <em>Alexander Vershilov</em>
+</li>
+<li>
<a href="./posts/2013-04-11-using-tqueues-in-conduit.html">Using queues in conduits</a>
- <em>April 11, 2013</em> - by <em>Alexander Vershilov</em>
</li>
@@ -65,14 +69,10 @@
<a href="./posts/2013-01-20-automata.html">2013-01-20-automata</a>
- <em>January 20, 2013</em> - by <em>Alexander Vershilov</em>
</li>
-<li>
- <a href="./posts/2013-01-19-announcing-imagemagick.html">announcing imagemagick-hs</a>
- - <em>January 19, 2013</em> - by <em>Alexander Vershilov</em>
-</li>
</ul>
-<p>Browse: <a href="./tags/cgroups.html">cgroups (1)</a>, <a href="./tags/gentoo.html">gentoo (1)</a>, <a href="./tags/hakyll.html">hakyll (1)</a>, <a href="./tags/haskell.html">haskell (7)</a>, <a href="./tags/latex.html">latex (1)</a>, <a href="./tags/linux.html">linux (1)</a>, <a href="./tags/pam.html">pam (1)</a>, <a href="./tags/phys.html">phys (1)</a>, <a href="./tags/projects.html">projects (1)</a>, <a href="./tags/resourcet.html">resourcet (1)</a>, <a href="./tags/univ.html">univ (1)</a>, <a href="./tags/web.html">web (1)</a></p>
+<p>Browse: <a href="./tags/OpenRC.html">OpenRC (1)</a>, <a href="./tags/cgroups.html">cgroups (1)</a>, <a href="./tags/gentoo.html">gentoo (2)</a>, <a href="./tags/hakyll.html">hakyll (1)</a>, <a href="./tags/haskell.html">haskell (7)</a>, <a href="./tags/latex.html">latex (1)</a>, <a href="./tags/linux.html">linux (2)</a>, <a href="./tags/pam.html">pam (1)</a>, <a href="./tags/phys.html">phys (1)</a>, <a href="./tags/programming.html">programming (1)</a>, <a href="./tags/projects.html">projects (1)</a>, <a href="./tags/resourcet.html">resourcet (1)</a>, <a href="./tags/supervision.html">supervision (1)</a>, <a href="./tags/univ.html">univ (1)</a>, <a href="./tags/web.html">web (1)</a></p>
<footer>
Site generated using <a href="http://jaspervdj.be/hakyll">Hakyll</a> using <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>
@@ -30,6 +30,10 @@
<h1>All posts</h1>
<ul>
<li>
+ <a href="./posts/2013-08-08-openrc-supervision-using-cgroups.html">Supervision in pure OpenRC using cgroup subsystem.</a>
+ - <em>July 31, 2013</em> - by <em>Alexander Vershilov</em>
+</li>
+<li>
<a href="./posts/2013-04-11-using-tqueues-in-conduit.html">Using queues in conduits</a>
- <em>April 11, 2013</em> - by <em>Alexander Vershilov</em>
</li>
@@ -0,0 +1,186 @@
+<!DOCTYPE html>
+<html>
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+ <title>Qnikst blog - Supervision in pure OpenRC using cgroup subsystem.</title>
+ <!-- Bootstrap -->
+ <link href="../css/bootstrap.min.css" rel="stylesheet" media="screen">
+ <style>
+ body {
+ padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
+ }
+ </style>
+ <script src="http://code.jquery.com/jquery-latest.js"></script>
+ <script src="../js/bootstrap.min.js"></script>
+
+</head>
+<body>
+ <div class="navbar navbar-fixed-top navbar-inverse">
+ <div class="navbar-inner">
+ <a class="brand" href="../">Qnikst blog</a>
+ <ul class="nav ">
+ <li class="active"><a href="../">Home</a></li>
+ <li><a href="../posts.html">Blog</a></li>
+ <li><a href="../projects.html">Projects</a></li>
+ <li><a href="../contact.html">Contacts</a></li>
+ </ul>
+ </div>
+ </div>
+ <div class="container">
+ <div class="page-header">
+ <h1>Supervision in pure OpenRC using cgroup subsystem. <br /><small><strong>July 31, 2013</strong></small></h1>
+</div>
+
+<h2 id="abstract">Abstract</h2>
+<p>This post describes how cgroup support in OpenRC can be improved to support user hooks, and shows a way to build a basic supervision daemon on top of cgroups.</p>
+<p>This post describes OpenRC-0.11/0.12_beta; some things may differ in later versions. If you find any differences, please let me know so I can post updates here.</p>
+<h2 id="introduction">Introduction</h2>
+<h3 id="the-problem">The problem</h3>
+<p>In the general case there are many services that should keep running and be restarted if they fail. There are related subproblems as well, such as deciding when a service should be restarted and when it should not. Many existing systems solve these issues, each with different trade-offs. In this post I’ll try to present a simple mechanism that allows us to build basic supervision and some other nice things.</p>
+<h3 id="idea">Idea</h3>
+<p>The Linux kernel provides a mechanism for tracking groups of processes: <code>Cgroups</code>. All children of a process belong to the same cgroups as their parent, and those groups are easy to track from user space. If you want to understand cgroups better, you may read the kernel documentation: <a href="https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt">cgroups</a>. Cgroups also provide a way to set limits on and control groups of processes, which is useful too, but out of scope for the moment.</p>
+<p>When the last process in a cgroup dies, the kernel runs the release agent (the program named in the hierarchy’s <code>release_agent</code> file, for cgroups that have <code>notify_on_release</code> enabled) and passes it the path of the cgroup. This can be used to remove empty cgroups and to perform additional actions.</p>
+<p>The idea is that at this point we can check the service state to decide whether the service needs to be restarted.</p>
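+<p>As a minimal sketch of how release notification is switched on, assuming the cgroup v1 interface files <code>release_agent</code> and <code>notify_on_release</code>: the helper name <code>enable_release_notify</code> and its arguments are made up for illustration and are not part of OpenRC.</p>

```shell
#!/bin/sh
# Sketch only: enable release notification for one cgroup (cgroup v1 layout assumed).
# enable_release_notify is a hypothetical helper, not part of OpenRC.
#   $1 - root of a mounted cgroup hierarchy, e.g. /sys/fs/cgroup/openrc
#   $2 - absolute path of the agent program the kernel should run
#   $3 - name of the cgroup (a directory below $1) to watch
enable_release_notify() {
    root=$1 agent=$2 group=$3
    [ -d "$root" ] || return 1
    mkdir -p "$root/$group" || return 1
    # The agent path is configured once, at the top of the hierarchy.
    printf '%s\n' "$agent" > "$root/release_agent" || return 1
    # Each cgroup opts in to notification individually.
    printf '1\n' > "$root/$group/notify_on_release"
}
```

+<p>On a real system this would be called as root with the hierarchy path as the first argument; pointing it at a scratch directory is a harmless way to see which files it writes.</p>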
+<h2 id="details">Details</h2>
+<h3 id="implementation">Implementation</h3>
+<p>Here are the improvements and files that should be added to OpenRC to provide the required functionality.</p>
+<h4 id="restart-daemon">Restart daemon</h4>
+<p>First we need a daemon that restarts services. We can’t start a service from the release agent itself, because the agent has the <code>PF_NO_SETAFFINITY</code> flag set and thus cgroups will not work for any of its children. So let’s start with a very simple daemon; it will be extended in later posts.</p>
+<pre class="sourceCode bash"><code class="sourceCode bash"><span class="co">#!/bin/sh</span>
+<span class="kw">if [</span> <span class="ot">$#</span> <span class="ot">-lt</span> 1<span class="kw"> ]</span> ; <span class="kw">then</span>
+ <span class="kw">echo</span> <span class="st">&quot;usage is </span><span class="ot">$0</span><span class="st"> &lt;path to fifo&gt;&quot;</span>
+ <span class="kw">exit</span> 1
+<span class="kw">fi</span>
+
+<span class="kw">while [</span> <span class="ot">-p</span> <span class="st">&quot;</span><span class="ot">$1</span><span class="st">&quot;</span><span class="kw"> ]</span> ; <span class="kw">do</span>
+ <span class="kw">while</span> <span class="kw">read</span> <span class="ot">line</span> ; <span class="kw">do</span>
+ <span class="co"># for now the daemon only prints the command it would run</span>
+ <span class="kw">echo</span> <span class="st">&quot;rc-service </span><span class="ot">$line</span><span class="st">&quot;</span><span class="kw">;</span>
+ <span class="kw">done</span> <span class="kw">&lt;</span> <span class="st">&quot;</span><span class="ot">$1</span><span class="st">&quot;</span>
+<span class="kw">done</span></code></pre>
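+<p>The other end of the FIFO could look roughly like this. It is only a sketch: <code>request_restart</code> is a hypothetical helper, and the <code>&lt;service&gt; &lt;action&gt;</code> line format is an assumption chosen so that the daemon’s <code>rc-service $line</code> forms a valid command.</p>

```shell
#!/bin/sh
# Sketch only: ask the restart daemon to act on a service by writing one
# "<service> <action>" line to its FIFO.
# request_restart is a hypothetical helper, not part of OpenRC.
#   $1 - path to the daemon's FIFO
#   $2 - service name
#   $3 - action passed to rc-service (defaults to "restart")
request_restart() {
    fifo=$1 service=$2 action=${3:-restart}
    # Refuse to write if the FIFO is missing; a plain redirect would
    # otherwise silently create a regular file in its place.
    [ -p "$fifo" ] || return 1
    printf '%s %s\n' "$service" "$action" > "$fifo"
}
```

+<p>Note that opening a FIFO for writing blocks until a reader has it open, so a hook using this helper waits if the daemon is not running.</p>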
+<h4 id="release-notify-agent-improvement">Release notify agent improvement</h4>
+<p>The current release notify agent is very simple; the idea is to extend it to support user hooks. There are several ways to do that:</p>
+<ol style="list-style-type: decimal">
+<li>Store the hook in the service state. This requires a hook in the init script.</li>
+<li>Create a static structure in the filesystem.</li>
+</ol>
+<p>We will use option 2, as it’s simpler and doesn’t require hacking init scripts. The file structure is as follows.</p>
+<p>The directory <code>/etc/conf.d/cgroups</code> contains the hooks: ‘cgroup-release’ is the default hook, and ‘service-name.cgroup-release’ is a service-specific one. Here is my example:</p>
+<pre><code>/etc/conf.d/cgroups/
+|-- cgroup-release # default release hook
+|-- service1.cgroup-release -&gt; service-restart.cgroup-release # service release hook
+`-- service-restart.cgroup-release # example script
+</code></pre>
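+<p>With this layout, enabling the restart hook for a service comes down to creating one symlink. A minimal sketch, where <code>enable_restart_hook</code> is a hypothetical helper and the directory is passed in explicitly rather than assumed:</p>

```shell
#!/bin/sh
# Sketch only: point a service's release hook at the shared restart script.
# enable_restart_hook is a hypothetical helper, not part of OpenRC.
#   $1 - hook directory, e.g. /etc/conf.d/cgroups
#   $2 - service name
enable_restart_hook() {
    dir=$1 svc=$2
    # The shared restart script must already be installed and executable.
    [ -x "$dir/service-restart.cgroup-release" ] || return 1
    # A relative link target keeps the directory relocatable.
    ln -sf service-restart.cgroup-release "$dir/$svc.cgroup-release"
}
```

+<p>For example, <code>enable_restart_hook /etc/conf.d/cgroups service1</code> would produce the <code>service1.cgroup-release</code> link shown in the tree above.</p>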
+<p>This approach doesn’t scale to multiple hooks, but it may be improved after discussion with upstream. A hook can exit with the <code>$RC_CGROUP_CONTINUE</code> code, in which case the cgroup is not deleted after the hook runs.</p>
+<p>Here is the release agent script itself (a newer version can be found on <a href="https://github.com/qnikst/openrc/blob/cgroups.release_notification/sh/cgroup-release-agent.sh.in">github</a>):</p>
+<pre class="sourceCode bash"><code class="sourceCode bash"><span class="ot">PATH=</span>/bin:/usr/bin:/sbin:/usr/sbin
+<span class="ot">cgroup=</span>/sys/fs/cgroup/openrc
+<span class="ot">cgroup_rmdir=</span>1
+<span class="ot">RC_SVCNAME=${1}</span>
+
+<span class="kw">if [</span> <span class="ot">-n</span> <span class="st">&quot;</span><span class="ot">${RC_SVCNAME}</span><span class="st">&quot;</span><span class="kw"> ]</span> ; <span class="kw">then</span>
+<span class="ot">hook=</span>@SYSCONFDIR@/conf.d/cgroups/<span class="ot">${RC_SVCNAME}</span>.cgroup-release
+<span class="kw">[</span> <span class="ot">-f</span> <span class="st">&quot;</span><span class="ot">$hook</span><span class="st">&quot;</span> <span class="ot">-a</span> <span class="ot">-x</span> <span class="st">&quot;</span><span class="ot">$hook</span><span class="st">&quot;</span><span class="kw"> ]</span> <span class="kw">||</span> <span class="ot">hook=</span>@SYSCONFDIR@/conf.d/cgroups/cgroup-release;
+<span class="kw">if [</span> <span class="ot">-x</span> <span class="st">&quot;</span><span class="ot">$hook</span><span class="st">&quot;</span><span class="kw"> ]</span>; <span class="kw">then</span>
+<span class="st">&quot;</span><span class="ot">$hook</span><span class="st">&quot;</span> <span class="kw">cleanup</span> <span class="st">&quot;</span><span class="ot">$RC_SVCNAME</span><span class="st">&quot;</span> <span class="kw">||</span> <span class="kw">case</span> <span class="ot">$?</span><span class="kw"> in</span> <span class="ot">$RC_CGROUP_CONTINUE</span><span class="kw">)</span> <span class="ot">cgroup_rmdir=</span>0<span class="kw">;;</span> <span class="kw">esac</span> ;
+<span class="kw">fi</span>
+<span class="kw">else</span>
+<span class="ot">cgroup_rmdir=</span>1
+<span class="kw">fi</span>
+
+<span class="kw">if [</span> <span class="st">&quot;</span><span class="ot">${cgroup_rmdir}</span><span class="st">&quot;</span> <span class="ot">-eq</span> 1 <span class="ot">-a</span> <span class="ot">-d</span> <span class="ot">${cgroup}</span>/<span class="st">&quot;</span><span class="ot">$1</span><span class="st">&quot;</span><span class="kw"> ]</span>; <span class="kw">then</span>
+<span class="kw">for</span> c <span class="kw">in</span> <span class="kw">/sys/fs/cgroup/*</span> <span class="kw">;</span> <span class="kw">do</span>
+<span class="kw">rmdir</span> <span class="st">&quot;</span><span class="ot">${c}</span><span class="st">&quot;</span>/openrc_<span class="st">&quot;</span><span class="ot">$1</span><span class="st">&quot;</span>
+<span class="kw">done</span>;
+<span class="kw">rmdir</span> <span class="ot">$cgroup</span>/<span class="st">&quot;</span><span class="ot">${1}</span><span class="st">&quot;</span>
+<span class="kw">fi</span></code></pre>
+<p>Next is the restart hook. This script simply checks the service state and, if the status exit code is 32 (service failed), starts a new instance and exits with <code>$RC_CGROUP_CONTINUE</code>:</p>
+<pre class="sourceCode bash"><code class="sourceCode bash"><span class="co">#!/bin/sh</span>
+<span class="co"># This script is run for a service that needs to be restarted</span>
+<span class="co"># when its last process leaves the cgroup.</span>
+
+<span class="ot">action=$1</span>
+<span class="ot">service=$2</span>
+
+<span class="kw">if [</span> x<span class="ot">$action</span> <span class="ot">=</span> x<span class="st">&quot;cleanup&quot;</span><span class="kw"> ]</span> ; <span class="kw">then</span>
+
+ <span class="kw">rc-service</span> <span class="ot">$service</span> status <span class="kw">&gt;</span> /dev/null
+ <span class="kw">case</span> <span class="ot">$?</span><span class="kw"> in</span>
+ 32<span class="kw">)</span>
+ <span class="kw">/etc/init.d/</span><span class="ot">${service}</span> <span class="kw">-d</span> restart
+ <span class="kw">exit</span> <span class="ot">$RC_CGROUP_CONTINUE</span>
+ <span class="kw">;;</span>
+ *<span class="kw">)</span>
+ <span class="kw">exit</span> 0<span class="kw">;;</span>
+ <span class="kw">esac</span>
+<span class="kw">fi</span></code></pre>
+<h3 id="other-solutions">Other solutions</h3>
+<p>General supervision is quite a complicated problem, as there are many conditions under which we may consider a service failed, such as:</p>
+<ul>
+<li>the main process dies</li>
+<li>all of the service’s children die</li>
+<li>the service does not write logs for some time</li>
+<li>high memory/CPU consumption</li>
+<li>the service does not respond for some time</li>
+<li>the log fd is closed</li>
+</ul>
+<p>Some of these conditions can be translated into others: for example, high resource consumption can be turned into process death by setting appropriate limits. Process death (and in some cases even the death of children) can be tracked through the log fd (when the process runs in the background).</p>
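+<p>As an illustration of turning resource consumption into process death, here is a rough sketch that caps a service’s memory. It assumes a cgroup v1 memory controller (its <code>memory.limit_in_bytes</code> file) and the <code>openrc_&lt;service&gt;</code> group naming used by the release agent script above; <code>set_memory_limit</code> is a hypothetical helper.</p>

```shell
#!/bin/sh
# Sketch only: cap the memory of a service's cgroup (cgroup v1 assumed).
# set_memory_limit is a hypothetical helper, not part of OpenRC.
#   $1 - mount point of the memory controller, e.g. /sys/fs/cgroup/memory
#   $2 - service name; its group is assumed to be named openrc_<service>
#   $3 - limit in bytes
set_memory_limit() {
    memroot=$1 service=$2 bytes=$3
    group=$memroot/openrc_$service
    # The group only exists while the service is known to the controller.
    [ -d "$group" ] || return 1
    printf '%s\n' "$bytes" > "$group/memory.limit_in_bytes"
}
```

+<p>Once the group exceeds the limit and the kernel cannot reclaim memory, processes in it get killed, which the release hook then sees as an ordinary death.</p>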
+<p>Another point is that you may need more complicated hooks that keep state in order to decide what to do with a failed service, for example not restarting it if it has failed many times within a short period.</p>
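+<p>A stateful hook of that kind might be sketched as follows. This is not what the restart script in this post does; <code>should_restart</code>, its state file format (one epoch timestamp per line) and its parameters are all assumptions made for the example.</p>

```shell
#!/bin/sh
# Sketch only: allow a restart unless too many happened recently.
# should_restart is a hypothetical helper, not part of OpenRC.
#   $1 - state file holding one restart timestamp (epoch seconds) per line
#   $2 - maximum number of restarts allowed inside the window
#   $3 - window length in seconds
#   $4 - current time in epoch seconds (defaults to `date +%s`,
#        which is widely available but not strictly POSIX)
should_restart() {
    state=$1 max=$2 window=$3 now=${4:-$(date +%s)}
    cutoff=$((now - window))
    recent=0
    tmp=$state.tmp
    : > "$tmp"
    if [ -f "$state" ]; then
        while read -r ts; do
            # Keep only restarts that are still inside the window.
            if [ "$ts" -gt "$cutoff" ]; then
                printf '%s\n' "$ts" >> "$tmp"
                recent=$((recent + 1))
            fi
        done < "$state"
    fi
    if [ "$recent" -ge "$max" ]; then
        mv "$tmp" "$state"
        return 1    # too many recent restarts: give up
    fi
    # Record this restart and allow it.
    printf '%s\n' "$now" >> "$tmp"
    mv "$tmp" "$state"
}
```

+<p>A hook would call it before restarting, for instance <code>should_restart /run/myservice.restarts 3 60 || exit 0</code>, where the state file path is again only an example.</p>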
+<p>A full-featured system would therefore be very complicated, so non-specialized subsystems address only part of the problem domain. Here are some examples of other supervision systems:</p>
+<ul>
+<li>monit</li>
+<li>s6</li>
+<li>daemontools</li>
+<li>angel</li>
+<li>systemd</li>
+<li>upstart</li>
+</ul>
+<h2 id="related-work">Related work</h2>
+<ol style="list-style-type: decimal">
+<li>Work on including user hooks in the OpenRC release agent.</li>
+<li>Improve the restart script to detect services that are really dead and can be restarted.</li>
+</ol>
+<h2 id="conclusions-and-futher-work">Conclusions and further work</h2>
+<p>It’s possible to create a very simple and extensible supervision system on top of OpenRC by extending its notification system. There are also more use cases for it, such as:</p>
+<ul>
+<li>adding a system-wide notification mechanism via dbus</li>
+<li>an additional logging system</li>
+</ul>
+<hr />
+<div class="pull-right">
+ <em>Alexander Vershilov</em>
+ <a href="http://creativecommons.org/licenses/by-nc-sa/3.0"><img src="http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" /></a>
+</div>
+<br class="clearfix" />
+
+
+<div id="disqus_thread"></div>
+ <script type="text/javascript">
+ /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
+ var disqus_shortname = 'qnikst'; // required: replace example with your forum shortname
+
+ (function() {
+ var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+ dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+ (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+ })();
+ </script>
+ <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
+ <a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
+
+
+ <footer>
+ Site generated using <a href="http://jaspervdj.be/hakyll">Hakyll</a> using <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>
+ </footer>
+ </div>
+<script type="text/javascript">
+ // <noscript> I really want to count you, please write at least a comment, pleeease </noscript>
+var _gaq = _gaq || [];
+_gaq.push(['_setAccount', 'UA-38941774-1']);
+_gaq.push(['_trackPageview']);
+
+(function() {
+ var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+})();
+</script>
+</body>
+</html>