
hw_pbxa9: hello_client does not fail with an exception if the hello_server terminates #1626

Closed
twischer opened this issue Jul 8, 2015 · 2 comments


@twischer

twischer commented Jul 8, 2015

With linux_x86, foc_x86_32, nova_x86_32, ..., the hello_client terminates with "[init -> hello_client] void* abort(): abort called" if the hello_server terminates. With hw_pbxa9, however, the hello_client does not terminate with an exception.

I used the following patch to terminate the hello_server of the hello_tutorial repo:

diff --git a/repos/hello_tutorial/src/hello/server/main.cc b/repos/hello_tutorial/src/hello/server/main.cc
index 6dc2771..df1eb94 100644
--- a/repos/hello_tutorial/src/hello/server/main.cc
+++ b/repos/hello_tutorial/src/hello/server/main.cc
@@ -18,6 +18,7 @@
 #include <root/component.h>
 #include <hello_session/hello_session.h>
 #include <base/rpc_server.h>
+#include <timer_session/connection.h>

 namespace Hello {

@@ -54,6 +55,7 @@ namespace Hello {

 using namespace Genode;

+
 int main(void)
 {
    /*
@@ -83,11 +85,11 @@ int main(void)
    enum { STACK_SIZE = 4096 };
    static Rpc_entrypoint ep(&cap, STACK_SIZE, "hello_ep");

-   static Hello::Root_component hello_root(&ep, &sliced_heap);
+   Hello::Root_component hello_root(&ep, &sliced_heap);
    env()->parent()->announce(ep.manage(&hello_root));

-   /* We are done with this and only act upon client requests now. */
-   sleep_forever();
+   static Timer::Connection timer;
+   timer.msleep(2000);

    return 0;
 }

To test it, I run

 make run/hello

on the different base platforms.

I need this exception to automatically reconnect a client to another service with the same capability if the first service terminates.

@nfeske
Member

nfeske commented Jul 22, 2015

A client is not supposed to get notified if the server stops working. It would be up to the common parent of both client and server to propagate such information if needed. But currently, there are no such scenarios.

I presume the goal behind your investigation is resilience against faulty servers. E.g., if a network driver dies, you'd like to reconnect to a fresh instance of the driver. If so, the proper way to build such a scenario would be to interpose another component between the client and the server; let's call it a "failsafe monitor". Such a failsafe monitor would start the flaky server as its child. It would also provide the server's session interface to the outside. If a client opens a new session, the failsafe monitor would open a session at its child and forward all session operations to the child. Additionally, it could install a signal handler for receiving unexpected exceptions (like segmentation faults) produced by the child. When receiving such a signal, the failsafe monitor could restart the child. From the client's perspective, this restart remains transparent.

I recently opened an issue for a NIC failsafe monitor; see #1592.

Following this approach, the client does not need to take special precautions with regard to the availability of the server. It just assumes that the server is responsive. If this assumption is violated, it is up to their common parent to decide what happens. In my opinion, attempting to build resilience into each client is a futile approach anyway. First, it increases the complexity of each client. Second, the handler code for responding to rare events (like a disappearing server) would remain largely untested.

To better understand the relationship between client and server, please refer to Section 3.2.4, "Client-server relationship", of the Genode documentation: http://genode.org/documentation/genode-foundations-15-05.pdf.

@skalk
Member

skalk commented Sep 3, 2015

Following the above discussion, I would like to close this issue. If there is a plan to build a generic solution for a fault-tolerance protocol, I would vote for a new issue with a corresponding title.

@skalk skalk closed this as completed Sep 3, 2015