With linux_x86, foc_x86_32, nova_x86_32, ... the hello_client terminates with "[init -> hello_client] void* abort(): abort called" if the hello_server terminates. But with hw_pbxa9, the hello_client does not terminate with an exception.
I used the following patch to terminate the hello_server of the hello_tutorial repo:
diff --git a/repos/hello_tutorial/src/hello/server/main.cc b/repos/hello_tutorial/src/hello/server/main.cc
index 6dc2771..df1eb94 100644
--- a/repos/hello_tutorial/src/hello/server/main.cc
+++ b/repos/hello_tutorial/src/hello/server/main.cc
@@ -18,6 +18,7 @@
#include <root/component.h>
#include <hello_session/hello_session.h>
#include <base/rpc_server.h>
+#include <timer_session/connection.h>
namespace Hello {
@@ -54,6 +55,7 @@ namespace Hello {
using namespace Genode;
+
int main(void)
{
/*
@@ -83,11 +85,11 @@ int main(void)
enum { STACK_SIZE = 4096 };
static Rpc_entrypoint ep(&cap, STACK_SIZE, "hello_ep");
- static Hello::Root_component hello_root(&ep, &sliced_heap);
+ Hello::Root_component hello_root(&ep, &sliced_heap);
env()->parent()->announce(ep.manage(&hello_root));
- /* We are done with this and only act upon client requests now. */
- sleep_forever();
+ static Timer::Connection timer;
+ timer.msleep(2000);
return 0;
}
To test it I run
make run/hello
on the different base platforms.
I need this exception to automatically reconnect a client to another service with the same capability, if the first service terminates.
A client is not supposed to get notified if the server stops working. It would be up to the common parent of both client and server to propagate such information if needed. But currently, there are no such scenarios.
I presume the goal behind your investigation is resilience against faulty servers. E.g., if a network driver dies, you'd like to reconnect to a fresh instance of the driver. If so, the proper way to build such a scenario would be to interpose another component between the client and the server, let's call it "failsafe monitor". Such a failsafe monitor would start the flaky server as a child. It would also provide the session interface of the server to the outside. If a client opens a new session, the failsafe monitor would open a session at its child and forward all session operations to the child. Additionally, it could install a signal handler for receiving unexpected exceptions (like segmentation faults) produced by the child. When receiving such a signal, the failsafe monitor could restart the child. From the client's perspective, this restart remains transparent.
I recently opened an issue for a NIC failsafe monitor, see #1592
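To illustrate the forwarding-and-restart idea, here is a minimal, self-contained C++ sketch. It deliberately does not use the Genode API: `Session`, `Flaky_server`, `Server_fault`, and `Failsafe_monitor` are hypothetical names made up for this example, and the child's fault is modelled as a C++ exception, whereas a real monitor would receive a signal about the child's fault and restart a separate component.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

/* Hypothetical session interface, standing in for the server's real one. */
struct Session
{
	virtual ~Session() = default;
	virtual int add(int a, int b) = 0;
};

/* Models a fault of the child. A real monitor would get a signal instead. */
struct Server_fault : std::runtime_error
{
	using std::runtime_error::runtime_error;
};

/* Stand-in for an unreliable server: serves a few requests, then "dies". */
class Flaky_server : public Session
{
	int _calls_left;

	public:

		explicit Flaky_server(int calls_before_fault)
		: _calls_left(calls_before_fault) { }

		int add(int a, int b) override
		{
			if (_calls_left-- <= 0)
				throw Server_fault("server died");
			return a + b;
		}
};

/*
 * The monitor offers the same interface as the server, owns the server as
 * its "child", forwards each operation, and replaces the child on a fault.
 */
class Failsafe_monitor : public Session
{
	using Spawn_fn = std::function<std::unique_ptr<Session>()>;

	Spawn_fn                 _spawn;
	std::unique_ptr<Session> _child;
	unsigned                 _restarts = 0;

	public:

		explicit Failsafe_monitor(Spawn_fn spawn)
		: _spawn(std::move(spawn)), _child(_spawn())
		{
			assert(_child && "spawn function must return a server");
		}

		int add(int a, int b) override
		{
			try {
				return _child->add(a, b);
			} catch (Server_fault const &) {
				/* start a fresh instance and retry the request once */
				_child = _spawn();
				++_restarts;
				return _child->add(a, b);
			}
		}

		unsigned restarts() const { return _restarts; }
};
```

A client would hold a `Failsafe_monitor` (constructed, for instance, with a lambda returning `std::make_unique<Flaky_server>(2)`) and call `add()` on it as if it were the server itself. The restart logic lives in one place, and the client code contains no fault handling at all. Note that this sketch retries only once per request and keeps no session state that would have to be replayed on the fresh instance.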
Following this approach, the client does not need to take special precautions with regard to the availability of the server. It just assumes that the server is responsive. If this assumption is violated, it is the ultimate will of their common parent. In my opinion, the attempt to build resilience into each client is a futile approach anyway. First, it increases the complexity of each client. Second, the handler code for responding to rare events (like a disappearing server) would remain largely untested.
Following the above discussion, I would like to close this issue. If there is a plan to build a generic solution for a fault-tolerance protocol, I would vote for a new issue with a corresponding title.