Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful exception handling in Notifier thread #907

Open
HSE83 opened this issue Sep 22, 2020 · 7 comments
Open

Graceful exception handling in Notifier thread #907

HSE83 opened this issue Sep 22, 2020 · 7 comments

Comments

@HSE83
Copy link

HSE83 commented Sep 22, 2020

The notifier module does not handle exceptions well. Some drivers (in this case) throw an exception if there is no other participant on the bus yet (messages are not ACKed) which effectively kills the listener thread of the Notifier class. If asyncio is used (loop=valid object) this leads to a gap and possible messages are lost.

A list of errors to be suppressed should be added to the Notifier class, enabling the user to disable certain errors. Alternatively, (at least) the iscan module needs to be rewritten not to forward any errors.

The following workaround is in place here:

class Notifier(can.Notifier):
    def _rx_thread(self, bus):
        while self._running:
            try:
                super()._rx_thread(bus)
            except can.interfaces.iscan.IscanError:
                pass
@zariiii9003
Copy link
Collaborator

Did you try implementing the on_error() method in your Listener class?

@HSE83
Copy link
Author

HSE83 commented Sep 22, 2020

I am rather new to Python but as far as I understand it, it does not make a difference if on_error() is implemented or not:

    def _rx_thread(self, bus: BusABC):
        msg = None
        try:
            while self._running:
                if msg is not None:
                    with self._lock:
                        if self._loop is not None:
                            self._loop.call_soon_threadsafe(
                                self._on_message_received, msg
                            )
                        else:
                            self._on_message_received(msg)
                msg = bus.recv(self.timeout)
        except Exception as exc:
            self.exception = exc
            if self._loop is not None:
                self._loop.call_soon_threadsafe(self._on_error, exc)
                raise
            elif not self._on_error(exc):
                raise

If bus.recv throws and error, execution leaves the "while self._running" loop and jumps to the "except" block. There, on_error() is called. Depending on whether or not asyncio is used you have the chance to prevent the error from being raised again. Nevertheless, the _rx_thread function is then left - which means the thread exists (afaik - this information is taken from the threading documentation). If the thread exists, no further messages are read from the bus.

Even if the except block was inside the while loop there is no chance to prevent the error from being raised again because of the delayed execution of on_error() in another thread. So if asyncio is used, things will unfortunately always crash.

So in my opinion two things should be changed: First, error handling needs to be inside the while loop and secondly, the default case if self._loop is set must be to post the error to the loop and ignore it. The on_error() function can then decide if the error is severe enough to stop the listener by calling stop() or just leave it running. In that case maybe a slight delay (sleep) should probably be added to not call a failing bus.recv function too often.

@zariiii9003
Copy link
Collaborator

You are right. But usually there is no point in keeping the thread alive if there is a bus error.
What kind of errors are raised by iscan here?

Pull requests are always welcome if you would like to improve the Notifier.

@HSE83
Copy link
Author

HSE83 commented Sep 25, 2020

We are using the iscan module for device testing. Some values must be reported by the (CAN-enabled) bootloader of a device which can only be triggered while powering the device on for the first time. So to get this message, the only device able to ACK messages from the PC must be turned off before - hence there is no ACK by definition. So the following happens:

  1. DUT is turned off to invoke bootloader
  2. PC sends "tester present" signal to device
  3. No ACK is received, iscan transitions. iscan driver sets bus to "BUS OFF"
  4. Notifier thread calls "read" and gets killed error because of try-exception block
  5. Reaction on "on_error" is to re-open the bus
  6. No message is received because the bootloader has already sent the information somewhere between steps 3/4

If we just ignore the error, the "read" function gets regularly called - it reports BUS OFF twice and is operational again after that. The message from the bootloader is then not missed.

I will see with my boss if I am allowed to propose a fix.

@felixdivo
Copy link
Collaborator

@HSE83 Were you able to get permission? Seems like the Bus initialization should only return once the bus is actually ready.

@felixdivo
Copy link
Collaborator

@HSE83 do you still plan on proposing a fix? :)

@T1bbY
Copy link

T1bbY commented May 11, 2021

@HSE83 Do you have some updates on this problem?

Would be nice if we can get this fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants