Untreated error callbacks when using websockets #361

Closed
bvanelli opened this issue Oct 4, 2022 · 4 comments · Fixed by #375
Comments

@bvanelli
Contributor

bvanelli commented Oct 4, 2022

It turns out some errors are not handled and show a full traceback when using websockets. For example, this one appears when I restart the NATS server:

ERROR:nats.aio.client:nats: encountered error
Traceback (most recent call last):
  File "C:\Users\brunno.vanelli\PycharmProjects\base-python\venv\lib\site-packages\nats\aio\client.py", line 2035, in _read_loop
    await self._ps.parse(b)
  File "C:\Users\brunno.vanelli\PycharmProjects\base-python\venv\lib\site-packages\nats\protocol\parser.py", line 93, in parse
    self.buf.extend(data)
TypeError: can't extend bytearray with int
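
For reference, the last frame of this traceback can be reproduced in isolation. The sketch below is plain Python with no nats dependency; the `feed` helper is hypothetical and only mirrors the `self.buf.extend(data)` call shown in the traceback.

```python
def feed(buf: bytearray, data) -> None:
    # Mirrors the parser's `self.buf.extend(data)` call from the traceback.
    buf.extend(data)


buf = bytearray()
feed(buf, b"PING\r\n")  # bytes are accepted and appended

try:
    feed(buf, 1)  # a bare int is not iterable, so extend() raises
except TypeError as exc:
    error = exc
else:
    error = None
```

This suggests the parser was handed an int rather than bytes; where that int comes from is not shown by the traceback itself.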

Or this one, which I could not reproduce but which I guess is related to reconnecting:

ERROR:nats.aio.client:nats: encountered error
ERROR:asyncio:Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000022EB5AE54C0>
ERROR:asyncio:Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x0000022EB34A3130>

If anyone finds more of those please append to the issue.

I'll submit a merge request once I do more testing and find more edge cases.

@Rajiv91

Rajiv91 commented Jun 26, 2023

Hey @bvanelli, I'm seeing the same error with the websocket that you reported, under some conditions:

TypeError: can't extend bytearray with int
But this triggers neither the error callback nor the disconnection callback; it just stops the loop that parses the messages from the server.
The docstring in the code says that if the loop hits one of those parse exceptions, it will stop running and its task has to be rescheduled.
But I don't see anything in client.py that reschedules it. I'm talking specifically about this task:

self._reading_task = asyncio.get_running_loop().create_task( self._read_loop() )
So in the end, when this happens my NATS connection hangs and I have to restart my Python process to establish it again and send messages successfully. Do you know if this is a known issue that someone else is working on? Or, when they say that "its task has to be rescheduled", do they mean the user should modify client.py to reschedule the task somewhere? I can't find where they do it.
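
One generic way to at least notice that a task like this has ended silently is asyncio's `Task.add_done_callback`. The sketch below is a minimal illustration under stated assumptions: `watch_task` and the stand-in `read_loop` are hypothetical names, and nothing here touches nats-py. Whether attaching such a callback to the client's private `_reading_task` is safe is not something this thread confirms.

```python
import asyncio


def watch_task(task: asyncio.Task, on_exit) -> None:
    """Call on_exit(task) once the task finishes for any reason."""
    task.add_done_callback(on_exit)


async def demo() -> list:
    events = []

    async def read_loop() -> None:
        # Stand-in for a loop that logs an error and then breaks.
        await asyncio.sleep(0)

    task = asyncio.get_running_loop().create_task(read_loop())
    watch_task(task, lambda t: events.append("read loop exited"))
    await task
    await asyncio.sleep(0)  # give the done callback a chance to run
    return events


events = asyncio.run(demo())
```

A callback like this only reports that the loop stopped; deciding how to recover (for example, closing and reconnecting) is a separate design choice.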

@bvanelli
Contributor Author

@Rajiv91 Which version are you using? Do you have some short code describing the issue?

Also, it is a known issue that an error callback that raises an exception can cause the problems you described.
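
A defensive pattern for that case is to wrap the callback so that its own exceptions are logged instead of propagating into the caller. This is a minimal sketch, not something nats-py provides: `never_raise` is a hypothetical helper, and the deliberately buggy `error_cb` exists only for illustration.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("callbacks")


def never_raise(cb):
    """Wrap an async callback so its exceptions are logged, not propagated."""

    @functools.wraps(cb)
    async def wrapper(*args, **kwargs):
        try:
            await cb(*args, **kwargs)
        except Exception:
            # CancelledError is a BaseException on Python 3.8+, so task
            # cancellation still passes through untouched.
            logger.exception("callback %s failed", cb.__name__)

    return wrapper


@never_raise
async def error_cb(err: Exception) -> None:
    # Deliberately buggy handler: raises while handling the error.
    raise RuntimeError(f"handler bug while processing {err!r}")


# The wrapped callback completes normally even though its body raised.
result = asyncio.run(error_cb(ValueError("boom")))
```

A callback wrapped this way could then be passed wherever an async error callback is expected.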

@Rajiv91

Rajiv91 commented Jun 27, 2023

Hi @bvanelli, thanks for your reply.
I'm using 2.2.0; let me test with the latest version. Unfortunately I don't have a short reproduction: I don't have access to the server (the one my NATS client subscribes to) that randomly sends me ints instead of bytes, which causes nats-py to hang. But I can give you more details about the issue.
The main problem is the _read_loop inside nats/aio/client.py:

    async def _read_loop(self) -> None:
        """
        Coroutine which gathers bytes sent by the server
        and feeds them to the protocol parser.
        In case of error while reading, it will stop running
        and its task has to be rescheduled.
        """
        while True:
            try:
                should_bail = self.is_closed or self.is_reconnecting
                if should_bail or self._transport is None:
                    break
                if self.is_connected and self._transport.at_eof():
                    err = errors.UnexpectedEOF()
                    await self._error_cb(err) 
                    await self._process_op_err(err)
                    break
                b = await self._transport.read(DEFAULT_BUFFER_SIZE)
                await self._ps.parse(b)
            except errors.ProtocolError:
                await self._process_op_err(errors.ProtocolError())
                break
            except OSError as e:
                await self._process_op_err(e)
                break
            except asyncio.CancelledError:
                break
            except Exception as ex:
                _logger.error('nats: encountered error', exc_info=ex)
                break

So I am receiving and parsing the "PING" and "PONG" messages from the NATS server fine, but at random the other side sends me an int (1001) instead of bytes. That makes the parser raise an error, and because it is a TypeError it falls into:

except Exception as ex:
    _logger.error('nats: encountered error', exc_info=ex)
    break

This prints: TypeError: can't extend bytearray with int
That breaks out of the _read_loop, and the error triggers neither the error callback nor the disconnection callback, so my connection is left in limbo and I can't recover it from there.
I think we could catch that error and call self._process_op_err(e) to recover the connection, like this:

            except TypeError as e:
                # Bind the exception so it can be passed to the recovery path.
                await self._process_op_err(e)
                break
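
To make the idea concrete, here is a self-contained simulation of a read loop that hands an unexpected exception to a recovery path instead of only logging it. Everything in it is an assumption for illustration: `FakeTransport` and `MiniClient` are hypothetical stand-ins, not nats-py classes, and this is not necessarily what the eventual fix does. The value 1001 is the one from this report; it also happens to be the "Going Away" close code in RFC 6455, so one plausible (unconfirmed) reading is that a websocket close code reached the parser.

```python
import asyncio


class FakeTransport:
    """Hypothetical transport that returns queued items one read at a time."""

    def __init__(self, items):
        self._items = list(items)

    async def read(self):
        await asyncio.sleep(0)
        return self._items.pop(0)


class MiniClient:
    def __init__(self, transport):
        self._transport = transport
        self.buf = bytearray()
        self.op_errors = []

    async def _process_op_err(self, err):
        # Stand-in for the real recovery path (disconnect / reconnect).
        self.op_errors.append(err)

    async def _read_loop(self):
        while True:
            try:
                data = await self._transport.read()
                self.buf.extend(data)  # raises TypeError for an int
            except asyncio.CancelledError:
                break
            except Exception as e:
                # Hand the error to the recovery path instead of only
                # logging it, so the connection is not left hanging.
                await self._process_op_err(e)
                break


client = MiniClient(FakeTransport([b"PING\r\n", 1001]))
asyncio.run(client._read_loop())
```

After the run, `client.op_errors` holds the TypeError and `client.buf` still contains the bytes that were parsed before the bad read.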

@bvanelli
Contributor Author

@Rajiv91 I believe I solved it in my PR:

https://github.com/nats-io/nats.py/pull/375/files

There, I added a check for the disconnect cause. This should solve the issue you are describing. Let me know if it helps.
