BDK Crashes occasionally, stack trace attached. #256

Closed
dky opened this issue Feb 10, 2022 · 6 comments · Fixed by #260

Comments

@dky

dky commented Feb 10, 2022

Hey guys, we regularly receive exceptions when running the bot. We are using the bot framework 2.0.1. I've just bumped us up to 2.1.0, but I'm curious whether anyone has insight into what this stack trace indicates.

Traceback (most recent call last):
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/dky/git/bot/src/__main__.py", line 131, in <module>
    asyncio.run(run())
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/dky/.pyenv/versions/3.9.6/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/home/dky/git/bot/src/__main__.py", line 119, in run
    await datafeed_loop.start()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/abstract_datafeed_loop.py", line 99, in start
    await self._run_loop()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/abstract_datafeed_loop.py", line 149, in _run_loop
    await self._run_loop_iteration()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/datafeed_loop_v1.py", line 54, in _run_loop_iteration
    events = await self._read_datafeed()
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 118, in async_wrapped
    return await fn(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 80, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 41, in iter
    should_retry = await self.retry(retry_state=retry_state)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/strategy.py", line 119, in read_datafeed_retry
    raise exception
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/retry/_asyncio.py", line 83, in __call__
    result = await fn(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/service/datafeed/datafeed_loop_v1.py", line 63, in _read_datafeed
    events = await self._datafeed_api.v4_datafeed_id_read_get(id=self._datafeed_id,
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/core/client/trace_id.py", line 46, in add_x_trace_id_header
    return await func(*args, **kwargs)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/api_client.py", line 195, in __call_api
    response_data = await self.request(
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/rest.py", line 190, in GET
    return await self.request("GET", url,
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/symphony/bdk/gen/rest.py", line 165, in request
    r = await self.pool_manager.request(**args)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 913, in start
    self._continue = None
  File "/home/dky/git/bot/env/lib/python3.9/site-packages/aiohttp/helpers.py", line 718, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

@dky
Author

dky commented Feb 11, 2022

This looks like an issue with our Symphony pod going offline or having connection issues (we observed the same blip today). Is there any way we can add error handling so that the bot reconnects instead of getting stuck on the TimeoutError?
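Until the library-side retry handles this, one application-level workaround is to supervise the datafeed loop and start it again when it dies with a transient error. The sketch below is a minimal, hypothetical helper (`run_forever`, its parameters and defaults are not part of the BDK); it assumes that calling `datafeed_loop.start()` again after it has raised is safe, which should be checked against the BDK documentation.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple, Type

logger = logging.getLogger(__name__)


async def run_forever(
    start_loop: Callable[[], Awaitable[None]],
    retry_on: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError, OSError),
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    max_restarts: Optional[int] = None,
) -> int:
    """Await start_loop(), restarting it with capped exponential backoff on errors in retry_on.

    Hypothetical helper, not a BDK API. Returns the number of restarts once
    start_loop() returns normally. Re-raises the last error once max_restarts
    is exhausted (None means never give up).
    """
    restarts = 0
    delay = initial_delay
    while True:
        try:
            await start_loop()
            return restarts  # the loop stopped on its own, e.g. a clean shutdown
        except retry_on as exc:
            if max_restarts is not None and restarts >= max_restarts:
                raise  # give up and let the caller see the error
            restarts += 1
            logger.warning("Datafeed loop failed with %r; restarting in %.1fs", exc, delay)
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to max_delay
```

In the bot this would replace `await datafeed_loop.start()` with something like `await run_forever(datafeed_loop.start)`. If aiohttp errors such as `ServerDisconnectedError` should also trigger a restart, `retry_on=(aiohttp.ClientError, asyncio.TimeoutError)` is the more complete choice, since not every aiohttp client error derives from `OSError`.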

@symphony-youri
Contributor

Hi @dky

Could you confirm that the bot is indeed stopping and not recovering from the error? Normally we should have a retry policy in place and the datafeed loop should not stop.

@dky
Copy link
Author

dky commented Feb 15, 2022

@symphony-youri Yes, it hangs indefinitely and our users get super frustrated. The only way to recover is to press Ctrl-C to break out of the loop and rerun the bot. I'm open to providing anything you need on my end to see why the retry is failing. This happened to us just this Friday, when our pod got rebooted or something.
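To reproduce this kind of hang without waiting for the pod to blip, one option is to point a client at a server that accepts the connection and then never answers. The stdlib-only sketch below (the names `_silent_handler` and `read_times_out` are hypothetical) shows that such a read surfaces as `asyncio.TimeoutError`, the same exception type the trace above ends with. Pointing the bot's pod host and port at a server like this is an assumed test setup, not a documented BDK feature.

```python
import asyncio


async def _silent_handler(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Accept the connection but never send a byte, like a server that has
    # stopped responding; wait until the client hangs up, then close.
    await reader.read()
    writer.close()


async def read_times_out(timeout: float = 0.2) -> bool:
    """Return True if reading from a silent local server hits the timeout."""
    server = await asyncio.start_server(_silent_handler, "127.0.0.1", 0)  # port 0: pick any free port
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    try:
        await asyncio.wait_for(reader.read(1), timeout=timeout)
        return False  # the server unexpectedly answered
    except asyncio.TimeoutError:
        return True
    finally:
        writer.close()
        await writer.wait_closed()
        server.close()
        await server.wait_closed()
```

`asyncio.run(read_times_out())` should return `True` after roughly 0.2 seconds.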

symphony-youri added a commit to symphony-youri/symphony-api-client-python that referenced this issue Feb 15, 2022
It looks like the existing logic to retry on timeouts or client errors
was not working as expected. For instance, the DF loop would exit on
timeouts.

The aiohttp client can raise different errors
(https://docs.aiohttp.org/en/stable/client_reference.html#client-exceptions)
and also asyncio.TimeoutError.

I tested this locally by misconfiguring the hostname (so it cannot be
resolved) and the port (which triggers a timeout).

However, this means that retries will be performed upon startup even if,
for instance, the hostname cannot be resolved. This is not 100% like the
Java BDK, which has different logic for authentication and datafeed
retries. The problem is how sessions are refreshed: here they are
refreshed lazily, so we have two nested @Retry calls, making it
difficult to have different strategies.

Fixes finos#256
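One side effect of the two nested retry layers mentioned in the commit message is that attempt counts multiply: an outer policy with 3 attempts wrapped around an inner policy with 3 attempts can make up to 9 requests before the error surfaces. The sketch below counts this with a deliberately simple, hypothetical `retry` decorator (not the BDK's `@retry`); which BDK layer corresponds to "inner" and "outer" is illustrative only.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


def retry(attempts: int) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    """Hypothetical minimal async retry decorator: retries only on asyncio.TimeoutError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(fn: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except asyncio.TimeoutError:
                    if attempt == attempts:
                        raise  # give up: propagate to the caller (or the outer retry)
            raise AssertionError("unreachable")

        return wrapper

    return decorator


def count_requests(outer_attempts: int, inner_attempts: int) -> int:
    """Count how many times the innermost call runs when every call times out."""
    requests = 0

    @retry(inner_attempts)  # illustrative inner layer, e.g. around the raw HTTP call
    async def inner() -> None:
        nonlocal requests
        requests += 1
        raise asyncio.TimeoutError()  # simulate a pod that never answers

    @retry(outer_attempts)  # illustrative outer layer, e.g. around the datafeed read
    async def outer() -> None:
        await inner()

    try:
        asyncio.run(outer())
    except asyncio.TimeoutError:
        pass
    return requests


print(count_requests(3, 3))  # → 9
```

The total number of attempts is the product of both limits, and the outer policy only ever sees the errors the inner one gives up on, which is why expressing separate authentication and datafeed strategies gets awkward.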

@symphony-youri
Contributor

It looks like the retry logic is just wrong and not catching the proper errors. I opened #260 to address that.
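A minimal sketch of the kind of change described for #260, not its actual code: treat both client errors and `asyncio.TimeoutError` as retryable and retry them with exponential backoff, letting anything else propagate. On Python 3.9 (the version in the trace), `asyncio.TimeoutError` derives from neither `OSError` nor `aiohttp.ClientError` (it only became an alias of the built-in `TimeoutError` in Python 3.11), so a policy that lists only client errors lets it escape, which would be consistent with the trace ending in `asyncio.exceptions.TimeoutError`. The helper name `call_with_retry` and its parameters are hypothetical, and the default set uses `OSError` only so the sketch runs without aiohttp installed.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Stdlib-only default so the sketch runs without aiohttp. With aiohttp installed,
# the set described in the commit message would be closer to:
#   (aiohttp.ClientError, asyncio.TimeoutError)
DEFAULT_RETRYABLE: Tuple[Type[BaseException], ...] = (OSError, asyncio.TimeoutError)


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    retryable: Tuple[Type[BaseException], ...] = DEFAULT_RETRYABLE,
    max_attempts: int = 5,
    initial_wait: float = 0.5,
    multiplier: float = 2.0,
) -> T:
    """Await fn(), retrying on `retryable` errors with exponential backoff.

    Any other exception propagates immediately; a retryable error on the
    last attempt propagates too.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    wait = initial_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except retryable:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(wait)
            wait *= multiplier  # exponential backoff between attempts
    raise AssertionError("unreachable")
```

With `OSError` in the set, an unresolvable hostname (`socket.gaierror` is an `OSError`) is retried as well, which is the startup trade-off the commit message points out. Passing a narrower set such as `retryable=(ConnectionError,)` reproduces the original symptom: the first `asyncio.TimeoutError` escapes immediately.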

@dky
Author

dky commented Feb 15, 2022

Thanks! Hope to see the fix released soon, and thanks for the help!

@symphony-youri
Contributor

We will have to release a 2.2.1 with this change. It should be out shortly, hopefully by the end of the week.

@symphony-youri symphony-youri added this to the 2.2.1 milestone Feb 15, 2022
symphony-youri added a commit to symphony-youri/symphony-api-client-python that referenced this issue Feb 25, 2022

(cherry picked from commit 5112b21)

2 participants