BDK Crashes occasionally, stack trace attached. #256
Comments
This looks like an issue with our Symphony pod going offline or having connection issues (we observed the same blip today). Is there any way to add error handling so that the bot reconnects instead of sitting on a TimeoutError?
Hi @dky, could you confirm that the bot is indeed stopping and not recovering from the error? Normally we should have a retry policy in place and the datafeed loop should not stop.
@symphony-youri Yes, it hangs indefinitely and our users get super frustrated. The only way to recover is to press Ctrl-C to break out of the loop and re-run the bot. I'm open to providing anything you need on my end to see why the retry is failing. This happened to us just this Friday when our pod got rebooted or something.
The existing logic to retry on timeouts or client errors was not working as expected. For instance, the DF (datafeed) loop would exit on timeouts. The aiohttp client can raise several different errors (https://docs.aiohttp.org/en/stable/client_reference.html#client-exceptions) as well as asyncio.TimeoutError. I tested locally by misconfiguring the hostname (so it cannot be resolved) and the port (which triggers a timeout). However, this means that retries will also be performed on startup, even if, for instance, the hostname cannot be resolved. This is not exactly like the Java BDK, which has different logic for authentication and datafeed retries. The problem is how sessions are refreshed: here it is done lazily, so we have two nested @Retry calls, which makes it difficult to apply different strategies. Fixes finos#256
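For readers hitting the same hang, here is a minimal sketch of the idea discussed here: treat timeouts and connection errors as transient and keep retrying with backoff, so the loop does not exit. It is not the BDK's actual implementation. `retry_forever`, `RETRYABLE`, and the delay parameters are hypothetical names chosen for illustration, and only standard-library exceptions are used so the snippet runs without aiohttp. In a real bot, `aiohttp.ClientError` would probably belong in the tuple as well.

```python
import asyncio
import logging
import random

logger = logging.getLogger(__name__)

# Errors treated as transient in this sketch (an assumption, not the BDK's list).
# With aiohttp installed, aiohttp.ClientError could be added to this tuple.
RETRYABLE = (asyncio.TimeoutError, ConnectionError)


async def retry_forever(operation, base_delay=0.5, max_delay=30.0, sleep=asyncio.sleep):
    """Await `operation()` until it succeeds, backing off after transient errors.

    Exceptions that are not listed in RETRYABLE propagate to the caller.
    """
    attempt = 0
    while True:
        try:
            return await operation()
        except RETRYABLE as exc:
            attempt += 1
            # Exponential backoff capped at max_delay, with jitter so that
            # several bots do not reconnect at exactly the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            logger.warning("Attempt %d failed (%r), retrying in %.2fs", attempt, exc, delay)
            await sleep(delay)
```

A datafeed loop could then wrap each read as `await retry_forever(read_events)`, where `read_events` is whatever coroutine function performs the HTTP call. Because this variant retries without limit, a hostname that can never be resolved would also be retried forever at startup, which is the trade-off mentioned above; a separate, bounded policy for authentication would avoid that.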
It looks like the retry logic is simply wrong and is not catching the proper errors. I opened #260 to address that.
Thanks! Hope to see the fix released soon, and thanks for the help!
We will have to release a 2.2.1 with this change; it should be out shortly, hopefully by the end of the week.
Fixes finos#256 (cherry picked from commit 5112b21)
Hey guys, we regularly receive exceptions when running the bot. We are using the bot framework 2.0.1. I've just bumped us up to 2.1.0, but I'm curious whether anyone has insight into what this stack trace indicates.