-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Handling network errors
Network errors can happen at many levels, on the local side, on the remote or somewhere along the way.
It is hard for applications to identify when an error occurs as it is not always something that can be identified in real time but rather when an additional event happens.
Network errors are a problem which each application have to deal with. Each application for its own use case and desired service level.
For something to "break" locally we need to close the network socket. This is something that PTB doesn't do voluntarily and basically if you've encountered such a case, it's a bug.
There is a suspicion that Telegram are closing connections on some conditions. Whether it is true or not we can't really tell. The application should still be able to deal with it.
- A flaky network connection can cause connections to seem alive, although they are useless.
- Datacenters or ISPs which cut (probably inactive) network connections.
- Azure drops inactive connections after 4 minutes by default (https://azure.microsoft.com/en-us/blog/new-configurable-idle-timeout-for-azure-load-balancer/).
There is no one way to handle networking issues. Each bot has its own usage pattern and unique network behaviour.
Some bots will require providing answer to user requests within a limited time, while others can take all the time they need. Some bots will be ok with "losing" a sent message once in a while, while others must ensure that the message was sent within a limited time frame. Some bots send big files, while others only receive small files. Etc., etc.
In addition, the network connection takes an important factor. Hosting your bot in a region with poor Internet (bandwidth / packet lost / disconnections) is the obvious example, but it is not the only problem. In case the bot is hosted "far" from the Telegram Bot servers you'll experience greater latency and possibly more packet loss. TCP should be able to handle packet loss, however, there are side effects for that (when will TCP identify the loss of a packet? what will be the window size? etc.).
Hosting the bot on a slow server can also affect performance and may be observed as networking issues.
Once you've understood the usage pattern of your bot (that includes the QoS) you can go into fine-tuning the ptb library for your needs.
PTB performs HTTPS requests using the telegram.request.BaseRequest
interface.
The method BaseRequest.do_request
accepts four parameters for controlling timeouts:
-
read_timeout
specifies the maximum amount of time (in seconds) to wait for a response from Telegram’s server -
write_timeout
specifies the maximum amount of time (in seconds) to wait for a write operation to complete (in terms of a network socket; i.e. POSTing a request or uploading a file) -
connect_timeout
specifies the maximum amount of time (in seconds) to wait for a connection attempt to a Telegram server -
pool_timeout
specifies the maximum amount of time (in seconds) to wait for a connection to become available from the connection pool
The built-in request.HTTPXRequest
(which implements BaseRequest
) uses a default of 5 seconds for {read, write, connect}_timeout
and 1 second for pool_timeout
.
The write_timeout
is overwritten to 20 seconds when using send methods which attach files (send_audio
, send_document
, etc.).
When sending big files, calling Bot.send_*(write_timeout=BIGGER_VALUE)
might be a good idea.
When using the standard HTTPXRequest
, changing the defaults of {read, write, connect, pool}_timeout
& connect_timeout
is done when initializing the Application
.
Application.builder().token("Token").read_timeout(7).get_updates_read_timeout(42).build()
Note that there is both ApplicationBuilder.read_timeout()
and ApplicationBuilder.get_updates_read_timeout
(and similarly for the other timeouts) since PTB uses two different request objects for Bot.get_updates
and all other bot methods.
See also the wiki page on the builder pattern.
When using Application.run_polling()
/Updater.start_polling()
, getUpdates
is achieved using Long Polling.
This means that the get_updates
request is kept alive for a (Telegram defined) timeout of X seconds to respond with new Update
(s).
Only if updates are available before the X seconds have passed, the request is closed right away and get_updates
is called again.
This also means that if a network error occurs during that timeout, it will be discovered only after the timeout has passed.
To be specific, when using the raw API (i.e. using Bot
directly), the value you pass for timeout
is added to the value of read_timeout
and that is the exact timing when the error will be detected.
In that mode, the default value of timeout
is 0 and the default value of read_timeout
is 2.
If you're using Updater.start_polling
/Application.run_polling()
then the default value of timeout
is overridden to 10. That means that the average time to identify network error is 6 seconds.
Depending on the SLA you'd like to provide, if you suffer from frequent errors during getUpdates
consider lowering the timeout
value to catch the errors faster.
Please take in mind that too short polling intervals may have undesirable side effects:
- Telegram servers might conceive it as abusive and block your bot.
- Increase the load on your server and network connection.
Depending on your OS you may be able to set socket options to perform low level tweaks.
In the default networking backend, HTTPXRequest
, there is a corresponding parameter for this.
When a network error occurs, be prepared to catch the raised exception and handle it according to your policy (do you want to retry? ignore? other?) or use PTBs built-in mechanism for exception handling. Note that for (network) errors that happen as part of Application.run_polling
/Updater.start_polling
are processed via the error_callback
parameter of Updater.start_polling
rather than Application.process_error
. If you whish to handle also those as part of Application.process_error
, you can e.g. use
def error_callback(error: Exception, application: Application) -> None:
application.create_task(application.process_error(update=None, error=error))
for that parameter. See also this thread for some discussion.
If you think of another way to improve stability from within ptb, please contact us (the maintainers).
- Wiki of
python-telegram-bot
© Copyright 2015-2024 – Licensed by Creative Commons
- Architecture Overview
- Builder Pattern for
Application
- Types of Handlers
- Working with Files and Media
- Exceptions, Warnings and Logging
- Concurrency in PTB
- Advanced Filters
- Storing data
- Making your bot persistent
- Adding Defaults
- Job Queue
- Arbitrary
callback_data
- Avoiding flood limits
- Webhooks
- Bot API Forward Compatiblity
- Frequently requested design patterns
- Code snippets
- Performance Optimizations
- Telegram Passport
- Bots built with PTB
- Automated Bot Tests