Session threads left running after Client.stop, occurs in long running process #68
This is the bulk of what my program is doing (cut down) for testing purposes. I am experiencing this on Windows.

```python
import threading
import time
import uuid

from pyrogram import Client, RawUpdateHandler, Message
from pyrogram.api import types as api_types
from pyrogram.client import message_parser

app = Client("pyrogram.session", api_id=999999, api_hash="hash_here")


def update_handler(client, update, users: dict, chats: dict):
    if not isinstance(update, (api_types.UpdateNewChannelMessage, api_types.UpdateNewMessage)):
        return

    update_message: api_types.Message = update.message
    parsed_message: Message = message_parser.parse_message(
        client,
        update.message,
        users,
        chats
    )

    if isinstance(update_message.media, (api_types.MessageMediaPhoto, api_types.MessageMediaDocument)):
        m_name = str(uuid.uuid4())
        print('Download media: downloads/' + m_name)
        client.download_media(parsed_message, file_name=m_name, block=False)

    print(update_message.message)


app.add_handler(RawUpdateHandler(update_handler))
app.start()
app.idle()

time.sleep(5)
for i in (x for x in threading.enumerate() if x.name != 'MainThread'):
    print(i.name)
```
Hi, thanks for the detailed report!

Hmm, currently this can be true only for the NetWorker thread, because Session.stop() doesn't wait for it to finish, and it would be a really rare case. I think de33247 is going to fix this. For every other thread there was already a call to join().

Regarding the thread count: I have a welcome bot running 24/7 on a Linux box and I often monitor its resource usage, but the thread count has never exceeded the maximum amount possible. I also tried stopping it with CTRL+C and restarting it successfully just now, after it had been online for days, so this could really be a Windows-only problem.

Having said this, it looks like Session.start() got called twice in a row somehow and the references to the previous threads were lost, but I can't really see any case where this could happen. Did you notice any weird behaviour when handling updates? These extra threads (which, again, I've seen none of on Linux, so this might be a Windows-only issue) would mess with the queues.

About your snippet: I tried to run it on Windows, simulating connection drops multiple times on a rather busy account, but I wasn't able to replicate the issue.
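The missing join mentioned above can be illustrated with a minimal worker-thread sketch (hypothetical names, not pyrogram's actual Session code): a stop() that only signals the thread may return while the thread is still alive, whereas joining guarantees it has exited.

```python
import queue
import threading


class Worker:
    """Minimal stand-in for a session's net-worker thread (hypothetical)."""

    def __init__(self):
        self.recv_queue = queue.Queue()
        self.thread = threading.Thread(target=self._loop, name="NetWorker#1")
        self.thread.start()

    def _loop(self):
        # Drain the queue until a None sentinel tells us to exit.
        while True:
            item = self.recv_queue.get()
            if item is None:
                break

    def stop(self):
        # Signal the thread and *join* it; without the join, stop()
        # could return while the thread is still running.
        self.recv_queue.put(None)
        self.thread.join()


w = Worker()
w.stop()
assert not w.thread.is_alive()
```

If the `join()` is omitted, the thread usually still exits shortly after the sentinel arrives, but the caller has no guarantee of that, which is exactly the rare-case leak being described.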
I begin to get "AUTH_BYTES_INVALID" API errors from the download worker, or rather the issue starts with that and maybe with other API exceptions. I've put this at the top of my update handler, both in my project and in the test program I gave:

```python
print("Update thread: " + threading.current_thread().name)
print("Other threads: " +
      (',\n' + ' ' * 15).join(x.name for x in threading.enumerate() if x.name != 'MainThread'))
```

After a while I get the following: a bunch of good updates where the active threads look normal,

In the above it looks like NextSaltThread, PingThread, RecvThread, and NetWorker#1 get leaked. I get "AUTH_BYTES_INVALID" pretty frequently after this occurs, and each time it happens more threads are spawned.

I do have one lock on a threading.Lock, followed by an inter-process lock, inside my update handler; this happens when adding message content into a full-text index. That could be aggravating things; I'm still narrowing it down since it takes a while to occur. For now I've removed my synchronization code to see if that's what's getting the threads to hang; if so, I can probably hoist the indexing out into a background thread instead of locking in the update handler.
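Hoisting the indexing out of the update handler could look roughly like this (a sketch with hypothetical names; `index_message` stands in for whatever full-text indexing is done): the handler only enqueues, and a single background thread does the locked work, so update delivery is never blocked.

```python
import queue
import threading

index_queue = queue.Queue()


def index_message(message):
    pass  # placeholder for the real full-text indexing work


def index_worker():
    # Single consumer: serializes index writes without blocking the
    # update handler, replacing the in-handler locks.
    while True:
        message = index_queue.get()
        if message is None:  # shutdown sentinel
            break
        index_message(message)


worker = threading.Thread(target=index_worker, daemon=True)
worker.start()


# Inside the update handler, just enqueue and return immediately:
def update_handler(client, update, users, chats):
    index_queue.put(update)
```

A single consumer thread also makes the inter-process lock cheaper: it is taken from one place only, off the update path.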
Quick update: I had both the test program attached to this issue above and my project (with the locking logic taken out) running while writing the above post. Both appear to have started exhibiting the same behavior after "AUTH_BYTES_INVALID" was encountered. I am currently running your master branch, installed in development mode.
I think I see what might be happening: a pyrogram.api.errors.error.UnknownError exception is raised from here and not handled. The session created just before it is never stopped, which leaks the threads.

pyrogram/pyrogram/client/client.py, line 2700 in debc459
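The fix pattern here is the usual one for resources that own threads: if anything raises after the session was started, stop it before propagating the exception. A hedged sketch (the `FakeSession` class and `with_media_session` helper are hypothetical, not pyrogram's API):

```python
class FakeSession:
    """Stand-in for a media-download session that owns threads (hypothetical)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True  # real code would spawn threads here

    def stop(self):
        self.running = False  # real code would join the threads here


def with_media_session(work):
    session = FakeSession()
    session.start()
    try:
        return work(session)
    except Exception:
        # Without this stop(), an error raised by `work` (such as an
        # UnknownError from the API call) would leave the session's
        # threads running forever.
        session.stop()
        raise
```

Wrapping the API call this way means the threads are cleaned up whether the download succeeds or not.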
Nice and detailed explanations, really appreciated!

Good, you got it! I now really do think this is the cause of the issue. Can I ask how many media files you are downloading in a given time frame, say, 5 minutes? This is just to know how often new media download sessions are started.

#40 fixes some other session-related problems, but I think it can be applied to this as well, and it would also make it possible to completely avoid AUTH_BYTES_INVALID errors. I haven't merged it yet because, with the current implementation of Session, it would create quite a lot of threads for a single client (there are 5 main data centers plus some CDN ones, and there could be a session for each of them).
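The per-DC caching idea boils down to a keyed cache: one session per data center, created and started on first use and reused afterwards, so the expensive auth-key export/import happens once per DC instead of once per download. A rough sketch with hypothetical names (not the actual #40 code):

```python
media_sessions = {}


class MediaSession:
    """Hypothetical stand-in for a per-DC media session."""

    started = 0  # class-level counter, just to demonstrate reuse

    def __init__(self, dc_id):
        self.dc_id = dc_id

    def start(self):
        # Real code would do the auth-key export/import and spawn
        # the session threads here; we only count the starts.
        MediaSession.started += 1


def get_media_session(dc_id):
    # Reuse the cached session for this DC; start a new one only
    # on the first request for that DC.
    session = media_sessions.get(dc_id)
    if session is None:
        session = MediaSession(dc_id)
        session.start()
        media_sessions[dc_id] = session
    return session
```

At roughly one file per minute, this turns hundreds of export/import calls into at most one per data center per client run.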
Hi, this issue should be fixed now. To test it right away, simply install the master branch version, but be aware that there are quite a few other new changes as well. Relevant commit: 83febf2
Just to answer this: about 1 file per minute over the 80-minute span I measured mid-day. It seems like caching the session object would definitely help with flood waits and other pitfalls; I'll try it out and see if anything crops up.
The real cause of the AUTH_BYTES_INVALID errors has been fixed. Now you can safely use more than one download worker. I've tested against https://t.me/bestmemes with 10 download workers and I was able to download all the media (~8k files / 815 MB) in about 5 minutes.
I see, that makes sense. I did notice, though, that in master the session might still manage to leak threads if this throws something (since the try block was taken out):

pyrogram/pyrogram/client/client.py, line 2997 in 83febf2

I'm not sure if that's still relevant given what it could possibly throw now, but it might be a good idea to shut the session down there.
Hmm, right, but it has become really unlikely to fail now. For that to happen again you would need to restart the whole Client hundreds of times in a short period (which you never do, unless you are deliberately stress-testing things). #40 was getting flood waits after about 400 downloads, which means Export/Import was called 400 times; now it is called only once per Client run, per session.

I think that adding the sessions to media_sessions before even starting them would allow the Client to raise some error in case something really bad happens, like Telegram going down at the same moment Pyrogram wants to start a new media session (any other re-connections after a session has successfully connected are already handled). What do you think?
I think things might get a little complicated if a session that is not started gets into the cache (assuming the program is going to continue running after that). If starting the session fails but it is already in the cache and not running, you might have to attempt to start it when it is next pulled out of the cache, and also keep track of which sessions are not started when shutting them all down later on. It could be easier to guarantee that only started sessions exist in the cache, since they can be re-used immediately without prior setup. Caching them immediately after they start successfully, and before they are given any tasks, could work.
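The ordering being argued for here can be shown directly in a small sketch (hypothetical `FlakySession` and `get_media_session` names): caching only after start() succeeds means a failed start leaves nothing half-initialized behind in the dict.

```python
media_sessions = {}


class FlakySession:
    """Hypothetical session whose start() may fail (e.g. network error)."""

    def __init__(self, dc_id, fail=False):
        self.dc_id = dc_id
        self.fail = fail
        self.running = False

    def start(self):
        if self.fail:
            raise ConnectionError("could not connect to DC")
        self.running = True


def get_media_session(dc_id, fail=False):
    session = media_sessions.get(dc_id)
    if session is None:
        session = FlakySession(dc_id, fail)
        # Cache only after start() succeeds: if it raises, the error
        # propagates and media_sessions never holds a dead session.
        session.start()
        media_sessions[dc_id] = session
    return session
```

With the opposite ordering (cache first, then start), every consumer of the cache would have to check whether the session it pulled out is actually running.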
@EriHoss is right: the session knows how to clean itself up if for some reason it fails to start, so it makes much more sense to cache sessions right after they start. Also, clear the media_sessions dict when stopping, to get rid of closed sessions.
That's correct and makes much more sense, thanks! ^^
Regarding this, I forgot to clear the session cache dict (f29b8bb fixes this). The idea is to discard every media session once you stop the client and recreate them on the next start. Media sessions are also cheap enough now, and their keys can only be used to download/upload media, nothing else. So I think that keeping the sessions online and cleaning up when stopping the client is enough for the current implementation.

As I said, for ideas on how to improve this even further it's better to open a separate issue as an enhancement, but if you have other concerns let me know. Oh, and thanks again for your detailed bug reports!
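The stop-time cleanup being described can be sketched as: stop every cached media session, then empty the dict so the next start recreates them on demand (hypothetical names, not the f29b8bb code):

```python
media_sessions = {}


class MediaSession:
    """Hypothetical stand-in for a running per-DC media session."""

    def __init__(self, dc_id):
        self.dc_id = dc_id
        self.running = True

    def stop(self):
        self.running = False  # real code would also join the session threads


def stop_client():
    # Discard every media session on client stop; leaving stopped
    # sessions in the dict would hand out dead sessions next start.
    for session in media_sessions.values():
        session.stop()
    media_sessions.clear()
```

Clearing the dict is the part that was forgotten: without it, a restarted client would pull already-closed sessions out of the cache.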
Threads created in Session.start() appear to be left running in a loop after stopping a client that has been running for some time.
On CTRL-C this causes the program to hang, because the threads are not daemon threads and continue to keep the program alive.
It appears that new threads might be getting created before the previous ones have exited when the session reconnects while running.
In my project I have added something like this for debugging:
Occasionally after a long run I get this output on CTRL-C (I put some debug prints in Syncer.remove because I thought it might be a deadlock there, but it is not):
I think the problem might be in Session.start() itself but I am not certain.
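The hang on CTRL-C follows from how CPython shuts down: the interpreter waits for every non-daemon thread to finish before exiting. A minimal illustration (not pyrogram code):

```python
import threading
import time

stop = threading.Event()


def loop():
    # Models a leaked session thread: loops until explicitly told to stop.
    while not stop.is_set():
        time.sleep(0.1)


t = threading.Thread(target=loop, name="RecvThread", daemon=False)
t.start()

# With daemon=False, the interpreter joins this thread at shutdown,
# so if nothing ever calls stop.set(), the process hangs exactly as
# described above. Signalling and joining lets it exit cleanly.
stop.set()
t.join()
```

Marking the threads as daemons would mask the hang but not the leak; the real fix is making Session.stop() reliably terminate and join them.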