Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I receive socket error: EOF when trying to use IDLE #234

Closed
afontenot opened this issue Mar 17, 2017 · 10 comments
Closed

I receive socket error: EOF when trying to use IDLE #234

afontenot opened this issue Mar 17, 2017 · 10 comments

Comments

@afontenot
Copy link

afontenot commented Mar 17, 2017

Using the latest version of imapclient, with the yandex.com IMAP server. Yandex email accounts are free if you want to test this.

My code:

while True:
    client.noop()
    client.select_folder(FOLDER)
    client.idle()
    try:
        newmail = client.idle_check(15 * 60)
        client.idle_done()
        if newmail:
            messagelist = client.search(['RECENT'])
            process_messages(client, messagelist)
    except Exception as e:
        print(e)
        # connection failed, get a new one
        client = get_connection(USER, PASS, SERVER)
        continue

It works most of the time but will randomly choke once or twice per hour with the aforementioned "socket error: EOF" (which I think is coming from imaplib). I suppose this could be a problem with the Yandex server, but that seems unlikely? Am I doing something stupid?

@mjs
Copy link
Owner

mjs commented Mar 18, 2017

As far as I can see, you're not doing anything stupid :)

It's hard to say but given the intermittent and infrequent nature of the problem I'm guessing that the cause is more likely to be a networking issue. In order to diagnose this further you could try adding client.debug = True at the top of your code. That will cause protocol traffic to be printed and might provide some insights.

If you're comfortable with tcpdump, it would also be useful to get a dump of the network traffic for the IMAP connection. This will let us know if the Yandex servers are dropping the connection.

I'm pretty short on time at the moment but I can try and replicate if you aren't able to troubleshoot further.

One side issue: selecting a folder is often an expensive operation (server side). You might want to move the select_folder call to after the get_connection call in the exception handling block. There's no need to re-select the folder each loop.

@mjs
Copy link
Owner

mjs commented Mar 18, 2017

Another thing: you might want to look inside newmail before looking for new messages. There's a number of events that can be reported during IDLE. New messages are just one of them.

@afontenot
Copy link
Author

Yandex mandates SSL, so I used sslsplit instead of tcpdump. Hope that's fine.

Program output:

24:54.732118 > b'KIML10 IDLE'                                                                                       
24:54.956562 < b'+ idling'                                                                                          
39:55.184876 < b'KIML10 OK IDLE terminated'                                                                         
39:55.185006 > b'KIML11 NOOP'
39:55.419715 < b'KIML11 OK NOOP Completed.'
39:55.419963 > b'KIML12 SELECT "TESTFOLDER"'
39:55.696877 < b'* FLAGS (\\Answered \\Seen \\Draft \\Deleted $Forwarded)'
39:55.697103 < b'* 0 EXISTS'
39:55.697181 < b'* 0 RECENT'
39:55.697274 < b'* OK [PERMANENTFLAGS (\\Answered \\Seen \\Draft \\Flagged \\Deleted $Forwarded \\*)] Limited'
39:55.697373 < b'* OK [UIDNEXT 38] Ok'
39:55.697463 < b'* OK [UIDVALIDITY 1489362776] Ok'
39:55.961045 < b'KIML12 OK [READ-WRITE] SELECT Completed.'
39:55.961468 > b'KIML13 IDLE'
39:56.185364 < b'+ idling'
socket error: EOF

Sslstrip output:

2017-03-18 06:39:55 UTC [127.0.0.1]:41398 -> [87.250.251.124]:993 (13):
KIML13 IDLE
2017-03-18 06:39:56 UTC [87.250.251.124]:993 -> [127.0.0.1]:41398 (10):
+ idling
2017-03-18 06:40:35 UTC [87.250.251.124]:993 -> [127.0.0.1]:41496 (138):
* OK Yandex IMAP4rev1 at imap20m.mail.yandex.net:993 ready to talk with ::ffff:xxx.xxx.xxx.xxx:47738, 2017-Mar-18 09:40:34, Yed0LO15umI1
2017-03-18 06:40:35 UTC [127.0.0.1]:41496 -> [87.250.251.124]:993 (18):
IJFI0 CAPABILITY
2017-03-18 06:40:35 UTC [87.250.251.124]:993 -> [127.0.0.1]:41496 (126):
* CAPABILITY IMAP4rev1 CHILDREN UNSELECT LITERAL+ NAMESPACE XLIST BINARY UIDPLUS ENABLE ID AUTH=PLAIN AUTH=XOAUTH2 IDLE MOVE
2017-03-18 06:40:35 UTC [87.250.251.124]:993 -> [127.0.0.1]:41496 (32):
IJFI0 OK CAPABILITY Completed.
2017-03-18 06:40:35 UTC [127.0.0.1]:41496 -> [87.250.251.124]:993 (68):
IJFI1 LOGIN xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
2017-03-18 06:40:35 UTC [87.250.251.124]:993 -> [127.0.0.1]:41496 (129):
* CAPABILITY IMAP4rev1 CHILDREN UNSELECT LITERAL+ NAMESPACE XLIST BINARY UIDPLUS ENABLE ID IDLE MOVE
IJFI1 OK LOGIN Completed.

Looks like the server is resending the greeting unexpectedly? Any reason why it should do that?

@afontenot
Copy link
Author

Oh, sorry, of course the sslstrip output doesn't show the TCP packets, and you need those. I managed to get tcpdump working by using stunnel to handle SSL. It looks like you guessed right, the server is closing the connection while we're idling.

I'm including the pcap in case it helps. snip2.zip

@mjs
Copy link
Owner

mjs commented Mar 18, 2017

Ha! I ended up creating a Yandex account and am running a test with tcpdump running right now :)

You're right that I was interested in the TCP packets. There's no real need to get inside the SSL data - just the TCP level information is probably enough.

... and I just reproduced the EOF. I had added some printing of timestamps to your example and for me at least it looks like the connection was dropped almost exactly 10 mins after the IDLE was started. I'm guessing that Yandex is dropping inactive connections after 10 mins. If so, this mildly violates the IDLE spec as connections in IDLE are supposed to live for at least 29 mins.

Unfortunately I screwed up my tcpdump filter so I've got the wrong packets. Looking at your capture the timing between the IDLE start and the connection being killed appears to be ~90s so maybe my hypothesis is wrong and there's something else happening. It does look like that Yandex is closing the connection though.

@afontenot
Copy link
Author

Looking at your capture the timing between the IDLE start and the connection being killed appears to be ~90s so maybe my hypothesis is wrong and there's something else happening. It does look like that Yandex is closing the connection though.

I believe I've tried making the IDLE connections shorter, in the 2-5 minute range, and that didn't really help. My best guess right now is that IDLE connections eventually die for some reason on Yandex, but if you restart them often enough, you're less likely to encounter the hiccup. I'm running a test with a 5 minute idle time right now, I'll see what I can find.

@afontenot
Copy link
Author

Update - connection died in the same way after about 45 minutes. I'm not convinced the IDLE connection length is a factor...

@mjs
Copy link
Owner

mjs commented Mar 18, 2017

You're right that the IDLE time doesn't appear to be a factor. I was misled by my first attempt which happened to land on almost exactly 10 mins. I've just seen another EOF after 6 minutes.

I guess the right thing to do is handle these failures robustly in your program. Although Yandex seems to be dropping connections a little more than they should, network connections can fail at any time for for any number of reasons and our software has to be prepared for that.

@afontenot
Copy link
Author

I guess the right thing to do is handle these failures robustly in your program.

That's what I've been trying to do with the exception code that gets me a new connection before restarting the loop. Is there a better way to handle that?

Although Yandex seems to be dropping connections a little more than they should, network connections can fail at any time for for any number of reasons and our software has to be prepared for that.

True - but it doesn't make very much sense for Yandex to just kill a long running IMAP connection. I wonder if I should report this to them?

@NicolasLM
Copy link
Collaborator

We run IDLE connections on a wide variety of providers, most of them work well by reactivating the IDLE connection every 15 minutes.

For ~5% of the accounts however, one of these two things happen:

  • The server closes the connection very early, we've seen some of them closing it after exactly 1 minute of inactivity.
  • The server drops the connection without closing it, resulting in a timeout when trying to reactivate it after the 15 minutes of TTL.

The solution found is to keep track of the bad actors and fallback to polling for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants