Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idle_check can see incomplete lines #519

Open
mjs opened this issue Aug 22, 2023 · 8 comments
Open

idle_check can see incomplete lines #519

mjs opened this issue Aug 22, 2023 · 8 comments

Comments

@mjs
Copy link
Owner

mjs commented Aug 22, 2023

From #464:

I've found something else that may be related: I connected with tls=FALSE and ran a network capture to see what was actually on the wire. I'm seeing these errors now:

Traceback (most recent call last):
  File "./debug-protocol-error.py", line 23, in <module>
    responses = server.idle_check(timeout=30)
  File "/home/david/.local/lib/python3.8/site-packages/imapclient/imapclient.py", line 175, in wrapper
    return func(client, *args, **kwargs)
  File "/home/david/.local/lib/python3.8/site-packages/imapclient/imapclient.py", line 940, in idle_check
    line = self._imap._get_line()
  File "/usr/lib/python3.8/imaplib.py", line 1164, in _get_line
    raise self.abort('socket error: unterminated line: %r' % line)
imaplib.IMAP4.abort: socket error: unterminated line: b'* 91 FETCH (FLAGS (\\Deleted \\Seen \\Recent)'

In the network capture, there are two packets which are re-assembled to form the message seen in the error. The only thing in the second packet is \r\n. I have not looked at the code here but is it possible that it's attempting to process a packet that hasn't been fully assembled, which presumably would be in imaplib judging from the trace?

@mjs
Copy link
Owner Author

mjs commented Aug 22, 2023

The issue is likely that IMAPClient is using imaplib.IMAP4's _get_line while the socket is in non-blocking mode which it isn't really designed for. IMAPClient will like need to call IMAP4.readline() and do end of line handling itself instead

@axoroll7
Copy link
Contributor

axoroll7 commented Aug 27, 2023

readline should return a complete line. I am afraid there may be others unknown side-effects. imaplib2 uses its own line reader for this.

Since there is no concurrency, why using non-blocking sockets instead of blocking sockets with timeout ? I don't know why it was originally written this way. Maybe it was necessary in older python versions. Maybe concurrency was the next step. But can this be reconsidered today ?

EDIT 1: linguistic errors
EDIT 2: Sorry for my ignorance

@mjs
Copy link
Owner Author

mjs commented Aug 27, 2023

My memory is fuzzy but I'm pretty sure it was done like this because timeouts weren't available with older Python versions. I'm definitely open to rethinking the whole approach to IDLE now that we are only concerned with more modern Python versions.

@axoroll7
Copy link
Contributor

axoroll7 commented Aug 30, 2023

A draft :

_idle_raw_buffer:bytearray
_idle_lines_buffer:List[bytes]

def idle_check(self, timeout=None):

    sock = self.socket()

    end = 0.0
    if timeout is not None:
        end = time.monotonic() + float(timeout)

    raw_buffer = self._idle_raw_buffer
    lines_buffer = self._idle_lines_buffer

    selector = selectors.DefaultSelector()

    # make the socket non-blocking so the timeout can be
    # implemented for this call
    sock.settimeout(None)
    sock.setblocking(0)
    selector.register(sock, selectors.EVENT_READ)

    try:
        socket_error = None
        resps = []
        while not lines_buffer:

            events = selector.select(timeout)
            if not events:
                break

            i = 0
            while not lines_buffer:
                try:
                    buf = sock.recv(8096)
                except BlockingIOError:
                    if i == 0:
                        # the socket should be readable, but it is not
                        # always the case. To prevent a tight loop,
                        # we sleep for one second.
                        time.sleep(1)
                    # the socket was fully drained
                    break
                except socket.error as ex:
                    socket_error = ex
                    break
                if not buf:
                    # EOF
                    break
                while True:
                    newline_pos = buf.find(b"\n")
                    if newline_pos > -1:
                        l_buf = buf[:newline_pos+1] # +1 to get \n
                        r_buf = buf[newline_pos+1:] # +1 to remove \n
                        buf = r_buf
                        raw_buffer += l_buf
                        line = bytes(raw_buffer)
                        lines_buffer.append(line)
                        raw_buffer.clear()
                    else:
                        break
                raw_buffer += buf

                i += 1
                # imaplib._MAXLINE / 8k ~= 125
                if i > 125:
                    i = 0
                    # if the socket is not yet drained after 125*8k bytes,
                    # the elapsed time is checked
                    if timeout is not None and end - time.monotonic() < 0.0:
                        break

            if socket_error:
                break

            if timeout is not None:
                timeout = end - time.monotonic()
                if timeout < 0.0:
                    break

        while lines_buffer:
            line = lines_buffer.pop(0)
            if len(line)>imaplib._MAXLINE:
                raise imaplib.IMAP4.error(
                    "got more than %d bytes" % imaplib._MAXLINE)
            if not line.endswith(b"\r\n"):
                raise imaplib.IMAP4.abort(
                    "socket error: unterminated line: %r" % line)
            line = line[:-2]
            resps.append(_parse_untagged_response(line))
        return resps
    finally:
        selector.unregister(sock)
        sock.setblocking(1)
        self._set_read_timeout()

def idle_done(self):
    logger.debug("< DONE")
    self._imap.send(b"DONE\r\n")

    sock = self.socket()
    raw_buffer = self._idle_raw_buffer

    remaining_line = None
    if raw_buffer and raw_buffer[-1:] == b"\r":
        # if the last character is a carriage return, the line end may
        # be truncated
        next_char = sock.read(1)
        if next_char == b"\n":
            remaining_line = bytes(raw_buffer[:-1])
            raw_buffer.clear()
        else:
            raw_buffer += next_char
    # if ``raw_buffer`` is not empty, the next line is truncated.
    # By concatenating the ``raw_buffer`` and the next line, the full line is
    # recovered.
    if raw_buffer:
        raw_buffer += sock._imap._get_line()
        remaining_line = bytes(raw_buffer)
        raw_buffer.clear()

    returned_value = self._consume_until_tagged_response(self._idle_tag, "IDLE")
    if remaining_line is not None:
        returned_value[1].insert(0,_parse_untagged_response(remaining_line))
    return returned_value

Not tested.

Non-blocking socket utilized, because I was afraid to use blocking socket and being confronted with something new.

Custom line reader, seems ok for me.

Removed poll and select to use selectors, easier to maintain.

Will edit this draft often.

A socket error should be raised ?

@axoroll7
Copy link
Contributor

@mjs

Should I continue in this direction ? Should I do idle_done too, or not, because blocking sockets are better, because they are easier ? Have you already started on your side, and should I stop ? I am open to suggestions.

@axoroll7
Copy link
Contributor

python/cpython#51571

I don't have time to look it up, but it seems a readline can discard data. The issue is not closed. A non-blocking socket seems safer.

@mjs
Copy link
Owner Author

mjs commented Sep 3, 2023

Thanks for digging into this @axoroll7 . I don't have anything significant in progress for this so if you want to continue with solving this please go ahead.

I do wonder if we can get away from using non-blocking sockets and use a custom line reading function with a socket timeout. We could side step the issues with readline by implementing it ourselves.

@mjs
Copy link
Owner Author

mjs commented Sep 28, 2023

Heads up: I've got a rough prototype working which simply uses timeouts with custom line reading and buffering. I'll try to get it tidied up for review over the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants