
OSError: raw readinto() returned invalid length -1 (should have been between 0 and 8192) #368

Closed
laike9m opened this issue Oct 14, 2016 · 15 comments · Fixed by #370

@laike9m

laike9m commented Oct 14, 2016

After the service had been running for a while, this error occurred. I'm not sure whether you can reproduce it without incoming requests, but it happens to me every time.

Here's my code:

import json
import os

from nameko.runners import ServiceRunner
from nameko.timer import timer
from nameko.web.handlers import http

# TOKEN_FILE, current_path, username, password, client and logger are
# defined elsewhere in the original module.

class FetchService:
    name = "fetch_userdata_service"

    @timer(interval=86400)
    def update_token(self):
        # update token every day
        if os.path.isfile(TOKEN_FILE):
            os.remove(os.path.join(current_path, TOKEN_FILE))
            print("Old token removed")
        client.login_in_terminal(username, password)
        client.save_token('token.pkl')
        print("New token saved")

    @http('GET', '/ctrl/stop')
    def stop_service(self, request):
        runner.kill()

    @http('GET', '/<string:userhash>')
    def get_method(self, request, userhash):
        print("request userhash: " + userhash)
        u = client.people(userhash)
        # some lines ignored
        return json.dumps(data, separators=(',', ':'), ensure_ascii=False)

if __name__ == '__main__':
    logger.info("\n\nNameko Service Start...\n")
    runner = ServiceRunner(config={})
    runner.add_service(FetchService)
    runner.start()
    runner.wait()

The client is essentially an HTTP client that makes requests. And here's the error log:

INFO:__main__:

Nameko Service Start...

INFO:nameko.runners:starting services: fetch_userdata_service
ERROR:nameko.containers:<ServiceContainer [fetch_userdata_service] at 0x7fce53b92ac8> thread exited with error
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nameko/containers.py", line 477, in _handle_thread_exited
    gt.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 175, in wait
    return self._exit_event.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/event.py", line 125, in wait
    current.throw(*self._exc)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 719, in process_request
    proto.__init__(sock, address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 424, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 330, in handle_one_request
    self.raw_requestline = self.rfile.readline(self.server.url_length_limit)
OSError: raw readinto() returned invalid length -1 (should have been between 0 and 8192)
INFO:nameko.containers:killing <ServiceContainer [fetch_userdata_service] at 0x7fce53b92ac8> due to raw readinto() returned invalid length -1 (should have been between 0 and 8192)
WARNING:nameko.containers:killing 2 managed thread(s)
WARNING:nameko.containers:killing managed thread `run`
WARNING:nameko.containers:killing managed thread `<unknown>`
INFO:nameko.runners:stopping services: fetch_userdata_service
CRITICAL:__main__:Uncaught exception
Traceback (most recent call last):
  File "service.py", line 89, in <module>
    runner.wait()
  File "/usr/local/lib/python3.5/dist-packages/nameko/runners.py", line 115, in wait
    SpawningProxy(self.containers, abort_on_error=True).wait()
  File "/usr/local/lib/python3.5/dist-packages/nameko/utils.py", line 179, in spawning_method
    return list(fail_fast_imap(pool, call, self._items))
  File "/usr/local/lib/python3.5/dist-packages/nameko/utils.py", line 148, in fail_fast_imap
    eventlet.getcurrent().throw(*exc_info)
  File "/usr/local/lib/python3.5/dist-packages/nameko/utils.py", line 127, in handle_result
    thread_result = finished_thread.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 175, in wait
    return self._exit_event.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/event.py", line 125, in wait
    current.throw(*self._exc)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/nameko/utils.py", line 176, in call
    return getattr(item, name)(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/nameko/containers.py", line 321, in wait
    return self._died.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/event.py", line 121, in wait
    return hubs.get_hub().switch()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/hubs/hub.py", line 294, in switch
    return self.greenlet.switch()
  File "/usr/local/lib/python3.5/dist-packages/nameko/containers.py", line 477, in _handle_thread_exited
    gt.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 175, in wait
    return self._exit_event.wait()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/event.py", line 125, in wait
    current.throw(*self._exc)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 719, in process_request
    proto.__init__(sock, address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 424, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.5/dist-packages/eventlet/wsgi.py", line 330, in handle_one_request
    self.raw_requestline = self.rfile.readline(self.server.url_length_limit)
OSError: raw readinto() returned invalid length -1 (should have been between 0 and 8192)

So you might think there's some bug in my code, but the service works normally for hours, receiving requests and returning data without any problem, and then it suddenly crashes.

@mattbennett
Member

I think you've found a bug in the WebServer extension.

The WebServer spawns a couple of threads with container.spawn_managed_thread. One of them is raising this OSError and, since the server is an Extension and not tied to a particular worker, the exception bubbles up to the container which considers it a fatal error.

I believe the OSError is raised when a client disconnects very early in the request. The WebServer should probably handle this as an acceptable error and not let it bubble to the container.

I will try to reproduce and fix it.
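
For context, the "acceptable error" handling described above boils down to catching the exception inside the per-connection thread rather than letting it reach the container. A minimal sketch, assuming a hypothetical handle_connection callable standing in for whatever the managed thread runs for each accepted socket (this is not nameko's actual code):

import logging

log = logging.getLogger(__name__)

def run_connection(handle_connection, sock, address):
    # handle_connection, sock and address are illustrative placeholders;
    # the real WebServer extension wires these up internally.
    try:
        handle_connection(sock, address)
    except OSError as exc:
        # The client went away before the request line could be read.
        # Treat this as a per-connection nuisance, not a container-fatal error.
        log.debug("client %s disconnected early: %s", address, exc)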

mattbennett added commits to mattbennett/nameko that referenced this issue Oct 18, 2016
@laike9m
Author

laike9m commented Oct 19, 2016

So a client disconnects early in the request, the server cannot read the data, and that's why the OSError occurs. Is that the reason?

@mattbennett
Member

That is correct. You can reproduce it by refreshing a browser very quickly, or aborting a curl call shortly after executing it.

@laike9m
Author

laike9m commented Oct 19, 2016

I see you ignored OSError when processing the request in the commits above, but what if a different OSError happens?

@mattbennett
Member

That is a good question. I don't know what other circumstances can result in an OSError here.

The thread is exclusively used to handle an incoming request though, so it's probably fine to ignore any OSError and assume that the request will be retried, rather than letting the exception bubble and take down the whole container. @davidszotten do you agree?

@davidszotten
Member

i'm hesitant to have such a blunt ignore. should at least check the errno i think. but would like to get a better understanding of what's going on. maybe this is actually a bug in one of the libraries we depend on.

@davidszotten
Member

can you reproduce reliably? what happens if you do the same thing to a flask and/or werkzeug hello world?

@laike9m
Author

laike9m commented Oct 19, 2016

I'll try to find a way to reproduce when I have time.

@mattbennett
Member

The problem is with the eventlet WSGI server. You can demonstrate it using Flask with the following, which uses eventlet's own server wrapper for the app:

import eventlet
eventlet.monkey_patch()

from eventlet import wsgi

from flask import Flask

app = Flask(__name__)

@app.route('/answer')
def hello_world():
    return '42'

# serve the Flask app with eventlet's own WSGI server
wsgi.server(eventlet.listen(('', 5000)), app)

The following will trigger the error and you'll see the traceback in the console:

import eventlet
eventlet.monkey_patch()

import requests

try:
    while True:
        # start a request, then kill the greenthread before it completes,
        # simulating a client that disconnects early
        gt = eventlet.spawn(requests.get, "http://localhost:5000/answer")
        eventlet.sleep(.005)
        gt.kill()
except KeyboardInterrupt:
    pass

The exception doesn't take down the server process though. The thread running Server.process_request isn't linked at all, so any exception thrown is ignored. See https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L885

Interestingly enough, the OSError is only raised on Python 3. I wonder whether that is intentional or amounts to a low-impact bug in eventlet.
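
To illustrate what "isn't linked" means here, a small standalone sketch (illustrative only, not eventlet or nameko internals): an unlinked greenthread's failure never reaches the code that spawned it, whereas a linked one can notify its spawner, which is what nameko does for managed threads.

import eventlet

def boom():
    raise OSError("simulated early disconnect")

# Unlinked: eventlet may print the traceback, but nothing propagates back
# to the spawner.
eventlet.spawn(boom)
eventlet.sleep(0.1)

# Linked: the callback runs when the greenthread exits; calling wait() there
# re-raises the exception so the spawner can react.
def on_exit(dead_gt):
    try:
        dead_gt.wait()
    except OSError as exc:
        print("child failed:", exc)

gt = eventlet.spawn(boom)
gt.link(on_exit)
eventlet.sleep(0.1)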

@davidszotten
Member

still investigating. i wonder if it's related to https://bugs.python.org/issue13322

@davidszotten
Member

nope. just an eventlet bug (codepath only used with py3). working on a patch. not sure what the best fix/workaround is going to be. there's no errno on the OSError but i don't think we want to blanket swallow them. might have to resort to comparing the msg :(
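
A sketch of the message-matching workaround being discussed, narrowing the blanket catch from earlier; the run_connection wrapper and its arguments are hypothetical placeholders, and the message text is taken from the traceback above:

# The buggy eventlet code path produces an OSError with errno None, so the
# message text is the only discriminator available until the upstream fix lands.
BROKEN_RECV_MSG = "raw readinto() returned invalid length"

def run_connection(handle_connection, sock, address):
    try:
        handle_connection(sock, address)
    except OSError as exc:
        if exc.errno is None and BROKEN_RECV_MSG in str(exc):
            pass  # early client disconnect surfaced via the eventlet bug
        else:
            raise  # any other OSError still bubbles up to the container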

@davidszotten
Member

eventlet/eventlet#353

in combination with a slightly unhelpful error message from cpython

@mattbennett
Member

Awesome. What do you want to do while we're waiting for that fix to land?

Eventlet's own server blanket swallows everything, which may be why this bug went unnoticed. I'm OK string-matching the error for the time being, although I wonder what the best behaviour is here -- should the nameko process die if the recv thread throws an unhandled exception? That is consistent with our behaviour everywhere else.

@davidszotten
Member

i think our assumption is that an uncaught exception could be anything, so the safest thing to do is suicide.

in this particular case, it might be possible to somehow attribute this exception to the worker (hm. maybe not, since we might not have parsed enough of the request yet). if we can't, then this is an error on the extension level -> suicide

and again, for this particular case, this particular exception (which is due to a bug) will now (once we catch it) no longer bubble out, so no reason to suicide

@mattbennett
Member

OK great. I've reflected these changes in #370. There's one outstanding comment there about faking vs triggering the failure.

temoto pushed a commit to eventlet/eventlet that referenced this issue Oct 27, 2016
make sure `_recv_loop` returns the correct value when `recv_meth` is of
foo_into type (fills a buffer and returns number of bytes, not the data)

#353
nameko/nameko#368
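
The commit message refers to the difference in return conventions between recv-style and recv_into-style socket methods, which the buggy code path conflated. A plain standard-library illustration, unrelated to eventlet's internals:

import socket

a, b = socket.socketpair()

a.sendall(b"hello")
data = b.recv(8192)       # recv returns the data itself: b"hello"

a.sendall(b"world")
buf = bytearray(8192)
n = b.recv_into(buf)      # recv_into fills `buf` and returns a byte count: 5

assert data == b"hello" and buf[:n] == b"world"

# On EOF, recv returns b"" while recv_into returns 0; returning data (or b"")
# instead of a byte count from an *_into method is what leads to the
# "invalid length" OSError seen above.
a.close()
b.close()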