New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SystemError: <built-in function utf_8_decode> returned a result with an error set #551

Closed
alanhamlett opened this Issue May 6, 2017 · 14 comments

Comments

Projects
None yet
6 participants
@alanhamlett
Copy link

alanhamlett commented May 6, 2017

Getting this error when a billiard.exceptions.SoftTimeLimitExceeded exception occurs interrupting a query:

billiard.exceptions.SoftTimeLimitExceeded:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/myapp/amqp.py", line 161, in decorated
    return fn(*args, **kwargs)
  File "/app/myapp/tasks.py", line 1399, in cache_user_summary
    dt=current,
  File "/app/myapp/utils.py", line 983, in get_summary
    use_cache=use_cache)
  File "/app/myapp/utils.py", line 601, in get_combined_durations
    durations = get_durations(user=user, start=start, end=end, use_cache=use_cache)
  File "/app/myapp/utils.py", line 713, in get_durations
    heartbeats.extend(query.all())
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2703, in all
    return list(self)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 90, in instances
    util.raise_from_cause(err)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 67, in instances
    fetch = cursor.fetchmany(query._yield_per)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1154, in fetchmany
    self.cursor, self.context)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1147, in fetchmany
    l = self.process_rows(self._fetchmany_impl(size))
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1306, in _fetchmany_impl
    row = self._fetchone_impl()
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1296, in _fetchone_impl
    self.__buffer_rows()
  File "/app/venv/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1283, in __buffer_rows
    self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
  File "/app/venv/lib/python3.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
SystemError: <built-in function utf_8_decode> returned a result with an error set

Using Python 3.5.2, SQLAlchemy 1.1.9 and psycopg2 2.7.1.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented May 8, 2017

Hello @alanhamlett

I've seen this error while developing the codecs cache speedup in psycopg 2.7. It was related to accessing utf8 decoding functions without holding the GIL. Apparently it is happening in some untested code path.

Please compile psycopg in debug mode and provide a log of what happened; alternatively please provide a self-contained test to reproduce the issue. Thank you.

@ljodal

This comment has been minimized.

Copy link

ljodal commented May 12, 2017

Hello, we just saw this in our Django application. We are now at Python 3.6.0, Django 1.10.7 and psycopg2 2.7.1. It occurred when a request took too long, so Gunicorn killed the worker. Here's our stack trace (with our code removed):

Internal Server Error: <url>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/encodings/utf_8.py", line 15, in decode
    def decode(input, errors='strict'):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/base.py", line 192, in handle_abort
    sys.exit(1)
SystemExit: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 68, in view
    return self.dispatch(request, *args, **kwargs)

 <snip>

  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 550, in latest
    return self._earliest_or_latest(field_name=field_name, direction="-")
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 544, in _earliest_or_latest
    return obj.get()
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 379, in get
    num = len(clone)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 238, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 1087, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 54, in __iter__
    results = compiler.execute_sql()
  File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 835, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
SystemError: <built-in function utf_8_decode> returned a result with an error set

Hope this can help. Thanks!

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented May 12, 2017

Hey @ljodal thank you for reporting, but I need a self-contained test to debug the issue, or if you can run psycopg in debug mode and provide its info I may try and understand what's going on. Thank you very much!

@Chronial

This comment has been minimized.

Copy link

Chronial commented May 14, 2017

We have also been seeing this issue for multiple months, but it is very irregular. We see quite a bit of traffic and this happens once every week. It always happens like Ijodal's example inside a cursor.execute(sql, params) call.

@adamcharnock

This comment has been minimized.

Copy link

adamcharnock commented Jun 6, 2017

I'm seeing this daily in our celery workers. Some tasks pulling from external API's hit their soft timeout, and 20% of the time we get this error rather than a SoftTimeLimitExceeded exception.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jun 6, 2017

@adamcharnock could you please compile psycopg in debug mode and provide some context of when the problem occurs? I would like to make a release to fix this problem. Thank you very much.

@adamcharnock

This comment has been minimized.

Copy link

adamcharnock commented Jun 6, 2017

Hi @dvarrazzo, thanks for the quick reply. I actually meant to mention that in my reply but clearly got distracted. Unfortunately, deploying a custom compiled psycopg into production/staging is something I cannot realistically do at the moment for a variety of reasons, and I don't think I can recreate the problem locally.

I really appreciate how willing you are to take a look, and I'm sorry if this is all rather frustrating.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jun 6, 2017

@adamcharnock It's ok, keep me informed if you manage to reproduce the bug using a debug build.

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jun 8, 2017

If any of you manage to create a debug trace of this error happening I'll fix it for 2.7.2. Otherwise it will have to wait.

@brownan

This comment has been minimized.

Copy link

brownan commented Jun 14, 2017

I'm pretty sure I'm seeing the same problem. Here's a minimal test case that I came up with. The problem occurs when a signal handler raises sys.exit() while processing an INSERT query. Raising sys.exit() from a signal handler is how gunicorn handles worker timeouts.

Run this code and then from another process send it a SIGABRT.

import sys
import os
import signal

import psycopg2

def handle_sigabort(sig, frame):
    sys.exit(1)

signal.signal(signal.SIGABRT, handle_sigabort)

conn = psycopg2.connect(
    # ...
)

cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS test (id serial, num varchar(50))")

print("Now inserting. Send me a SIGABRT now!")
print("Run this: kill -SIGABRT {}".format(os.getpid()))
while True:
    cur.execute(
        "INSERT INTO test (num) VALUES (%s) RETURNING id",
        ("Hello, world!",)
    )
$ python psycopg2_utf8_decode_error.py 
Now inserting. Send me a SIGABRT now!
Run this: kill -SIGABRT 16448
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "psycopg2_utf8_decode_error.py", line 26, in <module>
    cur.execute("INSERT INTO test (num) VALUES (%s) RETURNING id", ("Hello, world!",))
  File "/home/.../env/lib/python3.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
SystemError: <built-in function utf_8_decode> returned a result with an error set

(I'm running on Ubuntu with Python 3.5, psycopg2 2.7.1)

And here's some debug output from a debug build, snipped to only show the last couple queries before crash:

[16456] pq_execute: pg connection at 0x25a3050 OK
[16456] pq_begin_locked: pgconn = 0x25a3050, autocommit = 0, status = 2
[16456] pq_begin_locked: transaction in progress
[16456] pq_execute: executing SYNC query: pgconn = 0x25a3050
[16456]     INSERT INTO test (num) VALUES ('Hello, world!') RETURNING id
[16456] pq_execute: entering synchronous DBAPI compatibility mode
[16456] pq_fetch: pgstatus = PGRES_TUPLES_OK
[16456] pq_fetch: got tuples
[16456] _pq_fetch_tuples: looking for cast 23:
[16456] curs_get_cast:        per-connection dict: (nil)
[16456] curs_get_cast:        global dict: 0x7f5e39dbec78
[16456] _pq_fetch_tuples: using cast at 0x7f5e39dbec78 for type 23
[16456] pq_fetch: fetching done; check for critical errors
[16456] psyco_curs_execute: res = 1, pgres = 0x244ce70
[16456] psyco_curs_execute: starting execution of new query
[16456] microprotocols_adapt: trying to adapt str
[16456] qstring_setup: init qstring object at 0x7f5e40318330, refcnt = 1
[16456] qstring_setup: good qstring object at 0x7f5e40318330, refcnt = 1
[16456] microprotocol_getquoted: adapted to psycopg2.extensions.QuotedString
[16456] qstring_dealloc: deleted qstring object at 0x7f5e40318330, refcnt = 0
[16456] pq_execute: pg connection at 0x25a3050 OK
[16456] pq_begin_locked: pgconn = 0x25a3050, autocommit = 0, status = 2
[16456] pq_begin_locked: transaction in progress
[16456] pq_execute: executing SYNC query: pgconn = 0x25a3050
[16456]     INSERT INTO test (num) VALUES ('Hello, world!') RETURNING id
[16456] pq_execute: entering synchronous DBAPI compatibility mode
[16456] pq_fetch: pgstatus = PGRES_TUPLES_OK
[16456] pq_fetch: got tuples
[16456] _pq_fetch_tuples: looking for cast 23:
[16456] curs_get_cast:        per-connection dict: (nil)
[16456] curs_get_cast:        global dict: 0x7f5e39dbec78
[16456] _pq_fetch_tuples: using cast at 0x7f5e39dbec78 for type 23
[16456] pq_fetch: fetching done; check for critical errors
[16456] psyco_curs_execute: res = -1, pgres = 0x25b7cd0
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "psycopg2_utf8_decode_error.py", line 26, in <module>
    cur.execute("INSERT INTO test (num) VALUES (%s) RETURNING id", ("Hello, world!",))
  File "/home/.../env/lib/python3.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
SystemError: <built-in function utf_8_decode> returned a result with an error set
[16456] conn_close: PQfinish called
[16456] connection_dealloc: deleted connection object at 0x7f5e39dd63d8, refcnt = 0
[16456] cursor_dealloc: deleted cursor object at 0x7f5e3ee05618, refcnt = 0

One thing I noticed is that if you don't use the RETURNING feature, you still get the same error, but not from utf_8_decode() but rather from cursor.execute()

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jun 14, 2017

@brownan Thank you so much for this analysis! I'll see what we can do. Probably the signal handler makes an hole into our lock management...

@Chronial

This comment has been minimized.

Copy link

Chronial commented Jun 15, 2017

We have now also seen a SystemError: <built-in method execute of psycopg2.extensions.cursor object at 0x7f8bb0331900> returned a result with an error set without any changes to our setup – in exactly the same place that also generates the utf_8_decode errors.

dvarrazzo added a commit that referenced this issue Jun 15, 2017

@dvarrazzo dvarrazzo closed this in 602c74f Jun 15, 2017

@dvarrazzo dvarrazzo added this to the psycopg 2.7.2 milestone Jun 15, 2017

@dvarrazzo

This comment has been minimized.

Copy link
Member

dvarrazzo commented Jun 15, 2017

There. I think it was a mundane case of return value not checked, in one of the most ancient parts of the codebase. utf_8_decode() seems failing after receiving the signal: now we should be handling that. I've added @brownan test to the test suite and verified both the code paths now work ok (with or without the RETURNING).

Testing from you affected by the issue is welcome. The fix is now in master and will be released in psycopg 2.7.2.

@brownan

This comment has been minimized.

Copy link

brownan commented Jun 15, 2017

Thanks @dvarrazzo. For anyone else coming upon this issue, the underlying problem for me was actually that Gunicorn has a default request timeout of 30 seconds (that I had forgotten about), so it was killing the workers after 30 seconds. For a long-running task that does a lot of inserts, it was likely to error within psycopg2's cursor.execute(). The exception that was being raised was misleading, and that's what this bug fixed.

After installing this fix, I no longer see those tracebacks, but sending a signal will still stop the process (as expected), but now it's much more graceful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment