Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memoryview can set an exception in tp_clear #77894

Open
serhiy-storchaka opened this issue May 31, 2018 · 17 comments
Open

memoryview can set an exception in tp_clear #77894

serhiy-storchaka opened this issue May 31, 2018 · 17 comments
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Copy link
Member

BPO 33713
Nosy @pitrou, @vstinner, @skrah, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-05-31.12:15:10.896>
labels = ['interpreter-core', '3.10', '3.9', 'type-crash', '3.11']
title = 'memoryview can set an exception in tp_clear'
updated_at = <Date 2021-10-18.20:19:58.191>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2021-10-18.20:19:58.191>
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2018-05-31.12:15:10.896>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 33713
keywords = []
message_count = 14.0
messages = ['318288', '318291', '318295', '318296', '318297', '318298', '318299', '318300', '318306', '318316', '318318', '318319', '318320', '318586']
nosy_count = 4.0
nosy_names = ['pitrou', 'vstinner', 'skrah', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue33713'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

@serhiy-storchaka
Copy link
Member Author

The tp_clear handler of memoryview can set an exception when fail to release the buffer. An exception in tp_clear is not expected and caused a crash in the garbage collector. In the master branch it will cause just writing a traceback to stderr (see bpo-33622), but in any case it would be better to handle the failure locally in memoryview. I don't know what is the best solution: silencing an error, writing a traceback with more detailed information, or resurrecting the buffer object.

@serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels May 31, 2018
@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

Could you please show how tp_clear() can be called when self->exports > 0? It should not happen.

bpo-33622 is a big issue with many commits. Would it be possible to extract the relevant part?

@serhiy-storchaka
Copy link
Member Author

I don't know how to reproduce a failure in tp_clear(). I just can't prove that it never fails. Maybe it is needed a bug in the implementation of the buffer protocol in third-party extension.

If it should not happen then we can just add

assert(!PyErr_Occurred());

or

    if (PyErr_Occurred()) {
        PyErr_WriteUnraisable(NULL);
    }

It is better to crash in memoryview.c than in the garbage collector if this crash is caused by incorrect buffer protocol implementation.

@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

This looks the same as bpo-25525. I think it cannot happen, and no one has ever reported an actual issue for 6 years now.

You *really* need to show a reproducer if you assert that something can crash.

@serhiy-storchaka
Copy link
Member Author

See the delete_garbage() function line 770 in Modules/gcmodule.c for changes in the master branch relevant to this issue. See Py_FatalError() in the collect() function at line 974 for a crash.

@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

Yes, but who calls tp_clear() if the memoryview is not being deallocated?

@serhiy-storchaka
Copy link
Member Author

The GC calls tp_clear() if the memoryview is a part of the reference loop.

a = [memoryview(...)]
a.append(a)
del a

The GC will call tp_clear() of the list or the memoryview. What will be called first is not specified.

@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

The point is that *no garbage collection is triggered* if self->exports > 0.

It would be a major bug if it were and I suspect it would be reported within a week. Fortunately, no such bug has been reported in 6 years.

@pitrou
Copy link
Member

pitrou commented May 31, 2018

Serhiy is right about the theoretical concern here. However, it's probably quite difficult to find a concrete situation where this occurs, because we're talking about mbuf_clear and the managerbuffer object can't really get involved in a reference cycle by itself (not in normal use anyway): the memoryview object does, but it's a different thing.

By the way: PyBuffer_Release() returns void, so IMO it's a bug if it can return with an exception set. We should fix that rather than focus on mbuf_clear().

@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

Well, the example would need exports:

>>> a = [bytes()]
>>> a.append(memoryview(a[0]))
>>> a.append(memoryview(a[1]))
>>> a.append(a)
>>> a
[b'', <memory at 0x1ad21f8>, <memory at 0x1b52348>, [...]]

The first memoryview has one export, so its refcount > 0.

Do I fundamentally misunderstand tp_clear() and tp_clear() can be called on objects with refcount > 0?

@vstinner
Copy link
Member

If the bug cannot occur, just add "assert(!PyErr_Occurred());" no?

@skrah
Copy link
Mannequin

skrah mannequin commented May 31, 2018

Well yes, I still want to understand tp_clear(). :)

The docs are a bit vague.

@pitrou
Copy link
Member

pitrou commented May 31, 2018

Yes, tp_clear can be called with refcount > 0. It's exactly why it's
separate from tp_dealloc, actually :-)

@skrah
Copy link
Mannequin

skrah mannequin commented Jun 3, 2018

Okay that makes sense. :)

I looked a bit at the gc code. A consumer object always has one
reference to a memoryview with an export, which isn't visited. So
it looks to me that the gc_refs of that memoryview cannot fall to 0.

So memory_clear() isn't called in that case, but mbuf_clear() is, which
is known and expected to handle mbuf->exports >= 0.

Indeed let's perhaps just add "if (self->exports > 0) return 0" to memory_clear() if those assumptions are too complex.

@iritkatriel iritkatriel added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Oct 18, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@iritkatriel iritkatriel added 3.12 bugs and security fixes and removed 3.9 only security fixes labels Feb 28, 2023
@AlexSCFraser
Copy link

AlexSCFraser commented Aug 22, 2023

Several posts in the thread mention this is a purely theoretical concern, but I keep running into a crash bug that could be caused by this. The error I keep getting:

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 672, in _compile_bytecode
BufferError: memoryview has 1 exported buffer

When this happens, the interpreter hard crashes, but I can just rerun my integration test or unit test without changing anything and then it will work. So it's some kind of uncommon intermittent error which is crashing the interpreter.

Unfortunately the error is intermittent and seems to happen very rarely. When it does happen, I can tell you that I am usually using the debugger to run code. I don't think it's happened to me when I'm not debugging. Also, the frequency of the error is greatly increased when using libraries which are actually wrappers to C or C++ code such as PyQT5. Most recently it happened to me during a subprocess.run() call to a binary executable on windows. I'm not sure if the underlying implementation of subprocess.run() has any non-python code that might be a source of errors as well. If I were to guess what the cause is, I imagine that there is an intermittent reference counting error in these foreign libraries the GC has no awareness of and this is somehow compounded by debugging because of additional variable references introduced by the debugger or something like that.

If I find a reliable method to reproduce this error I will document it and try to produce a minimal example, but it seems to occur very randomly even with the preconditions of running through the debugger and interfacing with C/C++ libraries.

@krzysgol
Copy link

krzysgol commented Aug 23, 2023

Not sure if it helps but I'm getting 100% repro when starting the attached script using PyCharm debug. It never happens with debugger disconnected.

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "C:\Python311\Lib\threading.py", line 265, in __enter__
    return self._lock.__enter__()

parallel.zip

@cyc1111111111
Copy link

线程中的几篇文章中提到的这纯粹是理论上的问题,但我不断遇到可能不断引起的崩溃错误。我收到的错误:

Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 672, in _compile_bytecode
BufferError: memoryview has 1 exported buffer

发生这种情况时,解释器会严重崩溃,但我可以重新运行集成测试或单元测试而不更改任何内容,然后它就会工作。所以这是一种不常见的间歇性错误,导致解释器崩溃。

不幸的是,该错误是间歇性的,而且似乎很少发生。当它确实发生时,我可以告诉你我通常使用调试器来发生运行代码。我不认为当我不调试时这会发生在我身上另外,当使用实际是 C 或 C++ 代码(如 PyQT5)的封装器的库时,错误的频率会大大增加。最近,我在对Windows 上的二进制执行文件进行 subprocess.run() 调用时发生了这种情况。我不确定 subprocess.run() 的底层实现是否有任何非 python 代码也可能是错误来源。如果我要猜测原因是什么,我想这些外部库中存在间歇性引用计数错误, GC 没有意识到,并且由于调试器或类似的东西引入了额外的引用,调试在某种程度上使这种情况变得更加复杂。

如果我找到一种可靠的方法来修复此错误,我将记录它并尝试生成一个最小的样本,但即使在运行调试器并与 C/C++ 库交互的前提下,它似乎也非常随机地发生。

Have you solved this problem? I had the same problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

7 participants