Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyErr_SetObject() behavior is strange and not as documented. #101578

Open
markshannon opened this issue Feb 5, 2023 · 33 comments · Fixed by #101607
Open

PyErr_SetObject() behavior is strange and not as documented. #101578

markshannon opened this issue Feb 5, 2023 · 33 comments · Fixed by #101607
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Copy link
Member

markshannon commented Feb 5, 2023

Briefly:
PyErr_SetObject(exc_type, exc_val) does not create a new exception iff isinstance(exc_val, BaseException), but uses exc_val instead.

Callers of PyErr_SetObject() need various workarounds to handle this.

The long version:

Internally CPython handles exceptions as a triple (type, value, traceback), but the language treats exceptions as a single value.

This a legacy of the olden days before proper exceptions.
To handle adding proper exceptions to Python, various error handling functions, specifically _PyErr_SetObject still treat exceptions as triples, with the convention that if the value is an exception, then the exception is already normalized.

One other oddity is that if exc_val is a tuple, it is treated as the * arguments to exc_type when calling it. So, if isinstance(exc_val, BaseException) the desired behavior can be achieved by wrapping exc_val in a one-tuple.

As a consequence, both _PyErr_SetKeyError and _PyGen_SetStopIterationValue are a lot more complex than they should be to workaround this behavior.

We could make PyErr_SetObject act as documented, but that is likely to break C extensions, given how old this behavior is, and that it is relied on throughout CPython.

Code that does the following is common:

    exc = new_foo_exception();
    PyErr_SetObject(&PyFooException_Type, exc);

We could just document the current behavior, but the current behavior is strange.
What I suggest is this:

  • Create a new API function, PyErr_SetException(exc)` that takes a single exception object.
  • Document PyErr_SetObject() accurately
  • Deprecate the old function

This is an old bug going back to the 2 series.

Linked PRs

Also relevant:

@markshannon markshannon added type-bug An unexpected behavior, bug, or error 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 only security fixes 3.7 (EOL) end of life topic-C-API 3.12 bugs and security fixes labels Feb 5, 2023
@markshannon
Copy link
Member Author

The only case I can find of strange behavior leaking out is this:

import sys

def wrap_in_sys_exit(ex):
    try:
        try:
            raise ex(-1)
        except ex as err:
            sys.exit(err)
    except BaseException as err:
        return err

print(repr(wrap_in_sys_exit(ValueError)))
print(repr(wrap_in_sys_exit(SystemExit)))

Which prints

SystemExit(ValueError(-1))
SystemExit(-1)

instead of the expected

SystemExit(ValueError(-1))
SystemExit(SystemExit(-1))

@rhettinger
Copy link
Contributor

rhettinger commented Feb 5, 2023

This API goes by to 1997. It is remarkable how simple and clear the code used to be:

void
PyErr_Restore(PyObject *type, PyObject *value, PyObject *traceback)
{
	PyThreadState *tstate = PyThreadState_GET();
	PyObject *oldtype, *oldvalue, *oldtraceback;

	if (traceback != NULL && !PyTraceBack_Check(traceback)) {
		/* XXX Should never happen -- fatal error instead? */
		Py_DECREF(traceback);
		traceback = NULL;
	}

	/* Save these in locals to safeguard against recursive
	   invocation through Py_XDECREF */
	oldtype = tstate->curexc_type;
	oldvalue = tstate->curexc_value;
	oldtraceback = tstate->curexc_traceback;

	tstate->curexc_type = type;
	tstate->curexc_value = value;
	tstate->curexc_traceback = traceback;

	Py_XDECREF(oldtype);
	Py_XDECREF(oldvalue);
	Py_XDECREF(oldtraceback);
}

void
PyErr_SetObject(PyObject *exception, PyObject *value)
{
	Py_XINCREF(exception);
	Py_XINCREF(value);
	PyErr_Restore(exception, value, (PyObject *)NULL);
}

@markshannon
Copy link
Member Author

The code may have been simple, but it can leave the VM in a precarious state if the value is not a legal argument for calling the type.

markshannon added a commit that referenced this issue Feb 8, 2023
* Make sure that the current exception is always normalized.

* Remove redundant type and traceback fields for the current exception.

* Add new API functions: PyErr_GetRaisedException, PyErr_SetRaisedException

* Add new API functions: PyException_GetArgs, PyException_SetArgs
carljm added a commit to carljm/cpython that referenced this issue Feb 9, 2023
* main: (82 commits)
  pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723)
  pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736)
  no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731)
  pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730)
  pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679)
  pythongh-85984: Remove legacy Lib/pty.py code. (python#92365)
  pythongh-98831: Use opcode metadata for stack_effect() (python#101704)
  pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719)
  pythongh-101283: Fix use of unbound variable (pythonGH-101712)
  pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286)
  pythongh-101277: Port more itertools static types to heap types (python#101304)
  pythongh-98831: Modernize CALL and family (python#101508)
  pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697)
  pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329)
  pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672)
  pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324)
  pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615)
  pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934)
  pythonGH-101578: Normalize the current exception (pythonGH-101607)
  pythongh-47937: Note that Popen attributes are read-only (python#93070)
  ...
@erlend-aasland
Copy link
Contributor

The new C APIs PyErr_GetRaisedException and PyErr_SetRaisedException are IMO problematic:

  • the former returns a borrowed reference
  • the latter steals a reference

This goes against our recommendations for adding new public C APIs.

I also find the added documentation inadequate. It is not documented that the former function returns a borrowed reference. The documentation for the latter function is not exactly friendly written, and I'm specifically thinking of the last parenthesised sentence:

exc must be a valid exception. (Violating this rules will cause subtle problems later.) This call consumes a reference to the exc object: you must own a reference to that object before the call and after the call you no longer own that reference. (If you don’t understand this, don’t use this function. I warned you.)

Suggesting to considering reopening this issue until at least the docs have been improved. IMO, we should also fix the newly added C APIs to be consistent in their reference handling, which means:

  • PyErr_GetRaisedException should return a new reference
  • PyErr_SetRaisedException should not steal a reference

@markshannon
Copy link
Member Author

The remark "(If you don’t understand this, don’t use this function. I warned you.)" comes directly from the documentation for PyErr_Restore and PyErr_Fetch which this API is meant to replace, as does the reference count behavior.

PyErr_GetRaisedException should return a new reference

How does that differ from the current behavior?

@encukou
Copy link
Member

encukou commented Feb 14, 2023

AFAICS, PyErr_GetRaisedException doesn't return a borrowed reference. The caller owns the return value.
It's just PyErr_SetRaisedException stealing its argument.

@erlend-aasland
Copy link
Contributor

AFAICS, PyErr_GetRaisedException doesn't return a borrowed reference. The caller owns the return value.
It's just PyErr_SetRaisedException stealing its argument.

Huh? In that case, I'm reading this completely wrong:

cpython/Python/errors.c

Lines 448 to 460 in 81e3aa8

PyObject *
_PyErr_GetRaisedException(PyThreadState *tstate) {
PyObject *exc = tstate->current_exception;
tstate->current_exception = NULL;
return exc;
}
PyObject *
PyErr_GetRaisedException(void)
{
PyThreadState *tstate = _PyThreadState_GET();
return _PyErr_GetRaisedException(tstate);
}

Anyway, I still think it smells to introduce a new API that steals its argument. There should be no performance hit in making the caller instead of the callee having to decref that argument.

@erlend-aasland
Copy link
Contributor

The remark "(If you don’t understand this, don’t use this function. I warned you.)" comes directly from the documentation for PyErr_Restore and PyErr_Fetch which this API is meant to replace, as does the reference count behavior.

If we introduce a new API, why copy the IMO inadequate docs and the smelly ref count behaviour? It's a new API; we can choose to follow the convention; we can choose to document it properly.

@markshannon
Copy link
Member Author

I'm not saying we shouldn't document it better, just don't blame me for the docs 🙂

@erlend-aasland
Copy link
Contributor

I'm not saying we shouldn't document it better, just don't blame me for the docs 🙂

Sorry, if it came out that way; that was not my intent! Let's just amend the docs; I can propose a patch 🙂

(Regarding the reference counting, I'd still like to see the public APIs following the established conventions.)

@erlend-aasland
Copy link
Contributor

(Re-opening till we've patched the docs)

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Feb 16, 2023
erlend-aasland added a commit that referenced this issue Feb 19, 2023
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Feb 19, 2023
@gvanrossum
Copy link
Member

So if the plan is to deprecate PyErr_SetObject, why change its semantics? The change in semantics may still trip folks up, which is not so nice -- they may need three versions of their calling code, one for the old PyErr_SetObject semantics (3.11 and before), one for the 3.12 semantics, and one using the new API. (PyErr_SetObject is in the stable ABI so they may want to use that in 3.12 in some configurations.)

@markshannon
Copy link
Member Author

markshannon commented Mar 13, 2023

There is no plan to change the semantics of PyErr_SetObject(). We plan to deprecate it.

We do seem to have to changed its semantics accidentally, as described in #102594, but the semantics in that case made no sense and was setting up a trap for whatever code tried to normalize the exception.

@gvanrossum
Copy link
Member

From offline conversation with Irit it looks like it's better to change the behavior of PyErr_SetObject (but in an acceptable way) without deprecating it. @markshannon We should discuss this at the Wednesday meeting.

@markshannon
Copy link
Member Author

Why not deprecate it?

@Mekk
Copy link

Mekk commented Dec 15, 2023

I am just trying to port from PyErr_Fetch. And I would really suggest improving docs of new api

  1. PyErr_GetRaisedException description lacks info about return value in case there is no current exception. I suspect it is to be either NULL, or None, but will have to test or read sources.

  2. There is no info what should I do to get error value. Remark about deprecation of PyErr_Fetch should mention that („Deprecated since version 3.12: Use PyErr_GetRaisedException() instead. To get error value use PyException_IDontKnowWhat(err), to get error traceback use PyException_GetTraceback(err).”). I am going to try with PyErr_GetArgs, but I am unsure whether this is equivalent.

As a general remark, it isn't first such case when deprecation doesn't bring sufficient docs (and sometimes doesn't even bring sufficient new API). Please, don't deprecate without considering whether users of old code get sufficient info of how to port it. When you liquidate function which returned 3 values, describe how to get those 3 values.

PS Update. One more doubt which would be worth docs (less related to deprecation as such):

  1. Docs lack info whether PyErr_SetRaisedException overwrites possible present exception, or not, or merges them somehow, or … what. I mean the following scenario: I call PyErr_GetRaisedException, then I do something what possibly causes another exception (for example out of range or failed conversion when I read args, or unicode error raised by str), then I restore original error with PyErr_SetRaisedException . What is the state?

  2. Does PyException_SetArgs steal argument, or increfs it?

  3. (ok, ok, I already got my coredump and some SystemErrors, so I know that no) Is it legal to PyTuple_SetItem on tuple returned from PyException_GetArgs?

@iritkatriel
Copy link
Member

@Mekk Would you like to create a PR with improvements to the docs?

@Mekk
Copy link

Mekk commented Dec 19, 2023

I don't know all the replies yet (not to mention lack of formal/legal framework).

In general IMHO such info shouldn't be added as afterthought but should be treated as necessary part of change which deprecates APIs (nothing shows better whether deprecation is well thought out than detailed description how to change code which uses deprecated API).

@gvanrossum
Copy link
Member

It's open source. I understand that you feel this deprecation was insufficiently documented and possibly not thought through sufficiently. Okay. But please give us a break and don't lecture us like we are noobs. We're only human and we sometimes make mistakes. And you get to use the fruits of our efforts for free.

@Mekk
Copy link

Mekk commented Dec 21, 2023

I comment on issue, and maybe process¹, not on people.

Going back to technical aspects, there is also some subtle change with those old/deprecated APIs in 3.12, code like

PyObject* dic = PyDict_New();
PyRun_String("5 / 0", Py_file_input, dic, dic);  // trigger python exception
PyErr_Fetch(&errType, &errValue, errTb);
const char *msg = PyUnicode_AsUTF8(errValue);

stopped to work in 3.12 at least in some cases (I tested in context of ZeroDivisionError like above, UnicodeDecodeError triggered by invalid conversion and few more): msg is NULL and TypeError is raised by PyUnicode_AsUTf8.

This is caused by the fact that errValue isn't str anymore but exception object instead:

  • earlier (for example in 3.10.12) errType is ZeroDivisionError and errValue is string object "division by zero"
  • in 3.12.1 errType is ZeroDivisionError and errValue is object of ZeroDivisionError type

I don't claim the former is better or worth keeping or anything, but simply this is change which breaks some code.

¹ To give some other recent example, deprecation of PyEval_CallObjectWithKeywords was documented with brief „use PyObject_Call and its variants instead” in https://docs.python.org/3/whatsnew/3.9.html and at least one case (kwargs present, args null) doesn't seem to have replacement API.

@iritkatriel
Copy link
Member

This is mentioned in the release notes:

The interpreter’s error indicator is now always normalized. This means that [PyErr_SetObject()](https://docs.python.org/3/c-api/exceptions.html#c.PyErr_SetObject), [PyErr_SetString()](https://docs.python.org/3/c-api/exceptions.html#c.PyErr_SetString) and the other functions that set the error indicator now normalize the exception before storing it. (Contributed by Mark Shannon in [gh-101578](https://github.com/python/cpython/issues/101578).)

@iritkatriel
Copy link
Member

I comment on issue, and maybe process¹, not on people.

This seems to be doubling down on your communication style, so I don't know if you're open to feedback on it or not. But on the off chance that you are, your messages came across to me like "I don't know enough about cpython dev process to actually contribute a docs improvement PR, and I can't be asked to learn the processes so that I can contribute, but I can tell you how you should be doing your job (twice)".

That kind of attitude doesn't go down well in open source.

@iritkatriel
Copy link
Member

Regarding your earlier questions:

PyErr_GetRaisedException description lacks info about return value in case there is no current exception. I suspect it is to be either NULL, or None, but will have to test or read sources.

I will add that it returns NULL.

There is no info what should I do to get error value.

I don't understand this question. PyErr_GetRaisedException returns it, its doc says "Return the exception currently being raised". How could it be clearer?

Docs lack info whether PyErr_SetRaisedException overwrites possible present exception, or not, or merges them somehow, or … what.

The doc says Set exc as the exception currently being raised, clearing the existing exception if one is set. Is this ambiguous?

Does PyException_SetArgs steal argument, or increfs it?

We document when a function steals references, as that is the untypical case. We don't specify this for functions that do not steal references.

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Dec 21, 2023
iritkatriel added a commit that referenced this issue Dec 31, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 31, 2023
…NULL when the error indicator is not set (pythonGH-113369)

(cherry picked from commit 2849cbb)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
iritkatriel added a commit that referenced this issue Dec 31, 2023
… NULL when the error indicator is not set (GH-113369) (#113606)

gh-101578: [doc] mention that PyErr_GetRaisedException returns NULL when the error indicator is not set (GH-113369)
(cherry picked from commit 2849cbb)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
kulikjak pushed a commit to kulikjak/cpython that referenced this issue Jan 22, 2024
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes topic-C-API type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants