gh-85283: Add PyInterpreterState_IsMain() function #108577

Closed

@vstinner
Member

vstinner commented Aug 28, 2023

This function can be used to convert the syslog extension to the limited C API. PyInterpreterState_Main() is not available in the limited C API.


📚 Documentation preview 📚: https://cpython-previews--108577.org.readthedocs.build/

@vstinner
Member Author

cc @ericsnowcurrently

Maybe the concept of "main interpreter" should be elaborated on in the docs, since right now it's simply not documented or defined. I suppose that it's the only one which is allowed to register signal handlers?

Currently, the main interpreter has more features:

  • Can fork()
  • Has the "obmalloc" state (pymalloc)
  • Is responsible for initializing and cleaning up global Python objects and other "global Python states", such as static types, static built-ins, interned Unicode strings, the GIL state, etc.
  • Can use PyOS_Readline()
  • Can import extensions which don't support sub-interpreters
  • Can call _PyPathConfig_UpdateGlobal()
  • Initializes _PyRuntime

I'm not sure about exec() and threads, or about these options:

    if (config->use_main_obmalloc) {
        interp->feature_flags |= Py_RTFLAGS_USE_MAIN_OBMALLOC;
    }
    else if (!config->check_multi_interp_extensions) {
        /* The reason: PyModuleDef.m_base.m_copy leaks objects between
           interpreters. */
        return _PyStatus_ERR("per-interpreter obmalloc does not support "
                             "single-phase init extension modules");
    }

    if (config->allow_fork) {
        interp->feature_flags |= Py_RTFLAGS_FORK;
    }
    if (config->allow_exec) {
        interp->feature_flags |= Py_RTFLAGS_EXEC;
    }
    // Note that fork+exec is always allowed.

    if (config->allow_threads) {
        interp->feature_flags |= Py_RTFLAGS_THREADS;
    }
    if (config->allow_daemon_threads) {
        interp->feature_flags |= Py_RTFLAGS_DAEMON_THREADS;
    }

    if (config->check_multi_interp_extensions) {
        interp->feature_flags |= Py_RTFLAGS_MULTI_INTERP_EXTENSIONS;
    }
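
The excerpt above turns each boolean config option into one bit of a per-interpreter bitmask. A minimal, self-contained sketch of that pattern follows. The `DEMO_*` constants, the `demo_config` struct, and both functions are hypothetical stand-ins, not CPython names, and the real values of the `Py_RTFLAGS_*` constants are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the Py_RTFLAGS_* constants. */
enum {
    DEMO_FLAG_FORK           = 1 << 0,
    DEMO_FLAG_EXEC           = 1 << 1,
    DEMO_FLAG_THREADS        = 1 << 2,
    DEMO_FLAG_DAEMON_THREADS = 1 << 3
};

/* Hypothetical subset of the per-interpreter config used in the excerpt. */
typedef struct {
    bool allow_fork;
    bool allow_exec;
    bool allow_threads;
    bool allow_daemon_threads;
} demo_config;

/* Build the feature bitmask: one bit per enabled option. */
uint32_t demo_feature_flags(const demo_config *config)
{
    assert(config != NULL);
    uint32_t flags = 0;
    if (config->allow_fork) {
        flags |= DEMO_FLAG_FORK;
    }
    if (config->allow_exec) {
        flags |= DEMO_FLAG_EXEC;
    }
    if (config->allow_threads) {
        flags |= DEMO_FLAG_THREADS;
    }
    if (config->allow_daemon_threads) {
        flags |= DEMO_FLAG_DAEMON_THREADS;
    }
    return flags;
}

/* Return true if every bit in `feature` is set in `flags`. */
bool demo_has_feature(uint32_t flags, uint32_t feature)
{
    return (flags & feature) == feature;
}
```

With a mask like this, code can ask "may this interpreter fork?" by testing one bit instead of asking "is this the main interpreter?".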

The alternative is to expose PyInterpreterState_Main() which gives the main interpreter. I'm not sure which API is the best.
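
To make the two API shapes concrete, here is a small self-contained model. It does not use Python.h: `demo_interp` is a hypothetical stand-in for PyInterpreterState, and both function names and signatures are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PyInterpreterState. */
typedef struct {
    int id;
} demo_interp;

/* Hypothetical runtime with one main interpreter and one sub-interpreter. */
demo_interp demo_main_interp = { 0 };
demo_interp demo_sub_interp = { 1 };

/* Getter shape (like PyInterpreterState_Main()): hands out the main
   interpreter and leaves the comparison to the caller. */
demo_interp *demo_interp_main(void)
{
    return &demo_main_interp;
}

/* Predicate shape (like the function proposed here): answers the question
   without handing out the main interpreter itself. */
bool demo_interp_is_main(const demo_interp *interp)
{
    assert(interp != NULL);
    return interp == &demo_main_interp;
}
```

In this model the predicate is just the getter plus a pointer comparison, so the getter is the more general of the two, while the predicate exposes less.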

In Python 3.13, I also added Py_IsFinalizing(): https://docs.python.org/dev/c-api/init.html#c.Py_IsFinalizing

@ericsnowcurrently
Member

Maybe the concept of "main interpreter" should be elaborated on in the docs, since right now it's simply not documented or defined.

See https://docs.python.org/3.12/c-api/init.html#sub-interpreter-support.

The alternative is to expose PyInterpreterState_Main() which gives the main interpreter. I'm not sure which API is the best.

Having both would be fine.

@vstinner
Member Author

I would prefer to only add a single function to the limited C API.

@encukou
Member

encukou commented Aug 31, 2023

Long-term, I don't think relying on “main”-ness is the right way.
In syslog, I guess only one interpreter can open the log, but it doesn't have to be the main interpreter.

@vstinner
Member Author

Long-term, I don't think relying on “main”-ness is the right way.
In syslog, I guess only one interpreter can open the log, but it doesn't have to be the main interpreter.

For the specific case of the syslog module, please see issue gh-99127 and commit 8bb2303.

@encukou
Member

encukou commented Aug 31, 2023

Yes. The current code makes an assumption that the “app” (that is, top-level code that sets up logging) lives in the main interpreter.
That's a fairly reasonable assumption at the moment, when the focus is on getting multiple interpreters working. But I'd rather not encourage relying on this assumption in the long term.

@vstinner
Member Author

vstinner commented Sep 6, 2023

Yes. The current code makes an assumption that the “app” (that is, top-level code that sets up logging) lives in the main interpreter. That's a fairly reasonable assumption at the moment, when the focus is on getting multiple interpreters working. But I'd rather not encourage relying on this assumption in the long term.

In Python, the main interpreter is special. It has special skills, and the Python API behaves differently depending on whether the current interpreter is the main one. So IMO it makes sense to check whether the current code is running in the main interpreter.

I suggest leaving the syslog extension aside. It can make its own decisions and yes, it can change in the future :-)


Cython uses different logic to handle PEP 489 "Multi-phase extension module initialization". It uses a static PY_INT64_T main_interpreter_id = -1; variable, which is initialized on the first call to __Pyx_check_single_interpreter().

But the problem is unrelated to checking if we are running the main interpreter or not.

//#if CYTHON_PEP489_MULTI_PHASE_INIT
static CYTHON_SMALL_CODE int __Pyx_check_single_interpreter(void) {
    #if PY_VERSION_HEX >= 0x030700A1
    static PY_INT64_T main_interpreter_id = -1;
    PY_INT64_T current_id = PyInterpreterState_GetID(PyThreadState_Get()->interp);
    if (main_interpreter_id == -1) {
        main_interpreter_id = current_id;
        return (unlikely(current_id == -1)) ? -1 : 0;
    } else if (unlikely(main_interpreter_id != current_id))

    #else
    static PyInterpreterState *main_interpreter = NULL;
    PyInterpreterState *current_interpreter = PyThreadState_Get()->interp;
    if (!main_interpreter) {
        main_interpreter = current_interpreter;
    } else if (unlikely(main_interpreter != current_interpreter))
    #endif

    {
        PyErr_SetString(
            PyExc_ImportError,
            "Interpreter change detected - this module can only be loaded into one interpreter per process.");
        return -1;
    }
    return 0;
}

@vstinner
Member Author

vstinner commented Sep 6, 2023

Usage of the different existing APIs to check for the main interpreter in the PyPI top 5,000 projects.

Affected projects (9):

  • billiard (4.1.0)
  • catboost (1.2)
  • fastobo (0.12.2)
  • multiprocess (0.70.14)
  • numpy (1.25.0)
  • orjson (3.9.1)
  • psycopg2 (2.9.6)
  • psycopg2-binary (2.9.6)
  • tensorstore (0.1.40)

The PyO3 bindings have a comment about the lack of a limited C API for this:

pyo3-ffi/src/intrcheck.rs: // skipped non-limited _PyOS_IsMainThread

tensorstore also has a comment about _PyOS_IsMainThread():

/// We can't use `_PyOS_IsMainThread()` because that requires the GIL on some

All matches:

PYPI-2023-07-04/billiard-4.1.0.tar.gz: billiard-4.1.0/Modules/_billiard/win32_functions.c: if (!wait_flag && _PyOS_IsMainThread()) {
PYPI-2023-07-04/fastobo-0.12.2.tar.gz: fastobo-0.12.2/crates/pyo3-ffi/src/cpython/pystate.rs: pub fn PyInterpreterState_Main() -> *mut PyInterpreterState;
PYPI-2023-07-04/fastobo-0.12.2.tar.gz: fastobo-0.12.2/crates/pyo3-ffi/src/intrcheck.rs: // skipped non-limited _PyOS_IsMainThread
PYPI-2023-07-04/catboost-1.2.tar.gz: catboost-1.2/catboost_all_src/contrib/python/numpy/py3/numpy/core/src/multiarray/multiarraymodule.c: if (PyThreadState_Get()->interp != PyInterpreterState_Main()) {
PYPI-2023-07-04/numpy-1.25.0.tar.gz: numpy-1.25.0/numpy/core/src/multiarray/multiarraymodule.c: if (PyThreadState_Get()->interp != PyInterpreterState_Main()) {
PYPI-2023-07-04/orjson-3.9.1.tar.gz: orjson-3.9.1/include/cargo/pyo3-ffi-0.19.0/src/cpython/pystate.rs: pub fn PyInterpreterState_Main() -> *mut PyInterpreterState;
PYPI-2023-07-04/orjson-3.9.1.tar.gz: orjson-3.9.1/include/cargo/pyo3-ffi-0.19.0/src/intrcheck.rs: // skipped non-limited _PyOS_IsMainThread
PYPI-2023-07-04/psycopg2-2.9.6.tar.gz: psycopg2-2.9.6/psycopg/utils.c: return _PyInterpreterState_Get() == PyInterpreterState_Main();
PYPI-2023-07-04/psycopg2-binary-2.9.6.tar.gz: psycopg2-binary-2.9.6/psycopg/utils.c: return _PyInterpreterState_Get() == PyInterpreterState_Main();
PYPI-2023-07-04/multiprocess-0.70.14.tar.gz: multiprocess-0.70.14/py3.10/Modules/_multiprocess/semaphore.c: if (_PyOS_IsMainThread()) {
PYPI-2023-07-04/multiprocess-0.70.14.tar.gz: multiprocess-0.70.14/py3.11/Modules/_multiprocess/semaphore.c: if (_PyOS_IsMainThread()) {
PYPI-2023-07-04/multiprocess-0.70.14.tar.gz: multiprocess-0.70.14/py3.7/Modules/_multiprocess/semaphore.c: if (_PyOS_IsMainThread()) {
PYPI-2023-07-04/multiprocess-0.70.14.tar.gz: multiprocess-0.70.14/py3.8/Modules/_multiprocess/semaphore.c: if (_PyOS_IsMainThread()) {
PYPI-2023-07-04/multiprocess-0.70.14.tar.gz: multiprocess-0.70.14/py3.9/Modules/_multiprocess/semaphore.c: if (_PyOS_IsMainThread()) {
PYPI-2023-07-04/tensorstore-0.1.40.tar.gz: tensorstore-0.1.40/python/tensorstore/gil_safe.cc: /// We can't use `_PyOS_IsMainThread()` because that requires the GIL on some

@encukou
Member

encukou commented Sep 7, 2023

Cython uses different logic to handle PEP 489 "Multi-phase extension module initialization". It uses a static PY_INT64_T main_interpreter_id = -1; variable, which is initialized on the first call to __Pyx_check_single_interpreter().

But the problem is unrelated to checking if we are running the main interpreter or not.

IMO, this is along the lines of what syslog should do. It doesn't need the special skills of the main interpreter -- it only requires that different interpreters don't interfere with each other.
(Also, syslog.closelog can reset that int back to -1.)
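
A minimal sketch of that idea, assuming a single process-wide owner ID that the first caller claims and a closelog()-style call resets to -1. Everything named `demo_*` is hypothetical. A real extension would get the ID from PyInterpreterState_GetID(), as the Cython snippet does, and would need a lock (or the GIL) around the shared state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "no interpreter currently owns the resource". */
#define DEMO_NO_OWNER ((int64_t)-1)

/* Hypothetical process-wide state; not thread-safe as written. */
typedef struct {
    int64_t owner_id;
} demo_owner;

/* Try to claim ownership for interpreter `interp_id` (must be >= 0).
   Returns 0 if this interpreter now owns, or already owned, the resource,
   and -1 if a different interpreter holds it. */
int demo_owner_claim(demo_owner *state, int64_t interp_id)
{
    assert(state != NULL && interp_id >= 0);
    if (state->owner_id == DEMO_NO_OWNER) {
        state->owner_id = interp_id;
        return 0;
    }
    return (state->owner_id == interp_id) ? 0 : -1;
}

/* Release ownership, as a closelog()-style call might. Only the current
   owner may release. Returns 0 on success, -1 otherwise. */
int demo_owner_release(demo_owner *state, int64_t interp_id)
{
    assert(state != NULL && interp_id >= 0);
    if (state->owner_id != interp_id) {
        return -1;
    }
    state->owner_id = DEMO_NO_OWNER;
    return 0;
}
```

Unlike a main-interpreter check, nothing here depends on which interpreter claims first: once the owner releases, any other interpreter can take over.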

@vstinner
Member Author

IMO, this is along the lines of what syslog should do. It doesn't need the special skills of the main interpreter -- it only requires that different interpreters don't interfere with each other.

Do you want to propose an API to implement such a check? What would the API look like?

@vstinner
Member Author

vstinner commented Oct 3, 2023

Using the limited C API in stdlib C extensions is somewhat on hold, and it has become unclear to me whether this function is the right way to handle sub-interpreters: see issue gh-109857. So I prefer to close this PR for now.
