Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyImport_AddModuleRef is questionable #106915

Open
encukou opened this issue Jul 20, 2023 · 14 comments
Open

PyImport_AddModuleRef is questionable #106915

encukou opened this issue Jul 20, 2023 · 14 comments
Assignees

Comments

@encukou
Copy link
Member

encukou commented Jul 20, 2023

The function PyImport_AddModuleRef, which was recently added to the stable ABI, is questionable.

It returns an already loaded module, or a new empty module, but doesn't give any indication of which one of those it did. IMO, that means you can't really use it for anything except a module that's supposed to stay empty.

It is not clear to me if this function actually useful for anything except the __main__ module.
Maybe it would be more useful as a recipe in the docs that users could adapt to their needs, rather than API. Or it should be changed to make it easier to use safely.

Linked PRs

@vstinner
Copy link
Member

It is not clear to me if this function actually useful for anything except the main module

I looked for usage of PyImport_AddModule() and I was surprised by the results:
#105922 (comment)

It's used for other modules than __main__.

The function PyImport_AddModuleRef, which was recently added to the stable ABI, is questionable.

I'm not sure of the purpose of this issue. What do you propose?

@encukou
Copy link
Member Author

encukou commented Aug 21, 2023

OK, I've analyzed the usages of PyImport_AddModule you posted in #105922 (comment)

⚠ Creating a new module

The following are (as far as I can tell) meant to create a new module.
If the module already existed, it would be a bug -- usually some subsequent code would reload/reinitialize the module.
Presumably the code guards against that somehow, but the API could be much more helpful (at least signal that the module was already loaded).

CFFI (imports a module "from" literal C source using PyRun_String):

cffi-1.15.1.tar.gz: cffi-1.15.1/cffi/_cffi_errors.h: m = PyImport_AddModule("_cffi_error_capture");

OpenCV (initializes submodules this way):

opencv-python-4.7.0.72.tar.gz: opencv-python-4.7.0.72/opencv/modules/python/src2/cv2.cpp: submodule = PyImport_AddModule(full_submodule_name.c_str());
opencv-python-4.7.0.72.tar.gz: opencv-python-4.7.0.72/opencv/modules/python/src2/cv2.cpp: /// Return `PyImport_AddModule` NULL with an exception set on failure.
opencv-python-headless-4.7.0.72.tar.gz: opencv-python-headless-4.7.0.72/opencv/modules/python/src2/cv2.cpp: submodule = PyImport_AddModule(full_submodule_name.c_str());
opencv-python-headless-4.7.0.72.tar.gz: opencv-python-headless-4.7.0.72/opencv/modules/python/src2/cv2.cpp: /// Return `PyImport_AddModule` NULL with an exception set on failure.
opencv-contrib-python-4.7.0.72.tar.gz: opencv-contrib-python-4.7.0.72/opencv/modules/python/src2/cv2.cpp: submodule = PyImport_AddModule(full_submodule_name.c_str());
opencv-contrib-python-4.7.0.72.tar.gz: opencv-contrib-python-4.7.0.72/opencv/modules/python/src2/cv2.cpp: /// Return `PyImport_AddModule` NULL with an exception set on failure.
opencv-contrib-python-headless-4.7.0.72.tar.gz: opencv-contrib-python-headless-4.7.0.72/opencv/modules/python/src2/cv2.cpp: submodule = PyImport_AddModule(full_submodule_name.c_str());
opencv-contrib-python-headless-4.7.0.72.tar.gz: opencv-contrib-python-headless-4.7.0.72/opencv/modules/python/src2/cv2.cpp: /// Return `PyImport_AddModule` NULL with an exception set on failure.

pybind11:

pybind11-2.10.4.tar.gz: pybind11-2.10.4/pybind11/include/pybind11/pybind11.h: handle submodule = PyImport_AddModule(full_name.c_str());
onnx-1.13.1.tar.gz: onnx-1.13.1/third_party/pybind11/include/pybind11/pybind11.h: auto result = reinterpret_borrow<module_>(PyImport_AddModule(full_name.c_str()));
dlib-19.24.1.tar.gz: dlib-19.24.1/dlib/external/pybind11/include/pybind11/pybind11.h: handle submodule = PyImport_AddModule(full_name.c_str());
osmium-3.6.0.tar.gz: osmium-3.6.0/contrib/pybind11/include/pybind11/pybind11.h: handle submodule = PyImport_AddModule(full_name.c_str());
onnxoptimizer-0.3.10.tar.gz: onnxoptimizer-0.3.10/third_party/onnx/third_party/pybind11/include/pybind11/pybind11.h: auto result = reinterpret_borrow<module_>(PyImport_AddModule(full_name.c_str()));
onnxsim-0.4.19.tar.gz: onnxsim-0.4.19/third_party/onnx-optimizer/third_party/onnx/third_party/pybind11/include/pybind11/pybind11.h: auto result = reinterpret_borrow<module_>(PyImport_AddModule(full_name.c_str()));
onnxsim-0.4.19.tar.gz: onnxsim-0.4.19/third_party/pybind11/include/pybind11/pybind11.h: auto result = reinterpret_borrow<module_>(PyImport_AddModule(full_name.c_str()));
mercurial-6.4.1.tar.gz: mercurial-6.4.1/contrib/fuzz/pyutil.cc: mainmod = PyImport_AddModule("__main__");

uwsgi:

uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/python_plugin.c: new_uwsgi_module = PyImport_AddModule("uwsgi");
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/python_plugin.c: py_file_module = PyImport_AddModule(name);
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/symimporter.c: PyObject *mod = PyImport_AddModule(fullname);
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/symimporter.c: PyObject *mod = PyImport_AddModule(fullname);
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/symimporter.c: PyObject *mod = PyImport_AddModule(fullname);
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/symimporter.c: PyObject *mod = PyImport_AddModule(fullname);

others:

pythonnet-3.0.1.tar.gz: pythonnet-3.0.1/src/runtime/PythonEngine.cs: var module = Runtime.PyImport_AddModule(name);

⚠ Importing the standard library

The module is assumed to already exist and be loaded.
Most of these cases should use PyImport_ImportModule instead.

matplotlib-3.7.1.tar.gz: matplotlib-3.7.1/src/_macosx.m: if (!(builtins = PyImport_AddModule("builtins"))  // borrowed.
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_context.cpp: JPPyObject import = JPPyObject::use(PyImport_AddModule("importlib.util"));
uwsgi-2.0.21.tar.gz: uwsgi-2.0.21/plugins/python/python_plugin.c: if (!PyImport_AddModule("dummy_threading"))

⚠ Importing existing module

Here, the module is assumed to already exist and be loaded.
These should also use PyImport_ImportModule instead.

jpype.protocol is a .py file:

JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/include/jp_primitive_accessor.h: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_booleantype.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_booleantype.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_classhints.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_classhints.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_classhints.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_classhints.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_classhints.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");
JPype1-1.4.1.tar.gz: JPype1-1.4.1/native/common/jp_functional.cpp: PyObject *typing = PyImport_AddModule("jpype.protocol");

Projects importing themselves:

PyMuPDF-1.21.1.tar.gz: PyMuPDF-1.21.1/fitz/helper-other.i: PyObject *this_module = PyImport_AddModule("fitz");  // get our module
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/casadi.i: PyObject* module = PyImport_AddModule("casadi");

✔ Namespace

These create a module that's initially empty, and is filled with attributes.

This is a valid use case, if somewhat niche.
Such a private namespace seems useful for code generators.

Cython:

Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Compiler/ModuleNode.py: '%s = PyImport_AddModule(__Pyx_BUILTIN_MODULE_NAME); %s' % (`
Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Compiler/ModuleNode.py: '%s = PyImport_AddModule((char *) "cython_runtime"); %s' % (
Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Compiler/ModuleNode.py: '%s = PyImport_AddModule("%s"); %s' % (
Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Includes/cpython/module.pxd: PyObject* PyImport_AddModule(const char *name) except NULL
Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Utility/CommonStructures.c: fake_module = PyImport_AddModule((char*) "_cython_" CYTHON_ABI);
Cython-0.29.34.tar.gz: Cython-0.29.34/Cython/Utility/ModuleSetupCode.c: abi_module = PyImport_AddModule(__Pyx_FastGIL_ABI_module);

SWIG runtime data:

sentencepiece-0.1.98.tar.gz: sentencepiece-0.1.98/src/sentencepiece/sentencepiece_wrap.cxx: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
python-crfsuite-0.9.9.tar.gz: python-crfsuite-0.9.9/crfsuite/swig/python/export_wrap.cpp: PyObject *module = PyImport_AddModule((char*)"swig_runtime_data" SWIG_RUNTIME_VERSION);
cvxpy-1.3.1.tar.gz: cvxpy-1.3.1/cvxpy/cvxcore/python/cvxcore_wrap.cxx: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
mecab-python3-1.0.6.tar.gz: mecab-python3-1.0.6/src/MeCab/MeCab_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
M2Crypto-0.38.0.tar.gz: M2Crypto-0.38.0/src/SWIG/_m2crypto_wrap.c: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygraphviz-1.10.zip: pygraphviz-1.10/pygraphviz/graphviz_wrap.c: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/gdal_array_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/gdal_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/gdalconst_wrap.c: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/gnm_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/ogr_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
GDAL-3.6.3.tar.gz: GDAL-3.6.3/extensions/osr_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygdal-3.5.1.11.tar.gz: pygdal-3.5.1.11/extensions/gdal_array_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygdal-3.5.1.11.tar.gz: pygdal-3.5.1.11/extensions/gdal_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygdal-3.5.1.11.tar.gz: pygdal-3.5.1.11/extensions/gdalconst_wrap.c: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygdal-3.5.1.11.tar.gz: pygdal-3.5.1.11/extensions/ogr_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
pygdal-3.5.1.11.tar.gz: pygdal-3.5.1.11/extensions/osr_wrap.cpp: PyObject *module = PyImport_AddModule("swig_runtime_data" SWIG_RUNTIME_VERSION);
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target/source/casadiPYTHON_wrap.cxx: PyObject *module = PyImport_AddModule((char*)"swig_runtime_data" SWIG_RUNTIME_VERSION);
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target3/source/casadiPYTHON_wrap.cxx: PyObject *module = PyImport_AddModule((char*)"swig_runtime_data" SWIG_RUNTIME_VERSION);

✔ Importing __main__

Another legit but niche use case. The __main__ module is initially empty, so PyImport_AddModule does the right thing.

pyinstaller-5.10.0.tar.gz: pyinstaller-5.10.0/bootloader/src/pyi_launch.c: __main__ = PI_PyImport_AddModule("__main__");
pyarmor-8.1.3.zip: pyarmor-8.1.3/src/helper/buildext.py: m = PyImport_AddModule("__main__");

□ Not relevant.

Comments:

cloudpickle-2.2.1.tar.gz: cloudpickle-2.2.1/tests/cloudpickle_testpkg/_cloudpickle_testpkg/mod.py: # it will use the Python/C API ``PyImport_AddModule(submodule_qualified_name)``
cloudpickle-2.2.1.tar.gz: cloudpickle-2.2.1/tests/cloudpickle_testpkg/build/lib/_cloudpickle_testpkg/mod.py: # it will use the Python/C API ``PyImport_AddModule(submodule_qualified_name)``
osmium-3.6.0.tar.gz: osmium-3.6.0/contrib/pybind11/tests/test_modules.py: # Meant to trigger PyImport_AddModule() failure:

Re-exports & definitions:

orjson-3.8.10.tar.gz: orjson-3.8.10/local_dependencies/pyo3-ffi/src/import.rs: pub fn PyImport_AddModuleObject(name: *mut PyObject) -> *mut PyObject;
orjson-3.8.10.tar.gz: orjson-3.8.10/local_dependencies/pyo3-ffi/src/import.rs: #[cfg_attr(PyPy, link_name = "PyPyImport_AddModule")]
orjson-3.8.10.tar.gz: orjson-3.8.10/local_dependencies/pyo3-ffi/src/import.rs: pub fn PyImport_AddModule(name: *const c_char) -> *mut PyObject;
orjson-3.8.10.tar.gz: orjson-3.8.10/include/pyo3/pyo3-ffi/src/import.rs: pub fn PyImport_AddModuleObject(name: *mut PyObject) -> *mut PyObject;
orjson-3.8.10.tar.gz: orjson-3.8.10/include/pyo3/pyo3-ffi/src/import.rs: #[cfg_attr(PyPy, link_name = "PyPyImport_AddModule")]
orjson-3.8.10.tar.gz: orjson-3.8.10/include/pyo3/pyo3-ffi/src/import.rs: pub fn PyImport_AddModule(name: *const c_char) -> *mut PyObject;

pyinstaller-5.10.0.tar.gz: pyinstaller-5.10.0/bootloader/src/pyi_python.c: DECLPROC(PyImport_AddModule);
pyinstaller-5.10.0.tar.gz: pyinstaller-5.10.0/bootloader/src/pyi_python.c: GETPROC(dll, PyImport_AddModule);
pyinstaller-5.10.0.tar.gz: pyinstaller-5.10.0/bootloader/src/pyi_python.h: EXTDECLPROC(PyObject *, PyImport_AddModule, (char *));

pythonnet-3.0.1.tar.gz: pythonnet-3.0.1/src/runtime/Runtime.Delegates.cs: PyImport_AddModule = (delegate* unmanaged[Cdecl]<StrPtr, BorrowedReference>)GetFunctionByName(nameof(PyImport_AddModule), GetUnmanagedDll(_PythonDll));
pythonnet-3.0.1.tar.gz: pythonnet-3.0.1/src/runtime/Runtime.Delegates.cs: internal static delegate* unmanaged[Cdecl]<StrPtr, BorrowedReference> PyImport_AddModule { get; }
pythonnet-3.0.1.tar.gz: pythonnet-3.0.1/src/runtime/Runtime.cs: internal static BorrowedReference PyImport_AddModule(string name)
pythonnet-3.0.1.tar.gz: pythonnet-3.0.1/src/runtime/Runtime.cs: return Delegates.PyImport_AddModule(namePtr);

Generated code:

casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target/source/casadiPYTHON_wrap.cxx: PyObject* module = PyImport_AddModule(SWIG_module_name);
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target/source/casadiPYTHON_wrap.cxx: PyObject* module = PyImport_AddModule("casadi");
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target3/source/casadiPYTHON_wrap.cxx: PyObject* module = PyImport_AddModule(SWIG_module_name);
casadi-3.6.0.tar.gz: casadi-3.6.0/swig/python/target3/source/casadiPYTHON_wrap.cxx: PyObject* module = PyImport_AddModule("casadi");

@encukou
Copy link
Member Author

encukou commented Aug 21, 2023

Based on this, I believe it would be better to add the following,
instead of PyImport_AddModuleRef:

int PyImport_ImportOrAddModule(char* name, PyObject **result);
  • Return -1 on error
  • If the module was already loaded, store a new reference to it in *result and return 1.
  • If not, store a reference to a newly added module in *result and return 0.

@vstinner would that make sense to you?

Usage examples:

Creating a new module

PyObject *mod;
int result = PyImport_ImportOrAddModule("__main__", &mod);
if (result < 0) {
    goto error;
}
if (result == 0) {
    ... initialize `mod` ...
    // (on error, the module should be removed from sys.modules!)
}

Importing __main__ or another “namespace” module

PyObject *main_mod;
int result = PyImport_ImportOrAddModule("__main__", &main_mod);
if (result < 0) {
    goto error;
}

Importing stdlib or a module that should be already loaded

Use PyImport_ImportModule instead.

@encukou
Copy link
Member Author

encukou commented Sep 5, 2023

@vstinner would that make sense to you?

@vstinner
Copy link
Member

vstinner commented Sep 5, 2023

Yes, your proposed API makes sense, as soon as the caller can decide to treat 0 and 1 the same way, the migration should be smooth.

Would it be possible to keep PyImport_AddModuleRef() name, to have a smoother transition (the new name would be easier to discover)? I don't have a strong opinion on names, but your proposed name sounds too long and I'm surprised to have "Or" in its name. If you are fine with keeping the name, I can write a PR to maje the API change.

@erlend-aasland
Copy link
Contributor

@encukou, wouldn't it make for cleaner APIs if we provide two separate APIs for each use case (1. create module, 2. import __main__ or namespace module)? Making one API do two things is seldom a good idea IMO.

@vstinner
Copy link
Member

vstinner commented Sep 6, 2023

My worry is the adoption of a new API if it's too different to the API that it replaces, or too complicated to use.

PyImport_AddModule() is weird, but it has its user base. For me, the most surprising is to see that it's used to get a module already loaded!

Maybe we should put a different name and fail if the module already exists. I don't know.

@encukou
Copy link
Member Author

encukou commented Sep 7, 2023

Would it be possible to keep PyImport_AddModuleRef() name, to have a smoother transition (the new name would be easier to discover)?

I'd prefer not doing that. Few projects use it correctly, so if we're already asking the maintainers to update their code (add DECREF), I think it's reasonable to switch to the slightly more complex alternative.
Could the deprecation warning mention the intended replacement, to make it more discoverable?

@encukou, wouldn't it make for cleaner APIs if we provide two separate APIs for each use case (1. create module, 2. import main or namespace module)? Making one API do two things is seldom a good idea IMO.

For 1. (create new module, which needs to be initialized), you'd use the API I proposed. You need the 3-way return to tell whether the module needs initialization.
Once we have that API, 2. (__main__ or namespace module) becomes a special case of 1. -- you simply always skip the initialization.
For 2., PyImport_AddModuleRef actually does the right thing, and is a bit easier to use -- but IMO this case is not common enough to get a shortcut. (Here I'm mostly worried about the docs: explaining when should you use one or the other. Skipping the (result == 0) initialization branch if it doesn't apply to you is straightforward.)

your proposed name sounds too long and I'm surprised to have "Or" in its name.

Yeah, a better name describing the getdefault- style functionality would be nice.
IMO PyImport_ImportOrAddModule is more likely to make people read the docs, rather than assume behaviour from the name alone.

@encukou
Copy link
Member Author

encukou commented Oct 23, 2023

@vstinner do you still insist the function needs to be added to the stable ABI?

vstinner added a commit to vstinner/cpython that referenced this issue Oct 31, 2023
Remove PyImport_AddModuleRef() function.
@vstinner
Copy link
Member

vstinner commented Oct 31, 2023

I dislike PyImport_ImportOrAddModule() name, it's just too long :-)

I proposed PR gh-111559 to add int PyImport_CreateModule(name, &module) function and remove module = PyImport_AddModuleRef(name) function.

@encukou
Copy link
Member Author

encukou commented Nov 2, 2023

Thank you!

I dislike PyImport_ImportOrAddModule() name, it's just too long :-)

It's long but precise. IMO, PyImport_CreateModule is confusing: it doesn't always create the module -- sometimes it imports the existing one.

What about PyImport_ImportOrCreate?

@vstinner
Copy link
Member

vstinner commented Nov 2, 2023

It's long but precise. IMO, PyImport_CreateModule is confusing: it doesn't always create the module -- sometimes it imports the existing one.

It's now harder to miss the fact that the module can be not created if it already exists with the new API, since the information is in the return value:

int PyImport_CreateModule(const char *name, PyObject **module)

@encukou
Copy link
Member Author

encukou commented Nov 2, 2023

Well, I guess this should be discussed in the c-api working group, once that's formed.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 7, 2023
Remove PyImport_AddModuleRef() function.
@vstinner
Copy link
Member

vstinner commented Nov 7, 2023

Since no one proposed better name, I modified my PR to rename the function to PyImport_ImportOrCreate(). I also updated the doc to explain that first the function tries to get an already imported module, and only create a new module if it doesn't exist (to better stick to the new function name).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants