-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for embedding the interpreter #774
Conversation
I edited the original post to add some more information. There's quite a lot of info to unpack with this PR so I may have forgotten to mention some bits. Just let me know if something looks off. |
@skebanga I found your blog post very helpful in identifying some pain points in pybind11's embedding support. I've tried to address them here (some issues were already resolved earlier, e.g. there's no more need for |
CMakeLists.txt
Outdated
@@ -44,6 +44,7 @@ set(PYBIND11_HEADERS | |||
include/pybind11/descr.h | |||
include/pybind11/options.h | |||
include/pybind11/eigen.h | |||
include/pybind11/embedded.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very superficial: I would prefer a verb or a noun for the header file name (e.g. embed.h
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Actually, looking at the whole thing again, perhaps interpreter.h
would be most appropriate. Some of the smaller functions currently in there could be moved into the main headers since they can be used even without the interpreter: py::globals()
can be move into pytypes.h
and py::main()
could become a static function of py::module
, i.e. py::module::main()
which would actually be clearer. Then the only thing left are the interpreter controls, thus interpreter.h
. Similarly, the CMake target could be pybind11::interpreter
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like embed.h
slightly more, given that it's a front-facing user feature, and that, if I'm a user wanting to "embed the interpreter" I think "embed" is the more descriptive word.
include/pybind11/embedded.h
Outdated
}); | ||
} | ||
\endrst */ | ||
#define PYBIND11_ADD_EMBEDDED_MODULE(name) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be called PYBIND11_EMBEDDED_PLUGIN
? (the module<->plugin naming choice is a bit unfortunate in retrospect, but it might be better to stay consistent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the naming bothers you, we could add
#define PYBIND11_EMBEDDED_MODULE(name) PYBIND11_EMBEDDED_PLUGIN(name)
#define PYBIND11_MODULE(name) PYBIND11_PLUGIN(name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment below discusses the rationale for name and interface change.
tests/test_cmake_build/embedded.cpp
Outdated
namespace py = pybind11; | ||
|
||
PYBIND11_ADD_EMBEDDED_MODULE(test_cmake_build)(py::module &m) { | ||
m.def("add", [](int i, int j) { return i + j; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we keep the same interface?
PYBIND11_EMBEDDED_PLUGIN(test_cmake_build) {
py::module m("test_cmake_build");
m.def("add", [](int i, int j) { return i + j; });
return m;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the things I should have discussed in more detail in the original post. The interface change here is prompted by the need to support the new multi-phase initialization (PEP 489). I haven't actually implemented it yet (it's on the TODO list for this PR) but it requires this interface change. PEP 489 splits up the creation and initialization of modules into separate phases. This makes it possible to support subinterpreters and interpreter reloading. To enable this, it's expected that users only handle module initialization, but not creation.
I think multi-phase initialization support is important enough to warrant the new interface -- at least for embedded modules (although it's certainly possible to have the same interface with extension modules). Note that PEP 489 is only available for Python >= 3.5, but it should be possible backport some of the benefits to older versions of Python by doing a similar creation/initialization split within the macro.
The name change to MODULE
was mostly done because of the interface change, to avoid having similarly named macros with completely different interfaces. Likewise, the ADD
in the macro name is intentionally there to suggest that multiple embedded modules can be added to an executable -- as opposed to the single extension module.
The interface inconsistency with PYBIND11_PLUGIN
is pretty bad and I would normally err on the side of consistency, but I feel that benefits of the new initialization scheme make it worth adopting, not just because of the embedded modules but also as a way of future-proofing if the old module initialization scheme becomes deprecated within Python. A possible way to restore consistency would be to introduce a similar macro for extension modules:
PYBIND11_MODULE(name)(py::module &m) { ... } // extension -- only one possible
PYBIND11_ADD_EMBEDDED_MODULE(name)(py::module &m) { ... } // embedded -- any number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One positive change of this: it always seemed redundant to me to need to specify "test_cmake_build" (or whatever) twice: once in the macro, and once as a string in the py::module
constructor.
Could we get rid of the necessity of the arguments from the macro by throwing (py::module &m)
into the macro? While I appreciate it being explicit, I feel it's the sort of thing that is always going to be specified exactly as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, eliminating the name repetition is a nice bonus and it also removes a possible beginner's mistake of setting different names for the macro and module (which results in a not-so-obvious error).
As for the argument, a possible alternative is:
PYBIND11_ADD_EMBEDDED_MODULE(name, m) { ... }
But I personally prefer the explicitness of py::module &m
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the multi-phase initialization: this makes sense, and I agree that it would be desirable to support it (with the appropriate fallbacks for Python 2.7). I find the macro with double parentheses somewhat painful. How about the following?
PYBIND11_EMBEDDED_MODULE(name, py::module &m) { ... }
PYBIND11_MODULE(name, py::module &m) { ... }
Note also the removal of _ADD_
-- the fact that there can be multiple embedded modules is something that should be explained in the documentation, but I don't think we put the explanation into the macro names.
Hi @dean0x7d, regarding the embedding support: this looks fantastic, +1 for the RAII approach. I added a few minor comments regarding syntax. One thing that does bother me a bit is the addition of a framework like Catch/GTest. Maybe we can discuss this? My view is that we can test virtually all of pybind11 via pytest. For embedding it will be useful to have a few tests on the C++ side, but it does not really change the overall numbers. IMO all we need is a way of doing a few assertions tests in C++, so this seems somewhat overkill to me (plus, it adds a network dependency). Thoughts? Thanks, |
Re: Catch, I think if used, it should either be an embedded component (i.e. in the pybind repository somewhere--which I think should be redistributable with pybind given that it's licensed under a permissive boost license) or should be a set of tests that we enable when available, like we do with eigen, perhaps with a cmake message along the lines of "install catch to enable embedded interpreter testing". Then we can put the header download into the CI scripts (or for Debian by adding I'm definitely not a fan of cmake invoking a download, and am fairly sure downstream package redistributors are going to find that objectionable as well. Edit: I'm personally in favour of the latter (downloading in the CI scripts), given the two above options. |
Just a brief user feedback: I've been made aware of this PR in the gitter channel and use the embedded interpreter in a toy project: https://github.com/fuchsto/dnants/blob/master/gos/src/client.cc |
There are two things that can make use of a C++ testing framework: the interpreter-related APIs and the Python C++ API (types, functions, exceptions, etc.). The interpreter APIs pretty much must be tested within an executable. A formal C++ test framework is optional, but I think it would be worth having. The current tests within this PR are only part of what needs to be tested. Additional tests will be needed for multiple embedded modules, subinterpreters, restarting the interpreter, and some additional utility functions. This can definitely be done with just having a simple fatal error when something goes wrong, but I dislike the idea of losing detailed failure reports and introspection offered by test frameworks. It would make diagnosing issues more difficult. Without a framework, it's possible to create some assertions macros (instead of a fatal error) and logging functionality, but that's just going in the direction of reinventing a test framework. The other thing is the Python C++ API which doesn't need to be tested in an executable/C++ tests, but it would make things easier and reduce some of the testing boilerplate of going from C++ to Python just for an assertion. E.g. the following is a simple one-liner test in Catch: REQUIRE_THROWS_WITH(py::eval("nonsense code ..."), Catch::Contains("invalid syntax")); It would otherwise require creating dedicated functions on both the C++ and Python sides. This isn't a huge deal, but it would be nice to avoid some boilerplate when writing tests.
The default is to check if the headers are installed locally and only download if not found. I don't think any distributions will have an issue with that (CMake script download things all the time). I like the automated download because it makes it very convenient for first-time contributors to get started. Including the |
@fuchsto Thanks for the feedback. That's very good to hear. Regarding performance, note that you can also avoid auto update_state = module.attr("update_state");
return update_state(current, neighbor_grid(...)).cast<T>; It doesn't eliminate the Python function call overhead, but it does avoid the
Yeah, sorry about that. Expanding the docs is definitely on the TODO list for this PR, but it's been on pause for a while since I've had some trouble finding free time lately. But that should improve this week and I'm hoping to finish it up soon. |
I'm not opposed to the download functionality being in cmake, but I think it should be something the user explicitly asks for, sort of like we give the |
|
As suggested, the header A full new documentation page has been added for embedding. It should cover everything from getting started to a few notes for advanced features. As for subinterpreters and restarting, support and tests are included for all Python versions with a few caveats -- see the last two sections in the new docs page. Implementing PEP 489 would iron out the remaining issues, however, it would require a large rework of the
Implementing PEP 489 would entail breaking up Now, with PEP 489 out the window, the new module macro syntax isn't a prerequisite anymore. However, I would still advocate for the new syntax for future-proofing and dropping the module name duplication. |
Looking forward to this getting merged. I was totally impressed how easy it was to embed python in C++ thanks to pybind11. It'll be even easier with this PR merged. I'd be happy to update this example for the docs when this gets merged. Or feel free to grab and modify this practical example. #include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <pybind11/eigen.h>
namespace py = pybind11;
#include <Eigen/Dense>
int main(){
Py_Initialize();
// Import numpy and matplotlib
py::object np = py::module::import("numpy");
py::module plt = py::module::import("matplotlib.pyplot");
// Create a linearly-spaced vector in numpy
auto x1 = np.attr("linspace")(0,10,300).attr("tolist")(); // Vector converted to list, though not necessary
auto y1 = np.attr("sin")(x1);
plt.attr("plot")(x1, y1, "o-");
// Do the same exercise with Eigen VectorXd, except cos(x) instead of sin(x)
Eigen::VectorXd x2 = Eigen::VectorXd::LinSpaced(300, 0, 10);
// The Eigen::Ref passes a reference to an Eigen MatrixXd, which should avoid the copy
// but a temporary numpy array is created
auto y2 = np.attr("cos")(Eigen::Ref<Eigen::VectorXd>(x2));
plt.attr("plot")(x2, y2, "^-");
plt.attr("show")();
return 0;
} |
"but a temporary array is created" -> nope, no temporary data copy will be involved there. The intermediate object will just be a shell that points directly to the data stored in |
Thanks @jagerman - I seem to recall that the direct reference wasn't working, which is why I reverted to the Eigen::Ref. I'll update my code, but in this case, its only a question of convenience. |
There's nothing wrong with using an Eigen::Ref there--I was mainly pointing out that the comment in the Eigen part about needing a temporary isn't actually the case (no matter whether you use the Eigen::Ref or a pointer). (It's a different story if you were to do But this is all a bit of a diversion: I think the example itself is great. |
Having used |
Minor comment re: the new macro (+1 on it, I actually had the exact same thing I've been using myself for a while): PYBIND11_MODULE(example, py::module &m) Is PYBIND11_MODULE(example, m) // As a bonus, the minimal hello-world example becomes a tad shorter as it doesn't even have to reference the |
Yeah, @jagerman raised the same point. With the initial syntax ( |
Last round of changes:
I think that's pretty much done, as far as I see. Does anything else need to be addressed here? |
This looks pretty good to me. This is really great work! One minor thing (which needn't hold up merging this PR): How would you feel about adding a PYBIND11_PLUGIN(pybind11_tests) {
py::module m("pybind11_tests", "pybind testing plugin"); to: PYBIND11_MODULE(pybind11_tests, m) {
m.attr("__doc__") = "pybind11 test module"; which, though purely cosmetic, is a bit uglier, especially given that one typically doesn't set PYBIND11_MODULE(pybind11_tests, m) {
m.doc("pybind11 test module"); |
Good point (although |
It (along with everything else) would get it automatically if it went into |
Yea, that's exactly what I meant by "purely for consistency purposes" :) |
This looks great. If auto doc() { return attr("__doc__"); } i.e. PYBIND11_MODULE(pybind11_tests, m) {
m.doc() = "pybind11 test module";
//... contrived :)
my_class_b.doc() = my_class_a.doc();
} |
All targets provided by pybind11: * pybind11::module - the existing target for creating extension modules * pybind11::embed - new target for embedding the interpreter * pybind11::pybind11 - common "base" target (headers only)
At this point, there is only a single test for interpreter basics. Apart from embedding itself, having a C++ test framework will also benefit the C++-side features by allowing them to be tested directly.
Adding a nicer shortcut for Since that isn't actually related to embedding and there's still a bit of work, I'll spin out the whole |
Embedding support is merged now. For the module macro and |
This adds build system, testing and a basic C++ interface for embedding the Python interpreter with pybind11. The C++ interface covers just the basics and more convenience features could be added, but the aim of this PR is to get things rolling and have the build/test infrastructure in place so embedding features can easily be added and tested.
New CMake target:
pybind11::embed
This is pretty straightforward. There's a new target which can be used as:
Unlike the extension module support (
pybind11::module
target andpybind11_add_module
function), there's no function version here since it wouldn't offer any advantage.PyPy doesn't offer any embedding support with the CPython API, so the target tests are skipped there.
Testing with Catch
This PR adds the Catch framework for testing the interpreter in an executable. There are other options here (e.g. Google Test) but Catch feels like the pytest of C++ with minimal assertion macros and some nice failure introspection.
The build system will try to find a local
catch.hpp
file and if not found it will download it into the build directory (it's a single header). The tests can be executed usingmake cpptest
(to mirrormake pytest
) ormake check
which tests everything.Apart from embedding itself, having a C++ test framework will also benefit the C++-side features by allowing them to be tested directly. The current situation with the extension module and pytest is that all assertion need to made in Python. This makes testing C++ interface features awkward since the majority of the logic is on the C++ side.
C++ interface for the interpreter
This is pretty basic support for starting and stopping the interpreter with a scope guard and support for adding any number of C++ modules using the
PYBIND11_EMBEDDED_MODULE
macro.The macro has a slightly different signature compared to the extension module version -- this will be needed to support restarting the interpreter.
This isn't supported yet in this PR and it would only be possible with Python >= 3.5 which have an alternative mechanism of initializing binary modules (PEP 489). The macro added here can be made to work with it.