Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for embedding the interpreter #774

Merged
merged 5 commits into from
May 28, 2017
Merged

Conversation

dean0x7d
Copy link
Member

@dean0x7d dean0x7d commented Apr 2, 2017

This adds build system, testing and a basic C++ interface for embedding the Python interpreter with pybind11. The C++ interface covers just the basics and more convenience features could be added, but the aim of this PR is to get things rolling and have the build/test infrastructure in place so embedding features can easily be added and tested.

New CMake target: pybind11::embed

This is pretty straightforward. There's a new target which can be used as:

add_executable(example example.cpp)
target_link_libraries(example PRIVATE pybind11::embed)

Unlike the extension module support (pybind11::module target and pybind11_add_module function), there's no function version here since it wouldn't offer any advantage.

PyPy doesn't offer any embedding support with the CPython API, so the target tests are skipped there.

Testing with Catch

This PR adds the Catch framework for testing the interpreter in an executable. There are other options here (e.g. Google Test) but Catch feels like the pytest of C++ with minimal assertion macros and some nice failure introspection.

The build system will try to find a local catch.hpp file and if not found it will download it into the build directory (it's a single header). The tests can be executed using make cpptest (to mirror make pytest) or make check which tests everything.

Apart from embedding itself, having a C++ test framework will also benefit the C++-side features by allowing them to be tested directly. The current situation with the extension module and pytest is that all assertion need to made in Python. This makes testing C++ interface features awkward since the majority of the logic is on the C++ side.

C++ interface for the interpreter

This is pretty basic support for starting and stopping the interpreter with a scope guard and support for adding any number of C++ modules using the PYBIND11_EMBEDDED_MODULE macro.

The macro has a slightly different signature compared to the extension module version -- this will be needed to support restarting the interpreter. This isn't supported yet in this PR and it would only be possible with Python >= 3.5 which have an alternative mechanism of initializing binary modules (PEP 489). The macro added here can be made to work with it.

@dean0x7d
Copy link
Member Author

dean0x7d commented Apr 2, 2017

I edited the original post to add some more information. There's quite a lot of info to unpack with this PR so I may have forgotten to mention some bits. Just let me know if something looks off.

@dean0x7d
Copy link
Member Author

dean0x7d commented Apr 2, 2017

@skebanga I found your blog post very helpful in identifying some pain points in pybind11's embedding support. I've tried to address them here (some issues were already resolved earlier, e.g. there's no more need for py::cast for dict assignments). test_interpreter.cpp and test_interpreter.py cover some of the same use cases as you blog post. If you found any other problematic areas, I'd love to hear about them.

CMakeLists.txt Outdated
@@ -44,6 +44,7 @@ set(PYBIND11_HEADERS
include/pybind11/descr.h
include/pybind11/options.h
include/pybind11/eigen.h
include/pybind11/embedded.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very superficial: I would prefer a verb or a noun for the header file name (e.g. embed.h)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Actually, looking at the whole thing again, perhaps interpreter.h would be most appropriate. Some of the smaller functions currently in there could be moved into the main headers since they can be used even without the interpreter: py::globals() can be move into pytypes.h and py::main() could become a static function of py::module, i.e. py::module::main() which would actually be clearer. Then the only thing left are the interpreter controls, thus interpreter.h. Similarly, the CMake target could be pybind11::interpreter. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually like embed.h slightly more, given that it's a front-facing user feature, and that, if I'm a user wanting to "embed the interpreter" I think "embed" is the more descriptive word.

});
}
\endrst */
#define PYBIND11_ADD_EMBEDDED_MODULE(name) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be called PYBIND11_EMBEDDED_PLUGIN? (the module<->plugin naming choice is a bit unfortunate in retrospect, but it might be better to stay consistent)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the naming bothers you, we could add

#define PYBIND11_EMBEDDED_MODULE(name) PYBIND11_EMBEDDED_PLUGIN(name)
#define PYBIND11_MODULE(name) PYBIND11_PLUGIN(name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment below discusses the rationale for name and interface change.

namespace py = pybind11;

PYBIND11_ADD_EMBEDDED_MODULE(test_cmake_build)(py::module &m) {
m.def("add", [](int i, int j) { return i + j; });
Copy link
Member

@wjakob wjakob Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we keep the same interface?

PYBIND11_EMBEDDED_PLUGIN(test_cmake_build) {
    py::module m("test_cmake_build");
    m.def("add", [](int i, int j) { return i + j; });
    return m;
}

Copy link
Member Author

@dean0x7d dean0x7d Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the things I should have discussed in more detail in the original post. The interface change here is prompted by the need to support the new multi-phase initialization (PEP 489). I haven't actually implemented it yet (it's on the TODO list for this PR) but it requires this interface change. PEP 489 splits up the creation and initialization of modules into separate phases. This makes it possible to support subinterpreters and interpreter reloading. To enable this, it's expected that users only handle module initialization, but not creation.

I think multi-phase initialization support is important enough to warrant the new interface -- at least for embedded modules (although it's certainly possible to have the same interface with extension modules). Note that PEP 489 is only available for Python >= 3.5, but it should be possible backport some of the benefits to older versions of Python by doing a similar creation/initialization split within the macro.

The name change to MODULE was mostly done because of the interface change, to avoid having similarly named macros with completely different interfaces. Likewise, the ADD in the macro name is intentionally there to suggest that multiple embedded modules can be added to an executable -- as opposed to the single extension module.

The interface inconsistency with PYBIND11_PLUGIN is pretty bad and I would normally err on the side of consistency, but I feel that benefits of the new initialization scheme make it worth adopting, not just because of the embedded modules but also as a way of future-proofing if the old module initialization scheme becomes deprecated within Python. A possible way to restore consistency would be to introduce a similar macro for extension modules:

PYBIND11_MODULE(name)(py::module &m) { ... } // extension -- only one possible
PYBIND11_ADD_EMBEDDED_MODULE(name)(py::module &m) { ... } // embedded -- any number

Copy link
Member

@jagerman jagerman Apr 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One positive change of this: it always seemed redundant to me to need to specify "test_cmake_build" (or whatever) twice: once in the macro, and once as a string in the py::module constructor.

Could we get rid of the necessity of the arguments from the macro by throwing (py::module &m) into the macro? While I appreciate it being explicit, I feel it's the sort of thing that is always going to be specified exactly as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, eliminating the name repetition is a nice bonus and it also removes a possible beginner's mistake of setting different names for the macro and module (which results in a not-so-obvious error).

As for the argument, a possible alternative is:

PYBIND11_ADD_EMBEDDED_MODULE(name, m) { ... }

But I personally prefer the explicitness of py::module &m.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the multi-phase initialization: this makes sense, and I agree that it would be desirable to support it (with the appropriate fallbacks for Python 2.7). I find the macro with double parentheses somewhat painful. How about the following?

PYBIND11_EMBEDDED_MODULE(name, py::module &m) { ... }
PYBIND11_MODULE(name, py::module &m) { ... }

Note also the removal of _ADD_ -- the fact that there can be multiple embedded modules is something that should be explained in the documentation, but I don't think we put the explanation into the macro names.

@wjakob
Copy link
Member

wjakob commented Apr 7, 2017

Hi @dean0x7d,

regarding the embedding support: this looks fantastic, +1 for the RAII approach. I added a few minor comments regarding syntax.

One thing that does bother me a bit is the addition of a framework like Catch/GTest. Maybe we can discuss this? My view is that we can test virtually all of pybind11 via pytest. For embedding it will be useful to have a few tests on the C++ side, but it does not really change the overall numbers. IMO all we need is a way of doing a few assertions tests in C++, so this seems somewhat overkill to me (plus, it adds a network dependency). Thoughts?

Thanks,
Wenzel

@jagerman
Copy link
Member

jagerman commented Apr 9, 2017

Re: Catch, I think if used, it should either be an embedded component (i.e. in the pybind repository somewhere--which I think should be redistributable with pybind given that it's licensed under a permissive boost license) or should be a set of tests that we enable when available, like we do with eigen, perhaps with a cmake message along the lines of "install catch to enable embedded interpreter testing". Then we can put the header download into the CI scripts (or for Debian by adding catch to the package dependencies) to enable the testing on the CI builds.

I'm definitely not a fan of cmake invoking a download, and am fairly sure downstream package redistributors are going to find that objectionable as well.

Edit: I'm personally in favour of the latter (downloading in the CI scripts), given the two above options.

@fuchsto
Copy link

fuchsto commented Apr 11, 2017

Just a brief user feedback: I've been made aware of this PR in the gitter channel and use the embedded interpreter in a toy project: https://github.com/fuchsto/dnants/blob/master/gos/src/client.cc
It's for students to play around with state machines, an ant colony simulation written in C++. The ants' behavior is implemented in Python as state transition function. It is called about (30fps x 50 ants x 3 colonies) = 4500 times per second and I didn't see any negative effect on FPS compared to native C++ code.
Briefly put, I am impressed.
However, performance dropped drastically when I used py::eval<py::eval_statements> instead of py::eval<py::eval_expr>. I don't know if this is to be expected, but it should be documented.
Same with the RAII-style py::scoped_interpreter guard. It's a good solution but it took me the better part of two hours for debugging because it's not documented and, if not used, just leads to a miraculous segfault.
But that's just nitpicking. The features implemented in this PR made pybind11 the tool I was looking for since good ol' 2007. Good stuff!

@dean0x7d
Copy link
Member Author

@wjakob said:
One thing that does bother me a bit is the addition of a framework like Catch/GTest. Maybe we can discuss this?

There are two things that can make use of a C++ testing framework: the interpreter-related APIs and the Python C++ API (types, functions, exceptions, etc.).

The interpreter APIs pretty much must be tested within an executable. A formal C++ test framework is optional, but I think it would be worth having. The current tests within this PR are only part of what needs to be tested. Additional tests will be needed for multiple embedded modules, subinterpreters, restarting the interpreter, and some additional utility functions. This can definitely be done with just having a simple fatal error when something goes wrong, but I dislike the idea of losing detailed failure reports and introspection offered by test frameworks. It would make diagnosing issues more difficult. Without a framework, it's possible to create some assertions macros (instead of a fatal error) and logging functionality, but that's just going in the direction of reinventing a test framework.

The other thing is the Python C++ API which doesn't need to be tested in an executable/C++ tests, but it would make things easier and reduce some of the testing boilerplate of going from C++ to Python just for an assertion. E.g. the following is a simple one-liner test in Catch:

REQUIRE_THROWS_WITH(py::eval("nonsense code ..."), Catch::Contains("invalid syntax"));

It would otherwise require creating dedicated functions on both the C++ and Python sides. This isn't a huge deal, but it would be nice to avoid some boilerplate when writing tests.

@jagerman
I'm definitely not a fan of cmake invoking a download, and am fairly sure downstream package redistributors are going to find that objectionable as well.

The default is to check if the headers are installed locally and only download if not found. I don't think any distributions will have an issue with that (CMake script download things all the time). I like the automated download because it makes it very convenient for first-time contributors to get started. Including the catch.hpp header is also a possibility, but I personally don't like it because that would make it a non-optional part of the repository, but it's only needed for development and not for simply using pybind11.

@dean0x7d
Copy link
Member Author

@fuchsto Thanks for the feedback. That's very good to hear.

Regarding performance, note that you can also avoid eval completely by importing the relevant function and calling it directly. Something like the following (based on your source file):

auto update_state = module.attr("update_state");
return update_state(current, neighbor_grid(...)).cast<T>;

It doesn't eliminate the Python function call overhead, but it does avoid the eval interpreter overhead.

Same with the RAII-style py::scoped_interpreter guard. It's a good solution but it took me the better part of two hours for debugging because it's not documented and, if not used, just leads to a miraculous segfault.

Yeah, sorry about that. Expanding the docs is definitely on the TODO list for this PR, but it's been on pause for a while since I've had some trouble finding free time lately. But that should improve this week and I'm hoping to finish it up soon.

@jagerman
Copy link
Member

I'm not opposed to the download functionality being in cmake, but I think it should be something the user explicitly asks for, sort of like we give the pip command for installing pytest rather than just invoking it. Something like: Catch not detected; install catch headers or use cmake -DDOWNLOAD_CATCH=1 to enable additional C++ tests.

@wjakob
Copy link
Member

wjakob commented Apr 12, 2017

-DCATCH=22 :)

@dean0x7d
Copy link
Member Author

dean0x7d commented Apr 24, 2017

As suggested, the header embedded.h has been renamed to embed.h and the CMake target pybind11::embedded -> pybind11::embed. PYBIND11_ADD_EMBEDDED_MODULE lost the _ADD_ and the extra parentheses. PYBIND11_MODULE is now available for extension modules with a matching syntax.

A full new documentation page has been added for embedding. It should cover everything from getting started to a few notes for advanced features.

As for subinterpreters and restarting, support and tests are included for all Python versions with a few caveats -- see the last two sections in the new docs page. Implementing PEP 489 would iron out the remaining issues, however, it would require a large rework of the internals structure but only benefit Python >= 3.5. Right now, I'm actually pretty happy with the all-Python-version solution which required only a small change to internals (changing the stored capsule from internals * to internals **) while the tradeoffs are not unreasonable:

  1. Restarting the interpreter is only possible with pybind11's functions and not with Py_Initialize / Py_Finalize (errors when re-initializing modules).

  2. Restarting the interpreter leaks a bit of memory because of the PyObjects in internals which may not be garbage collected. But this happens in Python in general even without pybind11. I did some quick tests by restarting the interpreter a few million times and the leak ends up being a few megabytes -- the presence of pybind11 makes minimal difference.

  3. Embedded modules are shared between the sub-interpreters instead of being fully independent.

Implementing PEP 489 would entail breaking up internals into module-specific chunks and attaching the data to the module state so that it can be ref-counted and cleaned up together with the module (instead of being global persistent data). But such a drastic change to internals doesn't seem worth it at this point.

Now, with PEP 489 out the window, the new module macro syntax isn't a prerequisite anymore. However, I would still advocate for the new syntax for future-proofing and dropping the module name duplication.

@dean0x7d dean0x7d changed the title WIP: Support for embedding the interpreter Support for embedding the interpreter Apr 24, 2017
@ibell
Copy link
Contributor

ibell commented Apr 29, 2017

Looking forward to this getting merged. I was totally impressed how easy it was to embed python in C++ thanks to pybind11. It'll be even easier with this PR merged. I'd be happy to update this example for the docs when this gets merged. Or feel free to grab and modify this practical example.

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <pybind11/stl.h>
#include <pybind11/eigen.h>
namespace py = pybind11;

#include <Eigen/Dense>

int main(){
    Py_Initialize();
    
    // Import numpy and matplotlib
    py::object np = py::module::import("numpy");
    py::module plt = py::module::import("matplotlib.pyplot");

    // Create a linearly-spaced vector in numpy
    auto x1 = np.attr("linspace")(0,10,300).attr("tolist")(); // Vector converted to list, though not necessary
    auto y1 = np.attr("sin")(x1);
    plt.attr("plot")(x1, y1, "o-");
    
    // Do the same exercise with Eigen VectorXd, except cos(x) instead of sin(x)
    Eigen::VectorXd x2 = Eigen::VectorXd::LinSpaced(300, 0, 10);
    // The Eigen::Ref passes a reference to an Eigen MatrixXd, which should avoid the copy
    // but a temporary numpy array is created
    auto y2 = np.attr("cos")(Eigen::Ref<Eigen::VectorXd>(x2));
    plt.attr("plot")(x2, y2, "^-");
    
    plt.attr("show")();

    return 0;
}

@jagerman
Copy link
Member

"but a temporary array is created" -> nope, no temporary data copy will be involved there. The intermediate object will just be a shell that points directly to the data stored in x2. (Technically, there is a temporary numpy object involved, but not one that owns its own data). You can even skip the Eigen::Ref entirely there, and just call np.attr("cos")(&x2); and you'll get the same (non-copying) behaviour.

@ibell
Copy link
Contributor

ibell commented Apr 29, 2017

Thanks @jagerman - I seem to recall that the direct reference wasn't working, which is why I reverted to the Eigen::Ref. I'll update my code, but in this case, its only a question of convenience.

@jagerman
Copy link
Member

There's nothing wrong with using an Eigen::Ref there--I was mainly pointing out that the comment in the Eigen part about needing a temporary isn't actually the case (no matter whether you use the Eigen::Ref or a pointer). (It's a different story if you were to do py::object o = py::cast(x) and then pass o into the function: you'd need to use py::return_value_policy::reference to avoid the copy there).

But this is all a bit of a diversion: I think the example itself is great.

@dean0x7d
Copy link
Member Author

dean0x7d commented May 8, 2017

Having used scoped_interpreter a bit more, it seems that making it reference-counted is overkill as well as more difficult to explain. The latest commit changes it to a move-only type. Independently creating a second scoped_interpreter now throws an exception (so does calling initialize_interpreter() after it's already running).

@aldanor
Copy link
Member

aldanor commented May 14, 2017

Minor comment re: the new macro (+1 on it, I actually had the exact same thing I've been using myself for a while):

PYBIND11_MODULE(example, py::module &m)

Is py::module& necessary here, would it ever be anything else really? Wonder if this could be simplified further to just this?

PYBIND11_MODULE(example, m)

// As a bonus, the minimal hello-world example becomes a tad shorter as it doesn't even have to reference the py:: namespace :)

@dean0x7d
Copy link
Member Author

@aldanor
Is py::module& necessary here, would it ever be anything else really? Wonder if this could be simplified further to just this?

Yeah, @jagerman raised the same point. With the initial syntax (PYBIND11_MODULE(example)(py::module &m)) I wanted to minimize the macro magic to just the name, with the second pair of parentheses being a regular function arg list with an explicit type. In hindsight, running away from macro magic only partially like that is probably useless. I'll change it to the simplified version PYBIND11_MODULE(example, m) and mention the type of m in the docs.

@dean0x7d
Copy link
Member Author

Last round of changes:

  • Simplified the PYBIND11_MODULE and PYBIND11_EMBEDDED_MODULE macros to omit py::module&.
  • Made Catch optional and off by default (but added cmake -DDOWNLOAD_CATCH=ON to both Travis and AppVeyor).
  • Limited Catch tests to just the interpreter and nothing else (removed the port of the eval tests from pytest).
  • Cleaned up git history for this PR.

I think that's pretty much done, as far as I see. Does anything else need to be addressed here?
Should the PYBIND11_PLUGIN -> PYBIND11_MODULE change perhaps be spun off into a separate PR?

@jagerman
Copy link
Member

This looks pretty good to me. This is really great work!

One minor thing (which needn't hold up merging this PR): How would you feel about adding a doc(const char *) method to py::module (or perhaps even to the object API)? One thing that isn't so nice in the PYBIND11_PLUGIN -> PYBIND11_MODULE macro is the docstring specification, i.e. changing:

PYBIND11_PLUGIN(pybind11_tests) {
    py::module m("pybind11_tests", "pybind testing plugin");

to:

PYBIND11_MODULE(pybind11_tests, m) {
    m.attr("__doc__") = "pybind11 test module";

which, though purely cosmetic, is a bit uglier, especially given that one typically doesn't set __doc__ directly in Python but rather uses a docstring. With a doc() method though, it would look quite a bit nicer:

PYBIND11_MODULE(pybind11_tests, m) {
    m.doc("pybind11 test module");

@aldanor
Copy link
Member

aldanor commented May 20, 2017

One minor thing (which needn't hold up merging this PR): How would you feel about adding a doc(const char *) method to py::module (or perhaps even to the object API)?

Good point (although m.attr("__doc__") is not that ugly, but I agree). Maybe class_ should also have the .doc() then, purely for consistency purposes?

@jagerman
Copy link
Member

It (along with everything else) would get it automatically if it went into object_api. But I think it's much less of an issue for class_ because that one is typically specified as a constructor argument.

@aldanor
Copy link
Member

aldanor commented May 20, 2017

I think it's much less of an issue for class_ because that one is typically specified as a constructor argument.

Yea, that's exactly what I meant by "purely for consistency purposes" :)

@wjakob
Copy link
Member

wjakob commented May 24, 2017

This looks great. If .attr("__doc__") is considered bothersome, I would propose something like the following object_api extension (untested), which would enable read and write access:

auto doc() { return attr("__doc__"); }

i.e.

PYBIND11_MODULE(pybind11_tests, m) {
    m.doc() = "pybind11 test module";
    //... contrived :)
    my_class_b.doc() = my_class_a.doc();
}

All targets provided by pybind11:

* pybind11::module - the existing target for creating extension modules
* pybind11::embed - new target for embedding the interpreter
* pybind11::pybind11 - common "base" target (headers only)
At this point, there is only a single test for interpreter basics.

Apart from embedding itself, having a C++ test framework will also
benefit the C++-side features by allowing them to be tested directly.
@dean0x7d
Copy link
Member Author

dean0x7d commented May 27, 2017

Adding a nicer shortcut for doc seems reasonable to me. Regarding PYBIND11_PLUGIN -> PYBIND11_MODULE, I forgot to rename a few occurrences in the documentation.

Since that isn't actually related to embedding and there's still a bit of work, I'll spin out the whole PYBIND11_PLUGIN -> PYBIND11_MODULE part into a new PR and I'll merge the embedding support after the CI tests are finished.

@dean0x7d dean0x7d merged commit 6d2411f into pybind:master May 28, 2017
@dean0x7d
Copy link
Member Author

Embedding support is merged now. For the module macro and __doc__ accessor, see #879.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants