New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PyROOT exp] Introduce MakeNumpyDataFrame to read numpy arrays with RDF #3669
Conversation
Starting build on |
Starting build on |
@phsft-bot build with |
Starting build on |
Build failed on ROOT-ubuntu18.04-i386/cxx14. Failing tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR is great. It allows to close the circle between cpp and py. I mark this as "Request Changes" because I think it requires some more discussion, especially the factory function part. Other than that, well done @stwunsch !!
@@ -96,3 +97,7 @@ def pythonize_rdataframe(klass, name): | |||
klass.AsNumpy = RDataFrameAsNumpy | |||
|
|||
return True | |||
|
|||
# Add MakeNumpyDataFrame feature as free function to the ROOT module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't this be a regular binding to a C++ factory function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the input is a PyObject*
and we need to process it with ROOT.AsRVec
internally, we need a free function in the module.
} | ||
|
||
protected: | ||
std::string AsString() { return "RVec data source"; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a better name can be found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, gonna fix this!
// Note that we have to return the object on the heap so that the interpreter | ||
// does not clean it up during shutdown and causes a double delete. | ||
template <typename... ColumnTypes> | ||
RDataFrame* MakeNumpyDataFrame(PyObject* pyRVecs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one would be an unicum: we do not write C++ code which depends on Python, unless this is a pythonisation. Is there any way we can exploit to write a proper cpp function and then use pyroot bindings to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to call a Py_DECREF
in the destructor of the data-source, which dictates the lifetime of a dataframe graph. I do not see any other way to make the refcount properly than having this custom datasource. However, the compliation (or dependecy) is at runtime since the source is jitted.
9fabc91
to
bcefa3d
Compare
Starting build on |
1 similar comment
Starting build on |
Build failed on mac1014/cxx17. Errors:
|
ce55f9e
to
154c6d5
Compare
Starting build on |
154c6d5
to
520c0bc
Compare
Starting build on |
520c0bc
to
86a0563
Compare
Starting build on |
86a0563
to
edd5b9b
Compare
Starting build on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks @stwunsch !
Build failed on ROOT-fedora27/noimt. Failing tests: |
edd5b9b
to
23102dd
Compare
Starting build on |
Build failed on windows10/default. |
This PR supersedes #3424.
The reference counting is greatly improved and the data is kept alive until the datasource dies, which gets delete at the end of the lifetime of the computational graph.
See here for the use-case:
The feature plays well along with the
RDataFrame.AsNumpy
feature:TODO:
NumyDataSource
MakeNumpyDataFrame.hxx
)? We should put it in a scope.