[PyROOT exp] Introduce MakeNumpyDataFrame to read numpy arrays with RDF #3669

Merged
merged 1 commit into from Apr 17, 2019

Conversation

stwunsch
Contributor

@stwunsch stwunsch commented Apr 10, 2019

This PR supersedes #3424.

The reference counting is greatly improved: the data is kept alive until the data source is destroyed, which happens at the end of the lifetime of the computational graph.

See here for the use-case:

import ROOT
import numpy

data = {
    "x": numpy.array([1, 2, 3]),
    "y": numpy.array([4, 5, 6])
}

df = ROOT.ROOT.RDF.MakeNumpyDataFrame(data)
df = df.Define("z", "x + y")

print(df.Mean("z").GetValue())  # Prints 7.0

The feature works well together with RDataFrame.AsNumpy:

import ROOT

df = ROOT.ROOT.RDataFrame(10).Define("x", "(int)rdfentry_")
data = df.AsNumpy()

df2 = ROOT.ROOT.RDF.MakeNumpyDataFrame(data)
df2.Snapshot("tree", "file.root")

TODO:

  • Figure out how to install the header needed for the NumpyDataSource
  • What should the header be called (current name: MakeNumpyDataFrame.hxx)? We should put it in a scope.

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, ROOT-ubuntu18.04-i386/cxx14, mac1014/cxx17, windows10/default

@stwunsch
Contributor Author

@phsft-bot build with -Dpyroot_experimental=ON

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, ROOT-ubuntu18.04-i386/cxx14, mac1014/cxx17, windows10/default

Member

@dpiparo dpiparo left a comment

This PR is great. It closes the circle between C++ and Python. I am marking this as "Request Changes" because I think it requires some more discussion, especially the factory function part. Other than that, well done @stwunsch!

@@ -96,3 +97,7 @@ def pythonize_rdataframe(klass, name):
klass.AsNumpy = RDataFrameAsNumpy

return True

# Add MakeNumpyDataFrame feature as free function to the ROOT module
Member

Why can't this be a regular binding to a C++ factory function?

Contributor Author

Because the input is a PyObject* and we need to process it with ROOT.AsRVec internally, we need a free function in the module.
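
The pattern described in this reply can be sketched without ROOT. The snippet below is only an illustration under stated assumptions: `as_view` is a hypothetical stand-in for ROOT.AsRVec, `make_frame_from_views` is a hypothetical stand-in for a C++ factory that cannot accept arbitrary Python objects, and `fake_root` is a hypothetical stand-in for the ROOT module. It is not the code of this PR.

```python
import types
from array import array


def as_view(column):
    # Hypothetical stand-in for ROOT.AsRVec: a zero-copy view on the buffer.
    return memoryview(column)


def make_frame_from_views(views):
    # Hypothetical stand-in for a C++ factory that only understands views,
    # not arbitrary Python objects.
    return {name: view.tolist() for name, view in views.items()}


def make_numpy_dataframe(columns):
    # Free function: do the Python-side preprocessing, then call the factory.
    if not isinstance(columns, dict):
        raise TypeError("expected a dict mapping column names to arrays")
    views = {name: as_view(column) for name, column in columns.items()}
    return make_frame_from_views(views)


# Attach the free function to a module object (hypothetical stand-in module).
fake_root = types.ModuleType("fake_root")
fake_root.MakeNumpyDataFrame = make_numpy_dataframe

frame = fake_root.MakeNumpyDataFrame({
    "x": array("d", [1.0, 2.0, 3.0]),
    "y": array("d", [4.0, 5.0, 6.0]),
})
```

The point of the sketch is that the dictionary has to be unpacked and converted on the Python side before anything typed can be constructed, which is why a plain binding to a single C++ signature does not fit.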

}

protected:
std::string AsString() { return "RVec data source"; };
Member

Maybe a better name can be found?

Contributor Author

Thanks, gonna fix this!

// Note that we have to return the object on the heap so that the interpreter
// does not clean it up during shutdown and causes a double delete.
template <typename... ColumnTypes>
RDataFrame* MakeNumpyDataFrame(PyObject* pyRVecs,
Member

This one would be unique: we do not write C++ code that depends on Python unless it is a pythonization. Is there any way to write a proper C++ function and then use the PyROOT bindings to it?

Contributor Author

We need to call Py_DECREF in the destructor of the data source, which dictates the lifetime of a dataframe graph. I do not see any other way to handle the refcount properly than having this custom data source. However, the compilation (or dependency) happens only at runtime, since the source is jitted.
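
The lifetime rule in this reply can be illustrated in plain Python. This is a minimal sketch, assuming hypothetical `Column` and `DataSource` classes that stand in for a numpy array and the custom data source; holding a strong reference plays the role of the Py_INCREF, and destroying the source plays the role of the Py_DECREF in its destructor. It does not reproduce the C++ code of this PR.

```python
import gc
import weakref


class Column:
    """Hypothetical stand-in for a numpy array (supports weak references)."""

    def __init__(self, values):
        self.values = list(values)


class DataSource:
    """Holds strong references to the input columns for its whole lifetime."""

    def __init__(self, columns):
        # Keeping the objects in a dict is the Python analogue of Py_INCREF.
        self._columns = dict(columns)

    def mean(self, name):
        values = self._columns[name].values
        return sum(values) / len(values)


def make_source():
    # The caller's dictionary goes out of scope when this function returns,
    # so only the data source keeps the column alive afterwards.
    columns = {"x": Column([1, 2, 3])}
    probe = weakref.ref(columns["x"])
    return DataSource(columns), probe


source, probe = make_source()
alive_while_source_exists = probe() is not None
mean_x = source.mean("x")

# Destroying the source releases its references (the Py_DECREF analogue).
del source
gc.collect()
alive_after_source_deleted = probe() is not None
```

Here the weak reference acts as a probe: the column stays reachable for as long as the data source exists and is released once the source goes away, which mirrors tying the data to the lifetime of the computational graph.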

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, ROOT-ubuntu18.04-i386/cxx14, mac1014/cxx17, windows10/default

@phsft-bot
Collaborator

Build failed on mac1014/cxx17.
See console output.

Errors:

  • CMake Error at /build/jenkins/workspace/root-pullrequests-build_2/build/CMakeFiles/CMakeTmp/CMakeLists.txt:15 (add_executable):
  • CMake Error at /build/jenkins/workspace/root-pullrequests-build_2/build/CMakeFiles/CMakeTmp/CMakeLists.txt:15 (add_executable):
  • CMake Error at /usr/local/Cellar/cmake/3.13.4/share/cmake/Modules/CheckSymbolExists.cmake:90 (try_compile):
  • CMake Error at /build/jenkins/workspace/root-pullrequests-build_2/rootspi/jenkins/root-build.cmake:844 (message):

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, ROOT-ubuntu18.04-i386/cxx14, mac1014/cxx17, windows10/default

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default

Contributor

@etejedor etejedor left a comment

Looks great! Thanks @stwunsch !

@phsft-bot
Collaborator

Build failed on ROOT-fedora27/noimt.
See console output.

Failing tests:

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos7-multicore/default, ROOT-fedora27/noimt, ROOT-fedora29/python3, ROOT-ubuntu16/rtcxxmod, mac1014/cxx17, windows10/default

@stwunsch stwunsch merged commit 989619c into root-project:master Apr 17, 2019
@phsft-bot
Collaborator

Build failed on windows10/default.
See console output.

4 participants