[PyROOT] Update cppyy to a recent version #14507

Merged: 7 commits, Mar 19, 2024

guitargeek
Contributor

@guitargeek guitargeek commented Jan 31, 2024

Update cppyy

Sister PR in roottest: root-project/roottest#1071

Summary

Synchronizes the CPyCppyy CPython extension and cppyy Python library with upstream to fix bugs, add features, and avoid duplicate maintenance efforts.

Behavior changes

No implicit conversion from fixed-sized char buffers to null-terminated string

A fixed-size char buffer can be used for different things, for example to store short null-terminated strings in a TTree. For that reason, the current PyROOT converts such buffers to Python strings. However, this makes it impossible to get the full buffer if it contains zeros, which matters when the buffer holds something other than a string, for example some status bytes.

Therefore, the user is now required to explicitly convert the buffer to a Python string with the as_string() method.

Demo:

import ROOT

ROOT.gInterpreter.Declare("""
#include <cstring>
#include <string>

struct Struct {
    char char_buffer[5] {};
};


void fill_char_buffer(Struct & st)
{
    std::string foo{"foo"};
    std::memcpy(st.char_buffer, foo.data(), foo.size());
}

""")
struct = ROOT.Struct()
ROOT.fill_char_buffer(struct)
char_buffer = struct.char_buffer

# With the new cppyy, you get access to the lower level buffer instead:
print("struct.char_buffer            : ", char_buffer)

# However, you can turn the buffer into a string very easily with as_string():
print("struct.char_buffer.as_string(): ", char_buffer.as_string())

The output:

struct.char_buffer            :  <cppyy.LowLevelView object at 0x74c7a2682fb0>
struct.char_buffer.as_string():  foo
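
To illustrate why the implicit conversion loses information, here is a small sketch that uses Python's standard ctypes module instead of cppyy. It is only an analogy: `value` and `raw` are ctypes attributes, not part of the cppyy API, and the status-byte contents are made up for the example. Reading the buffer as a null-terminated string stops at the first zero byte, while the raw view returns all five bytes.

```python
import ctypes

# Fixed-size, zero-initialised buffer, analogous to `char char_buffer[5] {}`.
text_buf = ctypes.create_string_buffer(5)
text_buf.value = b"foo"  # stored as a null-terminated string

# String view: everything up to the first zero byte.
text_as_string = text_buf.value.decode()
# Raw view: all five bytes, including the trailing zeros.
text_raw = text_buf.raw

# The same kind of buffer holding status bytes (hypothetical values).
status_buf = ctypes.create_string_buffer(5)
status_buf.raw = b"\x01\x00\x02\x00\x03"

# The string view is cut off at the first zero byte ...
status_as_string = status_buf.value
# ... while the raw view keeps every byte.
status_raw = status_buf.raw

print(repr(text_as_string), text_raw)
print(status_as_string, status_raw)
```

With an implicit conversion, only the truncated string view would be reachable, which is the motivation for returning the low-level buffer and making the string conversion explicit.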

Associated GitHub issues

This will close the following GitHub issue:

Upstream contributions to cppyy in the context of this synchronization

Performance validation

To validate the performance, I ran the Python tests in roottest and the PyROOT pythonization tests and compared runtimes with and without the cppyy upgrade. The total runtime of these tests decreased by about 4 %, from 298 s to 287 s, so the performance impact of this PR is marginal. The runtime comparison for each test can be found in cppyy_upgrade_test_runtimes.txt.
The worst observed performance penalty is 24 %. However, significant speedups are observed in some of the longer tests. For convenience, the 20 tests with the longest runtime are listed here:

                               title  no_cppyy_upgrade  cppyy_upgrade     ratio
        python-regression-regression             32.61          34.30  1.051825
       pyroot-pyz-rdataframe-asnumpy             14.30          14.61  1.021678
     pyroot-pyz-rdataframe-makenumpy             13.66           9.65  0.706442
            python-function-function             12.27          11.08  0.903015
                      python-cpp-cpp             12.03          11.26  0.935993
                 python-cpp-advanced             11.55          10.83  0.937662
 pyroot-pyz-rdataframe-histo-profile             11.24           7.37  0.655694
                  python-basic-basic             10.11           9.95  0.984174
                      python-stl-stl              8.35           7.87  0.942515
                  pyroot-pyz-rtensor              6.94           2.32  0.334294
                  python-ttree-ttree              6.15           5.70  0.926829
               python-basic-datatype              5.87           5.86  0.998296
                 python-pickle-write              5.56           5.56  1.000000
               python-basic-overload              5.29           5.30  1.001890
python-pythonizations-pythonizations              5.27           5.20  0.986717
      python-pythonizations-smartptr              5.04           4.89  0.970238
               python-basic-operator              4.79           4.74  0.989562
                    python-cpp-cpp11              4.72           4.69  0.993644
                python-memory-memory              4.62           4.48  0.969697
         python-regression-root_6023              4.61           4.44  0.963124
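
For reference, the ratio column is the runtime with the upgrade divided by the runtime without it, so values below 1 are speedups. The following sketch recomputes it for a few rows copied from the table above; the variable and function names are only illustrative.

```python
# (title, seconds without upgrade, seconds with upgrade), copied from the table.
rows = [
    ("python-regression-regression", 32.61, 34.30),
    ("pyroot-pyz-rdataframe-makenumpy", 13.66, 9.65),
    ("pyroot-pyz-rdataframe-histo-profile", 11.24, 7.37),
    ("pyroot-pyz-rtensor", 6.94, 2.32),
]


def runtime_ratio(before, after):
    """Runtime with the upgrade relative to the runtime without it."""
    return after / before


ratios = {title: runtime_ratio(before, after) for title, before, after in rows}

# Print the rows from the largest speedup to the largest slowdown.
for title, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    change_percent = (ratio - 1.0) * 100.0
    print(f"{title:>36}  ratio={ratio:.6f}  change={change_percent:+.1f} %")
```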

@guitargeek guitargeek self-assigned this Jan 31, 2024

github-actions bot commented Jan 31, 2024

Test Results

    11 files      11 suites   2d 8h 31m 25s ⏱️
 2 602 tests  2 601 ✅ 0 💤 1 ❌
26 692 runs  26 691 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 9c4418a.

♻️ This comment has been updated with latest results.

@guitargeek guitargeek force-pushed the CPyCppyy-1.12.16 branch 2 times, most recently from 05accc6 to 0151066 Compare February 1, 2024 14:57
@guitargeek guitargeek force-pushed the CPyCppyy-1.12.16 branch 3 times, most recently from 6f98d6c to 55c810c Compare February 2, 2024 13:07
@guitargeek guitargeek changed the title from "[PyROOT] Update CPyCppyy to 1.12.16" to "[PyROOT] Update CPyCppyy to master" Feb 2, 2024
@phsft-bot
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default

@guitargeek
Contributor Author

Hi @smuzaffar! This is a big upgrade, and testing it with CMSSW would give us more confidence. The ROOT tests are already all green. Can you give us a hand? Thanks!

@smuzaffar
Contributor

@guitargeek , cmssw tests via cms-sw#205

@guitargeek
Contributor Author

Thank you!

@dpiparo
Member

dpiparo commented Mar 16, 2024

@smuzaffar , thanks also from me - appreciated!

@smuzaffar
Contributor

FYI , cmssw tests passed cms-sw#205 (comment)

@guitargeek
Contributor Author

This is great news! I'll stop making code changes to the PR now, so we keep the validated state. I'll just need to update a few more times for the commit history and bookkeeping of patches.

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default

@phsft-bot
Collaborator

Build failed on windows10/default.
See console output.

This reverts commit d5efd70.

We are patching the automatic conversion to Python strings back in, so
it's not necessary to Pythonize a `__str__` function implementing it in
C++.

Also, the `hasattr(foo, "___cpp__str")` caused a *huge* performance hit
in some cases, because looking up a non-existing attribute in cppyy can
be quite expensive. All base classes are crawled too, and that invokes
the interpreter and string manipulation.
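
A toy illustration of why repeated `hasattr` checks on a missing attribute add up. This is plain Python, not cppyy: the `Proxy` class, its counter, and the `cached_hasattr` helper are hypothetical and only model the fact that every failed lookup runs the fallback path again. Remembering negative results is one possible mitigation, and it is only valid if the attribute can never appear later.

```python
class Proxy:
    """Toy stand-in for a class whose missing attributes trigger a slow fallback."""

    fallback_calls = 0

    def __getattr__(self, name):
        # Called only after the normal lookup has failed; counts each fallback.
        Proxy.fallback_calls += 1
        raise AttributeError(name)


obj = Proxy()

# Every hasattr() on a missing attribute runs the fallback again.
for _ in range(1000):
    hasattr(obj, "missing_attribute")
uncached_calls = Proxy.fallback_calls

# Hypothetical mitigation: remember which (type, name) pairs are known to be missing.
_known_missing = set()


def cached_hasattr(instance, name):
    key = (type(instance), name)
    if key in _known_missing:
        return False
    found = hasattr(instance, name)
    if not found:
        _known_missing.add(key)
    return found


Proxy.fallback_calls = 0
for _ in range(1000):
    cached_hasattr(obj, "missing_attribute")
cached_calls = Proxy.fallback_calls

print(uncached_calls, cached_calls)
```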

The added script will be used to synchronize `cppyy` and `CPyCppyy`,
applying some patches that are necessary for ROOT.

This was done with the `sync-upstream` script, introduced in the last
commit.

Unlike the `cppyy` Python library and the `CPyCppyy` CPython
extension, the `cppyy-backend` can't easily be synchronized with
upstream. The reason is that it depends both on patches to cling and to
ROOT meta. There is no bookkeeping on the patches to ROOT meta, which
complicates things, and the patches might also interfere with other
ROOT functionality.

A possible synchronization of ROOT meta is also not worth the effort for
another reason: it will be replaced by libInterOp in the future in the
context of cppyy and PyROOT.

Furthermore, synchronizing the backend would not result in fixing any
further reported ROOT issues.

Therefore, only minimal changes were made to the `cppyy-backend` in the
cppyy upgrade.

The reference count in Python is usually quite fragile to implementation
changes, so it's not too surprising that some counts change with the
`CPyCppyy` upgrade.
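
As a small sketch of why absolute reference counts make fragile test values: `sys.getrefcount` reports every reference that happens to exist at the time of the call, so one extra internal reference anywhere shifts the number. The example below uses a plain Python object rather than a cppyy proxy and only looks at differences between counts, since the absolute value depends on the interpreter version and on where the code runs.

```python
import sys

payload = object()

# Baseline count; the absolute number is an implementation detail.
baseline = sys.getrefcount(payload)

# One additional name bound to the same object adds one reference.
alias = payload
with_alias = sys.getrefcount(payload)

# Storing the object in a container adds another one.
container = [payload]
with_container = sys.getrefcount(payload)

print(with_alias - baseline, with_container - baseline)
```

A test that asserts an exact count will break whenever the implementation keeps one reference more or less, even if nothing leaks.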
@phsft-bot
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default

Member

@vepadulano vepadulano left a comment

Thank you for this incredible piece of work! It's such a huge step in the right direction! I believe this first PR is ready to be merged to move forward, but note the following comment.

I strongly believe we should have a similar approach as with our LLVM fork. We need to have some "source of truth" that is a certain cppyy/CPyCppyy tag from upstream and then a clear way to reach the status of our fork from there, i.e. a series of patches that can be applied without conflicts. This PR goes in that direction but doesn't implement it fully, as the sync script refers to a repository outside of our organisation. Ideally we would have separate repositories (one for cppyy and one for CPyCppyy) that we can refer to.

@@ -77,7 +77,7 @@ def fn1(x):

self.assertTrue(hasattr(fn1, "__cpp_wrapper__"))
self.assertTrue(type(fn1.__cpp_wrapper__) == str)
-self.assertEqual(sys.getrefcount(fn1.__cpp_wrapper__), 2)
+self.assertEqual(sys.getrefcount(fn1.__cpp_wrapper__), 3)
Member

Why are we even testing this? I believe it's not necessary and as your commit message rightly says, fragile. Let's just remove it

Contributor Author

I'm not sure, I'm not familiar with the tests.

Right now things are quite stable with only this little change of the reference value, but if you agree that it can be removed I'm happy to do this as soon as the test fails somewhere again!

@guitargeek
Contributor Author

I strongly believe we should have a similar approach as with our LLVM fork. We need to have some "source of truth" that is a certain cppyy/CPyCppyy tag from upstream and then a clear way to reach the status of our fork from there, i.e. a series of patches that can be applied without conflicts. This PR goes in that direction but doesn't implement it fully, as the sync script refers to a repository outside of our organisation. Ideally we would have separate repositories (one for cppyy and one for CPyCppyy) that we can refer to.

Thanks for raising this point. The situation will improve in the coming weeks: I'll try to get as many patches merged upstream as possible. Then, based on how many differences are left, we can decide whether to go with one (or multiple) separate repositories or stay with the patch files.

@guitargeek guitargeek merged commit 7b8e293 into root-project:master Mar 19, 2024
13 of 17 checks passed
@guitargeek guitargeek deleted the CPyCppyy-1.12.16 branch August 20, 2024 09:51