
test_gdb fails on Python 3.6 when built with LTO+PGO #74530

Closed
vstinner opened this issue May 11, 2017 · 29 comments
Labels
3.7 (EOL) end of life, 3.8 (EOL) end of life, build (The build process and cross-build), tests (Tests in the Lib/test dir)

Comments

@vstinner
Member

BPO 30345
Nosy @pitrou, @vstinner, @jkloth, @mcepl, @encukou, @methane, @koobs, @stratakis, @Dormouse759, @miss-islington
PRs
  • [3.6] bpo-30345: Update test_gdb.py and python-gdb.py from master #1549
  • bpo-30345: Add -g to LDFLAGS to ease debug #7709
  • [3.7] bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) #7824
  • [2.7] bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) #7825
  • [3.6] bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) #7826
  • bpo-30345: travis: use -Og cflags with --with-pydebug #14423
  • [3.8] bpo-30345: travis: use -Og with --with-pydebug (GH-14423) #14427
  • Files: build.log, root.log, pgo-lto-gdb-errors-build
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-22.07:14:38.565>
    created_at = <Date 2017-05-11.16:16:32.838>
    labels = ['3.7', '3.8', 'build', 'tests']
    title = 'test_gdb fails on Python 3.6 when built with LTO+PGO'
    updated_at = <Date 2019-06-28.16:19:24.809>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-06-28.16:19:24.809>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-22.07:14:38.565>
    closer = 'vstinner'
    components = ['Build', 'Tests']
    creation = <Date 2017-05-11.16:16:32.838>
    creator = 'vstinner'
    dependencies = []
    files = ['46856', '46857', '46858']
    hgrepos = []
    issue_num = 30345
    keywords = ['patch']
    message_count = 29.0
    messages = ['293503', '293505', '293506', '293507', '293508', '293516', '293582', '295299', '295339', '295341', '295348', '319614', '319616', '319618', '319621', '319623', '319625', '319979', '319984', '319985', '320076', '320213', '320214', '320215', '320216', '346759', '346760', '346762', '346831']
    nosy_count = 11.0
    nosy_names = ['pitrou', 'vstinner', 'jkloth', 'mcepl', 'petr.viktorin', 'methane', 'koobs', 'cstratak', 'Dormouse759', 'mi', 'miss-islington']
    pr_nums = ['1549', '7709', '7824', '7825', '7826', '14423', '14427']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue30345'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @vstinner
    Member Author

    cstratak reported the following test failure on Fedora 24 when building Python 3.6 with LTO + PGO:

    ======================================================================
    FAIL: test_threads (test.test_gdb.PyBtTests)
    Verify that "py-bt" indicates threads that are waiting for the GIL
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/builddir/build/BUILD/Python-3.6.1/Lib/test/test_gdb.py", line 783, in test_threads
        cmds_after_breakpoint=['thread apply all py-bt'])
      File "/builddir/build/BUILD/Python-3.6.1/Lib/test/test_gdb.py", line 218, in get_stack_trace
        self.assertEqual(unexpected_errlines, [])
    AssertionError: Lists differ: ["Python Exception <class 'ValueError'> Va[95 chars]nd."] != []
    First list contains 2 additional elements.
    First extra element 0:
    "Python Exception <class 'ValueError'> Variable 'func_obj' not found.: "
    + []
    - ["Python Exception <class 'ValueError'> Variable 'func_obj' not found.: ",
    -  "Error occurred in Python command: Variable 'func_obj' not found."]
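The assertion above compares the gdb stderr lines that the test did not expect against an empty list, so any "Python Exception ..." line emitted by python-gdb makes it fail. A minimal sketch of that kind of filtering, assuming a hypothetical list of harmless prefixes (test_gdb.py keeps its own list, which is not reproduced here):

```python
from typing import List

# Hypothetical prefixes of gdb stderr lines treated as harmless noise.
# These are illustrative assumptions, not the list used by test_gdb.py.
IGNORED_PREFIXES = (
    "warning: ",
    "Missing separate debuginfo",
)


def unexpected_error_lines(stderr: str) -> List[str]:
    """Return the non-empty stderr lines that match no ignored prefix."""
    return [
        line
        for line in stderr.splitlines()
        if line and not line.startswith(IGNORED_PREFIXES)
    ]


sample_stderr = (
    "warning: Source file is more recent than executable.\n"
    "Python Exception <class 'ValueError'> Variable 'func_obj' not found.: \n"
    "Error occurred in Python command: Variable 'func_obj' not found.\n"
)

# The test expects this list to be empty; here the two python-gdb lines remain.
print(unexpected_error_lines(sample_stderr))
```

With the sample above, the warning is dropped and the two python-gdb error lines are kept, which is exactly the shape of the "First list contains 2 additional elements" failure.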

    @vstinner vstinner added the build and tests labels May 11, 2017
    @stratakis
    Mannequin

    stratakis mannequin commented May 11, 2017

    Full build log

    @stratakis
    Mannequin

    stratakis mannequin commented May 11, 2017

    All the dependencies that were pulled in.

    gdb is of version 7.11. The failures do not happen with gdb 7.12 (which exists in later Fedora releases).

    @vstinner
    Copy link
    Member Author

    I created #1549 to update test_gdb.py and python-gdb.py in Python 3.6. I don't know if it will fix the issue, but it shouldn't hurt :-)

    @stratakis
    Mannequin

    stratakis mannequin commented May 11, 2017

    Note: test_gdb is actually skipped on later Fedora releases (possibly because the gdb package is not pulled into the minimal buildroot), so the issue might still be there and the gdb version might have no effect on it. Will investigate further.

    @stratakis
    Mannequin

    stratakis mannequin commented May 11, 2017

    So the issue wasn't restricted to a specific gdb version or distro release: due to some dependency issues, the gdb binary wasn't pulled into the buildroot, which caused test_gdb to be skipped.

    So I was able to reproduce it on my system by installing gdb (version 7.12.1), compiling python 3.6 from sources with
    --enable-optimizations and --with-lto flags enabled and running 'make test'

    I also applied the relevant PR; however, it didn't fix the issue.

    Attaching the full log of 'make test'

    @vstinner
    Member Author

    New changeset d05f7fd by Victor Stinner in branch '3.6':
    [3.6] bpo-30345: Update test_gdb.py and python-gdb.py from master (bpo-1549)
    d05f7fd

    @mi
    Mannequin

    mi mannequin commented Jun 6, 2017

    I rebuilt my python-3.6.1 with the patch (d05f7fd) and tried debugging again -- same problem:

    % gdb7121 /opt/bin/python3.6
    ...
    (gdb) r SCRIPT.py
    Thread 1 received signal SIGSEGV, Segmentation fault.
    ...
    (gdb) py-bt
    Python Exception <type 'exceptions.ValueError'> Variable 'func_obj' not found.:
    Error occurred in Python command: Variable 'func_obj' not found.

    @mi
    Mannequin

    mi mannequin commented Jun 7, 2017

    The actual stack, which I'm trying to debug, begins like this:

    #0 0xbfbfd34e in ?? ()
    #1 0x2a9ec81e in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
    #2 0x2acf0efe in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
    #3 0x2acd8b74 in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
    #4 0x2acd5d60 in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
    #5 0x2acd87ae in ?? () from /opt/lib/qt5/libQt5WebKit.so.5
    #6 0x2a9fe2e3 in QWebFrameAdapter::load(QNetworkRequest const&, QNetworkAccessManager::Operation, QByteArray const&) () from /opt/lib/qt5/libQt5WebKit.so.5
    #7 0x2d7a18dd in QWebFrame::setUrl(QUrl const&) () from /opt/lib/qt5/libQt5WebKitWidgets.so.5
    #8 0x2d7ad5eb in QWebView::setUrl(QUrl const&) () from /opt/lib/qt5/libQt5WebKitWidgets.so.5
    #9 0x2d75efd4 in meth_QWebView_setUrl(_object*, _object*) ()
    from /opt/lib/python3.6/site-packages/PyQt5/QtWebKitWidgets.so
    #10 0x28125151 in _PyCFunction_FastCallDict () from /opt/lib/libpython3.6m.so.1.0
    #11 0x28125326 in _PyCFunction_FastCallKeywords () from /opt/lib/libpython3.6m.so.1.0
    #12 0x2819a458 in ?? () from /opt/lib/libpython3.6m.so.1.0
    #13 0x28193ab2 in _PyEval_EvalFrameDefault () from /opt/lib/libpython3.6m.so.1.0
    #14 0x2819b790 in ?? () from /opt/lib/libpython3.6m.so.1.0
    #15 0x2819a425 in ?? () from /opt/lib/libpython3.6m.so.1.0
    #16 0x28193ab2 in _PyEval_EvalFrameDefault () from /opt/lib/libpython3.6m.so.1.0
    [...]

    Maybe, it is "too deep" into the native (not Python) code for the feature to work?

    @jkloth
    Contributor

    jkloth commented Jun 7, 2017

    It seems that commit (c525723) changed the parameter name in the definition of _PyCFunction_FastCallDict(). I believe that changing 'func_obj' to just 'func' should fix it (in Tools/gdb/libpython.py).
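In gdb's Python API, `gdb.Frame.read_var()` raises `ValueError` when the named variable cannot be found in the frame, which is the exception shown in the failure above. A minimal sketch of tolerating both the old and the new parameter name; `read_first_var` and `StubFrame` are hypothetical helpers (not part of libpython.py or gdb) so the logic can run outside gdb:

```python
from typing import Any, Iterable, Mapping


class StubFrame:
    """Hypothetical stand-in for gdb.Frame, for running outside gdb."""

    def __init__(self, variables: Mapping[str, Any]) -> None:
        self._variables = dict(variables)

    def read_var(self, name: str) -> Any:
        # Like gdb.Frame.read_var(), raise ValueError for unknown variables.
        try:
            return self._variables[name]
        except KeyError:
            raise ValueError(f"Variable '{name}' not found.") from None


def read_first_var(frame: Any, names: Iterable[str]) -> Any:
    """Return the value of the first name in `names` that the frame can read."""
    last_error = None
    for name in names:
        try:
            return frame.read_var(name)
        except ValueError as exc:
            last_error = exc  # try the next candidate name
    raise ValueError("none of the candidate variables were found") from last_error


candidates = ("func_obj", "func")  # old name first, then the renamed one
print(read_first_var(StubFrame({"func_obj": "<old build>"}), candidates))
print(read_first_var(StubFrame({"func": "<new build>"}), candidates))
```

A fallback like this only helps with a rename; it cannot recover a variable that is missing from the debug information altogether.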

    @mi
    Mannequin

    mi mannequin commented Jun 7, 2017

    So, I tried the modified patch (see http://aldan.algebra.com/~mi/tmp/patch-issue30345) -- and now I simply get a different variable name in the error-message:

    (gdb) py-bt
    Python Exception <type 'exceptions.ValueError'> Variable 'func' not found.:
    Error occurred in Python command: Variable 'func' not found.

    However, the older version of the patch only referenced "func_obj" in test_gdb.py -- not in libpython.py -- so I may have misunderstood Jeremy's suggestion entirely...

    @Dormouse759
    Mannequin

    Dormouse759 mannequin commented Jun 15, 2018

    LTO may break the debug symbols and make GDB unusable.
    There is an option that fixes the issue: passing the -g switch in the link flags.
    Note that this slows loading of the debug symbols significantly.

    I suggest these options as possible approaches:

    1. make the configure script include -g in LDFLAGS when --enable-optimizations and --with-lto are used

    2. same as 1), but only when --with-pydebug is also used.

    3. document this problem and make the user aware that this possible fix (-g in link flags) exists
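Options 1 and 2 amount to a small decision about when to append -g to the link flags. A minimal sketch in Python rather than autoconf shell; `lto_link_flags` and its `compiler_accepts_g` parameter are hypothetical, the latter standing in for configure's cached check that the compiler supports -g:

```python
import shlex
from typing import List


def lto_link_flags(ldflags: str, *, with_lto: bool, enable_optimizations: bool,
                   with_pydebug: bool = False, only_with_pydebug: bool = False,
                   compiler_accepts_g: bool = True) -> List[str]:
    """Return the link flags, with -g appended when the chosen option asks for it.

    Option 1: add -g whenever --enable-optimizations and --with-lto are used.
    Option 2 (only_with_pydebug=True): additionally require --with-pydebug.
    """
    flags = shlex.split(ldflags)
    wants_g = with_lto and enable_optimizations
    if only_with_pydebug:
        wants_g = wants_g and with_pydebug
    # Only add -g if the compiler supports it, and never add it twice.
    if wants_g and compiler_accepts_g and "-g" not in flags:
        flags.append("-g")
    return flags


print(lto_link_flags("-Wl,-O1", with_lto=True, enable_optimizations=True))
# ['-Wl,-O1', '-g']
```

Option 3 (documentation) needs no code; with it, users would pass LDFLAGS="-g" to configure themselves.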

    @vstinner
    Member Author

    • ["Python Exception <class 'ValueError'> Variable 'func_obj' not found.: ",

    bpo-32962: My commit 019d33b "python-gdb catchs ValueError on read_var()" (PR 7692) catches this ValueError.

    @Dormouse759
    Mannequin

    Dormouse759 mannequin commented Jun 15, 2018

    Yes, but that is not a fix really in this case.
    While it makes the test pass because it 'correctly' prints out unknown objects, it makes no real difference when actually debugging. The -g switch at link time makes the debug symbols readable and user is able to debug just as usual.

    @vstinner
    Member Author

    I tested on the current master:

    git clean -fdx
    ./configure --with-lto --enable-optimizations
    sed -i -e 's/^PROFILE_TASK=.*/PROFILE_TASK=-c pass/' Makefile
    make 2>&1|tee log

    Python is compiled twice:

    • (1) gcc -DNDEBUG -g -O3 -flto -fprofile-generate (...)
    • (2) gcc -DNDEBUG -g -O3 -flto -fprofile-use (...)

    I see -g in both compilation steps.

    It seems like debug symbols are still here:

    vstinner@apu$ file ./python
    ./python: ELF 64-bit LSB executable, x86-64, (...), with debug_info, not stripped

    But I confirm that test_gdb fails when using LTO+PGO.

    gdb seems to be unable to read any C function argument:

    $ gdb -args  ./python Lib/test/gdb_sample.py
    (gdb) b builtin_id
    (gdb) run
    Breakpoint 1, 0x0000000000518da0 in builtin_id ()
    (gdb) py-bt
    Traceback (most recent call first):
      (unable to read python frame information)
      (unable to read python frame information)
      (unable to read python frame information)
      (unable to read python frame information)

    @Dormouse759
    Mannequin

    Dormouse759 mannequin commented Jun 15, 2018

    Those -g switches you see there are during compile-time.
    For this to work, you need to enable it at link time as well:
    ./configure --enable-optimizations --with-lto LDFLAGS="-g"

    Except for py-bt, you should also try bt. With this link flag enabled, I observe a significant slowdown on my machine when printing the backtrace with the bt command, at least when I let PGO run its full profiling task; when Python is compiled with the sed edit you posted here, I observe none.
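To check whether an already-built interpreter was configured --with-lto without -g at link time, the build-time values recorded in `sysconfig` can be inspected. A minimal sketch; `lto_build_lacks_link_g` is a hypothetical helper, and the choice of `LDFLAGS` and `LDFLAGS_NODIST` as the variables holding link flags is an assumption that may not match every CPython version (on Windows these values are typically `None`):

```python
import sysconfig
from typing import Mapping, Optional

# Assumption: link flags end up in LDFLAGS or LDFLAGS_NODIST depending on the
# CPython version and how they were passed; both are checked.
LINK_FLAG_VARS = ("LDFLAGS", "LDFLAGS_NODIST")


def lto_build_lacks_link_g(
        config: Optional[Mapping[str, Optional[str]]] = None) -> bool:
    """Return True if configure got --with-lto but -g is not in the link flags."""
    if config is None:
        # Read the values recorded at build time for the running interpreter.
        names = ("CONFIG_ARGS",) + LINK_FLAG_VARS
        config = {name: sysconfig.get_config_var(name) for name in names}
    # CONFIG_ARGS looks like "'--enable-optimizations' '--with-lto'".
    config_args = [arg.strip("'\"")
                   for arg in str(config.get("CONFIG_ARGS") or "").split()]
    with_lto = any(arg == "--with-lto" or arg.startswith("--with-lto=")
                   for arg in config_args)
    link_flags = []
    for name in LINK_FLAG_VARS:
        link_flags.extend(str(config.get(name) or "").split())
    return with_lto and "-g" not in link_flags


if __name__ == "__main__":
    print(lto_build_lacks_link_g())
```

Passing an explicit mapping makes the helper easy to try against the flag combinations discussed in this issue without rebuilding Python.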

    @vstinner
    Member Author

    Except for py-bt, you should also try bt.

    Oh. Using PGO+LTO but without LDFLAGS=-g, bt only shows me function names: all arguments are missing.

    I tested with LDFLAGS=-g: py-bt and bt work as expected, and test_gdb pass.

    I created PR 7709 to always compile Python with LDFLAGS=-g.

    @vstinner
    Member Author

    Very interesting article about PGO and LTO changes in GCC:
    https://hubicka.blogspot.com/2018/06/gcc-8-link-time-and-interprocedural.html

    See "Early debug info" paragraph.

    @vstinner
    Member Author

    New changeset 06fe77a by Victor Stinner in branch 'master':
    bpo-30345: Add -g to LDFLAGS for LTO (GH-7709)
    06fe77a

    @vstinner
    Member Author

    Ok, I pushed a change to the master branch. Now the question is if Python 2.7, 3.6 and 3.7 should be fixed as well?

    I will wait at least one day to see if buildbots are happy.

    @vstinner
    Member Author

    I created backports to 2.7, 3.6 and 3.7 branches: do you see any reason to not fix python-gdb.py in these branches? (Any reason to not add -g to $LTOFLAGS?)

    @vstinner
    Member Author

    New changeset 1bb9dd3 by Victor Stinner (Miss Islington (bot)) in branch '3.7':
    bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7824)
    1bb9dd3

    @vstinner
    Member Author

    New changeset 7839288 by Victor Stinner in branch '3.6':
    bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7826)
    7839288

    @vstinner
    Member Author

    New changeset 319cfb5 by Victor Stinner in branch '2.7':
    bpo-30345: Add -g to LDFLAGS for LTO (GH-7709) (GH-7825)
    319cfb5

    @vstinner
    Member Author

    I created backports to 2.7, 3.6 and 3.7 branches: do you see any reason to not fix python-gdb.py in these branches? (Any reason to not add -g to $LTOFLAGS?)

    Honestly, the risk is very low: only "./configure --with-lto" is impacted, and the addition of -g is protected by $ac_cv_prog_cc_g in configure. The -g flag just asks to copy debug symbols; it should not impact compiler performance.
    Anyway, if something goes wrong, obviously we can revert the change and see for a different fix.

    Thanks cstratak for the bug report, and thanks Marcel Plch for the proposed fix (it works well)!

    @vstinner vstinner added the 3.7 (EOL) and 3.8 (EOL) labels Jun 22, 2018
    @methane
    Member

    methane commented Jun 27, 2019

    New changeset 21cfae1 by Inada Naoki in branch 'master':
    bpo-30345: travis: use -Og with --with-pydebug (GH-14423)
    21cfae1

    @methane
    Member

    methane commented Jun 27, 2019

    I think test_gdb is useful for the 3.8 branch because some PEP 590-related changes will be backported to it.
    But I don't know how useful test_gdb is for older branches.

    @miss-islington
    Contributor

    New changeset 60f24b2 by Miss Islington (bot) in branch '3.8':
    bpo-30345: travis: use -Og with --with-pydebug (GH-14423)
    60f24b2

    @vstinner
    Member Author

    I'm fine with relying on buildbots for test_gdb on Python 3.7 and older. Thanks for the fixes.
