Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ToolPalette._layout_widgets() appears to recurse, causing a stack overflow #27

Closed
philipstarkey opened this issue Oct 30, 2019 · 8 comments · Fixed by #76
Closed

ToolPalette._layout_widgets() appears to recurse, causing a stack overflow #27

philipstarkey opened this issue Oct 30, 2019 · 8 comments · Fixed by #76
Labels
bug Something isn't working minor

Comments

@philipstarkey
Copy link
Member

Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


A lab here at NIST got a "Python has stopped working" error upon starting BLACS today. Starting Python as python -X faulthander -m blacs revealed it to be a stack overflow in ToolPalette._layout_widgets(), line 348:

self.setMinimumSize(QSize(self.minimumSize().width(), total_height))

Adding a printline to inspect the dimensions of the QSize() object revealed that _layout_widgets() was being called a large number of times prior to the crash, with the dimensions alternating back and forth between two values.

I added a counter to print the recursion depth and confirm that the method is recursing, but I made a syntax error and BLACS started successfully (having caught the error), modifying its save file and widget geometries such that the stack overflow no longer occurred. So unfortunately I lost the ability to reproduce the problem, as it is sensitive to the widget geometries.

Just documenting what I found here. ToolPalette._layout_widgets() seems to be recursing and not converging on a fixed layout geometry that would break the cycle. If I see it again I will backup BLACS save file and connection table file to create a reliable reproducer of the problem.

Others have reported similar crashes before, but they have not been stack overflows, they have been segfaults that looked to be bugs in Qt. This one actually is plausibly our fault which means we may be able to fix it.

@philipstarkey
Copy link
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


  • changed title from "ToolPalette._layout_widgets() appears to recurse" to "ToolPalette._layout_widgets() appears to recurse, causing a stack overflow"

@philipstarkey
Copy link
Member Author

Original comment by Philip Starkey (Bitbucket: philipstarkey, GitHub: philipstarkey).


This is a bit of a shot in the dark, but I suspect that there is a bug when calculating how many items fit in a row. This is likely due to incorrectly accounting for padding/borders/similar. I think such a error could lead to the behaviour you saw (but it's a a while since I touched that code)

Plausibly this issue could cause segfaults on some versions of Qt as Qt often doesn’t like doing things really fast (some of which may be fixed in later versions of Qt)

@philipstarkey
Copy link
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


I suspect you're right, that the calculation could be a little bit off. But it would be good if the code wasn't sensitive to relatively minor errors or to changing padding values, so fixing the calculation itself might not be the best solution, unless it's totally bomb-proof and looks up all the padding values at run-time with no hard-coded values.

There are also things that can happen where a widget says it is one size, and on that basis a container widget decides it should show a scrollbar or something, then the child widget is updated with the new size of the container (minus the scrollbar width), it declares itself to be smaller, then the parent widget decides it doesn't need a scrollbar after all, repeat. In this case something needs to break the cycle. We might be able to add a "with signals disconnected, update size" or similar to just stop recursion and keep the most recently calculated size. If that size if "good enough" then we'll be happy, even if there is some hysteresis in the number of buttons per row whilst resizing tabs.

@philipstarkey philipstarkey added minor bug Something isn't working labels Apr 5, 2020
@zakv
Copy link
Contributor

zakv commented Nov 24, 2020

I ran into this bug today and managed to work around it in a way that kept the data in the blacs settings file, which is lost when following the suggestion from the mailing list. I opened the blacs settings file in HDFView, then went to front_panel -> _notebook_data and played with the value for window_height, in my case changing it from 732 to 1000. After saving and closing HDFView, I started blacs and it opened just fine. Furthermore, the rest of the front panel data, such as the tab locations, was retained.

I'm sure 1000 isn't a magic number for window height, so using that value won't always solve the issue. That said it still may be more convenient to try out a few different window height values than to delete the file and reconfigure the blacs gui from scratch.

@philipstarkey
Copy link
Member Author

@zakv Can you reproduce it with your configuration if you set the value back to 732?

Also, Do the widgets on one of the visible tabs (at that window size) just require a scroll bar? I'm wondering if it's due to a relayout swapping between needing a scroll bar and not, and perhaps just making the scrollbar always visible would be a sufficient fix?

@zakv
Copy link
Contributor

zakv commented Nov 24, 2020

Can you reproduce it with your configuration if you set the value back to 732?

Yep. I backed up that non-functional config file and was able to reproduce the bug by reverting to it. I also then edited it to set the window height to 1000, opened blacs (which worked), then closed blacs, then reset the window height back to 732 with HDFView, then blacs failed to open again.

Do the widgets on one of the visible tabs (at that window size) just require a scroll bar?

Yep. I don't know a good way to set the height to 732 pixels given that I can't get blacs to open with that setting, but opening it with the height set to 1000, then adjusting it to be ~3/4 it's original height gives this:
image

Also, not sure if this is relevant but the blacs window is partially off the left edge of the screen when I open it with that settings file. For reference, here is how it looks right after I open with with the height set to 1000:
image

@chrisjbillington
Copy link
Member

For what it's worth I've been sent (from one of the other users having this issue) a config file that may be able to reproduce it - haven't looked at it yet, but will give it a shot at some point. Can send it to you if you want to look into this Phil.

@zakv
Copy link
Contributor

zakv commented Feb 7, 2021

I've run into this issue a few times in the last several months and I recently started learning a bit about Qt, so I decided to take another look at it. My hope was to just count the recursion depth and implement Chris's "with signals disconnected, update size" idea when it got too deep. Unfortunately it looks like my initial naïve implementation of that approach won't work.

After uncommenting/adding some debugging print statements in toolpalette.py I tried opening blacs with the same settings file from my comments above. In my case at least, it turns out that the recursion worked differently than I expected. The resursing steps were the following, which occurred for one of the PCI-6713 tabs judging by the names of the tool palettes:

  1. ToolPalette.resizeEvent() is called for the analog output tool palette.
    • This calls _layout_widgets(), which changes the number of analog output columns from 3 to 2.
  2. ToolPalette.resizeEvent() is called for the port0 digital output tool palette.
    • This calls _layout_widgets(), which actually doesn't change the number of digital output columns.
  3. ToolPalette.resizeEvent() is called for the analog output tool palette
    • This calls _layout_widgets(), which changes the number of analog output columns back from 2 to 3.
  4. ToolPalette.resizeEvent() is called for the port0 digital output tool palette.
    • This calls _layout_widgets(), which actually doesn't change the number of digital output columns.
  5. Back to step 1.

Before this I had thought _layout_widgets() was calling itself more directly so I could count the recursion depth by incrementing a counter at the beginning of the method and decrementing it at the end. However it actually recurses by causing events to be added to a queue, and those events don't start running until the previous ones have finished. That means that the recursion counter always just switched between 1 and 0 without incrementing beyond that. The process went something like the following:

  1. Start handling a QResizeEvent.
  2. Increment the recursion counter to 1.
  3. Perform some operations that add QResizeEvent(s) to Qt's queue.
  4. Finish the call to resizeEvent()/_layout_widgets() which decrements the recursion counter back to 0.
  5. Start handling the next QResizeEvent, going back to step 1.

Given that, the only way I can think of to implement Chris's "with signals disconnected, update size" idea is to add custom events to the Qt event queue. Basically resizeEvent() or _layout_widgets() would start by putting a custom IgnoreResizeEvents event in the queue and end by putting a AcceptResizeEvents event in the queue. The class would be updated to effectively ignore all of the QResizeEvents between those custom events.

I should get a chance to try to implement that soon, though I haven't worked with the Qt event system before so it might take a decent amount of trial/error before I get something working.

zakv added a commit to zakv/labscript-utils that referenced this issue Feb 8, 2021
…vent() could trigger itself in an infinite resursion loop.
zakv added a commit to zakv/labscript-utils that referenced this issue Feb 8, 2021
…vent() could trigger itself in an infinite resursion loop.
dihm added a commit that referenced this issue Nov 30, 2021
commit a79d0e4
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Tue Nov 30 11:20:27 2021 -0500

    Update setup.cfg in preparation for release.

commit 811b43d
Merge: b8726ad a3be0db
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:07:51 2021 -0500

    Merge pull request #81 from chrisjbillington/py36-path-bug

    Fix bug on Python3.6

commit b8726ad
Merge: d309a5f c4dd2a8
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:06:47 2021 -0500

    Merge pull request #78 from zakv/setup-logging-jupyter

    Make setup_logging() more Jupyter-friendly.

commit d309a5f
Merge: d5dfea7 12db878
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:06:02 2021 -0500

    Merge pull request #82 from dihm/zlog_port_profile

    Add default zlog port to default profile configuration file.

commit c4dd2a8
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 20:12:19 2021 -0500

    Removed duplicate call to sys.stdout.fileno() in setup_logging.setup_logging().

    Probably best practice to call it just once rather than calling it first to check if it raises an error then again later to actually use the value it returns.

commit 2e38107
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 20:04:37 2021 -0500

    Restructured try/except block in setup_logging.setup_logging() to avoid catching any UnsupportedOperation errors thrown by code other than sys.stdout.fileno().

commit 97e70c7
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 19:56:07 2021 -0500

    Added warning message when setup_logging.setup_logging() cannot send output to stdout and stderr.

commit b711923
Author: Zak V <zakven@mit.edu>
Date:   Mon Jun 7 17:48:41 2021 -0400

    setup_logging.py is now more Jupyter-friendly.

commit d5dfea7
Merge: 9b76dcb b149e19
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 08:54:21 2021 -0500

    Merge pull request #74 from zakv/unitconversions-import-bugfix

    Fixed bug in unitconversions._All._import_all()

commit 12db878
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Mon Nov 8 10:20:28 2021 -0500

    Add default zlog port to default profile configuration file.

commit 9b76dcb
Merge: e13c992 e5cf3fb
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Fri Oct 22 17:53:17 2021 -0400

    Merge pull request #80 from dihm/default_labconfig_docs

    Add default labconfig file to docs

commit a3be0db
Author: chrisjbillington <chrisjbillington@gmail.com>
Date:   Tue Sep 21 13:45:13 2021 +1000

    Fix bug on Python3.6, where Path object not able to be treated as a
    string in ConfigParser.

commit e5cf3fb
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Tue Aug 31 15:29:45 2021 -0400

    Create a doc page that displays the current default labconfig.ini file,
    for easy reference.

commit e13c992
Merge: e74472c 461dc9f
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Fri Jul 16 15:12:37 2021 -0400

    Merge pull request #79 from dihm/labscript_utils-docs

    Update docs to match autogenerating templates used in other modules of labscript.

commit 461dc9f
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Thu Jul 15 10:26:41 2021 -0400

    Add docstring coverage test to build.

commit f90a4cc
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Thu Jul 15 10:26:26 2021 -0400

    Update sphinx version.

commit e45107d
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Mon Jul 12 12:57:18 2021 -0400

    Converting docs build to fully automated version based on recursive
    autosummary calls.

    This more closely resembles how docs are built in the other modules,
    but is slightly different in that submodules are also recursively generated.

commit 56d64f7
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Mon Jul 12 09:25:18 2021 -0400

    Update sphinx pins to same versions as rest of suite.

commit e74472c
Merge: f0999f5 c543328
Author: Chris Billington <chrisjbillington@gmail.com>
Date:   Tue Feb 9 08:54:16 2021 +1100

    Merge pull request #76 from zakv/fix-27

    Work around infinite recursion in ToolPalette.resizeEvent()

commit c543328
Author: Zak V <zakven@mit.edu>
Date:   Mon Feb 8 13:30:47 2021 -0500

    Removed some python 2 debugging print statements from toolpalette.py.

commit d9eb80f
Author: Zak V <zakven@mit.edu>
Date:   Sun Feb 7 23:44:47 2021 -0500

    Fixed #27; worked around bug where ToolPalette.resizeEvent() could trigger itself in an infinite resursion loop.

commit f0999f5
Merge: 8371585 7d14b65
Author: Chris Billington <chrisjbillington@gmail.com>
Date:   Mon Jan 4 13:46:54 2021 +1100

    Merge pull request #75 from chrisjbillington/master

    Do not use deprecated set-env command

commit 7d14b65
Author: chrisjbillington <chrisjbillington@gmail.com>
Date:   Mon Jan 4 13:40:14 2021 +1100

    Do not use deprecated set-env command

commit b149e19
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 24 18:06:11 2020 -0500

    Fixed bug in unitconversions._All._import_all() where self.__all__ wasn't always changed from None to an empty list before appending things to it.
dihm added a commit that referenced this issue Mar 21, 2022
commit 95965b8
Merge: a88582d f2f0ac7
Author: Chris Billington <chrisjbillington@gmail.com>
Date:   Thu Mar 17 08:02:09 2022 +1100

    Merge pull request #86 from dihm/replace_distutils

    Replace distutils

commit f2f0ac7
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Wed Mar 16 09:43:51 2022 -0400

    Replace `distutils.log` with basic `logging` in `setup.py`.

commit b6b7823
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Wed Mar 16 09:04:46 2022 -0400

    Replace `distutils.sysconfig` with stdlib `sysconfig` in `modulewatcher`.

commit 9840228
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Wed Mar 16 09:04:10 2022 -0400

    Replace `distutils.version.LooseVersion` with `packaging.version.Version` in `ls_zprocess`.

commit a88582d
Merge: 66c68ab 75a7df6
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Thu Mar 3 09:28:57 2022 -0500

    Merge pull request #84 from dihm/generic_freq_conv

    Add generic frequency conversion class

commit 75a7df6
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Feb 9 12:27:17 2022 -0500

    Add generic frequency conversion class that goes between standard SI prefixes from a base of Hz.

commit 66c68ab
Merge: 7a3d58c 27501a3
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Feb 9 10:47:36 2022 -0500

    Merge pull request #83 from dihm/fix_docs_build

    Prevent starting a zlock server when importing `h5_lock` on RTD.

commit 27501a3
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Feb 9 10:34:22 2022 -0500

    Prevent starting a zlock server when importing `h5_lock` on RTD.

commit 7a3d58c
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Feb 2 17:24:26 2022 -0500

    Pin `mistune` to fix doc builds on RTD.

commit 36c3c6a
Author: chrisjbillington <chrisjbillington@gmail.com>
Date:   Fri Jan 7 17:23:48 2022 +1100

    Ensure only ints passed to QFrame.move in splash

    Otherwise this is a TypeError in the latest PyQt5.

commit f4da12e
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Tue Nov 30 13:00:27 2021 -0500

    Triggering build of release branch by undoing unnecessary pin change for pyqtgraph.

commit a79d0e4
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Tue Nov 30 11:20:27 2021 -0500

    Update setup.cfg in preparation for release.

commit 811b43d
Merge: b8726ad a3be0db
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:07:51 2021 -0500

    Merge pull request #81 from chrisjbillington/py36-path-bug

    Fix bug on Python3.6

commit b8726ad
Merge: d309a5f c4dd2a8
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:06:47 2021 -0500

    Merge pull request #78 from zakv/setup-logging-jupyter

    Make setup_logging() more Jupyter-friendly.

commit d309a5f
Merge: d5dfea7 12db878
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 17:06:02 2021 -0500

    Merge pull request #82 from dihm/zlog_port_profile

    Add default zlog port to default profile configuration file.

commit c4dd2a8
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 20:12:19 2021 -0500

    Removed duplicate call to sys.stdout.fileno() in setup_logging.setup_logging().

    Probably best practice to call it just once rather than calling it first to check if it raises an error then again later to actually use the value it returns.

commit 2e38107
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 20:04:37 2021 -0500

    Restructured try/except block in setup_logging.setup_logging() to avoid catching any UnsupportedOperation errors thrown by code other than sys.stdout.fileno().

commit 97e70c7
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 9 19:56:07 2021 -0500

    Added warning message when setup_logging.setup_logging() cannot send output to stdout and stderr.

commit b711923
Author: Zak V <zakven@mit.edu>
Date:   Mon Jun 7 17:48:41 2021 -0400

    setup_logging.py is now more Jupyter-friendly.

commit d5dfea7
Merge: 9b76dcb b149e19
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Wed Nov 10 08:54:21 2021 -0500

    Merge pull request #74 from zakv/unitconversions-import-bugfix

    Fixed bug in unitconversions._All._import_all()

commit 12db878
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Mon Nov 8 10:20:28 2021 -0500

    Add default zlog port to default profile configuration file.

commit 9b76dcb
Merge: e13c992 e5cf3fb
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Fri Oct 22 17:53:17 2021 -0400

    Merge pull request #80 from dihm/default_labconfig_docs

    Add default labconfig file to docs

commit a3be0db
Author: chrisjbillington <chrisjbillington@gmail.com>
Date:   Tue Sep 21 13:45:13 2021 +1000

    Fix bug on Python3.6, where Path object not able to be treated as a
    string in ConfigParser.

commit e5cf3fb
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Tue Aug 31 15:29:45 2021 -0400

    Create a doc page that displays the current default labconfig.ini file,
    for easy reference.

commit e13c992
Merge: e74472c 461dc9f
Author: David Meyer <dihm@users.noreply.github.com>
Date:   Fri Jul 16 15:12:37 2021 -0400

    Merge pull request #79 from dihm/labscript_utils-docs

    Update docs to match autogenerating templates used in other modules of labscript.

commit 461dc9f
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Thu Jul 15 10:26:41 2021 -0400

    Add docstring coverage test to build.

commit f90a4cc
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Thu Jul 15 10:26:26 2021 -0400

    Update sphinx version.

commit e45107d
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Mon Jul 12 12:57:18 2021 -0400

    Converting docs build to fully automated version based on recursive
    autosummary calls.

    This more closely resembles how docs are built in the other modules,
    but is slightly different in that submodules are also recursively generated.

commit 56d64f7
Author: David Meyer <dihm.meyer@gmail.com>
Date:   Mon Jul 12 09:25:18 2021 -0400

    Update sphinx pins to same versions as rest of suite.

commit e74472c
Merge: f0999f5 c543328
Author: Chris Billington <chrisjbillington@gmail.com>
Date:   Tue Feb 9 08:54:16 2021 +1100

    Merge pull request #76 from zakv/fix-27

    Work around infinite recursion in ToolPalette.resizeEvent()

commit c543328
Author: Zak V <zakven@mit.edu>
Date:   Mon Feb 8 13:30:47 2021 -0500

    Removed some python 2 debugging print statements from toolpalette.py.

commit d9eb80f
Author: Zak V <zakven@mit.edu>
Date:   Sun Feb 7 23:44:47 2021 -0500

    Fixed #27; worked around bug where ToolPalette.resizeEvent() could trigger itself in an infinite resursion loop.

commit f0999f5
Merge: 8371585 7d14b65
Author: Chris Billington <chrisjbillington@gmail.com>
Date:   Mon Jan 4 13:46:54 2021 +1100

    Merge pull request #75 from chrisjbillington/master

    Do not use deprecated set-env command

commit 7d14b65
Author: chrisjbillington <chrisjbillington@gmail.com>
Date:   Mon Jan 4 13:40:14 2021 +1100

    Do not use deprecated set-env command

commit b149e19
Author: Zak V <zakven@mit.edu>
Date:   Tue Nov 24 18:06:11 2020 -0500

    Fixed bug in unitconversions._All._import_all() where self.__all__ wasn't always changed from None to an empty list before appending things to it.
Loki27182 pushed a commit to Loki27182/labscript-utils that referenced this issue Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working minor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants