Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grc: Fix cairo assertion failure by not storing reference to context #6352

Merged
merged 2 commits into from Nov 17, 2022

Conversation

ryanvolz
Copy link
Contributor

@ryanvolz ryanvolz commented Nov 10, 2022

Description

The connection flowgraph element previously kept a reference to the cairo context passed to the draw function in order to be able to use cairo's in_stroke function to determine if the mouse cursor was on the path of the curved connection line. As it turns out, this is dangerous because GTK is constantly destroying and creating new cairo contexts and surfaces.

This avoids keeping a reference to the cairo context by initializing a local cairo context with the connection class for the sole purpose of storing the curved path and calculating in_stroke. The local context and surface can be very basic because not much is needed for in_stroke, and the context can persist and just have its path replaced when it needs to be updated.

On Windows, resizing the GRC window (and particularly the flowgraph canvas) to a larger size than its initial size would cause the underlying cairo surface to be destroyed and a new one created. The cairo context stored for the curved connection line had a reference to that destroyed surface, and closing that context (upon replacing it with a new one) would attempt to decrement the reference count on the surface. This would produce an assertion failure that crashed GRC stating Assertion failed: CAIRO_REFERENCE_COUNT_HAS_REFERENCE (&surface->ref_count).

On macOS with the Quartz backend, a related crash and assertion error would manifest when connecting blocks. I only figured out that these two were essentially the same bug when I saw that my fix to the Windows crash worked around the same piece of code as an older patch carried by MacPorts and, up until recently until it was dropped because of another bug, the conda package. Considering the other bug that that patch triggered with the conda-forge cairo (which has since been fixed), I think this approach is lighter weight and has less potential for future problems.

Related Issue

Fixes #2726.
Fixes #2938.
Fixes #5431.
Fixes #5734.

Which blocks/areas does this affect?

GRC

Testing Done

I found a reliable way to trigger #2938 on Windows, and with this fix I don't see any crashes. Selecting connection lines still works as well.

Checklist

The connection flowgraph element previously kept a reference to the
cairo context passed to the draw function in order to be able to use
cairo's `in_stroke` function to determine if the mouse cursor was on the
path of the curved connection line. As it turns out, this is dangerous
because GTK is constantly destroying and creating new cairo contexts
and surfaces.

This avoids keeping a reference to the cairo context by initializing a
local cairo context with the connection class for the sole purpose of
storing the curved path and calculating `in_stroke`. The local context
and surface can be very basic because not much is needed for
`in_stroke`, and the context can persist and just have its path replaced
when it needs to be updated.

On Windows, resizing the GRC window (and particularly the flowgraph
canvas) to a larger size than its initial size would cause the
underlying cairo surface to be destroyed and a new one created. The
cairo context stored for the curved connection line had a reference to
that destroyed surface, and closing that context (upon replacing it with
a new one) would attempt decrement the reference count on the surface.
This would produce an assertion failure that crashed GRC stating
`Assertion failed: CAIRO_REFERENCE_COUNT_HAS_REFERENCE
(&surface->ref_count)`.

On macOS with the Quartz backend, a related crash and assertion error
would manifest when connecting blocks.

Signed-off-by: Ryan Volz <ryan.volz@gmail.com>
All this did was request a larger size for the cairo surface of the
DrawingArea on which this is set, which very roughly worked around the
cairo context reference bug for users with high DPI displays. Now that
this bug is fixed by the previous commit, this is not necessary.

It's important to note that GTK takes care of the DPI scaling for high
DPI displays behind the scenes, so this doesn't even help to scale the
flowgraph properly. If there are further bugs with scaling, don't look
here.

Signed-off-by: Ryan Volz <ryan.volz@gmail.com>
@ryanvolz
Copy link
Contributor Author

I now also think that #5111 was a partial, circumstantial workaround for the same bug that this addresses, so I've added a commit to undo that one. Now that I understand better how cairo and GTK work, it's clear that it wasn't fixing any scaling issues and only had the effect of making the flowgraph canvas larger on high DPI displays, which just so happened to prevent triggering this bug for some of those users. That analysis agrees with user experience with actual scaling bugs like #5841, where having that commit or reversing it made no difference.

@mormj
Copy link
Contributor

mormj commented Nov 14, 2022

Anyone able to test this out on mac? (tagging names from #4174)
@geraman21
@NeuroForLunch
@jackdally
@sam-k
@mkierczak

@ahooper
Copy link

ahooper commented Nov 16, 2022

Thanks! This fixed my crash connecting blocks. #5431 GNU Radio Companion 3.10.4.0 MacOS 10.15.7

@willcode willcode added the Backport-3.10 Should be backported to 3.10 label Nov 17, 2022
@willcode
Copy link
Member

WRT get_scale_factor(), this doesn't affect anything for HiDPI? Since the scale factor isn't used anywhere else in GRC, it was affecting only the DrawingArea and not any of the content, so I'm pretty sure it was already useless ... just checking.

@RobertSwirsky
Copy link

Thank you! This bug has been plaguing me for years! I replaced the two python files on my Windows installation, and all is fine (and the fix makes sense, too.)

@ryanvolz
Copy link
Contributor Author

WRT get_scale_factor(), this doesn't affect anything for HiDPI? Since the scale factor isn't used anywhere else in GRC, it was affecting only the DrawingArea and not any of the content, so I'm pretty sure it was already useless ... just checking.

Right, it only changes the size of the DrawingArea and so is not actually helpful for HiDPI. Before understanding this and knowing that including the get_scale_factor() fixed problems for some people, I thought that it might be because the rest of the GUI was scaled on HiDPI systems but the DrawingArea wasn't because it worked in native pixels, but that's not true. Everything including the DrawingArea gets scaled properly, and since I don't have HiDPI I've tested this on my non-HiDPI system by setting GDK_SCALE=2 when running GRC and finding that everything does in fact double in size. By including the get_scale_factor stuff, the DrawingArea just gets doubled in size again with the side effect of making this bug harder to trigger. So now that this actually fixes the bug, the get_scale_factor is just undesirable.

@ryanvolz
Copy link
Contributor Author

We have confirmation from a few users now in the various bug reports that this fixes those issues (linked by the PR). I haven't heard back yet about a few of the other issues that are potentially related (e.g. GRC 1/4 window size on macOS), but the reports now cover everything I thought should be fixed.

Copy link
Contributor

@mormj mormj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the confirmation of fixes, let's go ahead and merge this one! Very nice!

@mormj mormj merged commit 28bbc39 into gnuradio:main Nov 17, 2022
@dkozel
Copy link
Contributor

dkozel commented Nov 18, 2022

This is huge. This could be the most impactful bugfix of the year as the combo of issues have been total blockers for huge groups of potential users. Ryan, thank you so much for tracking this down.

@cryptik
Copy link

cryptik commented Dec 4, 2022

I am running GnuRadio v 3.10.4.0 installed from home-brew on Ventura 13.0.1 and an M1 and getting the segfaults when using gnu-radio companion. It does not segfault right away... but at random times... even if just moving around the widget menu. How can I tell if this particular fix (#6352) is included in the version that I am running? Should I build from source?

@ryanvolz ryanvolz deleted the cairo_context_fix branch December 4, 2022 19:19
@ryanvolz
Copy link
Contributor Author

ryanvolz commented Dec 4, 2022

@cryptik This fix won't be in your 3.10.4.0 installation, but it's easy enough to modify it without even reinstalling. Just find where Homebrew has installed the particular Python files that this changes and edit them manually as in this PR.

@cryptik
Copy link

cryptik commented Dec 4, 2022

Hi @ryanvolz, thanks for the quick response. I was able to modify the files indicated in the PR. I am not sure if this is the correct place, but in my Homebrew install, it appears the python files are located here (/opt/homebrew/Cellar/gnuradio/3.10.4.0_1/lib/python3.10/site-packages/gnuradio/grc/gui). Hopefully there are not also copied somewhere else. After making the changes, GRC is hugely more stable. I do still get segfaults, but they are very infrequent now.

@cryptik
Copy link

cryptik commented Dec 4, 2022

I might have a different problem. It seems to still segfault whenever I try and load a .grc file from disk. It loads the file but as soon as I move the mouse, it segfaults. Running a .grc file seems to do the same thing, but on a more random basis. This Mac M1 has been more trouble than it's worth. I so wish I could return it.

cas-- added a commit to cas--/Deluge that referenced this pull request Aug 28, 2023
Windows users have reported Deluge crashes when resizing the window with
Piecesbar or Stats plugins enabled:

    Expression: CAIRO_REFERENCE_COUNT_HAS_REFERENCE(&surface->ref_count)

This is similar to issues fixed in GNU Radio which is a problem due to
storing the current cairo context which is then being destroyed and
recreated within GTK causing a reference count error

Fixes: https://dev.deluge-torrent.org/ticket/3339
Refs: gnuradio/gnuradio#6352
cas-- added a commit to cas--/Deluge that referenced this pull request Aug 28, 2023
Windows users have reported Deluge crashes when resizing the window with
Piecesbar or Stats plugins enabled:

    Expression: CAIRO_REFERENCE_COUNT_HAS_REFERENCE(&surface->ref_count)

This is similar to issues fixed in GNU Radio which is a problem due to
storing the current cairo context which is then being destroyed and
recreated within GTK causing a reference count error

Fixes: https://dev.deluge-torrent.org/ticket/3339
Refs: gnuradio/gnuradio#6352
cas-- added a commit to cas--/Deluge that referenced this pull request Aug 28, 2023
Windows users have reported Deluge crashes when resizing the window with
Piecesbar or Stats plugins enabled:

    Expression: CAIRO_REFERENCE_COUNT_HAS_REFERENCE(&surface->ref_count)

This is similar to issues fixed in GNU Radio which is a problem due to
storing the current cairo context which is then being destroyed and
recreated within GTK causing a reference count error

Fixes: https://dev.deluge-torrent.org/ticket/3339
Refs: gnuradio/gnuradio#6352
cas-- added a commit to cas--/Deluge that referenced this pull request Sep 18, 2023
Windows users have reported Deluge crashes when resizing the window with
Piecesbar or Stats plugins enabled:

    Expression: CAIRO_REFERENCE_COUNT_HAS_REFERENCE(&surface->ref_count)

This is similar to issues fixed in GNU Radio which is a problem due to
storing the current cairo context which is then being destroyed and
recreated within GTK causing a reference count error

Fixes: https://dev.deluge-torrent.org/ticket/3339
Refs: gnuradio/gnuradio#6352
cas-- added a commit to deluge-torrent/deluge that referenced this pull request Sep 18, 2023
Windows users have reported Deluge crashes when resizing the window with
Piecesbar or Stats plugins enabled:

    Expression: CAIRO_REFERENCE_COUNT_HAS_REFERENCE(&surface->ref_count)

This is similar to issues fixed in GNU Radio which is a problem due to
storing the current cairo context which is then being destroyed and
recreated within GTK causing a reference count error

Fixes: https://dev.deluge-torrent.org/ticket/3339
Refs: gnuradio/gnuradio#6352
Closes: #431
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment