Kiva segfaulting on an img_plot on OSX (not Windows) #315

Closed
jonathanrocher opened this issue Oct 12, 2016 · 11 comments

@jonathanrocher
Collaborator

jonathanrocher commented Oct 12, 2016

I am occasionally getting a segfault 11 of the form:

Process: Python [15499]
Path: /Users/USER/*/Python.app/Contents/MacOS/Python
Identifier: org.python.python
Version: 2.7.6 (2.7.6)
Code Type: X86-64 (Native)
Parent Process: bash [88535]
Responsible: Terminal [310]
User ID: 501

Date/Time: 2016-10-11 22:25:13.634 -0600
OS Version: Mac OS X 10.11.6 (15G1004)
Report Version: 11
Anonymous UUID: 6A5D759C-6553-8C85-EF5F-9A50C6539B5E

Sleep/Wake UUID: 09B3EE01-592B-4DDF-BAAC-04CE1BE086CD

Time Awake Since Boot: 830000 seconds
Time Since Wake: 11000 seconds

System Integrity Protection: enabled

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000200000794

VM Regions Near 0x200000794:
CoreUI image data 000000011f237000-000000011f298000 [ 388K] rw-/rwx SM=PRV
-->
STACK GUARD 0000700000000000-0000700000001000 [ 4K] ---/rwx SM=NUL stack guard for thread 1

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 _agg.so 0x000000011bcafef0 void kiva::graphics_context<agg24::pixfmt_alpha_blend_rgba<agg24::blender_rgba<agg24::rgba8, agg24::order_bgra>, agg24::row_ptr_cache, unsigned int> >::transform_image_interpolate<agg24::pixfmt_alpha_blend_rgba<agg24::blender_rgba<agg24::rgba8, agg24::order_rgba>, agg24::row_ptr_cache, unsigned int> >(kiva::graphics_context<agg24::pixfmt_alpha_blend_rgba<agg24::blender_rgba<agg24::rgba8, agg24::order_rgba>, agg24::row_ptr_cache, unsigned int> >&, agg24::trans_affine&) + 2000 (agg_pixfmt_rgba.h:1671)
1 _agg.so 0x000000011bcb8685 kiva::graphics_context<agg24::pixfmt_alpha_blend_rgba<agg24::blender_rgba<agg24::rgba8, agg24::order_bgra>, agg24::row_ptr_cache, unsigned int> >::transform_image(kiva::graphics_context_base*, agg24::trans_affine&) + 165 (kiva_graphics_context.h:1651)
2 ??? 0x00007fff5fbf4fa0 0 + 140734799761312
3 ??? 0x3fe0000000000000 0 + 4602678819172646912

when I run the following:

import numpy as np
from numpy import exp, linspace, meshgrid
import pandas as pd

from chaco.api import ArrayPlotData, Plot, jet
from enable.component_editor import ComponentEditor
from traits.api import HasTraits, Instance
from traitsui.api import Item, View


class ImagePlot(HasTraits):

    plot = Instance(Plot)

    data = Instance(pd.DataFrame)

    traits_view = View(Item('plot', editor=ComponentEditor(),
                            show_label=False),
                       width=500, height=500,
                       resizable=True,
                       title="Chaco Plot")

    def _plot_default(self):
        pivoted = self.data.pivot_table(index="x", columns="y", values="z")
        plotdata = ArrayPlotData(zdata=pivoted.values)
        # Create a Plot and associate it with the PlotData.
        plot = Plot(plotdata)
        # Create an image plot in the Plot
        plot.img_plot("zdata", colormap=jet)
        return plot


if __name__ == "__main__":
    data = pd.DataFrame({"x": [0, 1, 0, 1], "y": [0, 0, 1, 1],
                         "z": [0., 1., 2., 3.]})
    demo = ImagePlot(data=data)
    demo.configure_traits()

I am seeing this on OSX 10.11.6, with a Canopy environment with chaco 4.5.0-3 and enable 4.5.1-11 (numpy 1.10, and pandas 0.18.0-8). I am seeing this segfault with another environment with same chaco, enable 4.6dev, numpy 1.9.2-3 and pandas 0.16.2-2 too.

Additionally, even just trying to plot a trivial np.ones array leads to a completely white plot, even when it doesn't segfault. The same code runs as expected on Windows (same versions of all packages).

Any idea what is going on?
cc @cfarrow @jwiggins

@cfarrow
Member

cfarrow commented Oct 12, 2016

I ran this with the first combination (assuming numpy 1.10.4) and cannot reproduce it.

@jwiggins
Member

Hey @jonathanrocher 😃

From what I can read in the stack trace, it appears to have segfaulted when drawing an image. Specifically, in the pixel access parts of the code, which has lots of pointer arithmetic. It also has lots of inlined functions, so an accurate stack trace is not too likely.

If it is a normal bug (and there is never a guarantee that it is), then it kinda looks like kiva is trying to draw from something which isn't there. I'm not quite sure how that might happen, but there it is.

@jonathanrocher
Collaborator Author

Thanks @jwiggins for commenting! Not sure what to think of that... I am reproducing the segfault consistently. Is that what you mean by the bug is normal?

@jwiggins
Member

I suppose I should clarify, huh?

What I meant by a "normal" bug was that this was some usual scenario like use after free, or passing an invalid object pointer. Bugs like that tend to be easier to reproduce (including on more than one system).

An "unusual" bug would be caused by some other code misbehaving and corrupting the memory used by AGG, thus causing the image pixel access to run off into the weeds. Looking at the address involved, that might actually be a possibility. The KERN_INVALID_ADDRESS at 0x0000000200000794 means that it tried to access memory address 0x0200000794, which was not mapped into the process VM map. That address looks suspicious. It's quite a nice round number (0x0200000000) plus an offset of 0x794. You could maybe get into this situation by writing data to the memory where your image buffer's C++ object lives. Then when you tell it to draw, it reads from some address in la-la land. You could try changing the constants in your sample code (that would be the values in the data frame) and see if the segfault address moves around. Unless that address is different every time; then I'm not sure what to suggest.
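
To make the "round number plus offset" observation concrete, here is a minimal Python sketch that splits the faulting address from the crash report into its upper and lower 32 bits. The variable names are illustrative only, and the split at 32 bits is just one convenient way to look at the value, not something the crash report itself states.

```python
# Faulting address copied from the "Exception Codes" line of the crash report.
fault_address = 0x0000000200000794

# Split the 64-bit value into its upper and lower 32-bit halves.
base = fault_address & ~0xFFFFFFFF    # low 32 bits cleared
offset = fault_address & 0xFFFFFFFF   # low 32 bits only
high_word = fault_address >> 32       # upper 32 bits as a small integer

print(hex(base), hex(offset), high_word)
```

If the base stays the same while the offset changes when the data frame values change, that would be consistent with the corruption theory above, though it would not prove it.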

And since this is AGG, yet another "unusual" bug might be caused by C++ undefined behavior that was different when the code was written and has recently changed with new compilers. The AGG backend is certainly guilty of that on Linux already (see enthought/enable#97).

@cfarrow
Member

cfarrow commented Oct 14, 2016

FWIW, I've tried various combinations of dependent packages, thinking that there could be an ABI mismatch somewhere, and still cannot induce a segfault.

@jonathanrocher
Collaborator Author

jonathanrocher commented Oct 14, 2016

I just found that the issue indeed has nothing to do with chaco or kiva. I was able to reproduce it on the OSX machine of a colleague whose setup is similar to mine, namely that the active gcc was MacPorts' mp-gcc5. After deactivating that gcc, the segfaults go away. Before I close the ticket, can anyone explain to me why that should change anything? I am assuming that all the libraries chaco uses are already compiled. Why would the active compiler have any impact?

Will close the ticket soon...

@jvkersch
Contributor

@jonathanrocher Do you have more specifics as to how mp-gcc5 was the active gcc? Was it set via the CC environment variable, or found earlier in the PATH, ... ?

@jonathanrocher
Collaborator Author

I used port select gcc ??. Not sure how port does this...

@jwiggins
Member

Ahh, gcc... This sounds a lot like enthought/enable#97 now.

@jvkersch
Contributor

In addition to John's answer (and to venture a guess as to why setting the compiler triggers the segfault even when there's nothing being compiled), I suspect that somehow setting the active compiler also affects where the loader looks for libraries and that there is some ABI incompatibility with the libraries that it finds.

@jonathanrocher If you want to explore further, it might be worth setting DYLD_PRINT_LIBRARIES in your environment, running the example with and without the non-standard gcc, and comparing the libraries being loaded. Perhaps that could point towards the likely culprit...
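
A rough sketch of how that comparison could be done in Python 3, assuming the dyld output of each run has been captured to text (for example by redirecting stderr of a run with DYLD_PRINT_LIBRARIES=1 set). The helper names are hypothetical, the exact line format printed by dyld varies between OS versions (so the parser only assumes each line starts with "dyld" and contains an absolute path), and the sample paths are made up for illustration rather than taken from a real run.

```python
def parse_loaded_libraries(dyld_output):
    """Collect library paths from DYLD_PRINT_LIBRARIES-style output.

    Assumption: relevant lines start with "dyld" and contain an absolute
    path; everything from the first "/" to the end of the line is the path.
    """
    libraries = set()
    for line in dyld_output.splitlines():
        if not line.startswith("dyld"):
            continue
        path_start = line.find("/")
        if path_start != -1:
            libraries.add(line[path_start:].strip())
    return libraries


def compare_runs(libs_a, libs_b):
    """Return the libraries seen only in run A and only in run B."""
    return sorted(libs_a - libs_b), sorted(libs_b - libs_a)


# Hypothetical captured output from two runs (paths are illustrative only).
with_gcc = (
    "dyld: loaded: /usr/lib/libSystem.B.dylib\n"
    "dyld: loaded: /opt/local/lib/libgcc/libstdc++.6.dylib\n"
)
without_gcc = (
    "dyld: loaded: /usr/lib/libSystem.B.dylib\n"
    "dyld: loaded: /usr/lib/libstdc++.6.dylib\n"
)

only_with, only_without = compare_runs(
    parse_loaded_libraries(with_gcc),
    parse_loaded_libraries(without_gcc),
)
print("only with non-standard gcc:", only_with)
print("only without it:", only_without)
```

Any library that shows up from a different location in only one of the two runs would be a candidate for the ABI mismatch suspected above.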

@jonathanrocher
Collaborator Author

Yes, that's it @jwiggins! Thanks @jvkersch, your guess makes sense. Thanks for the tip, I will give DYLD_PRINT_LIBRARIES a try to better understand what is going on. Closing this...
