[fltk 1.4.x/Linux] application invoking native filechooser crashes under linux if fltk built with --enable-localpng #232
If building FLTK without
..which forces the native file chooser to use FLTK's own file chooser instead of GTK's. |
Unfortunately this is a known issue and can only be avoided properly by not using the bundled image libs (jpeg, png) and zlib. The reason is that some system libs are linked to these shared libs anyway (unless you build "everything" yourself) and you will end up with an ambiguity (at least) to resolve a particular symbol from one of these libs (FLTK or system lib) and maybe the linker will pick the wrong one.
As you can see, if you're using Note that |
PS: I'm not aware of another open STR or issue about this library linking problem. There's however STR 3409 "libpng" that mentions a related, similar problem with finding and linking libpng. |
Think it's worth mentioning in the docs for FNFC as another caveat? |
I'd appreciate that. |
Crap!!! fluid has this problem too, and I just lost all my work. I was using fluid to make a new project (fluid somefile.fl), and spent about 1/2 hour building the GUI, and when I hit ^S to save the unnamed project, it freaking /crashed/ and lost everything I'd built. Argh. I guess fluid by default uses native chooser and because I'd built with FLTK's png lib, it blew up. This is really really bad. I'm thinking we need a way to probe GTK to see if its PNG version is different than the one we've built with, or something, and cause FNFC to fall back to the FLTK browser. Either that, or we better configure fluid with the above Fl::option() workaround until we have a fix, as this is gonna affect random users. |
I feel for you... it's always bad to lose one's work. As a workaround: I noticed that the native file chooser saves the 'preview' check button state between invocations (even after closing the application). I'm wondering if the file chooser crashes too if you disable the preview. This would, of course, require using an FLTK app that doesn't crash (built with system libs), or maybe it would suffice to open the file chooser in a dir w/o image files. I know it's only a workaround, but if it helps...
I don't think this is possible. Please keep in mind that using Xft already introduces this issue by linking with the system libpng implicitly, so it's not only the GTK file chooser that has issues. Just for curiosity: why are you building your app with the FLTK libpng in the first place? If it's for better compatibility with different systems (aka portability) you should maybe reconsider this, at least on Linux systems. But maybe you have reasons I don't see? |
FTR: there has been a proposal to use "prefixing" [1] (or something like that) to rename all lib{png|jpeg|z} symbols (function names etc.) for internal use. This would - if done correctly - essentially make the bundled libs totally different than the system libs. This would avoid name clashes like this but would mean that app code would also have to use the prefixed function names if the app wants to use the (bundled) libs. I tried to search for an STR or issue but couldn't find one (which doesn't mean that it doesn't exist). [EDIT:] STR 3514 "Bundled image libs on Linux are incompatible with Xft". See also this comment below. Generally I don't like this approach because it obfuscates other issues, but if it helps solving this issue it might be a potential solution. [1] Prefixing works by defining other names like ' |
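A minimal sketch of the macro-based prefixing idea mentioned above, under stated assumptions: `demo_decode`, the `fltk_` prefix, and the helper names are hypothetical and are not taken from libpng, libjpeg, zlib, or FLTK. It only illustrates how a `#define` seen before the library's declarations changes the symbol the linker ends up with.

```cpp
#include <string>

// Hypothetical "prefix header": it must be seen before the library's own
// declarations so that every mention of the name is rewritten.
#define demo_decode fltk_demo_decode

// "Library" code written against the unprefixed name. After preprocessing,
// the function that gets defined is fltk_demo_decode, so it can no longer
// clash with a demo_decode exported by a system shared library.
extern "C" int demo_decode(int x) { return x * 2; }

// Helpers that show what the preprocessor turned the name into.
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

// Application code keeps calling demo_decode(); the macro redirects it.
inline int call_decoder(int x) { return demo_decode(x); }

// Returns the name that is actually emitted for demo_decode.
inline std::string emitted_symbol_name() { return DEMO_STR(demo_decode); }
```

Calling code does not change, but only as long as it includes the prefix header first. That is also the weakness raised in this thread: the macro rewrites every occurrence of the name, wanted or not.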
My main concern is others losing their work too. Perhaps linking in FLTK's image libs should automatically disable the GTK stuff..? I'm not sure what the right solution is, but we can't leave a time bomb where a runtime config could cause FLTK widgets to coredump.. esp. a file chooser which will crash just before a save.
For the usual reasons; to statically link known compatible libs to avoid missing .so/.dll errors at runtime, and/or subtle incompatibilities with runtime DLLs (such as shown here).
Yes, I'd thought of that too, so that app code can access the static FLTK libs directly, and not the DLLs GTK uses. I suppose some tricks could be offered via inline wrapper methods so that if the actual image function is called Might need an encompassing namespace or class for the wrappers to prevent linker collisions with the OS image libs. I suppose the timing would be good for 1.4.x where we can break API a bit by forcing folks who want to access the FLTK image libs to include a namespace, or maybe a static class, e.g. Fl_PNG_Lib::png_something() or some such. I don't know all the tricks. namespace might be the best solution, as then the user could just add one line at the top of their source code to get things to build again.
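A rough sketch of the inline-wrapper idea, assuming the bundled functions have already been renamed. `fltk_png_demo_version` and `png_demo_version` are hypothetical stand-ins, not real libpng calls; `Fl_PNG_Lib` is just the name floated above, not an existing FLTK namespace.

```cpp
// Stand-in for a bundled library function that has been renamed
// (hypothetical name), so it cannot collide with the system library.
// The return value is an arbitrary placeholder.
extern "C" int fltk_png_demo_version() { return 10637; }

// Wrapper namespace: exposes the familiar, unprefixed name to application
// code without putting that name into the global (linker-visible) scope.
namespace Fl_PNG_Lib {
inline int png_demo_version() { return fltk_png_demo_version(); }
}  // namespace Fl_PNG_Lib

// Application code: one using-directive restores the old spelling.
inline int app_code() {
  using namespace Fl_PNG_Lib;
  return png_demo_version();
}
```

One caveat of this design: if the system header declaring the same unprefixed name is also included, the using-directive makes the call ambiguous, so user code would have to pick one or the other.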
Right, though I think we've already seen how macro solutions are bad for stuff like this, as macros expand /everywhere/ (recall the problems |
[I heavily edited the above msg, so be sure to read it on github, and not in email] |
I'm aware of the problems with macros and that's why I wrote that I don't like that solution. I only mentioned it because ISTR that there was a request to use it for building FLTK and some of the bundled libs offer it. For instance: zlib offers I agree that a namespace like However, what would we do with the FLTK code to be able to use either the system libs or the bundled libs? Would we just use the namespace by adding Anyway, this is an interesting idea which seems to be worth investigating ... |
I figure the bundled code's function names have to all be changed, and the wrapper code would be turned on or off based on the config settings. If turned off at config, any #include for the wrapper .h file would trigger a compile time #error or some such I imagine.
[EDIT] We might need to provide a way for user code to detect which it's getting, just for debugging purposes, e.g. #ifdef FL_PNG_LIB_BUNDLED or FL_PNG_LIB_LOCAL, or whatever terminology we settle on. I think already one person was confused as to what "local" meant for the config flags. Maybe "system" vs "bundled", or some such.. |
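A small sketch of what such compile-time detection could look like. The macro names `FL_PNG_LIB_BUNDLED` and `FL_PNG_LIB_SYSTEM` are hypothetical (the terminology was still open at this point in the discussion), and the `#define` below stands in for whatever a generated config header would provide.

```cpp
#include <string>

// Hypothetical: a generated config header would define exactly one of
// FL_PNG_LIB_BUNDLED or FL_PNG_LIB_SYSTEM. It is defined here by hand only
// so the sketch is self-contained.
#define FL_PNG_LIB_BUNDLED 1

// Fail the build early if the configuration is contradictory.
#if defined(FL_PNG_LIB_BUNDLED) && defined(FL_PNG_LIB_SYSTEM)
#  error "PNG library configured as both bundled and system"
#endif

// Reports which PNG library this translation unit was compiled against,
// e.g. for debug output or bug reports.
inline std::string png_lib_origin() {
#if defined(FL_PNG_LIB_BUNDLED)
  return "bundled";
#elif defined(FL_PNG_LIB_SYSTEM)
  return "system";
#else
  return "none";
#endif
}
```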
If I read this correctly [EDIT: you say that] we'd have to change the source code of the bundled libs, something I'd really want to avoid. Adding a namespace would help, but that needs C++ whereas the libs are all pure C code. I tried to compile the libs with a C++ compiler but that failed for several reasons. I imagine an approach like this:
That's only a proof of concept, details need to be worked out. The problem with the bundled libs is that we really need to change the "names" which could ideally be done with such a namespace trick without editing the source code, but unfortunately I couldn't find a similar way to do it for zlib. Still thinking about a good way, but for today I had to give up. FYI: The reason why I absolutely don't want to "edit" the source code of the bundled libs is that it would make the maintenance (upgrading) really hard. And I'm the one who did this in the past, I know what I'm talking about. It's pretty straightforward now (with a little diff magic) but I'm afraid it would be much harder if we had to edit the source code. |
Hmm, could we define a namespace around the entire bundled PNG/JPEG/etc image lib functions, thereby renaming them from the linker's point of view? That might save us the trouble of renaming the functions themselves.. |
That's what I tried and it didn't work. The C code in zlib is ... let's say complicated and strange [1]. My simple approach to do this with only one C file failed miserably with lots of hard to understand error codes. I didn't save any of these, but that wouldn't be helpful anyway. But generally that's the idea I tried to follow... [1] Example code (from gzread.c):
This code doesn't compile with g++. I know this is an ancient C standard, but why does this not compile with a C++ compiler? Was this way to declare arguments dropped? And why are they using it in zlib? Maybe we can find a way, but as I wrote before, for today I gave up. Sometimes sleeping a night and starting over again with a fresh mind helps... |
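On the question above: K&R-style (identifier-list) function definitions are a legacy C form that C++ has never accepted, which is why such code is rejected by g++. The sketch below uses a hypothetical function `sum_bytes`, not the gzread.c code, to contrast the two forms. The old form appears only as a comment because it cannot be compiled as C++.

```cpp
// K&R (pre-ANSI) C definition, shown only as a comment because no C++
// compiler accepts it:
//
//   unsigned sum_bytes(buf, len)
//       unsigned char *buf;
//       unsigned len;
//   { ... }
//
// Prototype-style definition of the same (hypothetical) function. This form
// is valid in both C and C++.
extern "C" unsigned sum_bytes(const unsigned char *buf, unsigned len) {
  unsigned total = 0;
  for (unsigned i = 0; i < len; ++i) {
    total += buf[i];  // accumulate each byte value
  }
  return total;
}
```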
PS: if you have an idea ... |
Yikes, I was afraid there might be some old C style code in PNG/JPEG/etc. I don't think C++ allows k&r style parameters /at all/. Doing some research, there's tools like I'm afraid that's all I've got offhand, as I'm in the middle of coding in production again, and can't split my time at the moment. I just did a quick google search for "resolving global symbol conflicts in libraries" and found these: Sounds like we could do something like Not surprisingly, seems other tools have been down this path, e.g. DynamoRIO/dynamorio#3348
On a whim, looking at our png library, it /seems/ like they may provide some mechanisms for global/extern symbol name prefixing. I just grepped the .h files for 'prefix' and found a few hits. Might be worth researching, not sure if any of it is relevant or not. This one comment caught my eye:
There was also something called PNG_PREFIX in png.h, but again didn't read into it. |
Short answer:
I'll investigate further... [Edit: forgot to post this comment for hours but eventually found it and sent it.] |
Meanwhile I found (my own) STR 3514 "Bundled image libs on Linux are incompatible with Xft". This is about the same issue I mentioned above (shared Xft lib linking with libpng and zlib). The discussion links to a thread in fltk.general with the title "Fl_Native_File_Chooser() crash with FLTK 1.4" which sounds familiar. An interesting comment is "I'd suggest to ... use a unique symbol prefixing for extra insurance." So, this is all for today, but this might be a good starting point for "symbol prefixing" ... |
FTR: I tried FLTK 1.4 with all three bundled image libs (CMake, debug and release builds + configure/make). Neither fluid nor native-filechooser crash in my environment, but they display a warning and some icons (folder + preview checkmark) are missing. Despite this, the image preview works. Error messages:
I'm guessing your GTK was built with a similar PNG library that FLTK has, and therefore isn't crashing. On my system, the system PNG libs seem to be:
..and the one inside FLTK is: libpng version 1.6.37 - April 14, 2019 I'm not sure which one of the above GTK was built against, but I'm guessing it's the 15 (1.5?) version, based on this:
I'm on Sci Linux 7, which is old because they stopped continuing releases, so until I upgrade to a newer workstation (dread), I'm stuck in 2018, lol. But a good test considering many customers are running older OS's. |
I'm not sure if the crash in my case was due to the failed assertion test in GTK (the last error before it crashed), or it went on ahead past the assert() and crashed because it walked off memory down a NULL pointer or some such. Either way, sounds like the difference between png 1.5.x and 1.6.x is enough to be crash worthy. To be expected I suppose if, like fltk, they bump the minor release number on ABI breakage. |
What does And yes, I'm pretty sure that 'png15' is 'png 1.5.x'. My system has png16 which is at least the same ABI as we're using. |
Result is:
Yes -- I looked in our fltk driver code, looks like it uses Due to that, ldd won't show anything about GTK from fluid because it's not linked, but rather conditionally `dl_open()'ed at runtime by the native file chooser's call to our drivers. |
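To illustrate why `ldd` shows nothing here, a generic sketch of runtime loading with POSIX `dlopen()`. This is not FLTK's driver code; `can_load_at_runtime` is a hypothetical helper, and the library name is whatever the caller passes in.

```cpp
#include <dlfcn.h>

// Returns true if the shared library `soname` can be loaded at runtime.
// A library opened this way is not a link-time dependency of the
// executable, so `ldd` on the executable does not list it.
inline bool can_load_at_runtime(const char *soname) {
  void *handle = dlopen(soname, RTLD_LAZY | RTLD_LOCAL);
  if (!handle) {
    return false;  // not available: the caller can fall back to something else
  }
  dlclose(handle);  // this sketch only probes, it does not keep the library
  return true;
}
```

A caller might probe with something like `can_load_at_runtime("libgtk-3.so.0")` and fall back to a built-in dialog when it returns false. On older glibc versions the program has to be linked with `-ldl`.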
[Damn, it happened again. This message sat in the editor and I forgot to post it.]
FTR: this is not true for FLTK applications because we link The [newly added text] Looks like you are using libpng 1.5 then. I'll try one of my docker images with older systems if I can reproduce this ... |
I built a docker image with Scientific Linux and I can now reproduce the crash with this docker image:
FWIW: My docker build script 'Dockerfile' (rename to Nice "side effect": the Dockerfile is a short "HowTo install FLTK on Sci Linux" -- of course not in the same '/fltk' folder and not as root -- but it lists all required packages (but not some optional ones). |
That's great -- I assume our gitlab (or whoever it is now) commit hook build logs must show something similar to your docker build script. I recall seeing (I think?) docker related commands at the head of the build logs, can't remember, too lazy to look, lol.. I'm just waking up and need coffee. |
The docker script I posted is used to create a docker image using the The CI builds on GitHub and GitLab are similar pre-configured docker containers which we're using to build the software, mostly based on Ubuntu IIRC, but also macOS and Windows. Installing necessary packages is common to both approaches, hence it's very similar. My intention to use docker in the first place (some weeks ago) was to be able to use different Linux distros and try to build FLTK so I can find out which package installers to use and which packages need to be installed. The goal is to help users and/or document builds on different systems. You can use even much older Linux systems for testing. In this case I just wanted to be able to reproduce the error with "your" My next step will be testing symbol prefixing. Tomorrow. Hopefully. |
FTR: I uploaded an experimental branch to my FLTK fork which adds symbol prefixing to libpng. Other libs (zlib + jpeg) still need to be done, and this code has only been tested on Linux with the FLTK test and demo programs. I looked at the png lib with I'll try to test if this alone fixes the issue and I'll let you know what I found out. If you have a little spare time, feel free to test yourself and report if it helps. TIA. |
Update: I (force-)pushed another commit to my branch |
FTR: I updated zlib as well in my branch prefix-bundled-libs. This needs some more testing, I only tested on Linux so far. I'm not sure if we need to make it optional or if we need to change configure/CMake in some way, maybe requiring bundled zlib if png is using the bundled lib (because it depends on zlib) or anything else ... @erco77 Greg, please test this branch in your environment, if possible. TIA. |
I took a trip in June which bulk-erased this issue from my memory. I don't think any of my gui apps calls the image lib functions directly (libpng/libjpeg/etc), so I probably won't run into trouble with that. But if I did, I guess I'd need to tweak my code to use whatever prefix you ended up using. (fltk_xxx I'm guessing?) One of my apps does use zlib for doing network packet compression, but that's non-gui (daemon) code, entirely a separate executable that doesn't involve FLTK, so that shouldn't be an issue I expect. |
Actually it's three slightly different techniques depending on what the three libraries offer but it's all transparent to the calling code if (and only if) you're including the "correct" header files. This should however be done automatically (driven by the build system as usual).
No, that's the trick, you don't need to change your code because it's all hidden in the header files. You can even copy the three library dir's from my branch to your working copy and recompile ( Thanks in advance for testing. |
Working on it now.. |
BTW: I built and ran it today on macOS w/o issues. |
I can confirm your fix by doing these steps on an Ubuntu 20.x machine:
I did have to add the -no-pie flag to both compile+link to prevent some weird errors about relocation. Doing that with your branch, I could run native-filechooser and browse files without a crash. Doing the same steps with the current version of FLTK 1.4.x would immediately crash with the following error when I hit the "Pick File" button:
So a tentative thumbs up from me. |
Thanks for your test and the report. However, I'm wondering...
I'm just puzzled. I know that I could reproduce the issue (i.e. error with old master and no error with my branch) on my Sci Linux (Docker) system, hence I was pretty sure it worked anyway but I'm a little surprised by your report. I believe we can merge the branch and you don't need to test on an old Sci Linux system. It would have been nice to have but if you don't need to (re)build such an old system anyway then don't bother. I'm now more interested that the branch works on any system w/o issues and I think this is now clear. My take on it is to merge it tomorrow. Any concerns? |
Clarification: on my system the master branch doesn't crash when running So yes, the error manifests on Linux Mint 20 as well although it doesn't crash the application. |
I did step 3 because it's what I do in my commercial builds. It's a practice I've done for a looong time, so perhaps it's dated. But it ensures no .so files accidentally picked up (by whatever means, e.g. in case both are present in the lib dir), and being explicit ensures the build fails if the .a files are not present. Well, then it's interesting I get the crash with 1.4.x and not your branch; perhaps it's crashing in 1.4.x for another reason then. In case you can replicate, here's the compile/link commands I used for both 1.4.x and your branch.. just a modified version of the default compile/link commands.
Notes for the above; starting with the default fltk build commands, I then:
BTW, I just cloned fltk-1.4.x current in case mine was mid-development or something. Redid the experiment, same results: crash in newly cloned 1.4.x with the above build commands, no crash with those build commands with your branch. Curious if you can replicate the crash in one and not the other. |
FWIW, here's the gdb output of the crash: 1.4.x-current, configured with local png/jpeg/zlib + debugging enabled: gdb-1.4.x-crash.txt There were quite a few lines I removed that had ??'s in place of function names in the gtk part of the calling hierarchy, which is why the stack numbers have gaps. |
I can replicate the error (
(that's all, nothing else) but the only noticeable effect in the running program is the missing check marks. I suspect these check marks (white on green background) are the icons "from icon theme" (see error message). The For me it doesn't matter if the application really crashes or not (this may be caused by internal differences). It exhibits an error which it does not if built with the new branch. And it's clear that it has to do with the incompatibilities of the image libraries. Winfried also confirmed in #289 that symbol prefixing (the new branch) fixes the crash in his environment. Good to go? |
Thanks for the stack dump, but it doesn't really help, does it? It shows what we knew already: there's an |
Sounds good to me, go. Yes, the assert that crashes might be due to the presence of gtk development libs, though it's interesting that doesn't get triggered with your branch. That's a scary assert; it's the kind of thing people would run into right when they're saving their work - ugh. At some point I'll try to rebuild my commercial branch on Sci Linux 7, and test with your mods, because I kinda have to for consistency for my build requirements. I have that OS installed on an external USB drive, so I should be able to bring it up on a similar hardware. If I ever run into trouble, I'll open a new issue later. |
OK, I'll merge later today.
Why is this "interesting"? That's the purpose of the modifications, it doesn't happen because "we" are using "our" image libs (with different, prefixed function names) and the GTK libs can now use the system (shared) libs as required.
Please let me know if it works. |
Prefix bundled libs. This fixes issues #232 and #289 and STR 3514 (https://www.fltk.org/str.php?L3514). Parts of this fix are based on the work of GitHub user @darealshinji who provided instructions to create the jpeg header file with prefixes in STR 3347 (https://www.fltk.org/str.php?L3347). Thanks.
Merged in commit 796a9bf. |
Replication with fltk.1.4.x + Linux (SciLinux 7 in my case):
./configure --enable-localpng
make
I assume this is due to a difference in GTK's version of PNG and FLTK's.
Works fine if --enable-localpng is not used. Apparently the GTK file browser doesn't like to be linked with FLTK's PNG library.
I'm not sure if this is avoidable, but it's very deadly for the app.