Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fltk 1.4.x/Linux] application invoking native filechooser crashes under linux if fltk built with --enable-localpng #232

Closed
erco77 opened this issue May 19, 2021 · 49 comments · Fixed by #292
Assignees
Labels
active Somebody is working on it bug Something isn't working fixed The issue or PR was fixed. Platform: X11 platform specific (Unix/X11)) waiting for confirmation waiting for someone's confirmation

Comments

@erco77
Copy link
Contributor

erco77 commented May 19, 2021

Replication with fltk.1.4.x + Linux (SciLinux 7 in my case):

  1. ./configure --enable-localpng
  2. make
  3. `test/native-filechooser
  4. Click "Pick File"
  5. Application core dumps with these errors:
(native-filechooser:15756): GdkPixbuf-WARNING **: 13:34:20.782: Bug! loader 'png' didn't set an error on failure
(native-filechooser:15756): Gtk-WARNING **: 13:34:20.782: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
(native-filechooser:15756): GdkPixbuf-WARNING **: 13:34:20.782: Bug! loader 'png' didn't set an error on failure
**
Gtk:ERROR:gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): Failed to load /usr/share/icons/Adwaita/16x16/status/image-missing.png: Internal error: Image loader module ?png? failed to complete an operation, but didn?t give a reason for the failure (gdk-pixbuf-error-quark, 5)
Abort (core dumped)

I assume this is due to a difference in GTK's version of PNG and FLTK's.

Works fine if --enable-localpng is not used.

Apparently the GTK file browser doesn't like to be linked with FLTK's PNG library.

I'm not sure if this is avoidable, but it's very deadly for the app.

@erco77
Copy link
Contributor Author

erco77 commented May 19, 2021

If building FLTK without --enable-localpng is not an option, a workaround is to set this in your app:

Fl::option(Fl::OPTION_FNFC_USES_GTK, false);

..which forces the native file chooser to use FLTK's own file chooser instead of GTK's.
I'm not sure if there's a better solution, but for now that gets me through QC for my app.

@Albrecht-S
Copy link
Member

Albrecht-S commented May 20, 2021

Unfortunately this is a known issue and can only be avoided properly by not using the bundled image libs (jpeg, png) and zlib. The reason is that some system libs are linked to these shared libs anyway (unless you build "everything" yourself) and you will end up with an ambiguity (at least) to resolve a particular symbol from one of these libs (FLTK or system lib) and maybe the linker will pick the wrong one.
Example (Linux Mint 20, based on Ubuntu 20.04 LTS):

$ ldd /usr/lib/x86_64-linux-gnu/libXft.so.2 | egrep 'png|libz'
	libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007fbc0737c000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbc07360000)

As you can see, if you're using xft you're already running into that specific conflict with libpng and libz. Since we link GTK dynamically this is even more problematic.

Note that Pango pulls in Cairo and Cairo depends on several other shared libs as well...

@Albrecht-S
Copy link
Member

PS: I'm not aware of another open STR or issue about this library linking problem. There's however STR 3409 "libpng" that mentions a related, similar problem with finding and linking libpng.

@erco77
Copy link
Contributor Author

erco77 commented May 20, 2021

Think it's worth mentioning in the docs for FNFC as another caveat?
We already have a few under "Platform Specific Caveats"
I can write it, and reference this issue.

@Albrecht-S
Copy link
Member

I can write it, and reference this issue.

I'd appreciate that.

@erco77
Copy link
Contributor Author

erco77 commented May 24, 2021

Crap!!!

fluid has this problem too, and I just lost all my work.

I was using fluid to make a new project (fluid somefile.fl), and spent about 1/2 hour building the GUI, and when I hit ^S to save the unnamed project, it freaking /crashed/ and lost everything I'd built. Argh. I guess fluid by default uses native chooser and because I'd built with FLTK's png lib, it blew up.

This is really really bad.

I'm thinking we need a way to probe GTK to see if its PNG version is different than the one we've built with, or something, and cause FNFC to fall back to the FLTK browser.

Either that, or we better configure fluid with the above Fl::option() workaround until we have a fix, as this is gonna affect random users.

@Albrecht-S
Copy link
Member

I feel with you... it's always bad to lose one's work.

As a workaround: I noticed that the native file chooser saves the 'preview' check button state between invocations (even after closing the application). I'm wondering if the file chooser crashes too if you disable the preview. This would, of course, require to use an fltk app that doesn't crash (built with system libs) or maybe it would suffice to open the file chooser in a dir w/o image files. I know it's only a workaround, but if it helps...

I'm thinking we need a way to probe GTK to see if its PNG version is different than the one we've built with, ...

I don't think this is possible. Please keep in mind that using Xft already introduces this issue by linking with the system libpng implicitly, so it's not only the GTK file chooser that has issues.

Just for curiosity: why are you building your app with the FLTK libpng in the first place? If it's for better compatibility with different systems (aka portability) you should maybe reconsider this, at least on Linux systems. But maybe you have reasons I don't see?

@Albrecht-S
Copy link
Member

Albrecht-S commented May 25, 2021

FTR: there has been a proposal to use "prefixing" [1] (or something like that) to rename all lib{png|jpeg|z} symbols (function names etc.) for internal use. This would - if done correctly - essentially make the bundled libs totally different than the system libs. This would avoid name clashes like this but would mean that app code would also have to use the prefixed function names if the app wants to use the (bundled) libs.

I tried to search for an STR or issue but couldn't find one (which doesn't mean that it doesn't exist).

[EDIT:] STR 3514 "Bundled image libs on Linux are incompatible with Xft". See also this comment below.

Generally I don't like this approach because it obfuscates other issues, but if it helps solving this issue it might be a potential solution.

[1] Prefixing works by defining other names like '#define png_something fl_png_something' for all functions and global data of the lib. Some of the bundled libs are prepared for this but not all IIRC.

@erco77
Copy link
Contributor Author

erco77 commented May 25, 2021

I feel with you... it's always bad to lose one's work.

My main concern is others loosing their work too.
FLTK has never lost info before, so this is a scar that takes a while to heal.. I think others would feel the same if we don't prevent this.

Perhaps linking in FLTK's image libs should automatically disable the GTK stuff..?

I'm not sure what the right solution is, but we can't leave a time bomb where a runtime config could cause FLTK widgets to coredump.. esp. a file chooser which will crash just before a save.

Just for curiosity: why are you building your app with the FLTK libpng in the first place?

For the usual reasons; to statically link known compatible libs to avoid missing .so/.dll errors at runtime, and/or subtle incompatibilities with runtime DLLs (such as shown here).

FTR: there has been a proposal to use "prefixing" [1] (or something like that) to rename all
lib{png|jpeg|z} symbols (function names etc.) for internal use.

Yes, I'd thought of that too, so that app code can access the static FLTK libs directly, and not the DLLs GTK uses. I suppose some tricks could be offered via inline wrapper methods so that if the actual image function is called png_something(), and our FLTK version is renamed fl_png_something(), one of our #include's could implement inline int png_something() { return fl_png_something(); } or some such.

Might need an encompassing namespace or class for the wrappers to prevent linker collisions with the OS image libs.

I suppose the timing would be good for 1.4.x where we can break API a bit by forcing folks who want to access the FLTK image libs to include a namespace, or maybe a static class, e.g. Fl_PNG_Lib::png_something() or some such. I don't know all the tricks. namespace might be the best solution, as then the user could just add one line at the top of their source code to get things to build again.

Prefixing works by defining other names like '#define png_something fl_png_something'

Right, though I think we've already seen how macro solutions are bad for stuff like this, as macros expand /everywhere/ (recall the problems #define open(a,b) _open(..) caused on windows, where the macro also affected Fl_SomeWidget::open(..) calls), whereas inline and namespace would at least abide class contexts and such.

@erco77
Copy link
Contributor Author

erco77 commented May 25, 2021

[I heavily edited the above msg, so be sure to read it on github, and not in email]

@Albrecht-S
Copy link
Member

I'm aware of the problems with macros and that's why I wrote that I don't like that solution. I only mentioned it because ISTR that there was a request to use it for building FLTK and some of the bundled libs offer it. For instance: zlib offers #define Z_PREFIX (configure option) to prefix all zlib symbols with 'z_' (see zconf.h).

I agree that a namespace like Fl_PNG_Lib:: sounds much better. I'd guess that most programs wouldn't even notice the difference because they don't use the image lib functions directly - they would only deal with Fl_PNG_Image etc.. Then, if a program decides to deal with the PNG image format directly, they could either link in the png lib of their own choice or use FLTK's Fl_PNG_Lib:: with the namespace.

However, what would we do with the FLTK code to be able to use either the system libs or the bundled libs? Would we just use the namespace by adding using ... or not, depending on the usage of the bundled lib or not, resp.? Unfortunately I think that we'd have to modify the source code of the bundled libs in a way that makes upgrading more difficult. Currently we're using the bundled lib code unmodified except the build files.

Anyway, this is an interesting idea which seems to be worth investigating ...

@erco77
Copy link
Contributor Author

erco77 commented May 25, 2021

However, what would we do with the FLTK code to be able to use either the system libs or the bundled libs?

I figure the bundled code's function names have to all be changed, and the wrapper code would be turned on or off based on the config settings. If turned off at config, any #include for the wrapper .h file would trigger a compile time #error or some such I imagine.

Would we just use the namespace by adding using ... or not, depending on the usage of the bundled lib or not, resp.?

[EDIT]
If I understand correctly, I guess it depends how easily we'd want the user (and FLTK) code that calls e.g. the png lib directly to be controlled by what our proposed #include <FL/Fl_PNG_Lib.H> wrappers would do, either turning on or off a 'using namespace Fl_PNG_Lib') so that both FLTK and user code can call png_something() and get one or the other based on the fltk compile time config settings..?

We might need to provide a way for user code to detect which it's getting, just for debugging purposes, e.g. #ifdef FL_PNG_LIB_BUNDLED or FL_PNG_LIB_LOCAL, or whatever terminology we settle on. I think already one person was confused as to what "local" meant for the config flags. Maybe "system" vs "bundled", or some such..

@Albrecht-S
Copy link
Member

Albrecht-S commented May 25, 2021

I figure the bundled code's function names have to all be changed, ...

If I read this correctly [EDIT: you say that] we'd have to change the source code of the bundled libs, something I'd really want to avoid. Adding a namespace would help, but that needs C++ whereas the libs are all pure C code. I tried to compile the libs with a C++ compiler but that failed for several reasons.

I imagine an approach like this:

$ git diff
diff --git a/FL/Fl_Adjuster.H b/FL/Fl_Adjuster.H
index c2e308128..3af85d80e 100644
--- a/FL/Fl_Adjuster.H
+++ b/FL/Fl_Adjuster.H
@@ -26,6 +26,8 @@
 #include "Fl_Valuator.H"
 #endif
 
+namespace FL_ADJ {
+
 /**
   The Fl_Adjuster widget …
@@ -64,4 +66,8 @@ public:
   int soft() const {return soft_;}
 };
 
+} // namespace FL_ADJ
+
+using namespace FL_ADJ;
+
 #endif
  1. As you can see (almost) the entire header Fl_Adjuster.H (I arbitrarily picked one) is surrounded by namespace FL_ADJ {...} which could in practice be in a separate file that includes the original source file. This way the original source would not need to be changed.
  2. The trailing using namespace FL_ADJ; would normally be expected in the user (consumer) source code but is in the header file for simplicity.
  3. With this simple proof of concept we have changed the entire class Fl_Adjuster to class FL_ADJ::Fl_Adjuster, i.e. we put it into a namespace.
  4. With this simple change the entire FLTK lib and test files can be built.
  5. That's how I would like to deal with the bundled libs.
  6. User code would then only have to include the appropriate header file and get the correct linkage.

That's only a proof of concept, details need to be worked out. The problem with the bundled libs is that we really need to change the "names" which could ideally be done with such a namespace trick without editing the source code, but unfortunately I couldn't find a similar way to do it for zlib. Still thinking about a good way, but for today I had to give up.

FYI: The reason why I absolutely don't want to "edit" the source code of the bundled libs is that it would make the maintenance (upgrading) really hard. And I'm the one who did this in the past, I know what I'm talking about. It's pretty straightforward now (with a little diff magic) but I'm afraid it would be much harder if we had to edit the source code.

@erco77
Copy link
Contributor Author

erco77 commented May 25, 2021

Hmm, could we define a namespace around the entire bundled PNG/JPEG/etc image lib functions, thereby renaming them from the linker's point of view? That might save us the trouble of renaming the functions themselves..
This implies the image libs would have to be built with the C++ compiler (instead of C), but perhaps that'd work?

@Albrecht-S
Copy link
Member

That's what I tried and it didn't work. The C code in zlib is ... let's say complicated and strange [1]. My simple approach to do this with only one C file failed miserably with lots of hard to understand error codes. I didn't save any of these, but that wouldn't be helpful anyway. But generally that's the idea I tried to follow...

[1] Example code (from gzread.c):

/* Local functions */
local int gz_load OF((gz_statep, unsigned char *, unsigned, unsigned *));
local int gz_avail OF((gz_statep));
...
local z_size_t gz_read OF((gz_statep, voidp, z_size_t));
...
local int gz_load(state, buf, len, have)   <--
    gz_statep state;                       <--
    unsigned char *buf;                    <--
    unsigned len;                          <--
    unsigned *have;                        <--
{
    int ret;

This code doesn't compile with g++. I know this is an ancient C standard, but why does this not compile with a C++ compiler? Was this way to declare arguments dropped? And why are they using it in zlib?

Maybe we can find a way, but as I wrote before, for today I gave up. Sometimes sleeping a night and starting over again with a fresh mind helps...

@Albrecht-S
Copy link
Member

PS: if you have an idea ...

@erco77
Copy link
Contributor Author

erco77 commented May 25, 2021

Yikes, I was afraid there might be some old C style code in PNG/JPEG/etc. I don't think C++ allows k&r style parameters /at all/.

Doing some research, there's tools like objcopy --prefix-symbols that can insert a prefix into all global symbols of an already built static lib, so if e.g. our png library is built static, that can be invoked after to rename the symbols. But that surely doesn't apply to all platforms/compilers..

I'm afraid that's all I've got offhand, as I'm in the middle of coding in production again, and can't split my time at the moment.

I just did a quick google search for "resolving global symbol conflicts in libraries" and found these:
https://www.ivolve.com/solving-c-symbol-collisions/
https://stackoverflow.com/questions/6940384/how-to-deal-with-symbol-collisions-between-statically-linked-libraries

Sounds like we could do something like objcopy --prefix-symbols fl_ somepng.o to insert prefixes on all global symbols for a statically built lib, but I'm not sure that carries to all compilers/platforms we support.

Not surprisingly, seems other tools have been down this path, e.g. DynamoRIO/dynamorio#3348
Quoting the OP of that issue:

We're using DR as a static library more and more these days, and the current solution for
hiding its global symbols, running objcopy --localize-hidden, is a fragile post-link step that
conflicts with other build steps such as lld's ICF and which can break when used in complex
 build configurations.

On a whim, looking at our png library, it /seems/ like they may provide some mechanisms for global/extern symbol name prefixing. I just grepped the .h files for 'prefix' and found a few hits. Might be worth researching, not sure if any of it is relevant or not. This one comment caught my eye:

/* To support symbol prefixing it is necessary to know *before* including png.h
 * whether the fixed point (and maybe other) APIs are exported, because if they
 * are not internal definitions may be required.  This is handled below just
 * before png.h is included, but load the configuration now if it is available.
 */

There was also something called PNG_PREFIX in png.h, but again didn't read into it.

@Albrecht-S
Copy link
Member

Albrecht-S commented May 26, 2021

Short answer:

  1. IMHO we can rule out the objcopy approach because it's not portable
  2. all that "prefixing" stuff I mentioned before works with #define ... which is not desirable but may be the only working solution
  3. AFAICT the provided prefixing in zlib uses 'z_' as prefix (no other option), which may suffice for FLTK internally but may again lead to conflicts if the user links with yet another zlib with (the same) prefix

I'll investigate further...

[Edit: missed to post this comment for hours but eventually found it and sent it.]

@Albrecht-S
Copy link
Member

Meanwhile I found (my own) STR 3514 "Bundled image libs on Linux are incompatible with Xft". This is about the same issue I mentioned above (shared Xft lib linking with libpng and zlib).

The discussion links to a thread in fltk.general with the title "Fl_Native_File_Chooser() crash with FLTK 1.4" which sounds familiar.

An interesting comment is "I'd suggest to ... use a unique symbol prefixing for extra insurance.
Here are patches I used to do the symbol prefixing in FLTK 1.3.4: https://github.com/darealshinji/fltk-1.3.4/tree/master/patches"
which is very likely the discussion I had in mind.

So, this is all for today, but this might be a good starting point for "symbol prefixing" ...

@Albrecht-S Albrecht-S changed the title [fltk 1.4.x/Linux] application invoking native chooser crashes under linux if if fltk built with --enable-localpng [fltk 1.4.x/Linux] application invoking native filechooser crashes under linux if fltk built with --enable-localpng May 26, 2021
@Albrecht-S
Copy link
Member

FTR: I tried FLTK 1.4 with all three bundled image libs (CMake, debug and release builds + configure/make). Neither fluid nor native-filechooser crash in my environment, but they display a warning and some icons (folder + preview checkmark) are missing. Despite of this the image preview works.

Error messages:

(fluid:197351): Gtk-WARNING **: 13:13:46.755: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
----
(native-filechooser:199387): Gtk-WARNING **: 14:17:27.543: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.

@erco77
Copy link
Contributor Author

erco77 commented May 27, 2021

I'm guessing your GTK was built with a similar PNG library that FLTK has, and therefore isn't crashing.

On my system, the system PNG libs seem to be:

[erco@harris] 250 # ls -la /lib64/*png*
lrwxrwxrwx. 1 root root     11 Jun  5  2018 /lib64/libpng.so -> libpng15.so
lrwxrwxrwx. 1 root root     16 Jun  5  2018 /lib64/libpng.so.3 -> libpng.so.3.50.0
-rwxr-xr-x. 1 root root 175256 Nov  4  2016 /lib64/libpng.so.3.50.0
lrwxrwxrwx. 1 root root     18 Jun  5  2018 /lib64/libpng12.so.0 -> libpng12.so.0.50.0
-rwxr-xr-x. 1 root root 162400 Nov  4  2016 /lib64/libpng12.so.0.50.0
lrwxrwxrwx. 1 root root     19 Jun  5  2018 /lib64/libpng15.so -> libpng15.so.15.13.0
lrwxrwxrwx. 1 root root     19 Jun  5  2018 /lib64/libpng15.so.15 -> libpng15.so.15.13.0
-rwxr-xr-x. 1 root root 179296 Dec  9  2015 /lib64/libpng15.so.15.13.0

..and the one inside FLTK is: libpng version 1.6.37 - April 14, 2019

I'm not sure which one of the above GTK was built against, but I'm guessing it's the 15 (1.5?) version, based on this:

[erco@harris] 256 # ls -lad libgtk-*
lrwxrwxrwx. 1 root root      21 Nov 27  2018 libgtk-3.so -> libgtk-3.so.0.2200.30
lrwxrwxrwx. 1 root root      21 Nov 27  2018 libgtk-3.so.0 -> libgtk-3.so.0.2200.30
-rwxr-xr-x. 1 root root 7458680 Oct 31  2018 libgtk-3.so.0.2200.30
lrwxrwxrwx. 1 root root      23 Jun  5  2018 libgtk-vnc-2.0.so.0 -> libgtk-vnc-2.0.so.0.0.2
-rwxr-xr-x. 1 root root   70952 Apr 12  2018 libgtk-vnc-2.0.so.0.0.2
lrwxrwxrwx. 1 root root      27 Jun  5  2018 libgtk-x11-2.0.so -> libgtk-x11-2.0.so.0.2400.31
lrwxrwxrwx. 1 root root      27 Jun  5  2018 libgtk-x11-2.0.so.0 -> libgtk-x11-2.0.so.0.2400.31
-rwxr-xr-x. 1 root root 4817936 Aug  2  2017 libgtk-x11-2.0.so.0.2400.31

Thu 05/27/21 08:55:57 /usr/lib64
[erco@harris] 257 # ldd libgtk-vnc-2.0.so.0.0.2 | grep png
	libpng15.so.15 => /lib64/libpng15.so.15 (0x00007fbe56732000)

I'm on Sci Linux 7, which is old because they stopped continuing releases, so until I upgrade to a newer workstation (dread), I'm stuck in 2018, lol. But a good test considering many customers are running older OS's.

@erco77
Copy link
Contributor Author

erco77 commented May 27, 2021

I'm not sure if the crash in my case was due to the failed assertion test in GTK (the last error before it crashed), or it went on ahead past the assert() and crashed because it walked off memory down a NULL pointer or some such. Either way, sounds like the difference between png 1.5.x and 1.6.x is enough to be crash worthy. To be expected I suppose if, like fltk, they bump the minor release number on ABI breakage.

@Albrecht-S
Copy link
Member

$ man assert
assert - abort the program if assertion is false

What does ldd libgtk-3.so.0.2200.30 | grep png output? I'd think that this is the correct libgtk.so. You can also check an application with ldd and you'll see which one is linked.

And yes, I'm pretty sure that 'png15' is 'png 1.5.x'. My system has png16 which is at least the same ABI as we're using.

@erco77
Copy link
Contributor Author

erco77 commented May 28, 2021

What does ldd libgtk-3.so.0.2200.30 | grep png output?

Result is:

	libpng15.so.15 => /lib64/libpng15.so.15 (0x00007f2d29da6000)

I'd think that this is the correct libgtk.so.
You can also check an application with ldd and you'll see which one is linked.

Yes -- I looked in our fltk driver code, looks like it uses dl_open() at runtime to probe for the different versions of gtk available, starting with gtk3 first, and if not found falls back to gtk2.

Due to that, ldd won't show anything about GTK from fluid because it's not linked, but rather conditionally `dl_open()'ed at runtime by the native file chooser's call to our drivers.

@Albrecht-S
Copy link
Member

[Damn, it happened again. This message sat in the editor and I forgot to post it.
Sorry for duplicating some of your stuff. Here's for completeness what I wrote hours ago ...]

You can also check an application with ldd and you'll see which one is linked.

FTR: this is not true for FLTK applications because we link libgtk dynamically to test if it's available.

The libgtk versions we're looking for are libgtk-3 (GTK-3) and libgtk-x11-2.0 (GTK-2), in this order. For both shared libs we're appending '.so.n' with n = {2, 1, 0} - again in this order (per library), i.e. we prefer GTK-3 and fall back to GTK-2 if GTK-3 is not found. In your case this algorithm should find the symlink libgtk-3.so -> libgtk-3.so.0.2200.30.

[newly added text]

Looks like you are using libpng 1.5 then. I'll try one of my docker images with older systems if I can reproduce this ...

@Albrecht-S
Copy link
Member

Albrecht-S commented May 28, 2021

I built a docker image with Scientific Linux and I can now reproduce the crash with this docker image:

(native-filechooser:1): GdkPixbuf-WARNING **: 13:34:51.006: Bug! loader 'png' didn't set an error on failure

(native-filechooser:1): Gtk-WARNING **: 13:34:51.006: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.

(native-filechooser:1): GdkPixbuf-WARNING **: 13:34:51.006: Bug! loader 'png' didn't set an error on failure
**
Gtk:ERROR:gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): Failed to load /usr/share/icons/Adwaita/16x16/status/image-missing.png: Internal error: Image loader module ?png? failed to complete an operation, but didn?t give a reason for the failure (gdk-pixbuf-error-quark, 5)

FWIW: My docker build script 'Dockerfile' (rename to Dockerfile if you like to try it).

Nice "side effect": the Dockerfile is a short "HowTo install FLTK on Sci Linux" -- of course not in the same '/fltk' folder and not as root -- but it lists all required packages (but not some optional ones).

@erco77
Copy link
Contributor Author

erco77 commented May 28, 2021

That's great -- I assume our gitlab (or whoever it is now) commit hook build logs must show something similar to your docker build script. I recall seeing (I think?) docker related commands at the head of the build logs, can't remember, too lazy to look, lol.. I'm just waking up and need coffee.

@Albrecht-S
Copy link
Member

The docker script I posted is used to create a docker image using the docker build command. This loads an existing image (in this case sl == Scientific Linux) and then installs software we need etc.. I decided to "build" the docker image including FLTK downloaded and compiled/built. The result is a docker image I can run and thus use similar to a VM but it's a different technology. Once built it's saved on the local system for reuse. I could run new FLTK builds by starting a docker container using the existing image and running only git pull and make because everything is preinstalled.

The CI builds on GitHub and GitLab are similar pre-configured docker containers which we're using to build the software, mostly based on Ubuntu IIRC, but also macOS and Windows. Installing necessary packages is common to both approaches, hence it's very similar.

My intention to use docker in the first place (some weeks ago) was to be able to use different Linux distros and try to build FLTK so I can find out which package installers to use and which packages need to be installed. The goal is to help users and/or document builds on different systems. You can use even much older Linux systems for testing.

In this case I just wanted to be able to reproduce the error with "your" Sci Linux system to be able to test myself whether symbol prefixing can fix this error w/o installing a full VM. Building a docker image is done in minutes.

My next step will be testing symbol prefixing. Tomorrow. Hopefully.

@Albrecht-S
Copy link
Member

FTR: I uploaded an experimental branch to my FLTK fork which adds symbol prefixing to libpng. Other libs (zlib + jpeg) need still to be done, and this code has only been tested on Linux with the FLTK test and demo programs. I looked at the png lib with nm and saw that the global symbols are correctly prefixed.

I'll try to test if this alone fixes the issue and I'll let you know what I found out.

If you have a little spare time, feel free to test yourself and report if it helps. TIA.

@Albrecht-S
Copy link
Member

Good news: in my docker (Sci Linux) environment native-filechooser and fluid don't crash any more with the prefixed libpng. Screenshot (native-filechooser):
libpng-prefixed

Image and folder icons, the "preview" check mark and the image preview are all working now!

Configure: ./configure --enable-localpng --enable-localzlib --enable-localjpeg

@Albrecht-S
Copy link
Member

Update: I (force-)pushed another commit to my branch prefix-bundled-libs which adds symbol prefixes to libjpeg as well. Remaining part 3: zlib (not today). Builds fine so far on my standard Linux Mint 20 and Sci Linux 7 (docker).

@Albrecht-S
Copy link
Member

FTR: I updated zlib as well in my branch prefix-bundled-libs. This needs some more testing, I only tested on Linux so far. I'm not sure if we need to make it optional or if we need to change configure/CMake in some way, maybe requiring bundled zlib if png is using the bundled lib (because it depends on zlib) or anything else ...

@erco77 Greg, please test this branch in your environment, if possible. TIA.

@erco77
Copy link
Contributor Author

erco77 commented Nov 14, 2021

I took a trip in June which bulk-erased this issue from my memory.
Sounds great that you got the prefixed stuff working.
I'll try to pull it when I get a chance to see what technique you ended up using.

I don't think any of my gui apps directly calls the image lib functions directly (libpng/libjpeg/etc), so probably won't run into trouble with that. But if I did, I guess I'd need to tweak my code to use whatever prefix you ended up using. (fltk_xxx I'm guessing?)

One of my apps does use zlib for doing network packet compression, but that's non-gui (daemon) code, entirely a separate executable that doesn't involve FLTK, so that shouldn't be an issue I expect.

@Albrecht-S
Copy link
Member

Albrecht-S commented Nov 14, 2021

Sounds great that you got the prefixed stuff working.
I'll try to pull it when I get a chance to see what technique you ended up using.

Actually it's three slightly different techniques depending on what the three libraries offer but it's all transparent to the calling code if (and only if) you're including the "correct" header files. This should however be done automatically (driven by the build system as usual).

I don't think any of my gui apps directly calls the image lib functions directly (libpng/libjpeg/etc), so probably won't run into trouble with that. But if I did, I guess I'd need to tweak my code to use whatever prefix you ended up using. (fltk_xxx I'm guessing?)

No, that's the trick, you don't need to change your code because it's all hidden in the header files. You can even copy the three library dir's from my branch to your working copy and recompile (make distclean; [ autoconf; configure; ] make if you're using configure).

Thanks in advance for testing.

@Albrecht-S Albrecht-S linked a pull request Nov 14, 2021 that will close this issue
@Albrecht-S Albrecht-S self-assigned this Nov 14, 2021
@Albrecht-S Albrecht-S added active Somebody is working on it bug Something isn't working fixed The issue or PR was fixed. Platform: X11 platform specific (Unix/X11)) waiting for confirmation waiting for someone's confirmation labels Nov 14, 2021
@Albrecht-S
Copy link
Member

FTR: I updated my branch and opened a Pull Request (#292) which will automatically close this issue when merged.

@erco77 I'm looking forward to your test results and I'm ready to merge the PR if it works for you. TIA.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

Working on it now..

@Albrecht-S
Copy link
Member

BTW: I built and ran it today on macOS w/o issues.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

I can confirm your fix by doing these steps on an Ubuntu 20.x machine:

    1. Configure fltk using:
    
        ./configure --enable-localjpeg \
                    --enable-localpng \
                    --enable-localzlib
    2. Build fltk: make -j 4
    3. Go into the test/ directory and rebuild native-filechooser, replacing these link line flags:
    -L../lib -lfltk_jpeg -lfltk_png -lfltk_z 
..with instead: 
    ../lib/libfltk_jpeg.a ../lib/libfltk_png.a ../lib/libfltk_z.a

I did have to add the -no-pie flag to both compile+link to prevent some weird errors about relocation.

Doing that with your branch, I could run native-filechooser and browse files without a crash.

Doing the same steps with the current version of FLTK 1.4.x would immediately crash with the following error when I hit the "Pick File" button:

(native-filechooser:267519): Gtk-WARNING **: 16:38:29.631: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
**
Gtk:ERROR:../../../../gtk/gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): ...lots of details trimmed...
Abort (core dumped)

So a tentative thumbs up from me.
I had a different, Sci Linux 7 machine when I opened this issue -- that machine's mobo died, so I no longer have easy access to testing on that same OS, but I might be able to soon, if that helps.

@Albrecht-S
Copy link
Member

Thanks for your test and the report. However, I'm wondering...

  1. I wouldn't have expected this error to manifest on an Ubuntu 20.x machine at all. My Linux Mint is based on Ubuntu 20.04 and doesn't exhibit the error.

  2. Why did you do step 3 of your report? The normal linking should have done exactly the same by finding the local libs (even with the -lfltk_xxx linker flags).

I'm just puzzled.

I know that I could reproduce the issue (i.e. error with old master and no error with my branch) on my Sci Linux (Docker) system, hence I was pretty sure it worked anyway but I'm a little surprised by your report.

I believe we can merge the branch and you don't need to test on an old Sci Linux system. It would have been nice to have but if you don't need to (re)build such an old system anyway then don't bother. I'm now more interested that the branch works on any system w/o issues and I think this is now clear.

My take on it is to merge it tomorrow. Any concerns?

@Albrecht-S
Copy link
Member

Albrecht-S commented Nov 16, 2021

Clarification: on my system the master branch doesn't crash when running test/native-filechooser but it does show an error: the 'Preview' and 'Show hidden files' buttons check marks don't appear whereas everything is fine with the new branch.

So yes, the error manifests on Linux Mint 20 as well although it doesn't crash the application.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

I did step 3 because it's what I do in my commercial builds. It's a practice I've done for a looong time, so perhaps it's dated. But it ensures no .so files accidentally picked up (by whatever means, e.g. in case both are present in the lib dir), and being explicit ensures the build fails if the .a files are not present.

Well, then it's interesting I get the crash with 1.4.x and not your branch; perhaps it's crashing in 1.4.x for another reason then.

In case you can replicate, here's the compile/link commands I used for both 1.4.x and your branch.. just a modified version of the default compile/link commands.

g++ -I..  -I../png -I../jpeg -I../zlib \
    -Os -Wall -Wunused -Wno-format-y2k  \
    -fno-exceptions -fno-strict-aliasing -ffunction-sections -fdata-sections \
    -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_THREAD_SAFE -D_REENTRANT \
    -I/usr/include/uuid -I/usr/include/freetype2 \
    -I/usr/include/uuid \
    -I/usr/include/freetype2 \
    -c native-filechooser.cxx -o native-filechooser.o
g++  -Os -Wall -Wunused -Wno-format-y2k  \
     -fno-exceptions -fno-strict-aliasing -ffunction-sections -fdata-sections \
     -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_THREAD_SAFE -D_REENTRANT \
     -I/usr/include/uuid -I/usr/include/freetype2 \
     -I/usr/include/uuid -I/usr/include/freetype2 \
     -Os -Wall -Wunused -Wno-format-y2k  -fno-exceptions -fno-strict-aliasing \
     -ffunction-sections -fdata-sections   native-filechooser.o -o native-filechooser \
     ../lib/libfltk_images.a \
     ../lib/libfltk.a \
     ../lib/libfltk_jpeg.a \
     ../lib/libfltk_png.a \
     ../lib/libfltk_z.a \
     -lXrender -lXcursor -lXfixes -lXft -lfontconfig -pthread -lpthread -ldl -lm  -lX11 

Notes for the above; starting with the default fltk build commands, I then:

  1. Replaced:
    -L../lib -lfltk_jpeg -lfltk_png -lfltk_z
    With:
    ../lib/libfltk_jpeg.a ../lib/libfltk_png.a ../lib/libfltk_z.a
  2. Removed all references to -I/usr/include/libpng16 because I wasn't sure what that was about.
  3. The redundant -I's for uuid+freetype2 in the link stage are in my fltk's build output, not sure why. But I kept them in place. The linker surely ignores the -I flags anyway.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

BTW, I just cloned fltk-1.4.x current in case mine was mid-development or something. Redid the experiment, same results: crash in newly cloned 1.4.x with the above build commands, no crash with those build commands with your branch.

Curious if you can replicate the crash in one and not the other.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

FWIW, here's the gdb output of the crash: 1.4.x-current, configured with local png/jpeg/zlib + debugging enabled: gdb-1.4.x-crash.txt

There were quite a few lines I removed that had ??'s in place of function names in the gtk part of the calling hierarchy, which is why the stack numbers have gaps.

@Albrecht-S
Copy link
Member

I can replicate the error (Gtk-WARNING) with 1.4 master but it doesn't crash. I used your compile and link commands but had to add -lXext -lXinerama. I can see a similar error message when I click 'Pick File' or 'Pick Dir':

$ ./native-filechooser 

(native-filechooser:227235): Gtk-WARNING **: 04:47:29.786: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.

(that's all, nothing else) but the only noticeable effect in the running program is the missing check marks. I suspect these check marks (white on green background) are the icons "from icon theme" (see error message). The Gtk:ERROR .. message seems to be a follow-up message. Maybe your system is built with debug messages in GTK libs enabled which causes the assert failure whereas mine is not. Or something like that.

For me it doesn't matter if the application really crashes or not (this may be caused by internal differences). It exhibits an error which it does not if built with the new branch. And it's clear that it has to do with the incompatibilities of the image libraries. Winfried also confirmed in #289 that symbol prefixing (the new branch) fixes the crash in his environment.

Good to go?

@Albrecht-S
Copy link
Member

Thanks for the stack dump, but it doesn't really help, does it? It shows what we knew already: there's an assert failure which causes an abort.

@erco77
Copy link
Contributor Author

erco77 commented Nov 16, 2021

Sounds good to me, go.

Yes, the assert that crashes might be due to the presence of gtk development libs, though it's interesting that doesn't get triggered with your branch.

That's a scary assert; it's the kind of thing people would run into right when they're saving their work - ugh.

At some point I'll try to rebuild my commercial branch on Sci Linux 7, and test with your mods, because I kinda have to for consistency for my build requirements. I have that OS installed on an external USB drive, so I should be able to bring it up on a similar hardware.

If I ever run into trouble, I'll open a new issue later.

@Albrecht-S
Copy link
Member

Sounds good to me, go.

OK, I'll merge later today.

Yes, the assert that crashes might be due to the presence of gtk development libs, though it's interesting that doesn't get triggered with your branch.

Why is this "interesting"? That's the purpose of the modifications, it doesn't happen because "we" are using "our" image libs (with different, prefixed function names) and the GTK libs can now use the system (shared) libs as required.

At some point I'll try to rebuild my commercial branch on Sci Linux 7 ...

Please let me know if it works.

Albrecht-S pushed a commit that referenced this issue Nov 16, 2021
Prefix bundled libs. This fixes issues #232 and #289 and STR 3514 (https://www.fltk.org/str.php?L3514).
Parts of this fix are based on the work of GitHub user @darealshinji who provided instructions to create the jpeg header file with prefixes in STR 3347 (https://www.fltk.org/str.php?L3347). Thanks.
@Albrecht-S
Copy link
Member

Merged in commit 796a9bf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
active Somebody is working on it bug Something isn't working fixed The issue or PR was fixed. Platform: X11 platform specific (Unix/X11)) waiting for confirmation waiting for someone's confirmation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants