FastCGI mapserv gets unexpected signal 11 when apache attempts to shutdown the process #5254

Open
akrherz opened this Issue Feb 23, 2016 · 7 comments

Projects

None yet

3 participants

@akrherz
Contributor
akrherz commented Feb 23, 2016

Using a recent mapserv with FastCGI (v2.4.0) support, I find the following logged when Apache attempts to shutdown the process when it has reached its maximum request lifetime (ie FcgidMaxRequestsPerProcess 10000).

Feb 23 10:07:39 xxx httpd[5156]: [fcgid:error] [pid 5156] mod_fcgid: process /xxx/mapserv.fcgi(24131) exit(lifetime expired), get unexpected signal 11

I assume this means an 'unclean' shutdown. I don't see any bad impacts from this, but it would be nice to resolve to keep the logs clean :) I am at a loss on how to properly debug this and provide you with some more to go on. I can easily reproduce this on my development system, so I'd be happy to do whatever you need to provide more detail!

PS, here's my question on the forums email list a few years back.

@tbonfort
Member

I tried to quickly reproduce this by setting FcgidMaxRequestsPerProcess to 2, but was not able to get that error message. Try compiling a debug version of the mapserv binary and configuring apache to dump a core file so we can at least have a backtrace

@akrherz
Contributor
akrherz commented Feb 24, 2016

Hi @tbonfort , thanks so much for your continued help. So I attempted to make a debug build cmake .. -DCMAKE_BUILD_TYPE=Debug and I can't reproduce with that build :( Apache's error log simply fills with

In msCleanupOnSignal.
In msCleanupOnSignal.
In msCleanupOnSignal.
In msCleanupOnSignal.

which is expected. Without the debug build, I am also struggling to get apache to dump cores. Maybe the signal 11 is some other issue? I'll keep after it and try to give you something to work off of. Thanks for your patience.

@akrherz
Contributor
akrherz commented Feb 24, 2016

A miracle occurred and it dropped a core :)

# gdb mapserv.fcgi core.10723 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/akrherz/projects/iem/cgi-bin/mapserv/mapserv.fcgi...done.
[New LWP 10723]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/mesonet/www/apps/iemwebsite/cgi-bin/mapserv/mapserv.fcgi'.
Program terminated with signal 11, Segmentation fault.
#0  __GI_getenv (name=0x7fce5a2f4645 "L_DEBUG") at getenv.c:89
89        if (name_start == ep_start && !strncmp (*ep + 2, name, len)
(gdb) bt full
#0  __GI_getenv (name=0x7fce5a2f4645 "L_DEBUG") at getenv.c:89
        ep_start = <error reading variable ep_start (Cannot access memory at address 0x4804370)>
        len = 7
        ep = 0x20f2fc8
        name_start = 20547
#1  0x00007fce59f08536 in CPLGetConfigOption (pszKey=0x7fce5a2f4643 "CPL_DEBUG", pszDefault=0x0) at cpl_conv.cpp:1569
        pszResult = 0x0
        papszTLConfigOptions = <optimized out>
#2  0x00007fce59f0b0c2 in CPLDebug (pszCategory=0x7fce5a2cf513 "GDAL", pszFormat=0x7fce5a2f607d "GDALClose(%s, this=%p)") at cpl_error.cpp:383
        psCtx = 0x20f1ae0
        pszMessage = <optimized out>
        args = {{gp_offset = 290455933, fp_offset = 0, overflow_arg_area = 0x7fce5a6d4c80, reg_save_area = 0x0}}
        pszDebug = <optimized out>
#3  0x00007fce59eda1b2 in GDALDataset::~GDALDataset (this=0x20f3db0, __in_chrg=<optimized out>) at gdaldataset.cpp:253
        i = <optimized out>
#4  0x00007fce59ce9495 in GTiffDataset::~GTiffDataset (this=0x20f3db0, __in_chrg=<optimized out>) at geotiff.cpp:3575
No locals.
#5  0x00007fce59ce9739 in GTiffDataset::~GTiffDataset (this=0x20f3db0, __in_chrg=<optimized out>) at geotiff.cpp:3579
No locals.
#6  0x00007fce59edb31c in GDALClose (hDS=0x20f3db0) at gdaldataset.cpp:2508
        poDS = 0x20f3db0
        oHolder = {hMutex = 0x20f14d0, pszFile = 0x7fce5a2f6094 "gdaldataset.cpp", nLine = 2488}
        oLocaleForcer = {pszOldLocale = 0x0}
#7  0x00007fce5c320622 in msGDALCleanup () at /home/akrherz/BUILD/mapserver/mapgdal.c:96
        i = 0
        pahDSList = 0x2091600
        nDSCount = 1
        bDidSomething = 0
        iRepeat = 5
#8  0x00007fce5c2a2d52 in msCleanup () at /home/akrherz/BUILD/mapserver/maputil.c:1944
No locals.
#9  0x000000000040126d in msCleanupOnSignal (nInData=15) at /home/akrherz/BUILD/mapserver/mapserv.c:56
No locals.
#10 <signal handler called>
No locals.
#11 0x00007fce58b6cf1e in __accept_nocancel () at ../sysdeps/unix/syscall-template.S:81
No locals.
#12 0x00007fce5bf479ea in OS_Accept (listen_sock=0, fail_on_intr=0, webServerAddrs=0x0) at os_unix.c:1175
        len = 112
        socket = -1
        sa = {un = {sun_family = 1537, 
            sun_path = "\000\001\000\000\000\000\200\251\v\003", '\000' <repeats 12 times>, "`\264\v\003", '\000' <repeats 12 times>, "\200\251\v\003\000\000\000\000\001\003\000\001\000\b\000\000\000\000\000\000\000\000\000\000\360\255\215=\376\177\000\000\060\060\017\002\000\000\000\000\340\222\017\002\000\000\000\000\b\244\024\\\316\177\000\000\220\256\215=\376\177\000\000\000\206e\016\217D"}, in = {sin_family = 1537, sin_port = 256, sin_addr = {s_addr = 0}, 
            sin_zero = "\200\251\v\003\000\000\000"}}
#13 0x00007fce5bf459ac in FCGX_Accept_r (reqDataPtr=0x7fce5c14a3e0 <the_request>) at fcgiapp.c:2196
No locals.
#14 0x00007fce5bf45ae5 in FCGX_Accept (in=in@entry=0x7ffe3d8dae78, out=out@entry=0x7ffe3d8dae80, err=err@entry=0x7ffe3d8dae88, envp=envp@entry=0x7ffe3d8dae90)
---Type <return> to continue, or q <return> to quit---
    at fcgiapp.c:2141
        rc = <optimized out>
#15 0x00007fce5bf45c6a in FCGI_Accept () at fcgi_stdio.c:140
        in = 0x7fce5c2c05a7 <msResetErrorList+13>
        out = 0x30b7840
        error = 0x20912b0
        envp = 0x7fce5c68f3e0 <ms_error.10688>
        acceptResult = <optimized out>
#16 0x000000000040174e in main (argc=1, argv=0x7ffe3d8db008) at /home/akrherz/BUILD/mapserver/mapserv.c:239
        iArg = 1
        sendheaders = 1
        execstarttime = {tv_sec = 0, tv_usec = 0}
        execendtime = {tv_sec = 1, tv_usec = 4200493}
        requeststarttime = {tv_sec = 1, tv_usec = 140729931116432}
        requestendtime = {tv_sec = 140729931116552, tv_usec = 140524284956864}
        mapserv = 0x20912b0
(gdb) 
@tbonfort tbonfort added a commit to tbonfort/mapserver that referenced this issue Feb 25, 2016
@tbonfort tbonfort Don't print message in fastcgi when exiting on signal (#5254) 5c99ea9
@tbonfort
Member

/Maybe/ the fact we were calling printf inside the signal handler caused an issue. Can you try out the referenced commit and report back ?

@akrherz
Contributor
akrherz commented Feb 25, 2016

My deleted comment spoke too soon, I reproduced the issue. Interestingly, it only happened on 4 exits out of ~410 total exits. Which is about the reproducability that I was seeing before. I feel like I am forgetting a bug in GDAL that I looked at a number of years back, I am getting old, uffties! This is GDAL 1.11.4, btw

@hschiebold

@akrherz Hi, we have the same issue here. Could you solve the problem with the mapserv.c version or is it a bug in GDAL? We have GDAL 1.11.3 at the moment.

@akrherz
Contributor
akrherz commented Mar 31, 2016

@hschiebold I have not found a solution to the problem yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment