ProxySQL segfaults when configured with servers outside the hosts network on alpine linux #715

Closed
ryanschwartz opened this issue Oct 7, 2016 · 14 comments
@ryanschwartz

We run apps in alpine containers on Google Container Engine, which is essentially a hosted kubernetes infrastructure. We would like to run proxysql inside these containers as a database proxy layer.

I've run into a problem where proxysql runs for ~10 seconds and then segfaults, but only when the mysql_servers configuration block contains servers in other subnets. For example, from a client at 10.128.0.2 running proxysql with servers 10.128.0.3 and 10.128.0.4, proxysql runs fine. When I add 10.140.0.3 as a server, proxysql segfaults after 10s.

I've confirmed network connectivity is allowed.

I've run proxysql under gdb and it seems that the segfaults occur in MySQL_Monitor.cpp at line 167 or 173:

Thread 8 "proxysql" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 145]
MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa0800) at MySQL_Monitor.cpp:167
167 MySQL_Monitor.cpp: No such file or directory.
(gdb) thread apply 8 bt

Thread 8 (LWP 145):
#0  MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa0800) at MySQL_Monitor.cpp:167
#1  0x00005555555ece9c in MySQL_Monitor::run (this=0x555555aa0740) at MySQL_Monitor.cpp:1465
#2  0x00007ffff7242c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
#3  0x00007ffff7dc44fc in ?? () from /lib/ld-musl-x86_64.so.1
#4  0x0000000000000000 in ?? ()
Thread 8 "proxysql" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 46]
MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa82c0) at MySQL_Monitor.cpp:173
173 MySQL_Monitor.cpp: No such file or directory.
(gdb) thread apply 8 bt

Thread 8 (LWP 46):
#0  MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa82c0) at MySQL_Monitor.cpp:173
#1  0x00005555555ece9c in MySQL_Monitor::run (this=0x555555aa8200) at MySQL_Monitor.cpp:1465
#2  0x00007ffff7242c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
#3  0x00007ffff7dc44fc in ?? () from /lib/ld-musl-x86_64.so.1
#4  0x0000000000000000 in ?? ()

The other common segfault occurs here:

Thread 8 "proxysql" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 121]
0x00007ffff7dc2bd8 in memcpy () from /lib/ld-musl-x86_64.so.1
(gdb) thread apply 8 bt

Thread 8 (LWP 121):
#0  0x00007ffff7dc2bd8 in memcpy () from /lib/ld-musl-x86_64.so.1
#1  0x00005555555ebb3c in memcpy (__n=8, __os=<optimized out>, __od=0x7ffff7e9a5b0) at /usr/include/fortify/string.h:51
#2  MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa0840) at MySQL_Monitor.cpp:173
#3  0x00005555555ece9c in MySQL_Monitor::run (this=0x555555aa0780) at MySQL_Monitor.cpp:1465
#4  0x00007ffff7242c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
#5  0x00007ffff7dc44fc in ?? () from /lib/ld-musl-x86_64.so.1
#6  0x0000000000000000 in ?? ()
Thread 8 "proxysql" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 170]
0x00007ffff7dc2bd8 in memcpy () from /lib/ld-musl-x86_64.so.1
(gdb) thread apply 8 bt

Thread 8 (LWP 170):
#0  0x00007ffff7dc2bd8 in memcpy () from /lib/ld-musl-x86_64.so.1
#1  0x00005555555ebb3c in memcpy (__n=8, __os=<optimized out>, __od=0x7ffff7e9a5b0) at /usr/include/fortify/string.h:51
#2  MySQL_Monitor_Connection_Pool::purge_idle_connections (this=0x555555aa0800) at MySQL_Monitor.cpp:173
#3  0x00005555555ece9c in MySQL_Monitor::run (this=0x555555aa0740) at MySQL_Monitor.cpp:1465
#4  0x00007ffff7242c8a in execute_native_thread_routine () from /usr/lib/libstdc++.so.6
#5  0x00007ffff7dc44fc in ?? () from /lib/ld-musl-x86_64.so.1
#6  0x0000000000000000 in ?? ()

I have confirmed that this behavior affects both 1.2.1 and 1.2.4.

I'm happy to provide what further information I can, including tcpdump captures if that would be helpful.

@renecannao
Contributor

Hi Ryan,
Thank you for the report.
I will look into this as soon as possible. Do you have any workaround for now? For example, is disabling monitoring an option?
Thanks

@ryanschwartz
Author

We intend to use proxysql as a companion that selects the current master for writes, in an environment where MHA manages the replication topology. In this use case, monitoring is definitely a requirement to manage the replication groups.

Our current workaround is to not use proxysql and to configure apps to connect directly to the database. In the event of a topology change, we would manually update application configs.

You make a good point about monitoring though. I checked to see if proxysql can connect to servers in non-local subnets at all, and I was able to update hostgroup IDs manually to select servers in two other subnets and get query results back, so this is likely scoped to monitoring.

renecannao added a commit that referenced this issue Oct 7, 2016
@renecannao
Contributor

Sounds like disabling monitoring is really not an option.
I created a hotfix for this issue; it is available in branch v1.2.4-715.
It is still not clear to me what causes this issue: I will investigate and fix the root cause, but for now this bug fix should work.

Thanks,
René

@renecannao
Contributor

@ryanschwartz : is it possible to have a core dump?
Thanks

@renecannao
Contributor

@ryanschwartz , I revisited the code and I can't find any bug. In fact, I believe 3ad6c2d won't solve the issue.
The portion of code where it crashes is related to std::list, and this is the only place where lists are used: could this bug be strictly related to the musl implementation?
Can you verify whether this bug also happens with distros using glibc?
If this bug happens only with musl, I can replace std::list with something simpler.
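
For readers following along, here is a minimal sketch of the kind of loop being discussed: purging idle entries from a std::list while iterating over it. The names `Conn`, `ConnPool` and `purge_idle` are hypothetical and this is not ProxySQL's code. It only illustrates that `std::list::erase` returns the next valid iterator, and that a list shared between threads needs a lock around every access.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <list>
#include <mutex>

// Hypothetical record for one pooled monitor connection (not ProxySQL's type).
struct Conn {
    int fd;
    std::time_t last_used;
};

class ConnPool {
public:
    void add(int fd, std::time_t now) {
        std::lock_guard<std::mutex> lock(mu_);
        conns_.push_back(Conn{fd, now});
    }

    // Remove every connection idle for more than max_idle seconds.
    // Returns the number of connections purged.
    std::size_t purge_idle(std::time_t now, std::time_t max_idle) {
        assert(max_idle >= 0);
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t purged = 0;
        for (auto it = conns_.begin(); it != conns_.end();) {
            if (now - it->last_used > max_idle) {
                it = conns_.erase(it);  // erase returns the next valid iterator
                ++purged;
            } else {
                ++it;
            }
        }
        return purged;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mu_);
        return conns_.size();
    }

private:
    std::mutex mu_;          // guards conns_ against concurrent access
    std::list<Conn> conns_;
};
```

Advancing the iterator only in the `else` branch avoids touching an element that has just been erased.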

@ryanschwartz
Author

Attaching a couple core dumps.
core.74.gz
core.8.gz

This does not appear to be an issue in non-alpine distributions, and I concur - 3ad6c2d does not resolve the crash.

Here's the Dockerfile I'm using to build my test container - we have a script that performs the same build steps in our other application containers.

FROM alpine

MAINTAINER Ryan Schwartz <ryan.schwartz@ingramcontent.com>

WORKDIR /tmp
RUN apk update && \
    apk add -t runtime-depends libgcc libstdc++ && \
    apk add -t build-depends build-base automake bzip2 patch git cmake openssl-dev libc6-compat && \
    apk add --no-cache -t edge-build-depends --repository http://dl-3.alpinelinux.org/alpine/edge/main libexecinfo-dev && \
    git clone https://github.com/sysown/proxysql.git && \
    cd proxysql && \
    git checkout v1.2.4-715 && \
    sed -i -e '/PROXYSQL_VERSION/s:1.2.4:1.2.4-715:g' include/proxysql.h && \
    NOJEMALLOC=1 make && \
    cp src/proxysql /usr/bin/proxysql && \
    apk del build-depends edge-build-depends && \
    cd && rm -rf /tmp/* /var/cache/apk/*

COPY proxysql.cnf /proxysql/proxysql.cnf
RUN mkdir -p /var/lib/proxysql
ENTRYPOINT ["proxysql", "-f", "-c", "/proxysql/proxysql.cnf"]

@ryanschwartz
Author

@renecannao - can I provide any further information on this? We're in a holding pattern on deploying proxysql until we can use it in our alpine containers.

renecannao added a commit that referenced this issue Nov 29, 2016
@renecannao
Contributor

@ryanschwartz - please try the 1.2.5-715 branch.
It replaces std::list with PtrArray. I'm not sure it will solve your issue though, as I am unable to reproduce it yet.
Thank you.
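
As a rough illustration of what "something simpler" than std::list can look like, below is a sketch of a flat array of pointers with O(1) swap-removal. `SimplePtrArray`, `remove_fast` and `purge_if` are hypothetical names chosen for this example; this is not the PtrArray class from the ProxySQL source, whose interface is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal pointer array; it does not own the pointed-to objects.
class SimplePtrArray {
public:
    void add(void* p) { items_.push_back(p); }

    std::size_t size() const { return items_.size(); }

    void* at(std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }

    // O(1) removal: move the last element into slot i, then shrink.
    // Element order is not preserved.
    void* remove_fast(std::size_t i) {
        assert(i < items_.size());
        void* removed = items_[i];
        items_[i] = items_.back();
        items_.pop_back();
        return removed;
    }

private:
    std::vector<void*> items_;
};

// Purge pattern: walk backwards so the element swapped into slot i
// is always one that has already been examined.
template <typename Pred>
std::size_t purge_if(SimplePtrArray& arr, Pred should_purge) {
    std::size_t purged = 0;
    for (std::size_t i = arr.size(); i-- > 0;) {
        if (should_purge(arr.at(i))) {
            arr.remove_fast(i);
            ++purged;
        }
    }
    return purged;
}
```

The trade-off is that ordering is lost on removal, which is usually acceptable for a pool of interchangeable idle connections.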

renecannao added a commit that referenced this issue Nov 29, 2016
renecannao added a commit that referenced this issue Nov 29, 2016
renecannao added a commit that referenced this issue Nov 29, 2016
renecannao added a commit that referenced this issue Nov 29, 2016
@renecannao
Contributor

@ryanschwartz : I think I finally narrowed down this issue, and managed to reproduce it without Alpine Linux as well.
Please try the latest 1.2.5-715 branch.
Thanks

@ryanschwartz
Author

1.2.5-715 would not build for me in alpine. Make fails here:

../lib/libproxysql.a(MySQL_Monitor.oo): In function `monitor_connect_pthread(void*)':
/tmp/proxysql/lib/MySQL_Monitor.cpp:284: undefined reference to `mallctl'
../lib/libproxysql.a(MySQL_Monitor.oo): In function `monitor_ping_pthread(void*)':
/tmp/proxysql/lib/MySQL_Monitor.cpp:291: undefined reference to `mallctl'
../lib/libproxysql.a(MySQL_Monitor.oo): In function `monitor_read_only_pthread(void*)':
/tmp/proxysql/lib/MySQL_Monitor.cpp:298: undefined reference to `mallctl'
../lib/libproxysql.a(MySQL_Monitor.oo): In function `monitor_replication_lag_pthread(void*)':
/tmp/proxysql/lib/MySQL_Monitor.cpp:305: undefined reference to `mallctl'
../lib/libproxysql.a(thread.oo): In function `Thread::start(bool)':
/tmp/proxysql/lib/thread.cpp:53: undefined reference to `mallctl'
collect2: error: ld returned 1 exit status
make[1]: *** [proxysql] Error 1
make: *** [build_src] Error 2
Makefile:66: recipe for target 'proxysql' failed
make[1]: Leaving directory '/tmp/proxysql/src'
Makefile:37: recipe for target 'build_src' failed

My updated Dockerfile is:

FROM alpine

MAINTAINER Ryan Schwartz <ryan.schwartz@ingramcontent.com>

WORKDIR /tmp
RUN apk update && \
    apk add -t runtime-depends libgcc libstdc++ && \
    apk add -t build-depends build-base automake bzip2 patch git cmake openssl-dev libc6-compat && \
    apk add --no-cache -t edge-build-depends --repository http://dl-3.alpinelinux.org/alpine/edge/main libexecinfo-dev && \
    git clone https://github.com/sysown/proxysql.git && \
    cd proxysql && \
    git checkout v1.2.5-715 && \
    sed -i -e '/PROXYSQL_VERSION/s:1.2.4:1.2.5-715:g' include/proxysql.h && \
    NOJEMALLOC=1 make && \
    cp src/proxysql /usr/bin/proxysql && \
    apk del build-depends edge-build-depends && \
    cd && rm -rf /tmp/* /var/cache/apk/*

COPY proxysql.cnf /proxysql/proxysql.cnf
RUN mkdir -p /var/lib/proxysql
ENTRYPOINT ["proxysql", "-f", "-c", "/proxysql/proxysql.cnf"]

@renecannao
Contributor

Indeed, NOJEMALLOC=1 didn't work in 1.2.5 due to some partial ports from 1.3.0.
The last commit b85ed48 in v1.2.5-715 completes the support.
A small note about your Dockerfile: the sed line should be:

sed -i -e '/PROXYSQL_VERSION/s:1.2.5:1.2.5-715:g' include/proxysql.h 

Instead of:

sed -i -e '/PROXYSQL_VERSION/s:1.2.4:1.2.5-715:g' include/proxysql.h
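
On the `undefined reference to mallctl` link errors reported earlier: `mallctl` is a jemalloc function, so any call to it has to be compiled out when jemalloc is not linked in. A minimal sketch of that pattern follows. The `USE_JEMALLOC` macro and the `allocator_ctl` wrapper are assumptions made for this example, and the actual guard used in commit b85ed48 may look different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch: define USE_JEMALLOC only when the binary links jemalloc.
#ifdef USE_JEMALLOC
#include <jemalloc/jemalloc.h>
#endif

// Issue a no-argument jemalloc control command such as "thread.tcache.flush".
// Returns true if the command ran and succeeded, false if jemalloc support
// was compiled out or the command failed.
bool allocator_ctl(const char* name) {
    assert(name != nullptr);
#ifdef USE_JEMALLOC
    return mallctl(name, nullptr, nullptr, nullptr, 0) == 0;
#else
    (void)name;    // nothing to call, so no unresolved mallctl symbol at link time
    return false;
#endif
}
```

Keeping the guard inside one wrapper means the monitor and worker threads can call `allocator_ctl(...)` unconditionally, and only this function needs to know how the binary was built.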

@ryanschwartz
Author

Thanks for catching that sed problem - missed it in the copy/paste.

So far, 1.2.5-715 looks good, no segfaults. Thank you very much. 🎉 I will see how things go for the next few days and report back if anything pops up.

@renecannao
Contributor

Thank you!

minichate pushed a commit to minichate/proxysql that referenced this issue Mar 6, 2017
@renecannao
Contributor

Fixed a long time ago.
