occasional buffer overflows #17

Closed
nareshov opened this issue Sep 17, 2014 · 7 comments

Comments

@nareshov

Here's a trace from upstart logs:

[2014-09-17 10:54:56] failed to write() to 127.0.0.1:2013: Broken pipe
*** buffer overflow detected ***: /usr/local/bin/relay terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fe2dc89c287]
/lib/x86_64-linux-gnu/libc.so.6(+0x10a180)[0x7fe2dc89b180]
/lib/x86_64-linux-gnu/libc.so.6(+0x10b23e)[0x7fe2dc89c23e]
/usr/local/bin/relay[0x40886c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fe2dcb58e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe2dc88573d]
======= Memory map: ========
00400000-0040e000 r-xp 00000000 09:02 1185849                            /usr/local/bin/relay
0060d000-0060e000 r--p 0000d000 09:02 1185849                            /usr/local/bin/relay
0060e000-0060f000 rw-p 0000e000 09:02 1185849                            /usr/local/bin/relay
02018000-02039000 rw-p 00000000 00:00 0                                  [heap]
7fe29c000000-7fe29c021000 rw-p 00000000 00:00 0 
7fe29c021000-7fe2a0000000 ---p 00000000 00:00 0 
7fe2a4000000-7fe2a4021000 rw-p 00000000 00:00 0 
7fe2a4021000-7fe2a8000000 ---p 00000000 00:00 0 
7fe2a8000000-7fe2a8021000 rw-p 00000000 00:00 0 
7fe2a8021000-7fe2ac000000 ---p 00000000 00:00 0 
7fe2adeef000-7fe2b0000000 rw-p 00000000 00:00 0 
7fe2b0000000-7fe2b007b000 rw-p 00000000 00:00 0 
7fe2b007b000-7fe2b4000000 ---p 00000000 00:00 0 
7fe2b4000000-7fe2b407c000 rw-p 00000000 00:00 0 
7fe2b407c000-7fe2b8000000 ---p 00000000 00:00 0 
7fe2b8000000-7fe2b807d000 rw-p 00000000 00:00 0 
7fe2b807d000-7fe2bc000000 ---p 00000000 00:00 0 
7fe2bc000000-7fe2bc07a000 rw-p 00000000 00:00 0 
7fe2bc07a000-7fe2c0000000 ---p 00000000 00:00 0 
7fe2c0000000-7fe2c007c000 rw-p 00000000 00:00 0 
7fe2c007c000-7fe2c4000000 ---p 00000000 00:00 0 
7fe2c4000000-7fe2c4087000 rw-p 00000000 00:00 0 
7fe2c4087000-7fe2c8000000 ---p 00000000 00:00 0 
7fe2c8000000-7fe2c807a000 rw-p 00000000 00:00 0 
7fe2c807a000-7fe2cc000000 ---p 00000000 00:00 0 
7fe2cc000000-7fe2cc079000 rw-p 00000000 00:00 0 
7fe2cc079000-7fe2d0000000 ---p 00000000 00:00 0 
7fe2d0000000-7fe2d10a9000 rw-p 00000000 00:00 0 
7fe2d10a9000-7fe2d4000000 ---p 00000000 00:00 0 
7fe2d5953000-7fe2d5968000 r-xp 00000000 09:02 9175092                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5968000-7fe2d5b67000 ---p 00015000 09:02 9175092                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b67000-7fe2d5b68000 r--p 00014000 09:02 9175092                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b68000-7fe2d5b69000 rw-p 00015000 09:02 9175092                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fe2d5b69000-7fe2d5b6a000 ---p 00000000 00:00 0 
7fe2d5b6a000-7fe2d636a000 rw-p 00000000 00:00 0 
7fe2d636a000-7fe2d636b000 ---p 00000000 00:00 0 
7fe2d636b000-7fe2d6b6b000 rw-p 00000000 00:00 0 
7fe2d6b6b000-7fe2d6b6c000 ---p 00000000 00:00 0 
7fe2d6b6c000-7fe2d736c000 rw-p 00000000 00:00 0 
7fe2d736c000-7fe2d736d000 ---p 00000000 00:00 0 
7fe2d736d000-7fe2d7b6d000 rw-p 00000000 00:00 0 
7fe2d7b6d000-7fe2d7b6e000 ---p 00000000 00:00 0 
7fe2d7b6e000-7fe2d836e000 rw-p 00000000 00:00 0 
7fe2d836e000-7fe2d836f000 ---p 00000000 00:00 0 
7fe2d836f000-7fe2d8b6f000 rw-p 00000000 00:00 0 
7fe2d8b6f000-7fe2d8b70000 ---p 00000000 00:00 0 
7fe2d8b70000-7fe2d9370000 rw-p 00000000 00:00 0 
7fe2d9370000-7fe2d9371000 ---p 00000000 00:00 0 
7fe2d9371000-7fe2d9b71000 rw-p 00000000 00:00 0 
7fe2d9b71000-7fe2d9b72000 ---p 00000000 00:00 0 
7fe2d9b72000-7fe2da372000 rw-p 00000000 00:00 0 
7fe2da372000-7fe2da373000 ---p 00000000 00:00 0 
7fe2da373000-7fe2dab73000 rw-p 00000000 00:00 0 
7fe2dab73000-7fe2dab74000 ---p 00000000 00:00 0 
7fe2dab74000-7fe2db374000 rw-p 00000000 00:00 0 
7fe2db374000-7fe2db375000 ---p 00000000 00:00 0 
7fe2db375000-7fe2dbb75000 rw-p 00000000 00:00 0 
7fe2dbb75000-7fe2dbb76000 ---p 00000000 00:00 0 
7fe2dbb76000-7fe2dc376000 rw-p 00000000 00:00 0 
7fe2dc376000-7fe2dc38c000 r-xp 00000000 09:02 9175268                    /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc38c000-7fe2dc58b000 ---p 00016000 09:02 9175268                    /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58b000-7fe2dc58c000 r--p 00015000 09:02 9175268                    /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58c000-7fe2dc58d000 rw-p 00016000 09:02 9175268                    /lib/x86_64-linux-gnu/libz.so.1.2.3.4
7fe2dc58d000-7fe2dc58f000 r-xp 00000000 09:02 9178486                    /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc58f000-7fe2dc78f000 ---p 00002000 09:02 9178486                    /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc78f000-7fe2dc790000 r--p 00002000 09:02 9178486                    /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc790000-7fe2dc791000 rw-p 00003000 09:02 9178486                    /lib/x86_64-linux-gnu/libdl-2.15.so
7fe2dc791000-7fe2dc946000 r-xp 00000000 09:02 9178488                    /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dc946000-7fe2dcb46000 ---p 001b5000 09:02 9178488                    /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb46000-7fe2dcb4a000 r--p 001b5000 09:02 9178488                    /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb4a000-7fe2dcb4c000 rw-p 001b9000 09:02 9178488                    /lib/x86_64-linux-gnu/libc-2.15.so
7fe2dcb4c000-7fe2dcb51000 rw-p 00000000 00:00 0 
7fe2dcb51000-7fe2dcb69000 r-xp 00000000 09:02 9178479                    /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcb69000-7fe2dcd68000 ---p 00018000 09:02 9178479                    /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd68000-7fe2dcd69000 r--p 00017000 09:02 9178479                    /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd69000-7fe2dcd6a000 rw-p 00018000 09:02 9178479                    /lib/x86_64-linux-gnu/libpthread-2.15.so
7fe2dcd6a000-7fe2dcd6e000 rw-p 00000000 00:00 0 
7fe2dcd6e000-7fe2dcf1f000 r-xp 00000000 09:02 9178635                    /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dcf1f000-7fe2dd11f000 ---p 001b1000 09:02 9178635                    /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd11f000-7fe2dd13a000 r--p 001b1000 09:02 9178635                    /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd13a000-7fe2dd145000 rw-p 001cc000 09:02 9178635                    /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fe2dd145000-7fe2dd149000 rw-p 00000000 00:00 0 
7fe2dd149000-7fe2dd16b000 r-xp 00000000 09:02 9178480                    /lib/x86_64-linux-gnu/ld-2.15.so
7fe2dd2fc000-7fe2dd362000 rw-p 00000000 00:00 0 
7fe2dd367000-7fe2dd36b000 rw-p 00000000 00:00 0 
7fe2dd36b000-7fe2dd36c000 r--p 00022000 09:02 9178480                    /lib/x86_64-linux-gnu/ld-2.15.so
7fe2dd36c000-7fe2dd36e000 rw-p 00023000 09:02 9178480                    /lib/x86_64-linux-gnu/ld-2.15.so
7fff93756000-7fff93777000 rw-p 00000000 00:00 0                          [stack]
7fff937ff000-7fff93800000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[2014-09-17 10:55:01] starting carbon-c-relay v0.33 (02696d)
configuration:
    relay hostname = IAD01-GRAPHITE02.INTERNAL.NET
    listen port = 2003
    workers = 8
    send batch size = 2500
    server queue size = 25000
    routes configuration = /etc/relay.conf

parsed configuration follows:
cluster all
    fnv1a_ch replication 1
        127.0.0.1:2013
        127.0.0.1:2113
    ;

match *
    send to all
    stop
    ;

listening on tcp4 0.0.0.0 port 2003
listening on UNIX socket /tmp/.s.carbon-c-relay.2003
starting 8 workers
starting statistics collector
[2014-09-17 10:55:01] failed to connect() to 127.0.0.1:2013: Connection refused
[2014-09-17 10:55:02] server 127.0.0.1:2013: OK

What other information can I provide?

@grobian
Owner

grobian commented Sep 17, 2014

Please try this with the just-released v0.34. I fixed a typo yesterday that could very well have caused this problem.

@nareshov
Author

Looks good. Thanks.

@nareshov
Author

Whenever I restart the backend carbon-cache twistd process, I see a broken pipe message (understandable), but sometimes I also see an "uncomplete write" error, after which a batch of metrics gets dropped:

[2014-09-17 11:07:54] failed to write() to 127.0.0.1:2013: Broken pipe
[2014-09-17 11:07:59] server 127.0.0.1:2013: OK
[2014-09-17 11:08:03] failed to write() to 127.0.0.1:2113: Broken pipe
[2014-09-17 11:08:09] server 127.0.0.1:2113: OK
[2014-09-17 11:10:29] failed to write() to 127.0.0.1:2013: uncomplete write
server 127.0.0.1:2013: dropping metric: SIN01-ACCEL14.rpcserver.function.executed_per_sec.CacheService.del_key 0 1410951930
server 127.0.0.1:2013: dropping metric: SIN01-ACCEL14.rpcserver.function.executed_per_sec.CacheService.del_key 0 1410951931

until it recovers this way:

server 127.0.0.1:2113: dropping metric: MOW02-ACCEL03.XXXetch.upstream.requests_per_sec 17 1410951398
server 127.0.0.1:2113: dropping metric: LIN02-ACCEL01.originfetch.XXX_health_client.rtt_ms.IAD01-ACCEL11.max 115 1410951562
server 127.0.0.1:2113: dropping metric: LIN02-ACCEL01.originfetch.XXX_health_client.errors_per_sec.SJC01-ACCEL09 0 1410951552
[2014-09-17 11:10:36] server 127.0.0.1:2113: OK
[2014-09-17 11:11:40] failed to write() to 127.0.0.1:2013: uncomplete write
[2014-09-17 11:11:41] server 127.0.0.1:2013: OK

Is there something I should try tuning in my config? This is how I'm currently invoking relay:

/usr/local/bin/relay -f /etc/relay.conf -w 8

with the config pasted in the original post.
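
To quantify how many metrics are lost per backend during such a restart, the "dropping metric" lines can be tallied from the log. The following is a minimal sketch, assuming the log lines look exactly like the ones quoted above; the pattern `DROP_RE` and the helper `count_drops` are hypothetical names for this illustration and are not part of carbon-c-relay.

```python
import re
from collections import Counter

# Pattern modelled on the "dropping metric" lines quoted in this thread
# (assumed format: "server HOST:PORT: dropping metric: NAME VALUE TIMESTAMP").
DROP_RE = re.compile(r"^server (\S+): dropping metric: (\S+) (\S+) (\d+)$")


def count_drops(lines):
    """Return a Counter mapping 'host:port' to the number of dropped metrics."""
    drops = Counter()
    for line in lines:
        match = DROP_RE.match(line.strip())
        if match:
            drops[match.group(1)] += 1
    return drops


# Sample lines copied from the log excerpts above.
sample = [
    "[2014-09-17 11:10:29] failed to write() to 127.0.0.1:2013: uncomplete write",
    "server 127.0.0.1:2013: dropping metric: SIN01-ACCEL14.rpcserver.function.executed_per_sec.CacheService.del_key 0 1410951930",
    "server 127.0.0.1:2013: dropping metric: SIN01-ACCEL14.rpcserver.function.executed_per_sec.CacheService.del_key 0 1410951931",
    "server 127.0.0.1:2113: dropping metric: MOW02-ACCEL03.XXXetch.upstream.requests_per_sec 17 1410951398",
    "[2014-09-17 11:10:36] server 127.0.0.1:2113: OK",
]

for server, dropped in sorted(count_drops(sample).items()):
    print(server, dropped)
```

Running this over the full upstart log instead of `sample` would show whether the drops cluster around one backend or affect both.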

@grobian
Owner

grobian commented Sep 17, 2014

Have a look at carbon.relays.IAD01-GRAPHITE02_INTERNAL_NET.metricsQeued
If that is constantly large, the relay cannot push data to the servers fast enough. You could increase the queue size to give yourself more room when a restart happens, but you should monitor closely whether the relay is able to catch up afterwards (that is, whether the queue shrinks).
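
The trade-off described here can be illustrated with a toy model of a bounded queue during a backend restart. This is only a sketch under stated assumptions: the arrival rate, drain rate, and outage length are made-up numbers, the function `simulate` is hypothetical, and the relay's real queueing logic may differ. Only the default queue size of 25000 comes from the startup log above.

```python
def simulate(queue_size, arrival_rate, drain_rate, outage_seconds, total_seconds):
    """Toy model of one bounded server queue.

    Returns (dropped, depths), where depths[i] is the queue depth
    at the end of second i. All rates are metrics per second.
    """
    depth = 0
    dropped = 0
    depths = []
    for second in range(total_seconds):
        # Metrics arrive every second; whatever does not fit is dropped.
        accepted = min(arrival_rate, queue_size - depth)
        dropped += arrival_rate - accepted
        depth += accepted
        # The backend only accepts data once the outage is over.
        if second >= outage_seconds:
            depth -= min(drain_rate, depth)
        depths.append(depth)
    return dropped, depths


# Assumed workload: 5000 metrics/s in, 8000 metrics/s out, 10 s restart.
for size in (25000, 100000):
    dropped, depths = simulate(size, 5000, 8000, 10, 30)
    print(f"queue_size={size} dropped={dropped} "
          f"peak_depth={max(depths)} final_depth={depths[-1]}")
```

In this model the larger queue absorbs the whole outage without drops, but only because the drain rate exceeds the arrival rate. If the backends cannot drain faster than metrics arrive, a larger queue merely delays the drops, which is why watching whether the queue shrinks matters.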

@thardie

thardie commented Apr 24, 2015

I work with nareshov. We are seeing this same crash again. I updated to the latest version a few minutes ago.

@grobian
Owner

grobian commented Apr 27, 2015

Would it be possible to provide me with a (gdb) stack trace?

@grobian grobian reopened this Apr 27, 2015
@grobian
Owner

grobian commented Apr 27, 2015

We'll continue in issue #62.
