Corrupted graphite stream #387

Closed
slayer opened this Issue Jan 17, 2014 · 6 comments

slayer commented Jan 17, 2014

Hi

I have run into a strange issue: corrupted data in the stream from statsd to graphite:


    master graphite-test /opt/statsd # tcpick -i lo -yP -C "port 2003"
    Starting tcpick 0.2.1 at 2014-01-17 03:47 UTC
    Timeout for connections is 600
    tcpick: listening on lo
    setting filter: "port 2003"
    1      SYN-SENT       127.0.0.1:60085 > 127.0.0.1:cfinger
    1      SYN-RECEIVED   127.0.0.1:60085 > 127.0.0.1:cfinger
    1      ESTABLISHED    127.0.0.1:60085 > 127.0.0.1:cfinger
    stats.statsd.bad_lines_seen 0 1389926872
    stats_counts.statsd.bad_lines_seen 0 1389926872
    stats.statsd.packets_received 0 1389926872
    ---- <many good lines> ----
    stats_counts.statsd.packets_received 0 1389926872
    stats.server1.worker2.requests 0 1389926872
    stats.server1.worker2.src1.op.........................................................................................
    ---- <many dots> ---
    ............................................................................................       ...B...B...F.T....R..q"........................
    ....................................E..4.W@.@.mj............;4. .W&=.....(.....
    .=fb.=fb...............................................................
    9926872
    stats.server1.worker2.src3.op_fail 0 1389926872
    stats_counts.server1.worker2.src3.op_fail 0 1389926872
    stats.server1.worker2.src3.op_success 0 1389926872
    stats_counts.server1.worker2.src3.op_success 0 1389926872

where '.' is a 0x00 byte.
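To rule out tcpick's own stream reassembly as the source of the junk, the same flush could also be dumped in hex, for example with tcpdump (just a sketch, assuming tcpdump is available on the box):

    # print the graphite line-protocol traffic on loopback in hex + ASCII;
    # real 0x00 bytes show up as "0000" pairs in the hex column
    tcpdump -i lo -nn -X 'tcp port 2003'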

  master graphite-test /opt/statsd # node -v
  v0.10.24

statsd is the latest master.

Do you have any ideas?

mrtazz (Member) commented Apr 14, 2014

I haven't seen this before. Can you give a bit more detail about the environment you run statsd in?

amosshapira commented Jan 15, 2015

I see the same thing here, sniffing the traffic between statsd and graphite with socat:

socat -v -d -d -lf/root/socat.log TCP4-LISTEN:2000,bind=0.0.0.0,reuseaddr,fork TCP4-CONNECT:localhost:2003,bind=0.0.0.0
> 2015/01/15 00:02:35.979062  length=586 from=0 to=585
stats.statsd.bad_lines_seen 0 1421280156
stats_counts.statsd.bad_lines_seen 0 1421280156
stats.statsd.packets_received 0 1421280156
stats_counts.statsd.packets_received 0 1421280156
stats.gauges.statsd.timestamp_lag 0 1421280156
statsd.numStats 3 1421280156
stats.statsd.graphiteStats.calculationtime 0 1421280156
stats.statsd.processing_time 0 1421280156
stats.statsd.graphiteStats.last_exception 1421279195 1421280156
stats.statsd.graphiteStats.last_flush 1421280146 1421280156
stats.statsd.graphiteStats.flush_time 1 1421280156
stats.statsd.graphiteStats.flush_length 586 1421280156
...

Notice the stats_counts.statsd.packets_received 0 1421280156 even though I keep sending packets to UDP port 8125 using:

echo "sample.gauge:12|g" | nc -u -w0 127.0.0.1 8125

nodejs v0.10.25 on Ubuntu 14.04 LTS on EC2. iptables is disabled and all traffic is over loopback.

I tried this with both the git master branch and the latest release, 0.7.2.

The whisper file for "sample.gauge" isn't created. Sometimes (pretty rarely) statsd picks up the data and sends it on, and graphite creates the file, but that happens in fewer than 10% of my attempts.
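One way to narrow down where the gauge gets lost (a sketch, assuming statsd's management interface is enabled on its default TCP port 8126) is to ask statsd directly whether it ever registered the metric, before graphite is involved at all:

# send the test gauge to the UDP listener
echo "sample.gauge:12|g" | nc -u -w1 127.0.0.1 8125
# then list the gauges statsd currently holds via the admin interface;
# "stats" would also show counters such as bad_lines_seen
printf 'gauges\nquit\n' | nc 127.0.0.1 8126

If sample.gauge shows up there but never reaches port 2003 intact, the problem is on the flush side rather than on the UDP receive side.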


mrtazz added the Bug label Jan 22, 2015

wincus commented Jun 26, 2015

Hi, we are hitting the same problem: statsd sends random data to the graphite backend from time to time.
node v0.10.38 on Ubuntu 14.04.1 LTS

Any update/workaround on this would be much appreciated!


amosshapira commented Jun 27, 2015

@wincus I later discovered that the "socat" or "netcat" version I used for testing has a bug that makes it send extra characters in debug mode; once I switched to another implementation, things worked correctly.
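For reference, a plain pass-through relay without the verbose/debug options that were suspected of injecting the extra bytes would look roughly like this (same ports as in the earlier command):

# forward port 2000 to the graphite line receiver with no -v/-d logging
socat TCP4-LISTEN:2000,bind=0.0.0.0,reuseaddr,fork TCP4-CONNECT:localhost:2003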


wincus commented Jun 29, 2015

Thanks @amosshapira for the update. I am using tcpick to capture the network traffic.

Even though I didn't find an explanation for the noise sent to the graphite backend, we did find the solution to our problem.

Our backend showed blanks at regular intervals, and I thought that was related to the noise. It actually turned out to be an old server that was still running on the network with the configuration option deleteIdleStats set to false, so it kept sending "0" even when it wasn't receiving real values.
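For anyone else hitting this, the option lives in statsd's config file; a minimal sketch (the path and host below are examples, not our actual setup) that stops an idle instance from flushing zeros:

# hypothetical config.js enabling deleteIdleStats, so metrics that stop
# arriving are dropped from the flush instead of being sent as 0
cat > /opt/statsd/config.js <<'EOF'
{
  graphiteHost: "127.0.0.1",
  graphitePort: 2003,
  port: 8125,
  deleteIdleStats: true
}
EOF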


mrtazz (Member) commented Jul 6, 2015
Sorry for the delay on this, but I'm glad to hear this has cleared up :)


mrtazz closed this Jul 6, 2015
