
Index out of range #565

Closed
pfischermx opened this issue May 21, 2014 · 16 comments

@pfischermx

First off, thanks so much for this wonderful tool. I've been using it for the last few days, integrated with gmond (Ganglia) + Grafana.

This morning I went to check Grafana and noticed it was having problems connecting to InfluxDB. It turned out InfluxDB was not running and was failing to start.

I'm using 0.6.5.

When I try to start it I get:

[05/21/14 18:07:36] [INFO] Loading configuration file config.toml

(InfluxDB ASCII-art startup banner)

panic: runtime error: index out of range

goroutine 23 [running]:
runtime.panic(0x88d2e0, 0x1012e97)
/home/vagrant/bin/go/src/pkg/runtime/panic.c:266 +0xb6
cluster.(*ClusterConfiguration).GetShardToWriteToBySeriesAndTime(0xc21006dc40, 0xc210000718, 0x7, 0x0, 0x0, ...)
/home/vagrant/influxdb/src/cluster/cluster_configuration.go:660 +0x583
coordinator.(*CoordinatorImpl).CommitSeriesData(0xc2100cd860, 0xc210000718, 0x7, 0xc210000080, 0x1, ...)
/home/vagrant/influxdb/src/coordinator/coordinator.go:644 +0x299
coordinator.(*CoordinatorImpl).WriteSeriesData(0xc2100cd860, 0x7f37f9ac1c00, 0xc210070e40, 0xc210000718, 0x7, ...)
/home/vagrant/influxdb/src/coordinator/coordinator.go:480 +0x13f
api/udp.(*Server).HandleSocket(0xc2100ce410, 0xc21017d010)
/home/vagrant/influxdb/src/api/udp/api.go:94 +0x50f
api/udp.(*Server).ListenAndServe(0xc2100ce410)
/home/vagrant/influxdb/src/api/udp/api.go:62 +0x2ac
created by server.(*Server).ListenAndServe
/home/vagrant/influxdb/src/server/server.go:146 +0x6aa

goroutine 1 [IO wait]:
net.runtime_pollWait(0x7f37f9ab6fd0, 0x72, 0x0)
/tmp/makerelease886106415/go/src/pkg/runtime/netpoll.goc:116 +0x6a
net.(*pollDesc).Wait(0xc210141060, 0x72, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:81 +0x34
net.(*pollDesc).WaitRead(0xc210141060, 0xb, 0x7f37f9aacf88)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:86 +0x30
net.(*netFD).accept(0xc210141000, 0xa00be0, 0x0, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_unix.go:382 +0x2c2
net.(*TCPListener).AcceptTCP(0xc2101c8070, 0x18, 0xc210039038, 0x5cdd03)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:233 +0x47
net.(*TCPListener).Accept(0xc2101c8070, 0x50, 0x8dbc60, 0x7f37f9ac1b50, 0x0)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:243 +0x27
net/http.(*Server).Serve(0xc2101d1050, 0x7f37f9ab6200, 0xc2101c8070, 0x0, 0x0)
/home/vagrant/bin/go/src/pkg/net/http/server.go:1622 +0x91
api/http.(*HttpServer).serveListener(0xc21008fd20, 0x7f37f9ab6200, 0xc2101c8070, 0xc2101c8078)
/home/vagrant/influxdb/src/api/http/api.go:186 +0x98
api/http.(*HttpServer).Serve(0xc21008fd20, 0x7f37f9ab6200, 0xc2101c8070)
/home/vagrant/influxdb/src/api/http/api.go:156 +0xc7c
api/http.(*HttpServer).ListenAndServe(0xc21008fd20)
/home/vagrant/influxdb/src/api/http/api.go:79 +0x16c
server.(*Server).ListenAndServe(0xc2100501c0, 0xc2100501c0, 0x0)
/home/vagrant/influxdb/src/server/server.go:154 +0x7a6
main.main()
/home/vagrant/influxdb/src/daemon/influxd.go:158 +0xd17

goroutine 3 [syscall]:
os/signal.loop()
/home/vagrant/bin/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
/home/vagrant/bin/go/src/pkg/os/signal/signal_unix.go:27 +0x31

goroutine 4 [chan receive]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc210069000, 0x7f37f9aad0e8, 0xc210000008)
/home/vagrant/influxdb/src/code.google.com/p/log4go/termlog.go:27 +0x60
created by code.google.com/p/log4go.NewConsoleLogWriter
/home/vagrant/influxdb/src/code.google.com/p/log4go/termlog.go:19 +0x67

goroutine 5 [syscall]:
runtime.goexit()
/home/vagrant/bin/go/src/pkg/runtime/proc.c:1394

goroutine 6 [select]:
code.google.com/p/log4go.func·002()
/home/vagrant/influxdb/src/code.google.com/p/log4go/filelog.go:84 +0x84c
created by code.google.com/p/log4go.NewFileLogWriter
/home/vagrant/influxdb/src/code.google.com/p/log4go/filelog.go:116 +0x2d1

goroutine 7 [runnable]:
wal.(*WAL).processEntries(0xc210072e00)
/home/vagrant/influxdb/src/wal/wal.go:252 +0x3f
created by wal.NewWAL
/home/vagrant/influxdb/src/wal/wal.go:103 +0x9f3

goroutine 8 [sleep]:
time.Sleep(0x8bb2c97000)
/tmp/makerelease886106415/go/src/pkg/runtime/time.goc:31 +0x31
cluster.func·001()
/home/vagrant/influxdb/src/cluster/cluster_configuration.go:132 +0x35
created by cluster.(*ClusterConfiguration).CreateFutureShardsAutomaticallyBeforeTimeComes
/home/vagrant/influxdb/src/cluster/cluster_configuration.go:137 +0x63

goroutine 10 [chan receive]:
main.waitForSignals(0x7f37f9ab6180, 0xc2100501c0)
/home/vagrant/influxdb/src/daemon/null_profiler.go:23 +0x126
created by main.startProfiler
/home/vagrant/influxdb/src/daemon/null_profiler.go:15 +0x38

goroutine 11 [IO wait]:
net.runtime_pollWait(0x7f37f9ab71c8, 0x72, 0x0)
/tmp/makerelease886106415/go/src/pkg/runtime/netpoll.goc:116 +0x6a
net.(*pollDesc).Wait(0xc210050290, 0x72, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:81 +0x34
net.(*pollDesc).WaitRead(0xc210050290, 0xb, 0x7f37f9aacf88)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:86 +0x30
net.(*netFD).accept(0xc210050230, 0xa00be0, 0x0, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_unix.go:382 +0x2c2
net.(*TCPListener).AcceptTCP(0xc2100fdf00, 0x18, 0xc21010c810, 0x5cdd03)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:233 +0x47
net.(*TCPListener).Accept(0xc2100fdf00, 0x0, 0x0, 0x0, 0x0)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:243 +0x27
net/http.(*Server).Serve(0xc2100ce460, 0x7f37f9ab6200, 0xc2100fdf00, 0x0, 0x0)
/home/vagrant/bin/go/src/pkg/net/http/server.go:1622 +0x91
coordinator.func·007()
/home/vagrant/influxdb/src/coordinator/raft_server.go:526 +0x3a
created by coordinator.(*RaftServer).Serve
/home/vagrant/influxdb/src/coordinator/raft_server.go:530 +0x4d9

goroutine 13 [finalizer wait]:
runtime.park(0x4518e0, 0x1029db8, 0x10147e8)
/home/vagrant/bin/go/src/pkg/runtime/proc.c:1342 +0x66
runfinq()
/home/vagrant/bin/go/src/pkg/runtime/mgc0.c:2279 +0x84
runtime.goexit()
/home/vagrant/bin/go/src/pkg/runtime/proc.c:1394

goroutine 14 [select]:
github.com/goraft/raft.(*server).leaderLoop(0xc210119240)
/home/vagrant/influxdb/src/github.com/goraft/raft/server.go:765 +0x5fe
github.com/goraft/raft.(*server).loop(0xc210119240)
/home/vagrant/influxdb/src/github.com/goraft/raft/server.go:568 +0x33f
created by github.com/goraft/raft.(*server).Start
/home/vagrant/influxdb/src/github.com/goraft/raft/server.go:472 +0x7af

goroutine 15 [select]:
coordinator.(*RaftServer).CompactLog(0xc2100af790)
/home/vagrant/influxdb/src/coordinator/raft_server.go:316 +0x2ef
created by coordinator.(*RaftServer).startRaft
/home/vagrant/influxdb/src/coordinator/raft_server.go:370 +0x375

goroutine 17 [select]:
coordinator.(*RaftServer).raftLeaderLoop(0xc2100af790, 0xc210070d80)
/home/vagrant/influxdb/src/coordinator/raft_server.go:426 +0x29c
created by coordinator.(*RaftServer).raftEventHandler
/home/vagrant/influxdb/src/coordinator/raft_server.go:415 +0x1d0

goroutine 19 [IO wait]:
net.runtime_pollWait(0x7f37f9ab7120, 0x72, 0x0)
/tmp/makerelease886106415/go/src/pkg/runtime/netpoll.goc:116 +0x6a
net.(*pollDesc).Wait(0xc210113bc0, 0x72, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:81 +0x34
net.(*pollDesc).WaitRead(0xc210113bc0, 0xb, 0x7f37f9aacf88)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:86 +0x30
net.(*netFD).accept(0xc210113b60, 0xa00be0, 0x0, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_unix.go:382 +0x2c2
net.(*TCPListener).AcceptTCP(0xc2101890c8, 0xc2100c8fe0, 0x0, 0x7f37f9ab61d0)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:233 +0x47
net.(*TCPListener).Accept(0xc2101890c8, 0xc2100c8fe0, 0x7f37f9919f38, 0x1, 0x1)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:243 +0x27
coordinator.(*ProtobufServer).ListenAndServe(0xc2100709c0)
/home/vagrant/influxdb/src/coordinator/protobuf_server.go:64 +0x1c7
created by server.(*Server).ListenAndServe
/home/vagrant/influxdb/src/server/server.go:117 +0x218

goroutine 20 [chan receive]:
wal.(*WAL).Commit(0xc210072e00, 0x1059f4715, 0x59f471500000002, 0x0)
/home/vagrant/influxdb/src/wal/wal.go:121 +0xa2
cluster.(*WriteBuffer).write(0xc210132ee0, 0xc2101b9b00)
/home/vagrant/influxdb/src/cluster/write_buffer.go:95 +0x13c
cluster.(*WriteBuffer).handleWrites(0xc210132ee0)
/home/vagrant/influxdb/src/cluster/write_buffer.go:78 +0xb7
created by cluster.NewWriteBuffer
/home/vagrant/influxdb/src/cluster/write_buffer.go:43 +0x24f

goroutine 22 [IO wait]:
net.runtime_pollWait(0x7f37f9ab7078, 0x72, 0x0)
/tmp/makerelease886106415/go/src/pkg/runtime/netpoll.goc:116 +0x6a
net.(*pollDesc).Wait(0xc21014d5a0, 0x72, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:81 +0x34
net.(*pollDesc).WaitRead(0xc21014d5a0, 0xb, 0x7f37f9aacf88)
/home/vagrant/bin/go/src/pkg/net/fd_poll_runtime.go:86 +0x30
net.(*netFD).accept(0xc21014d540, 0xa00be0, 0x0, 0x7f37f9aacf88, 0xb)
/home/vagrant/bin/go/src/pkg/net/fd_unix.go:382 +0x2c2
net.(*TCPListener).AcceptTCP(0xc2101c8cb8, 0x18, 0xc2100b2810, 0x5cdd03)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:233 +0x47
net.(*TCPListener).Accept(0xc2101c8cb8, 0x50, 0x1035c00, 0x18, 0x0)
/home/vagrant/bin/go/src/pkg/net/tcpsock_posix.go:243 +0x27
net/http.(*Server).Serve(0xc2101d1320, 0x7f37f9ab6200, 0xc2101c8cb8, 0x0, 0x0)
/home/vagrant/bin/go/src/pkg/net/http/server.go:1622 +0x91
net/http.Serve(0x7f37f9ab6200, 0xc2101c8cb8, 0x7f37f9ac1a80, 0xc2101cda10, 0x7, ...)
/home/vagrant/bin/go/src/pkg/net/http/server.go:1561 +0x70
admin.(*HttpServer).ListenAndServe(0xc210070a40)
/home/vagrant/influxdb/src/admin/http_server.go:35 +0x170
created by server.(*Server).ListenAndServe
/home/vagrant/influxdb/src/server/server.go:131 +0x460

@jvshahid
Contributor

That's because you're trying to write data with an empty series name. I've fixed the code to print an error instead of panicking. That won't fix your problem, though, since all your writes will still be ignored; you need to find out which client is sending UDP packets with an empty series name.

@jvshahid jvshahid added this to the 0.7.0 milestone May 21, 2014
@pfischermx
Author

Thanks! That seems to work.

@porjo

porjo commented Jul 11, 2014

I'm using the install from here: http://s3.amazonaws.com/influxdb/influxdb-latest-1.x86_64.rpm, which appears to be v0.7.3.

I'm seeing this error when the series name is an empty string:

[2014/07/11 06:18:43 BST] [EROR] (common.RecoverFunc:20) ********************************BUG********************************
Database: monitor
Query: [select  mean(value) from "" where  time > now() - 5m     group by time(0.1s)  order asc]
Error: runtime error: index out of range. Stacktrace: goroutine 241 [running]:
common.RecoverFunc(0xc21005fc86, 0x7, 0xc2101d6540, 0x57, 0x0)
    /home/vagrant/influxdb/src/common/recover.go:14 +0xb7
runtime.panic(0x8a29e0, 0x1042d57)
    /home/vagrant/bin/go/src/pkg/runtime/panic.c:248 +0x106
parser.(*QuerySpec).ShouldQueryShortTermAndLongTerm(0xc21005fd20, 0x7fd3dda60000)
    /home/vagrant/influxdb/src/parser/query_spec.go:161 +0xf4
cluster.(*ClusterConfiguration).GetShards(0xc21007ba80, 0xc21005fd20, 0x0, 0x0, 0x0)
    /home/vagrant/influxdb/src/cluster/cluster_configuration.go:812 +0xb0
coordinator.(*CoordinatorImpl).getShardsAndProcessor(0xc2100b32e0, 0xc21005fd20, 0x7fd3ddc1d630, 0xc21011a330, 0x500480, ...)
    /home/vagrant/influxdb/src/coordinator/coordinator.go:279 +0x8f
coordinator.(*CoordinatorImpl).runQuerySpec(0xc2100b32e0, 0xc21005fd20, 0x7fd3ddc1d630, 0xc21011a330, 0x0, ...)
    /home/vagrant/influxdb/src/coordinator/coordinator.go:414 +0x74
coordinator.(*CoordinatorImpl).runQuery(0xc2100b32e0, 0xc21005fd20, 0x7fd3ddc1d630, 0xc21011a330, 0x0, ...)
    /home/vagrant/i

@indykish

@pfischermx Can you shed some light on how you set up gmond (Ganglia) + Grafana with InfluxDB?

How does gmond send metrics to InfluxDB?

@cboggs

cboggs commented Sep 14, 2014

I believe you have to configure gmetad to push metrics in graphite format, pointing at the graphite UDP port you enable on the InfluxDB node(s).

In my opinion, the data layout with this mechanism is not ideal, as you'll get a distinct series per metric per host. However, it's a decent stop-gap until you can get something like collectd deployed. (Even that might not be much better in terms of data organization and performance; I haven't tried it out yet.)

@indykish

@cboggs Thanks for your reply. I'm looking at ways to avoid storing redundant data. There are two approaches I'm considering:

  • gmond -> gmetad (RRDs) -> graphite -> influxdb (I've seen a blog post, and you mentioned it too). This stores redundant data in the RRDs (gmetad) and in InfluxDB.
  • gmond -> a proxy that sends time-series data directly to InfluxDB (I am exploring this approach).

We have VMs which need to send time-series data directly to InfluxDB (new architecture), as opposed to going through gmetad.

@cboggs

cboggs commented Sep 14, 2014

I'll have to check when I get home, but I think you can spin up gmetad without the RRD writer and still have it output to a graphite-friendly destination. That way gmetad effectively becomes your non-redundant proxy layer. :-)

@indykish

Cool, I'll check out this option too.

@pfischermx
Author

@indykish, the way we have it running is:

We have a dedicated host running a (Perl) script I wrote that polls the ganglia XML endpoint (port 8652) on each of our ganglia hosts (we have many), parses the XML, and sends the stats to InfluxDB via UDP.

Since our ganglia instances have a LOT of data, we ended up with the following setup on InfluxDB:

  • We open about 10-15 UDP ports.
  • The script assigns an available (or least-used) port to each ganglia cluster.

At the beginning I was sending all the data to just one UDP port. That was a bad idea: a lot of metrics were getting dropped, and we found that the more we could "split" the metrics across InfluxDB ports, the better.

We also tried the carbon UDP port that comes with Influx (which Ganglia can feed via gmetad), but I really did not like it. The metrics were saved in the default carbon format (CLUSTER.$host.$metric), which looked pretty ugly in Grafana.

The other reason we gave up on the carbon UDP port was that we have other metric systems we also wanted to pull data from, so the best option was to tailor the Perl script more closely to our needs.

It may sound a bit complicated, but we did it this way because we ran into too many performance issues (due to the nature of UDP and the early days of InfluxDB). Also, since we moved to RocksDB, most of the problems have gone away.

@XANi

XANi commented Sep 15, 2014

Could you define "a lot"? On my bigger cluster, which handles about 8k metrics per second, I have:

  • collectd on the machines to collect data.
  • Riemann, which mainly converts counters/derives into gauges. The main reason is so we don't have to worry about whether a given metric is a counter or a gauge when we add it to Grafana.
  • Riemann sends via TCP, in graphite format, to HAProxy, using 16 TCP streams (for load balancing).
  • HAProxy splits traffic between 2 nodes using the leastconn policy.
  • The nodes accept graphite TCP traffic.

We use the graphite protocol for 2 reasons:

  • We had graphite before, so it was easier to migrate that way.
  • Riemann's InfluxDB plugin adds too many fields I don't care about; I prefer to keep most of the data in the metric name and only use columns for continuous-query results.

Riemann is also used to format metrics as $location.$host.$plugin so Grafana's $1-$9 can be used:

  • hq.xen1.ups.voltage.output = $1 $3: $5 - "xen1 voltage: output"
  • hq.filer.disk.md124.disk_ops.write = $1 $3: $5 - "filer md124: write"

@indykish

@pfischermx Thanks. I guess you are using a pull model from gmetad. Wouldn't there be redundant data in the RRDs (gmetad, which runs on port 8652, stores to RRDs) and in InfluxDB?

@XANi Nice option using HAProxy. The only reason I want to stay away from Riemann is that it's Java and uses more memory; I'd like a slim solution in Go/C/C++. The way I'm looking at it: our VMs are monitored using gmond; we'd use load balancers, as you said, and have the gmonds (metric collectors) send directly to the InfluxDB shards.

@XANi

XANi commented Sep 16, 2014

Actually, Riemann is Clojure ;] On our instance with ~88k metrics, each averaged over 10 seconds (mostly so misbehaving software can't swamp the backend with hundreds of writes), plus calculating derives into gauges, it eats 3.4 GB RSS and 1 core of an old Xeon E5540 at an event rate of about 8k/sec.

So it comes out to about 40 kB per metric, which is entirely reasonable.

Smaller instances will eat less; my home one uses about 450 MB RSS for parsing 3k different series, and only a few % of CPU.

@indykish

@XANi Thanks, cool. I meant that "Clojure" still needs a JVM, so Java/Clojure/Scala are all in the same category to me. Thanks for the info.

@XANi

XANi commented Sep 16, 2014

@indykish A bigger problem (or a feature, for some) is that Riemann's config is also written in Clojure.

But it allows some neat things: for example, you can pipe an entire HTTP server request log into it and parse it so the backend only gets min/max/avg/percentiles/histograms every few seconds.

@jvshahid
Contributor

I'm not sure why you are commenting on this issue. You've been commenting here for the last 2 days, but I don't think this discussion is related to the issue. If it's a general discussion, I'd suggest using the mailing list to benefit everyone and reduce the noise in GitHub issues. If I've misunderstood these comments and you feel there's something actionable here, please let me know.

@indykish

@jvshahid Thanks, sure.
