You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First off, apologies but I don't have a reproducible test for this yet.
The behaviour we see with Influxdb 0.9.3 with real life clients sending graphite metrics to the graphite plugin port of influxdb is this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32484 influxdb 20 0 1047m 745m 11m S 102.6 4.7 68:50.94 influxd
The db keeps spinning on CPU and we see a slew of write timeouts in log in between wal flushes 'due to idle'. I presume both of these cannot be happening at same time. Is this a case of a deadlock somewhere?
Can see the measurements in show measurements but data points are missing. No surprise there, given the write timeouts. The surprise is there's nothing hitting the disk and the wal is seemingly idle.
The actual data being sent is ~450K graphite metrics/minute from about 25k unique series. This is real data from real graphite clients and is being ingested in both graphite and influxdb 0.8 with no issues.
I am in the process of trying to replicate this locally - no luck yet. Would appreciate any pointers in the meantime. I can provide a profile of influxdb while it is doing this if that would be helpful.
Iostat shows data drive being hit very very little, few hundred KB per minute.
Influxdb logs and induced stack trace by SIGQUIT while this is happening:
[metastore] 2015/08/27 15:19:30 Using data dir: /var/opt/influxdb/meta
[metastore] 2015/08/27 15:19:30 Node at localhost:8088 [Follower]
[metastore] 2015/08/27 15:19:32 Node at localhost:8088 [Leader]. peers=[]
[metastore] 2015/08/27 15:19:32 Created local node: id=1, host=localhost:8088
[store] 2015/08/27 15:19:32 Using data dir: /var/opt/influxdb/data
[handoff] 2015/08/27 15:19:32 Starting hinted handoff service
[handoff] 2015/08/27 15:19:32 Using data dir: /var/opt/influxdb/hh
[tcp] 2015/08/27 15:19:32 Starting cluster service
[shard-precreation] 2015/08/27 15:19:32 Starting precreation service with check interval of 10m0s, advance period of 30m0s
[snapshot] 2015/08/27 15:19:32 Starting snapshot service
[admin] 2015/08/27 15:19:32 Starting admin service
[admin] 2015/08/27 15:19:32 Listening on HTTP: 0.0.0.0:8083
[continuous_querier] 2015/08/27 15:19:32 Starting continuous query service
[httpd] 2015/08/27 15:19:32 Starting HTTP service
[httpd] 2015/08/27 15:19:32 Authentication enabled: false
[httpd] 2015/08/27 15:19:32 Listening on HTTP: 0.0.0.0:8086
[retention] 2015/08/27 15:19:32 Starting rentention policy enforcement service
[graphite] 2015/08/27 15:19:32 Starting graphite service, batch size 1000, batch timeout 1s
[graphite] 2015/08/27 15:19:32 Listening on UDP: 127.0.0.1:2013
2015/08/27 15:19:32 InfluxDB starting, version 0.9.3, branch 0.9.3, commit 5d42b212fca8facfe9db0c83822f09b88be643ec
2015/08/27 15:19:32 GOMAXPROCS set to 4
[run] 2015/08/27 15:19:32 Listening for signals
[wal] 2015/08/27 15:19:37 WAL starting with 30720 ready series size, 0.50 compaction threshold, and 20971520 partition size threshold
[wal] 2015/08/27 15:19:37 WAL writing to /db/influxdb_wal/wal/graphite/default/1
[wal] 2015/08/27 15:19:38 WAL starting with 30720 ready series size, 0.50 compaction threshold, and 20971520 partition size threshold
[wal] 2015/08/27 15:19:38 WAL writing to /db/influxdb_wal/wal/graphite/default/2
[wal] 2015/08/27 15:19:43 WAL starting with 30720 ready series size, 0.50 compaction threshold, and 20971520 partition size threshold
[wal] 2015/08/27 15:19:43 WAL writing to /db/influxdb_wal/wal/graphite/default/3
[graphite] 2015/08/27 15:20:22 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:20:32 WAL starting with 30720 ready series size, 0.50 compaction threshold, and 20971520 partition size threshold
[wal] 2015/08/27 15:20:32 WAL writing to /db/influxdb_wal/wal/graphite/default/4
[graphite] 2015/08/27 15:21:22 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:21:27 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:21:32 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:16 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:21 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:26 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:31 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:36 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:22:43 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:23:18 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:23:23 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:23:28 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:23:33 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:23:38 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:16 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:21 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:26 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:31 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:36 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:24:41 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:25:20 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:25:25 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:25:30 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:25:35 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:25:41 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:18 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:23 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:28 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:33 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:38 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:43 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:48 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:26:53 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:26:57 WAL starting with 30720 ready series size, 0.50 compaction threshold, and 20971520 partition size threshold
[wal] 2015/08/27 15:26:57 WAL writing to /db/influxdb_wal/wal/graphite/default/5
[graphite] 2015/08/27 15:26:58 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:18 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:23 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:28 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:33 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:38 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:43 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:48 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:27:53 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:18 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:23 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:28 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:33 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:38 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:43 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:48 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:53 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:28:58 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:29:17 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:29:22 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:29:27 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:29:32 failed to write point batch to database "graphite": timeout
[retention] 2015/08/27 15:29:32 retention policy enforcement check commencing
[retention] 2015/08/27 15:29:32 retention policy shard deletion check commencing
[metastore] 2015/08/27 15:29:32 new shard group 6 successfully created for database graphite, retention policy default
[metastore] 2015/08/27 15:29:32 new shard group 7 successfully created for database graphite, retention policy default
[graphite] 2015/08/27 15:29:37 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:29:38 Flushing 43861 measurements and 104881 series to index
[wal] 2015/08/27 15:29:38 Flushing 168 measurements and 170 series to index
[wal] 2015/08/27 15:29:38 Metadata flush took 36.446349ms
[wal] 2015/08/27 15:29:39 Metadata flush took 746.320935ms
[graphite] 2015/08/27 15:29:42 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:29:43 Flushing 109 measurements and 109 series to index
[wal] 2015/08/27 15:29:43 Metadata flush took 46.66375ms
[graphite] 2015/08/27 15:29:47 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:29:55 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:20 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:25 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:30 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:30:32 Flushing 17 measurements and 17 series to index
[wal] 2015/08/27 15:30:32 Metadata flush took 64.250652ms
[graphite] 2015/08/27 15:30:35 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:40 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:30:43 Flush due to idle. Flushing 22 series with 1207 bytes from partition 4. Compacting 0 series
[wal] 2015/08/27 15:30:43 compaction of partition 4 took 23.559258ms
[wal] 2015/08/27 15:30:43 Flush due to idle. Flushing 23 series with 1275 bytes from partition 5. Compacting 0 series
[wal] 2015/08/27 15:30:43 compaction of partition 5 took 12.574245ms
[wal] 2015/08/27 15:30:43 Flush due to idle. Flushing 25 series with 1377 bytes from partition 1. Compacting 0 series
[wal] 2015/08/27 15:30:43 compaction of partition 1 took 30.696511ms
[wal] 2015/08/27 15:30:43 Flush due to idle. Flushing 23 series with 1224 bytes from partition 2. Compacting 0 series
[wal] 2015/08/27 15:30:43 compaction of partition 2 took 13.580917ms
[wal] 2015/08/27 15:30:43 Flush due to idle. Flushing 16 series with 850 bytes from partition 3. Compacting 0 series
[wal] 2015/08/27 15:30:43 compaction of partition 3 took 14.095429ms
[graphite] 2015/08/27 15:30:45 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:50 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:30:55 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:00 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:05 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:10 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:15 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:20 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:25 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:30 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:37 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:42 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:47 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:52 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:31:57 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:31:59 Flush due to idle. Flushing 5 series with 85 bytes from partition 1. Compacting 0 series
[wal] 2015/08/27 15:31:59 compaction of partition 1 took 55.903703ms
[wal] 2015/08/27 15:31:59 Flush due to idle. Flushing 14 series with 238 bytes from partition 2. Compacting 0 series
[wal] 2015/08/27 15:31:59 compaction of partition 2 took 20.480222ms
[wal] 2015/08/27 15:31:59 Flush due to idle. Flushing 11 series with 187 bytes from partition 3. Compacting 0 series
[wal] 2015/08/27 15:31:59 compaction of partition 3 took 12.124952ms
[wal] 2015/08/27 15:31:59 Flush due to idle. Flushing 9 series with 153 bytes from partition 4. Compacting 0 series
[wal] 2015/08/27 15:31:59 compaction of partition 4 took 23.220238ms
[wal] 2015/08/27 15:31:59 Flush due to idle. Flushing 14 series with 238 bytes from partition 5. Compacting 0 series
[wal] 2015/08/27 15:31:59 compaction of partition 5 took 30.866042ms
[graphite] 2015/08/27 15:32:02 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:07 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:16 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:21 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:26 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:31 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:36 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:41 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:47 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:52 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:32:57 failed to write point batch to database "graphite": timeout
[graphite] 2015/08/27 15:33:02 failed to write point batch to database "graphite": timeout
[wal] 2015/08/27 15:33:04 Flush due to idle. Flushing 2 series with 153 bytes from partition 5. Compacting 0 series
[wal] 2015/08/27 15:33:04 compaction of partition 5 took 25.157718ms
[wal] 2015/08/27 15:33:04 Flush due to idle. Flushing 1 series with 68 bytes from partition 1. Compacting 0 series
[wal] 2015/08/27 15:33:04 compaction of partition 1 took 10.731526ms
[wal] 2015/08/27 15:33:04 Flush due to idle. Flushing 3 series with 221 bytes from partition 2. Compacting 0 series
[wal] 2015/08/27 15:33:04 compaction of partition 2 took 17.326634ms
[wal] 2015/08/27 15:33:04 Flush due to idle. Flushing 8 series with 612 bytes from partition 3. Compacting 0 series
[wal] 2015/08/27 15:33:04 compaction of partition 3 took 33.873323ms
[wal] 2015/08/27 15:33:04 Flush due to idle. Flushing 3 series with 204 bytes from partition 4. Compacting 0 series
[wal] 2015/08/27 15:33:04 compaction of partition 4 took 7.881754ms
SIGQUIT: quit
PC=0x4415cb
goroutine 1144 [running]:
runtime.cmpbody()
/root/.gvm/gos/go1.4.2/src/runtime/asm_amd64.s:1195 +0x2b fp=0xc2080f28d0 sp=0xc2080f28c8
runtime.cmpstring(0xc2081f6320, 0x46, 0xc20da7d900, 0x4d, 0xffffffffffffffff)
/root/.gvm/gos/go1.4.2/src/runtime/asm_amd64.s:1161 +0x19 fp=0xc2080f28d8 sp=0xc2080f28d0
sort.(*StringSlice).Less(0xc2148ca060, 0xaee1, 0xb722, 0xb301)
<autogenerated>:13 +0xfa fp=0xc2080f2928 sp=0xc2080f28d8
sort.doPivot(0x2b8236b61560, 0xc2148ca060, 0xaee1, 0xb869, 0xb869, 0xaee1)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:138 +0x231 fp=0xc2080f2990 sp=0xc2080f2928
sort.quickSort(0x2b8236b61560, 0xc2148ca060, 0xaee1, 0xb869, 0x1b)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:173 +0xa9 fp=0xc2080f29d8 sp=0xc2080f2990
sort.quickSort(0x2b8236b61560, 0xc2148ca060, 0xa555, 0xb869, 0x1c)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:180 +0x178 fp=0xc2080f2a20 sp=0xc2080f29d8
sort.quickSort(0x2b8236b61560, 0xc2148ca060, 0x65be, 0xcb7b, 0x1e)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:180 +0x178 fp=0xc2080f2a68 sp=0xc2080f2a20
sort.quickSort(0x2b8236b61560, 0xc2148ca060, 0x0, 0xcb7b, 0x1f)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:180 +0x178 fp=0xc2080f2ab0 sp=0xc2080f2a68
sort.Sort(0x2b8236b61560, 0xc2148ca060)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:200 +0x7b fp=0xc2080f2ae0 sp=0xc2080f2ab0
sort.Strings(0xc21df00000, 0xcb7b, 0xec00)
/root/.gvm/gos/go1.4.2/src/sort/sort.go:276 +0x82 fp=0xc2080f2b30 sp=0xc2080f2ae0
github.com/influxdb/influxdb/tsdb.(*DatabaseIndex).CreateMeasurementIndexIfNotExists(0xc2082ce140, 0xc20b98e2d0, 0x4e, 0x4e)
/root/go/src/github.com/influxdb/influxdb/tsdb/meta.go:110 +0x48d fp=0xc2080f2c28 sp=0xc2080f2b30
github.com/influxdb/influxdb/tsdb.(*DatabaseIndex).CreateSeriesIndexIfNotExists(0xc2082ce140, 0xc20b98e2d0, 0x4e, 0xc20983ccf0, 0xc20983cc60)
/root/go/src/github.com/influxdb/influxdb/tsdb/meta.go:88 +0x8e fp=0xc2080f2c68 sp=0xc2080f2c28
github.com/influxdb/influxdb/tsdb.(*Shard).WritePoints(0xc2082e06c0, 0xc214d84000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/shard.go:165 +0x1dd fp=0xc2080f2e90 sp=0xc2080f2c68
github.com/influxdb/influxdb/tsdb.(*Store).WriteToShard(0xc2080321a0, 0x1, 0xc214d84000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/store.go:303 +0xf7 fp=0xc2080f2ed8 sp=0xc2080f2e90
github.com/influxdb/influxdb/cluster.func·002(0x1, 0x1, 0xc214d84000, 0x3e8, 0x400)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:241 +0xfc fp=0xc2080f2fb8 sp=0xc2080f2ed8
runtime.goexit()
/root/.gvm/gos/go1.4.2/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080f2fc0 sp=0xc2080f2fb8
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:271 +0x30b
goroutine 1 [chan receive, 13 minutes]:
main.(*Main).Run(0xc20806da80, 0xc20800a010, 0x4, 0x4, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:91 +0x777
main.main()
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:42 +0xdc
goroutine 6 [syscall, 13 minutes]:
os/signal.loop()
/root/.gvm/gos/go1.4.2/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
/root/.gvm/gos/go1.4.2/src/os/signal/signal_unix.go:27 +0x35
goroutine 17 [syscall, 13 minutes, locked to thread]:
runtime.goexit()
/root/.gvm/gos/go1.4.2/src/runtime/asm_amd64.s:2232 +0x1
goroutine 8 [IO wait, 13 minutes]:
net.(*pollDesc).Wait(0xc20800e450, 0x72, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20800e450, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc20800e3f0, 0x0, 0x2b8236b59f40, 0xc2080fe000)
/root/.gvm/gos/go1.4.2/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc208036de8, 0x0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc208036de8, 0x0, 0x0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:244 +0x4c
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0xc2080e29c0, 0x2b8236b5c5d0, 0xc208036de8, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:45 +0x57
created by github.com/influxdb/influxdb/cmd/influxd/run.func·003
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:313 +0x8f2
goroutine 9 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc208036df0, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:117 +0x80
github.com/influxdb/influxdb/meta.(*raftLayer).Accept(0xc2080e2b10, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/meta/store.go:2058 +0x64
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc20800e4d0)
/root/go/src/github.com/hashicorp/raft/net_transport.go:346 +0x50
created by github.com/hashicorp/raft.NewNetworkTransport
/root/go/src/github.com/hashicorp/raft/net_transport.go:138 +0x2c4
goroutine 10 [select]:
github.com/hashicorp/raft.(*Raft).leaderLoop(0xc2080c41c0)
/root/go/src/github.com/hashicorp/raft/raft.go:826 +0xc7c
github.com/hashicorp/raft.(*Raft).runLeader(0xc2080c41c0)
/root/go/src/github.com/hashicorp/raft/raft.go:798 +0x74e
github.com/hashicorp/raft.(*Raft).run(0xc2080c41c0)
/root/go/src/github.com/hashicorp/raft/raft.go:595 +0xc9
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.run)·fm()
/root/go/src/github.com/hashicorp/raft/raft.go:245 +0x27
github.com/hashicorp/raft.func·011()
/root/go/src/github.com/hashicorp/raft/state.go:152 +0x51
created by github.com/hashicorp/raft.(*raftState).goFunc
/root/go/src/github.com/hashicorp/raft/state.go:153 +0xe3
goroutine 11 [select, 3 minutes]:
github.com/hashicorp/raft.(*Raft).runFSM(0xc2080c41c0)
/root/go/src/github.com/hashicorp/raft/raft.go:502 +0x1043
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runFSM)·fm()
/root/go/src/github.com/hashicorp/raft/raft.go:246 +0x27
github.com/hashicorp/raft.func·011()
/root/go/src/github.com/hashicorp/raft/state.go:152 +0x51
created by github.com/hashicorp/raft.(*raftState).goFunc
/root/go/src/github.com/hashicorp/raft/state.go:153 +0xe3
goroutine 12 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc2080c41c0)
/root/go/src/github.com/hashicorp/raft/raft.go:1613 +0x466
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runSnapshots)·fm()
/root/go/src/github.com/hashicorp/raft/raft.go:247 +0x27
github.com/hashicorp/raft.func·011()
/root/go/src/github.com/hashicorp/raft/state.go:152 +0x51
created by github.com/hashicorp/raft.(*raftState).goFunc
/root/go/src/github.com/hashicorp/raft/state.go:153 +0xe3
goroutine 13 [select, 13 minutes]:
github.com/influxdb/influxdb/meta.(*localRaft).logLeaderChanges(0xc208072de0)
/root/go/src/github.com/influxdb/influxdb/meta/state.go:187 +0x46e
created by github.com/influxdb/influxdb/meta.(*localRaft).open
/root/go/src/github.com/influxdb/influxdb/meta/state.go:177 +0x130c
goroutine 34 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc208036df8, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:117 +0x80
github.com/influxdb/influxdb/meta.(*Store).serveExecListener(0xc208062000)
/root/go/src/github.com/influxdb/influxdb/meta/store.go:619 +0x8f
created by github.com/influxdb/influxdb/meta.(*Store).Open
/root/go/src/github.com/influxdb/influxdb/meta/store.go:239 +0x391
goroutine 35 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc208036e00, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:117 +0x80
github.com/influxdb/influxdb/meta.(*Store).serveRPCListener(0xc208062000)
/root/go/src/github.com/influxdb/influxdb/meta/store.go:723 +0xc9
created by github.com/influxdb/influxdb/meta.(*Store).Open
/root/go/src/github.com/influxdb/influxdb/meta/store.go:242 +0x3db
goroutine 14 [select]:
github.com/influxdb/influxdb/services/hh.(*Service).retryWrites(0xc208045180)
/root/go/src/github.com/influxdb/influxdb/services/hh/service.go:103 +0x2e7
created by github.com/influxdb/influxdb/services/hh.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/hh/service.go:67 +0x20e
goroutine 37 [select, 13 minutes]:
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).monitorErrorChan(0xc2080ac0e0, 0xc2080724e0)
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:442 +0x14a
created by github.com/influxdb/influxdb/cmd/influxd/run.func·003
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:319 +0xa63
goroutine 15 [select, 13 minutes]:
github.com/influxdb/influxdb/services/hh.(*Service).expireWrites(0xc208045180)
/root/go/src/github.com/influxdb/influxdb/services/hh/service.go:121 +0x2a4
created by github.com/influxdb/influxdb/services/hh.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/hh/service.go:68 +0x228
goroutine 16 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc208036e08, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:117 +0x80
github.com/influxdb/influxdb/cluster.(*Service).serve(0xc20800e2a0)
/root/go/src/github.com/influxdb/influxdb/cluster/service.go:83 +0x102
created by github.com/influxdb/influxdb/cluster.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/cluster/service.go:60 +0x138
goroutine 50 [select, 3 minutes]:
github.com/influxdb/influxdb/services/precreator.(*Service).runPrecreation(0xc20800adc0)
/root/go/src/github.com/influxdb/influxdb/services/precreator/service.go:75 +0x407
created by github.com/influxdb/influxdb/services/precreator.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/precreator/service.go:53 +0x20a
goroutine 51 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc208036e10, 0x0, 0x0, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tcp/mux.go:117 +0x80
github.com/influxdb/influxdb/services/snapshotter.(*Service).serve(0xc20800aeb0)
/root/go/src/github.com/influxdb/influxdb/services/snapshotter/service.go:75 +0xad
created by github.com/influxdb/influxdb/services/snapshotter.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/snapshotter/service.go:48 +0x133
goroutine 52 [IO wait, 13 minutes]:
net.(*pollDesc).Wait(0xc20816c0d0, 0x72, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20816c0d0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc20816c070, 0x0, 0x2b8236b59f40, 0xc2080540d0)
/root/.gvm/gos/go1.4.2/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc208164028, 0xc20801e700, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc208164028, 0x0, 0x0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:244 +0x4c
net/http.(*Server).Serve(0xc208072180, 0x2b8236b5c5d0, 0xc208164028, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/http/server.go:1728 +0x92
net/http.Serve(0x2b8236b5c5d0, 0xc208164028, 0x2b8236b61340, 0xc2080540c0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/http/server.go:1606 +0xa1
github.com/influxdb/influxdb/services/admin.(*Service).serve(0xc20800af50)
/root/go/src/github.com/influxdb/influxdb/services/admin/service.go:107 +0x147
created by github.com/influxdb/influxdb/services/admin.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/admin/service.go:70 +0x472
goroutine 53 [select]:
github.com/influxdb/influxdb/services/continuous_querier.(*Service).backgroundLoop(0xc20800e310)
/root/go/src/github.com/influxdb/influxdb/services/continuous_querier/service.go:157 +0x3e9
created by github.com/influxdb/influxdb/services/continuous_querier.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/continuous_querier/service.go:93 +0x2df
goroutine 54 [IO wait, 13 minutes]:
net.(*pollDesc).Wait(0xc20816c140, 0x72, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20816c140, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc20816c0e0, 0x0, 0x2b8236b59f40, 0xc2080dc010)
/root/.gvm/gos/go1.4.2/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc208164030, 0xc208110000, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc208164030, 0x0, 0x0, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:244 +0x4c
net/http.(*Server).Serve(0xc208112120, 0x2b8236b5c5d0, 0xc208164030, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/http/server.go:1728 +0x92
net/http.Serve(0x2b8236b5c5d0, 0xc208164030, 0x2b8236b61368, 0xc20800e380, 0x0, 0x0)
/root/.gvm/gos/go1.4.2/src/net/http/server.go:1606 +0xa1
github.com/influxdb/influxdb/services/httpd.(*Service).serve(0xc20800b090)
/root/go/src/github.com/influxdb/influxdb/services/httpd/service.go:108 +0x79
created by github.com/influxdb/influxdb/services/httpd.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/httpd/service.go:76 +0x68b
goroutine 55 [select, 3 minutes]:
github.com/influxdb/influxdb/services/retention.(*Service).deleteShardGroups(0xc208072a20)
/root/go/src/github.com/influxdb/influxdb/services/retention/service.go:69 +0x342
created by github.com/influxdb/influxdb/services/retention.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/retention/service.go:45 +0x138
goroutine 56 [select, 3 minutes]:
github.com/influxdb/influxdb/services/retention.(*Service).deleteShards(0xc208072a20)
/root/go/src/github.com/influxdb/influxdb/services/retention/service.go:102 +0x71b
created by github.com/influxdb/influxdb/services/retention.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/retention/service.go:46 +0x152
goroutine 66 [chan send]:
github.com/influxdb/influxdb/tsdb.func·001()
/root/go/src/github.com/influxdb/influxdb/tsdb/batcher.go:58 +0x6c
github.com/influxdb/influxdb/tsdb.func·002()
/root/go/src/github.com/influxdb/influxdb/tsdb/batcher.go:89 +0x402
created by github.com/influxdb/influxdb/tsdb.(*PointBatcher).Start
/root/go/src/github.com/influxdb/influxdb/tsdb/batcher.go:104 +0x29a
goroutine 67 [select]:
github.com/influxdb/influxdb/cluster.(*PointsWriter).WritePoints(0xc20800e230, 0xc20faab540, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:210 +0x634
github.com/influxdb/influxdb/services/graphite.(*Service).processBatches(0xc20809e0c0, 0xc2080737a0)
/root/go/src/github.com/influxdb/influxdb/services/graphite/service.go:260 +0x239
created by github.com/influxdb/influxdb/services/graphite.(*Service).Open
/root/go/src/github.com/influxdb/influxdb/services/graphite/service.go:105 +0x76c
goroutine 68 [chan send]:
github.com/influxdb/influxdb/services/graphite.(*Service).handleLine(0xc20809e0c0, 0xc214b72ba0, 0x5e)
/root/go/src/github.com/influxdb/influxdb/services/graphite/service.go:246 +0x2a7
github.com/influxdb/influxdb/services/graphite.func·003()
/root/go/src/github.com/influxdb/influxdb/services/graphite/service.go:219 +0x1f7
created by github.com/influxdb/influxdb/services/graphite.(*Service).openUDPServer
/root/go/src/github.com/influxdb/influxdb/services/graphite/service.go:222 +0x2fa
goroutine 69 [select, 13 minutes]:
github.com/influxdb/influxdb/cmd/influxd/run.(*Command).monitorServerErrors(0xc2080ae300)
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:140 +0x2da
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Command).Run
/root/go/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:122 +0xdef
goroutine 74 [select]:
github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).autoflusher(0xc2080ac620, 0xc208c40960)
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:603 +0x4d3
created by github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).Open
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:219 +0x79b
goroutine 26 [select]:
github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).autoflusher(0xc2080ac1c0, 0xc20825c360)
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:603 +0x4d3
created by github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).Open
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:219 +0x79b
goroutine 1186 [semacquire]:
sync.(*Mutex).Lock(0xc2082ce140)
/root/.gvm/gos/go1.4.2/src/sync/mutex.go:66 +0xd3
sync.(*RWMutex).Lock(0xc2082ce140)
/root/.gvm/gos/go1.4.2/src/sync/rwmutex.go:82 +0x33
github.com/influxdb/influxdb/tsdb.(*Shard).WritePoints(0xc2082e06c0, 0xc214d4c000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/shard.go:163 +0x14c
github.com/influxdb/influxdb/tsdb.(*Store).WriteToShard(0xc2080321a0, 0x1, 0xc214d4c000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/store.go:303 +0xf7
github.com/influxdb/influxdb/cluster.func·002(0x1, 0x1, 0xc214d4c000, 0x3e8, 0x400)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:241 +0xfc
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:271 +0x30b
goroutine 100 [select]:
github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).autoflusher(0xc2080ac700, 0xc209323a40)
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:603 +0x4d3
created by github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).Open
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:219 +0x79b
goroutine 1188 [semacquire]:
sync.(*RWMutex).RLock(0xc2082ce140)
/root/.gvm/gos/go1.4.2/src/sync/rwmutex.go:36 +0x5f
github.com/influxdb/influxdb/tsdb.(*Shard).validateSeriesAndFields(0xc2082e06c0, 0xc212794000, 0x3e8, 0x400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/go/src/github.com/influxdb/influxdb/tsdb/shard.go:336 +0x135
github.com/influxdb/influxdb/tsdb.(*Shard).WritePoints(0xc2082e06c0, 0xc212794000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/shard.go:156 +0x75
github.com/influxdb/influxdb/tsdb.(*Store).WriteToShard(0xc2080321a0, 0x1, 0xc212794000, 0x3e8, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/store.go:303 +0xf7
github.com/influxdb/influxdb/cluster.func·002(0x1, 0x1, 0xc212794000, 0x3e8, 0x400)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:241 +0xfc
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:271 +0x30b
goroutine 167 [select]:
github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).autoflusher(0xc209908620, 0xc20b9c0f00)
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:603 +0x4d3
created by github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).Open
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:219 +0x79b
goroutine 790 [select]:
github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).autoflusher(0xc2080acc40, 0xc211e45860)
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:603 +0x4d3
created by github.com/influxdb/influxdb/tsdb/engine/wal.(*Log).Open
/root/go/src/github.com/influxdb/influxdb/tsdb/engine/wal/wal.go:219 +0x79b
goroutine 1160 [semacquire]:
sync.(*Mutex).Lock(0xc2082ce140)
/root/.gvm/gos/go1.4.2/src/sync/mutex.go:66 +0xd3
sync.(*RWMutex).Lock(0xc2082ce140)
/root/.gvm/gos/go1.4.2/src/sync/rwmutex.go:82 +0x33
github.com/influxdb/influxdb/tsdb.(*Shard).WritePoints(0xc2082e06c0, 0xc214db6000, 0x381, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/shard.go:163 +0x14c
github.com/influxdb/influxdb/tsdb.(*Store).WriteToShard(0xc2080321a0, 0x1, 0xc214db6000, 0x381, 0x400, 0x0, 0x0)
/root/go/src/github.com/influxdb/influxdb/tsdb/store.go:303 +0xf7
github.com/influxdb/influxdb/cluster.func·002(0x1, 0x1, 0xc214db6000, 0x381, 0x400)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:241 +0xfc
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:271 +0x30b
goroutine 1187 [select]:
github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard(0xc20800e230, 0xc21337ca00, 0xb37f10, 0x8, 0xc2080546e8, 0x7, 0x1, 0xc212794000, 0x3e8, 0x400, ...)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:278 +0x913
github.com/influxdb/influxdb/cluster.func·001(0xc21337ca00, 0xb37f10, 0x8, 0xc2080546e8, 0x7, 0xc212794000, 0x3e8, 0x400)
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:205 +0xad
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).WritePoints
/root/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:206 +0x447
rax 0xb723
rbx 0x46
rcx 0xc210a1ca20
rdx 0x4d
rdi 0xc20da7d900
rsi 0xc2081f6320
rbp 0x46
rsp 0xc2080f28c8
r8 0xc2080f28e8
r9 0xa555
r10 0xc21df00000
r11 0xa90fe0
r12 0x61a
r13 0x0
r14 0x0
r15 0x51d
rip 0x4415cb
rflags 0x206
cs 0x33
fs 0x0
gs 0x0
The text was updated successfully, but these errors were encountered:
Writes could timeout and when adding new measurement names to the
index if the sort took a long time. The names slice was never
actually used (except a test) so keeping it in index wastes memory
and sort it wastes CPU and increases lock contention. The sorting
was happening while the shard held a write-lock.
Fixes#3869
Hello,
First off, apologies but I don't have a reproducible test for this yet.
The behaviour we see with Influxdb
0.9.3
with real life clients sending graphite metrics to the graphite plugin port of influxdb is this:The db keeps spinning on CPU and we see a slew of write timeouts in log in between wal flushes 'due to idle'. I presume both of these cannot be happening at same time. Is this a case of a deadlock somewhere?
Can see the measurements in show measurements but data points are missing. No surprise there, given the write timeouts. The surprise is there's nothing hitting the disk and the wal is seemingly idle.
The actual data being sent is ~450K graphite metrics/minute from about 25k unique series. This is real data from real graphite clients and is being ingested in both graphite and influxdb 0.8 with no issues.
I am in the process of trying to replicate this locally - no luck yet. Would appreciate any pointers in the meantime. I can provide a profile of influxdb while it is doing this if that would be helpful.
Iostat shows data drive being hit very very little, few hundred KB per minute.
Influxdb logs and induced stack trace by SIGQUIT while this is happening:
The text was updated successfully, but these errors were encountered: