Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0.9.0-rc30] drop measurement while writing points to that measurement has race condition that can panic #2608

Closed
liyichao opened this issue May 19, 2015 · 4 comments
Assignees

Comments

@liyichao
Copy link

Below is the log:

[http] 2015/05/19 18:07:35 127.0.0.1 - - [19/May/2015:18:07:35 +0800] GET /query?db=collectd&q=drop+measurement+%22value%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 df45ba25-fe0e-11e4-86b7-000000000000 163.657862ms
panic: field does not exist for value

after the panic, I can never find any measurements at all (while there should be many):

> show measurements;
ERR: database not found: collectd; (/root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/server.go:2704)
@liyichao liyichao changed the title drop measurment panics [0.9.0-rc30] drop measurment panics May 19, 2015
@beckettsean
Copy link
Contributor

A few things to note:
drop measurement "value" should be nonsense, unless you've actually created a measurement named "value", which is not a good idea. For one, that's a keyword in InfluxDB and thus likely to cause confusion if used as an identifier, especially with third party tools that assume "value" is only used for field values. You'll end up with queries that look like SELECT mean(value) FROM "value"... and those will be difficult to understand for others.

I suspect the error is related to that. Can you drop other measurements without a panic?

In the last snippet, the error message is stating that the collectd database does not exist. Naturally a non-existent database has no measurements. What are the results of SHOW DATABASES? Did you run use collectd in the CLI before running SHOW MEASUREMENTS?

@beckettsean
Copy link
Contributor

It's not anything to do with "value", as I was able to create a measurement with that name and drop it without a panic:

> show measurements
name: measurements
------------------
name
another_series
myseries
triangle
value

> drop measurement "value"
> show measurements
name: measurements
------------------
name
another_series
myseries
triangle

0.9.0-RC30, no panic.

[http] 2015/05/19 13:20:48 127.0.0.1 - - [19/May/2015:13:20:48 -0700] GET /query?db=mydb&q=show+measurements HTTP/1.1 200 117 - InfluxDBShell/0.9 89c35e14-fe64-11e4-8230-000000000000 597.572µs
[http] 2015/05/19 13:20:59 127.0.0.1 - - [19/May/2015:13:20:59 -0700] GET /query?db=mydb&q=drop+measurement+%22value%22 HTTP/1.1 200 40 - InfluxDBShell/0.9 90128cc2-fe64-11e4-823b-000000000000 224.844391ms
[http] 2015/05/19 13:21:15 127.0.0.1 - - [19/May/2015:13:21:15 -0700] GET /query?db=mydb&q=show+measurements HTTP/1.1 200 115 - InfluxDBShell/0.9 997674d5-fe64-11e4-824c-000000000000 4.483294ms

Somehow it looks like your collectd database disappeared before the drop measurement query, at least that's my best guess.

Can you provide the ~50 log lines on either side of the panic?

@liyichao
Copy link
Author

I am sure collectd exists, and I use collectd before show measurements and sure there are many other measurements existing. I drop measurement when data is feeding in. It may be due to race conditions, I do not know. As to why measurement is called value, because influxdb's collectd plugin is not useable (drops a lot packet), so we use graphite input and use carbon-c-relay to rewrite the metric name, and there are some rules mistake 😄 . Below is the log:

[http] 2015/05/19 17:58:55 127.0.0.1 - - [19/May/2015:17:58:55 +0800] GET /query?db=&q=show+databases HTTP/1.1 200 105 - InfluxDBShell/0.9.0-rc30 a9291655-fe0d-11e4-849c-000000000000 4.235873ms
[http] 2015/05/19 17:59:03 127.0.0.1 - - [19/May/2015:17:59:03 +0800] GET /query?db=collectd&q=show+measurements HTTP/1.1 200 245 - InfluxDBShell/0.9.0-rc30 add7d2f6-fe0d-11e4-84a4-000000000000 621.348µs
[broker] 2015/05/19 17:59:43 executing truncation check for 3 topic(s)
[broker] 2015/05/19 17:59:43 topic /data/data/influxdb/raft/2, 5 segments, 52429025 bytes, deleted
[srvr] 2015/05/19 17:59:44 retention policy enforcement check commencing
[http] 2015/05/19 18:03:58 127.0.0.1 - - [19/May/2015:18:03:58 +0800] GET /ping HTTP/1.1 204 0 - InfluxDBShell/0.9.0-rc30 5e1a0b9c-fe0e-11e4-85cd-000000000000 69.276µs
[http] 2015/05/19 18:04:05 127.0.0.1 - - [19/May/2015:18:04:05 +0800] GET /query?db=collectd&q=show+measurements%3B HTTP/1.1 200 245 - InfluxDBShell/0.9.0-rc30 622882f9-fe0e-11e4-85d5-000000000000 787.561µs
[http] 2015/05/19 18:05:49 127.0.0.1 - - [19/May/2015:18:05:49 +0800] GET /query?db=collectd&q=drop+measurement+in HTTP/1.1 400 100 - InfluxDBShell/0.9.0-rc30 9ff7116e-fe0e-11e4-863d-000000000000 457.102µs
[http] 2015/05/19 18:05:54 127.0.0.1 - - [19/May/2015:18:05:54 +0800] GET /query?db=collectd&q=drop+measurement+%22in%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 a31a43fe-fe0e-11e4-8643-000000000000 93.626816ms
[http] 2015/05/19 18:06:00 127.0.0.1 - - [19/May/2015:18:06:00 +0800] GET /query?db=collectd&q=show+measurements HTTP/1.1 200 241 - InfluxDBShell/0.9.0-rc30 a682b73a-fe0e-11e4-864a-000000000000 967.212µs
[http] 2015/05/19 18:06:08 127.0.0.1 - - [19/May/2015:18:06:07 +0800] GET /query?db=collectd&q=drop+measurement+%22out%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 aaeebb8b-fe0e-11e4-8653-000000000000 137.200317ms
[http] 2015/05/19 18:06:17 127.0.0.1 - - [19/May/2015:18:06:17 +0800] GET /query?db=collectd&q=drop+measurement+%22longterm%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 b067f092-fe0e-11e4-865d-000000000000 63.377715ms
[http] 2015/05/19 18:06:22 127.0.0.1 - - [19/May/2015:18:06:22 +0800] GET /query?db=collectd&q=drop+measurement+%22majflt%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 b3bac445-fe0e-11e4-8663-000000000000 207.99404ms
[http] 2015/05/19 18:06:29 127.0.0.1 - - [19/May/2015:18:06:29 +0800] GET /query?db=collectd&q=drop+measurement+%22midterm%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 b79b1e7a-fe0e-11e4-866b-000000000000 353.221423ms
[http] 2015/05/19 18:06:34 127.0.0.1 - - [19/May/2015:18:06:34 +0800] GET /query?db=collectd&q=drop+measurement+%22minflt%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 ba9e4a53-fe0e-11e4-8671-000000000000 249.017647ms
[http] 2015/05/19 18:06:39 127.0.0.1 - - [19/May/2015:18:06:39 +0800] GET /query?db=collectd&q=drop+measurement+%22out%22 HTTP/1.1 200 136 - InfluxDBShell/0.9.0-rc30 be0b267c-fe0e-11e4-8678-000000000000 24.401357ms
[http] 2015/05/19 18:06:47 127.0.0.1 - - [19/May/2015:18:06:47 +0800] GET /query?db=collectd&q=show+measurements HTTP/1.1 200 226 - InfluxDBShell/0.9.0-rc30 c26f1ef4-fe0e-11e4-8680-000000000000 632.822µs
[http] 2015/05/19 18:06:54 127.0.0.1 - - [19/May/2015:18:06:53 +0800] GET /query?db=collectd&q=drop+measurement+%22read%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 c665c7d4-fe0e-11e4-8688-000000000000 337.456338ms
[http] 2015/05/19 18:06:57 127.0.0.1 - - [19/May/2015:18:06:57 +0800] GET /query?db=collectd&q=drop+measurement+%22rx%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 c8983369-fe0e-11e4-868c-000000000000 232.940782ms
[http] 2015/05/19 18:07:00 127.0.0.1 - - [19/May/2015:18:07:00 +0800] GET /query?db=collectd&q=drop+measurement+%22tx%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 ca54956a-fe0e-11e4-8690-000000000000 50.335832ms
[http] 2015/05/19 18:07:17 127.0.0.1 - - [19/May/2015:18:07:17 +0800] GET /query?db=collectd&q=drop+measurement+%22value%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 d45bd4ea-fe0e-11e4-86a2-000000000000 403.778294ms
[http] 2015/05/19 18:07:21 127.0.0.1 - - [19/May/2015:18:07:21 +0800] GET /query?db=collectd&q=drop+measurement+%22write%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 d6cca1ef-fe0e-11e4-86a7-000000000000 374.240688ms
[http] 2015/05/19 18:07:27 127.0.0.1 - - [19/May/2015:18:07:27 +0800] GET /query?db=collectd&q=show+measurements HTTP/1.1 200 217 - InfluxDBShell/0.9.0-rc30 da1d7f45-fe0e-11e4-86ae-000000000000 2.349196ms
[http] 2015/05/19 18:07:35 127.0.0.1 - - [19/May/2015:18:07:35 +0800] GET /query?db=collectd&q=drop+measurement+%22value%22 HTTP/1.1 200 40 - InfluxDBShell/0.9.0-rc30 df45ba25-fe0e-11e4-86b7-000000000000 163.657862ms
panic: field does not exist for value

goroutine 4486600 [running]:
github.com/influxdb/influxdb.(*FieldCodec).EncodeFields(0xc20f575610, 0xc20f4079e0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/database.go:801 +0x2fc
github.com/influxdb/influxdb.func·035(0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/server.go:1858 +0x75d
github.com/influxdb/influxdb.(*Server).WriteSeries(0xc208076300, 0xc2080ec000, 0x8, 0x9b86b0, 0x7, 0xc20dcc22c0, 0x1, 0x1, 0x0, 0x0, ...)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/server.go:1876 +0xbb5
github.com/influxdb/influxdb/graphite.(*TCPServer).handleConnection(0xc2081c29a0, 0x7f5600750dd8, 0xc208106098)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/graphite/graphite_tcp.go:117 +0x3c2
created by github.com/influxdb/influxdb/graphite.func·001
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/graphite/graphite_tcp.go:71 +0x3cd

goroutine 1 [chan receive (nil chan), 27 minutes]:
main.(*RunCommand).Run(0xc20803d380, 0xc20800a010, 0x4, 0x4, 0x0, 0x0)
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/cmd/influxd/run.go:264 +0x806
main.main()
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/cmd/influxd/main.go:81 +0x495

goroutine 5 [syscall, 27 minutes]:
os/signal.loop()
        /root/.gvm/gos/go1.4.2/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
        /root/.gvm/gos/go1.4.2/src/os/signal/signal_unix.go:27 +0x35

goroutine 7 [chan receive, 27 minutes]:
main.func·004()
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/cmd/influxd/main.go:199 +0xed
created by main.startProfiling
        /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/cmd/influxd/main.go:202 +0x2dc

goroutine 8 [IO wait]:
net.(*pollDesc).Wait(0xc208010760, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208010760, 0x0, 0x0)
        /root/.gvm/gos/go1.4.2/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc208010700, 0x0, 0x7f560074bdb0, 0xc21217d1a8)
        /root/.gvm/gos/go1.4.2/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc208038560, 0xc21591edc8, 0x0, 0x0)
        /root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc208038560, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.4.2/src/net/tcpsock_posix.go:244 +0x4c
net/http.(*Server).Serve(0xc2080d2060, 0x7f560074c878, 0xc208038560, 0x0, 0x0)

the goroutine that panics:

goroutine 4491542 [select]:
net/http.(*persistConn).writeLoop(0xc20dd7cbb0)
        /root/.gvm/gos/go1.4.2/src/net/http/transport.go:945 +0x41d
created by net/http.(*Transport).dialConn
        /root/.gvm/gos/go1.4.2/src/net/http/transport.go:661 +0xcbc
panic: field does not exist for value

@beckettsean beckettsean changed the title [0.9.0-rc30] drop measurment panics [0.9.0-rc30] drop measurement while writing points to that measurement has race condition that can panic May 20, 2015
@beckettsean
Copy link
Contributor

I'm able to reproduce, looks like a timing issue. When the database is under write load, occasionally issuing duplicate drop measurement statements causes an error just as reported.

Repro steps:

  1. start writing batches of 2000 points as fast as curl can send them
  2. from the CLI, issue drop measurement foo, where foo exists in the batches of points being written
  3. repeat drop measurement foo until stack trace occurs

When the stack trace happens, the terminal writing the curl statements had this output:

curl: (52) Empty reply from server
{"error":"series not found"}

Note that repeatedly issuing drop measurement foo where foo doesn't exist does not cause any issues. Also, repeatedly issuing drop measurement foo when not actively writing points to foo works just fine.

The error happens when writing points to a measurement that's also being dropped. I think that's safely classified as an edge case, since it's odd to drop an entire measurement and still be writing data into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants