Crash after convert from bz1 -> tsm1 (0.9.6.1 -> 0.10.0-beta2/Nightly) #5468

Closed
ljagiello opened this issue Jan 28, 2016 · 10 comments

@ljagiello

Metrics were collected with Telegraf (rabbitmq plugin) and stored in InfluxDB 0.9.6.1 (default bz1 engine). The database was then converted from bz1 to tsm1 with influx_tsm. After starting the server and running a query, it crashes immediately (0.10.0-beta2 and the nightly build behave the same).

2016-01-28_01:12:45.55646 [query] 2016/01/28 01:12:45 SELECT last(message_bytes) FROM telegraf."default".rabbitmq_queue WHERE host =~ /^rabbit-s/ AND queue !~ /(^celeryev|pidbox$|^federation)/ AND time > now() - 1h GROUP BY queue
2016-01-28_01:12:45.55649 [query] 2016/01/28 01:12:45 SELECT last(messages) FROM telegraf."default".rabbitmq_queue WHERE host =~ /^rabbit-s/ AND queue !~ /(^celeryev|pidbox$|^federation)/ AND time > now() - 1h GROUP BY queue
2016-01-28_01:12:45.55675 [cluster] 2016/01/28 01:12:45 accept remote connection from 127.0.0.1:57966
2016-01-28_01:12:45.55676 [cluster] 2016/01/28 01:12:45 accept remote connection from 127.0.0.1:57967
2016-01-28_01:12:45.57212 panic: interface conversion: interface is float64, not int64
2016-01-28_01:12:45.57214
2016-01-28_01:12:45.57215 goroutine 168 [running]:
2016-01-28_01:12:45.57215 github.com/influxdb/influxdb/tsdb.greaterThan(0x9b1bc0, 0xc20aa04e20, 0x9b0c40, 0xc20aa04a88, 0xc219683700)
2016-01-28_01:12:45.57215       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/tsdb/functions.go:1723 +0x1d1
2016-01-28_01:12:45.57216 github.com/influxdb/influxdb/tsdb.ReduceLast(0xc219683600, 0x2, 0x2, 0x0, 0x0)
2016-01-28_01:12:45.57216       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/tsdb/functions.go:1086 +0x39b
2016-01-28_01:12:45.57216 github.com/influxdb/influxdb/tsdb.(*AggregateExecutor).execute(0xc2233d31a0, 0xc22683c180, 0xc22683a420)
2016-01-28_01:12:45.57217       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/tsdb/aggregate.go:144 +0x1449
2016-01-28_01:12:45.57217 created by github.com/influxdb/influxdb/tsdb.(*AggregateExecutor).Execute
2016-01-28_01:12:45.57217       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/tsdb/aggregate.go:48 +0x64
2016-01-28_01:12:45.57218
2016-01-28_01:12:45.57218 goroutine 1 [chan receive]:
2016-01-28_01:12:45.57218 main.(*Main).Run(0xc20802d700, 0xc20800a010, 0x4, 0x4, 0x0, 0x0)
2016-01-28_01:12:45.57218       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/cmd/influxd/main.go:96 +0x7a1
2016-01-28_01:12:45.57219 main.main()
2016-01-28_01:12:45.57219       /tmp/tmp.6XaWVnbu0C/src/github.com/influxdb/influxdb/cmd/influxd/main.go:46 +0xdc
2016-01-28_01:12:45.57219
2016-01-28_01:12:45.57219 goroutine 6 [syscall]:
2016-01-28_01:12:45.57220 os/signal.loop()
2016-01-28_01:12:45.57220       /root/.gvm/gos/go1.4.3/src/os/signal/signal_unix.go:21 +0x1f
2016-01-28_01:12:45.57220 created by os/signal.init·1
2016-01-28_01:12:45.57221       /root/.gvm/gos/go1.4.3/src/os/signal/signal_unix.go:27 +0x35
2016-01-28_01:12:45.57221

After removing the databases and starting from scratch with 0.10.0-beta2, the same query works perfectly fine.
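
For readers unfamiliar with this class of panic: the message in the trace is what the Go runtime raises when an unchecked type assertion on an interface value fails. The following is a minimal sketch of that failure mode, not the actual InfluxDB code. `unsafeGreaterThan` and `tryCompare` are hypothetical names made up for illustration, and the exact wording of the runtime message differs slightly between Go versions.

```go
package main

import (
	"fmt"
	"strings"
)

// unsafeGreaterThan is a hypothetical stand-in for a comparison that assumes
// both values are int64. It is NOT the real tsdb.greaterThan.
func unsafeGreaterThan(a, b interface{}) bool {
	return a.(int64) > b.(int64) // panics if either value is not an int64
}

// tryCompare calls unsafeGreaterThan and turns a panic into a message string
// so the failure can be inspected instead of crashing the process.
func tryCompare(a, b interface{}) (result bool, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	return unsafeGreaterThan(a, b), ""
}

func main() {
	// One value is an integer, the other a float, as in a field whose
	// type is not consistent across the data being reduced.
	_, msg := tryCompare(int64(3), float64(2.5))
	fmt.Println("recovered:", msg)
	fmt.Println("type mismatch:", strings.Contains(msg, "float64, not int64"))
}
```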

@joelegasse
Contributor

It looks like the field being compared is storing two different value types (integer and floating point). We should probably be handling this gracefully, rather than assuming that all values in a given field have the same type.
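
One way the graceful handling mentioned above could look is sketched here. This is only an illustration under assumptions: `toFloat64` and `greaterThanSafe` are hypothetical helpers, not functions from the InfluxDB code base, and the sketch only considers int64 and float64 values.

```go
package main

import "fmt"

// toFloat64 converts the numeric types considered in this sketch to float64.
// The second result is false for any other type.
func toFloat64(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case int64:
		return float64(n), true
	case float64:
		return n, true
	default:
		return 0, false
	}
}

// greaterThanSafe compares two values without unchecked type assertions.
// It returns an error instead of panicking when a value is not numeric.
func greaterThanSafe(a, b interface{}) (bool, error) {
	// Keep the pure int64 case exact by avoiding a float conversion.
	if ai, ok := a.(int64); ok {
		if bi, ok := b.(int64); ok {
			return ai > bi, nil
		}
	}
	af, okA := toFloat64(a)
	bf, okB := toFloat64(b)
	if !okA || !okB {
		return false, fmt.Errorf("cannot compare %T with %T", a, b)
	}
	return af > bf, nil
}

func main() {
	gt, err := greaterThanSafe(int64(3), float64(2.5))
	fmt.Println(gt, err)

	_, err = greaterThanSafe(int64(3), "x")
	fmt.Println(err)
}
```

Converting a very large int64 to float64 can lose precision, so a real fix might prefer to reject or normalize mixed types earlier rather than compare them this way.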

@joelegasse joelegasse self-assigned this Jan 28, 2016
@joelegasse
Contributor

On further review, a field shouldn't be able to have more than one value type.

@ljagiello Are you able to consistently reproduce this? Do you have more detailed instructions or a sample database that I could use to better track down how/why this is happening? If not, I can try to reproduce it locally, but it might take longer to figure out how you got multiple value types in the same field.

@ljagiello
Author

@joelegasse I'll try to create a small test for that later today.

@desa
Contributor

desa commented Jan 29, 2016

@joelegasse It's possible for the same field in the same measurement to have different types, provided that the values are in different shards.

@joelegasse
Contributor

@mjdesa Thanks for the info. I'll start running that down in the morning. It sounds like we are going to have to gracefully handle comparisons of various types.

@jwilder jwilder added the panic label Jan 29, 2016
@ljagiello
Author

@joelegasse I tried with multiple data sets and was unable to reproduce this bug in my dev environment. It might be, as @mjdesa suggests, a different type in a different shard.

@joelegasse
Contributor

@ljagiello Are you running a cluster of nodes, or just a single node?

@ljagiello
Author

Just a single node

@rossmcdonald
Contributor

@ljagiello Can you send us the output of a SHOW SERVERS command?

@joelegasse
Contributor

I'm going to close this issue and focus on #5478, since it looks like clustering is causing the problem. The beta 2 release had an issue that caused some single-node instances to split into multiple data nodes, and I think that's why you saw the clustering panic on a single node.

@ljagiello Please let me know if you run into this again on a single node.
