DROP SUBSCRIPTION query called on invalid retention policy corrupts database #5545

Closed
robinsk opened this issue Feb 5, 2016 · 6 comments


robinsk commented Feb 5, 2016

It seems like I accidentally corrupted the database by running a DROP subscription query while playing around with kapacitor.

  • CentOS release 6.6 (Final), 2.6.32-504.16.2.el6.x86_64
  • InfluxDB version 0.9.6.1-1 installed from yum

We installed kapacitor and started it with this config.

[influxdb]
  # Connect to an InfluxDB cluster
  # Kapacitor can subscribe, query and write to this cluster.
  # Using InfluxDB is not required and can be disabled.
  enabled = true
  urls = ["http://localhost:8086"]
  username = "telegraf"
  password = "<%= scope.lookupvar('influx::params::telegraf_password') %>"
  timeout = 0
  # Subscriptions use the UDP network protocol.
  # The following options are for the UDP listeners created for each subscription.
  # Number of packets to buffer when reading packets off the socket.
  udp-buffer = 1000
  # The size in bytes of the OS read buffer for the UDP socket.
  # A value of 0 indicates use the OS default.
  udp-read-buffer = 0

  [influxdb.subscriptions]
    # Set of databases and retention policies to subscribe to.
    # If empty will subscribe to all, minus the list in
    # influxdb.excluded-subscriptions
    #
    # Format
    # db_name = <list of retention policies>
    #
    # Example:
    # my_database = [ "default", "longterm" ]
  [influxdb.excluded-subscriptions]
    # Set of databases and retention policies to exclude from the subscriptions.
    # If influxdb.subscriptions is empty it will subscribe to all
    # except databases listed here.
    #
    # Format
    # db_name = <list of retention policies>
    #
    # Example:
    # my_database = [ "default", "longterm" ]

Since we didn't specify any subscriptions, kapacitor set up the following:

curl -G http://localhost:8086/query --data-urlencode "q=SHOW subscriptions"

name: _internal
---------------
retention_policy    name         mode    destinations
monitor             kapacitor    ANY     [udp://localhost:35359]


name: telegraf
--------------
retention_policy    name         mode    destinations
default             kapacitor    ANY     [udp://localhost:50576]

I changed the kapacitor config to only have the subscription telegraf = ["default"], and figured I should delete the old subscription on _internal:

curl -G http://localhost:8086/query --data-urlencode "q=drop subscription kapacitor on _internal.default"

That produced the following output:

ERR: error parsing query: found DEFAULT, expected identifier at line 1, char 42

So I found this documentation (https://github.com/influxdata/influxdb/blob/master/influxql/INFLUXQL.md) and added quotes to it:

curl -G http://localhost:8086/query --data-urlencode 'q=drop subscription kapacitor on "_internal"."default"'

Output:

ERR: Get http://localhost:8086/query?db=_internal&epoch=ns&q=drop+subscription+kapacitor+on+%22_internal%22.%22default%22: EOF

I didn't care too much about the output, thinking I still had my drop syntax wrong, but when I ran select * from _internal.monitor I got this:

ERR: Get http://localhost:8086/query?db=_internal&epoch=ns&q=select+%2A+from+_internal.monitor: dial tcp 127.0.0.1:8086: connection refused

It turns out influxdb died. Here's what the logs said: https://gist.github.com/robinsk/ad7564ddeb469fdb69e7#file-influxdb-log

Mainly:

[query] 2016/02/05 09:59:22 DROP SUBSCRIPTION kapacitor ON _internal."default"
panic: runtime error: invalid memory address or nil pointer dereference

When trying to start the influxdb service again, it fails, with almost the same things in the logs:
https://gist.github.com/robinsk/ad7564ddeb469fdb69e7#file-influxdb_starting-log

Is there any way to restore normal operations on this database, or do I have to restore from backup? This is in a VM in Vagrant for testing, so it didn't matter that much this time, but it makes me reluctant to run the same thing in production.
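For anyone scripting this, here is a minimal sketch of how the statement could be built more defensively. The parse error above ("found DEFAULT, expected identifier") suggests that default has to be double-quoted, and the SHOW subscriptions output lists monitor, not default, as the retention policy carrying the subscription on _internal. The helpers quoteIdent and dropSubscriptionQuery are hypothetical names, not part of InfluxDB or Kapacitor, and the escaping rule used here is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps an identifier in double quotes and escapes backslashes
// and embedded double quotes. Hypothetical helper; the escaping rule is an
// assumption and should be checked against the InfluxQL documentation.
func quoteIdent(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	return `"` + r.Replace(s) + `"`
}

// dropSubscriptionQuery builds a DROP SUBSCRIPTION statement, but only if
// the retention policy appears in knownRPs for the given database.
// knownRPs is assumed to be filled in from the SHOW SUBSCRIPTIONS output.
func dropSubscriptionQuery(name, db, rp string, knownRPs map[string][]string) (string, error) {
	for _, candidate := range knownRPs[db] {
		if candidate == rp {
			return fmt.Sprintf("DROP SUBSCRIPTION %s ON %s.%s",
				quoteIdent(name), quoteIdent(db), quoteIdent(rp)), nil
		}
	}
	return "", fmt.Errorf("retention policy %q not found on database %q", rp, db)
}

func main() {
	// Retention policies taken from the SHOW subscriptions output above.
	known := map[string][]string{
		"_internal": {"monitor"},
		"telegraf":  {"default"},
	}

	// The statement that was actually run names a policy that is not listed.
	if _, err := dropSubscriptionQuery("kapacitor", "_internal", "default", known); err != nil {
		fmt.Println("refusing to send:", err)
	}

	// The statement that matches the listing.
	q, err := dropSubscriptionQuery("kapacitor", "_internal", "monitor", known)
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(q)
}
```

A client-side check like this only reduces the chance of sending the bad statement; it does not replace server-side validation.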

Author

robinsk commented Feb 5, 2016

Cannot reproduce this in latest version (0.10.0).


zstyblik commented Feb 5, 2016

@robinsk can the issue be closed then?

Author

robinsk commented Feb 7, 2016

I guess. Though it would be interesting to know if there is a way to get the data out and use it in a new db. The data seems to be intact, because the new upgrade util didn't fail, but the server fails to start when replaying a drop subscription statement. It's weird that the statement didn't fail on a fresh 0.10.0 install, but it was still there in the upgraded database, with a slightly different stack trace.
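To illustrate why a replayed statement can keep a server from starting, here is a toy sketch. It assumes a command log that is re-applied from the beginning on every start, and an apply step that dereferences a lookup result without checking it. All type and function names are hypothetical; this is not InfluxDB's code.

```go
package main

import "fmt"

// command is a hypothetical log entry for a drop-subscription request.
type command struct{ DB, RP, Name string }

// rpInfo holds the subscriptions of one retention policy.
type rpInfo struct{ subs map[string]bool }

// state maps "db.rp" to its retention policy info.
type state map[string]*rpInfo

// apply removes a subscription without checking that the retention policy
// exists, so a missing policy causes a nil pointer dereference.
func apply(s state, c command) {
	rp := s[c.DB+"."+c.RP] // nil when the policy does not exist
	delete(rp.subs, c.Name)
}

// replay rebuilds the state from the log, as a server might on startup.
// The recover here only turns the panic into an error for demonstration.
func replay(log []command) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("replay aborted: %v", r)
		}
	}()
	s := state{"_internal.monitor": {subs: map[string]bool{"kapacitor": true}}}
	for _, c := range log {
		apply(s, c)
	}
	return nil
}

func main() {
	// The bad command is already persisted, so every start sees it again.
	log := []command{{DB: "_internal", RP: "default", Name: "kapacitor"}}
	for attempt := 1; attempt <= 2; attempt++ {
		fmt.Printf("start %d: %v\n", attempt, replay(log))
	}
}
```

In this toy model the data itself is untouched; only the persisted command keeps failing, which matches the observation that the upgrade util could still read the database.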

@rossmcdonald
Contributor

I've hit the same issue with 0.10.0. Stack trace here:

[query] 2016/02/08 21:46:21 DROP SUBSCRIPTION kapacitor ON _internal."default"
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x48 pc=0x56d1b0]

goroutine 12 [running]:
github.com/influxdb/influxdb/services/meta.(*Data).DropSubscription(0xc208c33290, 0xc2083ebec0, 0x9, 0xc2083ebe78, 0x7, 0xc2083ebea0, 0x9, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/data.go:606 +0xc0
github.com/influxdb/influxdb/services/meta.(*storeFSM).applyDropSubscriptionCommand(0xc20800c1b0, 0xc2081ee540, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:404 +0x1c3
github.com/influxdb/influxdb/services/meta.func·009(0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:66 +0x6f9
github.com/influxdb/influxdb/services/meta.(*storeFSM).Apply(0xc20800c1b0, 0xc2083c8200, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:94 +0x2de
github.com/hashicorp/raft.(*Raft).runFSM(0xc208084380)
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/raft.go:564 +0xdfc
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runFSM)·fm()
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/raft.go:253 +0x27
github.com/hashicorp/raft.func·011()
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/state.go:152 +0x51
created by github.com/hashicorp/raft.(*raftState).goFunc
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/state.go:153 +0xe3

When trying to restart the database, it crashes again:

2016/02/08 21:50:15 InfluxDB starting, version 0.10.0, branch 0.10.0, commit b8bb32ecad9808ef00219e7d2469514890a0987a, built 2016-02-04T17:06:04.850564
2016/02/08 21:50:15 Go version go1.4.3, GOMAXPROCS set to 1
2016/02/08 21:50:15 Using configuration at: /etc/influxdb/influxdb.conf
[meta] 2016/02/08 21:50:15 Starting meta service
[meta] 2016/02/08 21:50:15 Listening on HTTP: [::]:8091
[metastore] 2016/02/08 21:50:15 Using data dir: /var/lib/influxdb/meta
[metastore] 2016/02/08 21:50:15 Node at kapacitor1:8088 [Follower]
[metastore] 2016/02/08 21:50:17 Node at kapacitor1:8088 [Leader]. peers=[kapacitor1:8088]
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x48 pc=0x56d1b0]

goroutine 12 [running]:
github.com/influxdb/influxdb/services/meta.(*Data).DropSubscription(0xc20800d8c0, 0xc2080f0fc0, 0x9, 0xc2080f0fa9, 0x7, 0xc2080f0fa0, 0x9, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/data.go:606 +0xc0
github.com/influxdb/influxdb/services/meta.(*storeFSM).applyDropSubscriptionCommand(0xc20800c240, 0xc208126270, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:404 +0x1c3
github.com/influxdb/influxdb/services/meta.func·009(0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:66 +0x6f9
github.com/influxdb/influxdb/services/meta.(*storeFSM).Apply(0xc20800c240, 0xc2080e5940, 0x0, 0x0)
        /tmp/tmp.r1Nea5VXEw/src/github.com/influxdb/influxdb/services/meta/store_fsm.go:94 +0x2de
github.com/hashicorp/raft.(*Raft).runFSM(0xc2080c2380)
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/raft.go:564 +0xdfc
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runFSM)·fm()
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/raft.go:253 +0x27
github.com/hashicorp/raft.func·011()
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/state.go:152 +0x51
created by github.com/hashicorp/raft.(*raftState).goFunc
        /tmp/tmp.r1Nea5VXEw/src/github.com/hashicorp/raft/state.go:153 +0xe3

@rossmcdonald
Contributor

@robinsk It looks like you ran the exact same command that I did. I think this is caused by a DROP SUBSCRIPTION command being called on an invalid RP. The only retention policy enabled on the _internal database (by default) is called monitor, not default. I've tested the equivalent:

DROP SUBSCRIPTION kapacitor ON _internal."monitor"

Which works without issue. I've also tested the invalid command on a cluster, and it ends up corrupting every node, which is not good.
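A nil pointer dereference inside DropSubscription is consistent with a retention policy lookup that returns nil for default and is then used without a check. The sketch below shows what a guarded version might look like. The types and method names are hypothetical and simplified, and this is not the actual InfluxDB code or the eventual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified metadata types.
type subscription struct{ Name string }

type retentionPolicy struct {
	Name          string
	Subscriptions []subscription
}

type database struct {
	Name              string
	RetentionPolicies []*retentionPolicy
}

var (
	errRetentionPolicyNotFound = errors.New("retention policy not found")
	errSubscriptionNotFound    = errors.New("subscription not found")
)

// retentionPolicy returns the named policy, or nil if it does not exist.
func (db *database) retentionPolicy(name string) *retentionPolicy {
	for _, rp := range db.RetentionPolicies {
		if rp.Name == name {
			return rp
		}
	}
	return nil
}

// dropSubscription checks the lookup result before using it, so an unknown
// retention policy yields an error instead of a panic.
func (db *database) dropSubscription(rpName, subName string) error {
	rp := db.retentionPolicy(rpName)
	if rp == nil {
		return errRetentionPolicyNotFound
	}
	for i, s := range rp.Subscriptions {
		if s.Name == subName {
			rp.Subscriptions = append(rp.Subscriptions[:i], rp.Subscriptions[i+1:]...)
			return nil
		}
	}
	return errSubscriptionNotFound
}

func main() {
	db := &database{
		Name: "_internal",
		RetentionPolicies: []*retentionPolicy{
			{Name: "monitor", Subscriptions: []subscription{{Name: "kapacitor"}}},
		},
	}
	// Invalid policy: reported as an error, nothing is modified.
	fmt.Println(db.dropSubscription("default", "kapacitor"))
	// Valid policy: the subscription is removed.
	fmt.Println(db.dropSubscription("monitor", "kapacitor"))
}
```

Returning an error before anything is modified would let the query fail cleanly for the client instead of taking the process down.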

@rossmcdonald rossmcdonald changed the title DROP subscription query corrupted the database DROP SUBSCRIPTION query called on invalid retention policy corrupts database Feb 8, 2016
@jwilder jwilder added this to the 0.11.0 milestone Feb 8, 2016
@e-dard e-dard self-assigned this Feb 9, 2016
e-dard added a commit that referenced this issue Feb 9, 2016
e-dard added a commit that referenced this issue Feb 9, 2016
@e-dard e-dard closed this as completed in cfbb219 Feb 10, 2016
e-dard added a commit that referenced this issue Feb 10, 2016
e-dard added a commit that referenced this issue Feb 29, 2016
e-dard added a commit that referenced this issue Mar 2, 2016
@earthnut

@rossmcdonald is there a way to get the data back?
I ran the same command with ON _internal."monitor", and InfluxDB failed to start up.
