Influxdb crashes silently during drop/delete of database, retains points after re-create #570

Closed
hydandata opened this issue May 22, 2014 · 2 comments

Comments


I have ~10 GiB of data in a single InfluxDB database with ~2k series (~100 series with ~2k columns, but most of them null). When I try to drop the database, InfluxDB crashes silently (log level at debug) after a period of intensive memory and CPU usage. The database disappears from the list, but upon re-creating it, all the old series and data are still there. Select queries come in regularly, including while the drop is issued. I can reproduce this by re-creating the database and deleting it again.
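For anyone who wants to script the drop/re-create cycle, here is a minimal sketch. It assumes the 0.x HTTP API shape (`POST /db` with a JSON body, `DELETE /db/<name>`, and `GET /db/<name>/series?q=...`), the default `root`/`root` cluster-admin credentials, a server on `localhost:8086`, and `list series` as the check query. None of these come from my setup notes above, so adjust them as needed. The helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: host, port, and credentials are placeholders, not values from the report.
BASE_URL = "http://localhost:8086"
AUTH = {"u": "root", "p": "root"}


def _url(path, **params):
    """Build an API URL with the (assumed) auth query parameters appended."""
    query = urllib.parse.urlencode({**AUTH, **params})
    return f"{BASE_URL}{path}?{query}"


def create_db_request(name):
    """Request that creates a database (assumed 0.x endpoint: POST /db)."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(_url("/db"), data=body, method="POST")


def drop_db_request(name):
    """Request that drops a database (assumed 0.x endpoint: DELETE /db/<name>)."""
    path = "/db/" + urllib.parse.quote(name, safe="")
    return urllib.request.Request(_url(path), method="DELETE")


def query_request(name, query="list series"):
    """Request that runs a query; 'list series' is an assumed check query."""
    path = "/db/" + urllib.parse.quote(name, safe="") + "/series"
    return urllib.request.Request(_url(path, q=query), method="GET")


def send(request, timeout=600):
    """Send a prepared request and return (status, body text)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


def reproduce(name="kpi", rounds=2):
    """Drop, re-create, then query the database; repeat a few times."""
    for _ in range(rounds):
        send(drop_db_request(name))      # the server is expected to die here
        send(create_db_request(name))    # run again once the server is back up
        print(send(query_request(name)))
```

If the server dies during the drop, `send` will raise a connection error rather than return; restart the server and continue from the create step by hand.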

InfluxDB v0.6.4

This is the log during database deletion. Incoming queries fail, presumably because the db is no longer there (actual select queries edited out):

[2014/05/22 19:55:54 CEST] DEBG (raft:75209d1) Executing leader loop.
[2014/05/22 19:55:55 CEST] INFO Compacting shard
[2014/05/22 19:55:55 CEST] INFO Shard compaction is done
[2014/05/22 19:55:55 CEST] DEBG (raft:75209d1) Executing leader loop.
...
[2014/05/22 19:56:10 CEST] DEBG (raft:75209d1) Executing leader loop.
[2014/05/22 19:56:11 CEST] DEBG Trying to auth as a db user
[2014/05/22 19:56:11 CEST] DEBG (raft:75209d1) Authenticating password for kpi:root
[2014/05/22 19:56:11 CEST] DEBG Authenticating as a db user failed with Invalid username/password (401)
...
[2014/05/22 19:56:11 CEST] DEBG BUFFER SIZE: %!(EXTRA int=1000)
[2014/05/22 19:56:11 CEST] DEBG Querying shards sequentially
[2014/05/22 19:56:11 CEST] DEBG Shard concurrent limit: %!(EXTRA int=1)
[2014/05/22 19:56:11 CEST] DEBG BUFFER SIZE: %!(EXTRA int=1000)
[2014/05/22 19:56:11 CEST] DEBG QUERYING: shard: %!(EXTRA int=0, string=[ID: 2, START: 1400112000000000, END: 1400716800000000, LOCAL: true, SERVERS: []])
...
[2014/05/22 19:57:15 CEST] DEBG Testing if we should compact the raft logs
[2014/05/22 19:57:16 CEST] DEBG (raft:75209d1) Executing leader loop.

After the crash, on service influxdb start:

[05/22/14 20:20:52] [INFO] Loading configuration file /opt/influxdb/shared/config.toml
[2014/05/22 20:20:52 CEST] INFO Redirectoring logging to /opt/influxdb/shared/log.txt
[2014/05/22 20:20:52 CEST] INFO Starting Influx Server InfluxDB v0.6.4 (git: 6e81a63) (leveldb: 1.15) bound to 0.0.0.0...
[2014/05/22 20:20:52 CEST] INFO Opening database at /opt/influxdb/shared/data/db
[2014/05/22 20:20:52 CEST] INFO Opening wal in /opt/influxdb/shared/data/wal
[2014/05/22 20:20:52 CEST] INFO Opening log file /opt/influxdb/shared/data/wal/log.1
[2014/05/22 20:20:53 CEST] INFO Opening index file /opt/influxdb/shared/data/wal/index.1
[2014/05/22 20:20:53 CEST] DEBG suffix: 1, first suffix: 1
[2014/05/22 20:20:53 CEST] INFO Ssl will be disabled since the ssl port or certificate path weren't set
[2014/05/22 20:20:53 CEST] INFO Initializing Raft HTTP server
[2014/05/22 20:20:53 CEST] INFO Raft Server Listening at 0.0.0.0:8090
[2014/05/22 20:20:53 CEST] INFO Initializing Raft Server: http://apkpivm:8090
[2014/05/22 20:20:53 CEST] INFO Recovering the cluster configuration
[2014/05/22 20:20:53 CEST] INFO DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00003
[2014/05/22 20:20:53 CEST] INFO DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00002
[2014/05/22 20:20:53 CEST] INFO DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00001
[2014/05/22 20:20:53 CEST] INFO Recovered from log
[2014/05/22 20:20:53 CEST] INFO Waiting for local server to be added
[2014/05/22 20:20:53 CEST] INFO Setting server id to 1 and recovering
[2014/05/22 20:20:53 CEST] DEBG Getting file size for /opt/influxdb/shared/data/wal/log.1[6]
[2014/05/22 20:20:53 CEST] INFO Checking /opt/influxdb/shared/data/wal/log.1, last 2733603065, size: 3416366177
[2014/05/22 20:20:53 CEST] DEBG Replaying from file offset 2733603065
[2014/05/22 20:20:53 CEST] INFO replaying from file location 2733603065
[2014/05/22 20:20:53 CEST] DEBG recovery requestsSinceLastIndex: 1, requestNumber: 5001
[2014/05/22 20:20:53 CEST] DEBG largestrequestnumber: 5001
...
[2014/05/22 20:21:13 CEST] DEBG largestrequestnumber: 5319
[2014/05/22 20:21:13 CEST] DEBG recovery requestsSinceLastIndex: 320, requestNumber: 5320
[2014/05/22 20:21:13 CEST] DEBG largestrequestnumber: 5320
[2014/05/22 20:21:13 CEST] DEBG Finished wal recovery
[2014/05/22 20:21:18 CEST] INFO Recovering from log...
[2014/05/22 20:21:18 CEST] INFO local: Initializing write buffer with buffer size of 10000
[2014/05/22 20:21:18 CEST] INFO Waiting for servers to recover
[2014/05/22 20:21:18 CEST] INFO Recovering local server
[2014/05/22 20:21:18 CEST] DEBG replaying wal for server 1 and shardIds []uint32{0x3, 0x2, 0x1}
[2014/05/22 20:21:18 CEST] INFO Recovering server 1 from request 5321
[2014/05/22 20:21:18 CEST] INFO Recovered local server
[2014/05/22 20:21:18 CEST] INFO recovered
[2014/05/22 20:21:18 CEST] INFO Connecting to other nodes in the cluster
[2014/05/22 20:21:18 CEST] INFO Starting admin interface on port 8083
[2014/05/22 20:21:18 CEST] INFO Starting Http Api server on port 8086
[2014/05/22 20:21:18 CEST] INFO ProtobufServer listening on 0.0.0.0:8099
[2014/05/22 20:21:19 CEST] DEBG Trying to auth as a db user
[2014/05/22 20:21:19 CEST] DEBG (raft:75209d1) Authenticating password for kpi:root
[2014/05/22 20:21:19 CEST] DEBG Authenticating as a db user failed with Invalid username/password (401)
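To put a number on how much WAL the restart had to replay, the "Checking ... last ..., size: ..." line in the startup log can be parsed as below. This is a rough sketch: it assumes "last" is the offset replay resumes from (which matches the "Replaying from file offset" line) and that replay runs to the end of the file. The function name is made up for illustration.

```python
import re

# Matches the WAL check line as it appears in the startup log above.
CHECK_LINE = re.compile(
    r"Checking (?P<path>\S+), last (?P<last>\d+), size: (?P<size>\d+)"
)


def wal_replay_backlog(line):
    """Return (wal_path, bytes_left_to_replay), or None if the line does not match.

    Assumes the backlog is simply file size minus the last recorded offset.
    """
    match = CHECK_LINE.search(line)
    if match is None:
        return None
    last = int(match.group("last"))
    size = int(match.group("size"))
    return match.group("path"), size - last
```

For the line in the log above this gives 682763112 bytes, roughly 651 MiB of a 3.4 GB WAL file.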

After re-creating the database:

[2014/05/22 20:27:34 CEST] DEBG Created database kpi
[2014/05/22 20:27:53 CEST] DEBG Testing if we should compact the raft logs

After this point I can query the old series from the database; all series and old data points are still there.

...
[2014/05/22 20:29:38 CEST] DEBG PassthroughEngine YieldSeries 1
[2014/05/22 20:29:38 CEST] DEBG GOT RESPONSE: %!(EXTRA *protocol.Response_Type=QUERY)
[2014/05/22 20:29:38 CEST] [DEBG] (coordinator.(*CoordinatorImpl).readFromResponseChannels:363) YIELDING: 1 points with 1 columns
[2014/05/22 20:29:38 CEST] DEBG PassthroughEngine YieldSeries 1
[2014/05/22 20:29:38 CEST] DEBG GOT RESPONSE: %!(EXTRA *protocol.Response_Type=END_STREAM)
[2014/05/22 20:29:53 CEST] [DEBG] (coordinator.(*RaftServer).CompactLog:318) Testing if we should compact the raft logs

Deleting the database while running influxdb from cli yields no additional information or traces, it just crashes silently.

Contributor

chobie commented Jun 11, 2014

Deleting the database while running influxdb from cli yields no additional information or traces, it just crashes silently.

That looks like the work of the OOM killer.
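One way to check is to scan the kernel log (for example the output of `dmesg`, or the syslog file on your distribution) for OOM-killer messages that name the influxdb process. A minimal sketch, assuming the usual Linux kernel message wording ("Out of memory: Kill process ..." and "Killed process ...") and that the process shows up as `influxdb`; the function name is made up for illustration.

```python
import re

# Message shapes assumed from common Linux kernel OOM-killer output.
OOM_PATTERNS = (
    re.compile(r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"),
    re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"),
)


def find_oom_kills(lines, process_name="influxdb"):
    """Return (pid, process name, raw line) for each OOM-killer line naming the process."""
    hits = []
    for line in lines:
        for pattern in OOM_PATTERNS:
            match = pattern.search(line)
            if match and process_name in match.group("name"):
                hits.append((int(match.group("pid")), match.group("name"), line.rstrip("\n")))
                break  # count each log line at most once
    return hits
```

Feed it the lines of the kernel log, e.g. `find_oom_kills(open(path))` with `path` pointing at wherever your system keeps it. An empty result does not rule out the OOM killer if the log has already rotated.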

Member

pauldix commented Jul 14, 2014

This is fixed in v0.8.0 with the addition of the shard spaces feature.

pauldix closed this as completed Jul 14, 2014