
Influxdb starts very slow with semi-large databases #945

Closed

vladimir-smirnov-sociomantic opened this issue Sep 17, 2014 · 14 comments

@vladimir-smirnov-sociomantic

I've got a pretty large database - 67GB (~700k metrics) - stored on an SSD, and the restart process takes a long time: more than 20 minutes.

I ran perf during the start, and it shows that most of the time is spent here:

+  62.91%  influxdb  influxdb           [.] std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
+  36.03%  influxdb  influxdb           [.] rocksdb::VersionSet::Builder::Apply(rocksdb::VersionEdit*)

Is there any way to speed up the start process? 20 minutes is very, very slow.

Config settings are default.
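For reference, a minimal sketch of how such a profile can be captured with perf (the 60-second sampling window and the single-process assumption are mine, not from the report above):

```sh
# Start influxdb, then sample the process during its startup phase.
/etc/init.d/influxdb start
perf record -p "$(pidof influxdb)" -- sleep 60   # sample for 60s while shards are opening
perf report --stdio | head -n 20                 # top symbols, as quoted above
```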

@vladimir-smirnov-sociomantic
Copy link
Author

Also, during startup it uses only one core and a rather small fraction of total RAM.

@jvshahid
Copy link
Contributor

There's a related discussion here on this thread

@jvshahid
Copy link
Contributor

Can you create a gist with the content of ${INFLUXDB_DATA}/db/shard_db_v2/*/LOG after restarting influxdb? That will help us understand why opening the shards on startup takes that long.
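A minimal sketch of one way to collect those files into a single text file for a gist (the output filename is arbitrary; INFLUXDB_DATA is assumed to point at your data directory):

```sh
# Concatenate every per-shard RocksDB LOG file, with a header per shard.
for f in "$INFLUXDB_DATA"/db/shard_db_v2/*/LOG; do
    echo "==== $f ===="
    cat "$f"
done > shard_logs.txt
```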

@vladimir-smirnov-sociomantic
Copy link
Author

https://yadi.sk/d/j4KIsb16bXMtB (unpacked size is around 160MB, packed - ~7.5MB)

They should contain the restart information somewhere in the middle (the server was restarted around 4PM UTC).

@vladimir-smirnov-sociomantic
Copy link
Author

https://yadi.sk/d/CaqQ7AeJbXNb7 - log of the restart process (from /etc/init.d/influxdb restart until the first query after the restart).

@nickhuber
Copy link

I'm having a similar issue. With under 1GB of data it took about 24 minutes to load.

Here is a gist of ${INFLUXDB_DATA}/db/shard_db_v2/*/LOG https://gist.github.com/nickhuber/6eba4b703119f34d2d1b

@jvshahid
Copy link
Contributor

@nickhuber the size of your data is manageable. Can you archive your data directory and send it to us at our support email (support at influxdb dot com)? I can't reproduce the issue, and having your data would make that a lot easier.

@vladimir-smirnov-sociomantic
Copy link
Author

To reproduce, you can generate a really large number of metrics, for example with my performance test tool for graphite: https://github.com/Civil/graphite_perf_test_go

Run it with several connections and a large number of points per connection - something like '-connections=2 -points=1000000 -runs=120' - to generate a significant amount of data (it'll generate 240M datapoints and 1M entries), then shut down the database and try to start it again; see the sketch below.
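A sketch of that reproduction sequence (the go get install step and binary location are assumptions; only the flags come from the comment above, and the tool may need additional flags such as a target host):

```sh
# Install and run the load generator against a local influxdb.
go get github.com/Civil/graphite_perf_test_go
"$GOPATH/bin/graphite_perf_test_go" -connections=2 -points=1000000 -runs=120

# Then restart the database and observe how long it takes to come back.
/etc/init.d/influxdb stop
/etc/init.d/influxdb start
```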

@abh
Copy link

abh commented Sep 28, 2014

FWIW I ran into the same thing trying to restart influxdb to make sure it works. My data directory is about 90GB; about 10 shards and 200k series.

@pkittenis
Copy link

Deleting the files in the wal directory sped up startup time for me. An hour after process start it still wasn't up; after I deleted the wal files it started up immediately. Obviously, any writes not yet committed to disk that would have been replayed from the wal files will be lost if you do this.
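A sketch of that workaround (the wal path is an assumption - check your configured data directory - and moving the files aside is safer than deleting them outright):

```sh
# WARNING: any un-replayed writes still in the WAL are lost.
/etc/init.d/influxdb stop
mv /opt/influxdb/shared/data/wal /opt/influxdb/shared/data/wal.bak   # keep a copy rather than rm
/etc/init.d/influxdb start
```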

@abh
Copy link

abh commented Oct 9, 2014

My wal directory is empty except for a several-days-old "bookmark" file, and I still have > 60 minute startup times. :-/

@XANi
Copy link

XANi commented Oct 10, 2014

From a bunch of my simple tests (https://github.com/XANi/toolbox/blob/master/rocks/rock.pl), I've noticed that replaying the RocksDB log is very DB-size dependent (doubling the size of the DB more than doubles the replay time), so if the DB is not closed properly, re-opening it on start takes ages.

That seems to occur even on a clean influxdb stop ; influxdb start
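One way to measure that startup gap end-to-end, as a sketch (assumes the HTTP API listens on port 8086; the 2-second poll interval is arbitrary):

```sh
# Time a clean restart until the API port accepts connections again.
/etc/init.d/influxdb stop
/etc/init.d/influxdb start
start=$(date +%s)
until nc -z localhost 8086; do sleep 2; done
echo "influxdb took $(( $(date +%s) - start ))s to accept connections"
```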

@toddboom
Copy link
Contributor

The underlying datastore is undergoing a complete rewrite for v0.9.0, so this issue will no longer be relevant. Closing it out.

@MikeSchroll
Copy link
Contributor

This is currently being discussed over here: #5764
