Too many open files #280

Closed

nichdiekuh opened this issue Feb 27, 2014 · 13 comments
@nichdiekuh

After upgrading to 0.5 I quickly ran into "too many open files" while batch-inserting lots of events. I didn't have this issue on 0.4 with the same script migrating the data:


2014/02/27 00:00:18 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
2014/02/27 00:00:19 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
2014/02/27 00:00:20 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
[... the same Accept error repeats once per second through 00:00:59 ...]
[02/27/14 00:01:00] [WARN] Rotating request log...
panic: IO error: /opt/influxdb/shared/data/db/request_logs/2014-5872944-27/LOCK: Too many open files

goroutine 6 [running]:
runtime.panic(0x84e3a0, 0xc21b9bdd60)
        /home/vagrant/bin/go/src/pkg/runtime/panic.c:266 +0xb6
datastore.(*LevelDbDatastore).rotateRequestLog(0xc210071000)
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:229 +0x25a
datastore.(*LevelDbDatastore).periodicallyRotateRequestLog(0xc210071000)
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:214 +0x4f
created by datastore.NewLevelDbDatastore
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:205 +0x700

goroutine 1 [sleep]:
time.Sleep(0x3b9aca00)
        /tmp/bindist375750859/go/src/pkg/runtime/time.goc:31 +0x31
@schmurfy
Contributor

It may not be the issue, but what are you using to insert the data?
I initially had this on 0.4.x with a faulty HTTP library that kept nearly every connection open, causing both sides to hold a lot of open files.

@nichdiekuh
Author

I already experienced that on 0.4 and I'm pretty sure I solved it back then. At the moment I'm at ~1500 connections.

Influx Server:

 netstat -an | grep -e tcp -e udp | wc -l
1436

App Server, sending the inserts:

netstat -an | grep -e tcp -e udp | wc -l
236

Both values are pretty stable, as far as I can tell from watching them.
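As a side note, netstat counts sockets for the whole host; to see how many descriptors the influxdb process itself is holding (sockets plus LevelDB files), counting the entries in /proc/&lt;pid&gt;/fd is more direct. A minimal sketch, which falls back to the current shell's PID so it runs anywhere (for the daemon you would substitute `$(pidof influxdb)`):

```shell
# Count file descriptors held by a process: /proc/<pid>/fd contains
# one symlink per open descriptor. Defaults to our own shell's PID;
# pass the influxdb PID as the first argument instead.
pid=${1:-$$}
ls "/proc/$pid/fd" | wc -l
# Compare against the limit that process is actually running with:
grep 'Max open files' "/proc/$pid/limits"
```

Watching that count grow while `netstat` stays flat would point at LevelDB files rather than sockets.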

@pauldix
Member

pauldix commented Feb 27, 2014

This is probably related to #277, which we'll be releasing a fix for today in rc.2. You'll have to blow away your data directories and start fresh. If the problem is still there please re-open this issue.

@pauldix pauldix closed this as completed Feb 27, 2014
@nichdiekuh
Author

Even with RC2, the problem persists:

[2014/02/27 19:58:15 CET] [INFO] (cluster.(*ClusterConfiguration).AddShards:856) Adding short term shard: 26 - start: Thu Jan 2 01:00:00 +0100 CET 2014. end: Thu Jan 9 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[02/27/14 19:58:22] [INFO] No matching shards for write at time 1389225605000000u, creating...
[02/27/14 19:58:22] [INFO] createShards: start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).GetShardToWriteToBySeriesAndTime:605) No matching shards for write at time 1389225605000000u, creating...
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).createShards:647) createShards: start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014
[2014/02/27 19:58:22 CET] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:114) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00027
[02/27/14 19:58:22] [INFO] DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00027
[02/27/14 19:58:22] [INFO] Adding short term shard: 27 - start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).AddShards:856) Adding short term shard: 27 - start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[02/27/14 20:02:06] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.2
[2014/02/27 20:02:06 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.2
[02/27/14 20:12:58] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.3
[2014/02/27 20:12:58 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.3
[2014/02/27 20:22:44 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.4
[02/27/14 20:22:44] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.4
[02/27/14 20:34:24] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.5
[2014/02/27 20:34:24 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.5
[02/27/14 20:47:53] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.6
[2014/02/27 20:47:53 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.6
[02/27/14 20:59:51] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.7
[2014/02/27 20:59:51 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.7
2014/02/27 21:01:19 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 5ms

@pauldix
Member

pauldix commented Feb 27, 2014

Are you inserting a bunch of events all in the same time period, or are the timestamps spread out over multiple days? Also, what's your open file limit?

@nichdiekuh
Author

I'm inserting about 160 time series, each with >1 million events, in chunks of 1000 events per insert. The data goes back to Nov/Dec 2012.

[root@db1:~]# cat /proc/sys/fs/file-max
678727
[root@db1:~]# ulimit -Hn
4096
[root@db1:~]# ulimit -Sn
1024
[root@db1:~]# cat /etc/*release
CentOS release 6.4 (Final)

@pauldix
Member

pauldix commented Feb 27, 2014

That's definitely low for an InfluxDB server. Particularly if you're going to have 1500 concurrent connections. Any qualms about setting it to 100k or something like that?

I did just notice that in the big refactor the max-open-files option didn't get moved over. I just pushed a commit that fixes that. However, a new LevelDB is created per shard, so if you're writing a ton of data in, that max-open-files option will be per shard. So set it accordingly. We might have to add another option of the max number of open shards at a time.
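For later readers: the restored option lives in the LevelDB section of config.toml. The section and key names below are an assumption from the 0.5-era config layout, so verify them against the sample config shipped with your build; the important point from the comment above is that this budget applies per open shard, not per process:

```toml
[leveldb]
# Hypothetical example: descriptor budget for EACH shard's LevelDB
# instance. With many shards open, total usage is roughly
# (open shards) x (this value), so size your ulimit accordingly.
max-open-files = 40
```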

@nichdiekuh
Author

Ok, I've increased the file limits and will see how it goes overnight. The old limits were the defaults, and I wasn't aware this could be an issue.
A max-open-shards limit, however, sounds reasonable.

@nichdiekuh
Author

Yep, that seems to have solved my issue, thank you @pauldix :)

@MartinNowak

I had to increase my soft limit for maximum number of open files, which was set to 1024.
http://linux.die.net/man/5/limits.conf
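For the record, the limits.conf entries look like this (the `influxdb` user name and the 65536 value are examples, not values from this thread; pick a limit that covers your connection count plus per-shard LevelDB files):

```
# /etc/security/limits.conf -- raise open-file limits for the influxdb user
# <domain>  <type>  <item>   <value>
influxdb    soft    nofile   65536
influxdb    hard    nofile   65536
```

Note that pam_limits applies these at login, so the service has to be restarted from a fresh session for them to take effect.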

@sfeilmeier

I would like to add this comment here for reference, as this issue is still one of the first results on Google:
when controlling the influxdb service with systemd, you must use the LimitNOFILE option to set the maximum open-file limit. The settings suggested in the InfluxDB installation guide (by Basho: http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/) do not apply in a systemd environment. Here is my complete influxdb.service file:

[Unit]
Description=InfluxDB
After=network.target

[Service]
User=influxdb
Group=influxdb
Type=simple
WorkingDirectory=/opt/influxdb
ExecStart=/usr/bin/influxdb -config /opt/influxdb/shared/config.toml
Restart=always
RestartSec=10
LimitNOFILE=infinity

[Install]
WantedBy=multi-user.target
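One verification note on the unit above: after `systemctl daemon-reload` and `systemctl restart influxdb`, check the limit the service actually received. `ulimit -n` in a login shell reflects pam_limits, not systemd's LimitNOFILE, so read the running process's limits file instead. A sketch, using the current shell's PID for illustration (substitute `$(pidof influxdb)` for the real service):

```shell
# Read the effective open-file limit of a running process from /proc.
# systemd's LimitNOFILE shows up here; shell ulimit settings do not
# necessarily match what the service got.
grep 'Max open files' "/proc/$$/limits"
```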

@iantheconway

I'm having a similar issue on 1.2.3: after writing a few hundred rows I get an error about too many open files. To fix it, should I raise my ulimit or kern.maxfiles, or is there a configuration option in influxdb.conf?

@linuxmail

hi,
I have the same problem under Debian Stretch with 1.6.1-1 (and below). I tried to change the limit via `systemctl edit telegraf.service` and by editing /lib/systemd/system/telegraf.service directly, but it stays at 50000:

su telegraf --shell /bin/bash --command "ulimit -a" | grep open
open files                      (-n) 50000

cu denny
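For anyone hitting the same wall: `su` starts a pam session, so the `ulimit -a` output above shows pam limits, not what systemd gave the service. The usual fix is a drop-in override (editing /lib/systemd/system/telegraf.service directly is overwritten on package upgrades). A sketch, with 65536 as an example value:

```
# /etc/systemd/system/telegraf.service.d/limits.conf
[Service]
LimitNOFILE=65536
```

Then run `systemctl daemon-reload && systemctl restart telegraf` and verify with `grep 'Max open files' /proc/$(pidof telegraf)/limits`.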
