Too many open files #280

Closed

nichdiekuh opened this issue Feb 27, 2014 · 13 comments
@nichdiekuh

After upgrading to 0.5 I quickly ran into "too many open files" while batch-inserting lots of events. I didn't have this issue on 0.4 with the same script migrating the data:


2014/02/27 00:00:18 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
2014/02/27 00:00:19 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
2014/02/27 00:00:20 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 1s
[... the same Accept error repeats once per second through 00:00:59 ...]
[02/27/14 00:01:00] [WARN] Rotating request log...
panic: IO error: /opt/influxdb/shared/data/db/request_logs/2014-5872944-27/LOCK: Too many open files

goroutine 6 [running]:
runtime.panic(0x84e3a0, 0xc21b9bdd60)
        /home/vagrant/bin/go/src/pkg/runtime/panic.c:266 +0xb6
datastore.(*LevelDbDatastore).rotateRequestLog(0xc210071000)
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:229 +0x25a
datastore.(*LevelDbDatastore).periodicallyRotateRequestLog(0xc210071000)
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:214 +0x4f
created by datastore.NewLevelDbDatastore
        /home/vagrant/influxdb/src/datastore/leveldb_datastore.go:205 +0x700

goroutine 1 [sleep]:
time.Sleep(0x3b9aca00)
        /tmp/bindist375750859/go/src/pkg/runtime/time.goc:31 +0x31
@schmurfy
Contributor

It may not be the issue, but what are you using to insert the data?
I initially had this on 0.4.x with a faulty HTTP library that kept nearly every connection open, causing both sides to hold a lot of open files.

@nichdiekuh
Author

I already experienced that on 0.4 and I'm pretty sure I solved it back then. At the moment I'm at ~1500 connections.

Influx Server:

 netstat -an | grep -e tcp -e udp | wc -l
1436

App Server, sending the inserts:

netstat -an | grep -e tcp -e udp | wc -l
236

Both values are pretty stable, as far as I can tell from watching them.
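As a side note, netstat counts sockets for the whole host; to see how many descriptors the influxdb process itself is holding (sockets plus LevelDB files), counting the entries in /proc/&lt;pid&gt;/fd is more direct. A minimal sketch, which falls back to the current shell's PID so it runs anywhere (for the daemon you would substitute `$(pidof influxdb)`):

```shell
# Count file descriptors held by a process: /proc/<pid>/fd contains
# one symlink per open descriptor. Defaults to our own shell's PID;
# pass the influxdb PID as the first argument instead.
pid=${1:-$$}
ls "/proc/$pid/fd" | wc -l
# Compare against the limit that process is actually running with:
grep 'Max open files' "/proc/$pid/limits"
```

Watching that count grow while `netstat` stays flat would point at LevelDB files rather than sockets.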

@pauldix
Member

pauldix commented Feb 27, 2014

This is probably related to #277, which we'll be releasing a fix for today in rc.2. You'll have to blow away your data directories and start fresh. If the problem is still there please re-open this issue.

@pauldix pauldix closed this as completed Feb 27, 2014
@nichdiekuh
Author

Even with RC2, the problem persists:

[2014/02/27 19:58:15 CET] [INFO] (cluster.(*ClusterConfiguration).AddShards:856) Adding short term shard: 26 - start: Thu Jan 2 01:00:00 +0100 CET 2014. end: Thu Jan 9 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[02/27/14 19:58:22] [INFO] No matching shards for write at time 1389225605000000u, creating...
[02/27/14 19:58:22] [INFO] createShards: start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).GetShardToWriteToBySeriesAndTime:605) No matching shards for write at time 1389225605000000u, creating...
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).createShards:647) createShards: start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014
[2014/02/27 19:58:22 CET] [INFO] (datastore.(*LevelDbShardDatastore).GetOrCreateShard:114) DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00027
[02/27/14 19:58:22] [INFO] DATASTORE: opening or creating shard /opt/influxdb/shared/data/db/shard_db/00027
[02/27/14 19:58:22] [INFO] Adding short term shard: 27 - start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[2014/02/27 19:58:22 CET] [INFO] (cluster.(*ClusterConfiguration).AddShards:856) Adding short term shard: 27 - start: Thu Jan 9 01:00:00 +0100 CET 2014. end: Thu Jan 16 01:00:00 +0100 CET 2014. isLocal: %!d(bool=true). servers: [%!s(uint32=1)]
[02/27/14 20:02:06] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.2
[2014/02/27 20:02:06 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.2
[02/27/14 20:12:58] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.3
[2014/02/27 20:12:58 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.3
[2014/02/27 20:22:44 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.4
[02/27/14 20:22:44] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.4
[02/27/14 20:34:24] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.5
[2014/02/27 20:34:24 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.5
[02/27/14 20:47:53] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.6
[2014/02/27 20:47:53 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.6
[02/27/14 20:59:51] [INFO] Rotating log. New log file /tmp/influxdb/development/wal/log.7
[2014/02/27 20:59:51 CET] [INFO] (wal.(*WAL).rotateTheLogFile:288) Rotating log. New log file /tmp/influxdb/development/wal/log.7
2014/02/27 21:01:19 http: Accept error: accept tcp [::]:8086: too many open files; retrying in 5ms

@pauldix
Member

pauldix commented Feb 27, 2014

Are you inserting a bunch of events all in the same time period, or are the timestamps spread out over multiple days? Also, what's your open file limit?

@nichdiekuh
Author

I'm inserting about 160 time series, each with >1 million events, in chunks of 1000 events per insert. The data goes back to Nov/Dec 2012.

[root@db1:~]# cat /proc/sys/fs/file-max
678727
[root@db1:~]# ulimit -Hn
4096
[root@db1:~]# ulimit -Sn
1024
[root@db1:~]# cat /etc/*release
CentOS release 6.4 (Final)

@pauldix
Member

pauldix commented Feb 27, 2014

That's definitely low for an InfluxDB server. Particularly if you're going to have 1500 concurrent connections. Any qualms about setting it to 100k or something like that?

I did just notice that in the big refactor the max-open-files option didn't get moved over. I just pushed a commit that fixes that. However, a new LevelDB is created per shard, so if you're writing a ton of data in, that max-open-files option will be per shard. So set it accordingly. We might have to add another option of the max number of open shards at a time.
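For later readers: the restored option lives in the LevelDB section of config.toml. The section and key names below are an assumption from the 0.5-era config layout, so verify them against the sample config shipped with your build; the important point from the comment above is that this budget applies per open shard, not per process:

```toml
[leveldb]
# Hypothetical example: descriptor budget for EACH shard's LevelDB
# instance. With many shards open, total usage is roughly
# (open shards) x (this value), so size your ulimit accordingly.
max-open-files = 40
```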

@nichdiekuh
Author

Ok, I've increased the file limits and will see how it goes overnight. The old limits were the defaults, and I wasn't aware this could be an issue.
A max-open-shards limit, however, sounds reasonable.

@nichdiekuh
Author

Yep, that seems to have solved my issue, thank you @pauldix :)

@MartinNowak

I had to increase my soft limit for maximum number of open files, which was set to 1024.
http://linux.die.net/man/5/limits.conf
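For the record, the limits.conf entries look like this (the `influxdb` user name and the 65536 value are examples, not values from this thread; pick a limit that covers your connection count plus per-shard LevelDB files):

```
# /etc/security/limits.conf -- raise open-file limits for the influxdb user
# <domain>  <type>  <item>   <value>
influxdb    soft    nofile   65536
influxdb    hard    nofile   65536
```

Note that pam_limits applies these at login, so the service has to be restarted from a fresh session for them to take effect.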

@sfeilmeier

I would like to add this comment here for reference, as this issue is still one of the first results on Google:
when controlling the influxdb service with systemd, you must use the LimitNOFILE option to set the maximum open-file limit. The settings suggested in the InfluxDB installation guide (by Basho: http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/) do not apply in a systemd environment. Here is my complete influxdb.service file:

[Unit]
Description=InfluxDB
After=network.target

[Service]
User=influxdb
Group=influxdb
Type=simple
WorkingDirectory=/opt/influxdb
ExecStart=/usr/bin/influxdb -config /opt/influxdb/shared/config.toml
Restart=always
RestartSec=10
LimitNOFILE=infinity

[Install]
WantedBy=multi-user.target
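One verification note on the unit above: after `systemctl daemon-reload` and `systemctl restart influxdb`, check the limit the service actually received. `ulimit -n` in a login shell reflects pam_limits, not systemd's LimitNOFILE, so read the running process's limits file instead. A sketch, using the current shell's PID for illustration (substitute `$(pidof influxdb)` for the real service):

```shell
# Read the effective open-file limit of a running process from /proc.
# systemd's LimitNOFILE shows up here; shell ulimit settings do not
# necessarily match what the service got.
grep 'Max open files' "/proc/$$/limits"
```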

@iantheconway

I'm having a similar issue on 1.2.3: after writing a few hundred rows I get an error about too many open files. To fix it, should I raise my ulimit or kern.maxfiles, or is there a configuration option in influxdb.conf?

@linuxmail

hi,
I have the same problem under Debian Stretch with 1.6.1-1 (and below). I tried to change the limit via `systemctl edit telegraf.service` and by editing /lib/systemd/system/telegraf.service directly, but it stays at 50000:

su telegraf --shell /bin/bash --command "ulimit -a" | grep open
open files                      (-n) 50000

cu denny
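For anyone hitting the same wall: `su` starts a pam session, so the `ulimit -a` output above shows pam limits, not what systemd gave the service. The usual fix is a drop-in override (editing /lib/systemd/system/telegraf.service directly is overwritten on package upgrades). A sketch, with 65536 as an example value:

```
# /etc/systemd/system/telegraf.service.d/limits.conf
[Service]
LimitNOFILE=65536
```

Then run `systemctl daemon-reload && systemctl restart telegraf` and verify with `grep 'Max open files' /proc/$(pidof telegraf)/limits`.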
