Scylla silently stops handling writes under load if open file limit is low #369

Closed
tgrabiec opened this issue Sep 17, 2015 · 14 comments

@tgrabiec
Contributor

I tested with 7 c-s processes, 22 connections and 1400 threads per process. Soon all clients fail with the following error message:

java.io.IOException: Operation x10 on key(s) [4c4d504f4b3437504e30]: Error executing: (WriteTimeoutException): Cassandra timeout during write query at consistency LOCAL_ONE (1 replica were required but only 0 acknowledged the write)

        at org.apache.cassandra.stress.Operation.error(Operation.java:216)
        at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:188)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:107)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:259)
        at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)

There is nothing in scylla logs.

Bumping the open file limit up to 100000 makes the problem go away.

The problem is not that we don't work with a low open file limit (that is acceptable), but that there is no information in the logs about the underlying cause of the failure, so it's not easy to figure out why scylla stops responding.
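
For reference, a quick way to see which limit the running scylla process actually got (as opposed to what /etc/security/limits.d promises) is to ask the kernel directly. A minimal sketch, assuming scylla is already running and pgrep matches exactly one process:

$ cat /proc/$(pgrep -x scylla)/limits | grep 'Max open files'
$ sudo su - scylla -c 'ulimit -n'    # what a login shell for the scylla user gets via PAM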

@tgrabiec tgrabiec added the bug label Sep 17, 2015
@asias
Contributor

asias commented Sep 18, 2015

We already have the file limit updated in the rpm:

$ cat dist/redhat/limits.d/scylla.conf
scylla - memlock unlimited
scylla - nofile 100000
scylla - as unlimited
scylla - nproc 8096

What did you do to increase it?

@gleb-cloudius
Contributor

On Fri, Sep 18, 2015 at 04:29:40PM -0700, Asias He wrote:

I added "ulimit -l > /tmp/limits" to /lib/scylla/scylla_run and I see:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31849
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31849
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

after stopping/starting the server. I did not try a reboot yet.

        Gleb.

@asias
Contributor

asias commented Sep 19, 2015

On an AWS AMI instance:

[fedora@ip-172-31-43-112 ~]$ sudo su - scylla
-bash-4.3$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 241448
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 8096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

@asias
Contributor

asias commented Sep 19, 2015

[root@ip-172-31-43-112 1424]# cat limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             8096                 8096                 processes
Max open files            100000               100000               files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       241448               241448               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[root@ip-172-31-43-112 1424]# pwd
/proc/1424
[root@ip-172-31-43-112 1424]# pgrep scylla
1424

@gleb-cloudius
Contributor

On Sat, Sep 19, 2015 at 01:56:36AM -0700, Asias He wrote:

There are two problems with that. First, our scylla-server.service forgets
to specify User/Group, so scylla runs as root. Second, if somebody still
had any doubts that systemd is crap, then look here:
https://bugzilla.redhat.com/show_bug.cgi?id=754285. TL;DR: systemd does not
apply those limits when it runs services. Systemd has its own way to
specify limits! Only when I add LimitNOFILE=100000 to scylla-server.service
do I see the proper limit during service startup.

        Gleb.
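
For anyone who hits this with the rpm install, a minimal sketch of the systemd-side fix described above. The drop-in path and the User/Group lines are illustrative assumptions; the only part confirmed in this thread is that LimitNOFILE=100000 on scylla-server.service makes the limit take effect:

$ sudo mkdir -p /etc/systemd/system/scylla-server.service.d
$ sudo tee /etc/systemd/system/scylla-server.service.d/limits.conf <<'EOF'
[Service]
# systemd ignores /etc/security/limits.d for services, so set the limit here
LimitNOFILE=100000
# hypothetical: run the service as the scylla user instead of root
User=scylla
Group=scylla
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart scylla-server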

@asias
Contributor

asias commented Sep 23, 2015

@slivne right now, we set the limit in systemd. We can close this.

@tgrabiec
Contributor Author

On 24 Sep 2015, 12:23 AM, "Asias He" notifications@github.com wrote:

This issue was more about the fact that when we hit the open file limit we
silently stop responding, whereas we should shout about it in the logs. Was this
resolved?

@slivne
Contributor

slivne commented Sep 24, 2015

The issue is with non-AMI users who try to install the rpms and then have
scylla stop working.

No, so let's make sure we provide a proper error message - still open.


@gleb-cloudius
Contributor

I checked and scylla is far from being silent:
ERROR [shard 0] database - failed to write sstable: Too many open files
ERROR [shard 0] database - failed to write sstable: Too many open files
ERROR [shard 0] database - failed to write sstable: Too many open files
ERROR [shard 0] database - failed to write sstable: Too many open files
ERROR [shard 0] database - failed to write sstable: Too many open files
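
A quick way to check whether a node has run into this is to search the service log for EMFILE errors. A minimal sketch, assuming the unit is named scylla-server and its output goes to the journal:

$ journalctl -u scylla-server | grep -i 'too many open files'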

@slivne
Contributor

slivne commented Sep 24, 2015

Tomek - what are you getting in your scenario? Maybe it's failing in
the read path and not the write path?


@gleb-cloudius
Contributor

Sometimes I also see:
WARNING: exceptional future ignored of type 'std::system_error': Error system:24 (Too many open files)
I'll check where this is coming from.
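
One way to find where an ignored exceptional future originates is to break on seastar's report_failed_future and grab a backtrace. A minimal gdb sketch, assuming a locally built binary with debug info (the path is illustrative, taken from the backtrace below):

$ gdb ./build/release/scylla
(gdb) break report_failed_future
(gdb) run
... reproduce the overload; when the breakpoint hits:
(gdb) bt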

@gleb-cloudius
Contributor

#0  report_failed_future (eptr=...) at core/reactor.cc:2101
#1  0x0000000000798843 in ~future (this=0x60000197cc70, __in_chrg=<optimized out>)
    at /home/gleb/work/seastar/seastar/core/future.hh:691
#2  ~<lambda> (this=0x60000197cc70, __in_chrg=<optimized out>)
    at sstables/compaction.cc:243
#3  ~<lambda> (this=0x60000197cc50, __in_chrg=<optimized out>)
    at /home/gleb/work/seastar/seastar/core/future.hh:785
#4  ~continuation (this=0x60000197cc40, __in_chrg=<optimized out>)
    at /home/gleb/work/seastar/seastar/core/future.hh:358
#5  continuation<future::then(Func&&) [with Func = sstables::compact_sstables(std::vector<lw_shared_ptr<sstables::sstable> >, column_family&, std::function<lw_shared_ptr<sstables::sstable>()>)::<lambda()>; Result = future<>; T = {}]::<lambda(auto:1&&)> >::~continuation(void) (this=0x60000197cc40, __in_chrg=<optimized out>)
    at /home/gleb/work/seastar/seastar/core/future.hh:358
#6  0x0000000000477d89 in operator() (this=<optimized out>,
    __ptr=0x60000197cc40) at /usr/include/c++/4.9.2/bits/unique_ptr.h:76
#7  reset (__p=0x60000197cc40, this=<optimized out>)
    at /usr/include/c++/4.9.2/bits/unique_ptr.h:344
#8  reactor::run_tasks (this=this@entry=0x6000001df000, tasks=...)
    at core/reactor.cc:1118
#9  0x00000000004a352b in reactor::run (this=0x6000001df000)
    at core/reactor.cc:1234
#10 0x00000000004f7eeb in app_template::run_deprecated(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffdb10, ac=ac@entry=18,
    av=av@entry=0x7fffffffdd58,
    func=func@entry=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x512632, DIE 0x5bb4e5>) at core/app-template.cc:122
#11 0x000000000041eeac in main (ac=18, av=0x7fffffffdd58) at main.cc:324

@tgrabiec
Contributor Author

@slivne In my tests, I got no message at all. I need to retest with latest.

@tgrabiec
Contributor Author

I seem to be getting error messages with scylla 0.9, so closing.

ERROR [shard 32] storage_proxy - exception during write: Too many open files
