Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node does not shut down communications when out of disk space #1311

Closed
avikivity opened this issue Jun 5, 2016 · 8 comments
Closed

Node does not shut down communications when out of disk space #1311

avikivity opened this issue Jun 5, 2016 · 8 comments

Comments

@avikivity
Copy link
Member

Installation details
Scylla version (or git commit hash): 1.1

@benoit-canet
Copy link

Yes Avi,

I made an exception for ENOSPC since I though that Raphael was working
for a way for Scylla to back off write on ENOSPC.

Was I wrong ?

Best regards

Benoît

On Sun, Jun 5, 2016 at 12:44 PM, Avi Kivity notifications@github.com
wrote:

Installation details
Scylla version (or git commit hash): 1.1


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1311, or mute the thread
https://github.com/notifications/unsubscribe/AAA5IYoXTzoBXARSVMe38dpLDuc4LMzrks5qIqh5gaJpZM4IuVC-
.

@avikivity
Copy link
Member Author

So it appears, we should have caught ENOSPC and removed it when whatever Raphael was doing was complete.

@benoit-canet
Copy link

I refreshed my memory. Later on you instructed we to shutdown only on EIO.

On Sun, Jun 5, 2016 at 1:10 PM, Avi Kivity notifications@github.com wrote:

So it appears, we should have caught ENOSPC and removed it when whatever
Raphael was doing was complete.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#1311 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAA5IRBxUOKTUzA1idU4bU6qrrOGocARks5qIq67gaJpZM4IuVC-
.

@avikivity
Copy link
Member Author

You should have ignored me, it was a really bad idea.

Did we not discuss ENOSPC?

@avikivity
Copy link
Member Author

Here's what I said at one time:

We need separate treatment for:

ENOSPC (can't write any more, but can read) 
EIO on sstable write (we may recover if we retry) 
EIO on sstable read (may want to start repair) 
EIO on commitlog write (bad) 
ENOTDIR/EPERM/ENOENT etc (indicating bad configuration probably) 

Maybe we won't have separate treatment now, but by having multiple
signals, we can be prepared.

Looks like we aren't prepared.

@benoit-canet
Copy link

We have:

I though you wanted to discriminate by source of the exception so I did

extern thread_local disk_error_signal_type commit_error;
extern thread_local disk_error_signal_type sstable_read_error;
extern thread_local disk_error_signal_type sstable_write_error;
extern thread_local disk_error_signal_type general_disk_error;

to discriminate from the source of the exception.

We could discriminate futher by signal in service/storage_service.(hh/cc)
with some switch/cases.

At least even I did it wrong I tried to prepare ourselves for this.

On Sun, Jun 5, 2016 at 3:37 PM, Avi Kivity notifications@github.com wrote:

Here's what I said at one time:

We need separate treatment for:

ENOSPC (can't write any more, but can read)
EIO on sstable write (we may recover if we retry)
EIO on sstable read (may want to start repair)
EIO on commitlog write (bad)
ENOTDIR/EPERM/ENOENT etc (indicating bad configuration probably)

Maybe we won't have separate treatment now, but by having multiple
signals, we can be prepared.

Looks like we aren't prepared.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#1311 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAA5IWFwCLNkerzWxT6j-lVnz_QyyExTks5qItEzgaJpZM4IuVC-
.

@avikivity
Copy link
Member Author

Yeah it's my fault, I didn't explain myself clearly, and in any case I should have caught EIO-only on review.

@slivne
Copy link
Contributor

slivne commented Jun 6, 2016

requires backport for 1.2, 1.1

On Mon, Jun 6, 2016 at 11:56 AM, Tomasz Grabiec notifications@github.com
wrote:

Closed #1311 #1311 via 961e80a
961e80a
.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1311 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/ADThCPhxhXbaqAxv779rbwRA9A-dxHKmks5qI-DagaJpZM4IuVC-
.

penberg pushed a commit that referenced this issue Jun 6, 2016
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80a)
penberg pushed a commit that referenced this issue Jun 14, 2016
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80a)
penberg pushed a commit that referenced this issue Jun 16, 2016
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants