-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Handle nil memtable gracefully on Close #1034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ A review job has been created and sent to the PullRequest network.
@connorgorman you can click here to see the review status or cancel the code review job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @@ -492,6 +492,10 @@ func (db *DB) getMemTables() ([]*skl.Skiplist, func()) { | |||
|
|
|||
| tables := make([]*skl.Skiplist, len(db.imm)+1) | |||
|
|
|||
| if db.mt == nil { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change and optimization to avoid this issue!
If you haven't already, this is a good opportunity to start tracking memory usage as part of reliability metrics to ensure your application doesn't start overflowing as the codebase becomes more complex and we get into the memory management nitty gritties.
|
@connorgorman Thanks for the PR. Can you tell us, how did you end up with above crash? Also, if possible, please create issue for this. |
|
@ashish-goswami It's actually related to this one: hypermodeinc/dgraph#3873. Basically, that stacktrace shows two panics. It panics once on the getMemTables because it is getting a value, but then there is an iterator panic in a defer function because the DB has been closed This scenario occurs when you Close the DB, but have a txn in progress (new txns get an error about the DB being closed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ashish-goswami, @connorgorman, @manishrjain, and @pullrequest[bot])
db.go, line 495 at r1 (raw file):
Previously, pullrequest[bot] wrote…
Good change and optimization to avoid this issue!
If you haven't already, this is a good opportunity to start tracking memory usage as part of reliability metrics to ensure your application doesn't start overflowing as the codebase becomes more complex and we get into the memory management nitty gritties.
I am not very sure about this change. The db.mt shouldn't be nil under normal circumstances. I'm curious to know how would we end up in a situation where db.mt is nil.
db.go, line 535 at r1 (raw file):
tables, decr := db.getMemTables() // Lock should be released. if len(tables) == 0 { return y.ValueStruct{}, errors.New("No tables found. DB may be closed")
This wouldn't work. The db.get(...) function looks in the memtables and then in all the levels to find a key. If we return from here, we would not find keys that are stored in any level below the memtables.
|
@jarifibrahim If you are trying to gracefully shutdown by calling Close, but have other readers then I believe they are not synchronized and will panic. https://github.com/dgraph-io/badger/blob/master/db.go#L407 is where db.mt is set to nil. In the case where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard
|
Circling back on this change, I think the correct Close behavior would be to wait for all txns including Views to complete while not allowing any new txns. Thoughts? |
I had the same thing in mind. Instead of crashing with a nil pointer exception, it would be nice to complain to the user about the open transactions while closing the DB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch seems to me the simplest fix for the memtable nil crash.
func TestMemtableNil(t *testing.T) {
db, err := Open(DefaultOptions("/tmp/b"))
if err != nil {
panic(err)
}
txnSet(t, db, []byte("foo"), []byte("1"), 0)
go func() {
txn := db.NewTransaction(true)
for {
time.Sleep(time.Second)
i, err := txn.Get([]byte("foo"))
if err != nil && err != ErrKeyNotFound {
log.Fatal(err)
}
fmt.Println(i)
}
}()
db.Close()
time.Sleep(10 * time.Second)
}This test would crash on master but on this PR it gracefully returns an error.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ashish-goswami, @connorgorman, and @manishrjain)
db.go, line 495 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
I am not very sure about this change. The
db.mtshouldn't be nil under normal circumstances. I'm curious to know how would we end up in a situation wheredb.mtisnil.
Thanks for explaining @connorgorman
db.go, line 535 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
This wouldn't work. The
db.get(...)function looks in the memtables and then in all the levels to find a key. If we return from here, we would not find keys that are stored in any level below the memtables.
In case the DB is being closed, it should be okay to return from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ashish-goswami, @connorgorman, and @manishrjain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not assume that if db.mt is nil, db is being closed. That is just relying upon some behavior which happens to be the case right now, but could change in the future.
@jarifibrahim Please make the changes as discussed. Feel free to merge after.
Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ashish-goswami, @connorgorman, and @jarifibrahim)
db.go, line 495 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
Thanks for explaining @connorgorman
Do not return here.
db.go, line 500 at r1 (raw file):
// Get mutable memtable. tables[0] = db.mt
if db.mt != nil { this logic here }
db.go, line 508 at r1 (raw file):
tables[i+1] = db.imm[last-i] tables[i+1].IncrRef() }
Here, filter the tables array for nil tables. Loop over elements, if not nil, keep them.
db.go, line 535 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
In case the DB is being closed, it should be okay to return from here.
I wouldn't do this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ashish-goswami, @connorgorman, and @jarifibrahim)
db.go, line 508 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Here, filter the tables array for nil tables. Loop over elements, if not nil, keep them.
Also move all IncrRef here.
|
@manishrjain Does it seem like it'd be better to just wait for View txns to close? The issue is that closing the DB does not wait for all txns to finish |
|
Hey @jarifibrahim , what's the status of this? |
|
@manishrjain @jarifibrahim After looking at this and testing, it seems like this will not be sufficient for fixing the issue completely. e.g. if you use a merge operator, then the Get will succeed, but then the iterator will panic. It seems like the only way to fix this issue would be to ensure that all txns complete (both view and update) and then closing the DB (while not allowing new txns to begin). I took a stab at this, but it's fairly problematic due to the NewTransaction function being relied upon to always succeed and have a non-nil return value, which makes synchronizing difficult Do you guys have any thoughts about how we could synchronize txns? |
|
Hey @connorgorman, I don't think there's any way we can handle all the The crash happened in hypermodeinc/dgraph#3873 because we were accessing badger after closing it. A user/system should never do that. I think it's a reasonable expectation from the user to not access the DB after they have closed the database. @connorgorman I'd like to close this PR since the issue we're trying to fix here is not really an issue with badger. |
This is related to crashes on Close. I think it's a fair expectation that if you have multiple goroutines reading/writing a call to Close (on a sigterm for example) will not cause a panic. This fixes an issue where the memtable is nil on Close. This will cause Get's to return errors and not panic
This is an example of the panic
This change is