Changefeed crash #345

Closed
gthmac opened this issue Jul 26, 2016 · 15 comments

@gthmac

gthmac commented Jul 26, 2016

I'm having recurring crashes with changefeed cursors (see the stack trace below).

Thanks in advance for looking into it, Dan. :-)

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x62df73]

goroutine 7932 [running]:
github.com/dancannon/gorethink.(*Cursor).bufferNextResponse(0xc820517b80, 0x0, 0x0)
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:632 +0x263
github.com/dancannon/gorethink.(*Cursor).seekCursor(0xc820517b80, 0x100000001, 0x0, 0x0)
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:570 +0xe7
github.com/dancannon/gorethink.(*Cursor).nextLocked(0xc820517b80, 0xd9bfe0, 0xc8200769a0, 0xfeb601, 0xc8200769a0, 0x0, 0x0)
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:205 +0x3c
github.com/dancannon/gorethink.(*Cursor).Next(0xc820517b80, 0xd9bfe0, 0xc8200769a0, 0xd9bfe0)
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:188 +0xb0
github.com/dancannon/gorethink.(*Cursor).Listen.func1(0xdd3720, 0xc8206341e0, 0xc820517b80)
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:447 +0x19d
created by github.com/dancannon/gorethink.(*Cursor).Listen
    /home/jenkins/.gvm/pkgsets/go1.5.1/global/src/bitbucket.org/cloudintel/vatomizer/Godeps/_workspace/src/github.com/dancannon/gorethink/cursor.go:456 +0x49
@dancannon
Collaborator

Hey, thanks for reporting the crash; I will try to get it fixed as soon as possible. Since you are vendoring the library, could you confirm which version of GoRethink you are currently using?

@gthmac
Author

gthmac commented Jul 26, 2016

Hi Dan, thanks for the super fast reply, and please excuse my delay in answering as well as not having added this information in the first place.

This is what I am using:
{
    "ImportPath": "github.com/dancannon/gorethink",
    "Comment": "v2.1.1",
    "Rev": "d970d3cce3e907bd864200d4fb7410bca05b9264"
},

@dancannon
Collaborator

Sorry to keep asking for more information, but I am having some trouble replicating this issue. Would it be possible to see the query that is causing the panic and the code you are using to listen to and process the changefeed (especially if you are closing the cursor)? If you can think of anything else that might help replicate the issue, please post that as well.
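Closing a cursor while a background `Listen` goroutine is still inside `Next` is a classic place for a race. The sketch below is not GoRethink code: `feedCursor` is a hypothetical stand-in showing one way to make `Next` and `Close` safe to call from different goroutines, by guarding the closed flag with a mutex and signalling shutdown through a channel that a blocked `Next` can select on.

```go
package main

import (
	"fmt"
	"sync"
)

// feedCursor is a hypothetical cursor fed by an internal channel.
// It is an illustration only, not GoRethink's Cursor.
type feedCursor struct {
	mu     sync.Mutex
	closed bool
	in     chan int      // simulated responses from the server
	done   chan struct{} // closed by Close to wake a blocked Next
}

func newFeedCursor() *feedCursor {
	return &feedCursor{in: make(chan int), done: make(chan struct{})}
}

// Next blocks until a value arrives or the cursor is closed.
func (c *feedCursor) Next(dst *int) bool {
	c.mu.Lock()
	if c.closed {
		c.mu.Unlock()
		return false
	}
	c.mu.Unlock()
	select {
	case v := <-c.in:
		*dst = v
		return true
	case <-c.done:
		return false
	}
}

// Close is idempotent and safe to call while Next is blocked.
func (c *feedCursor) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.closed {
		c.closed = true
		close(c.done)
	}
}

// Listen forwards values to out in a goroutine and closes out when the
// cursor is done, so a consumer ranging over out terminates cleanly.
func (c *feedCursor) Listen(out chan<- int) {
	go func() {
		defer close(out)
		var v int
		for c.Next(&v) {
			select {
			case out <- v:
			case <-c.done: // consumer may have stopped reading
				return
			}
		}
	}()
}

func main() {
	c := newFeedCursor()
	out := make(chan int)
	c.Listen(out)
	go func() { c.in <- 1; c.in <- 2 }()
	fmt.Println(<-out, <-out)
	c.Close()
	_, ok := <-out
	fmt.Println("channel open after Close:", ok)
}
```

The key design choice is that `Close` never tears down state that `Next` reads without a lock; it only flips a guarded flag and closes a channel, which any number of goroutines can observe safely.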

@gthmac
Author

gthmac commented Jul 27, 2016

Hi Dan, here is the code segment that produces this behavior. I upgraded to this version of GoRethink from an older one, which did not have the problem (i.e. the code snippet below worked just fine). The version change was as follows:
Previous version: v1.3.2 (RethinkDB v2.2)
Current version: v2.1.1 (RethinkDB v2.3)

if actionEntry.Wait {
    mychan := make(chan events.DocChange)
    // start the changefeed from the DB
    cursor, dberr := r.Table(t.EventTable).Get(id).Changes().Run(cfg.EventDB)
    if dberr != nil {
        errmsg := fmt.Sprintf("%+v", dberr)
        cilog.Log(sys.ERROR, cierrors.LogError(&API_ERR_DB_GET, errmsg))
        replyWithError(c, &API_ERR_DB_GET)
        return
    }
    cursor.Listen(mychan)
    defer cursor.Close()

    var res t.VEvent
    for done := false; !done; {
        // wake up on a change notification or after 50 ms, then re-read the event
        select {
        case <-mychan:
        case <-time.After(50 * time.Millisecond):
        }
        resp, dberr := r.Table(t.EventTable).Get(id).Run(cfg.EventDB)
        if dberr != nil {
            errmsg := fmt.Sprintf("%+v", dberr)
            cilog.Log(sys.ERROR, cierrors.LogError(&API_ERR_DB_GET, errmsg))
            replyWithError(c, &API_ERR_DB_GET)
            return
        }
        if resp.IsNil() {
            resp.Close()
            continue
        }
        dberr = resp.One(&res)
        // close per iteration: a defer inside the loop would keep every
        // response open until the enclosing function returns
        resp.Close()
        if dberr != nil {
            errmsg := fmt.Sprintf("%+v", dberr)
            cilog.Log(sys.ERROR, cierrors.LogError(&API_ERR_DB_GET, errmsg))
            replyWithError(c, &API_ERR_DB_GET)
            return
        }
        if res.Status != string(events.EventClosed) {
            continue
        }
        done = true
    }
}

@gthmac
Author

gthmac commented Jul 27, 2016

In the meantime I changed the code above to poll rather than wait for changefeeds until the driver is fixed. But apparently I am also crashing in another section that still relies on changefeeds, which is this code. Again, both code paths worked perfectly before the upgrade (see above).
Thanks again for your help, Dan. Really appreciated, as always.

func (cic *CiCache) listenForChanges(cursor *r.Cursor) {

    ch := make(chan docChange)
    defer cursor.Close()
    cursor.Listen(ch)

    cilog.Log(syslog.LOG_INFO,
        "Cache %s: starting to listen for changes...",
        cic.tableName)
    for change := range ch {
        newkeyval := change.NewVal[cic.key]
        oldkeyval := change.OldVal[cic.key]
        var keyval interface{}
        if newkeyval == nil {
            keyval = oldkeyval
        } else {
            keyval = newkeyval
        }
        if keyval != nil {
            cic.cache.Delete(keyval.(string))
            cilog.Log(syslog.LOG_DEBUG,
                "Cache %s: received change for key=%s",
                cic.tableName,
                keyval)
        }
    }

}
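The key-selection step in `listenForChanges` (prefer the new value's key, fall back to the old value's key, since a delete has no new value) can be pulled into a small helper. This sketch assumes `docChange` has `NewVal`/`OldVal` fields of type `map[string]interface{}`, which the snippet suggests but does not show; the struct tags and the helper name `changedKey` are also assumptions. It uses a checked type assertion so a non-string key is skipped instead of panicking the way `keyval.(string)` would.

```go
package main

import "fmt"

// docChange mirrors the shape the snippet appears to use; the field types
// and struct tags are assumptions.
type docChange struct {
	NewVal map[string]interface{} `gorethink:"new_val"`
	OldVal map[string]interface{} `gorethink:"old_val"`
}

// changedKey returns the string key from new_val, falling back to old_val
// (for deletes, new_val is null). ok is false if no string key is present.
func changedKey(c docChange, key string) (string, bool) {
	v := c.NewVal[key] // indexing a nil map is safe and yields nil
	if v == nil {
		v = c.OldVal[key]
	}
	s, ok := v.(string)
	return s, ok
}

func main() {
	del := docChange{OldVal: map[string]interface{}{"id": "abc"}}
	k, ok := changedKey(del, "id")
	fmt.Println(k, ok)
}
```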

@dancannon
Collaborator

dancannon commented Jul 27, 2016

Ah, so this is something that changed recently; that should help track it down. Do you remember which version you were on before?

Sorry, didn't see the comment with the versions at first!

@gthmac
Author

gthmac commented Jul 27, 2016

My previous version was: v1.3.2 (RethinkDB v2.2)

@dancannon
Collaborator

I am still not able to replicate the issue, but I will try again this evening. It's possible that this is caused by the keep-alive issue fixed in v2.1.2, which caused connections to get stuck.

Also, does the panic occur as soon as your application starts, or only after it has been running for a while? Thanks for being so patient while I look into this, and sorry again for the trouble this is causing.

@gthmac
Author

gthmac commented Jul 29, 2016

Hi Dan,

I personally do not think this crash is related to the connection loss issue, for the following reason: the main crashing code is based on an event bus mechanism. An event gets created, a changefeed is opened on that single event record, the event is processed (which takes anywhere from milliseconds to, rarely, seconds), and then the event record is updated, which triggers the changefeed. Hence, I don't believe I lose the connection within this time frame.

Secondly, the application does not crash at startup but after a certain amount of time (minutes to hours). It looks to me as if the result structure of the feed may be causing the problem.

I can add additional debug info or try to come up with a small code piece to reproduce the problem on your end, if that would help.

@dancannon
Collaborator

I think I may have figured out what is causing this issue. I have pushed a possible fix to the branch hotfix/cursor-panic. Would it be possible for you to test your application with this branch, since I have not been able to reproduce this myself?

@gthmac
Author

gthmac commented Jul 31, 2016

Awesome, Dan - thank you very much. I will be able to try this on Tuesday and will give you immediate feedback. Thanks again.

@dancannon dancannon modified the milestone: v2.2.0 Aug 4, 2016
@dancannon
Collaborator

Hey @gthmac, did the new release fix the issue for you? (I assume so but it never hurts to double check 😄 )

@gthmac
Author

gthmac commented Aug 11, 2016

Hi Dan, my sincere apologies for not having gotten back to you yet. I have been sick and haven't done the testing yet. I hope to do it at the end of this week.

Thanks again for your efforts, Dan.

@dancannon
Collaborator

Sorry to hear that, hope you recover soon! There's no rush to do the testing if you are not well, take your time 😄 Thanks again.

@dancannon
Collaborator

I will close this issue as it has been open for quite a while now. If you see the issue again, let me know and we can reopen it.
