GoRethink stops working after some days #195

Closed
r0l1 opened this issue May 29, 2015 · 10 comments

Comments

@r0l1
Contributor

r0l1 commented May 29, 2015

I am using the gorethink database driver for a web project. After the application has been running for a few days, I always get the following error:

gorethink: no connections were available

I searched for this error in the source:

// GetRandomNode returns a random node on the cluster
// TODO(dancannon) replace with hostpool
func (c *Cluster) GetRandomNode() (*Node, error) {
    if !c.IsConnected() {
        return nil, ErrClusterClosed
    }
    // Must copy array reference for copy on write semantics to work.
    nodeArray := c.GetNodes()
    length := len(nodeArray)
    for i := 0; i < length; i++ {
        // Must handle concurrency with other non-tending goroutines, so nodeIndex is consistent.
        index := int(math.Abs(float64(c.nextNodeIndex() % int64(length))))
        node := nodeArray[index]

        if !node.Closed() && node.IsHealthy() {
            return node, nil
        }
    }
    return nil, ErrNoConnections
}

Could it be that gorethink drops all active nodes after a long idle timeout?

@r0l1 changed the title from "gorethink: no connections were available" to "GoRethink stops working after some days" on May 29, 2015
@dancannon
Collaborator

Does the driver log any errors? If not, could you try running your code with this flag set to true? That should provide some more information about what is going wrong: https://godoc.org/github.com/dancannon/gorethink#SetVerbose.

@dancannon
Collaborator

Also, related to this issue, I think I should work on adjusting the logging levels to be more useful. Currently pretty much all logging is set to debug level, which is hidden by default.

That being said, I also think you should be able to remove all logging. I will look into this as part of fixing this issue.

@dancannon
Collaborator

OK, I have now figured out exactly what is going wrong. Since the addition of cluster support, each node has a "health" value attached to it. Currently this is pretty basic: it is just a counter from 0 to 100. If the health drops to 0, the node becomes unhealthy and the driver stops sending requests to it. There is also a goroutine for each node which attempts to refresh the node's status every 30 seconds by querying the server_status table using the node's ID.

This causes issues when the DiscoverHosts flag is not used: there is no (easy) way to get the node's ID, so the query to server_status fails and the node is never refreshed. This means that if a query fails 100 times, the node becomes unhealthy and never recovers.
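
To make the failure mode concrete, here is a minimal sketch of the mechanism described above. It is a simplified model and not the driver's actual code: the names `nodeHealth`, `RecordFailure`, `Refresh` and `errNoNodeID` are hypothetical, and it assumes that each failed query costs one point of health and that a successful refresh restores the health to 100.

```go
package main

import (
	"errors"
	"sync"
)

// maxHealth mirrors the 0-100 counter described in the comment above.
const maxHealth = 100

// errNoNodeID is a hypothetical error standing in for a server_status
// lookup that fails because the node's ID is unknown.
var errNoNodeID = errors.New("node ID unknown")

// nodeHealth is a hypothetical, simplified model of a per-node health
// counter. It is not the driver's real type.
type nodeHealth struct {
	mu     sync.Mutex
	health int
}

func newNodeHealth() *nodeHealth {
	return &nodeHealth{health: maxHealth}
}

// RecordFailure lowers the health by one point, never below zero.
// (Assumption: one point per failed query.)
func (n *nodeHealth) RecordFailure() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.health > 0 {
		n.health--
	}
}

// Refresh models the periodic status check. The health is restored only
// if check succeeds; a failing check leaves the counter untouched, which
// is why a node at zero stays at zero.
func (n *nodeHealth) Refresh(check func() error) {
	if err := check(); err != nil {
		return
	}
	n.mu.Lock()
	defer n.mu.Unlock()
	n.health = maxHealth
}

// IsHealthy reports whether the node may still receive queries.
func (n *nodeHealth) IsHealthy() bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.health > 0
}
```

In this model, 100 calls to `RecordFailure` followed by any number of `Refresh` calls whose check returns `errNoNodeID` leave `IsHealthy` returning false indefinitely, which matches the "never recovers" behaviour reported in this issue.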

@omeid
Contributor

omeid commented May 29, 2015

Could this also explain #194?

@dancannon
Collaborator

I think they are unrelated, but I will double check.

@dancannon
Collaborator

I have created a branch which has host discovery enabled by default. I hope to merge it soon.

@r0l1
Contributor Author

r0l1 commented Jun 1, 2015

@dancannon: Thank you for your efforts! I have enabled verbose logging in my application and will report back here if the problem occurs again. Hopefully that yields some more useful information. We have to be patient ^^

@r0l1
Contributor Author

r0l1 commented Jun 6, 2015

Gorethink failed again without logging any messages. The web application has now been running for 4 days, and gorethink failed after 2 days. The only error I get on queries is:

gorethink: no connections were available

Every few hours the application checks the users table for expired users who registered but did not log in for at least 3 weeks, so this can't be an idle timeout problem. There are currently no new users, and this query failed every time because the slice of user IDs passed for deletion was always empty. I fixed this by checking the length of the IDs slice and skipping the deletion when it is empty. However, gorethink's connection to the database should not break even if the deletion query fails.

r.Table(DBUserTable).GetAll(ids...).
        Delete().RunWrite(db.Session)

gorethink: Expected 2 or more arguments but found 1. in: r.Table("users").GetAll().Delete()

This was the last successful database query, even if it threw an error.
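
The length check mentioned above could look roughly like the following sketch. To keep it self-contained, the driver call is passed in as a callback: `deleteFunc` and `deleteExpiredUsers` are hypothetical names, and in the real application the callback would wrap the `r.Table(DBUserTable).GetAll(ids...).Delete().RunWrite(db.Session)` call shown above.

```go
package main

// deleteFunc is a hypothetical stand-in for the driver call that deletes
// the rows matching the given IDs.
type deleteFunc func(ids ...interface{}) error

// deleteExpiredUsers runs del only when there is at least one ID.
// It reports whether the query was actually sent, plus any error from it.
func deleteExpiredUsers(ids []interface{}, del deleteFunc) (bool, error) {
	// An empty slice would produce GetAll() with no keys, which the
	// server rejects, so skip the query entirely.
	if len(ids) == 0 {
		return false, nil
	}
	return true, del(ids...)
}
```

This only avoids sending the invalid query. It does not address the underlying problem that repeated query errors eventually make the node unavailable.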

@dancannon
Collaborator

Thanks for checking, @m4ng0squ4sh. Was your last test with the master branch? If so, I have just deployed a new version of the driver which should hopefully fix your issue.

If it does not, please let me know and I will reopen this issue.

@r0l1
Contributor Author

r0l1 commented Jun 8, 2015

Great, thank you @dancannon! Yes, I am on the master branch. I just updated to the new database naming conventions and pulled the latest gorethink source. I'll reopen this issue if it is not fixed.
