want some way to find out whether a session is still live #124

Closed
rogpeppe opened this Issue Jun 17, 2015 · 7 comments

Contributor

rogpeppe commented Jun 17, 2015

When serving multiple concurrent requests in a web server,
the easiest thing to do is to Copy the session for each request.
Unfortunately MongoDB uses a substantial amount of memory
(~1MB) for each connection, so this is not ideal.

If there were some way of determining whether an existing
session had died in a permanent way (for example because the
connection has been dropped), this would enable easy
reuse of existing connections via Clone instead of Copy.

Something like this perhaps?

// Dead reports whether any socket connection reserved
// by the session has died. This can happen when a mongo
// server goes away, for example. This method will always
// return false after s.Refresh has been called.
func (s *Session) Dead() bool
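
To illustrate how it would be used, per-request acquisition could then look something like this (a sketch only: Dead is the hypothetical method above, acquireSession is a made-up helper, and the usual gopkg.in/mgo.v2 import is assumed):

// acquireSession hands out a session for one request. It shares the
// base session's reserved socket via Clone while the session looks
// healthy, and falls back to Refresh when the proposed Dead method
// reports that the socket has gone away.
func acquireSession(base *mgo.Session) *mgo.Session {
    if base.Dead() { // hypothetical method proposed above
        base.Refresh()
    }
    return base.Clone()
}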
Contributor

niemeyer commented Jun 17, 2015

This request seems based on misunderstandings of how the driver works.

A few data points:

  • When you Close a session, the underlying socket goes back into a pool, rather than being closed
  • Sessions returned by Clone and Copy both reuse connections from a pool, always
  • Sessions never die in a permanent way, unless you Close them
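
Taken together, those points mean the usual per-request pattern is simply Copy followed by Close. A minimal sketch of that pattern (the dial address, port and database name are placeholders):

package main

import (
    "log"
    "net/http"

    "gopkg.in/mgo.v2"
)

func main() {
    root, err := mgo.Dial("localhost") // placeholder address
    if err != nil {
        log.Fatal(err)
    }
    defer root.Close()

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        s := root.Copy() // takes a socket from the driver's pool
        defer s.Close()  // returns the socket to the pool; it is not closed
        // ... use s.DB("mydb") to serve the request ...
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}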
Contributor

rogpeppe commented Jun 17, 2015

I don't think I'm misunderstanding the way things work. The way we've got things set up currently is as follows.

When an HTTP request is received, we acquire a new session for the request by invoking Copy on an existing session (in actual fact we have a pool of preallocated sessions and use Refresh at the end of the request, but I believe this is roughly equivalent), thus giving each request a consistent view of the database for the duration of that request (we're using strong consistency mode).

This works fine in principle - if a mongo server switches primary, only the requests that are currently being serviced will encounter an error (not much can be done about those, I think). The disadvantage is that N concurrent requests will use N sockets to the mongo primary, which at 1MB per connection is not great.

It would be nice to be able to share a mongo session between concurrent requests, and we could do that by sometimes using Clone instead of Copy. The problem with that is that when a primary has changed or the socket is otherwise down, the session is unusable until it's refreshed, and it's not easy to tell when we should do that. One possible way to do this might be to look at all error returns from all operations performed on the session and refresh if any error indicates that a refresh is needed. But this is a highly invasive change, because every piece of code needs to know to do this.

The suggestion above is for a way to accomplish this aim. I have no particular attachment to the name. "TemporarilyUnusable" might be more accurate, though I wouldn't suggest it seriously.

Do you have a suggestion for how we might do this with the current mgo.v2 API?

Contributor

niemeyer commented Jun 17, 2015

Before discussing your full scenario description, why do you have a pool of pre-allocated sessions?

Contributor

rogpeppe commented Jun 18, 2015

The reason is partly pragmatic. When we wrote the code originally, we didn't realise
that it was important to copy the session for each request, so the HTTP handler
code was written something like this:

type handler struct {
    db *mgo.Database
}
func NewHandler(db *mgo.Database) http.Handler {
    h := &handler{db}
    mux := http.NewServeMux()
    mux.HandleFunc("/xxxx", h.serveSomething)
    // etc - about 50 methods.
    return mux
}
func (h *handler) serveSomething(w http.ResponseWriter, r *http.Request) {
    // handle request
}

In actual fact the db field is not directly inside the handler - we
have an abstraction that sits on top of it and provides higher level
operations (and even within that there are other values that contain
the session, such as mgo.GridFS).

When we realised that our web service went irrevocably down when
the mongo instance restarted, we changed the higher level object
so that it knew how to copy itself (and the session along with it).
At the start of a request, we'd acquire a copy and operate on that.

Unfortunately that gave us failures too because we ended up using
far too many mongo sessions (in particular some code where we
used concurrency ended up using many sessions per request)
resulting in mongo dying badly.

So we changed the code so that it notionally does something
like this, ensuring exactly one session for each request
and simplifying a bunch of the code too.

type handler struct {
     db *mgo.Database
}
func NewHandler(db *mgo.Database) http.Handler {
    return &handler{db}
}
func (h *handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // We actually implement a limiter that limits the total number
    // of mongo connections and returns an error if a session cannot
    // be acquired within a reasonable length of time.
    rh := newReqHandler(h.db.With(h.db.Session.Copy()))
    defer rh.db.Session.Close()
    rh.ServeHTTP(w, r)
}
type reqHandler struct {
    db  *mgo.Database
    mux *http.ServeMux
}
func newReqHandler(db *mgo.Database) *reqHandler {
    h := &reqHandler{db: db, mux: http.NewServeMux()}
    h.mux.HandleFunc("/xxxx", h.serveSomething)
    // etc - about 50 methods.
    return h
}
// ServeHTTP dispatches to the per-request mux built above.
func (h *reqHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    h.mux.ServeHTTP(w, r)
}
func (h *reqHandler) serveSomething(w http.ResponseWriter, r *http.Request) {
    // handle request
}

except that to save on garbage (allocating a req handler takes about 15K)
instead of actually calling newReqHandler
on each request, we maintain a pool of preallocated reqHandlers, and call
Session.Refresh before putting a reqHandler back in the pool.

That's why we have a pool of preallocated sessions.
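
Concretely, the pool amounts to something like this (a rough sketch: maxMongoSessions, the timeout and the helper names are made up, reqHandler and newReqHandler are from the snippet above, and the errors/time/mgo imports are assumed):

const maxMongoSessions = 100 // made-up limit

var handlers = make(chan *reqHandler, maxMongoSessions)

// fillHandlerPool preallocates one reqHandler (and one session copy)
// per slot; called once at startup with the root database handle.
func fillHandlerPool(rootDB *mgo.Database) {
    for i := 0; i < maxMongoSessions; i++ {
        handlers <- newReqHandler(rootDB.With(rootDB.Session.Copy()))
    }
}

// getHandler acquires a preallocated reqHandler, returning an error
// rather than blocking forever when all sessions are in use.
func getHandler() (*reqHandler, error) {
    select {
    case h := <-handlers:
        return h, nil
    case <-time.After(5 * time.Second): // arbitrary timeout
        return nil, errors.New("no mongo session available")
    }
}

// putHandler drops any reserved (possibly dead) sockets via Refresh
// and returns the handler to the pool for the next request.
func putHandler(h *reqHandler) {
    h.db.Session.Refresh()
    handlers <- h
}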

But this still isn't ideal - we'd like to be able to serve more concurrent
requests than we have available mongo connections, which is where
this issue comes from. If we were able to tell that a given session
had gone down, then instead of doing a Copy on each request, we
could do a Clone, sharing a single connection between several requests,
while still guarding against mongo restarts, master changes, etc.

Sorry about the length. I hope this manages to explain something of
why we want to do this.

Contributor

niemeyer commented Sep 30, 2015

There's still no reason to cache sessions, even given that scenario. You can avoid creating excessive sockets to the database just by controlling the amount of concurrency in your application, which is what you seem to be doing given the above scenario. The fact you have a cache of sessions is not really adding anything on top of that.

I'm closing this request, as the use case is well supported by simply following the most common and expected practices.
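
For example, a bounded semaphore around the per-request Copy keeps the socket count in check without any session cache (a sketch; the limit of 100 and the names are arbitrary):

// sem bounds how many requests may hold a database session at once.
var sem = make(chan struct{}, 100) // arbitrary limit

func handle(w http.ResponseWriter, r *http.Request, root *mgo.Session) {
    sem <- struct{}{}        // acquire a slot
    defer func() { <-sem }() // release it when the request is done
    s := root.Copy()
    defer s.Close() // the socket goes back to the driver's pool
    // ... serve the request with s ...
}

Session.SetPoolLimit is a related knob: it caps the number of sockets the driver will use per server, making operations wait for a free socket instead.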

niemeyer closed this Sep 30, 2015

Contributor

rogpeppe commented Nov 20, 2015

If we're always using one Copied session for each concurrent operation, we have a much tighter limit on the number of concurrent operations than if we could use Clone too. That seems a pity. When serving hundreds of concurrent connections, we'll probably end up using 100s of MB of resources on the mongo server that we don't really need, so our service won't scale as well.

Contributor

glasser commented Aug 19, 2016

I have a question related to this. (I use Strong consistency.)

When I run a query or command against a Session, it might succeed, or it might return an error. If it returns an error, it might be just that the particular command failed (a deserialized error written by the server), or it might be an IO error reading from the connection.

In the latter case, the socket is dead and all subsequent operations on the session will also fail until the session is refreshed or a new session is created.

If every operation in my program was "fail fast" (a single error causing the whole current "task" to immediately fail), then this would be fine — an IO error would cause the task to fail, and before we start the next task we either create a new Session or refresh the current session.

But my program has operations that aren't fail fast. E.g., they want to operate on a bunch of different pieces of data in series, and a database error on one of them should be tracked but not cause it to skip working on the other data.

So in these operations, where an error does not result in immediate termination of the current session, I'd like to be able to know whether a given error is a permanent socket error or a one-time error meaning the particular command was bad. That way I'd be able to fail fast on a "socket bad, need to start over" error and merely move on for a "the data I thought would be in the DB wasn't there" error.

Or maybe the answer here is that I should be making a new session for every "subtask"?
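
For concreteness, the kind of classification I have in mind would look something like this (a sketch only: isServerError is a made-up helper, the usual gopkg.in/mgo.v2 import is assumed, and I don't know whether the driver guarantees this split is exhaustive):

// isServerError reports whether err looks like an error the server
// returned for this particular operation, as opposed to a connection
// problem that leaves the session unusable until it is refreshed.
func isServerError(err error) bool {
    switch err.(type) {
    case *mgo.QueryError, *mgo.LastError:
        return true
    }
    return err == mgo.ErrNotFound || mgo.IsDup(err)
}

A non-fail-fast task could then record errors where isServerError(err) is true and keep going, but refresh the session (or abandon the task) on anything else.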
