Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #3913 Fix races in session request reference counting #3947

Merged
merged 7 commits into from
Aug 15, 2019

Conversation

janbartel
Copy link
Contributor

@janbartel janbartel commented Aug 8, 2019

#3913
Fixed races in the reference counting used by sessioncache to work out when it can evict a session. As part of fixing races, also discovered a couple of other problem cases to do with complex server-side forwarding chains to other contexts, and also problems with cleaning up sessions on async dispatches. The code is now a lot simpler and clearer in the lifecycle and reference counts for sessions.

Signed-off-by: Jan Bartel <janb@webtide.com>
Copy link
Member

@olamy olamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor comment

*
* @param id session id
* @return the Session object matching the id
*/
public abstract Session doGet(String id);
protected abstract Session doGet(String id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing visibility in minor release? (maybe in 10?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, bit naughty I know. These methods should have been protected from the start. I figure that as they are only called by the session cache itself, then there should be no incompatibilities with this change.

@@ -125,7 +126,7 @@
* @param session the session object
* @return null if the session wasn't already in the map, or the existing entry otherwise
*/
public abstract Session doPutIfAbsent(String id, Session session);
protected abstract Session doPutIfAbsent(String id, Session session);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing visibility in minor release? (maybe in 10?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment. I guess if I absolutely had to, I could leave the viz changes until 10 ...

@@ -135,7 +136,7 @@
* @param newValue the new value
* @return true if replacement was done
*/
public abstract boolean doReplace(String id, Session oldValue, Session newValue);
protected abstract boolean doReplace(String id, Session oldValue, Session newValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing visibility in minor release? (maybe in 10?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per previous comment.

@sbordet
Copy link
Contributor

sbordet commented Aug 8, 2019

@janbartel in order to review this PR I need a clear description of what the problem is.
The issue mentions a nasty race but does not tell exactly what/where.

Also, I need a clear description of what the solution is to solve that race.

Copy link
Contributor

@sbordet sbordet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You got to fix your formatting configuration.

Can't really review this PR without @janbartel driving me through what the problem was and how is the solution solving the problem.

The fact that many tests had to be changed to introduce a wait for the "complete" event tells me that a normal application could undergo the same race (but it won't wait for the "complete" event), so how are we dealing with this situation?

Session session = Session.class.cast(s);
if (LOG.isDebugEnabled())
LOG.debug("Request {} leaving session {}", this, session);
session.getSessionHandler().complete(session);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as asymmetric: why the "enter" event does not need to tell the SessionHandler but adds the session to the list, but the "leave" event needs to tell to the SessionHandler and not remove the session from the list?
I mean I can see the list is cleared out in recycle(), but often asymmetries of this kind are indication of a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SessionHandler tells the Request that it has entered the Session. The Request knows when it is completing, and tells the SessionHandler that it is leaving the Session. The Request could iterate over the list and remove each Session as it calls its SessionHandler with it. I just didn't bother because the next thing that happens is that the Request gets recycled.

LOG.warn(e);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird to me. Javadocs for release() say that the method should be called once a request is finished with a session.
It does not strike me that the request is finished with this session before existing this method. What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc on line 1103 says that the request reference count will be incremented by the call to sessionCache.renewSessionId; the finally clause makes sure that this increment is decremented again.

You have to consider the case where the session id exists on multiple different contexts, and thus must all be renewed when a request invokes changeId method. There may be no active requests in any of the other contexts, thus sessioncache for each context needs to load in the session if its not already present - this has to increment a reference count. If we did not then decrement the count when we've finished changing the session id, the session would sit forever in the sessioncaches of these other contexts with ref count =1. By always incrementing and decrementing for change of session id we maintain the correct behaviour of the caches.

@sbordet
Copy link
Contributor

sbordet commented Aug 8, 2019

I'm also slightly worried about locking.
The code references many components, each of which has its own lock.
Are we always sure that the lock ordering is the same (i.e. A grabs a_lock then calls B which grabs b_lock) and we never end up in a situation where the order is inverted (B grabs b_lock and then calls A which grabs a_lock)? Because otherwise it's a possible deadlock.

Signed-off-by: Jan Bartel <janb@webtide.com>
@janbartel
Copy link
Contributor Author

janbartel commented Aug 12, 2019

@sbordet re deadlocking: I believe that the locking is sound but we can discuss on hangout.

re using CountDownLatch in tests to ensure that the request has finished processing is just for testing purposes. It is true that responses can flow back to the client who may make another request before the previous request has fully exited, but there are modes on the SessionCache to deal with that (specifically SAVE_ON_CREATE to ensure that a session is immediately saved to the store, rather than lazily saved when the last request drains out). I'm not testing those other modes in the tests where I use CountDownLatches - there are other tests for that.

Signed-off-by: Jan Bartel <janb@webtide.com>
Signed-off-by: Jan Bartel <janb@webtide.com>
@janbartel
Copy link
Contributor Author

Added comment on issue #3913 describing race condition that was fixed.

During fixing of this issue I also discovered and fixed these 2 problems:

  1. If a request creates a session in eg contextA, forwards to a chain of other contexts, then forwards back to contextA, we will not be able to find the newly created session because we only retain the previous session & SessionHandler. Note that forwarding to only 1 other context before forwarding back to contextA the problem does not occur, because the previous context is contextA.

  2. Async completion was not being handled correctly: we were registering an AsyncListener if asyncStarted == true when the SessionHandler exited. However the AsyncListener will only clean up a session if it was created in the same context for which async was started: if the request visits other contexts and creates sessions in them before async completion, none of those sessions will be cleaned up correctly.

For a full list of session use cases considered and verified for the fix, see the attached file.

httpsession-fix.txt

Signed-off-by: Jan Bartel <janb@webtide.com>
Signed-off-by: Jan Bartel <janb@webtide.com>
Signed-off-by: Jan Bartel <janb@webtide.com>
@janbartel janbartel merged commit af6c675 into jetty-9.4.x Aug 15, 2019
Jetty 9.4.21 automation moved this from In Review to Done Aug 15, 2019
@janbartel janbartel deleted the jetty-9.4.x-3913-session-reference-count-race branch August 15, 2019 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Jetty 9.4.21
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants