Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decrease the frequency of session metadata cleanup and where in the request cycle it occurs #19165

Merged
merged 1 commit into from Jan 15, 2018

Conversation

Projects
None yet
6 participants
@mbabker
Copy link
Member

commented Dec 24, 2017

Pull Request for Issue #19146 .

Summary of Changes

Presently, we write metadata about the session to the database regardless of the handler in use, and we purge expired metadata when a request happens on an even numbered second. This results in a high number of DELETE FROM #__session... queries for no good reason.

This PR tries to make this cleanup operation less frequent AND defers it to the end of the request cycle versus the beginning of it. To do so, an event listener is dynamically registered during the onAfterSessionStart event that checks if the request is occurring on a second that is a divisor of five and the session handler in use is NOT the database handler. If these conditions are met, then the cleanup query will be run.

Practical changes:

  • Expired session cleanup when the database handler is in use will be deferred to when PHP natively runs the session cleanup operation, this may result in some sessions living in the database longer than they do now but this will not cause a side effect of allowing an expired session to be resumed
  • The metadata cleanup drops in frequency from once every 2 seconds to once every 5 seconds at a minimum (of course this is also dependent on your site's traffic, for higher traffic sites this can be a big deal)
  • The metadata cleanup is deferred from the beginning of the request cycle to the end of it

Testing Instructions

Sessions should continue to function as they presently do. Note that the frequency of when expired sessions are purged from the database will be reliant on the PHP configuration now versus an arbitrary once every 2 second code check when the database handler is in use (this basically means the expired session won't be immediately removed, at worst this can affect the list of logged in users shown in various modules). When a non-database session handler is in use (i.e. the "PHP" option using the filesystem), the session metadata is cleared from the database once every five seconds at a minimum (since this is now in a lambda function happening after the response is sent, you'll probably need to add a JLog::add() statement to actually verify the lambda is triggered as a poor man's var_dump($foo);die; won't really do much here).

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 25, 2017

I think we will need to find a way to pass session lifetime from configuration to session handle classes. For example, right now, gc method from database handle has $lifetime parameter set to 1440 seconds https://github.com/joomla/joomla-cms/blob/staging/libraries/joomla/session/storage/database.php#L139 , so it completely ignore session lifetime configuration from global configuration of the site

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 25, 2017

BTW, could you please explain why the change "Expired session cleanup will be deferred to when PHP natively runs the session cleanup operation" only apply database handler, not to other handlers?

I think leave it to PHP GC to handle clean up will be more reliable compareto how we are doing it at the moment (we don't know what would be the best number 2, 3, 4, 5... to start doing the expired session clean up). It also have few other benefits:

  1. Reduce number of DELETE query which server has to run (especial on high traffic websites)
  2. Server runs on PHP 7.1.0+ can setup cron job to run session_gc function http://php.net/manual/en/function.session-gc.php at a pre-defined time, so more reliable and much better than rely on PHP or Joomla to run the clean up
@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 25, 2017

BTW, could you please explain why the change "Expired session cleanup will be deferred to when PHP natively runs the session cleanup operation" only apply database handler, not to other handlers?

Because right now Joomla is only arbitrarily doing session cleanup on the database handler with this hardcoded query because Joomla is arbitrarily storing extra metadata to the same database table it stores "real" session data to and is inherently running garbage collection to ensure expired metadata is purged with the side effect of doing GC when the database handler is in use. GC has to run for all of the other handlers for all session data to be correctly cleared, we aren't trying to delete the filesystem data manually when the filesystem handler is in use.

If someone wants to write a cron to do something similar, they are more than welcome to, BUT, we need to start making a major distinction between the OPTIONAL session metadata that we are arbitrarily logging right now (basically all columns of the #__session table minus the data column) and the REQUIRED data when the database is in use as the session handler's backend storage (the session_id, time, and data columns). This PR in effect tunes the cleanup operation to ONLY be a cleanup operation for the metadata aspect of that storage and would actually make the session GC logic consistent across all handlers.

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 25, 2017

BTW with the CLI console stuff in 4.0 we actually could add two console commands to core with ease.

  • session:gc would be an interface to trigger PHP's native session_gc() (we'd just need to add it to our Session package, but that's pretty simple)
  • session:metadata:gc would be a way to trigger this DELETE query for the optional metadata (this one would need a config param to disable it for web requests otherwise the point of having it is defeated)
@feltkamptv

This comment has been minimized.

Copy link

commented Dec 25, 2017

I get the following notification:

Notice: Undefined variable: time in /var/www/html/public_html/libraries/src/Application/CMSApplication.php on line 878

So I added the line:
$time = time();

@mbabker mbabker force-pushed the mbabker:session-meta-cleanup branch from 36a0c1d to f383bd4 Dec 25, 2017

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 25, 2017

With the patch there should only be a doc block line on line 878. Nonetheless the undefined variable issue is fixed.

@feltkamptv

This comment has been minimized.

Copy link

commented Dec 26, 2017

Yes I know, its because I am using Joomla version 3.8.3, so I added and uncommented the changed lines instead of replacing the whole file.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 26, 2017

@mbabker Thanks for the explanation, I checked JSessionStorageNone and understand the reason now.

There is still one issue with this PR which I commented here #19165 (comment) , could you please take a look?

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 26, 2017

I think we will need to find a way to pass session lifetime from configuration to session handle classes. For example, right now, gc method from database handle has $lifetime parameter set to 1440 seconds

The lifetime is passed in when PHP natively performs GC, we are not ourselves actively calling this method. The default argument value is based on the default value of the session.gc_maxlifetime PHP config. But, that default value shouldn't actually be there as it looks like the SessionHandlerInterface (granted this is PHP 5.4+) defines that method with the argument not having a default value.

So what this probably means is that the value being passed in is coming from session.gc_maxlifetime and that we might need to change this value at runtime.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 26, 2017

I understand that it is PHP calling that method when perform GC, I just thought that we have a way to pass that parameter and PHP will use it. Anyway, it is still an issue need to be addressed

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 26, 2017

To fix that we have to set session.gc_maxlifetime at runtime. Either way it's beyond scope of this patch IMO.

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 26, 2017

Also note that such a configuration change can have side effects if the Joomla installation is living on a space shared with other platforms storing session data in the same location (i.e. the filesystem handler using /tmp with both a Joomla and Symfony application sharing that path). So it probably shouldn't be something we change so easily.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 26, 2017

Either way it's beyond scope of this patch IMO

I am afraid of it is not. Before this PR, session lifetime setting works well for database handler. If this PR is accepted, then that setting doesn't have effect for database handler, so I guess we will get complain from users.

I know it is a difficult issue (and not sure if we can address it)

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 26, 2017

Then there are two feasible options:

  • Drop all non-database session backend storage
  • Reject this PR

We give preferential treatment to the database handler because we mix the data stored in the #__session table (it is both our OPTIONAL metadata and REQUIRED session data). If the OPTIONAL metadata aspect of that were stored in a separate table, then the REQUIRED session data would not be touched by our metadata management operations. The fact is, we have a screwed up system design. We do PHP's job of garbage collection as a side effect of the metadata storage, ONLY for the database handler. Show me the code where we are doing garbage collection for Memcached or the filesystem.

As noted at http://php.net/manual/en/session.configuration.php#ini.session.gc-maxlifetime we SHOULD NOT arbitrarily change session.gc_maxlifetime because if you have multiple platforms sharing the same storage space for sessions, changing that runtime setting on a process that actually triggers garbage collection will result in the cleanup operation touching EVERYTHING in that storage space (for filesystem storage, everything in the /tmp directory that's a session file as an example).

I'm done fighting for session management improvements in Joomla. It seems like everyone wants the status quo and nobody realizes the side effects of it.

@mbabker

This comment has been minimized.

Copy link
Member Author

commented Dec 26, 2017

To really get things right here, what we really need is:

  • This PR
  • #13322
  • Creating a new #__session_metadata table with the OPTIONAL metadata fields mentioned previously and restructuring the existing #__session table to only store the REQUIRED session data fields mentioned previously

Users should not care when garbage collection is happening on the PHP session data. Users are more interested in the garbage collection on the optional metadata because THAT is the data used to show statistics in various places in the UI (all of those touched on by #13322 in making this metadata tracking optional). Our code is correctly performing garbage collection on that metadata, it has a side effect of replacing PHP's garbage collection but ONLY when the database session handler is in use. This PR removes that side effect, and is ONLY visible because of the dual purpose of the #__session table.

This PR fixes a legitimate performance issue. Introduced by ourselves. Yes, it causes a visible side effect by removing another side effect of the existing code logic. You can either live with the major performance issue or you can live with potentially stale data being displayed in some places until we finally get this metadata behavior right.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2017

I have tested this item successfully on f383bd4

Login, logout works OK. Session in my third party extension still works as expected


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/19165.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2017

@mbabker This morning, I tried to add some code to log the query executed in gc method of JSessionStorage to see what value PHP passed to $lifetime parameter in the method and you know what, it uses correct value from Session lifetime configuration of Joomla

I was so surprised and think that PHP is so smart but actually it isn't smart like that. It is actual our codebase pass the value by using ini_set https://github.com/joomla/joomla-cms/blob/staging/libraries/src/Session/Session.php#L942

So in short, this PR is good at it is and works as expected.

@joomdonation

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2017

@feltkamptv We require two testers for each PR, as you successfully apply this PR for your large site, please mark the test result as success so that it can be merged

@franz-wohlkoenig

This comment has been minimized.

Copy link
Member

commented Dec 27, 2017

@joomdonation i altered the successfully Test by @eltkamptv at Issue Tracker.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/19165.

@franz-wohlkoenig

This comment has been minimized.

Copy link
Member

commented Dec 27, 2017

Ready to Commit after two successful tests.

@joomla-cms-bot joomla-cms-bot added the RTC label Dec 27, 2017

$time = time();
// If the database session handler is not in use and the current time is a divisor of 5, purge session metadata after the response is sent
if ($handler !== 'database' && $time % 5 === 0)

This comment has been minimized.

Copy link
@joomdonation

joomdonation Dec 31, 2017

Contributor

Thinking more about this, I think we should make behavior consistent for session handlers instead of making something exception for database only.

We can use the code below (taken from PHP source code https://github.com/php/php-src/blob/master/ext/session/session.c#L377-L380)

$gcProbability = ini_get('session.gc_probability');
$gcDivisor     = ini_get('session.gc_divisor');
$randNumber    = (int) (lcg_value() * $gcDivisor);

if ($gcProbability > 0 && $randNumber < $gcProbability)
{
	// Run code to clean up session meta data here
}

Doing that have several benefits:

  1. Consistent behavior in all storages
  2. If someone uses a storage different than database, he can still disable this code (set session.gc_probability in php.ini to 0) and use cron job for doing the task
  3. Later, when we introduce a separate database table to store session meta data, we don't have to delete data in two separate database tables in gc method of database handle

If it is too much change, maybe we can try for Joomla 4?

This comment has been minimized.

Copy link
@mbabker

mbabker Dec 31, 2017

Author Member

It would decrease the frequency to a point that isn't suitable for purpose by default. Garbage collection isn't meant to be a frequent operation in general but this is data affecting what is displayed on the website and potential site operations (the checked out lock check in JTable for example).

The metadata cleanup must be a operation that runs fairly more frequent than PHP's native session GC and must not be reliant on the PHP configuration because it has nothing to do with PHP's session module other than getting an expiry time and identifier.

This comment has been minimized.

Copy link
@joomdonation

joomdonation Dec 31, 2017

Contributor

Agree !

The problem is that with this PR, we rely on PHP session gc to clean up meta data for database handler, so if someone uses database storage, they would get the issue with data related to session meta data displayed on the site and potential site operations as you pointed out (with default php configuration, the chance for clean up data is only 1/1000 - 0.1%)

Not sure if we can introduce a additional config option to allow users to choose, maybe range from 0 to 100 and default to a number like 5 (so 5% of chance for clean up by default)?

This comment has been minimized.

Copy link
@mbabker

mbabker Dec 31, 2017

Author Member

We should not introduce probability based garbage collection for the metadata.

The problem is that with this PR, we rely on PHP session gc to clean up meta data for database handler

The only fix for this is as I said before, splitting the metadata to a separate table. As long as the two sets of data stay in one table, there is going to be a tradeoff somewhere. IMO, performance should not be that tradeoff and this PR is geared toward improving a major performance issue in the most efficient way practical given the B/C constraints of 3.x.

@rdeutz rdeutz merged commit b9abdd1 into joomla:staging Jan 15, 2018

5 checks passed

JTracker/HumanTestResults Human Test Results: 2 Successful 0 Failed.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/drone/pr the build was successful
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
hound No violations found. Woof!

@joomla-cms-bot joomla-cms-bot removed the RTC label Jan 15, 2018

@rdeutz rdeutz added this to the Joomla 3.8.4 milestone Jan 15, 2018

@mbabker mbabker deleted the mbabker:session-meta-cleanup branch Jan 15, 2018

mbabker added a commit that referenced this pull request Feb 14, 2018

Revert "Decrease the frequency of session metadata cleanup and where …
…in the request cycle it occurs (#19165)"

This reverts commit b9abdd1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.