Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Cavalcade lookups on front-end requests #702

Open
roborourke opened this issue Nov 22, 2022 · 12 comments · May be fixed by #713
Open

Avoid Cavalcade lookups on front-end requests #702

roborourke opened this issue Nov 22, 2022 · 12 comments · May be fixed by #713

Comments

@roborourke
Copy link
Contributor

Per this issue on the Cavalcade repo when db latency is a factor the common pattern of checking for and scheduling cron jobs is often done on init. WP core does this as well many 3rd party plugins etc...

In theory we could avoid this altogether using the preflight filter to limit these checks to the admin only, provided they are added during the init hook:

add_filter( 'pre_get_scheduled_event', function ( $pre ) {
    if ( ! doing_action( 'init' ) || is_admin() ) {
       return $pre;
    }
    return true;
}, 1 );

Events will still be scheduled but on admin requests only.

This would cause issues if plugins are adding events on the frontend using the init hook based on user interactions and $_GET or $_POST parameters, although that would be far more likely dealt with not on the init hook.

This approach seems safe to me, especially if we also check that $_POST is empty. Would be great to get some additional input and feedback though.

@roborourke
Copy link
Contributor Author

Suggestion from @kovshenin

It is a good compromise, though I'd be careful returning true for all events. Instead maybe have an explicit list of events that you expect to be scheduled at all times, and let the checks for those only run within wp-admin.

@kovshenin
Copy link
Contributor

This is also a good opportunity to improve our scheduled tasks docs, specifically suggest ways how to add single and recurring tasks, how and when to check wp_next_scheduled and why it's better done during altis.migrate instead of init.

By default in altis/cloud we can short-circuit the core https check event, the privacy events, etc. that we know are most likely to already be scheduled. We can also try and move them to altis.migrate if that makes more sense.

roborourke added a commit that referenced this issue Dec 5, 2022
@roborourke roborourke linked a pull request Dec 5, 2022 that will close this issue
@kadamwhite
Copy link
Contributor

Implementing this filter from application code as described here doesn't seem to have any effect on whether Cavalcade queries are executed on the frontend, for me. In fact, returning either true or false seems instead to guarantee that there will also be a call to Job->save() which triggers additional queries.

It seems like possibly Cavalcade runs so early, that I am not able to set filters from an mu-plugin which actually run during the cavalcade job lookup process?

@kadamwhite
Copy link
Contributor

kadamwhite commented Dec 9, 2022

My assumption was incorrect, but returning true causes an error because cron.php checks $returned_event->timestampif it is not null orfalse`, so the only way I've reliably disabled an event looks like this:

/**
 * Skip cavalcade scheduling on high-traffic frontend pages.
 *
 * @param null|false|object $pre  Value to return instead. Default null to continue retrieving the event.
 * @param string            $hook Action hook of the event.
 * @param array             $args Array containing each separate argument to pass to the hook's callback function.
 * @return null|false|object Returning a non-null value will short-circuit the normal process.
 */
function mytheme_skip_cavalcade_scheduled_event_on_high_traffic_pages( $pre, $hook, $args ) {
	if ( is_admin() ) {
		// Do not alter processing of events within the admin.
		return $pre;
	}

	// Most of the events we want to skip run before any globals are set, so
	// it is not straightforward to skip these only on is_single() or is_home().
	// Skip relevant jobs on all frontend pages, for now.

	$is_skippable_hook = in_array(
		$hook,
		[
			'wp_https_detection',
			'extendable-aggregator-sync-cron',
			'wp_site_health_scheduled_check',
			'wp_privacy_delete_old_export_files',
		],
		true
	);

	if ( $is_skippable_hook ) {
		$fake_scheduled_event = new stdClass();
		// Perpetually one hour in the future to defer any queries.
		$fake_scheduled_event->timestamp = time() + HOUR_IN_SECONDS;
		return $fake_scheduled_event;
	}

	return $pre;
}

add_filter( 'pre_get_scheduled_event', 'mytheme_skip_cavalcade_scheduled_event_on_high_traffic_pages', 10, 3 );

init cannot be used as the sole restriction because the health check initiator runs before init.

@roborourke
Copy link
Contributor Author

Can we just return false?

I this is the only way would you mind updating the PR #713 ?

@kadamwhite
Copy link
Contributor

@roborourke if we return false, Cavalcade treats the job as unscheduled, and does us the "favor" of querying to set it. That pumps additional jobs into the table and causes an INSERT on every page view.
image

@kovshenin
Copy link
Contributor

If $pre is not null then something else has already modified it, likely at an earlier priority and I think we should respect that and return early, I was thinking if ( is_admin() || ! is_null( $pre ) ) {

Can we just return false?

Returning false will also short-circuit the filter because there is an explicit check for null:

https://github.com/WordPress/wordpress-develop/blob/9690aa7f7b0b7acd5acad54278952622585dec16/src/wp-includes/cron.php#L746-L749

@kadamwhite
Copy link
Contributor

@kovshenin See above—it short-circuits the filter, but then the plugin treats that as an opportunity to "fix" the situation by setting the missing job. That results in more queries, which is not the goal.

@roborourke
Copy link
Contributor Author

Shortcircuiting is the point though? Cavalcade should be respecting any previous modifications to the value on that filter.

@kovshenin
Copy link
Contributor

Sorry I should have been a bit more clear. Short-circuit with a false at the end tells Cavalcade that the none of the jobs exist, which is what we don't want, as demonstrated by @kadamwhite's screenshot. We only want to short-circuit for the jobs that we know, i.e. $is_skippable_hook, and allow Cavalcade to proceed with the checks to fetch any remaining jobs from the DB, and add them to the schedule if absent.

@kadamwhite
Copy link
Contributor

kadamwhite commented Dec 9, 2022

@kovshenin The change I made which resulted in additional queries was to return false where I am otherwise returning the fake scheduled notice, above.

	if ( $is_skippable_hook ) {
		return false;
	}

I wasn't short-circuiting at the end, only for the jobs that we know. However, returning false causes WP to treat the job as not scheduled — which, as I've attempted to illustrate, means that the health check then schedules the event, unnecessarily.

		if ( ! wp_next_scheduled( 'wp_site_health_scheduled_check' ) && ! wp_installing() ) {
			wp_schedule_event( time() + DAY_IN_SECONDS, 'weekly', 'wp_site_health_scheduled_check' );
		}

Returning false doesn't skip processing of that hook, it instead notifies that the event must be scheduled. That results in more queries overall. The only way to skip both looking up, and scheduling an event, is to return an object with a non-falsy timestamp property so that each of the functions which check for a scheduled event will decide no further action is needed.

@kovshenin
Copy link
Contributor

kovshenin commented Dec 9, 2022

Ah, apologies, I thought @roborourke's comment was referring to that last return $pre;. Returning an object with a timestamp sounds correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants