POC: Use pcntl_fork to run save command from different thread #18
base: master
Conversation
Tested with one of the projects locally, with 378 pages. With remote requests:
With process forking:
With process forking like this, it is faster by ~60%.
Also, none of the custom exports work, like sitemaps and the wp-json API JSON file. Will look into this deeper.
Added some thoughts
}
$contents = array_map( __NAMESPACE__ . '\\replace_urls', $contents );

array_map( function( $content, $url ) use ( $args_assoc ) {
I'm not sure why this was added? This CLI command is only meant to show the HTML output, not save it too.
die( 'could not fork' );
} else if ( $pid ) {
	// This is the main process. Make it wait until all child processes have finished.
	\pcntl_wait( $status ); // Protect against zombie children.
Is there a way to design this loop such that we could spawn multiple children at once? It might be that we can do 4 on the CPU ok, so doing them in parallel could give a good perf speedup.
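One way to design that loop is to fork up to N children and call `pcntl_wait()` only once the pool is full, so a new child starts as soon as one finishes. A minimal sketch of the same pattern, assuming a hypothetical `render()` job, written in Python with `os.fork`/`os.waitpid` (the POSIX calls the `pcntl` functions wrap):

```python
import os

def render(url):
    # Hypothetical stand-in for capturing one page's HTML output.
    return "<html>" + url + "</html>"

def process_in_parallel(urls, max_children=4):
    active = 0
    reaped = 0
    for url in urls:
        if active >= max_children:
            os.waitpid(-1, 0)  # pool is full: reap one finished child first
            active -= 1
            reaped += 1
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit without returning to the loop.
            render(url)
            os._exit(0)
        active += 1
    # Parent: reap the remaining children so none are left as zombies.
    while active > 0:
        os.waitpid(-1, 0)
        active -= 1
        reaped += 1
    return reaped
```

The PHP version would be the same shape: track `$active`, call `\pcntl_wait( $status )` when `$active` hits the cap, and again in a drain loop at the end.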
// Fire 3rd party actions.
$this->do_actions( $url );

save_contents_for_url( $content, $url, $config );
If we also managed to run this in parallel, it would mean, per the above, we'd also get some parallel uploading to Netstorage, which is nice. It might be technically better to decouple capturing the output and saving to Netstorage so each can run at maximum pace, but for now this should be ok.
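The decoupling idea could look like two independent worker pools chained together, so uploads start as soon as the first capture finishes rather than after the whole capture pass. A sketch, assuming hypothetical `capture`/`save` stand-ins, in Python's `multiprocessing` (the pool behaviour here relies on the POSIX fork start method, same mechanism as `pcntl_fork`):

```python
import multiprocessing as mp

def capture(url):
    # Hypothetical stand-in: render one page to HTML.
    return url, "<html>" + url + "</html>"

def save(item):
    # Hypothetical stand-in: upload one captured page to storage.
    url, _content = item
    return url

def pipeline(urls, workers=4):
    # Two independent pools: save workers consume captured pages as
    # they stream out of the capture pool, so the stages overlap.
    with mp.Pool(workers) as cap_pool, mp.Pool(workers) as save_pool:
        return sorted(save_pool.imap_unordered(save, cap_pool.imap_unordered(capture, urls)))
```

In PHP there is no built-in pool, so the equivalent would be two groups of forked children connected by a work queue, which is more plumbing; the sequential fork-per-URL approach in this PR is a reasonable first step.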
die( 'could not fork' );
} else if ( $pid ) {
	// This is the main process. Make it wait until all child processes have finished.
	\pcntl_wait( $status ); // Protect against zombie children.
So it seems you are currently only using forking for the CLI commands, as opposed to the cron tasks too. Is that intentional, or just for the POC?
This PR is a refresh of the original PR #1 from Joe.
Like the issue mentioned in Joe's previous PR, I tried to fork the main process to avoid multiple PHP include calls, and it works pretty well. Currently this only applies to the WP-CLI command, to avoid PHP's max execution timeout; in a production environment this will mostly be run from Cavalcade, which runs via WP-CLI.
The other challenge I have in mind is how to handle the fallback when the main process gets killed.
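One common answer to the killed-parent question: record the parent PID at fork time and have each child periodically compare it against its current parent PID, bailing out if they differ (an orphaned child is reparented, usually to PID 1). A sketch of the check, assuming the child calls it between units of work; Python's `os.getppid` corresponds to PHP's `posix_getppid()`:

```python
import os

def parent_alive(original_ppid):
    # If the parent dies, the child is reparented (usually to PID 1),
    # so its current parent PID no longer matches the one recorded
    # right after the fork.
    return os.getppid() == original_ppid
```

A child would capture `original_ppid = os.getppid()` immediately after forking, then call `parent_alive(original_ppid)` in its work loop and exit cleanly when it returns false.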
@joehoyle what is your opinion on this PR?