Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple requests per process #14953

Open
wants to merge 7 commits into
base: master
from

Conversation

@tomasz-grobelny
Copy link
Contributor

tomasz-grobelny commented Apr 3, 2019

Changes required to allow third-party Nextcloud application to use multiple requests per process approach for document previews and gallery previews.

Key points about this pull request:

  1. Idea behind this change is to keep the current request processing pipeline intact (one process per request model) and at the same time allow for multiple requests per process model to be implemented by external applications. This model allows to skip significant number of per request processing (loading php code, loading Nextcloud applications) therefore reducing time needed to process single request.
  2. It is a followup to https://github.com/tomasz-grobelny/server/tree/reactphp_poc, stripped down to only the parts that are needed in the core of Nextcloud. All other code from that PoC can be implemented in a separate application. See also previous discussion here: #14131
  3. These changes should keep backward compatibility - if you think this is not the case then please speak up. It also does not introduce additional dependencies on Nextcloud environment.
  4. These changes by themselves do not improve performance, but allow external applications to implement performance improvement.
  5. These changes might not be complete - in particular all middlewares should be adapted to use HttpContext and not request injected through constructor. However, this is just enough to have a useful application that makes use of proposed changes.

Please review the changes and let me know if you are ok with them or you have some questions about any part.

@tomasz-grobelny tomasz-grobelny added this to the Nextcloud 17 milestone Apr 3, 2019
@tomasz-grobelny tomasz-grobelny force-pushed the tomasz-grobelny:multiple_requests_per_process branch 2 times, most recently from 2f43b09 to 705ad70 Apr 3, 2019
@tomasz-grobelny tomasz-grobelny requested review from blizzz and schiessle Apr 4, 2019
@tomasz-grobelny

This comment has been minimized.

Copy link
Contributor Author

tomasz-grobelny commented Apr 4, 2019

Any idea why there is no status for tests for quite some time now?

@nickvergessen

This comment has been minimized.

Copy link
Member

nickvergessen commented Apr 4, 2019

We updated Drone and are currently checking the new config:
#14958

@nickvergessen

This comment has been minimized.

Copy link
Member

nickvergessen commented Apr 5, 2019

You should be able to rebase on master now so tests run again.

@tomasz-grobelny tomasz-grobelny force-pushed the tomasz-grobelny:multiple_requests_per_process branch from 705ad70 to 6ab9fbd Apr 6, 2019
…ltiple requests per process approach for document previews and gallery previews

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.net>
Signed-off-by: Tomasz Grobelny <tomasz@grobelny.net>
Signed-off-by: Tomasz Grobelny <tomasz@grobelny.net>
Signed-off-by: Tomasz Grobelny <tomasz@grobelny.net>
@tomasz-grobelny tomasz-grobelny force-pushed the tomasz-grobelny:multiple_requests_per_process branch from f5abd68 to 0532e40 Apr 10, 2019
@tomasz-grobelny

This comment has been minimized.

Copy link
Contributor Author

tomasz-grobelny commented Apr 10, 2019

Tests should be more or less fixed - any potential failures are not related to my changes to my knowledge. Therefore PR is ready to be reviewed.

@@ -54,7 +54,7 @@ public function __construct(array $urlParams = array()) {
);
});
$container->registerService('ProvisioningApiMiddleware', function(SimpleContainer $c) use ($server) {
$user = $server->getUserManager()->get($c['UserId']);
$user = $server->getUserManager()->get($server['UserId']);

This comment has been minimized.

Copy link
@nickvergessen

nickvergessen Apr 11, 2019

Member

Is this required? So all middlewares should ge the user from server, instead of the container? what's the difference?

This comment has been minimized.

Copy link
@tomasz-grobelny

tomasz-grobelny Apr 20, 2019

Author Contributor

Yes and no. This is required in the sense that otherwise tests failed. This resulted from moving registration of UserId from DIContainer to Server class (which is also a container). This in turn was needed to be able to replace one of session objects in external application. On the other hand I could try writing compatibility code ($this->registerService('UserId', ...)) that would make this specific change not required. I was thinking that explicitly querying specific container seems to be a hack anyway so I didn't pay too much attention to it.

Please let me know what is your preference (keep as is or add compatibility in DIContainer).

This comment has been minimized.

Copy link
@tomasz-grobelny

tomasz-grobelny May 6, 2019

Author Contributor

@nickvergessen, any thoughts on this one? Should it be changed or can it stay as is?

@@ -213,6 +213,14 @@ class Filesystem {
/** @var bool */
private static $logWarningWhenAddingStorageWrapper = true;
public static function reset() {

This comment has been minimized.

Copy link
@nickvergessen

nickvergessen Apr 11, 2019

Member

this scares me a bit. I think a lot of apps keep their own "cache" and need to be made aware of a potential context/user switch

This comment has been minimized.

Copy link
@MorrisJobke

MorrisJobke Apr 11, 2019

Member

Also a lot of the code that we currently use is actually building on that fact. There are properties in objects that cache given things for the user.

This comment has been minimized.

Copy link
@tomasz-grobelny

tomasz-grobelny May 6, 2019

Author Contributor

@nickvergessen, @MorrisJobke: as per explanation below - do you think this is ok?

@MorrisJobke

This comment has been minimized.

Copy link
Member

MorrisJobke commented Apr 11, 2019

In general I like the overall idea, but then we should go for only that way of running Nextcloud. Either it should be an application server or a request based approach. Mixing those two in one code base just screams for horrible bugs that are super hard to find. Also maintaining both ways means a lot of effort on our side. Don't get me wrong, but I doubt that you will maintain all of this in parallel to our general pace of implementing and fixing stuff.

I get that you want to improve the performance heavily by basically changing the stack Nextcloud is running on (mod_php/php-fpm vs a services that continuously runs via ReactPHP). We have a lot of instances out there that run with the first of them and they all then need to be migrated over somehow. Also this is something I highly doubt to be worth the effort. We should look into bootstrapping less on every request - this is how PHP was designed - bootstrap the whole thing per request. Thus that is the part where we need to improve. We simply bootstrap too much on every request and thus have those not perfect timings. Still I think that moving over to a complete new stack is the wrong approach here, because we need to evolve slowly and can't do too radical changes here, as there are a few hundreds of thousand instances out there that rely on us and how our stack looks like now.

@tomasz-grobelny

This comment has been minimized.

Copy link
Contributor Author

tomasz-grobelny commented Apr 20, 2019

@MorrisJobke thanks a lot for your review. There are a few parts that I need more clarity on and I would like to provide more clarity from my side:

  1. You mention that PHP was designed for one request per process model. This is of course true and probably comes from the fact that when PHP was designed this was the dominant model (CGI and stuff). At the same time all major contemporary web application frameworks (ASP.NET MVC, Node.js, Java) use application server model. Also worth noting is that ReactPHP supports PHP 5.3 which is a framework released almost 10 years ago and ReactPHP is being developed since 2012. Therefore I think that application server model is proven and mature enough even in PHP to start considering it for use in Nextcloud.
  2. As for reducing bootstrap time: I simply do not believe it would happen ever. First of all, I do not currently see any significant effort at reducing this time. Second, given my profiling attempts, I see no quick wins. This means that a really significant amount of code would need to be reworked to reduce bootstrap time millisecond after millisecond after millisecond. Even if somebody would be willing to do it - it would cause huge amount of changes, each one introducing potential bugs and incompatibilities. IMO the effort is not worth it given the benefit we get from application server approach.
  3. Compatibility: I totally understand huge amount of deployed instances and the need to keep them all running on the same platform. That's why this code does nothing to break that platform compatibility. I believe the changes proposed are evolutionary (as you say: "we need to evolve slowly") yet at the same time give application developers new possibilities. Also note that these changes are relatively small so the overhead of keeping both models running is also relatively small.
  4. Also please note that it is not my attempt to have the application server model running for every single request - most of the time Nextcloud works really (or reasonably) well. I noticed there are a few key request endpoints that need performance improvements: serving image previews and WebDAV are the two that are most important for me. If other developers want to join in improving performance - great. If not - we still get benefit for at least one endpoint that people already complained about.
  5. As for potential bugs: every change brings the possibility of introducing bugs. In this approach I believe the most probable one is when someone uses some request based variable in middleware's constructor (therefore caching it). However, we do not write that many new middlewares. Secondly, if all existing middlewares are converted I thing the temptation to write the code incorrectly would be minimal. Thirdly, if we encounter any unexpected issues we could write tests to guard us against it happening again.

Do you think those changes could go into Nextcloud codebase given fixes to parts mentioned by @nickvergessen are provided?

nickvergessen and others added 2 commits Apr 20, 2019
Co-Authored-By: tomasz-grobelny <tomasz@grobelny.net>
Co-Authored-By: tomasz-grobelny <tomasz@grobelny.net>
@MorrisJobke

This comment has been minimized.

Copy link
Member

MorrisJobke commented May 2, 2019

@tomasz-grobelny Thanks for this detailed explanation. It makes in this way a lot more sense. 👍

@MorrisJobke

This comment has been minimized.

Copy link
Member

MorrisJobke commented May 6, 2019

@tomasz-grobelny I guess the best way to get changes in is to split them a bit more up. Some general restructurings in here are totally fine, but where the comments are on are still not decided. So the best approach is to leave this here as it is now and take the changes without comments and create a new PR for them. Those will most likely be accepted. Then we can focus on the remaining items in here. Does that make sense for you as well?

@tomasz-grobelny

This comment has been minimized.

Copy link
Contributor Author

tomasz-grobelny commented May 23, 2019

@MorrisJobke, @nickvergessen - actually it seems we have two points that have open discussions:

  1. Changes in apps/provisioning_api/lib/AppInfo/Application.php - here it would be great to have input from @nickvergessen which way I should go: provide compatibility (if possible) or keep as in this PR?
  2. Changes in lib/private/Files/Filesystem.php - I can of course leave out the reset() method, but I am not sure where that leads us - either apps will be able to get the same functionality by some convoluted means (eg. reflection) therefore complicating code or they will not be able to achieve this so we miss the whole point of those changes.
@tomasz-grobelny

This comment has been minimized.

Copy link
Contributor Author

tomasz-grobelny commented May 29, 2019

@MorrisJobke, @nickvergessen, any thoughts? Maybe instead of leaving out the reset() method it would make sense to provide some means of notification that the request context changed?

And BTW, here you can have a look how application using those changes might look like: https://cloud.grobelny.net/s/aDFDzeqL2WadmRa

This one speeds up file serving for my case by a factor of 5+ once it warms up. It is not finished by any means, but a strong PoC.

@nachoparker

This comment has been minimized.

Copy link
Member

nachoparker commented Jun 29, 2019

This would be no small change, so would need to be planned out very carefully, but in my opinion will have to happen sooner or later to achieve decent performance.

+1

@MorrisJobke MorrisJobke referenced this pull request Jul 15, 2019
12 of 28 tasks complete
@rullzer rullzer removed this from the Nextcloud 17 milestone Aug 8, 2019
@nachoparker nachoparker referenced this pull request Sep 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Performance
  
Awaiting triage
5 participants
You can’t perform that action at this time.