New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a daemon mode to memoize parse state, fork off workers to analyze single files quickly #563
Add a daemon mode to memoize parse state, fork off workers to analyze single files quickly #563
Conversation
Misc parts of this that I could split out into separate parts, if requested:
|
Hi @TysonAndre. Is there any reason you removed the branch? Are you planning on splitting it out into smaller patches? |
@morria , I was running a script that deleted all of the branches that were merged to my fork's |
@TysonAndre Did you not intent to close these two as well? |
string $key, | ||
bool $clear_all_memoize | ||
) : Type { | ||
// TODO: Figure out why putting this into a static variable results in test failures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confusing helper method was a workaround for issues I encountered investigating #571
- Fixing 571 first would be a better solution.
/** | ||
* | ||
*/ | ||
public function asFQSENString() : string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant, also in NativeType. Unlikely to be a bottleneck, since __toString() in scalars is also defined the same way, and won't call asFQSENString
use Phan\Output\IssuePrinterInterface; | ||
use Symfony\Component\Console\Output\OutputInterface; | ||
|
||
final class PHPLikePrinter implements IssuePrinterInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't useful to me at this point - If a user wants something to stand in for php -l
, they probably also want to preserve the exact same relative/absolute paths, like phan_client is doing.
645c1fe
to
2468157
Compare
2468157
to
a18b07f
Compare
This is waiting for the next round of review comments. No new changes, fixed merge conflicts with No further changes (unless bug fixes need to be made) are planned within this PR.
There's nothing else that makes sense split out of this PR, that hasn't been done already
|
@morria - are there any more requested changes? |
Merged upstream/master to confirm that tests were still passing. There were no merge conflicts. If there are no more requested changes on/after april 14th, this will be rebased, tested again, then merged. |
phan_client
Outdated
} | ||
// Exit if any of the requested files are syntactically invalid. | ||
if ($failure_code !== 0) { | ||
self::debugError("Files were syntactically invalid\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe pass system error thru?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://secure.php.net/manual/en/function.system.php#refsect1-function.system-description
System already did that. All of the output is passed through to stderr (not applicable here) and stdout.
The return value of system()
is the last line of it.
This logging statement also covers php interpreter missing from $PATH, file being somehow unreadable, etc.
Added notes on this extension to the
Also, added |
This depends on the pcntl extension. Because there isn't a database, it is less likely that code changes will break this. However, this will keep using RAM in the background (Around the same amount of ram to analyse a project in single process mode) Server implementation `./phan --daemonize-tcp-port 4846 --quick` Client implementation (compatible with php 7.0 and 7.1) ```bash ./phan_client --daemonize-tcp-port 4846 -l src/Phan/Phan.php ``` (Similar to `php -l src/Phan/Phan.php`, and can be used as a substitute for syntax check in some scripts) This will parse every single file, but will only analyze a small subset of those files. When the daemon receives a request 1. Check for modified files (it would be slightly faster if we did this for all files except the requested file(s) ahead of time, e.g. with inotify PECL) This requires stat()ing all files 2. Remove classes, methods, functions, and constants from the files that were deleted or modified, and in the config.. 3. Add classes, methods, functions, and constants from the files that were added or modified, and in the config. 4. Fork, and run the remaining method checks and analysis parse steps. The forked process doesn't affect the state of the daemon. (The memory(state) of the fork is a copy of the original, but doesn't affect the original) Commit Details: - exit(1) if the command line parameters of phan_client are invalid. - Listen on a TCP port or unix socket for requests as JSON blobs(unix in progress) ("method" can probable be translated to POSTs to the path /analyze_files, etc, if HTTP is supported in the future) - nit: Show 0% progress bar as well, make animation smoother - Depends on pcntl extension and assumes unix-like OS. (folder, path names) - The functionality to undo parse steps into CodeBase. An alternative would be to make CodeBase and the undoable CodeBase be classes implementing the same interface, but that would require changing all code using CodeBase, and is prone to braking. - Added micro-optimizations that affect only daemon mode (This script is invoked from an editor on file save/change, so it's desirable to be as fast as possible on large as possible. Outside of daemon mode, the pre-analyze steps (analyzeFunctions) don't take up a large fraction of the time.) - misc optimizations - misc TODOs, noting bugs that would happen in code that is part of parse mode. - Add debugging logs for things which would affect daemon mode (e.g. accidental duplicate classes). Disable this by default. Once this is stable, debugging logs can be removed - change analyzing file list to take a closure to generate the file list. This allows daemon mode to process any files that were added, deleted, or removed since the last time files were passed - make `phan_client` Print the requested relative/absolute path, not the relative path within the project (and only print the paths if they are requested) (so that editors open error view with the correct file name) Optimization steps and reasons: 1. Without daemon mode, phan analysis of phan in debug mode took 11 seconds (would have been faster if php wasn't using --enable-debug mode) 2. After keeping the parse state and memory, and forking off new processes to execute the remaining steps, steps took 2 seconds. 3. Cut out analyzing files which aren't in the set of requested files. (Some steps after parse are applied to the whole codebase, not just the files to analyze). (takes 0.1 seconds on phan) At the current point in time, all issues I saw are with getContext()->getFile(). Some plugins may emit issues in a different file from the one being checked, but I'm not aware of any. Add partial documentation on using the plugin Full documentation is pending (Step by step on setting up a VM) Building php on the host should be much faster. Work for later PRs: - make the example plugin able to automatically start the daemon. - automatically stop after 30 minutes without new requests or so - rate limit - Plugins in other languages will go in different PRs - etc. Properly clean up defunct child processes. Clean up formatting, undo change to bootstrap.php (made error traces less readable)
see issue 642
4560edb
to
5b93033
Compare
Rebased, and commented out an unused variable (The usage was a commented out TODO) in Language/Element/Clazz.php |
src/Phan/Analysis.php
Outdated
{ | ||
$pluginSet = ConfigPluginSet::instance(); | ||
$hasPlugins = $pluginSet->hasPlugins(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable names should use underscores rather than camel case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed for variables introduced in this PR
src/Phan/CLI.php
Outdated
$this->file_list = array_merge( | ||
$this->file_list, | ||
file(Config::projectPath($file_name), FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an unrelated fix? Perhaps it should be in its own commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a merge conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored the version from etsy/phan master
@@ -247,6 +244,27 @@ public function __construct() | |||
// that we can get the project root directory to | |||
// base other config flags values on | |||
break; | |||
case 's': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like 's' is defined in the short-options list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/Phan/CodeBase/UndoTracker.php
Outdated
use Phan\Daemon; | ||
|
||
/** | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having not read this code yet, I'm unable to guess what this class does based on its name. A comment explaining what it intends to do would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Done.
src/Phan/Daemon.php
Outdated
} | ||
// Unless debugging Phan itself, these two configurations are unnecessarily adding slowness. | ||
if (PHP_DEBUG) { | ||
echo "Warning: This daemon is slower when php is compiled with --enable-debug\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps these warnings should be to STDERR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done for the warnings about slow settings.
And emit phan daemon warnings about slow php settings to stderr
Addressed review comments. If there are no more requested changes on/after April 21, this will be rebased, tested again, then merged. |
Followup of phan#563 If a file is re-parsed in daemon mode, then clear the previous parse errors in that file. Otherwise, they'd keep showing up after they were fixed. Add options for emacs flycheck plugin to work with phan (Allows substituting a file in /tmp/ for a file in the project) (linked in the original PR for daemon mode). An AST class node's name is null for an anonymous class. - --temporary-file-map (JSON mapping of multiple files) and --flycheck-file (For use with a single file (for only one file provided)) (Inconvenient to JSON encode in flycheck) Add --disable-plugins to the server, which speeds up quick mode even more if lower latency for daemon mode requests is desired in a large project. Make phan_client compatible with php 5.6 to reduce setup needed to test phan_client in daemon mode.
Related to #22 and TysonAndre#14
Mostly stable, but there's still some bugs when switching between git branches.
Intermediate work on the squashed commits can be seen on https://github.com/TysonAndre/phan/commits/daemonize-and-analyze-individual-files
This depends on the pcntl extension.
Because there isn't a database, it is less likely that code changes will
break this. However, this will keep using RAM in the background (Around the
same amount of ram to analyse a project in single process mode)
Server implementation
./phan --daemonize-tcp-port 4846 --quick
Client implementation
(Similar to
php -l src/Phan/Phan.php
, and can be used as a substitutefor syntax check in some scripts)
This will parse every single file, but will only analyze a small subset
of those files.
When the daemon receives a request
for all files except the requested file(s) ahead of time, e.g. with
inotify PECL)
This requires stat()ing all files
were deleted or modified, and in the config..
were added or modified, and in the config.
The forked process doesn't affect the state of the daemon.
(The memory(state) of the fork is a copy of the original,
but doesn't affect the original)
Commit Details:
("method" can probable be translated to POSTs to the path /analyze_files, etc,
if HTTP is supported in the future)
names)
CodeBase and the undoable CodeBase be classes implementing the same
interface, but that would require changing all code using CodeBase,
and is prone to braking.
(This script is invoked from an editor on file save/change,
so it's desirable to be as fast as possible on large as possible.
Outside of daemon mode, the pre-analyze steps (analyzeFunctions)
don't take up a large fraction of the time.)
parse mode.
accidental duplicate classes).
Disable this by default. Once this is stable, debugging logs can be
removed
This allows daemon mode to process any files that were added, deleted,
or removed since the last time files were passed
phan_client
Print the requested relative/absolute path,not the relative path within the project
(and only print the paths if they are requested)
(so that editors open error view with the correct file name)
Optimization steps and reasons:
Without daemon mode, phan analysis of phan in debug mode took 11 seconds
(would have been faster if php wasn't using --enable-debug mode)
After keeping the parse state and memory, and forking off new
processes to execute the remaining steps, steps took 2 seconds.
Cut out analyzing files which aren't in the set of requested files.
(Some steps after parse are applied to the whole codebase, not just
the files to analyze).
(takes 0.1 seconds on phan)
At the current point in time, all issues I saw are with
getContext()->getFile(). Some plugins may emit issues in a different
file from the one being checked, but I'm not aware of any.
Add partial documentation on using the plugin
Full documentation is pending (Step by step on setting up a VM)
Building php on the host should be much faster.
Work for later PRs:
make the example plugin able to automatically start the daemon.
rate limit
Plugins in other languages
plugin was wasn't revived
Investigate and fix exceptions for classes which are suddenly missing from
use
statements when switching between checkoutsetc.
Examples
Simple Vim and emacs plugins exist to analyze a single file on file save. (aside: Vim output became slightly more verbose)
phan_client
requests)