Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a daemon mode to memoize parse state, fork off workers to analyze single files quickly #563

Merged

Conversation

TysonAndre
Copy link
Member

@TysonAndre TysonAndre commented Feb 26, 2017

Related to #22 and TysonAndre#14
Mostly stable, but there's still some bugs when switching between git branches.

Intermediate work on the squashed commits can be seen on https://github.com/TysonAndre/phan/commits/daemonize-and-analyze-individual-files

This depends on the pcntl extension.

Because there isn't a database, it is less likely that code changes will
break this. However, this will keep using RAM in the background (Around the
same amount of ram to analyse a project in single process mode)

Server implementation

./phan --daemonize-tcp-port 4846 --quick

Client implementation

./phan_client --daemonize-tcp-port 4846 -l src/Phan/Phan.php

(Similar to php -l src/Phan/Phan.php, and can be used as a substitute
for syntax check in some scripts)

This will parse every single file, but will only analyze a small subset
of those files.

When the daemon receives a request

  1. Check for modified files (it would be slightly faster if we did this
    for all files except the requested file(s) ahead of time, e.g. with
    inotify PECL)
    This requires stat()ing all files
  2. Remove classes, methods, functions, and constants from the files that
    were deleted or modified, and in the config..
  3. Add classes, methods, functions, and constants from the files that
    were added or modified, and in the config.
  4. Fork, and run the remaining method checks and analysis parse steps.
    The forked process doesn't affect the state of the daemon.
    (The memory(state) of the fork is a copy of the original,
    but doesn't affect the original)

Commit Details:

  • exit(1) if the command line parameters of phan_client are invalid.
  • Listen on a TCP port or unix socket for requests as JSON blobs(unix in progress)
    ("method" can probable be translated to POSTs to the path /analyze_files, etc,
    if HTTP is supported in the future)
  • nit: Show 0% progress bar as well, make animation smoother
  • Depends on pcntl extension and assumes unix-like OS. (folder, path
    names)
  • The functionality to undo parse steps into CodeBase. An alternative would be to make
    CodeBase and the undoable CodeBase be classes implementing the same
    interface, but that would require changing all code using CodeBase,
    and is prone to braking.
  • Added micro-optimizations that affect only daemon mode
    (This script is invoked from an editor on file save/change,
    so it's desirable to be as fast as possible on large as possible.
    Outside of daemon mode, the pre-analyze steps (analyzeFunctions)
    don't take up a large fraction of the time.)
  • misc optimizations
  • misc TODOs, noting bugs that would happen in code that is part of
    parse mode.
  • Add debugging logs for things which would affect daemon mode (e.g.
    accidental duplicate classes).
    Disable this by default. Once this is stable, debugging logs can be
    removed
  • change analyzing file list to take a closure to generate the file list.
    This allows daemon mode to process any files that were added, deleted,
    or removed since the last time files were passed
  • make phan_client Print the requested relative/absolute path,
    not the relative path within the project
    (and only print the paths if they are requested)
    (so that editors open error view with the correct file name)

Optimization steps and reasons:

  1. Without daemon mode, phan analysis of phan in debug mode took 11 seconds
    (would have been faster if php wasn't using --enable-debug mode)

  2. After keeping the parse state and memory, and forking off new
    processes to execute the remaining steps, steps took 2 seconds.

  3. Cut out analyzing files which aren't in the set of requested files.
    (Some steps after parse are applied to the whole codebase, not just
    the files to analyze).
    (takes 0.1 seconds on phan)

    At the current point in time, all issues I saw are with
    getContext()->getFile(). Some plugins may emit issues in a different
    file from the one being checked, but I'm not aware of any.

Add partial documentation on using the plugin

Full documentation is pending (Step by step on setting up a VM)
Building php on the host should be much faster.

Work for later PRs:

  • make the example plugin able to automatically start the daemon.

    • automatically stop after 30 minutes without new requests or so
  • rate limit

  • Plugins in other languages

    • I wasn't familiar with how to install/develop a syntastic plugin, so that
      plugin was wasn't revived
  • Investigate and fix exceptions for classes which are suddenly missing from use statements when switching between checkouts

  • etc.

Examples

Simple Vim and emacs plugins exist to analyze a single file on file save. (aside: Vim output became slightly more verbose)

screenshot from 2017-02-23 22-29-21

flycheck_phan_example

  1. Invoking the server (While --quick isn't mandatory, that option significantly reduces the duration of phan_client requests)
~/phan $ ./phan --daemonize-tcp-port 4849 --quick
Listening for Phan analysis requests at tcp://127.0.0.1:4849
  1. Invoking the client, after server indicates it is listening for requests
~/phan $ echo '{"method":"analyze_files","files":["src/Phan/Phan.php"]}' | nc 127.0.0.1 4849
{"status":"ok","issue_count":0,"issues":[]}
~/phan $ ./phan_client --daemonize-tcp-port 4849 -l src/Phan.php
No syntax errors detected in src/Phan/Phan.php
~/phan $ vim src/Phan/Phan.php # add  buggy, unused function to test out the daemon
~/phan $ echo '{"method":"analyze_files","files":["src/Phan/Phan.php"]}' | nc 127.0.0.1 4849
{"status":"ok","issue_count":2,"issues":[{"type":"issue","check_name":"PhanUndeclaredMethod","description":"UndefError PhanUndeclaredMethod Call to undeclared method \\Phan\\Config::thismethodisundefined","severity":5,"location":{"path":"src\/Phan\/Phan.php","lines":{"begin":417,"end":417}}},{"type":"issue","check_name":"PhanTypeMismatchReturn","description":"TypeError PhanTypeMismatchReturn Returning type null but examplefn() is declared to return string","severity":5,"location":{"path":"src\/Phan\/Phan.php","lines":{"begin":418,"end":418}}}]

~/phan $ time ./phan_client --daemonize-tcp-port 4849 -l src/Phan/Phan.php
No syntax errors detected in src/Phan/Phan.php
Phan error: UndefError: PhanUndeclaredMethod: Call to undeclared method \Phan\Config::thismethodisundefined in src/Phan/Phan.php on line 417
Phan error: TypeError: PhanTypeMismatchReturn: Returning type null but examplefn() is declared to return string in src/Phan/Phan.php on line 418
./phan_client --daemonize-tcp-port 4849 -l src/Phan/Phan.php  0.03s user 0.01s system 40% cpu 0.089 total

@TysonAndre
Copy link
Member Author

Misc parts of this that I could split out into separate parts, if requested:

  1. Limiting analysis phase to a subset of files (Just Analysis::analyzeFile)
    • Vim plugin for that, instead of daemon mode
  2. Optimizations of phase between parse and analysis
  3. phplike output formatter (Or remove that, since it's not much use if the working directory isn't the project root)
  4. Non-zero exit codes in Phan\CLI (should exit(1) if Phan quits due to an invalid option).
  5. Smoother progress bar CLI output

@TysonAndre TysonAndre closed this Mar 6, 2017
@TysonAndre TysonAndre deleted the daemonize-and-client-rebased-upstream branch March 6, 2017 05:15
@morria
Copy link
Member

morria commented Mar 6, 2017

Hi @TysonAndre. Is there any reason you removed the branch? Are you planning on splitting it out into smaller patches?

@TysonAndre
Copy link
Member Author

@morria , I was running a script that deleted all of the branches that were merged to my fork's master branch, and didn't notice this.

@TysonAndre TysonAndre reopened this Mar 6, 2017
@morria
Copy link
Member

morria commented Mar 6, 2017

@TysonAndre Did you not intent to close these two as well?

#573
#572

string $key,
bool $clear_all_memoize
) : Type {
// TODO: Figure out why putting this into a static variable results in test failures.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confusing helper method was a workaround for issues I encountered investigating #571

  • Fixing 571 first would be a better solution.

/**
*
*/
public function asFQSENString() : string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant, also in NativeType. Unlikely to be a bottleneck, since __toString() in scalars is also defined the same way, and won't call asFQSENString

use Phan\Output\IssuePrinterInterface;
use Symfony\Component\Console\Output\OutputInterface;

final class PHPLikePrinter implements IssuePrinterInterface
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't useful to me at this point - If a user wants something to stand in for php -l, they probably also want to preserve the exact same relative/absolute paths, like phan_client is doing.

@TysonAndre TysonAndre force-pushed the daemonize-and-client-rebased-upstream branch 2 times, most recently from 645c1fe to 2468157 Compare March 14, 2017 05:45
@TysonAndre TysonAndre force-pushed the daemonize-and-client-rebased-upstream branch from 2468157 to a18b07f Compare March 15, 2017 00:28
@TysonAndre
Copy link
Member Author

TysonAndre commented Mar 15, 2017

This is waiting for the next round of review comments.
This change should have limited effects on regular (non-daemon) phan runs.

No new changes, fixed merge conflicts with --extended-help.
I finished cleaning up some unnecessary parts 8 days ago.

No further changes (unless bug fixes need to be made) are planned within this PR.

  • Small note: When phan supports class_alias, an additional undo step will have to be added

There's nothing else that makes sense split out of this PR, that hasn't been done already

  • phan_client could have a mode to run the full phan checks without daemonizing, but that's only useful for very small projects

@TysonAndre
Copy link
Member Author

@morria - are there any more requested changes?

@TysonAndre
Copy link
Member Author

TysonAndre commented Apr 8, 2017

Merged upstream/master to confirm that tests were still passing. There were no merge conflicts.
Edit: merged yet again due to PRs being merged. Resolved minor conflicts( in code comment), and rewrote condition in an equivalent manner

If there are no more requested changes on/after april 14th, this will be rebased, tested again, then merged.

phan_client Outdated
}
// Exit if any of the requested files are syntactically invalid.
if ($failure_code !== 0) {
self::debugError("Files were syntactically invalid\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe pass system error thru?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://secure.php.net/manual/en/function.system.php#refsect1-function.system-description

System already did that. All of the output is passed through to stderr (not applicable here) and stdout.
The return value of system() is the last line of it.

This logging statement also covers php interpreter missing from $PATH, file being somehow unreadable, etc.

@TysonAndre
Copy link
Member Author

TysonAndre commented Apr 14, 2017

Added notes on this extension to the NEWS file.
Additionally, mentioned that this would work on Linux/Unix, but not Windows (pcntl_fork() doesn't exist on windows, and there's no quick alternative).

  • The client would work, but not the server.
    One way for Windows users to test this out is to create a tutorial for running the server with docker (or a VM with an exposed port 4846), and installing PHP with a few extensions/PECLs to imitate their environment

Also, added phan_client to the list of files that would be added to vendor/bin/ when phan is installed.

This depends on the pcntl extension.

Because there isn't a database, it is less likely that code changes will
break this. However, this will keep using RAM in the background (Around the
same amount of ram to analyse a project in single process mode)

Server implementation

`./phan --daemonize-tcp-port 4846 --quick`

Client implementation (compatible with php 7.0 and 7.1)

```bash
./phan_client --daemonize-tcp-port 4846 -l src/Phan/Phan.php
```

(Similar to `php -l src/Phan/Phan.php`, and can be used as a substitute
for syntax check in some scripts)

This will parse every single file, but will only analyze a small subset
of those files.

When the daemon receives a request

1. Check for modified files (it would be slightly faster if we did this
   for all files except the requested file(s) ahead of time, e.g. with
   inotify PECL)
   This requires stat()ing all files
2. Remove classes, methods, functions, and constants from the files that
   were deleted or modified, and in the config..
3. Add classes, methods, functions, and constants from the files that
   were added or modified, and in the config.
4. Fork, and run the remaining method checks and analysis parse steps.
   The forked process doesn't affect the state of the daemon.
   (The memory(state) of the fork is a copy of the original,
   but doesn't affect the original)

Commit Details:

- exit(1) if the command line parameters of phan_client are invalid.
- Listen on a TCP port or unix socket for requests as JSON blobs(unix in progress)
  ("method" can probable be translated to POSTs to the path /analyze_files, etc,
  if HTTP is supported in the future)
- nit: Show 0% progress bar as well, make animation smoother
- Depends on pcntl extension and assumes unix-like OS. (folder, path
  names)
- The functionality to undo parse steps into CodeBase. An alternative would be to make
  CodeBase and the undoable CodeBase be classes implementing the same
  interface, but that would require changing all code using CodeBase,
  and is prone to braking.
- Added micro-optimizations that affect only daemon mode
  (This script is invoked from an editor on file save/change,
  so it's desirable to be as fast as possible on large as possible.
  Outside of daemon mode, the pre-analyze steps (analyzeFunctions)
  don't take up a large fraction of the time.)
- misc optimizations
- misc TODOs, noting bugs that would happen in code that is part of
  parse mode.
- Add debugging logs for things which would affect daemon mode (e.g.
  accidental duplicate classes).
  Disable this by default. Once this is stable, debugging logs can be
  removed
- change analyzing file list to take a closure to generate the file list.
  This allows daemon mode to process any files that were added, deleted,
  or removed since the last time files were passed
- make `phan_client` Print the requested relative/absolute path,
  not the relative path within the project
  (and only print the paths if they are requested)
  (so that editors open error view with the correct file name)

Optimization steps and reasons:

1. Without daemon mode, phan analysis of phan in debug mode took 11 seconds
  (would have been faster if php wasn't using --enable-debug mode)
2. After keeping the parse state and memory, and forking off new
   processes to execute the remaining steps, steps took 2 seconds.
3. Cut out analyzing files which aren't in the set of requested files.
   (Some steps after parse are applied to the whole codebase, not just
   the files to analyze).
   (takes 0.1 seconds on phan)

   At the current point in time, all issues I saw are with
   getContext()->getFile(). Some plugins may emit issues in a different
   file from the one being checked, but I'm not aware of any.

Add partial documentation on using the plugin

Full documentation is pending (Step by step on setting up a VM)
Building php on the host should be much faster.

Work for later PRs:

- make the example plugin able to automatically start the daemon.
  - automatically stop after 30 minutes without new requests or so
- rate limit
- Plugins in other languages will go in different PRs
- etc.

Properly clean up defunct child processes.

Clean up formatting, undo change to bootstrap.php
(made error traces less readable)
@TysonAndre TysonAndre force-pushed the daemonize-and-client-rebased-upstream branch from 4560edb to 5b93033 Compare April 14, 2017 05:14
@TysonAndre
Copy link
Member Author

Rebased, and commented out an unused variable (The usage was a commented out TODO) in Language/Element/Clazz.php

{
$pluginSet = ConfigPluginSet::instance();
$hasPlugins = $pluginSet->hasPlugins();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable names should use underscores rather than camel case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed for variables introduced in this PR

src/Phan/CLI.php Outdated
$this->file_list = array_merge(
$this->file_list,
file(Config::projectPath($file_name), FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an unrelated fix? Perhaps it should be in its own commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a merge conflict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored the version from etsy/phan master

@@ -247,6 +244,27 @@ public function __construct()
// that we can get the project root directory to
// base other config flags values on
break;
case 's':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like 's' is defined in the short-options list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

use Phan\Daemon;

/**
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having not read this code yet, I'm unable to guess what this class does based on its name. A comment explaining what it intends to do would be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Done.

}
// Unless debugging Phan itself, these two configurations are unnecessarily adding slowness.
if (PHP_DEBUG) {
echo "Warning: This daemon is slower when php is compiled with --enable-debug\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps these warnings should be to STDERR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for the warnings about slow settings.

@TysonAndre
Copy link
Member Author

Addressed review comments.

If there are no more requested changes on/after April 21, this will be rebased, tested again, then merged.

@TysonAndre TysonAndre merged commit a265be0 into phan:master Apr 21, 2017
TysonAndre added a commit to TysonAndre/phan that referenced this pull request Apr 29, 2017
Followup of phan#563

If a file is re-parsed in daemon mode,
then clear the previous parse errors in that file.
Otherwise, they'd keep showing up after they were fixed.

Add options for emacs flycheck plugin to work with phan
(Allows substituting a file in /tmp/ for a file in the project)
(linked in the original PR for daemon mode).
An AST class node's name is null for an anonymous class.
- --temporary-file-map (JSON mapping of multiple files)
  and --flycheck-file (For use with a single file (for only one file
  provided))
  (Inconvenient to JSON encode in flycheck)

Add --disable-plugins to the server, which speeds up quick mode even more if
lower latency for daemon mode requests is desired in a large project.

Make phan_client compatible with php 5.6 to reduce setup needed to test
phan_client in daemon mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants