Parsing is a computationally expensive task, to which the PHP language is not very well suited. Nonetheless, there are a few things you can do to improve the performance of this library, which are described below.
Running PHP with XDebug adds a lot of overhead, especially for code that performs many method calls. Just by loading XDebug (without enabling profiling or other more intrusive XDebug features), you can expect that code using PHP-Parser will be approximately five times slower.
As such, you should make sure that XDebug is not loaded when using this library. Note that setting the xdebug.default_enable=0 ini option does not disable XDebug. The only way to disable XDebug is to not load the extension in the first place.
If you are building a command-line utility for use by developers (who often have XDebug enabled), you may want to consider automatically restarting PHP with XDebug unloaded. The composer/xdebug-handler package can be used to do this.
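As a rough sketch of how a command-line tool might act on this, the snippet below checks whether the extension is loaded and, if the composer/xdebug-handler package happens to be installed, asks it to restart the process without XDebug. The env prefix 'myapp' and the helper name xdebugIsLoaded() are hypothetical, chosen only for illustration.

```php
<?php
declare(strict_types=1);

use Composer\XdebugHandler\XdebugHandler;

/**
 * True when the XDebug extension is loaded, regardless of its ini settings.
 * (Hypothetical helper, not part of PHP-Parser.)
 */
function xdebugIsLoaded(): bool
{
    return extension_loaded('xdebug');
}

if (xdebugIsLoaded()) {
    if (class_exists(XdebugHandler::class)) {
        // Restart the current process with XDebug unloaded.
        // 'myapp' is a hypothetical prefix for the handler's environment variables.
        $handler = new XdebugHandler('myapp');
        $handler->check();
        unset($handler);
    } else {
        // No handler available: at least tell the developer why things are slow.
        error_log('XDebug is loaded; parsing will be considerably slower.');
    }
}
```

The class_exists() guard keeps the script usable when the handler package is not installed; in that case it only logs a warning.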
If you do run with XDebug, you may need to increase the xdebug.max_nesting_level option to a higher level, such as 3000. While the parser itself is recursion-free, most other code working on the AST uses recursion and will generate an error if the value of this option is too low.
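If editing php.ini is not convenient, the limit can presumably also be raised at runtime. The following is a minimal sketch, assuming the option may be changed via ini_set() in your XDebug version; the value 3000 is simply the one suggested above.

```php
<?php
declare(strict_types=1);

// ini_set() returns false for options that do not exist, so only touch
// the setting when the XDebug extension is actually loaded.
if (extension_loaded('xdebug')) {
    ini_set('xdebug.max_nesting_level', '3000');
}
```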
Assertions should be disabled in a production context by setting zend.assertions=-1 (or zend.assertions=0 if set at runtime). The library currently doesn't make heavy use of assertions, but they are used in an increasing number of places.
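For the runtime case, a sketch along the following lines may help. It only switches from 1 to 0, since PHP does not allow assertions to be switched to or from -1 outside php.ini. The function name disableAssertionsAtRuntime() is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Turn assertions off at runtime if they are currently on.
 * Returns the effective zend.assertions value afterwards.
 * (Hypothetical helper, not part of PHP-Parser.)
 */
function disableAssertionsAtRuntime(): int
{
    // 1 = assertions run, 0 = compiled but skipped, -1 = not compiled (php.ini only).
    if ((int) ini_get('zend.assertions') === 1) {
        ini_set('zend.assertions', '0');
    }

    return (int) ini_get('zend.assertions');
}
```

Setting zend.assertions=-1 in php.ini remains the preferable production setting, because assertion code is then not compiled at all.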
Many objects in this project are designed for reuse. For example, one Parser object can be used to parse multiple files.
When possible, objects should be reused rather than being newly instantiated for every use. Some objects have expensive initialization procedures, which will be unnecessarily repeated if the object is not reused. (Currently two objects with particularly expensive setup are lexers and pretty printers, though the details might change between versions of this library.)
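The reuse pattern can be sketched as follows. The helper parseAll() is hypothetical and is written against any object that exposes a parse(string $code) method, which is assumed here to match the library's Parser interface. The point is only that the parser is created once, outside the loop.

```php
<?php
declare(strict_types=1);

/**
 * Parse several sources with ONE parser instance instead of creating
 * a new parser (and lexer) per file.
 *
 * @param object               $parser  Any object with parse(string $code), e.g. a PhpParser\Parser
 * @param array<string,string> $sources Map of file name => source code
 * @return array<string,mixed>          Map of file name => parse result
 */
function parseAll(object $parser, array $sources): array
{
    $asts = [];
    foreach ($sources as $name => $code) {
        // The same $parser is reused for every input.
        $asts[$name] = $parser->parse($code);
    }

    return $asts;
}
```

With PHP-Parser 5.x the parser would typically be obtained once via (new PhpParser\ParserFactory())->createForNewestSupportedVersion() and then passed in; older releases use a different factory method, so check the version you have installed.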
A limitation in PHP's cyclic garbage collector may lead to major performance degradation when the active working set exceeds 10000 objects (or arrays). When parsing very large files, this limit is significantly exceeded, and PHP will spend the majority of its time performing unnecessary garbage collection attempts.
Without GC, parsing time is roughly linear in the input size. With GC, this degenerates to quadratic runtime for large files. While the specifics may differ, as a rough guideline you may expect a 2.5x GC overhead for 500KB files and a 5x overhead for 1MB files.
Because this is a limitation in PHP's implementation, there is no easy way to work around it. If possible, you should avoid parsing very large files, as they will impact overall execution time disproportionately (and are usually generated anyway).
Of course, you can also try to (temporarily) disable GC. By design, the AST generated by PHP-Parser is cycle-free, so the AST itself will never cause leaks with GC disabled. However, other code (including, for example, the parser object itself) may hold cycles, so disabling GC should be approached with care.
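One cautious way to disable GC only temporarily is to wrap the expensive work in a helper that restores the previous state afterwards, even if an exception is thrown. The helper name withoutGc() is hypothetical; gc_enabled(), gc_disable() and gc_enable() are standard PHP functions.

```php
<?php
declare(strict_types=1);

/**
 * Run $task with the cyclic garbage collector disabled, then restore
 * the collector to whatever state it was in before.
 * (Hypothetical helper, not part of PHP-Parser.)
 *
 * @return mixed Whatever $task returns
 */
function withoutGc(callable $task)
{
    $wasEnabled = gc_enabled();
    gc_disable();

    try {
        return $task();
    } finally {
        // Re-enable only if GC was on before, so nested calls behave sensibly.
        if ($wasEnabled) {
            gc_enable();
        }
    }
}
```

A call such as withoutGc(fn () => $parser->parse($code)) would then limit the GC-free window to the parse itself. Cycles created by other code during that window are only collected once GC runs again.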