Archived - no longer maintained
Please note that this project is no longer being maintained. It is only kept here for historical purposes.
Installation and usage
This is the phpjoern utility for Joern. It uses the php-ast extension
to generate ASTs from PHP projects and exports these to CSV files
suitable to be parsed by Joern.
More information on Joern and PHP may be found in our paper Efficient and Flexible Discovery of PHP Application Vulnerabilities published at EuroS&P 2017.
Prerequisite: Installing the php-ast extension
First off, you need a working installation of PHP 7. Next, you need to
set up the php-ast extension, available at https://github.com/nikic/php-ast.
Essentially, clone the repository, then compile and install the extension as follows:
git clone https://github.com/nikic/php-ast
cd php-ast
git checkout 701e853
phpize
./configure
make
sudo make install
Lastly, add the line
extension=ast.so to your php.ini file.
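To confirm that the extension is actually picked up, a quick check along the following lines may help. This is only a minimal sketch and not part of phpjoern: the function name check_ast_extension is hypothetical, and the fallback path /usr/bin/php is an assumption. It relies solely on PHP's built-in extension_loaded() function.

```shell
# Hypothetical helper (not part of phpjoern): succeeds if the given PHP
# binary reports that the "ast" extension is loaded.
check_ast_extension() {
    php_bin="${1:-/usr/bin/php}"   # assumed default location of PHP 7
    "$php_bin" -r 'exit(extension_loaded("ast") ? 0 : 1);'
}

# Usage:
#   check_ast_extension /usr/bin/php && echo "php-ast is loaded"
```

If the check fails, the extension line is probably in a php.ini that the command-line PHP binary does not read; running php --ini lists the configuration files in use.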
Using the parser
The parser is implemented in PHP and makes use of the php-ast extension.
A simple Bash wrapper script in the repository's root directory called
php2ast serves as an entry point. It takes the path to a PHP file or to
a directory as an argument. If the provided argument is a directory, the
parser will recursively search for all PHP files in that directory and
generate an AST for each of them.
Before executing the script, the environment variable
$PHP7 should be
set to the location of the
php executable of PHP 7. If no such variable
is set, the location
/usr/bin/php will be used by default.
./php2ast somefile.php
./php2ast somedirectory/
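The fallback behaviour described for $PHP7 can be sketched with ordinary shell parameter expansion. How php2ast implements this internally is not shown here, so the function resolve_php7 below is a hypothetical illustration of the described behaviour rather than the wrapper's actual code.

```shell
# Hypothetical helper: print the PHP binary that would be used, following
# the rule described above ($PHP7 if set, /usr/bin/php otherwise).
resolve_php7() {
    # ${PHP7:-...} falls back when PHP7 is unset or empty.
    printf '%s\n' "${PHP7:-/usr/bin/php}"
}

# Usage (the path /opt/php7/bin/php is only an example):
#   PHP7=/opt/php7/bin/php ./php2ast somedirectory/
```

Setting the variable on the same line as the command limits it to that single invocation; use export PHP7=... to keep it for the whole shell session.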
Either of these calls will generate two CSV files
representing the nodes of the generated AST(s) and their relationships,
respectively. In addition, directory and file nodes are also created and
connected to the individual AST root nodes to reflect a scanned directory's
structure and obtain a single large tree.
By default, the CSV files use the format required by the
batch-import tool for Neo4J (see below).
Note, however, that Joern currently supports only this default format as input. In addition, Joern outputs code property graph edges only in this same format, although additional output modules should be easy to implement.
Generating code property graphs with Joern
The CSV files generated in the previous step can now be passed to Joern. Joern will read these files, analyze the ASTs, generate control flow and program dependence edges for them, and output the calculated edges in another CSV file. First off, obtain Joern at https://github.com/octopus-platform/joern.
Essentially, clone the repository and build the project:
git clone https://github.com/octopus-platform/joern
cd joern
gradle build
In Joern's root directory, there is a small Bash wrapper script that serves
as an entry point for generating code property graphs for PHP, called
phpast2cpg. It takes two arguments: the nodes file and the edges file
generated in the previous step, in that order. Use it as follows:
./phpast2cpg nodes.csv rels.csv
Joern will then output a file
cpg_edges.csv, representing the calculated
control flow and program dependence edges.
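Because phpast2cpg expects both files from the previous step, a small pre-flight check can catch a missing or empty file before Joern is started. The sketch below is not part of Joern; the function name check_csv_inputs is hypothetical.

```shell
# Hypothetical pre-flight check (not part of Joern): verify that every
# given file exists and is non-empty.
check_csv_inputs() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}

# Usage, from Joern's root directory:
#   check_csv_inputs nodes.csv rels.csv && ./phpast2cpg nodes.csv rels.csv
```

The same helper can be reused before the import step by additionally passing cpg_edges.csv.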
Importing the code property graphs into Neo4J
You should now have three CSV files, named nodes.csv, rels.csv and
cpg_edges.csv by default. These files can be used to create a Neo4J
database using the tool batch-import.
It is easiest to download a precompiled
batch-import for the particular
Neo4J version you intend to use. For instance, for Neo4J 2.1:
mkdir batch-import
cd batch-import
curl -O https://dl.dropboxusercontent.com/u/14493611/batch_importer_21.zip
unzip batch_importer_21.zip
In the following, let
$JEXP_HOME be the absolute path to the newly
created batch-import directory, and
$PHPJOERN_HOME the absolute path
to your installation of the present repository.
To import the generated CSV files into a Joern Neo4J database, simply use the following:
java -classpath "$JEXP_HOME/lib/*" -Dfile.encoding=UTF-8 org.neo4j.batchimport.Importer $PHPJOERN_HOME/conf/batch.properties graph.db nodes.csv rels.csv,cpg_edges.csv
The performance you experience will mainly depend on the heap size that you
allocate. You should edit the file
batch.properties accordingly.
The batch.properties file that comes with
phpjoern is optimized for heap
sizes larger than 4 GB, which you should allocate accordingly, e.g.,
HEAP=6G
java -classpath "$JEXP_HOME/lib/*" -Xmx$HEAP -Xms$HEAP -Dfile.encoding=UTF-8 org.neo4j.batchimport.Importer conf/batch.properties graph.db nodes.csv rels.csv
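Since the bundled batch.properties is described as tuned for heaps larger than 4 GB, it can be worth sanity-checking the chosen value before starting a long import. The following is a minimal sketch under stated assumptions: the function name heap_is_large_enough is hypothetical, and for simplicity it accepts only whole gigabyte values written with an uppercase G (such as 6G), although the JVM accepts other spellings as well.

```shell
# Hypothetical helper: succeed only if the heap setting is a whole number
# of gigabytes (e.g. "6G") strictly greater than 4.
heap_is_large_enough() {
    n="${1%G}"                          # strip a trailing "G", if any
    [ "$n" != "$1" ] || return 1        # no "G" suffix: reject
    case "$n" in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric: reject
    esac
    [ "$n" -gt 4 ]
}

# Usage:
#   HEAP=6G
#   heap_is_large_enough "$HEAP" || echo "consider a heap larger than 4 GB" >&2
```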
Once the import is finished, you will have a directory
graph.db suitable for Neo4J.
You may now point your Neo4J installation to that database and start your analysis.
For further discussion, refer to http://joern.readthedocs.io.