pad edited this page Aug 18, 2015 · 23 revisions

-*- org -*-

0.31 (Q1 2015) ((principia parenthesis))

extra: a lot


support for ocaml

0.30 (Q4 2014) ((principia parenthesis), lpizer)

extra: a lot

introduce lpizer

fine grained lpizer with support for C (to automatically

generate chunks of a literate programs)

0.29 (Q3 2014) (datalog for “flowing”, codeslicer)


first datalog-based interprocedural context insensitive dataflow analysis!!!

using toy datalog (written in Lua)

switch to bddbddb for backend, scales to xv6 and even plan9 kernel!

7000 and 100000 LOC respectively

introduce codeslicer

it tries to detect automatically less relevant code, that is the yet-another-extension-plugins that are not important to understand the software architecture (e.g. for Linux it would be the thousands of device drivers, the many fs, internen protocols, etc).

mini c

simple miniC AST and parser (abuse lang_cpp/ and lang_c/)

so can experiment with datalog fact generation and datalog logic


far better typedef inference, better heuristics (xx *yy), more cases handled

(e.g. function pointers)

handle kencc extensions

graph_code_c port of graph_code_clang

handle dependencies of cpp constructs!

basic datalog-based interprocedural context insensitive dataflow analysis!

mostly port of miniC, but doing the linearization of expression and handle more than just one file by hooking into graph_code_c!


basic parser (had just lexer before)

basic graph_code for files, modules, functions, macros

can explore dependencies in emacs source :)


codequery add return/2 predicate, complement parameter/4

introduce lang_rust/

basic lexer and highlighter


far better backend for C

toy backend for lisp


read/write fields for C, type info for C, hooks in graph_code

for graph_code_clang and also graph_code_c

0.28 (Q2 2014) ((kernel.tex.nw parenthesis))

extra: kernel.tex.nw


highlight searched entity

have a current entity selection that stays

so can explore usages of a field for instance via emacs middle click


can be applied to a list of dirs, not just one dir, allow to

run for instance on include/ kernel/ while still using information from the skip_list

new backend for js using avik’s data and objC using mathieu’s data

a LOC counter ‘codemap -test_loc’ which modulate the size of files

with archi_code classification


better infer typedefs, for unnamed structures (kencc ext)

and toplevel prototypes


handle local extern decl (they are not locals!),

fix unincluder when have extern (forward) decl and decl in same .c


support for $x::class (facebook extension), newtype x as x, …$args (lexer only)

fixed bugs in pretty printer and add -index_php <int> so can be configured to

4 spaces

codequery: type/2 predicate

sgrep/spatch: trailing commas, xhp, nested patterns


improve speed of parser on patholigical element_list case

stags: better support


better handling of labels, const vs globals in codegraph, dependencies

for implicit fields


fixed all new ocaml 4.02 warnings, e.g. unused parameters

less common specific code, ++ -> @, push2 -> push,

use of spatch for ocaml :)

0.27 (Q1 2014) (“hypertext” codemap, generalized scheck)

extra: plan9, kencc


match defs and uses even when the file has been modified

(do not use the probably obsolete line information in graph_code, trust more the lexer/parser/highlighter fuzzy def/use classifier)

show uses/defs relations also at use site

so can hover on some code and see the corresponding definition (as well as the other users of this definition and the uses of this definition).

show uses/defs macrolevel dependencies for directories

scalable spirit! a visualization should apply from the topmost entities to the lowest one! Also factorized more code.

show information about entity under cursor (def or use) in tooltip

go to definition when on a use

finally … basic TAGS functionality, “hypertext” because can follow “links”

handle skip list

so easy to manually configure which parts to not display when the heuristics in archi_code_lexer are not enough.

improve completion box by considering not only files mentions in the

light db (while still filtering object files and skip list files)

cleanup the code, introduce world type, less use of refs


improve automatic layering by using hill climbing algorithm

less backward dependencies on plan9 for instance


support more languages via graph_code_checker

software architecture related checks

Deadcode, UnusedExport, UndefinedDefOfDecl

!!deadcode false positives really help to find bugs in other parts of the analysis!!

annotation support to skip false positives

(port what was in cmf)

token-based indexer to detect false positives


codegraph: less dupes, less false edges due to dupes,

codegraph: better handle typedefs, more accurate type dependencies

for structures


support for short lambda, require traits, and try finally (facebook



support for arrow (aka short lambda), variable number of parameters,

trailing commas, interpolated strings, and fix bugs in regexp and xhp

(avik) support for generics

internal, also many things works now with multiple dirs and

skip list

cleanup commons/, remove lib-xml, lib-sexp, lib-json

configure option for Javalib and cmt

works again with ocaml 3.12

new OPAM package, easier to install, more portable, less requirments


0.26 (Q4 2013) (basic OPAM package)

extra: zamcov part2


opam package (just the parsers and commons)

port to 4.01, make it configurable

simplified configure output too

split h_program-lang, introduce graph_code/


new -derived_data option (generate TAGS, layers, prolog, light db)

build light db from graph_code, and make it fast enough

a -test_transitive_deps action to extract parts of huge codebase


support for namespace part2 (graph_code)


support new class syntax, JSX (xhp), and typescript type syntax


talk codemap/codegraph ocaml workshop

0.25 (Q3 2013) (codemap/codegraph integration)


highlight users and uses of the file under the cursor

via graph_code

highlight users and uses of the entity under the cursor

via graph_code

treemap focus on users and uses of the file under the cursor

via graph_code, great!

bottom up and topdown layer using graph_code

associate a depth in the use graph to each file and entities.


support for namespace part1 (parsing)

support shape


started to play with zamcov

refresh to 4.01, fix many bugs, cleanup makefiles, added some builtins, added debugging support, better error message, stack trace, etc.


better heuristics to preserve space around commas


support for mercurial, help diff analysis on mercurial repo


use Travis CI for build and test regressions,

0.24 (Q2 2013) (web-codemap, web-codegraph, generalized sgrep/spatch)

web: resume work on ocsigen

port to eliom3

use of js_of_ocaml

assume OPAM for web/

(but not for the rest of pfff/ so people don’t need OPAM to compile pfff)

basic facebook integration in web UI, use oauth to track who is using it


web frontend, using ocsigen

with rpc calls for getting cell information

new builder for dotfiles

new emitters to generate tags and basic prolog db

tested for ocaml


web frontend, using ocsigen

macro and micro level (separately for now)

(fixed my long standing issue with canvas and flexible text font size)


use graph_code as an entity finder

ranking and filtering of errors via -filter <n> or -rank

better process to update builtins


partial support for C/C++/OCaml/Js/Java/PHP via ast_fuzzy

better error message when try to do transformations not handled well

(e.g. minus on a ‘…’)

fuzzy parser for C/C++/Js/Java/OCaml/PHP


many facebook extensions to the language: generics, typedefs, collections,

implicit fields, xhp comment syntax, trailing comma, (new Foo) and id, trait renaming feature

adapted each time:

  • CST, AST (but not pretty print AST)
  • visitors, dumpers, mappers
  • tags, prolog db
  • sgrep/spatch
  • scheck/cmf
  • codegraph

huge refactoring on the PHP grammar and AST, unified lvalue and expr,

got from 7 s/r conflicts to 1, and can now parse more code by being more orthogonal.


removed cruft, old/ directories, dead subprojects

make huge refactoring on ASTs easier

0.23 (Q1 2013) (clang parser)


packing less relevant entities in a fake “…” entity

making codegraph usable on ugly projects with hundreds of subdirectories (but very fragile, lots of new exceptions)

deadcode/internal_module highlighting

backward dependencies statistics and highlightings

finer grained support for clang/C

introduce lang_clang/

a thin wrapper around the clang/llvm ast dumps of clang-check -emit-ast.

ast sexp parser

patch to clang AST dumper

basic compile_database.json generator from make traces

unincluder/deduper so can have a simplified global analysis

introduce lang_objc/

basic lexer and grammar

java/bytecode: mapping bytecode to java

Handle nested classes, generic classes, anon classes (understand better how things are bytecompiled, e.g. where does foo$1$bar.class come from?)


0.22 (Q4 2012) (codegraph faster/finer, .cmt/.class parsers)

codegraph for more languages, at very fine grained level

added OCaml (.ml and .cmt), Java (.java and .class), and improved PHP. Also improved speed a lot.

codequery for more languages

in addition to PHP, support for java (via .class and .java files), OCaml (via .cmt files)

introduce lang_bytecode/ (mostly wrapper around javalib/),

with graph_code support.

added support for .cmt files for ocaml 4.00 allowing better analysis

of ocaml code, including pfff code

spatch: improved unparser, more flexible, easier to add heuristics

regarding spacing issues

skip list file, more flexible, feel less overwhelmed

can fix bugs one by one

0.21 (Q3 2012) (codegraph, java/c parser)

introduce codegraph, a hierarchical matrix-based dependency visualizer

with support for PHP, C,

introduce lang_c/

a simplified version of lang_cpp/ just focusing on C. Also added, mainly for analyzing xv6

improve lang_java/, support for generics, annotations, and other recent

Java features. Also added


new variable checker using

far less false positives

0.20 (Q2 2012) (luisa)

introduce, another way to represent a full program

futur backend of codegraph


metavariables for XHP attributes



good basis then for, new,, etc.

0.19 (Q1 2012) (pfff_logger)

introduce pfff_logger (in OPA), monitoring the use of pfff tools

(please rerun ./configure if you have compilation pbs)

introduce lang_opa/

basic support

introduce type inference for PHP (julien)


case insensitive mode by default

metavariables for XHP tags


heavily commented abstract interpreter and type inference

refactored lang_php/analysis/

less error messages, progress meter, split files, each PHP database has its own file, refactored abstract interpreter, etc.


0.18 (Q4 2011) (codequery)

introduce codequery, an interactive Prolog-based code query engine,

especially useful to query inheritance information.

introduce simplified AST for PHP (julien)

introduce abstract interpreter for PHP (julien)

which leads to a more precise callgraph, type information, and opens the way for many more checks (including security checks using tainting analysis).

scheck: lots of new checks, and removed lots of false positives

while still being reasonably fast, thanks to the use of a lazy entity finder.


leverage pretty printer so can reindent correctly the code

after some transformation. Thx to julien. spatch –pretty-printer.

a “sed mode” so can do spatch -e ‘s/foo(X,Y)/foo(X)/’ *.php

lang_ml, more highlight

so can parse julien’s code which heavily use modules.

lang_cpp, better parsing


removed lots of dead code now that commited to use the abstract interpreter

(and prolog) instead of a PIL+cflow+dataflow+db+…

refacrored lang_php/analysis, get rid of directories

0.17 (Q3 2011) (overlay, pm_depend)

introduce notion of code overlay

with some helper functions to check the validity of an overlay. Also added some support for overlay in codemap.

Overlays help organize and visualize a (bad) codebase from a different point of view.

introduce pm_depend

a package/module dependency visualizer exporting data for Gephi (only for ocaml code for now). Played with it on the code of pfff (in package mode) and some of its components: codemap, cpp, php, cmf (in module mode with and without extern mode).

update: superseded by codegraph

started port of codemap to ocsigen

can now display the treemap. Had to report many bugs to the ocsigen team to get this to work.

sgrep: support for regexp when matching string constants

0.16 (Q2 2011) (c++ parser refactoring)

better C++ parser

complete refactoring of the parser. Far less heuristics techniques. Closer to what was described in the CC’09 paper.

0.15 (Q1 2011) (html/css parsers, ocsigen try)

extra: mmm and meh (related to html/css parser) extra: zamcov part1 (I was working on php coverage for fb using hphp)

introduce web-based source code navigator a la LXR using ocsigen

first play with ocsigen.

update: not used

introduce lang_html/ with clean type

(also using code from a stripped down version of ocamlnet)

introduce lang_css/

(using code from ccss by dario teixeira)

introduce lang_web/ with combined html+js+css parser


0.14 (Q4 2010) (layers, spatch php, more language highlighters)

extra: c– (and first use of ocamldoc -dotall to visualize soft archi)


layer type, so can save results of global analysis and process them

later in codemap or pfff_statistics

layers! like in google earth

  • architecture/aspect layer (default one)
  • static dead code layer
  • dead: dynamic live code layer (using xhprof data)
  • test coverage layer (using phpunit and xdebug data)
  • bugs layer
  • security layer
  • cyclomatic complexity,
  • age (and activity) layer
  • number of authors layer

more semantic visual feedback

can see arguments passed by refs visually (TakeArgNByRef) as well as functions containing dynamic calls (ContainDynamicCall)

visual grep

can now visualize the result of a git grep on a project

better visualization of directories

use different color for dirs and files labels, and highlight first letter of label at depth = 1

introduce spatch, a syntactical patch

a DSL to express easily refactoring on PHP.


better support for XHP patterns with flexible matching

on attributes

experimental support for statement patterns

can now express patterns like: sgrep -e ‘foreach($A as $V) { if (strpos($T, $V) !== false) { return Z; }}’

introducing lang_nw/

so can visualize also Tex/Latex/Noweb source (which includes the documentation of pfff!)

introducing lang_java/

based on joust

introducing lang_lisp/

introducing lang_haskell/

introducing lang_python/

introducing lang_csharp/

introducing lang_erlang/


more highlights

php analysis

finalized the PIL

update: superseded by

dead? dataflow analysis using PIL (thanks to iproctor)

update: supersed by abstract interpreter

global analysis

store additional attributes/properties per entities in the light code db

  • does it take argument by refs.
  • does it contain dynamic calls ($fn(…))

This can help the visualizer to give more semantic visual feedback.


wrote wiki pages (intro, sgrep, spatch, features, vision, roadmap, etc)

applied codemap on many open source project and generated screenshots.


refactored the code in visual/ to have smaller and cleaner files

thanks to literate programming and codemap itself to show the problem and assist in the refactoring

refactored code about defs/uses in

and put more generic stuff in h_program-lang/

renamed pfff_visual in codemap


a polymorphic wrapper around ocamlgraph

(to compute strongly connected components of php callgraph, in prevision of a bottom up analysis of php)

update: also useful for codegraph backend.


first public release!

0.12 (Q3 2010) (codemap, light db, tags php, scheck php, ocaml/js parsers)

Real start of multi-language support.

introduce source code navigator/searcher/visualizer using cairo

Show treemap and thumbnails of file content! Have also minimap, zoom, labels, alpha for overlapping labels, labels in diagonal, anamorphic content showing in bigger fonts the important stuff, magnifying glass, clickable content where a click opens the file in your editor at the right place, etc. => A kind of google maps but on code :)

Support for PHP, Javascript, ML, C++, C, thrift.

For PHP do also URL highlighting which helps understand the control flow in webapps. Also highlight local/globals/parameters variables differently. Also highlight bad smells (especially security related bad smells)

Integrate other PL artifacts:

  • The builtins API reference
  • PLEAC cookbooks

=> a single place to query information about the code (no need to first grep the code, then google for the function because it turns out to be a builtin).

Can easily go the definition of a function (whether it’s a builtin or not, thanks to the parsable PHP manual and HPHP idl files).

Can easily go to the example of use of a function (whether it’s a builtin or not, thanks to PLEAC for the builtin functions).

Far more flexible and powerful than the previous treemap visualizer which was using Graphics. Now also render file content!

new tool, stags, a TAG generator using ASTs not fragle regexp

support for PHP ocaml

introduce lang_ml/

Allow to use and experiment the treemap code visualizer on the pfff source itself; to see if such features are useful.

introduce lang_cpp/

introduce analyze_js/, analyze_cpp/, analyze_ml/

very basic support. Just highlighting

introduce, a generic code-information database

using JSON as support. Will help make pfff less php-specific.


support linear patterns (e.g. sgrep -e ‘X & X’) and a -pvar option to print matched metavarables instead of matched code


reorganized the treemap and h_program-lang/ to be less facebook and pfff specific. Have a commons/ for instance.


introduce checker, scheck

warn about “unused variable” and “use of undefined variable”.

use fast global analysis (bonus: it’s flib-aware and desugar the require_module_xxx and other flib conventions).

introduce php_etags

a more precise TAGS file generator (bonus: it’s xhp-aware).

introduce lang_js/

parsing/unparsing/dumping. preliminary refactoring support.

php, introduce builtin XHP support


dead? introduce PIL, PHP Intermediate Language

a more conveninent AST to work on for doing complex analysis such as dataflow, type-inference, tainted analysis, etc.

include/require analysis as well as flib-unsugaring. Make it possible

to grab all the files needed to check one file, in a way similar to what gcc does with cpp. Provide a DFS and BFS algo.

0.10 (Q2 2010) (test coverage php)

finish test coverage analysis using and

rank, filter, parallelize (using MPI), cronize.

helpers to write some PHP refactorings

fix parsing (lexer) and unparsing bugs

introduce the transfo field, mimicing part of coccinelle.

improve support for XHP and refactoring, merging tokens heuristic.


static method calls analysis (with self/parent special cases handling)

users_of_class, users_of_define, extenders/implementers of class


fix bugs in lexer, now can parse <?= code


split analyze_php/ in multiple dirs

and moved code from facebook/ to analyze_php/

started to use !

unit tests for parsing, analysis, deadcode, callgraph, xdebug

first work on web gui

extract and modularize php highlighting logic from gtk gui.

started integrate treemap and web gui.

first work on thrift interface to pfff services

used by web ui of acrichton


static arrays lint checks

dead? proto of undeterministic PHP bugs finder using diff and xdebug


phpunit result analysis and parsing


control flow graph analysis: useful for cyclomatic complexity, and potentially useful or far more things (sgrep, dataflow, etc)

dead? start of dataflow analysis

start of coverage analysis (static and dynamic)

start of include_require static analysis (and flib file dependencies too)

dead? start of type unioning

dead? introduce compile_php/

but for now very rudimentary update: not used for now


reorganized json/sexp output, factorize code and use more

0.8 (Q1 2010) (sgrep php, treemap, xdebug dynamic analysis)

xdebug trace parsing, can now do dynamic analysis!

Done for type “inference/extraction” at the beginnning and useful for coverage too!

dead: GUI, trivial type inference feedback based on xdebug info

update: not really used for now, superseded by julien static type inference

sgrep: introducing $V special metavar

dead? introducing parsing_sql/

could be useful at some point for better type checking or type inference update: not used for now


dead: introducing ppp, php pre processor, and implement closure

by source-to-source transformation.

now I can code in PHP :)

improved pretty printer, and helpers for AST transformation

with Used by ppp and closure implemetation.


  • a -emacs flag
  • improved -xhp and made it the default operating mode


  • do fixpoint analysis per file


introducing sgrep_php

a code matcher working at the AST level

introducing treemap viewer using Graphics.mli

update: superseded by cairo-based viewer (but reused most of the algorithms)

treemap algorithms library and basic literate programming manual

dead? introducing code_rank

ref from sebastien bergmann


dead: XHP poor’s man support.

Just have A new -pp option to give opportunity to call a preprocessor (eg ‘xhpize -d’).


a new -json option and json support

also supported in sgrep.



programmer manual for parsing_php/ internals manual for parsing_php/

!!use literate programming method (via noweb/syncweb)!! (hence the special marks in the source)


callgraph for methods (using weak heuristic), with optimisations to scale (partially because use weak heuristic)


0.3 (Q4 2009) (php parser, berkeley db, deadcode)


deadcode analysis v2, v3, v4


IRC support (adapting ocamlirc/) update: not used anymore




deadcode analysis v1


dead: introducing PHP gui (with ocamlgtk/)

update: superseded by codemap, a fancy gui using cairo and gtk


global analysis first draft, PHP database (with ocamlbdb/)

alpha (Nov 2009)

PHP parser first draft!

reused Zend flex/bison code.

visitor (using ocamltarzan)

AST dumper (also using ocamltarzan and lib-sexp)