Kyle Stevenson edited this page Oct 21, 2013 · 44 revisions


For now, only PHP supports all of the following features. The other programming languages have only partial support (see Matrix).


  • lexer/parser generating a concrete syntax tree (CST) with comments, spaces, and preprocessor directives information easily accessible. This CST is thus source-to-source transformation friendly. It also helps when displaying code in Codemap.
  • visitor, usually autogenerated using https://github.com/aryx/ocamltarzan
  • concrete syntax tree pretty printer, also autogenerated, useful to debug and learn which OCaml constructor corresponds to which construction (see also Sgrep#manual-low-level-matching).
  • unparser
  • basic source-to-source transformation support: parse, programmatically annotate tokens through their 'transfo' field, then unparse (see also Spatch#manual-low-level-refactoring).
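The parse, annotate, unparse cycle described above can be illustrated with a minimal Python sketch. It is not pfff's OCaml API: the `Token` class, the toy `tokenize` lexer, and the `rename_call` helper are all hypothetical, and only the idea of a per-token 'transfo' annotation comes from the list above. Because the lexer keeps spaces and comments as tokens, untouched code is unparsed byte for byte.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str                      # original text, spaces and comments included
    transfo: Optional[str] = None  # None keeps the token, a string replaces it


def tokenize(source: str) -> List[Token]:
    # Toy lexer (hypothetical): words, whitespace runs, and single characters.
    # Nothing is dropped, so unparsing untouched tokens reproduces the input.
    return [Token(t) for t in re.findall(r"\w+|\s+|.", source, flags=re.DOTALL)]


def unparse(tokens: List[Token]) -> str:
    # Emit each token, honoring its 'transfo' annotation when one is set.
    return "".join(t.text if t.transfo is None else t.transfo for t in tokens)


def rename_call(tokens: List[Token], old: str, new: str) -> None:
    # Annotate only identifiers that are directly followed by "(".
    for i, tok in enumerate(tokens):
        if tok.text == old and i + 1 < len(tokens) and tokens[i + 1].text == "(":
            tok.transfo = new


source = "foo( 1 ); /* keep foo */ foo(2);"
tokens = tokenize(source)
roundtrip = unparse(tokens)        # identical to source before any annotation
rename_call(tokens, "foo", "bar")
result = unparse(tokens)
print(result)
```

The word "foo" inside the comment is left alone because the transformation only annotates tokens, and everything that is not annotated is printed exactly as it was read.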

Basic analysis

Global analysis

  • persistent storage (using Berkeley DB, or a single marshalled or JSON file) to scale to thousands of files and millions of lines of code
  • function callgraph
  • def/use of entities (functions, classes, constants)
  • SEMI method callgraph (handling self/parent, and ambiguities because of imprecise analysis)
  • include analysis (may require defining an environment that holds the global settings of a project)
  • builtins analysis
  • TODO class analysis (requires dataflow in general)
  • TODO type inference (requires simplified AST and dataflow)
  • TODO interprocedural bottom-up analysis
  • TODO summary-based interprocedural analysis
  • TODO context sensitive interprocedural analysis (k-CFA)
  • emacs TAGS file generator

Static analysis

  • dead functions and classes reaper
  • code_rank, useful to understand the core functions of a project (but no better than a simple number_of_callers analysis when you want to understand the important functions in an API; so the two are complementary)
  • cyclomatic complexity
  • reaching definitions (using data flow analysis)
  • liveness (using data flow analysis)
  • TODO taint analysis (hard, requires interprocedural analysis)
  • static test coverage (can be complementary to dynamic test coverage by showing which functions are not directly tested)
  • test_rank
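The liveness entry above relies on a backward data flow analysis. A minimal sketch in Python, not pfff's OCaml implementation, assuming a hypothetical control flow graph whose basic blocks come with precomputed use and def sets:

```python
def liveness(blocks, successors):
    """Iterate the backward liveness equations until a fixpoint is reached.

    blocks:     block name -> (use, defs), where 'use' holds the variables
                read before any write in the block.
    successors: block name -> list of successor block names.
    """
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b, (use, defs) in blocks.items():
            # live_out(b) is the union of live_in over the successors of b
            out = set()
            for s in successors.get(b, []):
                out |= live_in[s]
            # live_in(b) = use(b) plus what flows through b without being redefined
            new_in = set(use) | (out - set(defs))
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical three-block function:
#   b1: x = 1; y = 2      b2: z = x + 1      b3: return z
blocks = {
    "b1": (set(), {"x", "y"}),
    "b2": ({"x"}, {"z"}),
    "b3": ({"z"}, set()),
}
successors = {"b1": ["b2"], "b2": ["b3"]}

live_in, live_out = liveness(blocks, successors)
# A variable defined in a block but not live on exit is a dead assignment candidate.
dead_in_b1 = blocks["b1"][1] - live_out["b1"]
print(dead_in_b1)
```

Here `y` is assigned in `b1` and never read afterwards, so it is reported as a dead assignment candidate.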

Dynamic analysis

  • tracing/profiling (using xhprof and xdebug)
  • test coverage (requires tracing)
  • nondeterministic bug finder (using trace and diff)
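The "trace and diff" idea can be sketched as follows: run the same input twice, record the sequence of calls, and report where the two runs diverge. This is only an illustration in Python using the standard `difflib` module; the traces are hypothetical lists of function names and do not reproduce the xhprof or xdebug output formats.

```python
import difflib


def diverging_calls(run1, run2):
    """Return the non-matching regions between two call traces."""
    matcher = difflib.SequenceMatcher(a=run1, b=run2, autojunk=False)
    return [
        (tag, run1[i1:i2], run2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Two hypothetical traces of the same request
run1 = ["main", "load_config", "render", "cache_hit", "exit"]
run2 = ["main", "load_config", "render", "cache_miss", "query_db", "exit"]

differences = diverging_calls(run1, run2)
for tag, left, right in differences:
    print(tag, left, right)
```

An empty result means the two runs followed the same path; any other result points at code whose behavior depends on something other than the input.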


  • unused entity (deadcode, sometimes the sign of a bug)
  • undefined entity (function, class, constant)
  • unused variable (often because of a typo)
  • use of undefined variable
  • dead statements (using control flow graph)
  • TODO dead assignments (using data flow liveness)
  • TODO undefined field/method (requires class analysis for full generality, but can at least be done easily when in class context with use of 'this')
  • TODO typing error (requires type inference, obviously)
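The two variable checks in the list above (unused variable, use of undefined variable) can be sketched on straight-line code. This is a simplified Python illustration, not the actual checker: it assumes a hypothetical representation where each statement is reduced to the variables it assigns and the variables it reads, and it ignores branches and loops.

```python
def check_variables(params, statements):
    """Report uses of undefined variables and variables never read.

    statements: list of (assigned, read) pairs, one per statement, in order.
    Returns a list of (line, kind, variable); line is None for 'unused'.
    """
    defined = set(params)
    used = set()
    errors = []
    for lineno, (assigned, read) in enumerate(statements, start=1):
        for var in read:
            if var not in defined:
                errors.append((lineno, "undefined", var))
            used.add(var)
        defined.update(assigned)
    for var in sorted(defined - used):
        errors.append((None, "unused", var))
    return errors


# Hypothetical function body:
#   1: $total = $a;
#   2: $tmp = $totl;      (typo for $total)
#   3: return $total;
statements = [
    (["$total"], ["$a"]),
    (["$tmp"], ["$totl"]),
    ([], ["$total"]),
]
errors = check_variables(["$a"], statements)
for error in errors:
    print(error)
```

The typo on line 2 shows up as an undefined variable, and `$tmp`, which is assigned but never read, shows up as unused.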


  • expression matcher
  • metavariables
  • '...' in function calls
  • linear patterns
  • metavariables printing
  • basic isomorphisms (e.g. the order of fields in struct definitions, xhp attributes, etc. does not matter)
  • TODO typed metavariables
  • SEMI statement matcher
  • TODO full language matcher
  • TODO '...' between statements or entities
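To make the matcher features above more concrete, here is a minimal Python sketch of expression matching with metavariables and '...' in function calls. The tuple-based AST, the convention that uppercase strings are metavariables, and the rule that a metavariable used twice must bind the same subtree are all assumptions made for this example; they are not taken from the real implementation.

```python
ELLIPSIS = "..."


def is_metavar(pattern):
    # Assumption for this sketch: a bare uppercase string is a metavariable.
    return isinstance(pattern, str) and pattern.isupper()


def match(pattern, node, env):
    """Return the extended bindings if pattern matches node, else None."""
    if is_metavar(pattern):
        if pattern in env:
            # A metavariable seen before must bind an identical subtree.
            return env if env[pattern] == node else None
        return {**env, pattern: node}
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        return match_seq(list(pattern), list(node), env)
    return env if pattern == node else None


def match_seq(patterns, nodes, env):
    if not patterns:
        return env if not nodes else None
    if patterns[0] == ELLIPSIS:
        # '...' absorbs zero or more nodes: try every possible split.
        for k in range(len(nodes) + 1):
            result = match_seq(patterns[1:], nodes[k:], env)
            if result is not None:
                return result
        return None
    if not nodes:
        return None
    env = match(patterns[0], nodes[0], env)
    if env is None:
        return None
    return match_seq(patterns[1:], nodes[1:], env)


# foo($a, 1, 2) matched against the pattern foo(X, ...)
code = ("call", "foo", ("var", "$a"), ("num", 1), ("num", 2))
bindings = match(("call", "foo", "X", ELLIPSIS), code, {})

# foo(X, X) only matches when both arguments are the same expression
same = match(("call", "foo", "X", "X"), ("call", "foo", ("num", 1), ("num", 1)), {})
different = match(("call", "foo", "X", "X"), ("call", "foo", ("num", 1), ("num", 2)), {})
print(bindings, same, different)
```

The returned bindings are what a "metavariables printing" feature would display for each match.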


  • syntactical sed for expressions
  • +/- on any PHP constructions (a la coccinelle) in any expression context
  • SEMI support for not just PHP expressions but also statements, function headers, class headers, etc.
  • TODO typed metavariables


Google maps on source code

Code overview using treemap + thumbnails. "A Google Maps on source code". Can see both the macro and micro organization of a project.

  • architecture/category of a file using color scheme
  • labels with different size and alpha value
  • thumbnail source view
  • summary of important entities of a file in the thumbnail source view
  • TODO anamorphic representation using importance of a file instead of size of file where importance can be number of callers/users or code_rank like metric
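For readers unfamiliar with treemaps, the layout idea can be sketched in a few lines of Python. This uses the simple slice-and-dice scheme, chosen here only for brevity; which layout algorithm Codemap actually uses is not stated on this page. The directory names and sizes are hypothetical, and sizes are assumed to be positive.

```python
def size(tree):
    # A leaf is a file size (int); a directory is a dict of children.
    return tree if isinstance(tree, int) else sum(size(c) for c in tree.values())


def treemap(tree, x, y, w, h, path="", horizontal=True):
    """Slice-and-dice layout: yield (path, x, y, w, h) for every file."""
    if isinstance(tree, int):
        yield (path, x, y, w, h)
        return
    total = size(tree)
    offset = 0.0
    for name, child in tree.items():
        frac = size(child) / total
        child_path = f"{path}/{name}" if path else name
        if horizontal:
            # Split the width, then alternate direction one level down.
            yield from treemap(child, x + offset * w, y, frac * w, h, child_path, False)
        else:
            yield from treemap(child, x, y + offset * h, w, frac * h, child_path, True)
        offset += frac


# Hypothetical project: sizes in lines of code
project = {"lang_php": {"parsing": 300, "analyze": 100}, "commons": 400}
rects = list(treemap(project, 0, 0, 100, 100))
for rect in rects:
    print(rect)
```

Each file gets a rectangle whose area is proportional to its size. The anamorphic variant mentioned above would replace the size with an importance metric such as the number of callers.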


Layers a la Google Earth with flexible color scheme:

  • deadcode layer (using static information)
  • liveness layer (using dynamic information from profilers)
  • code coverage layer (when there are unit tests to run and trace)
  • bugs/bad smell layer (using scheck)
  • cyclomatic complexity layer
  • age layer (using git)
  • number of authors layer (using git)
  • TODO code_rank layer
  • TODO test_rank layer
  • TODO distance_to_bottom layer

Code search and navigation

  • entity search, with completion (with integration for builtins too)
  • file/dir search
  • code navigation, directory/file navigation
  • multi-dir navigation, for instance to navigate a project whose directory name is spread across many places (e.g. include/project, test/project, ui/core/project, etc.)
  • code<->covering_test integration
  • code<->PLEAC integration
  • TOREPUT minimap
  • TODO smooth zoom (a la google maps)
  • TODO callers/callees navigation

Visual grep

  • micro-level visual grep
  • macro-level visual grep
  • TODO sgrep integration
  • TODO flexible color scheme with sgrep powerful patterns

Editor integration

Editor integration with line-precision file opening.

  • emacs
  • TODO: other

Advanced source code highlighting

  • token-based source highlighting
  • identifier highlighting; usually requires parsing to disambiguate the multiple uses of identifiers in a program (for classes, functions, parameters, locals, types, etc)
  • varying font size depending on class of the token
  • local/global/parameter highlighting
  • semantic highlighting, visual feedback on importance of entities (anamorphic entities)
  • bad smell highlighting (use of globals, call by reference, see below)

Visual feedback on bad smell

  • dangerous functions
  • dangerous patterns
  • use of globals
  • function calls with variable passed by ref
  • TODO dead code
  • TODO test coverage
  • TODO test_rank

Visual debugger

  • TODO dynamic trace highlighting (use layer)
  • TODO trace browsing
  • TODO aspect layer + trace

API finder

  • builtins integration
  • code<->test navigation


  • integration with version control system (git)
  • SEMI integration with documentation (e.g. manual describing builtins)
  • TODO integration with mailing list
  • preprocessor
  • closure by source-to-source transformation