This repository has been archived by the owner on Jun 4, 2019. It is now read-only.
Michael Mior edited this page May 22, 2015 · 53 revisions

The goal of sgrep is to allow programmers to express complex code patterns using a syntax they are already familiar with. For instance, to find all comparisons of strstr to false, one can simply write:

$ sgrep -e 'strstr(...) == false' foo.php

or:

$ sgrep -e 'strstr(...) == false' <directory>

to process all PHP files recursively under <directory>.

This will work even if the expression is split across multiple lines in the PHP files or has extra spaces between == and false, because sgrep works at the abstract syntax tree level, not at the token or string level like grep.

See also Spatch, which not only matches code patterns but also transforms them.

See https://github.com/facebook/pfff/blob/master/main_sgrep.ml

The usual way to find code is grep. It is fine when the pattern is simple, such as the name of a function, but very tedious when one wants to find certain kinds of calls. For instance, to find all calls to foo where the second argument is 1, one could write the regexp foo(.*, 1, .*), but this would not handle function calls split across multiple lines, calls using different amounts of whitespace, or calls with nested function calls as arguments. The string level is not the right level to work at. With sgrep one can simply do:

$ sgrep -e 'foo(X, 1, ...)' *.php
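The difference between string-level and tree-level matching can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's standard `ast` module on Python source as a stand-in for sgrep's OCaml matcher on PHP, and the names `SOURCE`, `LINE_REGEXP`, and `calls_with_second_arg_one` are hypothetical.

```python
import ast
import re

# Python stand-in for the PHP example: three calls to foo, two of which
# have the literal 1 as their second argument.
SOURCE = '''
foo(a, 1, b)
foo(bar(x, y),
    1,
    c)
foo(a, 2, b)
'''

# Line-oriented regexp, applied one line at a time as grep would.
LINE_REGEXP = re.compile(r"foo\(.*, 1, .*\)")
regexp_hits = [number
               for number, line in enumerate(SOURCE.splitlines(), start=1)
               if LINE_REGEXP.search(line)]


def calls_with_second_arg_one(source):
    """Return the starting line of each foo(...) call whose second argument is 1."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "foo"
                and len(node.args) >= 2
                and isinstance(node.args[1], ast.Constant)
                and node.args[1].value == 1):
            hits.append(node.lineno)
    return sorted(hits)


ast_hits = calls_with_second_arg_one(SOURCE)
print("regexp hits:", regexp_hits)
print("ast hits:   ", ast_hits)
```

The regexp only sees the call that fits on one line, while the tree walk also finds the call split across three lines with a nested call as its first argument.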

Another solution would be to use a compiler frontend and write a visitor on the abstract syntax tree that recognizes the complex pattern. Unfortunately this is also tedious to write, as a compiler frontend is usually a large piece of software and the abstract syntax tree is a complex structure.

The idea of sgrep is to mix the convenience of grep with the correctness and precision of a compiler frontend.

Note that sgrep is for very precise matching. Most of the time you will be fine with grep, but on the few occasions where you need precise matching, sgrep can be a useful "refiner".

The synopsis is:

$ sgrep [-lang <lang>] [ -pvar <var> ] -e <pattern>  <files_or_dirs>

For instance to find certain patterns of use of strstr, do:

$ sgrep -e 'strstr(...) == false' *.php

There is support for a few programming languages. See Matrix to check for your favourite programming language.

You can use metavariables that match any expression:

$ sgrep -e 'foo(X)'   *.php

This will match code such as foo(1+1).

A metavariable must be a single uppercase letter, optionally followed by a number and an optional _ suffix, e.g. X, X1, Y_ILOVEPUPPIES. Restricting metavariables to a single letter means you can still match regular constants, on the assumption that nobody uses single-letter constants.
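One possible reading of this naming rule can be written as a regular expression. This is a sketch, not sgrep's actual implementation: the exact pattern (in particular whether the number may have several digits and which characters may follow the underscore) is an assumption, and `METAVARIABLE` and `is_metavariable` are hypothetical names.

```python
import re

# Assumed reading of the rule: one uppercase letter, an optional number,
# then an optional "_" suffix. sgrep's real rule may differ in the details.
METAVARIABLE = re.compile(r"[A-Z][0-9]*(_[A-Za-z0-9_]*)?")


def is_metavariable(name):
    """Return True if name looks like a metavariable under the assumed rule."""
    return METAVARIABLE.fullmatch(name) is not None


for name in ["X", "X1", "Y_ILOVEPUPPIES", "FOO", "x"]:
    print(name, is_metavariable(name))
```

Under this reading a multi-letter name such as FOO is treated as a regular constant rather than a metavariable.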

NEW: You can use the same metavariable multiple times, in which case the pattern will match only if all the occurrences of the metavariable have the same value. For instance:

$ sgrep -e 'X && X' *.php

will find all binary And operations where both operands are the same (which is usually buggy code).

NEW: You can also use the -pvar flag of sgrep to print not the matched code but the matched metavariables. For instance:

$ sgrep -pvar X -e 'X && X' *.php

will print the content of the matched metavariable X.
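The idea of a repeated metavariable, and of printing what it was bound to, can be illustrated with a small sketch. It assumes Python's standard `ast` module (Python 3.9 or later for `ast.unparse`) applied to Python source as a stand-in for PHP; `SOURCE` and `same_operand_ands` are hypothetical names, and this is not how sgrep itself is implemented.

```python
import ast

# Python stand-in for 'X && X': the second assignment repeats its operand.
SOURCE = '''
ok = is_ready(x) and is_valid(x)
bug = is_ready(x) and is_ready(x)
'''


def same_operand_ands(source):
    """Return (line, bound expression) for each 'X and X' with identical operands."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BoolOp)
                and isinstance(node.op, ast.And)
                and len(node.values) == 2):
            left, right = node.values
            # X is bound to the left operand; the right operand must then be
            # structurally identical (ast.dump ignores line and column numbers).
            if ast.dump(left) == ast.dump(right):
                hits.append((node.lineno, ast.unparse(left)))
    return hits


for line, bound in same_operand_ands(SOURCE):
    print(f"line {line}: X = {bound}")
```

Returning the text of the bound operand rather than the whole match mirrors what the -pvar flag is described as doing.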

If you want to match function calls whose argument is a PHP variable, use $X, $Y, or $ followed by any other metavariable name from the previous section, as in:

$ sgrep -e 'foo($X)' *.php

This will match foo($a) and foo($b), but not foo(1). Use foo(X) instead to match any expression.

You can also use metavariables for XHP attribute values as in:

$ sgrep -e  '<ui:section-header border=X></ui:section-header>' *.php

You can also use '...' at the end of an argument list to say you don't care about the remaining arguments, as in:

$ sgrep -e 'foo(1, ...)' *.php

this will match foo(1), foo(1,2), etc.

NEW This also works in array expressions, as in array(...).
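The behaviour of a trailing '...' can be sketched as a prefix match on the argument list. The code below is a toy model under stated assumptions: arguments are plain Python values rather than AST nodes, and `ELLIPSIS` and `args_match` are hypothetical names that do not come from sgrep.

```python
ELLIPSIS = "..."


def args_match(pattern, args):
    """Match an argument list against a pattern whose last item may be '...'."""
    if pattern and pattern[-1] == ELLIPSIS:
        # Everything before the ellipsis must match; the rest is ignored.
        prefix = list(pattern[:-1])
        return list(args[:len(prefix)]) == prefix
    # Without an ellipsis the argument lists must be identical.
    return list(args) == list(pattern)


print(args_match([1, ELLIPSIS], [1]))      # foo(1) against foo(1, ...)
print(args_match([1, ELLIPSIS], [1, 2]))   # foo(1,2) against foo(1, ...)
print(args_match([1, ELLIPSIS], [2]))      # foo(2) against foo(1, ...)
```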

You can use "..." in a pattern to say you want to match only constant strings as in:

$ sgrep -e 'foo("...")' *.php

This will match foo("foo") and foo(""), but not foo(1).

NEW You can also bind a metavariable to string content, as in foo("X"). Because PHP has no first-class functions or classes, it is quite common to pass function or class names around as strings. The metavariable X above is then bound to the content of the string without the quotes, so it can be used again to match a class name.

NEW You can use "=~/.../" in a pattern to say you want regexp matching (using the Perl regexp syntax), as in:

$ sgrep -e 'foo("=~/^cst/")' *.php

The principle of sgrep is to take a pattern and match it against a source file. Metavariables make a pattern more flexible, so that it can accommodate more source files. In the same way, even if the pattern contains extra spaces between tokens, or if an expression is split across multiple lines, it will still match source files that use a different indentation style, because sgrep works at the AST level.

Here are a few other tricks performed by sgrep, called isomorphisms, which allow a pattern to accommodate more source files:

NEW: People abuse assignments in PHP to mimic keyword argument passing as in Smalltalk. sgrep can handle this equivalence/isomorphism:

$ sgrep -e 'foo(true)' *.php

will match foo(true) as well as foo($x=true).
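This isomorphism amounts to looking through an assignment in argument position before comparing. The sketch below illustrates the idea with Python's standard `ast` module, using Python's assignment expression (`x := True`) as the closest analogue of PHP's `$x=true`; `strip_assignment` and `call_matches` are hypothetical names, keyword arguments are ignored for brevity, and none of this is taken from sgrep's implementation.

```python
import ast


def strip_assignment(arg):
    """Look through an assignment expression to the value being assigned."""
    return arg.value if isinstance(arg, ast.NamedExpr) else arg


def call_matches(pattern_src, code_src):
    """Return True if two calls are equal once argument assignments are ignored."""
    pattern = ast.parse(pattern_src, mode="eval").body
    code = ast.parse(code_src, mode="eval").body
    if not (isinstance(pattern, ast.Call) and isinstance(code, ast.Call)):
        return False
    if ast.dump(pattern.func) != ast.dump(code.func):
        return False
    if len(pattern.args) != len(code.args):
        return False
    # Compare positional arguments structurally, after stripping assignments.
    return all(ast.dump(p) == ast.dump(strip_assignment(c))
               for p, c in zip(pattern.args, code.args))


print(call_matches("foo(True)", "foo(True)"))
print(call_matches("foo(True)", "foo(x := True)"))
print(call_matches("foo(True)", "foo(x := False)"))
```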

In XHP, attributes can be given in any order, and that order should not matter when matching. When we write a pattern like:

<x:frag border="1" foo="2"></x:frag>

we want it to match even code like:

<x:frag foo="2" border="1"></x:frag>

Actually we also want it by default to match code like:

<x:frag foo="2" bar="3" border="1"></x:frag>

or code like:

<x:frag foo="2" border="1" foobar="3">this is a body</x:frag>

To accommodate those needs, the sgrep matching engine has a few hardcoded equivalences (isomorphisms) for XHP.
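For attributes, the equivalences described above behave like a subset test: every attribute in the pattern must appear in the code with the same value, while order and extra attributes are ignored. The sketch below models only that part, with attributes as plain dictionaries; it does not model the element body, and `attrs_match` is a hypothetical name rather than anything from sgrep.

```python
def attrs_match(pattern_attrs, code_attrs):
    """Return True if every pattern attribute appears in the code with the same value."""
    return all(code_attrs.get(name) == value
               for name, value in pattern_attrs.items())


# Pattern: <x:frag border="1" foo="2"></x:frag>
pattern = {"border": "1", "foo": "2"}

print(attrs_match(pattern, {"foo": "2", "border": "1"}))              # reordered
print(attrs_match(pattern, {"foo": "2", "bar": "3", "border": "1"}))  # extra attribute
print(attrs_match(pattern, {"foo": "2"}))                             # border missing
```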

You can write any expression as a pattern e.g.:

$ sgrep -e '1+X'  *.php
$ sgrep -e 'foo(bar(foobar(X, ...), ..., 2, "large", $X, $Y))' *.php

Here are examples that find bugs:

$ sgrep -e 'strstr(...) == false'
$ sgrep -e 'fbt($X)'
$ sgrep -e 'fbt(X . $Y)'
$ sgrep -e 'fbt($X . Y)'

See also https://github.com/facebook/pfff/blob/master/lang_php/matcher/unit_matcher_php.ml for some unit tests showing the capabilities of sgrep.

sgrep is significantly slower than grep because it works on a more complex structure than a stream of characters: the abstract syntax tree. Nevertheless, you can speed things up by combining it with git grep piped to xargs:

$ git grep -l foo |xargs sgrep -e 'foo(X, "large", ...)'

For Emacs support, look at pfff/editor/emacs/sgrep.el.

If the syntactical grep notation is not expressive enough for your search needs, you can try to express your match by using the internal pfff API that works on the ASTs of the source code.

One difficulty is finding which OCaml constructor corresponds to which PHP construct (this was one of the main motivations behind sgrep). To alleviate the problem, the pfff command line tool has a flag, -dump_php, that prints the internal representation of a program on stdout. This output can then be copied and pasted directly into a .ml file; it is a valid OCaml pattern. Here is an example:

$ pfff -dump_php demos/foo.php
[FuncDef(
 {f_tok: i_2; f_ref: None; f_name: Name(("foo", i_3)); f_params:
  (i_4,
   [Left(
...

See https://github.com/facebook/pfff/blob/master/demos/simple_code_search.ml for a full example.

See also https://github.com/facebook/pfff/blob/master/lang_php/analyze/foundation/include_require_php.ml for complex code patterns copied and pasted from the output of pfff -dump_php.

Use an expression metavariable, as in:

$ sgrep -e 'X->addPreparable(...)'

You cannot find method calls with sgrep -e 'addPreparable(...)' because this will be parsed as a function call, not a method call. To match methods you need to use the -> syntax, and so put something on the left of the arrow.

sgrep works at the abstract syntax tree level, where a function call is considered something different from an object instantiation.

Allow more complex patterns; allow matching over statements, not just expressions, or functions, or classes.