pad edited this page Apr 2, 2014 · 59 revisions

Syntactical patch


The goal of spatch is to allow programmers to express and perform refactorings using a syntax they are already familiar with: the patch syntax. For instance, to remove the second argument of every call to a function foo, one can write this syntactical patch:

-    ,Y

and then apply it on a codebase with:

$ spatch -f remove_second_arg_foo.spatch *.php

or, to search the current directory recursively:


$ find | grep .php | xargs spatch -f remove_second_arg_foo.spatch

This works even if the function call is split across multiple lines or has extra spaces between the comma and the second expression, because spatch works at the abstract syntax tree level, not at the token or string level like patch or sed.

One could also write it as:

// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)

(although this form has some caveats, explained in the Spacing issues section below)

Finally one can also use the "sed mode" of spatch as in:

$ spatch -e 's/foo(X,Y)/foo(X)/' *.php



Most programming languages do not have refactoring tools, and when they do, like Java with Eclipse, the programmer is often limited to a fixed set of refactorings such as "dropping an argument", "adding an argument", or "moving a function". Just like with Sgrep, we want to easily express complex code patterns, but also source-to-source transformations on those patterns, in a flexible way. Spatch is a domain-specific language for expressing such refactorings.


The synopsis is:

$ spatch (-f <spatch_file> | -e <s/before/after/>) [options] <files_or_dirs>

By default spatch generates a diff on stdout. Once you are confident that your syntactical patch is correct, you can use the --apply-patch option to actually modify the relevant files.

The further options are:

[--apply-patch] [--pretty-printer] [-lang <lang>]

There is support for a few programming languages. See Matrix to check for your favourite programming language.


Any Expression, any transformation, any context

One can write any PHP expression inside a syntactical patch and annotate any subpart of it with - and +.

For instance with this spatch:

- foo(1)
+ foo(2)

we want to replace every call to foo(1) with foo(2), but only when the call is nested inside a specific kind of call to f: the ones where the first argument of f is 2.

On this file:

f(2, foo(2));
f(1, foo(1));
f(2, foo(1));

spatch will generate:

$ ./spatch -f tests/php/spatch/foo.spatch tests/php/spatch/foo.php
--- tests/php/spatch/foo.php 2010-11-04 22:58:16.000000000 -0700
+++ /tmp/trans-31284-13ff71.php      2010-11-04 23:12:35.000000000 -0700
@@ -5,8 +5,8 @@

  f(1, foo(1));

- f(2, foo(1));
+ f(2, foo(2));

-  foo(1));
+  foo(2));


Just like for Sgrep, spatch supports metavariables so you can write syntactical patches like:

// remove_second_arg_foo_alt.spatch
- foo(X,Y)
+ foo(X)

You can use metavariables in place of full PHP expressions.

You can also use metavariables for XHP attribute values as in:

-   border=X

See Sgrep#Metavariables for more examples.


The principle of spatch is to take a pattern file, the spatch file, and match it against a source file. Metavariables make the pattern more flexible, so that it can accommodate more source files. In the same way, even if the spatch file contains extra spaces between tokens, or if an expression is split across multiple lines, it will still match source files that use a different indentation style, because spatch, like sgrep, works at the AST level.

See Sgrep#Isomorphisms for a few other tricks done by spatch, called isomorphisms, which allow the pattern to accommodate more source files.
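
To make the idea of AST-level matching with metavariables more concrete, here is a minimal sketch. It is not spatch's implementation: it uses Python's standard ast module on Python source rather than PHP, and the convention that a bare uppercase name is a metavariable is an assumption made for this illustration. The helper names (match, match_value, find_matches) are hypothetical.

```python
import ast

def is_metavar(node):
    # Assumption for this sketch: a bare uppercase name (X, Y) is a metavariable.
    return isinstance(node, ast.Name) and node.id.isupper()

def match_value(p, c, env):
    # Compare one field value: recurse on AST nodes, compare plain values directly.
    if isinstance(p, ast.AST):
        return isinstance(c, ast.AST) and match(p, c, env)
    return p == c

def match(pattern, code, env):
    """Return True if code matches pattern, recording metavariable bindings in env."""
    if is_metavar(pattern):
        bound = env.get(pattern.id)
        if bound is None:
            env[pattern.id] = code
            return True
        # A metavariable used twice must bind to structurally equal code.
        return ast.dump(bound) == ast.dump(code)
    if type(pattern) is not type(code):
        return False
    # Only structural fields are compared; line and column positions are ignored,
    # which is why spacing and line breaks in the source do not matter.
    for field in pattern._fields:
        p, c = getattr(pattern, field, None), getattr(code, field, None)
        if isinstance(p, list):
            if not isinstance(c, list) or len(p) != len(c):
                return False
            if not all(match_value(pi, ci, env) for pi, ci in zip(p, c)):
                return False
        elif not match_value(p, c, env):
            return False
    return True

def find_matches(pattern_src, code_src):
    """Return the metavariable bindings of every node matching the pattern."""
    pattern = ast.parse(pattern_src, mode="eval").body
    results = []
    for node in ast.walk(ast.parse(code_src)):
        env = {}
        if match(pattern, node, env):
            results.append({k: ast.unparse(v) for k, v in env.items()})
    return results

# Calls with unusual spacing, a line break, and nesting; foo(3) has one argument only.
source = "foo(1,\n    2)\nbar(foo( a ,b+1 ))\nfoo(3)\n"
bindings = find_matches("foo(X, Y)", source)
print(bindings)
```

The pattern foo(X, Y) matches the call split across two lines and the nested call with irregular spacing, but not foo(3), which has a different number of arguments. ast.unparse requires Python 3.9 or later.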

Spacing issues

spatch unfortunately sometimes generates diffs that break the indentation of the original code. For instance on this code:


the application of this spatch file:

- foo(X, Y)
+ bar(X, Y)

will generate this code:

bar(1, 2);

and not:


as one would expect.

By contrast, the following spatch file does the right thing:

- foo
+ bar
  (X, Y)

which may seem surprising because both spatch files look equivalent. To understand the difference, one must understand how spatch works internally: how it handles the minus code, the plus code and the metavariables.

Here is what spatch internally does given this spatch file:

- foo(X, Y)
+ bar(X, Y)
  • it extracts the sgrep "pattern" from the spatch file by just looking at the minus and contextual lines. A contextual line is a line without any sign (in our case there are no such lines). So here the extracted pattern is foo(X, Y)
  • it annotates the tokens in the pattern with a minus and/or plus sign, to indicate which transformation to perform on the token. Here: [-foo; -(; -X; -,; -Y; -)+"bar(X,Y)"].
  • it then matches the (annotated) pattern on the code, and transfers the annotation (the - and +), on the tokens in the actual code. So on the foo(1,2) example, the tokens in the PHP code will then be [-foo; -(; -1; -,; -2; -)+"bar(1,2)"].
  • it pretty prints the tokens and associated spaces/comments in the original file if the token had no annotation. Otherwise, with a - annotation it does not print the token and with a + annotation it prints the string attached to the +. So here most tokens will be removed and the last parenthesis will be replaced by the string "bar(1,2)".

Here is what spatch internally does with the spatch file below, which should explain why this spatch file is more "space friendly":

- foo
+ bar
  (X, Y)
  • it extracts the sgrep pattern, still 'foo(X, Y)'
  • it annotates the tokens in the pattern, which this time are [-foo+bar; (; X; ,; Y; )]. As you can see only one token has an annotation.
  • it matches the code and transfers the annotation. So on the foo(1,2) example, only the foo token will have an annotation.
  • it pretty prints the tokens and associated spaces/comments in the original file if the token had no annotation, which here is the case for most of the tokens involved, including the token for the comma, which will then have its subsequent newline and tab pretty printed.
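
The difference between the two spatch files can be sketched with a toy model of the last step. This is an illustration under stated assumptions, not spatch's actual code: tokens are modelled as (text, trailing whitespace) pairs, an annotation is a (minus, plus) pair keyed by token index, a removed token is assumed to lose its trailing whitespace, and the input is assumed to be a call to foo split across two lines. The names tokenize and apply_annotations are hypothetical.

```python
import re

def tokenize(src):
    # Each token keeps the whitespace that follows it, so an untouched token
    # can be printed back exactly as it appeared in the original file.
    return [(m.group(1), m.group(2))
            for m in re.finditer(r"(\w+|[^\w\s])(\s*)", src)]

def apply_annotations(tokens, annotations):
    """Print tokens according to their annotations.

    annotations maps a token index to (minus, plus): minus says the token is
    removed, plus is the string to print in its place (or None).
    """
    out = []
    for index, (text, trailing) in enumerate(tokens):
        minus, plus = annotations.get(index, (False, None))
        if not minus:
            out.append(text + trailing)  # no '-': original token and spacing
        if plus is not None:
            out.append(plus)             # '+': print the attached string
    return "".join(out)

source = "foo(1,\n    2);"
tokens = tokenize(source)  # foo ( 1 , 2 ) ;  at indices 0..6

# First spatch file: every matched token is removed and the whole plus line,
# with metavariables substituted, hangs on the closing parenthesis.
whole_call = {i: (True, None) for i in range(5)}
whole_call[5] = (True, "bar(1, 2)")

# Second spatch file: only the foo token carries an annotation.
name_only = {0: (True, "bar")}

print(repr(apply_annotations(tokens, whole_call)))
print(repr(apply_annotations(tokens, name_only)))
```

In this model the first annotation set collapses the call onto one line, because the comma and the newline that followed it are discarded together, while the second keeps the original line break and indentation, because the comma is printed back unchanged.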

So to minimize the number of spacing issues, try to maximize the number of contextual lines in the spatch file, that is, lines without a leading - or +.

Pretty printer

There is a new --pretty-printer option that makes spatch run a pretty printer on the modified code to reindent it in a nice way (it does not yet support the whole PHP language).

For instance on this code:

function test1() {
   return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');

and this spatch:

- foo(X)
+ foo(X, 1, 2, 3, 4)

then spatch --pretty-printer -f test.spatch test.php will generate:

--- test.php  2011-11-08 14:26:23.000000000 -0800
+++ /tmp/trans-8024-37a89b.php        2011-11-08 14:26:36.000000000 -0800
@@ -1,5 +1,11 @@

 function test1() {
-  return foo('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
+  return foo(
+    'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+    1,
+    2,
+    3,
+    4
+    );

Speeding up things

spatch is significantly slower than tools like sed because it works on a more complex structure than a stream of characters: the abstract syntax tree. Nevertheless, you can speed things up by using git grep, piped to xargs, to run spatch only on the files that mention the function:

$ git grep -l foo |xargs spatch -f remove_second_arg_foo.spatch
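
Outside a git checkout, the same prefiltering idea can be sketched in a few lines of Python. This is only an illustration: the helper names candidate_files and spatch_command are hypothetical, the .php suffix is an assumption, and the command is built but not executed. Only the -f option shown above is used.

```python
import os

def candidate_files(root, needle, suffix=".php"):
    """Return sorted paths under root that end with suffix and contain needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # A cheap textual check, so the slower AST-based tool only
            # sees files that can possibly match.
            with open(path, encoding="utf-8", errors="replace") as handle:
                if needle in handle.read():
                    hits.append(path)
    return sorted(hits)

def spatch_command(spatch_file, files):
    # Build the argument list only; running it is left to the caller,
    # for example with subprocess.run.
    return ["spatch", "-f", spatch_file, *files]
```

A caller would pass the result of candidate_files(".", "foo") to spatch_command together with the spatch file name.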


How do I rename a function with a variable number of arguments?

Here is the rename_foo_in_bar.spatch file:

- foo
+ bar

Manual low-level refactoring

If the syntactical patch notation is not expressive enough for your refactoring needs, you can still express the refactoring by using the internal pfff API that works on the ASTs of the source code.

Here is the content of pfff/demos/ which explains how to use the internal OCaml pfff API to perform a simple refactoring:

Future work

Just like for sgrep, future work includes generalizing spatch patterns to the full PHP language, not just PHP expressions, so that one can refactor class definitions, function headers, statements, etc.

Related work

spatch is a continuation of the work I've done on coccinelle, an advanced refactoring tool for C I co-designed with Julia Lawall.

Related tools: