
With, Unhygienic, and Call-By-Name Semantics in Rewrite

Rewrite is a gem that adds code-rewriting to the Ruby programming language. Recently, Caleb Clausen announced RubyMacros. We've had some discussion about the differences between the two projects on the Ruby Forum list and in emails: Caleb has graciously given me permission to repeat some of our discussion here.

Note well that I will say things like "In rewrite, you can do X" and also things like "In rewrite, Y is the case." These statements do not imply that you cannot do X in RubyMacros, nor do they imply that Y is not the case in RubyMacros. They are simply statements about rewrite.

what's with 'with' in rewrite?

Rewrite does not rewrite all of the code in your project out of the box. Instead, you supply the code you wish to be rewritten in a block passed to the with method, indicating which "rewrites" you wish to apply. In this example, we are taking a block of code and applying the andand rewrite to it:

with(andand) do
    ...
    first_name = Person.find_by_last_name('Braithwaite').andand.first_name
    ...
end

You have probably figured this out from the example, but rewrites are first-class Ruby objects. You can define them in local variables like this:

andand = Rewrite::ByExample::Unhygienic.
  from(:receiver, :message, [:parameters]) {
    receiver.andand.message(parameters)
  }.to {
    lambda { |andand_temp|
      andand_temp.message(parameters) if andand_temp
    }.call(receiver)
  }

Or you can define them inline, just as Ruby lambdas and blocks are equivalent to functions defined inline:

with(
    Rewrite::ByExample::Unhygienic.
      from(:receiver, :message, [:parameters]) {
        receiver.andand.message(parameters)
      }.to {
        lambda { |andand_temp|
          andand_temp.message(parameters) if andand_temp
        }.call(receiver)
      }
) do
    ...
    first_name = Person.find_by_last_name('Braithwaite').andand.first_name
    ...
end

Or you can use some of the built-in rewriters from rewrite's prelude:

include Rewrite::Prelude

with(please, try) do
    # ...
    @phone = Location.find(:first, ...elided... ).try(:phone)
    # ...
    @area_code = @phone.please.area_code
    # ...
end

with is a deliberate design choice. The idea is that you can explicitly state what is to be rewritten and how it is to be rewritten, using Ruby in a Ruby-like way. Of course, some people like magic, and if you look at Rails, the initializers and environment.rb file allow you to sprinkle magic throughout your project implicitly. My feeling when I designed rewrite was that if I started with explicit "with," it would be easy to build implicit rewriting into a project or framework later.

And yes, with can accept a list of rewrites and it can be nested.
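To make the andand rewriter concrete, here is a hand expansion in plain Ruby of what its to template would produce for a single call. This is only a sketch and does not use the rewrite gem: Person, PEOPLE, find_by_last_name, and first_name_for are hypothetical stand-ins invented for the illustration.

```ruby
# Person and PEOPLE are hypothetical stand-ins for a real model and its table.
Person = Struct.new(:first_name, :last_name)
PEOPLE = [Person.new('Reg', 'Braithwaite')]

def find_by_last_name(last_name)
  PEOPLE.find { |person| person.last_name == last_name }
end

# Hand expansion of find_by_last_name(last_name).andand.first_name:
# the receiver is evaluated once, bound to andand_temp, and only sent
# the message when it is neither nil nor false.
def first_name_for(last_name)
  lambda { |andand_temp|
    andand_temp.first_name if andand_temp
  }.call(find_by_last_name(last_name))
end

p first_name_for('Braithwaite') # a match, so first_name is sent
p first_name_for('Nobody')      # no match, so the result is nil instead of a NoMethodError
```

Because the receiver is passed as an argument to the lambda, it is evaluated exactly once, however many times the template mentions andand_temp.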

what is the difference between unhygienic and called_by_name?

Rewrite provides a facility for code rewriting, which is one level above unhygienic macros. A traditional unhygienic macro is a way of saying "when you see something that looks like a method call, replace it with the following code, performing substitutions here and here and here." Rewrite supports this as well as a number of other arbitrary rewriting rules.

Rewrite is very low-level. It uses these ridiculous s-expressions generated by a gem called ParseTree. That is not ParseTree's fault: ParseTree is giving us the very implementation-specific AST that MRI 1.8.x produces. Other Ruby implementations will have different trees. RubyMacros uses its own AST format.

Here's an example of rewrite working directly with s-expressions. It's an excerpt from try.rb:

def process_call(exp)
  # [:call, [:dvar, :foo], :try, [:array, [:lit, :bar]]]
  exp.shift
  # [[:dvar, :foo], :try, [:array, [:lit, :bar]]]
  receiver_sexp = exp.first
  if exp[1] == :try
    message_expression = exp[2][1]
    exp.clear
    s(:call, 
      s(:iter, 
        s(:fcall, :lambda), 
        s(:masgn,
          s(:array,
            s(:dasgn_curr, :receiver),
            s(:dasgn_curr, :message)
          )
        ), 
        s(:if, 
          s(:call, s(:dvar, :receiver), :respond_to?, s(:array, s(:dvar, :message))),
          s(:call, s(:dvar, :receiver), :send, s(:array, s(:dvar, :message))),
          s(:nil)
        )
      ), 
      :call, 
      s(:array, 
        process_inner_expr(receiver_sexp), # [:dvar, :foo]
        process_inner_expr(message_expression)
      )
    )
  else
    # pass through
    begin
      s(:call,
        *(exp.map { |inner| process_inner_expr inner })
      )
    ensure
      exp.clear
    end
  end
end

Lovely stuff, that.
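If you want to play with the idea without ParseTree, the same kind of tree surgery can be sketched over plain nested arrays. This is an illustration only, not rewrite's actual processor: rewrite_try is a hypothetical helper, the node shapes are copied from the comments in the excerpt, and the generated tree tests respond_to? against the message.

```ruby
# Recursively rewrites every [:call, receiver, :try, [:array, message]] node
# into a lambda invocation tree. All other nodes pass through unchanged.
def rewrite_try(sexp)
  return sexp unless sexp.is_a?(Array)

  # Rewrite the children first so nested .try calls are handled too.
  children = sexp.map { |child| rewrite_try(child) }
  tag, receiver, name, args = children
  is_try = tag == :call && name == :try && args.is_a?(Array) && args.first == :array
  return children unless is_try

  message = args[1]
  [:call,
   [:iter,
    [:fcall, :lambda],
    [:masgn, [:array, [:dasgn_curr, :receiver], [:dasgn_curr, :message]]],
    [:if,
     [:call, [:dvar, :receiver], :respond_to?, [:array, [:dvar, :message]]],
     [:call, [:dvar, :receiver], :send, [:array, [:dvar, :message]]],
     [:nil]]],
   :call,
   [:array, receiver, message]]
end

p rewrite_try([:call, [:dvar, :foo], :try, [:array, [:lit, :bar]]])
```

The original receiver and message end up as the arguments to the generated lambda, so each is evaluated once.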

Unhygienic and called_by_name are both a level above that kind of direct manipulation of the Abstract Syntax Tree. They both work by defining rewrites in Ruby code, and of course they do it in different ways. So, Rewrite provides a low-level, implementation-specific way to rewrite code. Unhygienic and called_by_name are built on top of rewrite and provide a higher level of abstraction.

unhygienic

Unhygienic defines something like a simple search-and-replace. You define a from and a to, specifying which pieces of the from are variables. For example, here is how you would define something like && using Unhygienic:

Unhygienic.from(:x, :y) {
    our_and(x, y)
}.to {
  if temp = x
      y
  else
      temp
  end
}

And we could use it like this:

with(
    Unhygienic.from(:x, :y) {
        our_and(x, y)
    }.to {
      if temp = x
          y
      else
          temp
      end
    }
) do
    # ...
    our_and(MyActiveRecordModel.find(:first, ...), something_something())
    #...
end

And you will get:

begin
  # ...
  if temp = MyActiveRecordModel.find(:first, ...)
    something_something()
  else
    temp
  end
  # ...
end

Unhygienic is very literal, so it will always call the temporary variable temp. In Ruby 1.8, this is a problem: that temp can collide with any other variable named temp in the code being rewritten. Also, it replaces x and y with any expression you put in, so if you use one of these variables twice, the expression is evaluated twice, and you can have interesting issues if it generates side effects or is computationally expensive.

That's why the example above uses temp. Had we written it as:

Unhygienic.from(:x, :y) {
    our_and(x, y)
}.to {
  if x
      y
  else
      x
  end
}

Then we could hit the database twice when we wrote something like our_and(MyActiveRecordModel.find(:first, ...), something_something()): once for the test and, if nothing was found, once more in the else branch. Note also the scoping issues in Ruby 1.8: temp will interfere with any other variable named temp. Now you know why it is called unhygienic.
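Both hazards can be reproduced by expanding the two templates by hand in plain Ruby. The sketch below does not use rewrite. FakeModel is a hypothetical stand-in for an expensive lookup that finds nothing, and the two expand_ methods are simply the to templates written out inline with :y standing in for the second argument.

```ruby
# Hypothetical stand-in for an expensive lookup that finds nothing.
# It counts its calls so we can see how often the expression is evaluated.
class FakeModel
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def find_first
    @lookups += 1
    nil
  end
end

# Hand expansion of the template that mentions x twice.
def expand_without_temp(model)
  if model.find_first
    :y
  else
    model.find_first
  end
end

# Hand expansion of the template that uses temp. The caller's own temp
# shares a scope with the expanded code, so the expansion overwrites it.
def expand_with_temp(model)
  temp = :callers_value
  result = if temp = model.find_first
    :y
  else
    temp
  end
  [result, temp]
end

twice = FakeModel.new
expand_without_temp(twice)
p twice.lookups # the lookup ran for the test and again in the else branch

once = FakeModel.new
p expand_with_temp(once) # the caller's temp no longer holds :callers_value
p once.lookups
```

The temp version trades the double evaluation for the name clash, which is the whole bargain of an unhygienic rewrite.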

called_by_name

called_by_name is a little more complicated than a simple unhygienic rewrite: called_by_name actually defines a lambda that you use. When you write:

with(
    called_by_name(:our_and) { |x,y|
        if temp = x
            y
        else
            temp
        end
    }
) do
    # ...
    our_and(MyActiveRecordModel.find(:first, ...), something_something())
    #...
end

You get:

lambda do |our_and|
    # ...
    our_and.call(
        lambda { MyActiveRecordModel.find(:first, ...) }, lambda { something_something() })
    #...
end.call(
    lambda do |x,y|
        if temp = x.call
            y.call
        else
            temp
        end
    end
)

What just happened is that our_and is defined as a lambda, with called_by_name doing some jiggery-pokery to turn the expressions you provide into thunks. This implements call-by-name semantics for Ruby lambdas. And as a bonus, you can get rid of the annoying .call method invocation.
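The practical payoff of the thunks is that an argument is only evaluated if the lambda asks for it. Here is a hand-written sketch of the expanded form in plain Ruby, without rewrite. OUR_AND and demo_call_by_name are hypothetical names, and the evaluated array is only there to record which thunks actually ran.

```ruby
# The lambda that called_by_name would define for our_and, written by hand.
OUR_AND = lambda do |x, y|
  if temp = x.call
    y.call
  else
    temp
  end
end

# Wraps both arguments in thunks, as the rewritten call site would,
# and records which of them were actually evaluated.
def demo_call_by_name(first_value)
  evaluated = []
  result = OUR_AND.call(
    lambda { evaluated << :x; first_value },
    lambda { evaluated << :y; :second }
  )
  [result, evaluated]
end

p demo_call_by_name(nil)    # the second thunk is never called
p demo_call_by_name(:first) # both thunks are called, in order
```

With ordinary call-by-value arguments, both expressions would have been evaluated before the lambda ever ran.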

There are some important implications of this approach. First, with unhygienic, our_and disappears. There is no our_and function or method: invocations are replaced by whatever to expression you provide. called_by_name, by contrast, actually defines a lambda for our_and and defines it in scope for our block of code.

Note: Caleb asked about the fetish for lambdas. In Ruby 1.8 MRI, it makes no difference. In other implementations, these constructs will be hygienic. However, these implementations do not exist yet. So, either I don't understand YAGNI, or perhaps I have spent too much time with languages like JavaScript that actually get this right, or I am from the future and I know that Ruby will get this right.

Second, even though temp is still around and still could shadow some variable where it is defined, it doesn't shadow any definition inside your block. So you could define rewriters with called_by_name at the top level or inside of a method somewhere and be assured that you are making 100% hygienic code.
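A small plain-Ruby sketch of that second point, assuming the lambda is defined somewhere that has no temp of its own (here, inside a hypothetical make_our_and method). The caller's temp survives the call untouched, because the lambda's temp belongs to the lambda's body rather than to the calling code.

```ruby
# Defined inside a method, so the temp in the lambda body is local to it.
def make_our_and
  lambda do |x, y|
    if temp = x.call
      y.call
    else
      temp
    end
  end
end

def demo_hygiene
  our_and = make_our_and
  temp = :callers_value # the calling code's own temp
  result = our_and.call(lambda { false }, lambda { :unused })
  [result, temp]
end

p demo_hygiene # the caller's temp is still :callers_value
```

Compare this with an inline expansion, where the generated temp and the caller's temp are the same variable.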

Caleb pointed out that called_by_name is less powerful than full-blown macros. True and false. It is less powerful than macros, plural, in that you can use macros to define a called_by_name macro that you could then use on your code. And there are things you can do with a macro that you obviously cannot do with called_by_name, because called_by_name does a very specific transformation (defining a lambda and transforming parameters into thunks).

However, called_by_name cannot be replicated using a single macro, because it needs to do one transformation on the entire block and then another on each invocation. If I were implementing it with RubyMacros, I would write a macro-writing macro, in the tradition of Paul Graham's On Lisp.

Although rewrite can do a lot more than called_by_name, I have found that most of what I want to accomplish with macros works surprisingly well with call-by-name semantics. I sincerely think that if call-by-name semantics were an option throughout Ruby, including with method calls and block invocations, the language would become ridiculously powerful.

As an example, things like andand become trivial if you have called-by-name semantics for method calls. YMMV.


My recent work:

JavaScript Allongé · CoffeeScript Ristretto · Kestrels, Quirky Birds, and Hopeless Egocentricity


(Spot a bug or a spelling mistake? This is a Github repo, fork it and send me a pull request!)

Reg Braithwaite | @raganwald