Workarounds before ruby-core officially supports Proc#to_source (& friends)

Sourcify

ParseTree is great: it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to ruby code & S-expression, BUT ParseTree doesn't work for 1.9.* & JRuby.

RubyParser is great, and it works on any ruby (of course, it is not yet 100% compatible with 1.8.7 & 1.9.* syntax), BUT it works only with static code.

I truly enjoy using the above tools, but in my other projects, the absence of ParseTree on the different rubies forces me to hand-bake my own solution each time to extract the proc code I need at runtime. This is frustrating: the solution is never perfect, and I'm reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).

Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper round it; otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. Further processing with RubyParser & Ruby2Ruby ensures 100% compatibility with ParseTree (yup, there is no denying that I really like ParseTree).
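
The ParseTree-or-scanner fallback described above can be sketched as a guarded require. This is only an illustration and not sourcify's actual code: the helper name pick_backend and the symbols it returns are hypothetical, and 'parse_tree' is assumed to be the require name of the ParseTree gem.

```ruby
# Hypothetical helper: choose a backend depending on whether a library loads.
def pick_backend(feature = 'parse_tree')
  require feature
  :parse_tree        # the library loaded, so act as a thin wrapper round it
rescue LoadError
  :static_scanner    # not installed on this ruby, fall back to scanning files
end

puts pick_backend
```

On a ruby without ParseTree installed, the require raises LoadError and the sketch reports the static scanner.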

Installing It

The religiously standard way:

$ gem install ParseTree sourcify

Or on 1.9.* or JRuby:

$ gem install ruby_parser file-tail sourcify

Using It

Sourcify adds 3 methods to Proc:

1. Proc#to_source

Returns the code representation of the proc:

require 'sourcify'

lambda { x + y }.to_source
# >> "proc { (x + y) }"

proc { x + y }.to_source
# >> "proc { (x + y) }"

Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to only extract the body of the proc by passing in {:strip_enclosure => true}:

lambda { x + y }.to_source(:strip_enclosure => true)
# >> "(x + y)"

lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
# >> "(i + 2)"

2. Proc#to_sexp

Returns the S-expression of the proc:

require 'sourcify'

x = 1
lambda { x + y }.to_sexp
# >> s(:iter,
# >>  s(:call, nil, :proc, s(:arglist)),
# >>   nil,
# >>    s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

To extract only the body of the proc:

lambda { x + y }.to_sexp(:strip_enclosure => true)
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist))))

3. Proc#source_location

By default, this is only available on 1.9.*; it is added (as a bonus) to provide consistency under 1.8.*:

# /tmp/test.rb
require 'sourcify'

lambda { x + y }.source_location
# >> ["/tmp/test.rb", 5]
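
To see why the *[file, lineno]* pair matters to a static scanner, here is a minimal sketch that uses only the built-in Proc#source_location (no sourcify involved) to look up the line a proc was written on. The temp file and the global $demo_proc exist only for this demonstration.

```ruby
require 'tempfile'

# Write a two-line script to disk so the proc has a real file behind it.
script = Tempfile.new(['source_location_demo', '.rb'])
script.write("# a comment on line 1\n$demo_proc = lambda { |a| a + 1 }\n")
script.close

load script.path  # defines $demo_proc (a global, purely for this demo)

# Take only the first two elements: the file and the line number.
file, line = $demo_proc.source_location
source_line = File.readlines(file)[line - 1]

puts line         # the lambda sits on line 2 of the temp file
puts source_line  # the raw text a scanner would start from
```

A scanner still has to find where the proc starts and ends on that line, which is the hard part that sourcify takes care of.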

Performance

Performance is embarrassing for now. Benchmarking the processing of 500 procs (in the ObjectSpace of an average rails project) yields the following:

ruby                               user       system    total      real
ruby-1.8.7-p299  (w ParseTree)     10.270000  0.010000  10.280000  ( 10.311430)
ruby-1.8.7-p299  (static scanner)  14.120000  0.080000  14.200000  ( 14.283817)
ruby-1.9.1-p376  (static scanner)  17.380000  0.050000  17.430000  ( 17.405966)
jruby-1.5.2      (static scanner)  21.318000  0.000000  21.318000  ( 21.318000)

Since I'm still pretty new to ragel, the code scanner will probably become better & faster as my knowledge & skills with ragel improve. Also, instead of generating a pure ruby scanner, we can generate native code (eg. C or java, or whatever). As I'm a C & java noob, this will probably take some time to realize.
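
Timings in the format shown above can be collected with the standard Benchmark library. The sketch below is not the script behind those numbers: the 500 procs are throwaway stand-ins, and the workload is a placeholder (Proc#source_location) where a real run would call to_source with sourcify loaded.

```ruby
require 'benchmark'

# 500 throwaway procs standing in for the ones found in a real project.
procs = Array.new(500) { |i| lambda { i + 1 } }

# Placeholder workload: with sourcify loaded, pr.to_source would go here.
timing = Benchmark.measure do
  procs.each { |pr| pr.source_location }
end

puts Benchmark::CAPTION  # the "user system total real" header line
puts timing
```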

Gotchas

Nothing beats ParseTree's ability to access the runtime AST; it is a very powerful feature. The scanner-based (static) implementation suffers from the following gotchas:

1. The source code is everything

Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected *[file, lineno]*. The following will not work:

def test
  eval('lambda { x + y }')
end

test.source_location
# >> ["(eval)", 1]

test.to_source
# >> Sourcify::CannotParseEvalCodeError

The same applies to *Blah#to_proc* & *&:blah*:

klass = Class.new do
  def aa(&block); block ; end
  def bb; 1+2; end
end

klass.new.method(:bb).to_proc.to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError

klass.new.aa(&:bb).to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError
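
The eval case can be reproduced with plain Ruby, without sourcify. The sketch below only shows that the location reported for an eval'ed proc does not name a file on disk; the exact label differs between ruby versions, so it is printed rather than assumed.

```ruby
# A proc built from a string has no file of its own.
evaled = eval('lambda { 1 + 2 }')

file, _line = evaled.source_location
puts file.inspect           # an "(eval..." label rather than a path
puts File.file?(file.to_s)  # => false, so there is nothing on disk to scan
```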

2. Multiple matching procs per line error

Sometimes we may have multiple procs on a line. Sourcify can handle this as long as the subject proc's arity differs from that of the others:

# Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
b2.to_source
# >> proc { (1 + 2) }

# Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 }
b2.to_source
# >> raises Sourcify::MultipleMatchingProcsPerLineError

As observed, the above does not work when multiple procs on the same line have the same arity. Furthermore, a bug under 1.8.* affects the accuracy of this approach.
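
The ambiguity is visible with the built-in Proc methods alone. In this sketch (plain Ruby, no sourcify), two lambdas on one line report the same file and line number, so arity is the only remaining clue for telling them apart:

```ruby
# Two lambdas on one line: same file and line number, different arity.
b1 = lambda { |a| a + 1 }; b2 = lambda { 1 + 2 }

# Compare only the file and line number parts of the location.
same_line = b1.source_location.first(2) == b2.source_location.first(2)
puts same_line  # => true
puts b1.arity   # => 1
puts b2.arity   # => 0

# Same arity on the same line: nothing is left to tell them apart.
c1 = lambda { 1 + 2 }; c2 = lambda { 2 + 3 }
puts c1.arity == c2.arity  # => true
```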

To better narrow down the scanning, try:

  • passing in the {:attached_to => …} option

    x = lambda { proc { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:attached_to => :lambda)
    # >> "proc { proc { :blah } }"

  • passing in the {:ignore_nested => …} option

    x = lambda { lambda { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:ignore_nested => true)
    # >> "proc { lambda { :blah } }"

  • attaching a body matcher proc

    x, y = lambda { def secret; 1; end }, lambda { :blah }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source{|body| body =~ /^(.*\W|)def\W/ }
    # >> 'proc { def secret; 1; end }'

Please refer to the rdoc for more details.
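
To get a feel for what the body matcher in the last example accepts, the same regexp can be tried against plain strings. This sketch does not involve sourcify, and the candidate bodies are made up for illustration:

```ruby
# The matcher from the example above, applied to plain strings.
matcher = lambda { |body| body =~ /^(.*\W|)def\W/ }

bodies = ['def secret; 1; end', ':blah', 'x = 1; def y; 2; end', 'undefined_thing']
matching = bodies.select { |body| matcher.call(body) }

p matching
# => ["def secret; 1; end", "x = 1; def y; 2; end"]
```

The \W on both sides keeps words that merely contain "def" (such as undefined_thing) from matching.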

3. Occasional Racc::ParseError

Under the hood, sourcify relies on RubyParser to yield the S-expression, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError whenever your code is not compatible with 1.8.6.

Is it really working ??

Sourcify's spec suite currently passes on the following rubies:

  • MRI-1.8.6 (ParseTree mode ONLY)

  • MRI-1.8.7, REE-1.8.7 (ParseTree & static scanner modes)

  • JRuby-1.5.*, MRI-1.9.1, MRI-1.9.2 (static scanner ONLY)

Besides its own spec suite, sourcify has also been tested to handle:

ObjectSpace.each_object(Proc) {|o| puts o.to_source }

For projects:

(TODO: the more the merrier)
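
Before running the ObjectSpace one-liner above against a project, it can help to see how many live procs have a real file behind them at all, since only those are candidates for the static scanner. The sketch below uses plain Ruby and assumes ObjectSpace.each_object is enabled (it may be off by default on JRuby); the tally keys are names chosen for this illustration.

```ruby
sample = lambda { 1 + 2 }  # make sure at least one proc is alive

tally = Hash.new(0)
ObjectSpace.each_object(Proc) do |pr|
  file, _line = pr.source_location
  key = if file.nil?
          :no_location   # e.g. procs created by native code
        elsif File.file?(file)
          :file_backed   # a scanner has something to read
        else
          :not_a_file    # e.g. procs created via eval
        end
  tally[key] += 1
end

p tally
```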

Projects using it

Projects using sourcify include:

Additional Resources

Sourcify is heavily inspired by many ideas gathered from the ruby community:

The sad fact that Proc#to_source wouldn't be available in the near future:

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don't break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright © 2010 NgTzeYang. See LICENSE for details.