Skip to content

GettEngineering/sourcify

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sourcify

ParseTree is great, it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to ruby code & S-expression, BUT ParseTree doesn’t work for 1.9.* & JRuby.

RubyParser is great, and it works for any rubies (of course, not 100% compatible for 1.8.7 & 1.9.* syntax yet), BUT it works only with static code.

I truely enjoy using the above tools, but with my other projects, the absence of ParseTree on the different rubies is forcing me to hand-baked my own solution each time to extract the proc code i need at runtime. This is frustrating, the solution for each of them is never perfect, and i’m reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).

Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper round it, otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. Further processing with RubyParser & Ruby2Ruby to ensure 100% with ParseTree (yup, there is no denying that i really like ParseTree).

Installing It

The religiously standard way:

$ gem install ParseTree sourcify

Or on 1.9.* or JRuby:

$ gem install ruby_parser file-tail sourcify

Sourcify adds 4 methods to Proc

1. Proc#to_source

Returns the code representation of the proc:

require 'sourcify'

lambda { x + y }.to_source
# >> "proc { (x + y) }"

proc { x + y }.to_source
# >> "proc { (x + y) }"

Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to only extract the body of the proc by passing in {:strip_enclosure => true}:

lambda { x + y }.to_source(:strip_enclosure => true)
# >> "(x + y)"

lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
# >> "(i + 2)"

2. Proc#to_sexp

Returns the S-expression of the proc:

require 'sourcify'

x = 1
lambda { x + y }.to_sexp
# >> s(:iter,
# >>  s(:call, nil, :proc, s(:arglist)),
# >>   nil,
# >>    s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

To extract only the body of the proc:

lambda { x + y }.to_sexp(:strip_enclosure => true)
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

3. Proc#to_raw_source

Unlike Proc#to_source, which returns code that retains only functional aspects, fetching of raw source returns the raw code enclosed within the proc, including fluff like comments:

lambda do |i|
  i+1 # (blah)
end.to_raw_source
# >> "proc do |i|
# >>   i+1 # (blah)
# >> end"

NOTE: This is extracting of raw code, it relies on static code scanning (even when running in ParseTree mode), the gotchas for static code scanning always apply.

4. Proc#source_location

By default, this is only available on 1.9.*, it is added (as a bonus) to provide consistency under 1.8.*:

# /tmp/test.rb
require 'sourcify'

lambda { x + y }.source_location
# >> ["/tmp/test.rb", 5]

Sourcify adds 3 methods to Method

IMPORTANT: These only work for MRI-1.9.2, as currently, only it supports (1) discovering of the original source location with Method#source_location, and (2) reliably determinig a method’s parameters with Method#parameters. Attempting to use these methods on other rubies will raise Sourcify::PlatformNotSupportedError.

NOTE: The following works for methods defined using both def .. end & Module#define_method. However, when a method is defined using the later approach, sourcify uses Proc#to_source to handle the processing, thus, the usual gotchas related to proc source extraction apply.

1. Method#to_source

Returns the code representation of the method:

require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_source
# >> "def sum(x, y)
# >>   (x + y)
# >> end"

Just like the Proc#to_source equivalent, u can set :strip_enclosure => true to extract only the body within.

2. Method#to_sexp

Returns the S-expression of the method:

require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_sexp
>> s(:defn,
>>  :sum,
>>  s(:args, :x, :y),
>>  s(:scope, s(:block, s(:call, s(:lvar, :x), :+, s(:arglist, s(:lvar, :y))))))

Just like the Proc#to_sexp equivalent, u can set :strip_enclosure => true to extract only the body within.

3. Method#to_raw_source

Unlike Method#to_source, which returns code that retains only functional aspects, fetching of raw source returns the method’s raw code, including fluff like comments:

require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_raw_source
# >> "def sum(x, y)
# >>   x + y # (blah)
# >> end"

Just like the Proc#to_raw_source equivalent, u can set :strip_enclosure => true to extract only the body within.

Performance

Performance is embarassing for now, benchmarking results for processing 500 procs (in the ObjectSpace of an average rails project) yiels the following:

ruby                               user       system    total      real
ruby-1.8.7-p299  (w ParseTree)     10.270000  0.010000  10.280000  ( 10.311430)
ruby-1.8.7-p299  (static scanner)  14.120000  0.080000  14.200000  ( 14.283817)
ruby-1.9.1-p376  (static scanner)  17.380000  0.050000  17.430000  ( 17.405966)
jruby-1.5.2      (static scanner)  21.318000  0.000000  21.318000  ( 21.318000)

Since i’m still pretty new to ragel, the code scanner will probably become better & faster as my knowlegde & skills with ragel improve. Also, instead of generating a pure ruby scanner, we can generate native code (eg. C or java, or whatever) instead. As i’m a C & java noob, this will probably take some time to realize.

Gotchas

Nothing beats ParseTree’s ability to access the runtime AST, it is a very powerful feature. The scanner-based (static) implementation suffer the following gotchas:

1. The source code is everything

Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected *[file, lineno]*, the following will not work:

def test
  eval('lambda { x + y }')
end

test.source_location
# >> ["(eval)", 1]

test.to_source
# >> Sourcify::CannotParseEvalCodeError

The same applies to *Blah#to_proc* & *&:blah*:

klass = Class.new do
  def aa(&block); block ; end
  def bb; 1+2; end
end

klass.new.method(:bb).to_proc.to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError

klass.new.aa(&:bb).to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError

2. Multiple matching procs per line error

Sometimes, we may have multiple procs on a line, Sourcify can handle this as long as the subject proc has arity that is unique from others:

# Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
b2.to_source
# >> proc { (1 + 2) }

# Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 }
b2.to_source
# >> raises Sourcify::MultipleMatchingProcsPerLineError

As observed, the above does not work when there are multiple procs having the same arity, on the same line. Furthermore, this bug under 1.8.* affects the accuracy of this approach.

To better narrow down the scanning, try:

  • passing in the {:attached_to => …} option

    x = lambda { proc { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:attached_to => :lambda)
    # >> "proc { proc { :blah } }"
    
  • passing in the {:ignore_nested => …} option

    x = lambda { lambda { :blah } }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source(:ignore_nested => true)
    # >> "proc { lambda { :blah } }"
    
  • attaching a body matcher proc

    x, y = lambda { def secret; 1; end }, lambda { :blah }
    
    x.to_source
    # >> Sourcify::MultipleMatchingProcsPerLineError
    
    x.to_source{|body| body =~ /^(.*\W|)def\W/ }
    # >> 'proc { def secret; 1; end }'
    

Pls refer to the rdoc for more details.

3. Occasional Racc::ParseError

Under the hood, sourcify relies on RubyParser to yield s-expression, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError when you have any code that is not compatible with 1.8.6.

4. Lambda operator doesn’t work

When a lambda has been created using the lambda operator “->”, sourcify can’t handle it:

x = ->{ :blah }
x.to_source
# >> Sourcify::NoMatchingProcError

Is it really working ??

Sourcify spec suite currently passes in the following rubies:

  • MRI-1.8.*, REE-1.8.7 (both ParseTree & static scanner modes)

  • JRuby-1.6.*, MRI-1.9.* (static scanner ONLY)

Besides its own spec suite, sourcify has also been tested to handle:

ObjectSpace.each_object(Proc) {|o| puts o.to_source }

For projects:

(TODO: the more the merrier)

Projects using it

Projects using sourcify include:

Additional Resources

Sourcify is heavily inspired by many ideas gathered from the ruby community:

The sad fact that Proc#to_source wouldn’t be available in the near future:

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.

Copyright © 2010 NgTzeYang. See LICENSE for details.

About

Workarounds before ruby-core officially supports Proc#to_source (& friends)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published