Segmentation Fault during rails assets:precompile #179

Closed
edlebert opened this Issue Mar 13, 2013 · 13 comments


This past week I've been noticing random segmentation faults during the assets:precompile stage of my Heroku deploys. I don't yet know the underlying cause, but I know that if I freeze Tilt at 1.3.3, the problem goes away. I couldn't find anyone else reporting the same problem, and then I realized that 1.3.4 was cut just a couple of weeks ago.

I'm using ruby 2.0.0p0, and a few asset gems:

group :assets do
  gem 'jquery-rails'
  gem 'sass-rails'
  gem 'font-awesome-sass-rails' 
  gem 'bootstrap-sass', '~> 2.3'
  gem 'bourbon'
  gem 'neat'
  gem 'uglifier'
  gem 'coffee-script'
end

I get random segmentation faults during rake assets:precompile that look something like this:

/Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/uglifier-1.3.0/lib/uglifier.rb:65: [BUG] Segmentation fault
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin12.2.0]

c:0050 p:0055 s:0230 e:000225 METHOD /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/uglifier-1.3.0/lib/uglifier.rb:65
c:0049 p:0011 s:0221 e:000220 METHOD /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/actionpack-3.2.12/lib/sprockets/compressors.rb:74
c:0048 p:0010 s:0217 e:000216 BLOCK  /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/sprockets-2.2.2/lib/sprockets/processing.rb:265 [FINISH]
c:0047 p:---- s:0213 e:000212 CFUNC  :call
c:0046 p:0016 s:0208 e:000207 METHOD /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/sprockets-2.2.2/lib/sprockets/processor.rb:29
c:0045 p:0034 s:0203 e:000202 METHOD /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/tilt-1.3.4/lib/tilt/template.rb:77
c:0044 p:0025 s:0197 E:000c50 BLOCK  /Users/edlebert/.rbenv/versions/2.0.0-p0/lib/ruby/gems/2.0.0/gems/sprockets-2.2.2/lib/sprockets/context.rb:193 [FINISH]

I know this isn't a very helpful issue report, but I'm hoping more people will run into this problem and find this issue.
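
(For reference, the workaround mentioned above, freezing Tilt at the last known-good release, is just a one-line Gemfile pin; this snippet is illustrative and not part of the original report.)

# Gemfile: pin Tilt to 1.3.3 until the segfault is tracked down
gem 'tilt', '1.3.3'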

Collaborator

judofyr commented Mar 14, 2013

This is very interesting.

We got a segfault on Travis on 2.0.0p0 here: https://travis-ci.org/rtomayko/tilt/jobs/5479138

But a few minutes later (after I pushed a CHANGELOG change) it was green: https://travis-ci.org/rtomayko/tilt/jobs/5479184

uglifier.rb:65 doesn't make me any wiser either.

Collaborator

judofyr commented Mar 14, 2013

@edlebert Could you dump the rest of the report (with the C level backtrace)?

Collaborator

judofyr commented Mar 14, 2013

The reason this could have started in Tilt 1.3.4 is that we now always compile templates to a method. In 1.3.3 we did a plain instance_eval first and only invoked the compiled method on subsequent renderings, while in 1.3.4 we never instance_eval. I'd guess that even in 1.3.3 you might be able to hit the segfault (if, e.g., it's possible to trigger it while your Sinatra app is running).
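
Roughly, the two strategies look like this (a simplified illustration of the idea, not Tilt's actual code; the class and method names here are made up):

class TemplateSketch
  def initialize(source)
    @source = source # Ruby source that evaluates to the rendered string
  end

  # 1.3.3-style: evaluate the template source directly in the scope object.
  def render_via_instance_eval(scope = Object.new)
    scope.instance_eval(@source)
  end

  # 1.3.4-style: always compile the source into a method on Object, unbind it,
  # then bind it to the scope and call it.
  def render_via_compiled_method(scope = Object.new)
    name = "__sketch_#{object_id}"
    Object.class_eval("def #{name}; #{@source}; end")
    method = Object.instance_method(name)
    Object.class_eval { remove_method(name) }
    method.bind(scope).call
  end
end

puts TemplateSketch.new('"Hello world"').render_via_instance_eval
puts TemplateSketch.new('"Hello world"').render_via_compiled_method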

@judofyr judofyr closed this Mar 14, 2013

@judofyr judofyr reopened this Mar 14, 2013

Collaborator

judofyr commented Mar 15, 2013

I've opened an issue over at bugs.ruby-lang.org related to the other segfault: http://bugs.ruby-lang.org/issues/8100

@ujifgc referenced this issue in padrino/padrino-framework on Mar 18, 2013: [Helpers] [BUG] Segmentation fault #1131 (Closed)

I did a bunch of trial-and-error work on this and found that it only happened if I did some file I/O during JavaScript precompilation on Rails. Specifically, I was exporting a bunch of ActiveSupport::TimeZone data to JavaScript. I found that even with a js.erb file containing only the following line (note that no JavaScript code is actually generated here), I would get the segmentation fault during rake assets:precompile:

<% ActiveSupport::TimeZone.all %>

As a workaround, I had a hunch that if I pre-loaded the timezone data into memory before the asset precompile, I wouldn't get a segfault. So I created a Rails initializer that simply pre-reads all the timezone data:

ActiveSupport::TimeZone.all

And presto, no more segmentation faults :)
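
(For reference, that initializer can be as small as the following; the file name here is just an example, not from the original comment.)

# config/initializers/preload_time_zones.rb (example file name)
# Load all time zone data up front so no file I/O happens later while
# templates are being compiled and rendered by assets:precompile.
ActiveSupport::TimeZone.all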

Collaborator

rkh commented Mar 19, 2013

Might be related to the segfault we sometimes see in sinatra-contrib's content_for implementation on 2.0. @zzak is investigating.

Wardrop commented Mar 22, 2013

Below is the smallest and most isolated script I've so far been able to produce that suffers from the segfault. I've found that as long as Tilt is rendering anything, it can segfault. I've observed that the more you do during rendering (e.g. loading files), the higher the odds of a segfault.

require 'tilt'
run(proc do
  body = Tilt['str'].new{'Hello world #{test = ["1", "2"] + ["3"] }'}.render
  [200, {}, [body]]
end)

Segfaults are very rare under this test, though, due to the simplicity of the script; very little happens while rendering. Out of 1,162,202 HTTP requests at ~500 requests per second, I got 37 segfaults. I ran the same test for 1,003,356 requests without Tilt, using plain ERB, and didn't get any segfaults, only a few timeouts.

So it does indeed seem to be related to something Tilt is doing. It also appears to be time-bound: if I throw 40 threads at the problem, even though I can push through more requests per unit of time, I seem to get the same segfault rate. Weird. I'm not sure how to narrow it down further. I could spend hours stripping down Tilt, but it may not reveal anything.

Collaborator

judofyr commented Mar 22, 2013

How are you running the test? Can you show us the config.ru? What server do you use?


Collaborator

judofyr commented Mar 22, 2013

I've been able to reduce it to this code which segfaults in 2.0.0-p0. Testing trunk now.

class Fail
  def render(scope = Object.new)
    compiled_method.bind(scope).call
  end

  def compiled_method
    @compiled_method ||= compile_template_method
  end

  def source
    "Hello world".inspect
  end

  def compile_template_method
    method_name = "__tilt_#{Thread.current.object_id.abs}"
    Object.class_eval("def #{method_name}; #{source} end")
    unbind_compiled_method(method_name)
  end

  def unbind_compiled_method(method_name)
    method = Object.instance_method(method_name)
    Object.class_eval { remove_method(method_name) }
    method
  end
end

loop do
  Fail.new.render
end

Wardrop commented Mar 22, 2013

@judofyr that code block is the config.ru :)

I did intend to post the test script though, sorry about that. Here it is:

require 'peach'
require 'net/http'

results = {}
10_000_000.times.peach(12) do
  key = begin
    Net::HTTP.get_response(URI('http://localhost:3000/')).code
  rescue => e
    e.class
  end
  results[key] ||= 0
  results[key] += 1
  STDOUT.write "\r#{results.inspect}"
end
puts

The peach gem provides the peach method, which parallelises the requests; up to 12 parallel threads are allowed in this case.
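
If you don't have peach, a roughly equivalent driver using only the standard library might look like this (a sketch, assuming the config.ru above is being served on http://localhost:3000/; it is not the script that produced the numbers above):

require 'thread'
require 'net/http'

results = Hash.new(0)
mutex   = Mutex.new
queue   = Queue.new
100_000.times { queue << nil }

threads = Array.new(12) do
  Thread.new do
    loop do
      begin
        queue.pop(true) # non-blocking pop; raises ThreadError when the queue is empty
      rescue ThreadError
        break
      end
      key = begin
        Net::HTTP.get_response(URI('http://localhost:3000/')).code
      rescue => e
        e.class
      end
      mutex.synchronize { results[key] += 1 }
    end
  end
end

threads.each(&:join)
puts results.inspect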

Wardrop commented Mar 22, 2013

@judofyr By the way, nice work on reducing it down. That's awesome. In hindsight, I have no idea why I didn't try looping Tilt within a single process. I guess I assumed the whole stack played a part, and went with the brute-force over-HTTP approach.

Collaborator

judofyr commented Mar 24, 2013

@judofyr judofyr closed this Mar 24, 2013

zzak commented Mar 24, 2013

@judofyr Yup that fixed it for me too!!
