New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in external_links check #1294

Closed
iay opened this Issue Jan 15, 2018 · 10 comments

Comments

Projects
None yet
2 participants
@iay
Contributor

iay commented Jan 15, 2018

nanoc check external_links giving No live threads left. Deadlock? error.

Steps to reproduce

  1. Compile my site, which I've just added a lot of old content to.
  2. nanoc check external_links

Expected behavior

I expected to see some warnings about broken external links.

Actual behavior

Nanoc dies with a No live threads left. Deadlock? error.

Details

I have just added a lot of older content to the site. It's possible that some of the old content has links to unresponsive, very slow or disappeared sites.

Crash log

kite:site iay$ bundle exec nanoc check external_links
Loading site… done
  Running check external_links…   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:37:in `push': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00007f9778c02a60 main thread:0x00007f9778c02a60
* #<Thread:0x00007f977a07ef68 sleep_forever>
   rb_thread_t:0x00007f9778c02a60 native:0x00007fffa783b340 int:0
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:37:in `push'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:37:in `block in each'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:37:in `each'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:37:in `each'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:50:in `block in map'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:48:in `tap'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/extra/parallel_collection.rb:48:in `map'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/checking/checks/external_links.rb:41:in `select_invalid'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/checking/checks/external_links.rb:15:in `run'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/gems/nanoc-4.8.18/lib/nanoc/checking/runner.rb:123:in `block in run_checks'
...
@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Jan 15, 2018

Member

@iay Can you try out the gh-1294-safer-parallel-collection branch and see whether the issue persists?

Member

ddfreyne commented Jan 15, 2018

@iay Can you try out the gh-1294-safer-parallel-collection branch and see whether the issue persists?

@iay

This comment has been minimized.

Show comment
Hide comment
@iay

iay Jan 15, 2018

Contributor

Being a bit short on Ruby-fu, I'm not sure how I would do that. Do you have a quick summary of how I'd go about it?

Contributor

iay commented Jan 15, 2018

Being a bit short on Ruby-fu, I'm not sure how I would do that. Do you have a quick summary of how I'd go about it?

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Jan 16, 2018

Member
  1. In your Gemfile, replace the gem 'nanoc' line with

    gem 'nanoc', github: 'nanoc/nanoc', branch: 'gh-1294-safer-parallel-collection'
  2. Run bundle

  3. Re-run bundle exec nanoc check external_links

Member

ddfreyne commented Jan 16, 2018

  1. In your Gemfile, replace the gem 'nanoc' line with

    gem 'nanoc', github: 'nanoc/nanoc', branch: 'gh-1294-safer-parallel-collection'
  2. Run bundle

  3. Re-run bundle exec nanoc check external_links

@iay

This comment has been minimized.

Show comment
Hide comment
@iay

iay Jan 16, 2018

Contributor

Oh, that's really useful.

Bad news and worse news, I'm afraid.

It seems as if this problem is intermittent. I ran my test a few times this morning with the original configuration, and it worked (reporting bad links).

Switching to the new configuration, my first run was a failure with the same No threads left error message as I was getting yesterday. A subsequent run succeeded. The failure looked like this:

  Running check external_links…   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `push': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00007fbfbfc01ec0 main thread:0x00007fbfbfc01ec0
* #<Thread:0x00007fbfc0082f78 sleep_forever>
   rb_thread_t:0x00007fbfbfc01ec0 native:0x00007fffa783b340 int:0
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `push'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `block in each'

Finally, I'm also seeing this warning message, which I think is new:

Loading site… /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/data_sources/filesystem/parser.rb:55: warning: regexp match /.../n against to UTF-8 string

This is on the latest macOS, with ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin17] installed using MacPorts.

Contributor

iay commented Jan 16, 2018

Oh, that's really useful.

Bad news and worse news, I'm afraid.

It seems as if this problem is intermittent. I ran my test a few times this morning with the original configuration, and it worked (reporting bad links).

Switching to the new configuration, my first run was a failure with the same No threads left error message as I was getting yesterday. A subsequent run succeeded. The failure looked like this:

  Running check external_links…   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `push': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00007fbfbfc01ec0 main thread:0x00007fbfbfc01ec0
* #<Thread:0x00007fbfc0082f78 sleep_forever>
   rb_thread_t:0x00007fbfbfc01ec0 native:0x00007fffa783b340 int:0
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `push'
   /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/extra/parallel_collection.rb:37:in `block in each'

Finally, I'm also seeing this warning message, which I think is new:

Loading site… /Volumes/Cased/git/iay/iay/site/vendor/ruby/2.4.0/bundler/gems/nanoc-a137b1811d60/nanoc/lib/nanoc/data_sources/filesystem/parser.rb:55: warning: regexp match /.../n against to UTF-8 string

This is on the latest macOS, with ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin17] installed using MacPorts.

@iay

This comment has been minimized.

Show comment
Hide comment
@iay

iay Jan 17, 2018

Contributor

My current working theory is:

  • Something is causing all of the worker threads to exit prematurely
  • This results in the addition of new work (correctly) detecting a deadlock
  • Worker threads execute prematurely if any other worker thread catches an exception
  • So any worker thread catching an exception early in the set of URLs to be validated will cause this

It's hard to see where an exception would come from in external link validation. The most obvious case would be timeouts, but that seems to be handled inside the validate method in external_links.rb. In any case, those tend to happen late in the process, because by definition they are delayed.

If URI.parse(href) can throw something other than URI::InvalidURIError, that would be one possibility.

Another would be if a redirect goes to something that isn't a valid URI: the URI.parse call in this case isn't guarded.

A third thing I noticed was that if I added a lot of puts calls for debugging, I started to get timeouts from this method in cleaning_stream.rb:

    # @see IO#puts
    def puts(*str)
      _nanoc_swallow_broken_pipe_errors_while do
        @stream.puts(*str.map { |ss| _nanoc_clean(ss) })
      end
    end

For now, I have added some diagnostics to two locations:

  • I added a rescue clause to the block in select_invalid in external_links.rb to catch and diagnose any validation exception early.
  • I added some puts calls to diagnose any exception caught within the worker thread.

So far, I haven't had the issue recur but I will keep this stuff live until I do, and hopefully we will see an exception being raised that we aren't expecting.

Contributor

iay commented Jan 17, 2018

My current working theory is:

  • Something is causing all of the worker threads to exit prematurely
  • This results in the addition of new work (correctly) detecting a deadlock
  • Worker threads execute prematurely if any other worker thread catches an exception
  • So any worker thread catching an exception early in the set of URLs to be validated will cause this

It's hard to see where an exception would come from in external link validation. The most obvious case would be timeouts, but that seems to be handled inside the validate method in external_links.rb. In any case, those tend to happen late in the process, because by definition they are delayed.

If URI.parse(href) can throw something other than URI::InvalidURIError, that would be one possibility.

Another would be if a redirect goes to something that isn't a valid URI: the URI.parse call in this case isn't guarded.

A third thing I noticed was that if I added a lot of puts calls for debugging, I started to get timeouts from this method in cleaning_stream.rb:

    # @see IO#puts
    def puts(*str)
      _nanoc_swallow_broken_pipe_errors_while do
        @stream.puts(*str.map { |ss| _nanoc_clean(ss) })
      end
    end

For now, I have added some diagnostics to two locations:

  • I added a rescue clause to the block in select_invalid in external_links.rb to catch and diagnose any validation exception early.
  • I added some puts calls to diagnose any exception caught within the worker thread.

So far, I haven't had the issue recur but I will keep this stuff live until I do, and hopefully we will see an exception being raised that we aren't expecting.

@iay

This comment has been minimized.

Show comment
Hide comment
@iay

iay Jan 17, 2018

Contributor

If URI.parse(href) can throw something other than URI::InvalidURIError, that would be one possibility.

Here, I'm referring to the call in validate that happens after a redirect. I've tried to provoke this by providing bad URLs as the result of a redirect, but it turns out to be really hard to provoke an InvalidUIError from an http-scheme URI (the Ruby library source tests don't seem to have any that don't involve non-ASCII characters, which I don't think I can persuade Apache to respond with)

Contributor

iay commented Jan 17, 2018

If URI.parse(href) can throw something other than URI::InvalidURIError, that would be one possibility.

Here, I'm referring to the call in validate that happens after a redirect. I've tried to provoke this by providing bad URLs as the result of a redirect, but it turns out to be really hard to provoke an InvalidUIError from an http-scheme URI (the Ruby library source tests don't seem to have any that don't involve non-ASCII characters, which I don't think I can persuade Apache to respond with)

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Jan 18, 2018

Member

I’ll give this a closer look on the weekend!

This might be a good time for me to replace ParallelCollection with something more battle-tested, like parallel.

As they say, is difficult. Writing threaded code

Member

ddfreyne commented Jan 18, 2018

I’ll give this a closer look on the weekend!

This might be a good time for me to replace ParallelCollection with something more battle-tested, like parallel.

As they say, is difficult. Writing threaded code

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Jan 21, 2018

Member

@iay Can you give the branch I mentioned another try?

  1. In your Gemfile, ensure that the nanoc line looks like this:

    gem 'nanoc', github: 'nanoc/nanoc', branch: 'gh-1294-safer-parallel-collection'
  2. Run bundle update nanoc --conservative

  3. Re-run bundle exec nanoc check external_links

This won’t fix the problem yet, but my hope is that this will reveal the underling error that is causing the problem.

Member

ddfreyne commented Jan 21, 2018

@iay Can you give the branch I mentioned another try?

  1. In your Gemfile, ensure that the nanoc line looks like this:

    gem 'nanoc', github: 'nanoc/nanoc', branch: 'gh-1294-safer-parallel-collection'
  2. Run bundle update nanoc --conservative

  3. Re-run bundle exec nanoc check external_links

This won’t fix the problem yet, but my hope is that this will reveal the underling error that is causing the problem.

ddfreyne added a commit that referenced this issue Jan 21, 2018

@ddfreyne

This comment has been minimized.

Show comment
Hide comment
@ddfreyne

ddfreyne Jan 21, 2018

Member

Update: This is now in master, so what you’ll need in the Gemfile is

gem 'nanoc', github: 'nanoc/nanoc'

(i.e. no more branch: …)

Member

ddfreyne commented Jan 21, 2018

Update: This is now in master, so what you’ll need in the Gemfile is

gem 'nanoc', github: 'nanoc/nanoc'

(i.e. no more branch: …)

@iay

This comment has been minimized.

Show comment
Hide comment
@iay

iay Jan 21, 2018

Contributor

I have updated to that and run bundle exec nanoc check elinks five times in a row with no problems showing up.

I don't know that proves anything, as the problem was intermittent anyway, but it's hopeful.

I suggest just closing this issue now; I can reopen it if the problem comes back.

Contributor

iay commented Jan 21, 2018

I have updated to that and run bundle exec nanoc check elinks five times in a row with no problems showing up.

I don't know that proves anything, as the problem was intermittent anyway, but it's hopeful.

I suggest just closing this issue now; I can reopen it if the problem comes back.

@ddfreyne ddfreyne closed this Jan 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment