Latest BLC does not finish properly #90

Open
dbogatov opened this issue Nov 16, 2017 · 10 comments

@dbogatov
dbogatov commented Nov 16, 2017

Sometimes (I would say most of the time), the latest BLC (v0.7.6) silently fails partway through its work. As a consequence, it does not report the result (and the exit code is of no use).

The last known version that does not have this bug is v0.7.3.

See the output:

$ docker run -it node:8.9.1-alpine /bin/sh
/ # npm install -g broken-link-checker
npm WARN deprecated nopter@0.3.0: try optionator
/usr/local/bin/blc -> /usr/local/lib/node_modules/broken-link-checker/bin/blc
/usr/local/bin/broken-link-checker -> /usr/local/lib/node_modules/broken-link-checker/bin/blc
+ broken-link-checker@0.7.6
added 100 packages in 3.561s
/ # blc https://google.com
Getting links from: https://google.com/
├───OK─── https://www.google.com/imghp?hl=en&tab=wi
├───OK─── https://maps.google.com/maps?hl=en&tab=wl
├───OK─── https://play.google.com/?hl=en&tab=w8
├───OK─── https://news.google.com/nwshp?hl=en&tab=wn
├───OK─── https://mail.google.com/mail/?tab=wm
├───OK─── https://www.youtube.com/?gl=US&tab=w1
├───OK─── https://drive.google.com/?tab=wo
├───OK─── https://www.google.com/intl/en/options/
├───OK─── http://www.google.com/history/optout?hl=en
├───OK─── https://www.google.com/preferences?hl=en
├───OK─── https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/
├───OK─── https://www.google.com/search?site=&ie=UTF-8&q=Chinua+Achebe&oi=ddle&ct=chinua-achebes-87th-birthday-5104396332433408&hl=en&sa=X&ved=0ahUKEwjC4dzIo8TXAhUB4iYKHYtvB7UQPQgD
├───OK─── https://www.google.com/logos/doodles/2017/chinua-achebes-87th-birthday-5104396332433408.3-l.png
├───OK─── https://www.google.com/advanced_search?hl=en&authuser=0
├───OK─── https://www.google.com/language_tools?hl=en&authuser=0
├───OK─── https://www.google.com/intl/en/ads/
├───OK─── https://www.google.com/services/
├───OK─── https://plus.google.com/116899029375914044550
├───OK─── https://www.google.com/intl/en/about.html
├───OK─── https://www.google.com/intl/en/policies/privacy/
/ #
@pointandyshoot

I've just had the same thing happen to me. It took a lot longer on my site crawl (it has logged about 15 MB of output to a file), but now it is just sitting there spinning and not going anywhere.

Did anyone find a workaround for this?

@stevenvachon
Owner

stevenvachon commented Feb 27, 2018

Which Node version? Also, try the v0.8.0 branch.

@pointandyshoot

Node version v6.11.4
BLC 0.7.7

@breville

Not sure if relevant, but we were seeing a hang in our own broken link checker that uses this library. My colleague implemented a small workaround in our code that seems to be helping: code-dot-org/code-dot-org#21310

@breville

Update: we're still getting zombie processes that don't exit, after all.

@pingevt

pingevt commented Sep 15, 2020

I've been getting something similar. BLC will just spin. For what it's worth, I traced it down to trying to look up this address: https://www.sothebys.com/en/. I didn't notice anything crazy on that page or in the headers, so I have no clue past that.

@jcdarwin

For what it's worth, I've struck this problem too, where it seems BLC simply hangs near the end of processing the links.

It's been working fine for a long time for me (version 0.7.6), but it suddenly started hanging and never completing -- I suspect there's a particular link somewhere that's not getting processed correctly (an unresolved promise?), though I notice that when I process different quantities of links (e.g. 10, 100, 400), it processes pretty much all of them before hanging.

To work around this, I've used a setTimeout in the link API callback: if the timeout is not cleared within 30 seconds, it calls my finish routine, which would normally be called by the end API callback:

let timeout
let linkCount = 0 // start at 0; incrementing an undefined value yields NaN
...

let htmlUrlChecker = new blc.HtmlUrlChecker({
  excludeInternalLinks: true,
  cacheResponses: false,
  excludeLinksToSamePage: true,
}, {
  link: function (result) {
    linkCount++
    log(`Processed ${linkCount}: ${result.url.original}`)

    // Restart the watchdog every time a link is processed.
    clearTimeout(timeout)
    timeout = setTimeout(() => {
      // broken-link-checker may not finish -- refer:
      // * https://github.com/stevenvachon/broken-link-checker/issues/90
      // It does however seem to always get stuck almost at the end.
      // After waiting 30 seconds for the next link to be processed,
      // we'll exit.
      finish()
    }, 30000) // 30 seconds
  },
  end: function () {
    // Normal completion: cancel the watchdog so finish() is not called twice.
    clearTimeout(timeout)
    finish()
  },
})
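
The same idea can be pulled out into a small reusable helper. This is only a sketch: createIdleWatchdog, kick and stop are hypothetical names and are not part of broken-link-checker. It calls a callback once if no progress has been signalled for a given time.

```javascript
// Hypothetical helper (not part of broken-link-checker): calls onIdle once
// if kick() has not been called for idleMs milliseconds.
function createIdleWatchdog(idleMs, onIdle) {
  let timer = null
  let stopped = false

  return {
    // Call on every sign of progress, e.g. from the `link` handler.
    kick() {
      if (stopped) return
      clearTimeout(timer)
      timer = setTimeout(() => {
        stopped = true
        onIdle()
      }, idleMs)
    },
    // Call when the run ends normally, e.g. from the `end` handler.
    stop() {
      stopped = true
      clearTimeout(timer)
    },
  }
}

// Possible wiring, assuming a finish() routine like the one above:
//   const watchdog = createIdleWatchdog(30000, finish)
//   link: function (result) { watchdog.kick() },
//   end: function () { watchdog.stop(); finish() },
```

Note that the watchdog only starts once kick() has been called, so a run that stalls before the first link is processed would not be caught unless kick() is also called right after enqueueing.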

@Jeandcc

Jeandcc commented May 10, 2021

I'm also having the same issue. @jcdarwin's suggestion is exactly what I was thinking of doing, so I'm glad to know that I'm not the only one dealing with this.

However, it would be good to find out what causes the process to hang close to the end. Right now we're unable to check roughly 40 links in a database of more than 1000 links.

matkoniecz added a commit to matkoniecz/website-checklist that referenced this issue Aug 6, 2021
@aarongustafson

Still seeing this today…

@aarongustafson

> In order to work-around this, I've used a setTimeout in the link API callback, such that if the setTimeout is not cleared within 30 seconds, it calls my finish routine that would normally be called by the end API callback:

@jcdarwin What does your finish() routine look like? I tried to figure out how to un-stall it by looking at the project source, but didn’t see a clear way to access done() on the item.
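
The thread does not show the actual finish() routine, so the following is only a guess at one possible shape, not @jcdarwin's code. createFinish and report are hypothetical names. The sketch does not un-stall the checker's queue. It only makes sure the wrap-up logic runs exactly once, whether it is triggered by the timeout or by the end handler.

```javascript
// Hypothetical finish routine: safe to call from both the watchdog timeout
// and the `end` handler, because only the first call has any effect.
function createFinish(report) {
  let finished = false

  return function finish(reason) {
    if (finished) return false // a later call (timeout or end) is ignored
    finished = true
    report(reason)
    return true
  }
}

// Example wiring (assumption: a CLI script where exiting is acceptable):
//   const finish = createFinish((reason) => {
//     console.log(`Done (${reason})`)
//     process.exit(reason === 'timeout' ? 1 : 0)
//   })
//   ...then call finish('timeout') from the timer and finish('end') from `end`.
```

Calling process.exit() explicitly is a design choice here: a request that never completes may keep the Node event loop alive, which would be consistent with the zombie processes reported earlier in this thread, although that cause is not confirmed.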
