Fails with TypeError on recursive #196

Open
dbogatov opened this issue May 31, 2020 · 7 comments

dbogatov commented May 31, 2020

Describe the bug

Exits with TypeError: Cannot read property 'call' of null when run against (at least) my website using -r.

See full log
# ./bin/blc --filter-level 3 -r https://dbogatov.org

Starting recursive scan...

Getting links from: https://dbogatov.org/

Getting links from: https://dbogatov.org/
TypeError: Cannot read property 'isAllowed' of null

======================
Links found: 0
Links skipped: 0
Links OK: 0
Links broken: 0
Time elapsed: 1 second
======================

├───OK─── https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
Finished! 1 links found. 0 broken.
TypeError: Cannot read property 'call' of null
    at HtmlUrlChecker._completedPage2 (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:264:44)
    at HtmlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:142:407)
    at HtmlChecker.emit (events.js:315:20)
    at HtmlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
    at HtmlChecker._complete2 (/broken-link-checker/lib-cjs/public/HtmlChecker.js:211:8)
    at UrlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlChecker.js:99:395)
    at UrlChecker.emit (events.js:315:20)
    at UrlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
    at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:68:54)
    at RequestQueue.emit (events.js:315:20)
    at RequestQueue._removeItem2 (/broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:373:65)
    at /broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:303:63
    at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:67:7)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
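
Judging from the "Cannot read property 'isAllowed' of null" message above, the crash looks like a robots.txt handle that ends up null when the site serves no /robots.txt, and a method then being called on it without a guard. A minimal sketch of that failure pattern (hypothetical names, not BLC's actual internals):

// Hypothetical illustration only -- not BLC's real code.
// If /robots.txt cannot be fetched or parsed, `robots` may be null;
// calling robots.isAllowed(url) then throws the TypeError seen above.
function isUrlAllowed(robots, url) {
  // Guard: treat a missing robots.txt as "everything allowed"
  return robots == null ? true : robots.isAllowed(url);
}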

To Reproduce

Here is what I did:

$ docker run -it node:14.3.0-alpine3.10 /bin/sh
apk add --update bash git
git clone https://github.com/stevenvachon/broken-link-checker.git
cd broken-link-checker/
npm install
npm run build

# here is the call
./bin/blc --filter-level 3 -r https://dbogatov.org

Expected behavior

Earlier versions (e.g. v0.7.x) work fine.

For the record, the reason I tried to switch to v8 is that, all of a sudden, earlier versions started to dislike perfectly fine SSL certificates...

Environment:

  • OS and version: node:14.3.0-alpine3.10 Docker image
  • Node.js version: 14.3.0
  • broken-link-checker version: built from master (a08abcd)
viatcheslavmogilevsky commented Oct 19, 2020

@dbogatov one of the workarounds is to simulate the /robots.txt path

nginx example:

location = /robots.txt {
    return 200 "User-agent: *\nDisallow: /\n";
}
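
For sites served directly by a plain Node.js app instead of nginx, an equivalent stub could be returned from the app itself. A minimal sketch using Node's built-in http module (port and handler are assumptions, not part of the original suggestion):

// Serve a stub /robots.txt so crawlers (and BLC) find one.
const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/robots.txt') {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('User-agent: *\nDisallow: /\n');
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(8080); // assumed port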

@dbogatov (Author)

@viatcheslavmogilevsky

What do you mean?
Do you suggest that I modify my server config just to make BLC not fail?

@viatcheslavmogilevsky

@dbogatov

Yes, this is just a workaround.

It seems BLC fails if there is no /robots.txt path.

@dbogatov (Author)

@viatcheslavmogilevsky

This seems like an impractical workaround.
I test dozens of websites running NGINX, Apache, or plain .NET Core / NodeJS.
For some of those servers I don't even have proper access to the configs.
On top of that, introducing a hack into a server's config just to make a particular buggy CI tool succeed is, IMHO, bad practice.

Thanks for the idea anyway!
Good catch that robots.txt is related to the issue!

@dbogatov (Author)

@viatcheslavmogilevsky

By the way, out of frustration that these bugs have gone unfixed for years, I decided to write my own scaled-down alternative to BLC.
It works well, at least for my websites!

https://github.com/dbogatov/broken-links-inspector

@viatcheslavmogilevsky

Anyway, there is another bug even with the /robots.txt workaround: in v0.8.0-alpha, recursive mode doesn't work for me. It checks all internal links, but it doesn't visit them.

In v0.7.8 it does visit all internal links in recursive mode, but it seems it doesn't work with sites that only support HTTP/2.


gauravgandhi1315 commented Sep 29, 2021

@dbogatov I was getting the same error, but it was never fixed! I did the same and wrote my own alternative to BLC.
