
link_checking: prevent rate-limiting #1421

Merged: Keats merged 5 commits into getzola:next from angristan:link-checking-threads on Apr 21, 2021

@angristan angristan commented Mar 30, 2021

Fix for #1056.

  • assign all links for a domain to the same thread
  • reduce number of threads from 32 to 8
  • add sleep between HTTP calls
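
To illustrate the three points above, here is a minimal std-only sketch of the approach, not the code from this patch. The names `host_of`, `group_by_host` and `check_all`, the `check` callback standing in for the HTTP request, and the delay value are all hypothetical, and the host extraction is deliberately simplified (a real implementation would use a proper URL parser).

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;
use std::time::Duration;

/// Simplified host extraction (assumption: ignores ports and user info).
fn host_of(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let host = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Group links by host so that all links for one host stay together.
fn group_by_host<'a>(links: &[&'a str]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut groups: HashMap<&str, Vec<&str>> = HashMap::new();
    for &link in links {
        if let Some(host) = host_of(link) {
            groups.entry(host).or_default().push(link);
        }
    }
    groups
}

/// Check every link with `workers` threads. Each worker takes a whole host
/// group at a time, so a given host is only ever contacted by one thread,
/// and sleeps for `delay` between two requests to that host.
fn check_all<F>(links: &[&str], workers: usize, delay: Duration, check: F) -> Vec<(String, bool)>
where
    F: Fn(&str) -> bool + Sync,
{
    let queue = Mutex::new(group_by_host(links).into_iter().collect::<Vec<_>>());
    let results = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The lock is released at the end of this statement.
                let group = queue.lock().unwrap().pop();
                let (_host, urls) = match group {
                    Some(g) => g,
                    None => break, // no host groups left
                };
                for (i, &url) in urls.iter().enumerate() {
                    if i > 0 {
                        thread::sleep(delay);
                    }
                    let ok = check(url);
                    results.lock().unwrap().push((url.to_string(), ok));
                }
            });
        }
    });
    results.into_inner().unwrap()
}

fn main() {
    let links = [
        "https://github.com/sivel/speedtest-cli",
        "https://github.com/caddyserver/caddy/issues/2080",
        "https://docs.docker.com/docker-hub/github/",
        "https://caniuse.com/#feat=tls1-3",
    ];
    // The closure is a placeholder for the real HTTP check.
    let results = check_all(&links, 8, Duration::from_millis(50), |url| {
        println!("checking {url}");
        true
    });
    println!("checked {} link(s)", results.len());
}
```

With this layout the total run time is bounded below by the host with the most links, since its links are checked one after another.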

Demo

devenv :: ~/dawn-zola ‹main› » time ~/zola/target/debug/zola check
Checking site...
Checking 0 internal link(s) with an anchor.
Checking 502 external link(s).
Thread for domain: soundiiz.com
Thread for domain: grafana.com
Thread for domain: web.archive.org
[...]
Thread for domain: docs.docker.com
Domain: docs.docker.com, url: "https://docs.docker.com/docker-hub/github/"
Domain: docs.docker.com, url: "https://docs.docker.com/engine/reference/commandline/login/#provide-a-password-using-stdin"
Thread for domain: wiki.diasporafoundation.org
Domain: wiki.diasporafoundation.org, url: "https://wiki.diasporafoundation.org/Installation/Debian/Jessie"
Thread for domain: core.telegram.org
Domain: core.telegram.org, url: "https://core.telegram.org/bots"
Domain: grafana.com, url: "https://grafana.com/plugins?type=datasource"
Domain: twitter.com, url: "https://twitter.com/fuolpit"
Domain: docs.docker.com, url: "https://docs.docker.com/storage/storagedriver/zfs-driver/"
Thread for domain: caniuse.com
Domain: caniuse.com, url: "https://caniuse.com/#feat=tls1-3"
Domain: twitter.com, url: "https://twitter.com/torproject?ref_src=twsrc%5Etfw"
[...]
Domain: github.com, url: "https://github.com/hyperic/sigar/issues/74"
Domain: github.com, url: "https://github.com/Angristan/dockerfiles/tree/master/diaspora"
Domain: github.com, url: "https://github.com/angristan/dockerfiles/tree/master/diaspora"
Domain: github.com, url: "https://github.com/caddyserver/caddy/issues/2080"
Domain: github.com, url: "https://github.com/caddyserver/caddy/releases/tag/v0.11.5"
Domain: github.com, url: "https://github.com/sivel/speedtest-cli"
Domain: github.com, url: "https://github.com/sivel/speedtest-cli#installation"
Domain: github.com, url: "https://github.com/sivel/speedtest-cli#usage"
Domain: github.com, url: "https://github.com/Microsoft/vscode/issues/51132"
Domain: github.com, url: "https://github.com/alexanderyakusik"
Domain: github.com, url: "https://github.com/Microsoft/vscode/issues/51132#issuecomment-424132330"
> Checked 502 external link(s): 1 error(s) found.
Failed to check the site
Error: Dead link in /root/dawn-zola/content/2018-03-diaspora-in-docker/index.md to https://gitlab.koehn.com/docker/diaspora: error sending request for url (https://gitlab.koehn.com/docker/diaspora): operation timed out
~/zola/target/debug/zola check  14.13s user 1.44s system 8% cpu 3:05.32 total

With about 500 links to check in roughly 185 seconds, that works out to about 0.37s per link, or around 2.7 links per second. About 150 of these links point to github.com, which is why all the GitHub links show up at the end: that domain's thread is the last one still running. I didn't get any 429 Too Many Requests using this patch.
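
As a quick check of those figures, the arithmetic can be reproduced from the `time` output in the demo (502 links, 3:05.32 total). The helper `to_seconds` is only an illustration, and the last figure assumes the github.com thread really was the one that determined the total run time.

```rust
/// Convert a `min:sec` duration such as "3:05.32" to seconds.
fn to_seconds(elapsed: &str) -> Option<f64> {
    let (min, sec) = elapsed.split_once(':')?;
    Some(min.parse::<f64>().ok()? * 60.0 + sec.parse::<f64>().ok()?)
}

fn main() {
    let total = to_seconds("3:05.32").expect("valid duration"); // 185.32 s
    let links = 502.0;
    println!("{:.2} s per link", total / links); // 0.37
    println!("{:.2} links per second", links / total); // 2.71
    // Assumption: the ~150 github.com links ran sequentially on the last
    // thread, so each one took at most total / 150 including the sleep.
    println!("{:.2} s per github.com link at most", total / 150.0); // 1.24
}
```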


This is my first time using Rust 🦀, so I'm not familiar with the idiomatic way of doing things. I had a hard time with iterators, but it looks like I ended up with something that works! I look forward to a review so I can see what can be improved. 🙏 I added a few comments on things that I'm not sure how to handle.

@angristan angristan force-pushed the link-checking-threads branch from a98b4d3 to a7f862c Compare April 1, 2021 23:06
@angristan angristan force-pushed the link-checking-threads branch from 7f98368 to cd2d021 Compare April 6, 2021 22:46

@Keats Keats left a comment


LGTM, just need to remove the println

@Keats Keats merged commit 47b9207 into getzola:next Apr 21, 2021

Keats commented Apr 21, 2021

Thanks!
