walks websites
cmd/walker
config
.gitignore
Makefile
README.md
extract.go
extract_test.go
reports.go
scrape.go
scrapeloop.go
service.go
validate.go
vo.go
walker.go
walkerstatus.go

README.md

Walker

Walker walks (that is, crawls) websites and collects performance- and SEO-relevant data. The results can be browsed through a very simple web interface. They are also meant to be exposed as Prometheus metrics (not implemented yet).

Be careful when crawling your website with Walker: with aggressive settings, it might take your site down.

Configuration

---
# target of your scrape
target: http://www.bestbytes.de
# number of concurrent goroutines
concurrency: 2
# address the web interface listens on
addr: ":3001"
# if you want to ignore <meta name="robots" content="noindex,nofollow"/>
ignorerobots: true
# in some cases using cookies is friendlier to the server
usecookies: true

# ignoring urls
## based on query parameters: in this example, all links that contain a query parameter foo
ignorequerieswith:
  - foo
## skip everything that has a query
ignoreallqueries: true
# paths to ignore (matched as prefixes)
ignore:
  - /foomo
...
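
To make the ignore settings above more concrete, here is a minimal sketch of how such rules could be applied to a link. It is not Walker's actual code: the `ignoreRules` type, its field names and the `shouldIgnore` method are hypothetical and only mirror the `ignorequerieswith`, `ignoreallqueries` and `ignore` keys. It assumes that `ignore` entries are matched as path prefixes, as the comment in the configuration suggests.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ignoreRules mirrors the ignore-related configuration keys.
// The type and its field names are hypothetical, not Walker's own.
type ignoreRules struct {
	IgnoreQueriesWith []string // ignorequerieswith
	IgnoreAllQueries  bool     // ignoreallqueries
	Ignore            []string // ignore (assumed to be path prefixes)
}

// shouldIgnore reports whether a link would be skipped under the rules.
func (r ignoreRules) shouldIgnore(link string) (bool, error) {
	u, err := url.Parse(link)
	if err != nil {
		return false, err
	}
	query := u.Query()
	// Skip everything that carries any query parameter at all.
	if r.IgnoreAllQueries && len(query) > 0 {
		return true, nil
	}
	// Skip links that carry one of the listed query parameters.
	for _, name := range r.IgnoreQueriesWith {
		if _, ok := query[name]; ok {
			return true, nil
		}
	}
	// Skip links whose path starts with one of the listed prefixes.
	for _, prefix := range r.Ignore {
		if strings.HasPrefix(u.Path, prefix) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	rules := ignoreRules{
		IgnoreQueriesWith: []string{"foo"},
		Ignore:            []string{"/foomo"},
	}
	for _, link := range []string{
		"http://www.bestbytes.de/products",
		"http://www.bestbytes.de/products?foo=1",
		"http://www.bestbytes.de/foomo/status",
	} {
		skip, err := rules.shouldIgnore(link)
		fmt.Println(link, skip, err)
	}
}
```

Note that with `ignoreallqueries: true`, the `ignorequerieswith` list has no additional effect in this sketch, because any link with a query is already skipped.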

error detection

  • every response with an HTTP status code greater than 400 will be tracked as an error
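
The rule above can be sketched in a few lines. This is an illustration only: `errorThreshold`, `isError` and `countErrors` are hypothetical names, and the comparison follows the wording of this README literally, so a plain 400 would not count as an error. Whether that is the intended behavior is not stated here.

```go
package main

import "fmt"

// errorThreshold follows the README wording: codes greater than 400 are errors.
// The name is hypothetical.
const errorThreshold = 400

// isError reports whether a status code counts as an error under that rule.
func isError(statusCode int) bool {
	return statusCode > errorThreshold
}

// countErrors tallies the error responses per status code.
func countErrors(statusCodes []int) map[int]int {
	errors := map[int]int{}
	for _, code := range statusCodes {
		if isError(code) {
			errors[code]++
		}
	}
	return errors
}

func main() {
	// Made-up status codes of a small crawl.
	fmt.Println(countErrors([]int{200, 301, 404, 404, 500}))
}
```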

external link validation (not implemented yet)

  • check external links
  • flag links to forbidden sites, such as a staging system

seo validation

  • missing title, description, h1
  • duplicate title, description, h1
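
The two checks listed above could look roughly like this. It is a sketch under assumptions, not Walker's implementation: the `page` type, `missingFields` and `duplicates` are hypothetical names, and the page data is assumed to have been extracted already.

```go
package main

import (
	"fmt"
	"sort"
)

// page holds the SEO-relevant fields of one crawled URL (hypothetical type).
type page struct {
	URL, Title, Description, H1 string
}

// missingFields lists which of title, description and h1 are empty on a page.
func missingFields(p page) []string {
	var missing []string
	if p.Title == "" {
		missing = append(missing, "title")
	}
	if p.Description == "" {
		missing = append(missing, "description")
	}
	if p.H1 == "" {
		missing = append(missing, "h1")
	}
	return missing
}

// duplicates groups URLs by the value that field returns and keeps only
// the values shared by more than one page.
func duplicates(pages []page, field func(page) string) map[string][]string {
	byValue := map[string][]string{}
	for _, p := range pages {
		value := field(p)
		if value == "" {
			continue // empty values are reported as missing instead
		}
		byValue[value] = append(byValue[value], p.URL)
	}
	for value, urls := range byValue {
		if len(urls) < 2 {
			delete(byValue, value)
			continue
		}
		sort.Strings(urls)
	}
	return byValue
}

func main() {
	// Made-up example pages.
	pages := []page{
		{URL: "/a", Title: "Shop", Description: "Buy things", H1: "Shop"},
		{URL: "/b", Title: "Shop", Description: "", H1: "Offers"},
	}
	for _, p := range pages {
		fmt.Println(p.URL, "missing:", missingFields(p))
	}
	fmt.Println("duplicate titles:", duplicates(pages, func(p page) string { return p.Title }))
}
```

Passing the field as a function keeps one `duplicates` helper usable for titles, descriptions and h1 headings alike.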

metrics (not implemented yet)

  • vector of status codes
  • performance buckets