SitemapBot

A Go web crawler library, CLI program, and REST API that aims to automate the generation of sitemaps for any size website.

Library

cmd/main.go is an example program that uses the library. For now, browsing the code in the crawler directory is the best starting point, especially the tests.
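The library's API isn't documented here, but the core idea, a breadth-first crawl that records every reachable URL for the sitemap, can be sketched in plain Go. Everything below is illustrative, not the library's actual API: the in-memory pages map stands in for HTTP fetches, and crawl is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// pages simulates a small site: path -> HTML body. The real crawler
// fetches these over HTTP; the map stands in for the network here.
var pages = map[string]string{
	"/":            `<a href="/about">About</a> <a href="/blog">Blog</a>`,
	"/about":       `<a href="/">Home</a>`,
	"/blog":        `<a href="/blog/post-1">Post 1</a>`,
	"/blog/post-1": `<a href="/blog">Back</a>`,
}

var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// crawl does a breadth-first walk from start, returning every reachable
// path exactly once, in discovery order -- the list a sitemap needs.
func crawl(start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var order []string
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		order = append(order, p)
		for _, m := range hrefRe.FindAllStringSubmatch(pages[p], -1) {
			if link := m[1]; !seen[link] {
				seen[link] = true
				queue = append(queue, link)
			}
		}
	}
	return order
}

func main() {
	// One absolute URL per line is the plain-text sitemap format.
	for _, p := range crawl("/") {
		fmt.Println("https://www.domain.tld" + p)
	}
}
```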

CLI program

The CLI tool lets you run the bot locally or on a server. You can use it to generate sitemaps locally and upload them, or to schedule crawls and sitemap generation with cron or the like.
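Scheduling a nightly crawl with cron might look like the following crontab entry; the install path, working directory, and schedule are placeholders:

```
# m h dom mon dow  command
0 3 * * * cd /var/www/example && /usr/local/bin/sitemapbot https://www.domain.tld
```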

Installing it

go get -u github.com/lukeheuer.org/sitemapbot/cmd

Using it

sitemapbot https://www.domain.tld

The sitemap will be exported to public/sitemap.txt. Currently only single plain-text sitemap files are generated, which is valid only for sitemaps of fewer than 50,000 URLs.
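A plain-text sitemap is simply one absolute URL per line, so the generated public/sitemap.txt looks something like this (URLs illustrative):

```
https://www.domain.tld/
https://www.domain.tld/about
https://www.domain.tld/blog/post-1
```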

Report.csv gives a report of all URLs fetched, along with their load times, sorted from longest load time to shortest.

Configuring it

  1. Generate a new general sitemapbot.conf:
sitemapbot -nc

note: You can optionally generate a per-domain config like this: sitemapbot -nc domain.tld. These settings will be loaded whenever the bot is invoked with that domain in the future, e.g. sitemapbot domain.tld.

  2. Edit sitemapbot.conf. See config/config.go for details on the config directives.

  3. Run sitemapbot in the directory containing the config to apply your settings.

Building

If you'd like to build SitemapBot yourself, it currently supports Go 1.8 and up.