Skip to content
Little script that creates a human-readable sitemap given a domain name - DEPRECATED
Ruby JavaScript HTML CSS
Branch: master
Clone or download

Latest commit

Fetching latest commit…
Cannot retrieve the latest commit at this time.

Files

Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
app
bin
config
lib/sitemap
public
spec
.buildpacks
.env
.gitignore
.node
.travis.yml
Gemfile
Gemfile.lock
Procfile
README.md
Rakefile
Vagrantfile
app.rb
config.ru
sitemap-generator.gemspec

README.md

Sitemap Generator

A simple command-line Sitemap generator tool. Useful for quickly auditing a website.

Distributed as a Ruby Gem [https://rubygems.org/gems/sitemap-generator], it is not intended to be a Search Engine sitemap or integrated CMS/Rails/etc. - there are plenty of other gems that do that well.

NOTE: LinkedIn have changed their policy and the API this depended on is no longer available, meaning this tool no longer works, and is no longer actively maintained as a result.

Gem Version Build Status

Getting started

gem install sitemap-generator

Examples

Generate a standard CSV Sitemap file

The following command will generate a basic sitemap, listing all links recursively from the site, containing only URIs from the specified domain name (in this case, onegeek.com.au) and will save to a file named sitemap.csv

sitemap generate http://www.onegeek.com.au/ sitemap.csv

Generate a standard Sitemap JSON format

This command deliberately doesn't write to file in order to allow unix-style pipelining

sitemap generate --format=json http://www.onegeek.com.au/

Generate a Sitemap 3 levels deep

sitemap generate --depth=3 http://www.onegeek.com.au/ sitemap.csv

Generate a Sitemap containing links only on the specified URI

sitemap generate --no-recursion http://www.onegeek.com.au/ sitemap.csv

Generate a Sitemap that contains URI fragments and query strings

By default, URI fragments like foo.com/#!/some-page and query strings like foo.com/?bar=baz are ignored - they are generally duplicitous so sitemap-generator strips them off entirely. This lets them back in:

sitemap generate --query-strings --fragments http://www.onegeek.com.au/ sitemap.csv

Getting Help

sitemap
sitemap generate --help

Alternatives?

So of course, after spending a few hours writing this I forgot that wget can do this for you, well basically anyway:

wget --spider --recursive --no-verbose --output-file=wgetlog.txt http://somewebsite.com
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt | sed "s@&@\&@" > sedlog.txt

Website

Run Server

foreman start

You can’t perform that action at this time.