Use your Google sitemap XML file to download a copy of your website from the Google Cache
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Ruby script to recover HTML/text content from your site that has suffered a catastrophic database failure.

This script assumes you have a Google Sitemap file, usually at <yourdomain>/sitemap.xml

Requires the following gems:

* crack
* nokogiri

sudo gem install nokogiri
sudo gem install jnunemaker-crack -s
git clone git://
cd Google-Cache-Sitemap-Downloader
ruby google_cache_sitemap_downloader.rb '' path/to/sitemap.xml 'your_css_selector'

Eg. google_cache_sitemap_downloader.rb '' ~/Desktop/sitemap.xml 'body'

The last item is a CSS selector to check that the page is valid. This uses standard CSS notation to check if the specified element exists, otherwise it will not write the file. Examples: 'body', 'h1#logo', '#main_menu.navigation' and so on.

The Google cache result for each url linked in your sitemap will then be downloaded into a directory 'results' as html, with slashes replaced with '_-_'

Google will rate-limit you and potentially complain that you are using an automated script. If that happens, the script will exit and you will have to wait some time before Google allows you to use the cache again. 

The script waits 2 seconds in between requests by default to attempt to avoid this.