
SitemapGenerator Usage

Robert edited this page Feb 28, 2023 · 29 revisions

SitemapGenerator Usage Examples

Please add your name, your site, and how many links your sitemap contains, and, if you like, a small snippet of code showing how SitemapGenerator made your life easier.


Produced sitemaps with more than 280M links in about 4.3 hours by running sitemap generation in parallel over 4 cores with the parallel gem.


....
Sitemap stats: 35,622,979 links / 713 sitemaps / 129m58s
Sitemap stats: 35,622,979 links / 713 sitemaps / 130m04s
Sitemap stats: 35,622,979 links / 713 sitemaps / 130m11s
Sitemap stats: 35,622,979 links / 713 sitemaps / 131m18s
....
15460.43 real     39030.90 user     14347.09 sys

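require 'parallel'
require 'sitemap_generator'

# `domains` is an array of host names (defined elsewhere). in_processes runs the
# block in 4 forked worker processes, so concurrent iterations never share the
# global SitemapGenerator::Sitemap settings assigned below.
Parallel.each(domains, :in_processes => 4) do |domain|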
  SitemapGenerator::Sitemap.default_host = "http://#{domain}"
  SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{domain}"
  SitemapGenerator::Sitemap.adapter = SitemapGenerator::FileAdapter.new
  SitemapGenerator::Sitemap.create do
    add '/', changefreq: 'monthly', priority: 1.0
    add '/signup', changefreq: 'monthly', priority: 0.8
    add '/login', changefreq: 'monthly', priority: 0.8
    add '/about', changefreq: 'monthly', priority: 0.8
    add '/contact', changefreq: 'monthly', priority: 0.8
    add '/faq', changefreq: 'monthly', priority: 0.8
    add '/careers', changefreq: 'monthly', priority: 0.8
    add '/privacy', changefreq: 'monthly', priority: 0.8
    add '/terms', changefreq: 'monthly', priority: 0.8
    add '/password_resets/new', changefreq: 'monthly', priority: 0.64
    ...
  end
end

Andrew Cetinick, www.sherpi.com, 233,939 links, 4m40s


Sitemap stats: 233,939 links / 5 sitemaps / 4m40s

A rake task on Heroku pushes the sitemaps to an S3 bucket. I also added this route to my Rails app so that sitemap requests are redirected to S3:

get '/sitemaps/:filename.xml.gz' => 'pages#sitemap'
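The `pages#sitemap` action behind this route isn't shown. A minimal sketch of the URL-building part, assuming a hypothetical bucket URL and a hypothetical helper name; it only accepts plain file names so a crafted parameter can't steer the redirect elsewhere:

```ruby
# Hypothetical bucket location; replace with your own S3 bucket URL.
SITEMAP_BUCKET_URL = "https://example-bucket.s3.amazonaws.com/sitemaps".freeze

# Build the S3 URL for a file name captured by the
# '/sitemaps/:filename.xml.gz' route (e.g. "sitemap1").
def sitemap_redirect_url(filename)
  # Accept only names such as "sitemap" or "sitemap12".
  unless filename.to_s.match?(/\A[\w-]+\z/)
    raise ArgumentError, "invalid sitemap name: #{filename.inspect}"
  end

  "#{SITEMAP_BUCKET_URL}/#{filename}.xml.gz"
end

# In a Rails controller (e.g. a hypothetical PagesController#sitemap) this might be used as:
#   redirect_to sitemap_redirect_url(params[:filename]), allow_other_host: true
# allow_other_host is needed on Rails 7+ when open-redirect protection is enabled.
```

`redirect_to` issues a 302 by default, which keeps the sitemap URLs on your own domain while the files themselves live on S3.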


Adam Salter, www.answermyoffice.com, 72,956 links, 2m03s


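# Eager-load each zipcode's city and iterate in batches
# (originally written for Rails 2 as Zipcode.find(:all, :include => :city).each)
Zipcode.includes(:city).find_each do |z|
  sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
end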

Rob Biedenharn, stylepath.com, Sitemap stats: 4,684,358 links, 6h21m31s

Old: Sitemap stats: 78,645 links, 3m37s

New: Sitemap stats: 4,684,358 links, 6h21m31s


  Category.find_in_order.each do |category|
    sitemap.add category_page_path(category), :changefreq => 'daily', :priority => 0.6
    Product.interesting_from_category(category.id, 0, nil, true).each do |product|
      sitemap.add details_id_path(product), :changefreq => 'weekly', :priority => 0.5
    end
  end

It also runs against a Rails 1.2.2 project; only a few changes were needed:

  • Provide a String#present? (easy, since I already had String#nonblank?)
  • Cope with the change from app/controllers/application.rb to app/controllers/application_controller.rb by adding:
    • require 'app/controllers/application' to lib/sitemap_generator/helper.rb
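The String#present? shim from the first bullet isn't shown. A minimal sketch, assuming only String needs it and following ActiveSupport's rule that a whitespace-only string is blank; it is defined only when no other library already provides the method:

```ruby
class String
  # Skip the definition if something (e.g. ActiveSupport) already defines it.
  unless method_defined?(:present?)
    # A string is "present" when it contains at least one non-whitespace character.
    def present?
      !!(self =~ /\S/)
    end
  end
end
```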

mattmueller, 1.9 million URLs

It took about 2 hours to generate on a very powerful production server without `nice`-ing the process. When we ran it under `nice` (we tried a niceness of 15), that load took more than 8 hours.
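If you do want to lower the priority, a job can do it from Ruby instead of being launched under `nice`. A minimal sketch using `Process.setpriority` (POSIX systems only); the helper name is hypothetical and 15 is the niceness mentioned above:

```ruby
# Raise the niceness of the current process (on Linux, strictly the calling
# thread) to at least `level`. Unprivileged processes may only increase their
# niceness, so never try to lower it.
def lower_priority!(level = 15)
  current = Process.getpriority(Process::PRIO_PROCESS, 0)
  target = [current, level].max
  Process.setpriority(Process::PRIO_PROCESS, 0, target) if target != current
  Process.getpriority(Process::PRIO_PROCESS, 0)
end
```

Calling it at the start of the generation task has the same effect as `nice -n 15`, including the slowdown reported above on a busy server.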


openc, 104+ million URLs for OpenCorporates

Generation takes several days. It runs weekly on a worker server (which also processes Resque jobs), and the files are then copied with scp to a shared folder on the app server, which is symlinked from production.


Because Google takes too long to process my main sitemap, I use sitemap_generator's support for multiple config files to generate smaller sitemaps for rapidly changing content such as news.
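A minimal sketch of such a second config file, assuming a hypothetical `NewsArticle` model with a `published_at` column, a matching `news_article_path` route helper, and a placeholder host. sitemap_generator's rake task reads a `CONFIG_FILE` environment variable, so this file can be run on its own with `rake sitemap:refresh CONFIG_FILE="config/sitemap_news.rb"`:

```ruby
# config/sitemap_news.rb (hypothetical file, model, and host names)
SitemapGenerator::Sitemap.default_host = "https://www.example.com"
# Use a different base file name so this run doesn't overwrite the main sitemap.
SitemapGenerator::Sitemap.filename = :news

SitemapGenerator::Sitemap.create do
  # Only recent, rapidly changing content goes into this small sitemap.
  NewsArticle.where("published_at > ?", 2.days.ago).find_each do |article|
    add news_article_path(article), lastmod: article.updated_at, changefreq: 'hourly'
  end
end
```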

I use Heroku and S3 (via the WaveAdapter). Because Google Webmaster Tools only accepts sitemaps submitted on the same domain, I use 302 redirects to point to the sitemaps in the S3 buckets. Google now indexes them beautifully!
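One way to configure the S3 side is the remote-host pattern sitemap_generator uses with adapters: generate into a writable temp directory, upload through the WaveAdapter (which relies on CarrierWave being configured for S3), and set `sitemaps_host` so the index links to the bucket. A sketch with placeholder host and bucket names:

```ruby
# config/sitemap.rb (placeholder host and bucket names)
SitemapGenerator::Sitemap.default_host  = "https://www.example.com"
# Public base URL of the bucket; links in the sitemap index point here.
SitemapGenerator::Sitemap.sitemaps_host = "https://example-bucket.s3.amazonaws.com/"
# Heroku's filesystem is ephemeral, so write to tmp/ before uploading.
SitemapGenerator::Sitemap.public_path   = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
# Uploads each generated file via CarrierWave (requires CarrierWave/fog setup).
SitemapGenerator::Sitemap.adapter       = SitemapGenerator::WaveAdapter.new

SitemapGenerator::Sitemap.create do
  add '/about'
end
```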


Resque task


Sitemap stats: 1,242,638 links / 25 sitemaps / 17m45s

businessprofiles, ~130M pages indexed for the corporate registration directory, Business Profiles

We store the sitemap files, which take around a week to generate, on S3, and use Rails routes on our primary app server to direct requests for sitemap.xml to them. The gem lets us index the site much more efficiently and has improved Google's indexation of our many millions of pages.
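A minimal sketch of such a route, assuming a placeholder bucket URL: Rails' routing `redirect` helper can send the sitemap index on the app domain straight to the copy on S3. Passing `status: 302` keeps the redirect temporary (the helper defaults to 301):

```ruby
# config/routes.rb (bucket URL is a placeholder)
Rails.application.routes.draw do
  # Send requests for the sitemap index on the app domain to the file on S3.
  get '/sitemap.xml.gz',
      to: redirect('https://example-bucket.s3.amazonaws.com/sitemaps/sitemap.xml.gz', status: 302)
end
```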


A simple Sidekiq job runs daily and generates a sitemap of all “changes”.


Sitemap stats: 2,607,677 links / 56 sitemaps / 15m11s

Like many others, we run sitemap generation as a worker job on a separate server. It currently generates over 10 million links, usually in under an hour.


Sitemap stats: 10,942,929 links / 219 sitemaps / 42m35s

Our sitemap generation runs weekly as a cron job. We currently generate sitemaps with ~12M product links, using find_each with a batch size of 25k and including all images of each product. Great gem, and we highly recommend it!
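A minimal sketch of that loop, assuming a hypothetical `Product` model with an `images` association, an image `url` method, and a placeholder host. sitemap_generator's `add` accepts an `images:` array of hashes with a `:loc` key and optional fields such as `:title`:

```ruby
# config/sitemap.rb (model, association, and attribute names are hypothetical)
SitemapGenerator::Sitemap.default_host = "https://www.example.com"

SitemapGenerator::Sitemap.create do
  # Load products 25k at a time and eager-load their images to avoid N+1 queries.
  Product.includes(:images).find_each(batch_size: 25_000) do |product|
    add product_path(product),
        lastmod: product.updated_at,
        images: product.images.map { |image| { loc: image.url, title: product.name } }
  end
end
```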


Sitemap stats: 12,886,563 links / 573 sitemaps / 497m55s (incl. ~5M images)

Jack Kinsella. I've been using this gem for perhaps seven years to power my law notes business, Oxbridge Notes.

class GenerateSitemapService
  # Without this, route helpers such as `product_path` aren't available in this class
  include Rails.application.routes.url_helpers

  def initialize(linkset, default_store: 'gb')
    @linkset = linkset
    @store = default_store
  end

  def run
    add_gb_specific_pages
    add_australia_specific_pages # not shown
    ...
  end

  private

  attr_reader :linkset
  attr_accessor :store

  def add_gb_specific_pages
    self.store = 'gb'

    Product.active.in_store(store).find_each do |product|
      add product_path(product), lastmod: product.updated_at
    end
    ...
  end

  def add(url, options = {})
    defaults = {
      # sitemap_generator sets changefreq by default, but I have no idea how
      # often my site changes, so I blank it out and use :lastmod instead.
      changefreq: nil,
      host: host_based_on_store
    }

    linkset.add url, defaults.merge(options)
  end

  def host_based_on_store
    HostDeterminer.for_store(store) # returns something like https://www.example.com
  end
end

SitemapGenerator::Sitemap.create do
  # The sitemap variable is explicitly made available by the library
  # maintainers within the `create` scope
  GenerateSitemapService.new(sitemap).run
end

Boris Tveritnev, ladendirekt.de (~30M links) and gtin-lookup.com (~80M links)

We run sitemap generation for ladendirekt.de as four independent tasks, each taking ~1h. The files are uploaded to block storage (via an S3 client), and nginx proxies requests to them.

Generating the gtin-lookup.com sitemaps takes a bit longer, about 5 hours. The rest is the same: upload via S3 to block storage, with nginx proxying requests.
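How the four tasks divide the work isn't described. A minimal sketch of one plausible approach, assuming records with integer IDs: split the ID space into contiguous ranges, one per task (the helper name is hypothetical):

```ruby
# Hypothetical helper: split the inclusive ID range min_id..max_id into at most
# `shards` contiguous, non-overlapping ranges of roughly equal size.
def id_shards(min_id, max_id, shards)
  raise ArgumentError, "shards must be positive" unless shards.positive?

  total = max_id - min_id + 1
  return [] if total <= 0

  size = (total + shards - 1) / shards # ceiling division
  (0...shards).map do |i|
    first = min_id + i * size
    next nil if first > max_id # fewer IDs than shards: drop empty slices

    first..[first + size - 1, max_id].min
  end.compact
end
```

Each task could then take one range, query only those records (for example `Product.where(id: range).find_each`, with `Product` hypothetical), and write under a distinct `SitemapGenerator::Sitemap.filename` so the tasks don't overwrite each other. Splitting by ID assumes IDs are spread fairly evenly; with large gaps, the shards will differ in size.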