How to disable compression and create uncompressed files by rake task? #64

Closed
lexer opened this Issue · 18 comments

9 participants

Alexey Zakharov Karl Varga Evan R. Murphy Yury Trofimenko Jason Rust Jeroen van Ingen James Martin Jude Nikos Timiopulos
Alexey Zakharov

No description provided.

Karl Varga
Owner

It is possible at the moment, but you have to slightly modify the default SitemapGenerator::FileAdapter and the SitemapGenerator::Namer. I'll have to add an option for this at some point to automate the process, but it's tricky if you are customizing namers and things. Until then, use this in your config:

SitemapGenerator::Sitemap.sitemaps_namer = SitemapGenerator::SitemapNamer.new(:sitemap, :extension => '.xml')
SitemapGenerator::Sitemap.sitemap_index_namer = SitemapGenerator::SitemapIndexNamer.new(:sitemap_index, :extension => '.xml')
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) { def gzip(stream, data); stream.write(data); stream.close end }.new

[Updated 2013-05-06]

  • Fixed sitemap_index_namer for the index above; it was using a SitemapNamer but should have been a SitemapIndexNamer
  • As of version 4.0 you can use the SimpleNamer so it becomes:
SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :extension => '.xml')
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) { def gzip(stream, data); stream.write(data); stream.close end }.new

Evan R. Murphy

Thanks, Karl. Is there a similar configuration tweak that would allow generation of both compressed and uncompressed sitemaps? This way I could easily check my sitemaps in the browser but not sacrifice performance.

Karl Varga
Owner

Yeah I think this should work for you:

SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) {
  def gzip(stream, data)
    stream.write(data)
    stream.close
    gz = Zlib::GzipWriter.new(open(location.path.to_s + '.gz', 'wb'))
    gz.write(data)
    gz.close
  end
}.new 

It's a bit hacky but gets the job done :)

Evan R. Murphy

Thanks for taking the time to make that up and share it. Unfortunately it's not working over here. Am I doing something wrong?

Here are the contents of my config/sitemap.rb. As you can see, I've added your snippet to the bottom:

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://myapp.com"

SitemapGenerator::Sitemap.create do
 # Put links creation logic here
 # ...
end

# Hack to generate both compressed and uncompressed versions of sitemaps
# From https://github.com/kjvarga/sitemap_generator/issues/64#issuecomment-4210753
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) {
  def gzip(stream, data)
    stream.write(data)
    stream.close
    gz = Zlib::GzipWriter.new(open(location.path.to_s + '.gz', 'wb'))
    gz.write(data)
    gz.close
  end
}.new

After adding your code, I ran bundle exec rake sitemap:create. It created a sitemap_index.xml.gz and sitemap1.xml.gz as usual, but no uncompressed .xml files.

I'm running version 3.0.0 of the sitemap_generator gem.

Karl Varga
Owner

You have to set the adapter before calling SitemapGenerator::Sitemap.create

Evan R. Murphy

Thank you for that tip and for helping me on this issue.

I modified my config/sitemap.rb file so that the adapter is set before calling SitemapGenerator::Sitemap.create:

## Contents of config/sitemap.rb

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://myapp.com"

# Hack to generate both compressed and uncompressed versions of sitemaps
# From https://github.com/kjvarga/sitemap_generator/issues/64#issuecomment-4210753
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) {
  def gzip(stream, data)
    stream.write(data)
    stream.close
    gz = Zlib::GzipWriter.new(open(location.path.to_s + '.gz', 'wb'))
    gz.write(data)
    gz.close
  end
}.new 

SitemapGenerator::Sitemap.create do
  # Put links creation logic here.
  # ...
end

Now when I run bundle exec rake sitemap:create, it complains:

rake aborted!
undefined local variable or method `location' for #<#<Class:0xb050c20>:0xb050bd0>

(It appears to be having problems with the location variable in the line gz = Zlib::GzipWriter.new(open(location.path.to_s + '.gz', 'wb')).)

I've posted a full stack trace in case you have time to look at this again: https://gist.github.com/2024439

Karl Varga
Owner

Ok this works:

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://myapp.com"

# Hack to generate both compressed and uncompressed versions of sitemaps
# From https://github.com/kjvarga/sitemap_generator/issues/64#issuecomment-4210753
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) {
  def gzip(stream, data)
    stream.write(data)
    stream.close
    open(stream.path.sub(/\.gz/, ''), 'wb') do |file|
      file.write(data)
    end
  end
}.new

SitemapGenerator::Sitemap.create do
  # Put links creation logic here.
  # ...
end

The links in the sitemap will still refer to the gzipped files...which is probably what you want.

Yury Trofimenko

Actually, it generates both the .xml and the .xml.gz versions with plain, uncompressed XML inside.
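That follows from the override above, which writes data unmodified to the .gz stream. Below is a minimal sketch of the intended result using only Ruby's standard library. write_both is a hypothetical helper, not part of sitemap_generator, and it assumes the stream is a File opened on a path ending in .gz, as in the snippets above.

```ruby
require 'zlib'

# Hypothetical helper (not part of sitemap_generator): writes gzipped data to
# the already-open `stream`, then an uncompressed copy next to it.
# Assumes `stream` is a File whose path ends in ".gz".
def write_both(stream, data)
  # Derive the uncompressed file name by dropping the trailing ".gz".
  plain_path = stream.path.sub(/\.gz\z/, '')

  gz = Zlib::GzipWriter.new(stream)
  gz.write(data)
  gz.close # finishes the gzip stream and closes the underlying file

  File.open(plain_path, 'wb') { |file| file.write(data) }
  plain_path
end
```

Inside an adapter subclass, the body of this helper would take the place of the gzip(stream, data) override shown earlier in the thread.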

Jason Rust

Thanks, needed this because Bing, for some reason, couldn't read our compressed version. Little tweak to ensure that the uncompressed index file points to the uncompressed sitemap files:

# Hack to generate both compressed and uncompressed versions of sitemaps
# From https://github.com/kjvarga/sitemap_generator/issues/64#issuecomment-4210753
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) {
  def gzip(stream, data)
    stream.write(data)
    stream.close
    open(stream.path.sub(/\.gz/, ''), 'wb') do |file|
      if stream.path.include? 'sitemap_index.xml'
        file.write(data.gsub(/\.gz/, ''))
      else
        file.write(data)
      end
    end
  end
}.new
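Note that data.gsub(/\.gz/, '') removes every ".gz" in the index, wherever it appears. Below is a sketch of a narrower rewrite that drops ".gz" only where it ends a <loc> value. uncompressed_index is a hypothetical helper name, and the sample index is made up for illustration.

```ruby
# Hypothetical helper: point an index document at the uncompressed sitemaps by
# dropping ".gz" only where it is immediately followed by </loc>.
def uncompressed_index(index_xml)
  index_xml.gsub(%r{\.gz(?=</loc>)}, '')
end

index = '<sitemapindex><sitemap><loc>http://myapp.com/sitemap1.xml.gz</loc></sitemap></sitemapindex>'
puts uncompressed_index(index)
# prints <sitemapindex><sitemap><loc>http://myapp.com/sitemap1.xml</loc></sitemap></sitemapindex>
```

The lookahead leaves any ".gz" that appears elsewhere, for example in a host name, untouched.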

Jeroen van Ingen

I wanted a compressed as well as an uncompressed sitemap, so I did the following:

2.times do |i|
  if i == 1
    SitemapGenerator::Sitemap.sitemaps_namer = SitemapGenerator::SitemapNamer.new(:sitemap, :extension => '.xml')
    SitemapGenerator::Sitemap.sitemap_index_namer = SitemapGenerator::SitemapIndexNamer.new(:sitemap_index, :extension => '.xml')
    SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) { def gzip(stream, data); stream.write(data); stream.close end }.new
  end

  SitemapGenerator::Sitemap.create do
    # do stuff
  end
end

It works fine!

But I wonder which sitemap would be pinged to the search engine if I enable pinging. The compressed or the uncompressed file?

James Martin

On a side but similar note, do you guys know if it's possible to turn off the whitespace compression too? My Google webmaster tools is reporting an invalid XML tag on line 1. Which is hard to debug when you have 50,000 urls in one sitemap file.

Karl Varga
Owner

Jamsi, no whitespace is added to start with; that's why it's all on one line. I'm not sure if there's a way to do it with Ruby's Builder, but it might be a bit complicated for you because you'd have to edit the source. Better to use TextMate's XML tidy feature to format the XML, then get Google to re-ingest the uncompressed, formatted XML file.
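If no XML formatter is at hand, a rough alternative is to break the one-line file at tag boundaries so that a validator's line number points at a single entry. This is only a sketch: one_tag_per_line is a hypothetical helper, and it assumes the sequence "><" never occurs inside text content, which holds for plain <loc> URLs.

```ruby
# Hypothetical helper: start a new line between every pair of adjacent tags.
# Text such as <loc>http://myapp.com/</loc> stays on one line because it
# contains no "><" sequence.
def one_tag_per_line(xml)
  xml.gsub('><', ">\n<")
end

puts one_tag_per_line('<urlset><url><loc>http://myapp.com/</loc></url></urlset>')
# prints:
# <urlset>
# <url>
# <loc>http://myapp.com/</loc>
# </url>
# </urlset>
```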

Jude

I'm already using the sitemap adapter with this, so I can't follow the code snippet you recommend.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new

Is there another way to generate uncompressed sitemap files?

Karl Varga
Owner

I'd like to add this as a feature at some point because I think it's needed. But it's gonna be tricky because it touches on a lot of points in the code. And the user could want compressed or uncompressed, or both. And if both, which sitemaps should be linked to in the index file? At the least a little built-in hack "recipe" would be nice.

Karl Varga kjvarga closed this
Karl Varga
Owner

As of version 4.0 writing out uncompressed sitemaps only simplifies to this:

SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :extension => '.xml')
SitemapGenerator::Sitemap.adapter = Class.new(SitemapGenerator::FileAdapter) { def gzip(stream, data); stream.write(data); stream.close end }.new

@vinchi777 The WaveAdapter uses a FileAdapter in its write() method to write out the file(s) before uploading it. If you were to redefine the gzip() method on the FileAdapter class using the code above then that should work. Try this:

class SitemapGenerator::FileAdapter
  def gzip(stream, data); stream.write(data); stream.close end
end
SitemapGenerator::Sitemap.namer = SitemapGenerator::SimpleNamer.new(:sitemap, :extension => '.xml')

Nikos Timiopulos

Thank you for this. I'm not sure, but is this gzipping necessary at all? I think most HTTP servers have gzip compression already, so for end users or robots it's the same.

Karl Varga
Owner

@alpracka the compression saves on storage and bandwidth too, which all adds up. Plus most people have no control of the setup of the HTTP server if they don't run their own. I think the question is more why wouldn't you want to compress? One should only need to verify their sitemaps the first time, and after that be confident that everything is okay. So doing a sanity check every couple months doesn't seem like a big deal to me.
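To put a rough number on the storage and bandwidth point, here is a sketch that gzips a synthetic sitemap in memory and compares sizes. The URLs and sample_sitemap are made up for illustration, and the ratio for a real sitemap will differ.

```ruby
require 'zlib'

# Builds a synthetic one-line sitemap body with `count` made-up URLs.
def sample_sitemap(count)
  urls = (1..count).map { |i| "<url><loc>http://myapp.com/items/#{i}</loc></url>" }
  "<urlset>#{urls.join}</urlset>"
end

xml = sample_sitemap(1_000)
compressed = Zlib.gzip(xml) # in-memory gzip, available since Ruby 2.4

puts "raw: #{xml.bytesize} bytes, gzipped: #{compressed.bytesize} bytes"
```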

Karl Varga
Owner

Hi all. This feature has been implemented now. I've got it in beta right now, but I'll release it soon. You can get more info from this thread #124
