renamed & restructured to massive_sitemap

commit a50babe896c71606f4f84756c650f962bb7f411a 1 parent c7ced2f
@rngtng authored
Showing with 528 additions and 1,020 deletions.
  1. +4 −2 .gitignore
  2. +25 −0 CHANGELOG.md
  3. +2 −3 Gemfile
  4. +9 −18 Gemfile.lock
  5. +0 −104 History.txt
  6. +0 −22 LICENSE
  7. +11 −0 README.md
  8. +0 −110 README.rdoc
  9. +1 −8 Rakefile
  10. +0 −1  VERSION
  11. +0 −22 big_sitemap.gemspec
  12. +0 −100 lib/big_sitemap.rb
  13. +0 −107 lib/big_sitemap/builder.rb
  14. +0 −89 lib/big_sitemap/writer.rb
  15. +51 −0 lib/massive_sitemap.rb
  16. +75 −0 lib/massive_sitemap/builder/base.rb
  17. +19 −0 lib/massive_sitemap/builder/index.rb
  18. +33 −0 lib/massive_sitemap/builder/rotating.rb
  19. +2 −2 lib/{big_sitemap → massive_sitemap}/ping.rb
  20. +3 −0  lib/massive_sitemap/version.rb
  21. +61 −0 lib/massive_sitemap/writer/file.rb
  22. +21 −0 lib/massive_sitemap/writer/gzip_file.rb
  23. +27 −0 lib/massive_sitemap/writer/locking_file.rb
  24. +21 −0 lib/massive_sitemap/writer/string.rb
  25. +24 −0 massive_sitemap.gemspec
  26. +0 −41 spec/big_sitemap_spec.rb
  27. +51 −32 spec/builder_spec.rb
  28. +78 −0 spec/massive_sitemap_spec.rb
  29. +1 −1  spec/spec_helper.rb
  30. +9 −6 spec/writer_spec.rb
  31. +0 −10 test/big_sitemap_ping_test.rb
  32. +0 −224 test/big_sitemap_test.rb
  33. +0 −48 test/fixtures/test_model.rb
  34. +0 −70 test/test_helper.rb
6 .gitignore
@@ -1,3 +1,5 @@
-.rvmrc
-._*
+*.gem
+.bundle
+Gemfile.lock
pkg
+.rvmrc
25 CHANGELOG.md
@@ -0,0 +1,25 @@
+# Changes
+
+## vx.x.x - ???
+
+ * Amazon S3 integration
+ * manifest handling
+
+## v2.0.x - ???
+
+ * added Index generation
+ * updated/fixed Ping
+ * updated Docu
+ * Writer overwrite detection
+
+## v2.0.1 - 9-02-2012
+ _initial release_
+
+ * restructured gem completely based on BigSitemap gem
+ * separated logic into two major parts:
+ * Builder -> creates content
+ * Writer -> stores content
+ * added several implementations/specifications of builder/writer
+ * added generator for default setup
+ * added specs
+
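The changelog describes the split into a Builder that creates content and a Writer that stores it. In the code further down, builders only ever call `init!`, `print` and `close!` on their writer, so any object answering those three methods can act as storage. A minimal sketch of that contract, assuming an in-memory store; `MemoryWriter` and `TinyBuilder` are hypothetical names for illustration, not classes from the gem:

```ruby
require 'stringio'

# Hypothetical in-memory writer answering the same three calls the
# builders in this commit make: init!, print and close!.
class MemoryWriter
  attr_reader :documents

  def initialize
    @documents = []
  end

  # Start a new document (a file writer opens a new file here).
  def init!
    @buffer = StringIO.new
  end

  def print(string)
    @buffer.print(string)
  end

  # Finish the current document (a file writer closes and renames it here).
  def close!
    @documents << @buffer.string
    @buffer = nil
  end
end

# Hypothetical builder: it only produces markup and never touches storage.
class TinyBuilder
  def initialize(writer)
    @writer = writer
  end

  def build(locations)
    @writer.init!
    @writer.print '<?xml version="1.0" encoding="UTF-8"?>'
    @writer.print '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    locations.each { |loc| @writer.print "<url><loc>#{loc}</loc></url>" }
    @writer.print '</urlset>'
    @writer.close!
  end
end

writer = MemoryWriter.new
TinyBuilder.new(writer).build(%w(http://example.com/ http://example.com/about))
puts writer.documents.first
```

Replacing `MemoryWriter` with a file-backed writer changes where the XML ends up without touching the builder, which is what the Builder/Writer separation buys.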
5 Gemfile
@@ -1,5 +1,4 @@
-source :rubygems
+source "http://rubygems.org"
-# Specify your gem's dependencies in big_sitemap.gemspec
+# Specify your gem's dependencies in massive_sitemap.gemspec
gemspec
-
27 Gemfile.lock
@@ -1,33 +1,24 @@
PATH
remote: .
specs:
- big_sitemap (1.0.0)
+ massive_sitemap (0.0.1)
GEM
remote: http://rubygems.org/
specs:
diff-lcs (1.1.3)
- mocha (0.9.10)
- rake
- nokogiri (1.4.4)
- rake (0.8.7)
- rspec (2.7.0)
- rspec-core (~> 2.7.0)
- rspec-expectations (~> 2.7.0)
- rspec-mocks (~> 2.7.0)
- rspec-core (2.7.1)
- rspec-expectations (2.7.0)
+ rspec (2.8.0)
+ rspec-core (~> 2.8.0)
+ rspec-expectations (~> 2.8.0)
+ rspec-mocks (~> 2.8.0)
+ rspec-core (2.8.0)
+ rspec-expectations (2.8.0)
diff-lcs (~> 1.1.2)
- rspec-mocks (2.7.0)
- shoulda (2.11.3)
+ rspec-mocks (2.8.0)
PLATFORMS
ruby
DEPENDENCIES
- big_sitemap!
- bundler
- mocha
- nokogiri
+ massive_sitemap!
rspec
- shoulda
104 History.txt
@@ -1,104 +0,0 @@
-=== 1.0.0 / 2011-10-24
-
-* API Change: Sitemaps are now generated using a block syntax. Find methods are no longer the responsibility of BigSitemap. Instead, sitemaps are generated using a block, in which you call your own find methods, passing the results to BigSitemap with the 'add' method. See the README for details.
-* BigSitemapRails and BigSitemapMerb are now BigSitemap::Rails and BigSitemap::Merb, respectively.
-* Sitemap files are now placed in the document root by default
-* Sitemaps are now automatically cleaned before generating the new set
-* Search engines are now pinged automatically when the sitemap is generated
-* Lock files are now generated automatically
-* Sitemap files are no longer split amongst your models
-
-=== 0.8.5 / 2011-10-20
-
-* Gzipped files now include indents and newlines
-
-=== 0.8.4 / 2011-10-20
-
-* Fixes an issue where joins where causing ambiguous "id" column
- (https://github.com/alexrabarts/big_sitemap/pull/17)
-* Fixes an issue with empty <loc> nodes
- (https://github.com/alexrabarts/big_sitemap/pull/20)
-
-=== 0.8.3 / 2011-03-08
-
-* Separate URL and file paths are now supported via the :document_path
- and :url_path options
-* Fixes an issue when initializing in Rails 3
-
-=== 0.8.2 / 2011-01-25
-
-* Fixes an issue where sitemap files were not being generated if the same model
- was added more than once (fixes issue #5: https://github.com/alexrabarts/big_sitemap/issues/#issue/5)
-
-=== 0.8.1 / 2011-01-25
-
-* API change: Rails/Merb are no longer automatically detected - use BigSitemapRails and BigSitemapMerb instead
-* API change: Rails' polymorphic_url helper is no longer used to generate URLs (use a lambda with the new :location option instead)
-* Static resources can now be added using the add_static method
-* Incremental updates are now available via the :partial_update option
-* "loc" URL values can now be generated with lambdas
-* Sitemap files can now be locked while being generated using the with_lock method
-* Several bug fixes
-
-=== 0.5.1 / 2009-09-07
-
-* Fixes an issue with the :last_modified key being passed into the find method options
-
-=== 0.5.0 / 2009-09-07
-
-* Add support for lambdas when specifying lastmod
-
-=== 0.4.0 / 2009-08-09
-
-* Use Bing instead of Live/MSN. Note, this breaks backwards compatibility as
- the old :ping_msn option is now :ping_bing.
-
-=== 0.3.5 / 2009-08-05
-
-* Fixed bugs in root_url generation and url_for_sitemap generation
-
-=== 0.3.4 / 2009-07-02
-
-* BigSitemap-specific options are no longer passed through to the ORM's find method
-
-=== 0.3.2 / 2009-06-09
-
-* Better handling of URLs when Rails' polymorphic_url isn't available in the model
-
-=== 0.3.2 / 2009-06-09
-
-* Fixes "uninitialized constant ActionController" error
-* Fixes "Unknown key(s): path" error
-
-=== 0.3.1 / 2009-04-18
-
-* Fixes broken gemspec
-
-=== 0.3.0 / 2009-04-06
-
-* API change: Pass model through as first argument to add method, e.g.sitemap.add(Posts, {:path => 'articles'})
-* API change: Use Rails' polymorphic_url helper to generate URLs if Rails is being used
-* API change: Only ping search engines when ping_search_engines is explicitly called
-* Add support for passing options through to the model's find method, e.g. :conditions
-* Allow base URL to be specified as a hash as well as a string
-* Add support for changefreq and priority
-* Pluralize sitemap model filenames
-* GZipping may optionally be turned off
-
-=== 0.2.1 / 2009-03-12
-
-* Normalize path arguments so it no longer matters whether a leading slash is used or not
-
-=== 0.2.0 / 2009-03-11
-
-* Methods are now chainable
-
-=== 0.1.4 / 2009-03-11
-
-* Add clean method to clear out Sitemaps directory
-* Make methods chainable
-
-=== 0.1.3 / 2009-03-10
-
-* Initial release
-
22 LICENSE
@@ -1,22 +0,0 @@
-(The MIT License)
-
-Copyright (c) 2009 Stateless Systems (http://statelesssystems.com)
-
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-'Software'), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
-IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
-CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
-TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
-SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
11 README.md
@@ -0,0 +1,11 @@
+# MassiveSitemap
+
+[![](http://travis-ci.org/rngtng/massive_sitemap.png)](http://travis-ci.org/rngtng/massive_sitemap)
+
+Build pain-free sitemaps for webpages with millions of pages
+
+MassiveSitemap is a successor project of [BigSitemap](https://github.com/alexrabarts/big_sitemap),
+a [Sitemap](http://sitemaps.org) generator for webpages with millions of pages.
+It implements various generation strategies, e.g. splitting large Sitemaps into multiple files, gzipping files to minimize bandwidth usage, or incremental updates. Its API is very similar to that of _BigSitemap_, so it can be set up with just a few lines of code and is compatible with just about any framework.
+
+
110 README.rdoc
@@ -1,110 +0,0 @@
-= BigSitemap
-
-<!-- [![](http://travis-ci.org/rngtng/big_sitemap.png)](http://travis-ci.org/rngtng/big_sitemap) -->
-
-{Travis}[http://travis-ci.org/rngtng/big_sitemap]
-
-BigSitemap is a {Sitemap}[http://sitemaps.org] generator suitable for applications with greater than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, supports increment updates, can be set up with just a few lines of code and is compatible with just about any framework.
-
-BigSitemap is best run periodically through a Rake/Thor task.
-
- require 'big_sitemap'
-
- include Rails.application.routes.url_helpers # Allows access to Rails routes
-
- BigSitemap.generate(:url_options => {:host => 'example.com'}, :document_root => "#{APP_ROOT}/public") do
- # Add a static page
- add '/about'
-
- # Add some URLs from your Rails application
- Post.find(:all).each do |post|
- add post_path(post)
- end
-
- # Add some URLs with additional options
- Product.find(:all).each do |product|
- add product_path(product), :change_frequency => 'daily', :priority => 0.5
- end
- end
-
-The code above will create a minimum of two files:
-
-1. public/sitemaps/sitemap_index.xml.gz
-2. public/sitemaps/sitemap.xml.gz
-
-If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_1.xml.gz</code>, <code>sitemap_2.xml.gz</code>, ...).
-
-=== Framework-specific Classes
-
-Use the framework-specific classes to take advantage of built-in shortcuts.
-
-==== Rails
-
-<code>BigSiteMapRails</code> deals with setting the <code>:document_root</code> and <code>:url_options</code> initialization options.
-
-==== Merb
-
-<code>BigSitemapMerb</code> deals with setting the <code>:document_root</code> initialization option.
-
-== Install
-
-Via gem:
-
- sudo gem install big_sitemap
-
-== Advanced
-
-=== Initialization Options
-
-* <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
-* <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. <code>'https://example.com:8080/'</code>
-* <code>:url_path</code> -- string path_name to sitemaps folder, defaults to <code>:document_path</code>
-* <code>:document_root</code> -- string
-* <code>:document_path</code> -- string document path for sitemaps, relative to :document_root, defaults to empty string (putting sitemap files in the document root directory)
-* <code>:document_full</code> -- string absolute document path to generation folder - defaults to <code>:document_root/:document_path</code>
-* <code>:max_per_sitemap</code> -- <code>50000</code>, which is the limit dictated by Google but can be less
-* <code>:gzip</code> -- <code>true</code>
-* <code>:ping_google</code> -- <code>true</code>
-* <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
-* <code>:ping_bing</code> -- <code>false</code>
-* <code>:ping_ask</code> -- <code>false</code>
-* <code>:partial_update</code> -- <code>false</code>
-
-=== Change Frequency, Priority and Last Modified
-
-You can control "changefreq", "priority" and "lastmod" values for each record individually by passing them as optional arguments when adding URLs:
-
- add(product_path(product), {
- :change_frequency => 'daily',
- :priority => 0.5,
- :last_modified => product.updated_at
- })
-
-=== Partial Update
-
-If you enable <code>:partial_update</code>, the filename will include the id of the first entry. This is perfect to update just the last file with new entries without the need to re-generate files being already there. You must pass the entry's id in when adding the URL. For example:
-
-BigSitemap.generate(:base_url => 'http://example.com', :partial_update => true) do
- Widget.find_in_batches(:conditions => "id > #{get_last_id}").each do |widget|
- add widget_path(widget), :id => widget.id
- end
-end
-
-== TODO
-
-Tests for framework-specific components.
-
-== Credits
-
-Thanks to Alastair Brunton and Harry Love, who's work provided a starting point for this library.
-
-Thanks also to those who have contributed patches:
-
-* Mislav Marohnić
-* Jeff Schoolcraft
-* Dalibor Nasevic
-* Tobias Bielohlawek (http://www.rngtng.com)
-
-== Copyright
-
-Copyright (c) 2010 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
9 Rakefile
@@ -1,11 +1,4 @@
-require 'bundler/gem_tasks'
-
-require 'rake/testtask'
-Rake::TestTask.new(:test) do |t|
- t.libs << 'lib' << 'test' << Rake.original_dir
- t.pattern = 'test/**/*_test.rb'
- t.verbose = false
-end
+require "bundler/gem_tasks"
require 'rspec/core/rake_task'
RSpec::Core::RakeTask.new(:spec) do |t|
1  VERSION
@@ -1 +0,0 @@
-1.0.0
22 big_sitemap.gemspec
@@ -1,22 +0,0 @@
-# -*- encoding: utf-8 -*-
-$:.push File.expand_path("../lib", __FILE__)
-
-Gem::Specification.new do |s|
- s.name = "big_sitemap"
- s.version = File.read('VERSION').strip
- s.authors = ["Alex Rabarts", "Tobias Bielohlawek"]
- s.email = ["alexrabarts@gmail.com", "tobi@soundcloud.com"]
- s.homepage = %q{http://github.com/alexrabarts/big_sitemap}
- s.summary = %q{A Sitemap generator specifically designed for large sites (although it works equally well with small sites)}
- s.description = %q{BigSitemap is a Sitemapgenerator suitable for applications with greater than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, supports increment updates, can be set up with just a few lines of code and is compatible with just about any framework.}
-
- s.files = `git ls-files`.split("\n")
- s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
- s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
- s.require_paths = ["lib"]
-
- ["bundler", "rspec", "shoulda", "mocha", "nokogiri"].each do |gem|
- s.add_development_dependency *gem.split(' ')
- end
-end
-
100 lib/big_sitemap.rb
@@ -1,100 +0,0 @@
-require 'uri'
-require 'fileutils'
-
-require 'big_sitemap/builder'
-require 'big_sitemap/writer'
-
-# Page at -> <base_url>
-# http://example.de/dir/
-
-# Index at
-# http://sitemap.example.de/index-dir/
-
-# Save at -> <document_full>
-# /root/dir/ -> <document_root>/<document_path>
-
-class BigSitemap
- DEFAULTS = {
- # :max_per_sitemap => RotatingBuilder::MAX_URLS,
- # :indent => Builder::OPTS[:indent_by],
-
- :document_root => '.',
- :document_path => '/',
- #:index_path => '/',
-
- :gzip => true,
-
- # Opinionated
- :ping => [:google]
- }
-
- attr_reader :options
-
- class << self
- def generate(options={}, &block)
- self.new(options).tap do |sitemap|
- @builder = RotatingBuilder.new(sitemap.options[:writer])# do |builder| #TODO opts: indent, max_per_sitemap
- instance_eval(&block) if block
- @builder.close!
- end
-
- #sitemap.generate_index
- # BigSitemap::ping_search_engines(url, options[:ping])
- end
-
- private
- def add(path, options={})
- #url = File.join @options[:base_url], path
- @builder.add_url! path, options
- end
- end
-
- def initialize(options={})
- @options = DEFAULTS.merge options
-
- #gets prefixed to url if 'http' is missing
- unless @options[:base_url]
- raise ArgumentError, 'you must specify ":base_url" string'
- end
-
- @options[:url_path] ||= @options[:document_path]
-
- @options[:document_full] ||= File.join(@options[:document_root], @options[:document_path])
- unless @options[:document_full]
- raise ArgumentError, 'Document root must be specified with the ":document_root" option, the full path with ":document_full"'
- end
-
- Dir.mkdir(@options[:document_full]) unless File.exists?(@options[:document_full])
-
- @options[:writer] = FileWriter.new File.join(@options[:document_full], "sitemap.xml").to_s
- end
-
- # Create a sitemap index document
- def generate_index(files = Dir[sitemap_files])
- FileWriter.new(@options[:document_full] + "sitemap_index.xml") do |writer|
- IndexBuilder.new('sitemap_index') do |builder|
- files.each do |path|
- next if path =~ /index/
- builder.add_url! url_for_sitemap(path), :last_modified => File.stat(path).mtime
- end
- end
- end
- end
-
-
- def sitemap_files
- File.join(@options[:document_full], "*.{xml,xml.gz}")
- end
-
- def url_for_sitemap(path)
- File.join @options[:base_url], @options[:url_path], path
- end
-
- def clean!
- Dir[sitemap_files].each do |file|
- FileUtils.rm file
- end
- end
-
-
-end
107 lib/big_sitemap/builder.rb
@@ -1,107 +0,0 @@
-
-# writer only has print and puts as interface
-
-class BigSitemap
- class Builder
- HEADER_NAME = 'urlset'
- HEADER_ATTRIBUTES = {
- 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
- 'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
- 'xsi:schemaLocation' => "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
- }
- OPTS = {
- :indent_by => 2
- }
-
- def initialize(writer, opts = {}, &block)
- @opt = OPTS.merge(opts)
- @writer = writer
- init!(&block)
- end
-
- def init!(&block) #_init_document
- @writer.init!
- @opened_tags = []
- @writer.print '<?xml version="1.0" encoding="UTF-8"?>'
- tag! self.class::HEADER_NAME, self.class::HEADER_ATTRIBUTES, &block
- end
-
- def add_url!(location, options = {})
- tag! 'url' do
- tag! 'loc', location
- tag! 'lastmod', options[:last_modified].utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if options[:last_modified]
- tag! 'changefreq', options[:change_frequency] if options[:change_frequency]
- tag! 'priority', options[:priority].to_s if options[:priority]
- end
- end
-
- def tag!(name, content = nil, attrs = {}, &block) # _tag
- attrs = content if content.is_a? Hash
- open!(name, attrs)
- if content.is_a? String
- @writer.print content.gsub('&', '&amp;')
- close!(false)
- else
- if block
- instance_eval(&block)
- close!
- end
- end
- end
-
- def open!(name, attrs = {}) #_open_tag
- attrs = attrs.map { |attr, value| %Q( #{attr}="#{value}") }.join('')
- @writer.print "\n" + ' ' * @opt[:indent_by] * @opened_tags.size
- @opened_tags << name
- @writer.print "<#{name}#{attrs}>"
- end
-
- def close!(indent = true) #_close_tag / #_close_document
- name = @opened_tags.pop
- @writer.print "\n" + ' ' * @opt[:indent_by] * @opened_tags.size if indent
- @writer.print "</#{name}>"
- @writer.close! if @opened_tags.size == 0
- end
- end
-
- class IndexBuilder < Builder
- HEADER_NAME = 'sitemapindex'
- HEADER_ATTRIBUTES = {
- 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9'
- }
-
- def add_url!(location, options={})
- tag! 'sitemap' do
- tag! 'loc', location
- tag! 'lastmod', options[:last_modified].utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if options[:last_modified]
- end
- end
- end
-
- class RotatingBuilder < Builder
- NUM_URLS = 1..50_000
-
- def initialize(writer, opt = {}, &block)
- @max_urls = opt[:max_per_sitemap] || NUM_URLS.max
- unless NUM_URLS.member?(@max_urls)
- raise ArgumentError, %Q(":max_per_sitemap" must be greater than #{NUM_URLS.min} and smaller than #{NUM_URLS.max})
- end
-
- super
- end
-
- def init!(&block)
- @urls = 0
- super
- end
-
- def add_url!(location, options={})
- if @urls >= @max_urls
- close!
- init!
- end
- super
- @urls += 1
- end
- end
-end
89 lib/big_sitemap/writer.rb
@@ -1,89 +0,0 @@
-require 'fileutils'
-require 'zlib'
-require 'stringio'
-
-class BigSitemap
-
- # Write into String
- # Perfect for testing porpuses
- class StringWriter < StringIO
- def init! # do noting
- end
-
- def close! # do noting
- end
- end
-
- # Write into File
- # On rotation, close current file, and reopen a new one
- # with same file name but -<counter> appendend
- #
- # TODO what if file exists?, overwrite flag??
- class FileWriter
-
- def initialize(file_name_template)
- @stream_name_template = file_name_template
- @stream_names = []
- end
-
- # API
- def init!
- close! if @stream
- @stream = File.open(tmp_file_name, 'w+:ASCII-8BIT')
- end
-
- def close!
- @stream.close
- @stream = nil
- # Move from tmp_file into acutal file
- File.delete(file_name) if File.exists?(file_name)
- File.rename(tmp_file_name, file_name)
- @stream_names << file_name
- end
-
- def print(string)
- @stream.print(string)
- end
-
- private
- def file_name
- cnt = @stream_names.size == 0 ? "" : "-#{@stream_names.size}"
- ext = File.extname(@stream_name_template)
- @stream_name_template.gsub(ext, cnt + ext)
- end
-
- def tmp_file_name
- file_name + ".tmp"
- end
- end
-
- # Write into GZipped File
- class GzipFileWriter < FileWriter
- def initialize(file_name_template)
- super(file_name_template + ".gz")
- end
-
- def init!
- super
- @stream = ::Zlib::GzipWriter.new(@stream)
- end
- end
-
- class LockingFileWriter < FileWriter
- LOCK_FILE = 'generator.lock'
-
- def init!
- close! if @stream
- File.open(LOCK_FILE, 'w', File::EXCL) #lock!
- super
- rescue Errno::EACCES => e
- raise 'Lockfile exists'
- end
-
- def close!
- super
- FileUtils.rm LOCK_FILE #unlock!
- end
- end
-
-end
51 lib/massive_sitemap.rb
@@ -0,0 +1,51 @@
+# require 'uri'
+# require 'fileutils'
+
+require "massive_sitemap/version"
+
+require 'massive_sitemap/writer/file'
+require 'massive_sitemap/builder/rotating'
+
+# Page at -> <base_url>
+# http://example.de/dir/
+
+# Index at
+# http://sitemap.example.de/index-dir/
+
+# Save at -> <document_full>
+# /root/dir/ -> <document_root>/<document_path>
+
+# require 'massive_sitemap/builder/index'
+
+module MassiveSitemap
+ DEFAULTS = {
+ # writer
+ :document_full => '.',
+
+ # builder
+ :base_url => nil,
+ :indent_by => 2,
+ }
+
+ def generate(options = {}, &block)
+ @options = DEFAULTS.merge options
+
+ unless @options[:base_url]
+ raise ArgumentError, 'you must specify ":base_url" string'
+ end
+ @options[:base_url] = beauty_url(@options[:base_url])
+
+ Dir.mkdir(@options[:document_full]) unless ::File.exists?(@options[:document_full])
+
+ writer = Writer::File.new "sitemap.xml", @options
+ Builder::Rotating.new(writer, @options, &block)
+ end
+ module_function :generate
+
+ # move to builder???
+ def beauty_url(url)
+ schema, host = url.scan(/^(https?:\/\/)?(.+?)\/?$/).flatten
+ "#{schema || 'http://'}#{host}/"
+ end
+ module_function :beauty_url
+end
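`beauty_url` normalises the `:base_url` option before the builder prefixes paths with it. The method is copied unchanged below so its behaviour on a few sample inputs (illustrative, not taken from the specs) can be checked in isolation:

```ruby
# Copy of beauty_url from lib/massive_sitemap.rb, pulled out for inspection.
def beauty_url(url)
  # Optional http:// or https:// scheme, then the host/path without a trailing slash.
  schema, host = url.scan(/^(https?:\/\/)?(.+?)\/?$/).flatten
  # Default to http:// and append a single trailing slash.
  "#{schema || 'http://'}#{host}/"
end

puts beauty_url('test.de/')                # => http://test.de/
puts beauty_url('https://example.com')     # => https://example.com/
puts beauty_url('http://example.com/dir/') # => http://example.com/dir/
puts File.join(beauty_url('example.com'), '/about') # => http://example.com/about
```

Since `Builder::Base#add` joins with `File.join`, the trailing slash added here does not double up when a path starts with `/`, as the last line shows.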
75 lib/massive_sitemap/builder/base.rb
@@ -0,0 +1,75 @@
+module MassiveSitemap
+ module Builder
+
+ class Base
+ OPTS = {
+ :base_url => nil,
+ :indent_by => 2
+ }
+
+ HEADER_NAME = 'urlset'
+ HEADER_ATTRIBUTES = {
+ 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
+ 'xmlns:xsi' => "http://www.w3.org/2001/XMLSchema-instance",
+ 'xsi:schemaLocation' => "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
+ }
+
+ attr_reader :options
+
+ def initialize(writer, options = {}, &block)
+ @writer = writer
+ @options = OPTS.merge(options)
+ @builder = self
+ init!(&block)
+ end
+
+ def add(path, attrs = {})
+ add_url! File.join(options[:base_url], path), attrs
+ end
+
+ def init!(&block) #_init_document
+ @writer.init!
+ @opened_tags = []
+ @writer.print '<?xml version="1.0" encoding="UTF-8"?>'
+ tag! self.class::HEADER_NAME, self.class::HEADER_ATTRIBUTES, &block
+ end
+
+ def add_url!(location, attrs = {})
+ tag! 'url' do
+ tag! 'loc', location
+ tag! 'lastmod', attrs[:last_modified].utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if attrs[:last_modified]
+ tag! 'changefreq', attrs[:change_frequency] if attrs[:change_frequency]
+ tag! 'priority', attrs[:priority].to_s if attrs[:priority]
+ end
+ end
+
+ def tag!(name, content = nil, attrs = {}, &block) # _tag
+ attrs = content if content.is_a? Hash
+ open!(name, attrs)
+ if content.is_a? String
+ @writer.print content.gsub('&', '&amp;')
+ close!(false)
+ else
+ if block
+ instance_eval(&block)
+ close!
+ end
+ end
+ end
+
+ def open!(name, attrs = {}) #_open_tag
+ attrs = attrs.map { |attr, value| %Q( #{attr}="#{value}") }.join('')
+ @writer.print "\n" + ' ' * options[:indent_by] * @opened_tags.size
+ @opened_tags << name
+ @writer.print "<#{name}#{attrs}>"
+ end
+
+ def close!(indent = true) #_close_tag / #_close_document
+ name = @opened_tags.pop
+ @writer.print "\n" + ' ' * options[:indent_by] * @opened_tags.size if indent
+ @writer.print "</#{name}>"
+ @writer.close! if @opened_tags.size == 0
+ end
+ end
+ end
+end
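`Builder::Base#add_url!` writes a `<url>` element with `<loc>` plus optional `<lastmod>`, `<changefreq>` and `<priority>`, formatting the timestamp in UTC. Its `tag!` only replaces `&` with `&amp;`, so a location containing `<` or `>` would yield invalid XML, while the sitemaps.org protocol asks for all five XML special characters to be entity-escaped. A hedged sketch of the same fields with full escaping; `url_entry` and `xml_escape` are hypothetical helpers, and using `getutc` (returns a UTC copy) instead of `utc` (converts the receiver in place) is a deliberate deviation from the diff:

```ruby
# Hypothetical escaping table for the five characters XML reserves.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  # One gsub pass: replacement text is never re-scanned, so '&' needs no special ordering.
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper emitting the same fields as Builder::Base#add_url!.
def url_entry(location, attrs = {})
  parts = ["<loc>#{xml_escape(location)}</loc>"]
  if attrs[:last_modified]
    # getutc returns a UTC copy and leaves the caller's Time untouched.
    stamp = attrs[:last_modified].getutc.strftime('%Y-%m-%dT%H:%M:%S+00:00')
    parts << "<lastmod>#{stamp}</lastmod>"
  end
  parts << "<changefreq>#{xml_escape(attrs[:change_frequency])}</changefreq>" if attrs[:change_frequency]
  parts << "<priority>#{attrs[:priority]}</priority>" if attrs[:priority]
  "<url>#{parts.join}</url>"
end

entry = url_entry('http://example.com/search?q=a&page=2',
                  :last_modified    => Time.new(2012, 2, 9, 13, 30, 0, '+01:00'),
                  :change_frequency => 'daily',
                  :priority         => 0.5)
puts entry
```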
19 lib/massive_sitemap/builder/index.rb
@@ -0,0 +1,19 @@
+require "massive_sitemap/builder/base"
+
+module MassiveSitemap
+ module Builder
+ class Index < Base
+ HEADER_NAME = 'sitemapindex'
+ HEADER_ATTRIBUTES = {
+ :xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9'
+ }
+
+ def add_url!(location, attrs = {})
+ tag! 'sitemap' do
+ tag! 'loc', location
+ tag! 'lastmod', attrs[:last_modified].utc.strftime('%Y-%m-%dT%H:%M:%S+00:00') if attrs[:last_modified]
+ end
+ end
+ end
+ end
+end
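`Builder::Index` reuses the base builder but emits a `<sitemapindex>` root with one `<sitemap>` entry (a `<loc>` plus optional `<lastmod>`) per sitemap file. In this commit it is not yet required from `lib/massive_sitemap.rb`, and the changelog lists index generation as upcoming. A rough standalone sketch of the document it describes; `sitemap_index` and its `[location, last_modified]` pair input are assumptions for illustration:

```ruby
# Hypothetical helper building the document Builder::Index describes:
# a <sitemapindex> root with one <sitemap> entry per sitemap file.
def sitemap_index(entries)
  xml = '<?xml version="1.0" encoding="UTF-8"?>'.dup
  xml << "\n<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
  entries.each do |location, last_modified|
    xml << "\n  <sitemap>\n    <loc>#{location}</loc>"
    if last_modified
      # Same W3C datetime format the builders use, always in UTC.
      xml << "\n    <lastmod>#{last_modified.getutc.strftime('%Y-%m-%dT%H:%M:%S+00:00')}</lastmod>"
    end
    xml << "\n  </sitemap>"
  end
  xml << "\n</sitemapindex>"
end

index = sitemap_index([
  ['http://example.com/sitemap.xml',   Time.utc(2012, 2, 9)],
  ['http://example.com/sitemap-1.xml', nil]
])
puts index
```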
33 lib/massive_sitemap/builder/rotating.rb
@@ -0,0 +1,33 @@
+require "massive_sitemap/builder/base"
+# writers only need init!, print and close! as interface
+
+module MassiveSitemap
+ module Builder
+ class Rotating < Base
+ NUM_URLS = 1..50_000
+
+ def initialize(writer, options = {}, &block)
+ @max_urls = options[:max_per_sitemap] || NUM_URLS.max
+ unless NUM_URLS.member?(@max_urls)
+ raise ArgumentError, %Q(":max_per_sitemap" must be greater than #{NUM_URLS.min} and smaller than #{NUM_URLS.max})
+ end
+
+ super
+ end
+
+ def init!(&block)
+ @urls = 0
+ super
+ end
+
+ def add_url!(location, attrs = {})
+ if @urls >= @max_urls
+ close!
+ init!
+ end
+ super
+ @urls += 1
+ end
+ end
+ end
+end
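`Builder::Rotating` counts URLs per document and, once `:max_per_sitemap` (default 50,000, the sitemaps.org limit) is reached, closes the current document and opens a new one before adding the next URL. The same rule, modelled without any XML or files; `rotate` is a hypothetical function returning one array of locations per sitemap file. Note that `NUM_URLS` is an inclusive range, so both 1 and 50,000 are accepted even though the error message says "greater than" and "smaller than":

```ruby
# Hypothetical, storage-free model of Builder::Rotating's rotation rule.
MAX_URLS = 1..50_000

def rotate(locations, max_per_sitemap = MAX_URLS.max)
  unless MAX_URLS.cover?(max_per_sitemap)
    raise ArgumentError, ":max_per_sitemap must be between #{MAX_URLS.min} and #{MAX_URLS.max}"
  end

  # Like the builder, always start with one open (possibly empty) document.
  documents = [[]]
  locations.each do |location|
    # Same check as add_url!: rotate before adding when the current document is full.
    documents << [] if documents.last.size >= max_per_sitemap
    documents.last << location
  end
  documents
end

docs = rotate((1..5).map { |i| "http://example.com/page/#{i}" }, 2)
p docs.map(&:size) # => [2, 2, 1]
```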
4 lib/big_sitemap/ping.rb → lib/massive_sitemap/ping.rb
@@ -1,4 +1,4 @@
-class BigSitemap
+module MassiveSitemap
class Ping
PING = {
:google => 'http://www.google.com/webmasters/tools/ping?sitemap=%s',
@@ -18,4 +18,4 @@ def self.ping_search_engines(sitemap_uri, engines = [])
end
end
end
-end
+end
3  lib/massive_sitemap/version.rb
@@ -0,0 +1,3 @@
+module MassiveSitemap
+ VERSION = "2.0.1"
+end
61 lib/massive_sitemap/writer/file.rb
@@ -0,0 +1,61 @@
+require 'fileutils'
+
+# Write into File
+# On rotation, close current file, and reopen a new one
+# with same file name but -<counter> appended
+#
+# TODO what if file exists?, overwrite flag??
+
+module MassiveSitemap
+ module Writer
+ class File
+ OPTS = {
+ :document_full => '.',
+ }
+
+ attr_reader :options
+
+ def initialize(file_name_template, options = {})
+ @stream_name_template = file_name_template
+ @options = OPTS.merge(options)
+ @stream_names = []
+ end
+
+ def document_full
+ ::File.dirname (@stream_name_template)
+ end
+
+ # API
+ def init!
+ close! if @stream
+ #if File.exists?(file_name)
+ @stream = ::File.open(tmp_file_name, 'w+:ASCII-8BIT')
+ end
+
+ def close!
+ @stream.close
+ @stream = nil
+ # Move from tmp_file into actual file
+ ::File.delete(file_name) if ::File.exists?(file_name)
+ ::File.rename(tmp_file_name, file_name)
+ @stream_names << file_name
+ end
+
+ def print(string)
+ @stream.print(string)
+ end
+
+ private
+ def file_name
+ cnt = @stream_names.size == 0 ? "" : "-#{@stream_names.size}"
+ ext = ::File.extname(@stream_name_template)
+ ::File.join options[:document_full], @stream_name_template.gsub(ext, cnt + ext)
+ end
+
+ def tmp_file_name
+ file_name + ".tmp"
+ end
+ end
+
+ end
+end
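`Writer::File` names each rotated file after the template and keeps a list of the names already written: the first file uses the template as-is and later ones get `-<counter>` inserted before the extension. Each file is written to `<name>.tmp` first and only renamed into place by `close!`, so a partly written sitemap does not replace the previous one. The standalone copy below reproduces the naming logic; `rotated_file_name` is a hypothetical name, and `index` stands for the number of files already closed:

```ruby
# Reproduces the naming rule of Writer::File#file_name as a pure function.
def rotated_file_name(template, index, document_full = '.')
  counter = index.zero? ? '' : "-#{index}"
  ext = File.extname(template) # only the last extension, e.g. ".gz"
  File.join(document_full, template.gsub(ext, counter + ext))
end

(0..2).each { |i| puts rotated_file_name('sitemap.xml', i) }
# ./sitemap.xml, ./sitemap-1.xml, ./sitemap-2.xml
puts rotated_file_name('sitemap.xml.gz', 1)
# ./sitemap.xml-1.gz
```

One consequence worth checking: `Writer::GzipFile` appends `.gz` to the template, and `File.extname` only returns the last extension, so its rotated files come out as `sitemap.xml-1.gz` rather than `sitemap-1.xml.gz`.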
21 lib/massive_sitemap/writer/gzip_file.rb
@@ -0,0 +1,21 @@
+require 'zlib'
+
+require "massive_sitemap/writer/file"
+# Write into GZipped File
+
+module MassiveSitemap
+ module Writer
+
+ class GzipFile < File
+ def initialize(file_name_template, options = {})
+ super(file_name_template + ".gz", options)
+ end
+
+ # API
+ def init!
+ super
+ @stream = ::Zlib::GzipWriter.new(@stream)
+ end
+ end
+ end
+end
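`Writer::GzipFile` reuses the plain file writer and wraps the freshly opened stream in `Zlib::GzipWriter`, so the builder keeps calling `print` unchanged while the bytes on disk are compressed. The sketch shows the same wrapping against an in-memory `StringIO` (an assumption, to keep it self-contained) and reads the result back. It calls `finish` rather than `close` because `close` also closes the wrapped IO; in the diff, that is what lets `Writer::File#close!` finish both the gzip stream and the file with one call:

```ruby
require 'stringio'
require 'zlib'

# Same idea as Writer::GzipFile#init!: put a GzipWriter in front of the raw stream.
raw = StringIO.new
gz  = Zlib::GzipWriter.new(raw)
gz.print '<?xml version="1.0" encoding="UTF-8"?>'
gz.print "\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"></urlset>"
gz.finish # write the gzip trailer but keep `raw` open so it can be read back

compressed = raw.string
xml = Zlib::GzipReader.new(StringIO.new(compressed)).read
puts xml
```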
27 lib/massive_sitemap/writer/locking_file.rb
@@ -0,0 +1,27 @@
+require 'zlib'
+
+require "massive_sitemap/writer/file"
+# Create Lock before writing to file
+
+module MassiveSitemap
+ module Writer
+
+ class LockingFile < File
+ LOCK_FILE = 'generator.lock'
+
+ def init!
+ close! if @stream
+ ::File.open(LOCK_FILE, 'w', ::File::EXCL) #lock!
+ super
+ rescue Errno::EACCES => e
+ raise 'Lockfile exists'
+ end
+
+ def close!
+ super
+ FileUtils.rm LOCK_FILE #unlock!
+ end
+ end
+
+ end
+end
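`LockingFile#init!` guards generation with a `generator.lock` file. As written it calls `::File.open(LOCK_FILE, 'w', ::File::EXCL)`; in Ruby's `File.open(name, mode, perm)` the third positional argument is the permission mode, so `File::EXCL` appears to end up as permission bits rather than as an open flag, and an existing lock would not be detected. With `File::CREAT | File::EXCL` in the mode itself, a held lock surfaces as `Errno::EEXIST` (not `Errno::EACCES`). A sketch of that atomic variant, assuming the lock should fail whenever the file already exists; `with_lock` is a hypothetical helper, not part of the gem:

```ruby
require 'tmpdir'

# Hypothetical helper: hold an exclusive lock file while the block runs.
def with_lock(lock_path)
  begin
    # CREAT|EXCL makes creation atomic: the open fails with Errno::EEXIST
    # if the lock file is already there.
    File.open(lock_path, File::WRONLY | File::CREAT | File::EXCL) do |f|
      f.write(Process.pid.to_s)
    end
  rescue Errno::EEXIST
    raise "Lockfile exists: #{lock_path}"
  end

  begin
    yield
  ensure
    File.delete(lock_path) # unlock, even if the block raised
  end
end

lock_dir = Dir.mktmpdir
lock = File.join(lock_dir, 'generator.lock')
refused = false
with_lock(lock) do
  begin
    with_lock(lock) { }
  rescue RuntimeError
    refused = true # a second run is turned away while the first holds the lock
  end
end
Dir.rmdir(lock_dir)

puts refused           # => true
puts File.exist?(lock) # => false
```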
21 lib/massive_sitemap/writer/string.rb
@@ -0,0 +1,21 @@
+require 'stringio'
+
+# Write into String
+# Perfect for testing purposes
+
+module MassiveSitemap
+ module Writer
+
+ class String < StringIO
+ attr_reader :options
+
+ # API
+ def init! # do nothing
+ end
+
+ def close! # do nothing
+ end
+ end
+
+ end
+end
24 massive_sitemap.gemspec
@@ -0,0 +1,24 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "massive_sitemap/version"
+
+Gem::Specification.new do |s|
+ s.name = "massive_sitemap"
+ s.version = MassiveSitemap::VERSION
+ s.authors = ["Tobias Bielohlawek"]
+ s.email = ["tobi@soundcloud.com"]
+ s.homepage = "http://github.com/rngtng/massive_sitemap"
+ s.summary = %q{Build pain-free sitemaps for webpages with millions of pages}
+ s.description = %q{MassiveSitemap allows you to generate sitemaps for webpages with millions of pages. Differential updates keep generation time short and reduce load on the DB. It's inspired by and partly based on BigSitemap.}
+
+ s.rubyforge_project = "massive_sitemap"
+
+ s.files = `git ls-files`.split("\n")
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+ s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+ s.require_paths = ["lib"]
+
+ %w(rspec).each do |gem|
+ s.add_development_dependency *gem.split(' ')
+ end
+end
41 spec/big_sitemap_spec.rb
@@ -1,41 +0,0 @@
-require "spec_helper"
-
-describe BigSitemap do
-
- describe "#initalize" do
- it 'fail if no base_url given' do
- expect do
- BigSitemap.generate
- end.to raise_error(ArgumentError)
- end
-
- it 'initalize' do
- expect do
- BigSitemap.generate(:base_url => 'test.de/')
- end.to_not raise_error
- end
- end
-
- describe "#generate" do
- after do
- FileUtils.rm('sitemap.xml') rescue nil
- end
-
- it 'adds url' do
- expect do
- BigSitemap.generate(:base_url => 'test.de/') do
- add "as"
- end
- end.to_not raise_error
- expect = <<-OUT
-<?xml version="1.0" encoding="UTF-8"?>
-<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
- <url>
- <loc>as</loc>
- </url>
-</urlset>
-OUT
- `cat 'sitemap.xml'`.should == expect.strip
- end
- end
-end
83 spec/builder_spec.rb
@@ -1,67 +1,73 @@
require "spec_helper"
-describe BigSitemap::Builder do
+require "massive_sitemap/builder/base"
+require "massive_sitemap/writer/string"
+
+describe MassiveSitemap::Builder::Base do
HEADER = %Q(<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">)
+ let(:writer) { MassiveSitemap::Writer::String.new }
+ let(:builder) { MassiveSitemap::Builder::Base.new(writer) }
+
describe "#arguments" do
- it 'fail if writer given' do
+ it 'fail if no writer given' do
expect do
- BigSitemap::Builder.new
+ MassiveSitemap::Builder::Base.new
end.to raise_error(ArgumentError)
end
end
context "in sequence" do
it 'seq: generate basic skeleton, opened' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new)
- is.string.should == HEADER
+ builder
+ writer.string.should == HEADER
end
it 'generate basic skeleton' do
- bs = BigSitemap::Builder.new(is = BigSitemap::StringWriter.new)
- bs.close!
- is.string.should == %Q(#{HEADER}\n</urlset>)
+ builder.close!
+ writer.string.should == %Q(#{HEADER}\n</urlset>)
end
it 'seq: generate one url' do
- bs = BigSitemap::Builder.new(is = BigSitemap::StringWriter.new)
- bs.add_url! 'test'
- bs.close!
- is.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
+ builder.add_url! 'test'
+ builder.close!
+ writer.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
end
end
context "as block" do
it 'generate basic skeleton' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new) do
+ MassiveSitemap::Builder::Base.new(writer) do
end
- is.string.should == %Q(#{HEADER}\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n</urlset>)
end
it 'generate one url' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new) do
+ MassiveSitemap::Builder::Base.new(writer) do
add_url! 'test'
end
- is.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
end
it 'generate one url, no indent' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new, :indent_by => 0) do
+ MassiveSitemap::Builder::Base.new(writer, :indent_by => 0) do
add_url! 'test'
end
- is.string.should == %Q(#{HEADER}\n<url>\n<loc>test</loc>\n</url>\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n<url>\n<loc>test</loc>\n</url>\n</urlset>)
end
it 'generate two url' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new) do
+ writer.should_receive(:init!).once
+ writer.should_receive(:close!).once
+ MassiveSitemap::Builder::Base.new(writer) do
add_url! 'test'
add_url! 'test2'
end
- is.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n <url>\n <loc>test2</loc>\n </url>\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n <url>\n <loc>test2</loc>\n </url>\n</urlset>)
end
it 'generate one url with attrs' do
- BigSitemap::Builder.new(is = BigSitemap::StringWriter.new) do
+ MassiveSitemap::Builder::Base.new(writer) do
add_url! 'test', :change_frequency => 'weekly', :priority => 0.8
end
expect = <<-XML
@@ -73,41 +79,54 @@
</url>
</urlset>
XML
- is.string.should == expect.strip
+ writer.string.should == expect.strip
end
end
end
-describe BigSitemap::IndexBuilder do
+####
+
+require "massive_sitemap/builder/index"
+describe MassiveSitemap::Builder::Index do
INDEX_HEADER = %Q(<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n<sitemap>)
+ let(:writer) { MassiveSitemap::Writer::String.new }
+
it 'Index: generate one url' do
- BigSitemap::IndexBuilder.new(is = BigSitemap::StringWriter.new, :indent_by => 0) do
+ MassiveSitemap::Builder::Index.new(writer, :indent_by => 0) do
add_url! 'test'
end
- is.string.should == %Q(#{INDEX_HEADER}\n<loc>test</loc>\n</sitemap>\n</sitemapindex>)
+ writer.string.should == %Q(#{INDEX_HEADER}\n<loc>test</loc>\n</sitemap>\n</sitemapindex>)
end
end
-describe BigSitemap::RotatingBuilder do
+####
+
+require "massive_sitemap/builder/rotating"
+describe MassiveSitemap::Builder::Rotating do
+
+ let(:writer) { MassiveSitemap::Writer::String.new }
+
it 'raises error when max_per_sitemap > MAX_URLS' do
expect do
- BigSitemap::RotatingBuilder.new(BigSitemap::StringWriter.new, :max_per_sitemap => BigSitemap::RotatingBuilder::NUM_URLS.max + 1)
+ MassiveSitemap::Builder::Rotating.new(writer, :max_per_sitemap => MassiveSitemap::Builder::Rotating::NUM_URLS.max + 1)
end.to raise_error(ArgumentError)
end
it 'generates one url' do
- BigSitemap::RotatingBuilder.new(is = BigSitemap::StringWriter.new) do
+ MassiveSitemap::Builder::Rotating.new(writer) do
add_url! 'test'
end
- is.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>)
end
it 'generates two url' do
- BigSitemap::RotatingBuilder.new(is = BigSitemap::StringWriter.new, :max_per_sitemap => 1) do
+ writer.should_receive(:init!).twice
+ writer.should_receive(:close!).twice
+ MassiveSitemap::Builder::Rotating.new(writer, :max_per_sitemap => 1) do
add_url! 'test'
add_url! 'test2'
end
- is.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>#{HEADER}\n <url>\n <loc>test2</loc>\n </url>\n</urlset>)
+ writer.string.should == %Q(#{HEADER}\n <url>\n <loc>test</loc>\n </url>\n</urlset>#{HEADER}\n <url>\n <loc>test2</loc>\n </url>\n</urlset>)
end
-end
+end
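The builder spec above defines a contract between builder and writer. The builder writes the `urlset` header when it is constructed, appends one `<url>` entry for each `add_url!`, and writes the footer on `close!` or when a block finishes. The rotating builder closes the writer and reopens it once `max_per_sitemap` is reached. The Ruby sketch below implements that contract under some assumptions. The class names (`SketchStringWriter`, `SketchBuilder`, `SketchRotatingBuilder`), the `print`/`inits`/`closes` methods, the default indent of 2, the shortened header, and the 50,000-URL cap (the per-sitemap limit in the sitemaps.org protocol) are illustrative and not massive_sitemap's actual code.

```ruby
require "cgi"

# In-memory stand-in for a writer: buffers output and counts init!/close! calls.
class SketchStringWriter
  attr_reader :string, :inits, :closes

  def initialize
    @string = +""
    @inits = 0
    @closes = 0
  end

  def init!
    @inits += 1
  end

  def print(text)
    @string << text
  end

  def close!
    @closes += 1
  end
end

# Writes one <urlset> document: header on construction, footer on close!.
class SketchBuilder
  HEADER = %Q(<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">)
  FOOTER = "</urlset>"

  # Calling new without a writer raises ArgumentError (wrong number of arguments).
  def initialize(writer, options = {}, &block)
    @writer = writer
    @indent_by = options.fetch(:indent_by, 2)
    open!
    return unless block

    # Block form: evaluate add_url! calls on the builder, then close automatically.
    instance_eval(&block)
    close!
  end

  def add_url!(location)
    # Escape &, <, > and quotes so the location is valid XML text.
    loc = CGI.escapeHTML(location)
    @writer.print("\n#{indent(1)}<url>\n#{indent(2)}<loc>#{loc}</loc>\n#{indent(1)}</url>")
  end

  def close!
    @writer.print("\n#{FOOTER}")
    @writer.close!
  end

  private

  def open!
    @writer.init!
    @writer.print(HEADER)
  end

  def indent(level)
    " " * (@indent_by * level)
  end
end

# Starts a fresh document once max_per_sitemap URLs have been written.
class SketchRotatingBuilder < SketchBuilder
  MAX_URLS = 50_000 # per-sitemap limit from the sitemaps.org protocol

  def initialize(writer, options = {}, &block)
    @max_per_sitemap = options.fetch(:max_per_sitemap, MAX_URLS)
    raise ArgumentError, "max_per_sitemap exceeds #{MAX_URLS}" if @max_per_sitemap > MAX_URLS

    @count = 0
    super
  end

  def add_url!(location)
    if @count >= @max_per_sitemap
      close! # finish the current document
      open!  # and start the next one on the same writer
      @count = 0
    end
    @count += 1
    super
  end
end
```

Consider `SketchRotatingBuilder.new(writer, :max_per_sitemap => 1) { add_url! 'a'; add_url! 'b' }`. Here the writer receives `init!` and `close!` twice each and produces two complete documents back to back. That matches the behavior the `generates two url` example checks. For a file writer, `init!` is a natural place to open the next file.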
78 spec/massive_sitemap_spec.rb
@@ -0,0 +1,78 @@
+require "spec_helper"
+
+require "massive_sitemap/writer/file"
+
+describe MassiveSitemap do
+
+  describe "#initialize" do
+    it 'fails if no base_url given' do
+      expect do
+        MassiveSitemap.generate
+      end.to raise_error(ArgumentError)
+    end
+
+    it 'initializes' do
+ expect do
+ MassiveSitemap.generate(:base_url => 'test.de/')
+ end.to_not raise_error
+ end
+ end
+
+ describe "#generate" do
+ let(:output) { `cat 'sitemap.xml'` }
+ let(:output2) { `cat 'sitemap2.xml'` }
+
+ after do
+ FileUtils.rm('sitemap.xml') rescue nil
+ FileUtils.rm('sitemap2.xml') rescue nil
+ end
+
+ it 'adds url' do
+ MassiveSitemap.generate(:base_url => 'test.de') do
+ add "track/name"
+ end
+ output.should include("<loc>http://test.de/track/name</loc>")
+ end
+
+ it 'adds url with root slash' do
+ MassiveSitemap.generate(:base_url => 'test.de/') do
+ add "/track/name"
+ end
+ output.should include("<loc>http://test.de/track/name</loc>")
+ end
+
+ it 'adds url' do
+ MassiveSitemap.generate(:base_url => 'test.de/') do
+ writer = MassiveSitemap::Writer::File.new "sitemap2.xml", @writer.options
+ MassiveSitemap::Builder::Rotating.new(writer, @builder.options) do
+ add "/set/name"
+ end
+ end
+ output2.should include("<loc>http://test.de/set/name</loc>")
+ end
+
+ end
+end
+
+
+describe MassiveSitemap do
+
+ describe ".beauty_url" do
+ URLS = %w(
+ http://test.de/
+ test.de/
+ test.de
+ )
+
+ URLS.each do |url|
+ it "transforms to valid url" do
+ MassiveSitemap.beauty_url(url).should == "http://test.de/"
+ end
+ end
+
+ it "transforms to valid url with https" do
+ MassiveSitemap.beauty_url("https://test.de/").should == "https://test.de/"
+ end
+ end
+
+end
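The `.beauty_url` examples above imply a normalization rule. If no scheme is given, prefix `http://`. Keep an existing `https://`. Make sure the result ends with a slash, so that a path joins cleanly whether or not it starts with `/`, as the `adds url` examples expect. The sketch below is one hypothetical way to meet those expectations. `SketchUrl` and `url_for` are illustrative names, not the gem's API.

```ruby
# Hypothetical URL helpers mirroring the expectations in massive_sitemap_spec.rb.
module SketchUrl
  module_function

  # Add "http://" when no scheme is present and guarantee a trailing slash.
  def beauty_url(url)
    url = "http://#{url}" unless url.match?(%r{\Ahttps?://})
    url.end_with?("/") ? url : "#{url}/"
  end

  # Join a base URL and a path, tolerating leading slashes on the path.
  def url_for(base_url, path)
    beauty_url(base_url) + path.sub(%r{\A/+}, "")
  end
end

puts SketchUrl.beauty_url("test.de")              # prints http://test.de/
puts SketchUrl.url_for("test.de/", "/track/name") # prints http://test.de/track/name
```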
2  spec/spec_helper.rb
@@ -1,2 +1,2 @@
$:.unshift File.expand_path("../../lib", __FILE__)
-require "big_sitemap"
+require "massive_sitemap"
15 spec/writer_spec.rb
@@ -1,9 +1,11 @@
require "spec_helper"
-describe BigSitemap::FileWriter do
+require "massive_sitemap/writer/file"
+
+describe MassiveSitemap::Writer::File do
let(:file_name) { 'test.txt' }
let(:file_name2) { 'test-1.txt' }
- let(:writer) { BigSitemap::FileWriter.new(file_name).tap { |w| w.init! } }
+ let(:writer) { MassiveSitemap::Writer::File.new(file_name).tap { |w| w.init! } }
after do
FileUtils.rm(file_name) rescue nil
@@ -12,7 +14,7 @@
it 'wrong template' do
file_name = 'test'
- BigSitemap::FileWriter.new(file_name)
+ MassiveSitemap::Writer::File.new(file_name)
end
it 'create file' do
@@ -43,10 +45,11 @@
end
end
-describe BigSitemap::LockingFileWriter do
+require "massive_sitemap/writer/locking_file"
+describe MassiveSitemap::Writer::LockingFile do
let(:file_name) { 'test.txt' }
- let(:lock_file) { BigSitemap::LockingFileWriter::LOCK_FILE }
- let(:writer) { BigSitemap::LockingFileWriter.new(file_name).tap { |w| w.init! } }
+ let(:lock_file) { MassiveSitemap::Writer::LockingFile::LOCK_FILE }
+ let(:writer) { MassiveSitemap::Writer::LockingFile.new(file_name).tap { |w| w.init! } }
after do
FileUtils.rm(file_name) rescue nil
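The `LockingFile` spec shows only that the writer exposes a `LOCK_FILE` constant and is initialised with `init!`. One common way to build such a writer is to create the lock file atomically with `File::EXCL` in `init!` and delete it in `close!`. A second concurrent run then fails fast instead of overwriting a half-written sitemap. This approach is an assumption, not taken from the gem. `SketchLockingFileWriter`, `LockedError` and the lock file name are hypothetical.

```ruby
require "fileutils"

# Hypothetical file writer that refuses to start while another run holds the lock.
class SketchLockingFileWriter
  LOCK_FILE = "sitemap.lock" # assumed name, for illustration only

  class LockedError < StandardError; end

  def initialize(file_name, lock_file = LOCK_FILE)
    @file_name = file_name
    @lock_file = lock_file
  end

  def init!
    # CREAT | EXCL creates the lock atomically and fails if it already exists.
    File.open(@lock_file, File::WRONLY | File::CREAT | File::EXCL) { |f| f.write(Process.pid.to_s) }
    @file = File.open(@file_name, "w")
  rescue Errno::EEXIST
    raise LockedError, "#{@lock_file} exists, another run seems to be in progress"
  end

  def print(text)
    @file.print(text)
  end

  def close!
    @file.close
    FileUtils.rm_f(@lock_file) # release the lock for the next run
  end
end
```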
10 test/big_sitemap_ping_test.rb
@@ -1,10 +0,0 @@
-require File.dirname(__FILE__) + '/test_helper'
-
-class BigSitemapPingTest < Test::Unit::TestCase
-
- #should ping list of serves
- #should ping one service
- #should ignore unknown
- #should add url into ping url
- #yahaoo: should include yahoo id
-end
224 test/big_sitemap_test.rb
@@ -1,224 +0,0 @@
-require File.dirname(__FILE__) + '/test_helper'
-require 'nokogiri'
-
-class BigSitemapTest < Test::Unit::TestCase
- def setup
- delete_tmp_files
- end
-
- def teardown
- delete_tmp_files
- end
-
- should 'raise an error if the :base_url option is not specified' do
- assert_nothing_raised { BigSitemap.new(:base_url => 'http://example.com', :document_root => tmp_dir) }
- assert_raise(ArgumentError) { BigSitemap.new(:document_root => tmp_dir) }
- end
-
- should 'generate the same base URL with :base_url option' do
- options = {:document_root => tmp_dir}
- url = 'http://example.com'
- sitemap = BigSitemap.new(options.merge(:base_url => url))
-
- assert_equal url, sitemap.instance_variable_get(:@options)[:base_url]
- end
-
- should 'should add paths' do
- generate_sitemap do
- add '/', {:last_modified => Time.now, :change_frequency => 'weekly', :priority => 0.5}
- add '/about', {:last_modified => Time.now, :change_frequency => 'weekly', :priority => 0.5}
- end
-
- elems = elements first_sitemap_file, 'loc'
- assert_equal 'http://example.com/', elems.first.text
- assert_equal 'http://example.com/about', elems.last.text
- end
-
- context 'Sitemap index file' do
- should 'contain one sitemapindex element' do
- generate_sitemap { add '/' }
- assert_equal 1, num_elements(sitemaps_index_file, 'sitemapindex')
- end
-
- should 'contain one sitemap element' do
- generate_sitemap { add '/' }
- assert_equal 1, num_elements(sitemaps_index_file, 'sitemap')
- end
-
- should 'contain one loc element' do
- generate_sitemap { add '/' }
- assert_equal 1, num_elements(sitemaps_index_file, 'loc')
- end
-
- should 'contain one lastmod element' do
- generate_sitemap { add '/' }
- assert_equal 1, num_elements(sitemaps_index_file, 'lastmod')
- end
-
- should 'contain two loc elements' do
- generate_sitemap(:max_per_sitemap => 2) do
- 4.times { add '/' }
- end
-
- assert_equal 2, num_elements(sitemaps_index_file, 'loc')
- end
-
- should 'contain two lastmod elements' do
- generate_sitemap(:max_per_sitemap => 2) do
- 4.times { add '/' }
- end
-
- assert_equal 2, num_elements(sitemaps_index_file, 'lastmod')
- end
-
- should 'not be gzipped' do
- generate_sitemap(:gzip => false) { add '/' }
- assert File.exists?(unzipped_sitemaps_index_file)
- end
- end
-
- context 'Sitemap file' do
- should 'contain one urlset element' do
- generate_sitemap { add '/' }
- assert_equal 1, num_elements(first_sitemap_file, 'urlset')
- end
-
- should 'contain several loc elements' do
- generate_sitemap do
- 3.times { add '/' }
- end
-
- assert_equal 3, num_elements(first_sitemap_file, 'loc')
- end
-
- should 'contain several lastmod elements' do
- generate_sitemap do
- 3.times { add '/', :last_modified => Time.now }
- end
-
- assert_equal 3, num_elements(first_sitemap_file, 'lastmod')
- end
-
- should 'contain several changefreq elements' do
- generate_sitemap do
- 3.times { add '/' }
- end
-
- assert_equal 3, num_elements(first_sitemap_file, 'changefreq')
- end
-
- should 'contain several priority elements' do
- generate_sitemap do
- 3.times { add '/', :priority => 0.2 }
- end
-
- assert_equal 3, num_elements(first_sitemap_file, 'priority')
- end
-
- should 'have a change frequency of weekly by default' do
- generate_sitemap do
- 3.times { add '/' }
- end
-
- assert_equal 'weekly', elements(first_sitemap_file, 'changefreq').first.text
- end
-
- should 'have a change frequency of daily' do
- generate_sitemap { add '/', :change_frequency => 'daily' }
- assert_equal 'daily', elements(first_sitemap_file, 'changefreq').first.text
- end
-
- should 'have a priority of 0.2' do
- generate_sitemap { add '/', :priority => 0.2 }
- assert_equal '0.2', elements(first_sitemap_file, 'priority').first.text
- end
-
- should 'contain two loc element' do
- generate_sitemap(:max_per_sitemap => 2) do
- 4.times { add '/' }
- end
-
- assert_equal 2, num_elements(first_sitemap_file, 'loc')
- assert_equal 2, num_elements(second_sitemap_file, 'loc')
- end
-
- should 'contain two changefreq elements' do
- generate_sitemap(:max_per_sitemap => 2) do
- 4.times { add '/' }
- end
-
- assert_equal 2, num_elements(first_sitemap_file, 'changefreq')
- assert_equal 2, num_elements(second_sitemap_file, 'changefreq')
- end
-
- should 'contain two priority element' do
- generate_sitemap(:max_per_sitemap => 2) do
- 4.times { add '/', :priority => 0.2 }
- end
-
- assert_equal 2, num_elements(first_sitemap_file, 'priority')
- assert_equal 2, num_elements(second_sitemap_file, 'priority')
- end
-
- should 'not be gzipped' do
- generate_sitemap(:gzip => false) { add '/' }
- assert File.exists?(unzipped_first_sitemap_file)
- end
- end
-
- context 'sanatize XML chars' do
- should 'should transform ampersands' do
- generate_sitemap { add '/something&else' }
- elems = elements(first_sitemap_file, 'loc')
-
- assert Zlib::GzipReader.open(first_sitemap_file).read.include?("/something&amp;else")
- assert_equal 'http://example.com/something&else', elems.first.text
- end
- end
-
- context 'clean method' do
- should 'be chainable' do
- sitemap = generate_sitemap { add '/' }
- assert_equal BigSitemap, sitemap.clean.class
- end
-
- should 'clean all sitemap files' do
- sitemap = generate_sitemap { add '/' }
- assert Dir["#{sitemaps_dir}/sitemap*"].size > 0, "#{sitemaps_dir} has sitemap files"
- sitemap.clean
- assert_equal 0, Dir["#{sitemaps_dir}/sitemap*"].size, "#{sitemaps_dir} is empty of sitemap files"
- end
- end
-
- context 'sitemap index' do
- should 'generate for all xml files in directory' do
- sitemap = generate_sitemap {}
- File.open("#{sitemaps_dir}/sitemap_file1.xml", 'w')
- File.open("#{sitemaps_dir}/sitemap_file2.xml.gz", 'w')
- File.open("#{sitemaps_dir}/sitemap_file3.txt", 'w')
- File.open("#{sitemaps_dir}/file4.xml", 'w')
- File.open(unzipped_sitemaps_index_file, 'w')
- sitemap.send :generate_sitemap_index
-
- elem = elements(sitemaps_index_file, 'loc')
- assert_equal 2, elem.size #no index and file3 and file4 found
- assert_equal "http://example.com/sitemap_file1.xml", elem.first.text
- assert_equal "http://example.com/sitemap_file2.xml.gz", elem.last.text
- end
-
- should 'generate for all for given file' do
- sitemap = generate_sitemap {}
- File.open("#{sitemaps_dir}/sitemap_file1.xml", 'w')
- File.open("#{sitemaps_dir}/sitemap_file2.xml.gz", 'w')
- files = ["#{sitemaps_dir}/sitemap_file1.xml", "#{sitemaps_dir}/sitemap_file2.xml.gz"]
- sitemap.send :generate_sitemap_index, files
-
- elem = elements(sitemaps_index_file, 'loc')
- assert_equal 2, elem.size
- assert_equal "http://example.com/sitemap_file1.xml", elem.first.text
- assert_equal "http://example.com/sitemap_file2.xml.gz", elem.last.text
- end
- end
-
-
-end
48 test/fixtures/test_model.rb
@@ -1,48 +0,0 @@
-class TestModel
- def to_param
- id
- end
-
- def id
- @id ||= TestModel.current_id += 1
- end
-
- def change_frequency
- 'monthly'
- end
-
- def priority
- 0.8
- end
-
- def updated_at
- Time.at(1000000000)
- end
-
- class << self
- def table_name
- 'test_models'
- end
-
- def count_for_sitemap
- self.find_for_sitemap.size
- end
-
- def num_items
- 10
- end
-
- def find_for_sitemap(options={})
- instances = []
- num_times = options.delete(:limit) || self.num_items
- num_times.times { instances.push(self.new) }
- instances
- end
-
- attr_writer :current_id
-
- def current_id
- @current_id ||= 0
- end
- end
-end
70 test/test_helper.rb
@@ -1,70 +0,0 @@
-require 'rubygems'
-require 'bundler/setup'
-
-require 'test/unit'
-require 'shoulda'
-require 'mocha'
-require 'test/fixtures/test_model'
-
-require 'big_sitemap'
-
-class Test::Unit::TestCase
-
- #TestHelper
- private
- def generate_sitemap(options={}, &block)
- BigSitemap.generate(options.merge(:base_url => 'http://example.com', :document_root => tmp_dir), &block)
- end
-
- def delete_tmp_files
- Dir["#{sitemaps_dir}/sitemap*"].each do |f|
- FileUtils.rm_rf f
- end
- end
-
- def sitemaps_index_file
- "#{unzipped_sitemaps_index_file}.gz"
- end
-
- def unzipped_sitemaps_index_file
- "#{sitemaps_dir}/sitemap_index.xml"
- end
-
- def unzipped_first_sitemap_file
- "#{sitemaps_dir}/sitemap.xml"
- end
-
- def first_sitemap_file
- "#{sitemaps_dir}/sitemap.xml.gz"
- end
-
- def second_sitemap_file
- "#{sitemaps_dir}/sitemap_1.xml.gz"
- end
-
- def third_sitemap_file
- "#{sitemaps_dir}/sitemap_2.xml.gz"
- end
-
- def sitemaps_dir
- tmp_dir
- end
-
- def tmp_dir
- '/tmp'
- end
-
- def ns
- {'s' => 'http://www.sitemaps.org/schemas/sitemap/0.9'}
- end
-
- def elements(filename, el)
- file_class = filename.include?('.gz') ? Zlib::GzipReader : File
- data = Nokogiri::XML.parse(file_class.open(filename).read)
- data.search("//s:#{el}", ns)
- end
-
- def num_elements(filename, el)
- elements(filename, el).size
- end
-end