subdomain routing support? #43

Closed
kjvarga opened this issue Sep 19, 2011 · 14 comments

Comments

@kjvarga
Owner

kjvarga commented Sep 19, 2011

Chris writes:

I've been using sitemap_generator for a while now, and I like it.
I recently changed my Rails app from a traditional subdirectory routing architecture to subdomain-based routing (like Basecamp and others; see 37-Signals: "How to do Basecamp-style subdomains in Rails").
I was wondering if you would consider adding support for generating sitemaps for subdomain-based routing. I realize that subdomains kind of defeat the purpose of sitemaps.

Thoughts?

@kjvarga
Owner Author

kjvarga commented Sep 19, 2011

Ok so your user links would be like:

karl.xyz.com
karl.xyz.com/profile
karl.xyz.com/posts

or whatever. When generating your sitemap you can actually set the host on a per-link basis, so you could do something like:

SitemapGenerator::Sitemap.create do
  User.find_each do |user|
    add '/', :host => "#{user.username}.xyz.com"
    add user_profile_path(user), :host => "#{user.username}.xyz.com"
    add user_posts_path(user), :host => "#{user.username}.xyz.com"
  end
end

It would be nice to have a :url option so you could just do:
add :url => user_posts_url(user)
Or I could just detect that the URL already has a host and not use the default_host in that case. Then it would just be:
add user_posts_url(user)

Is that what you were looking for?
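
The host detection mentioned above could be sketched roughly like this in plain Ruby. This is only an illustration of the idea, not how sitemap_generator implements it; link_has_host? and resolve_link are hypothetical helper names.

```ruby
require 'uri'

# Hypothetical helper, not part of sitemap_generator: true when the link
# already carries its own host, so no default host needs to be prepended.
def link_has_host?(link)
  !URI.parse(link).host.nil?
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: build the final URL for a sitemap entry, falling back
# to the default host only for host-less paths.
def resolve_link(link, default_host)
  link_has_host?(link) ? link : URI.join(default_host, link).to_s
end

puts resolve_link('/posts', 'http://xyz.com')                    # http://xyz.com/posts
puts resolve_link('http://karl.xyz.com/posts', 'http://xyz.com') # http://karl.xyz.com/posts
```

With a check like this, passing the result of a *_url route helper would keep its subdomain, while a *_path helper would still get the default host.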

@hurl

hurl commented Sep 19, 2011

No, the links would be for subdomains. For example:

subdomain.example.com
company-1.example.com
company-2.example.com

and so on, where "example.com" is the root domain.

So, the host would remain the same. What would be needed is the ability to add the subdomain. Such as:

Subdomain.find_each do |sd|
  SitemapGenerator.create "#{sd}.{host}"
end

I think you would have to create a sitemap for each subdomain, as per Google's guidelines.

@kjvarga
Owner Author

kjvarga commented Sep 19, 2011

Can you take a look at this thread and see if it helps? #24

From what I understand you just need to create a sitemap for each domain, so you can set your default_host to each full domain e.g. http://subdomain.example.com and add links to it like you would any other sitemap.

@kjvarga
Owner Author

kjvarga commented Sep 19, 2011

See if something like this works...

%w[en fr ru].each do |domain|
  SitemapGenerator::Sitemap.sitemaps_path = "#{domain}/"
  SitemapGenerator::Sitemap.default_host = "http://www.#{domain}.example.com"
  SitemapGenerator::Sitemap.create do
    add '/whatever'
  end
end

@hurl

hurl commented Sep 19, 2011

This works, partially. The individual sitemaps are generated correctly, but the index only lists one of them. BTW, this seems to be the problem that user chamnap had, and it does not seem to have been resolved.

Here is my code:

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false

Listing.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.create do
    add ''
  end
end

@kjvarga
Owner Author

kjvarga commented Sep 19, 2011

Ok try this with v2.1.1. There were a couple of issues. First, the index file was being overwritten on each call, so we have to generate the sitemaps into separate folders or use different names. I also fixed some issues with multiple calls to create() in a single sitemap config.

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
# SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new # just generate into tmp/
# SitemapGenerator::Sitemap.include_index = false # turned off for you in v2.1.1

%w(google yahoo apple).each do |subdomain|
  SitemapGenerator::Sitemap.default_host = "https://#{subdomain}.mysite.com"
  SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{subdomain}"
  SitemapGenerator::Sitemap.create do
    add '/home'
  end
end

Now works as expected and produces:

+ sitemaps/google/sitemap1.xml.gz             2 links /  822 Bytes /  328 Bytes gzipped
+ sitemaps/google/sitemap_index.xml.gz          1 sitemaps /  389 Bytes /  217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/yahoo/sitemap1.xml.gz             2 links /  820 Bytes /  330 Bytes gzipped
+ sitemaps/yahoo/sitemap_index.xml.gz          1 sitemaps /  388 Bytes /  217 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s
+ sitemaps/apple/sitemap1.xml.gz             2 links /  820 Bytes /  330 Bytes gzipped
+ sitemaps/apple/sitemap_index.xml.gz          1 sitemaps /  388 Bytes /  214 Bytes gzipped
Sitemap stats: 2 links / 1 sitemaps / 0m00s

Check out the namer options if you would rather generate all files in the root of the directory.

@hurl

hurl commented Sep 21, 2011

Hi Karl, sorry for the delay getting back to you on this.

This solution works, and is better. I haven't tried "namer," but it sounds like that will allow me to have all of the sitemaps in a single directory.

The main thing I would like to see is a single index file that points to all of the sitemap files.

Ideally, it would look like this:

aws bucket
      |
      stuff (currently I have many image files here)
      sitemaps directory
           |
           index (single file containing all of the sitemap addresses)
           site maps (any number. I need thousands now, with the ability to scale much larger)

@kjvarga
Owner Author

kjvarga commented Sep 22, 2011

Ok yeah I wasn't sure about how you intended to structure your sitemaps (everyone seems to need to do it differently :)

The only issue with having all the sitemaps using a single index file is that according to the sitemap specs, all links in the sitemap(s) should have the same domain.
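
To make the same-domain constraint concrete, here is a minimal plain-Ruby sketch that buckets absolute links by host, so each bucket could go into its own sitemap file. group_links_by_host is a hypothetical helper, not part of sitemap_generator, and the /about link is made up for illustration.

```ruby
require 'uri'

# Hypothetical helper, not part of sitemap_generator: bucket absolute URLs by
# host so that every sitemap file only contains links from a single domain.
def group_links_by_host(links)
  links.group_by { |link| URI.parse(link).host }
end

# Illustrative links; the subdomains mirror the examples in this thread.
links = [
  'https://google.mysite.com/home',
  'https://yahoo.mysite.com/home',
  'https://google.mysite.com/about'
]

group_links_by_host(links).each do |host, host_links|
  puts "#{host}: #{host_links.length} link(s)"
end
```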

There is a way to do it using the group feature, which would have been perfect, but a problem with the evaluation scope within create() gets in the way in this case.

If you don't care about separating each domain into its own file, then you can just add all the links to the sitemap as per usual. I'll see if I can fix this scoping issue.

@kjvarga
Owner Author

kjvarga commented Sep 22, 2011

Good news, there was no problem using groups :D

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end

+ sitemaps/google1.xml.gz             1 links /  676 Bytes /  308 Bytes gzipped
+ sitemaps/yahoo1.xml.gz             1 links /  675 Bytes /  311 Bytes gzipped
+ sitemaps/apple1.xml.gz             1 links /  675 Bytes /  310 Bytes gzipped
+ sitemaps/sitemap_index.xml.gz          3 sitemaps /  549 Bytes /  232 Bytes gzipped
Sitemap stats: 3 links / 3 sitemaps / 0m00s

@hurl

hurl commented Sep 22, 2011

Thanks again for your quick response.

Are you sure that you may not include subdomains from the same root (host) domain in the same index? According to the spec,

**Note**: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com.

All of my subdomains are hosted from the same root domain (a single Heroku app).

In the end, it does not really matter; I can have an index for each subdomain, if that is necessary. My robots.txt file will have to grow to accommodate.

@kjvarga
Owner Author

kjvarga commented Sep 22, 2011

Yeah reading it again it's a bit confusing because they compare www.yoursite.com to yourhost.yoursite.com.

This post seems to say it's possible: http://www.google.com/support/forum/p/Webmasters/thread?fid=5ba122cf102db3c500046c02075d9f80&tid=5ba122cf102db3c5&hl=en. You just have to prove ownership of each subdomain by adding the Sitemap line to the robots.txt file for each subdomain. So I guess that would point to your main sitemap index. Seems pretty simple since all the robots.txt files would then be the same? You just have to make sure it's accessible on each subdomain.
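
As a rough sketch of the robots.txt idea, assuming the shared index ends up at the S3 location implied by the sitemaps_host and sitemaps_path settings used earlier in this thread. The robots_txt helper and the allow-all rules are illustrative assumptions, not something sitemap_generator provides.

```ruby
# Assumed location of the shared sitemap index, derived from the
# sitemaps_host and sitemaps_path values used earlier in this thread.
SITEMAP_INDEX_URL = 'https://s3.amazonaws.com/mysite/sitemaps/sitemap_index.xml.gz'

# Hypothetical helper: build a robots.txt body that allows all crawling and
# points crawlers at the sitemap index. Every subdomain would serve this
# same body.
def robots_txt(sitemap_url)
  <<~ROBOTS
    User-agent: *
    Disallow:

    Sitemap: #{sitemap_url}
  ROBOTS
end

puts robots_txt(SITEMAP_INDEX_URL)
```

Since the body is identical for every subdomain, a single static file or a single route answering /robots.txt on all subdomains would be enough.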

Keep me posted on how it works out.

@hurl

hurl commented Sep 22, 2011

Karl,

That last construct did not work at all. It produced a single index inside which all the urls were mangled. The approach that is working best for me is:

SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.include_index = false

index = 1
Listing.active_set.find_each do |listing|
  SitemapGenerator::Sitemap.default_host = "https://#{listing.subdomain}.mysite.com"
  SitemapGenerator::Sitemap.filename = ('sitemap_' + index.to_s).to_sym
  SitemapGenerator::Sitemap.create do
  end
  index += 1
end

This is very close, and it could work. The sitemaps and index file contents are correct, and they are all together in a single sitemaps directory, like so:

sitemaps
     |
     sitemap_11.xml.gz
     sitemap_1_index.xml.gz
     sitemap_21.xml.gz
     sitemap_2_index.xml.gz
     sitemap_31.xml.gz
     sitemap_3_index.xml.gz
        "
        "
     and so on...

The problem now is the naming convention for the sitemaps themselves, with the 1 appended. What I'd like to be able to do is override the name of the sitemap. I've tried the namer method, but cannot get it to work.

Bottom line: this will work for me as-is. Getting the namer method to work would be icing on the cake.

@kjvarga
Owner Author

kjvarga commented Sep 23, 2011

I can give you an example of using a namer, but it would help if you let me know how you want to name them.

Also, how were the URLs "mangled" in the index from the group example I posted above?

If I run exactly this:

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

SitemapGenerator::Sitemap.create do
  %w(google yahoo apple).each do |subdomain|
    group(:filename => subdomain, :default_host => "https://#{subdomain}.mysite.com") do
      add '/home'
    end
  end
end

My index looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/google1.xml.gz</loc></sitemap>
  <sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/yahoo1.xml.gz</loc></sitemap>
  <sitemap><loc>https://s3.amazonaws.com/mysite/sitemaps/apple1.xml.gz</loc></sitemap>
</sitemapindex>

@kjvarga
Owner Author

kjvarga commented Sep 23, 2011

So there was a small bug in the code when both the filename and sitemaps_namer options are used. That's probably why you had issues. It's fixed in v2.1.3.

Here's an example using the namer, working under v2.1.3. You could use the listing.id in place of i when you generate your sitemaps.

SitemapGenerator::Sitemap.verbose = true
SitemapGenerator::Sitemap.sitemaps_host = "https://s3.amazonaws.com/mysite/"
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"

i = 0
%w(google yahoo apple).each do |subdomain|
  basename = "sitemap#{i+=1}"
  SitemapGenerator::Sitemap.create(
      :default_host   => "https://#{subdomain}.mysite.com",
      :filename       => basename,
      :sitemaps_namer => SitemapGenerator::SitemapNamer.new("#{basename}_")) do
  end
end

+ sitemaps/sitemap1_1.xml.gz             1 links /  671 Bytes /  305 Bytes gzipped
+ sitemaps/sitemap1_index.xml.gz          1 sitemaps /  384 Bytes /  212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap2_1.xml.gz             1 links /  670 Bytes /  308 Bytes gzipped
+ sitemaps/sitemap2_index.xml.gz          1 sitemaps /  384 Bytes /  213 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s
+ sitemaps/sitemap3_1.xml.gz             1 links /  670 Bytes /  307 Bytes gzipped
+ sitemaps/sitemap3_index.xml.gz          1 sitemaps /  384 Bytes /  212 Bytes gzipped
Sitemap stats: 1 links / 1 sitemaps / 0m00s

kjvarga closed this as completed Sep 23, 2011