-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sitemaps with less than 50,000 urls still use sitemap index #60
Comments
I read up on sitemap submissions and it seems like this is okay to do. So I can add an option for it. Something like |
kjvarga, The (or at least my) preferred way would be to default to no sitemap index, unless you have more than 50,000 pages, then it automatically creates an index (like dynamic_sitemaps) Of course you could always add a You could even allow users to specify less than 50,000 the pagination like dynamic_sitemaps does. |
I'm not a fan of making this the default mainly because you would have to modify your robots.txt file to point to either the sitemap or the sitemap index depending on how many links you have. Most webmasters I think simply don't care if there is the extra file when they have < 50 000 urls and if you have dynamic numbers of links it's quite feasible to go back and forth below and above 50000 urls which means you would have to constantly change your robots.txt file. So for those reasons I would like to keep it as an option. |
I guess I forgot about the very important aspect of legacy support. I was thinking that it would just be /sitemap.xml.gz as the sitemap if <= 50,000 pages and /sitemap.xml.gz would be the index if it was more than 50,000. You could keep the _index.html as default for legacy support... |
Heya, Thank you for creating such a great plugin, btw =). Best,
|
I also would like option to not create index files |
Ok I'll get to this hopefully this week. I'm thinking: SitemapGenerator::Sitemap.create_index = (:auto, :always, :never, true, false) :always and true are the same thing. :auto means create as needed. :never suppresses index creation altogether. I'm just not decided on what false should be. I think it should be the same as :auto i.e. don't create until needed. |
Sure, that works for me. Maybe all you really need is? true - Forces an index (current / default) |
Just pinging here to see if there's any update... Also, thanks for taking my PR on autoplay (and cleaning it up). All warnings gone for me... |
- Tweak finalisation: - add a check that we don't finalise an already finalised sitemap in the SitemapIndexFile - In LinkSet don't try to add default links to a finalised sitemap.
Hey Eric, I've released the feature in v3.3. Thanks again for generously paying for this feature! I'm sure everyone that uses it will appreciate that. I've added a shout-out to you on the readme. And I'm glad that you are enjoying using the gem! The naming convention hasn't changed. So the first sitemap will be sitemap1.xml.gz. However, I could add a simple namer that will generate sitemap.xml.gz, sitemap1.xml.gz etc but I wanted to get the main stuff out first. |
Soo... you're going to hate me. But now I'm going into client mode. This wasn't 100% how I envisioned and doesn't quite accomplish my goals due to the naming. I need a static place that I can always point my robots / 302 (which I use on my root to point to where they are on my s3 bucket) to where I don't need to worry if I'm above or below 50k. So I envisioned create_sitemap = :auto to create a "sitemap.xml.gz" for its first sitemap. If we get more than 50k, that sitemap.xml.gz would become the index and point to the sitemap[1-9]+.xml.gz and so on... The problem with status quo is I still need to know where I'm pointing, /sitemap1.xml.gz or /sitemap_index.xml.gz |
Damn, you had to go there :/ Yeah that bit is more tricky. But I'm glad that I broke them into separate chunks because they are conceptually different features. Thinking about it I really like the idea though. In fact I would like it to be the default for the next major version. That way you always point to sitemap.xml.gz, no matter how many links your site has. I can see a way to do it, but it is tricky because of the way things happen now. The index has to be written out last. And currently each sitemap is written out as it becomes full so that it doesn't occupy memory. What would have to happen is we need to delay writing out the first sitemap until we know with certainty whether there will be two sitemaps (or more, but two is enough to trigger the index). Then there's the issue that the index and the sitemaps will share the same namer instance. So I'm thinking there needs to be another configuration option that indicates that this naming convention is being used (any ideas of what to call this convention? 'zero-based' or 'simple' is all that comes to mind). However, whether or not the special naming convention is being used it doesn't hurt to adopt the same logic for all cases, which simplifies things a bit. It just means that the first sitemap sits in memory for a teeny bit longer. So I think the sitemap index will need to keep an object reference to the sitemaps added to it, at least the first one. When those sitemaps become full they are not automatically written out. They are 'finalized' and added to the index but not written out. When the index has more than one sitemap, it reserves the first name, assigns the other names to the added sitemaps and writes them out. From then on new sitemaps get names and are written out when full as per usual. Then when the index is finalized there needs to be some handling of the different values for create_index. So there's keeping track of sitemaps added to the index and keeping track of whether a name has been reserved. As well as delaying writing out of some sitemaps, so some tracking of whether writing has occurred. And a new namer instance to handle the new sequence. |
Okay, so this is sounding like a 4.0 release. I think :simple could work. Since it's looking like it's not happening until the next major release, you could also just make this the new default and put a massive bold warning in the readme. That way, doesn't matter what the name of the configuration option is. No one will ever be specifying it... And semver clearly states you're allow to do this... I guess just give me an estimate on when you think you'll be able to do this. And hope I didn't ruin your vacation... |
Hey man, i've made some progress on this already. I think I'll be done in a week or so, haven't got too much on the go. I'll probably get out a prerelease and then we can see where we're at. Adding unit tests will take a bit longer. |
Can't wait to see it. Just holding off on some of my multi-sitemaps until it's released. Thanks again! |
Okay try out 4.0.alpha. You'll need these two config settings:
From my testing it seems to work as it should! I've not made these the defaults yet, just want to get it out and get some more people trying it out. |
It works! Beautifully!! |
Hi guys, the 4.0 release is finally out! It was more work than I anticipated making this change official. I also want to feel confident that it was working correctly and that I had covered all the cases. I think I have and it feels like it does the intuitive thing. So @ehoch you can remove the Also the ability to turn off index creation opens up some opportunities for more complex setups where one wants to generate sitemaps without affecting the index. The index can even be built "manually" by adding links to it using |
When I create a sitemap with less than 50,000 urls, it should not use a sitemap index file. I personally run multiple sitemaps for my sites, including a small news-only one, and the idea of a sitemap index file for a sitemap with 20 urls seems silly. Other than that I love this project and think it will be perfect.
Could anyone spearhead fixing? Willing to donate for this!
The text was updated successfully, but these errors were encountered: