Split from #34.
Currently, the indexer only picks up additional links from a sitemap and/or RSS feed if it happens to discover them while crawling. In practice it often won't, because many sites don't link to their sitemap or RSS feed from any page.
Letting site owners specify their sitemap and/or RSS feed would need a new field in the database and corresponding updates to the admin interface. Scrapy also has separate SitemapSpider and XMLFeedSpider classes that could potentially be used.
Not sure how much more efficient this would make the indexing, but it could make it more predictable/targeted, i.e. make it more likely that the important pages are indexed before hitting the indexing limits or timeout.
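
To illustrate the "important pages first" idea, here is a rough sketch using only the Python standard library rather than Scrapy. It is not the indexer's actual code: the function name `urls_from_sitemap` and the `limit` parameter are hypothetical, and it assumes the owner-specified sitemap has already been fetched as a string. It handles a plain `<urlset>` sitemap only, not sitemap index files.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str, limit: int) -> List[str]:
    """Return up to `limit` URLs from a sitemap, highest priority first.

    Hypothetical helper: `limit` stands in for the indexing limit.
    """
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue
        # The protocol's default priority is 0.5 when <priority> is absent.
        text = url.findtext("sm:priority", default="0.5", namespaces=SITEMAP_NS)
        try:
            priority = float(text)
        except ValueError:
            priority = 0.5
        entries.append((priority, loc))
    # Stable sort: entries with equal priority keep their sitemap order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [loc for _, loc in entries[:limit]]
```

Something like this could feed the spider's start URLs, so that the pages the site owner considers most important are queued before the limit or timeout is reached. Whether `<priority>` is reliable enough in real sitemaps to sort on is an open question.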