[FIX] website_blog: Saner crawler instructions

The fixes that this patch includes are:

- The tag URL for tags printed in short and expanded blog posts had no slug, thus producing a brand new URL. Now they have the slug.
- The `<a>` elements in the right-column archives now have `rel="nofollow"`.
- The `<a>` elements in the right-column tag cloud that combine more than 1 tag now have `rel="nofollow"`.
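To illustrate the slug fix, here is a minimal, self-contained sketch of the canonicalization idea: an ID-only tag URL is rewritten to its slugged form so both spellings end up at one URL. The `TAGS` table, `slug`, `unslug_id` and `canonical_tag_path` are simplified, hypothetical stand-ins, not Odoo's real helpers, and the example paths are made up.

```python
import re

# Hypothetical stand-in for the blog.tag table: id -> name.
TAGS = {7: "Odoo", 12: "SEO Tips"}


def slug(tag_id):
    """Simplified stand-in for a slug helper: builds 'name-id'."""
    name = re.sub(r"[^a-z0-9]+", "-", TAGS[tag_id].lower()).strip("-")
    return "%s-%d" % (name, tag_id)


def unslug_id(value):
    """Extract the trailing integer ID from 'name-id' or a bare 'id'."""
    match = re.search(r"(\d+)$", value)
    if match is None:
        raise ValueError("no tag ID in %r" % value)
    return int(match.group(1))


def canonical_tag_path(path, tag):
    """Return the path to redirect to, or None if `tag` is already canonical."""
    fixed = ",".join(slug(unslug_id(t)) for t in tag.split(","))
    if fixed == tag:
        return None
    # Replace only the first occurrence of the tag segment.
    return path.replace("/tag/%s/" % tag, "/tag/%s/" % fixed, 1)


print(canonical_tag_path("/blog/news-1/tag/7/", "7"))
print(canonical_tag_path("/blog/news-1/tag/odoo-7/", "odoo-7"))
```

The first call yields the slugged path and the second yields `None`, so only non-canonical URLs would trigger a redirect.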

To see why, imagine a website with:

- 1 blog post per week.
- For the last 3 years.
- Using 100 different tags.
- A website admin who has activated the tag cloud, archives, and tags per post.

With the behavior prior to this patch:

1. There have been 3*52=156 blog posts.
2. There will exist 156/20=7.8 pages of posts, i.e. 8 pages at 20 posts per page.
3. There will exist 12*3=36 untagged archive links.
4. There will exist 100 single-tag links which don't use the slug (only the ID).
5. There will exist 100 single-tag links with the slug.
6. There will exist 100^100=1×10²⁰⁰ multi-tag links, all with slug.
7. Summarizing the last points, the crawler will have to gather [100+100^100^36=Infinity][1] pages that will only add duplicate content.
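The arithmetic above can be reproduced with Python's arbitrary-precision integers. This is only a sketch of the estimate in the list: the page size of 20 is taken from the `156/20` step, and the `unordered_multi_tag` figure is an additional, more conservative count that is not part of the commit message (it treats a multi-tag link as an unordered set of at least 2 tags).

```python
import math

posts = 3 * 52                      # one post per week for 3 years
pages = math.ceil(posts / 20)       # 7.8 rounds up to whole pages
archive_links = 12 * 3              # one link per month for 3 years
id_only_tag_links = 100             # single-tag links using only the ID
slugged_tag_links = 100             # single-tag links using the slug
multi_tag_links = 100 ** 100        # the estimate used in the list above

# Assumption, not from the commit message: count unordered sets of
# 2 or more tags out of 100 instead (all subsets minus empty and singles).
unordered_multi_tag = 2 ** 100 - 1 - 100

print(posts, pages, archive_links)
print(multi_tag_links == 10 ** 200)
print(unordered_multi_tag > 10 ** 30)
```

Even the conservative count is far beyond what any crawler could visit, so the conclusion holds under either estimate.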

The result of this was:

- If your site is interesting enough, crawlers will probably eat all your CPU resources and backend users will notice a big lag.
- Crawlers penalize duplicate content, so you'd get a practically infinite number of pages counting against you.
- The crawling cost of tags grows exponentially.

With this patch:

- All tag links contain a slug.
- All tag links with more than 1 tag are marked as `nofollow`, so an obedient crawler will only index single-tag-with-slug pages, which actually enhance SEO.
- Links with dates are not followed. The crawler will get to the post content via the paginator, and will index the posts themselves, which is what actually matters.
- Adding a tag still has a crawler cost, but one that is linear in the amount of content under that tag.
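As a rough illustration of what "obedient crawler" means here, the sketch below uses Python's standard `html.parser` to collect only the links that are not marked `rel="nofollow"`. The `FollowableLinks` class, the `followable_links` helper and the sample URLs are hypothetical and only mimic the shape of the blog links; real crawlers apply many more rules.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect href values of <a> tags that are not marked nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list; a missing rel means "follow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if href and "nofollow" not in rel_tokens:
            self.links.append(href)


def followable_links(html):
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: one single-tag link, one multi-tag link, one archive link.
SAMPLE = (
    '<a href="/blog/news-1/tag/odoo-7">Odoo</a>'
    '<a rel="nofollow" href="/blog/news-1/tag/odoo-7,seo-12">SEO</a>'
    '<a rel="nofollow" href="/blog/news-1?date_begin=2019-01-01">January</a>'
)

print(followable_links(SAMPLE))
```

Only the single-tag link survives the filter, which matches the intent described in the list above.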

[1]: https://duckduckgo.com/?q=100%2B100%5E100%5E36&ia=calculator
Yajo committed Feb 5, 2019
1 parent 8cb2965 commit aa544c2cb5acd0283ccca72f9b0ff3ae3bce67f6
Showing with 7 additions and 4 deletions.
  1. +3 −0 addons/website_blog/controllers/main.py
  2. +4 −4 addons/website_blog/views/website_blog_templates.xml
@@ -110,6 +110,9 @@ def blog(self, blog=None, tag=None, page=1, **opt):
         # retrocompatibility to accept tag as slug
         active_tag_ids = tag and map(int, [unslug(t)[1] for t in tag.split(',')]) or []
         if active_tag_ids:
+            fixed_tag_slug = ",".join(map(slug, request.env['blog.tag'].browse(active_tag_ids)))
+            if fixed_tag_slug != tag:
+                return request.redirect(request.httprequest.full_path.replace("/tag/%s/" % tag, "/tag/%s/" % fixed_tag_slug, 1), 301)
             domain += [('tag_ids', 'in', active_tag_ids)]
         if blog:
             domain += [('blog_id', '=', blog.id)]
@@ -161,7 +161,7 @@

 <!-- To display tags //no options -->
 <t t-foreach="blog_post.tag_ids" t-as="one_tag">
-    <a class="mr8" t-attf-href="#{blog_url(tag=one_tag.id, date_begin=False, date_end=False)}" t-esc="one_tag.name"/>
+    <a class="mr8" t-attf-href="#{blog_url(tag=slug(one_tag), date_begin=False, date_end=False)}" t-esc="one_tag.name"/>
 </t>
 <div class="o_sharing_links">
     <a class="fa fa-twitter-square o_twitter"></a>
@@ -245,7 +245,7 @@
 <p class="post-meta text-muted text-center" t-if="len(blog_post.tag_ids)">
     <span class="fa fa-tags"/>
     <t t-foreach="blog_post.tag_ids" t-as="one_tag">
-        <a class="label label-primary mr8" t-attf-href="#{blog_url(tag=one_tag.id)}" t-esc="one_tag.name"/>
+        <a class="label label-primary mr8" t-attf-href="#{blog_url(tag=slug(one_tag))}" t-esc="one_tag.name"/>
     </t>
 </p>
 <div t-if="'cover_full' in blog_post_cover_properties.get('resize_class', '')" id="blog_angle_down">
@@ -433,12 +433,12 @@
 <t t-foreach="nav_list[year]" t-as="months">
     <t t-if="months['date_begin'] == date">
         <li class="active">
-            <a t-ignore="True" t-attf-href="#{blog_url(date_begin=False, date_end=False)}"><t t-esc="months['month']"/><span class="pull-right badge" t-esc="months['post_date_count']"/></a>
+            <a t-ignore="True" rel="nofollow" t-attf-href="#{blog_url(date_begin=False, date_end=False)}"><t t-esc="months['month']"/><span class="pull-right badge" t-esc="months['post_date_count']"/></a>
         </li>
     </t>
     <t t-else="1">
         <li>
-            <a t-ignore="True" t-attf-href="#{blog_url(date_begin=months['date_begin'], date_end=months['date_end'])}"><t t-esc="months['month']"/><span class="pull-right badge" t-esc="months['post_date_count']"/></a>
+            <a t-ignore="True" rel="nofollow" t-attf-href="#{blog_url(date_begin=months['date_begin'], date_end=months['date_end'])}"><t t-esc="months['month']"/><span class="pull-right badge" t-esc="months['post_date_count']"/></a>
         </li>
     </t>
 </t>
