Added some meta[name=robots] markup to handle crawler behavior (fix #771) #777
udata/templates/macros/metadata.html
Outdated
@@ -12,6 +12,7 @@
<meta name="description" content="{{ description }}" />
<meta property="og:description" content="{{ description }}" />
<meta property="og:image" content="{{ image }}" />
{% if meta.robots %}<meta name="robots" content="{{ meta.robots }}">{% endif %}
Can't we merge that with DISALLOW_INDEXING a few lines above?
Merged: DISALLOW_INDEXING now overrides the per-page setting.
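For reference, the precedence described here can be sketched in plain Python. This is only an illustration, not the template code from this PR: the function name robots_meta and its parameters are hypothetical, and the real logic lives in the Jinja macro in udata/templates/macros/metadata.html.

```python
from html import escape


def robots_meta(disallow_indexing, page_robots=None):
    """Return the robots <meta> tag for a page, or '' when none applies.

    Hypothetical helper mirroring the template logic: the site-wide
    DISALLOW_INDEXING switch takes precedence over the per-page value.
    """
    if disallow_indexing:
        # Site-wide switch wins, whatever the page asked for.
        return '<meta name="robots" content="noindex,nofollow" />'
    if page_robots:
        # Per-page value (e.g. 'noindex'), escaped for use in an attribute.
        return '<meta name="robots" content="{}" />'.format(
            escape(page_robots, quote=True)
        )
    # No tag at all: the page stays indexed by default.
    return ''
```

With this sketch, a page that sets robots to 'noindex' gets its own tag only when DISALLOW_INDEXING is off, and a page that sets nothing emits no robots tag.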
'title': _('%(topic)s datasets', topic=topic.name),
'description': _("%(site)s %(topic)s related datasets", site=config['SITE_TITLE'], topic=topic.name),
'keywords': [_('search'), _('datasets'), _('topic')] + topic.tags,
'robots': 'noindex',
I'm not sure about deindexing that one.
The topic itself is indexed; only the topic's datasets listing is not indexed.
udata/templates/user/base.html
Outdated
'keywords': [_('user'), _('profile')],
'robots': 'noindex',
} %}

{% block extra_head %}
{{ super() }}
<meta name="robots" content="noindex,follow">
Remove that one?
This makes me wonder whether we should instead whitelist the pages we want indexed. Pros:
Cons:
I think we already discussed that, but at least if there are counter-arguments we can refer to this discussion later. Thoughts?
Given that this is an open data portal, I prefer open by default.
The cons I see with a whitelist:
udata/templates/macros/metadata.html
Outdated
<meta name="description" content="{{ description }}" />
{% if config.DISALLOW_INDEXING %}<meta name="robots" content="noindex,nofollow" />
{% elif meta.robots %}<meta name="robots" content="{{ meta.robots }}">
Missing the closing / to be consistent?
Alright, let's keep it indexed by default.
This PR adds support for an optional meta[name=robots] in templates and defines nofollow/noindex on some pages to control crawler behavior:
As a side effect, some pages that were missing metadata (title, description...) gain at least a title and a description (for a proper preview when shared on social networks).