robots.txt to control indexing by Internet search engines #6449

Closed
kgjenkins opened this issue Jan 12, 2021 · 4 comments · Fixed by #8121

Comments

@kgjenkins
Sponsor

kgjenkins commented Jan 12, 2021

Description

When doing general Internet searches for QGIS-related information, the results often include multiple results from different versions of the QGIS documentation, including old 2.x versions, which makes it more difficult to find the current docs.

For example, the first 3 results for this duckduckgo.com search for qgis reprojecting are the 2.8, "latest", and 3.4 versions of the same lesson from the training manual.

The situation is worse with Google, since its results no longer show the full URL and so omit the version information. In this example search, the 2.8 docs show up first:
[image: Google search results with the 2.8 docs listed first]

Do we want to use a robots.txt file to limit search engine web-crawlers to only the latest versions of the docs?

At a minimum, I think it would be good to block crawlers from indexing the old 2.x versions of the docs, which do have a notice at the top "Outdated version of the documentation. Find the latest one here." but the link only goes to the main page of the current docs, not to the specific article.

I'm less concerned about old 3.4 docs, since those pages have a "This documentation is for a QGIS version which has reached end of life. Visit the latest version instead." note that links directly to the latest version of the specific article. Plus, I think we should be careful, and not be too aggressive in blocking crawlers from old versions, as it may take some time for newer docs to gain the same relevance ranking in various search engines.
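To make the robots.txt idea concrete, here is a minimal sketch, not a proposal for the actual file. It assumes each docs version lives under its own top-level path such as `/2.8/`; the host `docs.example.org`, the version list, and the helper name `build_robots_txt` are all hypothetical. It builds the rules and then checks them with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_versions):
    """Return robots.txt text that disallows crawling of the given version paths.

    Assumption: every version is served under "/<version>/".
    """
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{version}/" for version in blocked_versions]
    return "\n".join(lines) + "\n"


# Hypothetical list of outdated versions to hide from crawlers.
robots_txt = build_robots_txt(["2.8", "2.14", "2.18"])

# Verify the rules the same way a well-behaved crawler would read them.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

old_page_allowed = parser.can_fetch(
    "*", "https://docs.example.org/2.8/en/docs/index.html"
)
new_page_allowed = parser.can_fetch(
    "*", "https://docs.example.org/3.16/en/docs/index.html"
)
print(robots_txt)
print("2.8 crawlable:", old_page_allowed)
print("3.16 crawlable:", new_page_allowed)
```

One caveat: a `Disallow` rule stops compliant crawlers from fetching a page, but a blocked URL can still appear in results if other sites link to it, so this is only part of the picture.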

@DelazJ
Collaborator

DelazJ commented Jan 13, 2021

Yes, that is an old issue, but I'm happy to see that recent versions are now in the top 3, which was not the case one or two years ago. We set up a canonical URL when releasing 3.4 that points to a moving target like latest. I don't know if that helped.
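For readers unfamiliar with canonical URLs, the following is a rough sketch of what such a tag looks like when every versioned page points at its `latest` counterpart. The URL layout, the host `docs.example.org`, and the function `canonical_link` are assumptions for illustration, not the configuration actually used by the docs build.

```python
import html


def canonical_link(base_url, language, pagename, target="latest"):
    """Build a <link rel="canonical"> tag pointing at the moving "latest" target.

    Assumption: pages are served as <base>/<version>/<language>/<pagename>.html.
    """
    href = f"{base_url.rstrip('/')}/{target}/{language}/{pagename}.html"
    return f'<link rel="canonical" href="{html.escape(href, quote=True)}">'


# The same tag would be emitted on the 2.8, 3.4 and 3.16 copies of this page.
tag = canonical_link("https://docs.example.org", "en", "docs/training_manual/index")
print(tag)
```

Search engines treat the canonical link as a hint rather than a directive, which may be why its effect is hard to confirm.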

The missing version in Google results is problematic imho. Is there something we can or need to configure on our side?
Otherwise, I have no knowledge in this area, so I would let experienced people decide.

Side request: In an old version, if people manually change the version number in the URL, they should land on an updated or redirected page, but I think most of the time they will click on the link in the message, which takes them to the docs main page. @jef-n (just because you added the warning at the top of 2.x pages), do you think it's possible to append the {language}/{pagename} to the suggested link so that they are also redirected to a more relevant page?
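The side request above amounts to swapping the version segment of the current URL while keeping the language and page name. A possible sketch, assuming the version is always the first path segment (the host and the helper `latest_equivalent` are hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def latest_equivalent(url, target="latest"):
    """Replace the leading version segment of a docs URL with `target`.

    Assumption: paths look like "/<version>/<language>/<pagename>".
    URLs without a version segment are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # segments[0] is "" because the path starts with "/"; segments[1] is the version.
    if len(segments) < 3 or not segments[1]:
        return url
    segments[1] = target
    return urlunsplit(parts._replace(path="/".join(segments)))


old_url = "https://docs.example.org/2.8/en/docs/training_manual/index.html"
suggested = latest_equivalent(old_url)
print(suggested)
```

This does not handle pages that were renamed or removed between versions; those would still need a fallback to the main page or a redirect.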

@kgjenkins
Sponsor Author

Do people still find it helpful for 2.x docs to show up in Google searches? (I don't but maybe others do.)

I'm not sure that anything can be done about the pseudo-breadcrumbs that have now replaced what used to be the full URL in Google search results. Although I did just send feedback to Google about this...

I initially thought it might help to put the version number in the page title, like replacing

<title>6.1. Lesson: Reprojecting and Transforming Data &mdash; QGIS Documentation  documentation</title>

with:

<title>6.1. Lesson: Reprojecting and Transforming Data &mdash; QGIS 3.16 Documentation  documentation</title>

but the problem is that Google clips the page title after about 50-60 characters, so the version would get clipped in the example above.
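The clipping concern can be checked mechanically. The sketch below takes the lower end of the "about 50-60 characters" estimate as its limit, which is an assumption rather than a documented Google rule, and the function name `version_survives_clipping` is made up for this example.

```python
import html


def version_survives_clipping(raw_title, version, limit=50):
    """Return True if `version` ends within the first `limit` characters.

    The title is unescaped first so that entities such as &mdash; count
    as the single character a search engine would display.
    """
    title = html.unescape(raw_title)
    start = title.find(version)
    if start == -1:
        return False
    return start + len(version) <= limit


raw_title = (
    "6.1. Lesson: Reprojecting and Transforming Data "
    "&mdash; QGIS 3.16 Documentation"
)
clipped = not version_survives_clipping(raw_title, "3.16")
print("version clipped:", clipped)
```

A check like this could run over all page titles at build time to see how many would lose the version under a given limit.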

@tjukanovt
Contributor

I also think this is a major issue regarding the documentation, so +1 for any ideas on how to fix this. I encounter this almost weekly and also hear it from other QGIS users. It is hard to think of a case where someone needs the old (2.x) documentation. Could we "hide" it somewhere?

@pathmapper
Contributor

#8121 would include an HTML meta description tag, with the aim that search engines use its content, which includes the QGIS version, as the snippet for search results.

We could also add a robots meta tag to prevent search engines from indexing outdated docs -> #8121 (review).
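As an illustration of the two tags mentioned here, the following sketch generates a version-aware description and, for outdated versions only, a `noindex` robots tag. It is not the implementation in #8121; the function `head_meta_tags`, the description wording, and the `is_outdated` flag are assumptions.

```python
import html


def head_meta_tags(page_title, version, is_outdated):
    """Return the <meta> tags to place in a page's <head>.

    The description carries the QGIS version so it can show up in the
    search snippet; outdated pages additionally ask not to be indexed.
    """
    description = f"{page_title} (QGIS {version} documentation)"
    tags = [
        f'<meta name="description" content="{html.escape(description, quote=True)}">'
    ]
    if is_outdated:
        tags.append('<meta name="robots" content="noindex">')
    return tags


old_tags = head_meta_tags("Lesson: Reprojecting and Transforming Data", "2.8", True)
new_tags = head_meta_tags("Lesson: Reprojecting and Transforming Data", "3.16", False)
print("\n".join(old_tags))
```

Note that a crawler has to be able to fetch a page to see its `noindex` tag, so this approach should not be combined with a robots.txt `Disallow` rule for the same paths.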
