This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

robots.txt edits #3884

Merged
merged 4 commits into from Jun 9, 2016

Conversation


@dchukhin dchukhin commented Jun 6, 2016

@dchukhin changed the title from "50 list of pages being indexed" to "robots.txt edits" on Jun 6, 2016

codecov-io commented Jun 6, 2016

Current coverage is 85.83%

Merging #3884 into master will not change coverage

@@             master      #3884   diff @@
==========================================
  Files           144        144          
  Lines          8566       8566          
  Methods           0          0          
  Messages          0          0          
  Branches       1136       1136          
==========================================
  Hits           7353       7353          
  Misses          977        977          
  Partials        236        236          

Powered by Codecov. Last updated by f50bffe...9b6bdda

@stephaniehobson stephaniehobson self-assigned this Jun 7, 2016
Disallow: /*docs*$vote
Disallow: /*docs.json
Disallow: /*preview-wiki-content
Disallow: /*docs/ckeditor_config.js
Disallow: /*feed*
Contributor


This rule needs to be more precise: as written, it will also block articles like https://developer.mozilla.org/en-US/Firefox/Releases/2/Adding_feed_readers_to_Firefox

Contributor Author


good 👀
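For reference, the over-blocking can be reproduced with a minimal sketch of robots.txt wildcard matching. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` pins the end of the URL, rules match from the start of the path). The helper names, the narrower rule `/*/docs/feeds/`, and the feed path are hypothetical and only illustrate the idea; they are not taken from this PR.

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a Disallow pattern into a regex (assumed Google-style wildcards)."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))

def is_blocked(rule: str, path: str) -> bool:
    # Rules apply from the start of the path, hence match() rather than search().
    return rule_to_regex(rule).match(path) is not None

# Path of the article mentioned in the review comment.
article = "/en-US/Firefox/Releases/2/Adding_feed_readers_to_Firefox"
# Hypothetical feed path, for illustration only.
feed = "/en-US/docs/feeds/atom/all/"

print(is_blocked("/*feed*", article))         # broad rule also catches the article
print(is_blocked("/*/docs/feeds/", article))  # hypothetical narrower rule
print(is_blocked("/*/docs/feeds/", feed))
```

The broad rule matches because `feed` appears inside the article slug; a rule tied to a path segment avoids that.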

@stephaniehobson

Thanks for putting it in alphabetical order! A few rules need to be made more precise.

Disallow: /*search*
Disallow: /skins
Disallow: /*type=feed
Disallow: /*users*signin
Contributor


Need to block everything under users according to the spreadsheet.

@dchukhin

dchukhin commented Jun 7, 2016

@stephaniehobson thanks for the review; this should be ready for re-review.

@stephaniehobson

This works well for our regular documents and is an improvement over what we have, so I'm going to merge. 👍

When I compared some of the test URLs to the file, though, I discovered that none of the $ending URLs are blocked if they're under a zone (yay, another reason for us to stop using zones).

@dchukhin can you do a bit of research to see whether the end-of-URL matching syntax I saw mentioned on one blog is well supported? If so, please submit another PR that uses it with the $endings where appropriate (it doesn't look appropriate for $revisions, for example). If it's not well supported and you don't recommend switching, just comment here.

Example:

Disallow: /*$history$

Thanks.
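To make the end-of-URL syntax concrete, here is a small sketch of how a trailing `$` might change matching. It assumes the Google-style interpretation, where only a `$` at the very end of a rule is an anchor and any other `$` is a literal character. The function name and both sample paths are hypothetical, chosen only to show why an anchored rule may not suit `$revision` URLs that carry something after the keyword.

```python
import re

def blocks(rule: str, path: str) -> bool:
    """Sketch: '*' is any run of characters, a trailing '$' pins the end of the path."""
    anchored = rule.endswith("$")
    parts = (rule[:-1] if anchored else rule).split("*")
    regex = ".*".join(map(re.escape, parts)) + (r"\Z" if anchored else "")
    return re.match(regex, path) is not None

# Hypothetical paths, for illustration only.
history = "/en-US/docs/Web/CSS$history"
revision = "/en-US/docs/Web/CSS$revision/12345"

print(blocks("/*$history$", history))    # anchored rule, path ends in $history
print(blocks("/*$revision$", revision))  # anchored rule misses: an ID follows $revision
print(blocks("/*$revision", revision))   # unanchored rule still matches
```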

@stephaniehobson stephaniehobson merged commit cdae3c4 into mdn:master Jun 9, 2016
@dchukhin

I have looked around and, based on here and here, it seems that some crawlers (Google and a few other major ones) support using a $ for the end of the URL. I'm not sure whether that's enough for what we want, since Google is the most popular crawler but certainly not the only one. Thoughts? @stephaniehobson

@stephaniehobson

@dchukhin Thanks for doing that research. Google is what we're most concerned about.

I've done a little more thinking on this, and I think we can block zones by making the current rules less specific. Could you please change the rules of the form /*docs*$ending so they no longer include the docs directory? I'll send an email about this too, because git notifications are easy to miss.
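As a rough check of that suggestion, the sketch below compares a rule that requires `docs` in the path with a less specific one. It assumes Google-style wildcard matching, and assumes that zone URLs look like the Firefox article linked earlier in this thread, with no `docs` directory in the path. The function name and the `$vote` paths are hypothetical.

```python
import re

def matches(rule: str, path: str) -> bool:
    """Sketch of wildcard matching: '*' is any run of characters, other characters are literal."""
    anchored = rule.endswith("$")
    parts = (rule[:-1] if anchored else rule).split("*")
    regex = ".*".join(map(re.escape, parts)) + (r"\Z" if anchored else "")
    return re.match(regex, path) is not None

# Hypothetical vote URLs: one regular document, one under a zone.
regular = "/en-US/docs/Web/CSS$vote"
zoned = "/en-US/Firefox/Releases/2/Adding_feed_readers_to_Firefox$vote"

for rule in ("/*docs*$vote", "/*$vote"):
    print(rule, matches(rule, regular), matches(rule, zoned))
```

Under these assumptions the rule containing `docs` only covers the regular document, while the shorter rule covers both.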
