add robots.txt to endpoint #7

Open
lockefox opened this issue Apr 19, 2017 · 3 comments

lockefox commented Apr 19, 2017

In an effort to subdue crawlers, add a robots.txt to publicAPI/static

http://stackoverflow.com/questions/14048779/with-flask-how-can-i-serve-robots-txt-and-sitemap-xml-as-static-files

Notes:

  • need to add publicAPI/static to MANIFEST.in
  • may need to add publicAPI/static/* to package_data in setup.py
  • Any new endpoints will need tests added to test_crest_endpoint.py
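
A minimal sketch of that static-file approach, assuming the standard Flask app object and a robots.txt placed in publicAPI/static (route and file names here are illustrative, not the final implementation):

from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route('/robots.txt')
def robots_txt():
    # serve publicAPI/static/robots.txt at the site root so crawlers can find it
    return send_from_directory(app.static_folder, 'robots.txt')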

lockefox commented Apr 19, 2017

Bots to block

How can I block MJ12bot?

MJ12bot adheres to the robots.txt standard. If you want to prevent the bot from crawling your website, then add the following text to your robots.txt:

User-agent: MJ12bot
Disallow: /
Please do not block our bot via IP in htaccess - we do not use any consecutive IP blocks as we are a community based distributed crawler. Please always make sure the bot can actually retrieve robots.txt itself. If it can't then it will assume that it is okay to crawl your site.

If you have reason to believe that MJ12bot did NOT obey your robots.txt commands, then please let us know via email: bot@majestic12.co.uk. Please provide the URL of your website and log entries showing the bot trying to retrieve pages that it was not supposed to.

Testing/validating robots.txt functionality
https://docs.python.org/3.5/library/urllib.robotparser.html
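
A quick way to sanity-check the served file with the stdlib parser linked above (the host/port below are placeholders for wherever publicAPI is running):

from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url('http://localhost:8000/robots.txt')
parser.read()

# MJ12bot should be denied everywhere; other agents only off the disallowed paths
print(parser.can_fetch('MJ12bot', 'http://localhost:8000/CREST/'))  # expected: False
print(parser.can_fetch('*', 'http://localhost:8000/'))              # expected: True if only /CREST/ is disallowed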


kanelarrete commented Apr 19, 2017

To keep multiple bots from crawling the site, you can use a blanket rule. This example tells all robots to stay out of a website:

User-agent: *
Disallow: /

This is taken from https://en.wikipedia.org/wiki/Robots_exclusion_standard#About_the_standard

To protect the API only, it could be set up as:

User-agent: *
Disallow: /CREST/
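
Combining that with the MJ12bot rule above, one possible robots.txt for publicAPI/static would be:

User-agent: MJ12bot
Disallow: /

User-agent: *
Disallow: /CREST/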

lockefox commented:

Robots.txt functionality should also be added to ProsperCookiecutter for Flask projects
