The robots.txt for bugs.chromium.org:

User-agent: *
# Start by disallowing everything.
Disallow: /
# Some specific things are okay, though.
Allow: /$
Allow: /hosting
Allow: /p/*/adminIntro
# Query strings are hard. We only allow ?id=N, no other parameters.
Allow: /p/*/issues/detail?id=*
Disallow: /p/*/issues/detail?id=*&*
Disallow: /p/*/issues/detail?*&id=*
# 10 second crawl delay for bots that honor it.
Crawl-delay: 10
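Under Google-style robots.txt semantics (now standardized as RFC 9309), `*` matches any character sequence, `$` anchors the end of the URL, the most specific (longest) matching rule wins, and `Allow` beats `Disallow` on a tie. A minimal sketch of that evaluation against the rules above (hypothetical helper names, not the Wayback Machine's actual parser):

```python
import re

def rule_to_regex(path):
    # Translate a robots.txt path pattern to a regex:
    # '*' matches any sequence, a trailing '$' anchors the end of the URL.
    pattern = re.escape(path).replace(r'\*', '.*')
    if pattern.endswith(r'\$'):
        pattern = pattern[:-2] + '$'
    return re.compile(pattern)

# The rules from the robots.txt above, in order of appearance.
RULES = [
    ('disallow', '/'),
    ('allow', '/$'),
    ('allow', '/hosting'),
    ('allow', '/p/*/adminIntro'),
    ('allow', '/p/*/issues/detail?id=*'),
    ('disallow', '/p/*/issues/detail?id=*&*'),
    ('disallow', '/p/*/issues/detail?*&id=*'),
]

def is_allowed(url_path):
    # Longest-match precedence: the most specific matching rule decides;
    # on a length tie, Allow wins. No matching rule means allowed.
    best = None  # (pattern length, is_allow)
    for verdict, path in RULES:
        if rule_to_regex(path).match(url_path):
            cand = (len(path), verdict == 'allow')
            if best is None or cand[0] > best[0] or \
                    (cand[0] == best[0] and cand[1]):
                best = cand
    return best[1] if best else True
```

With these rules, `is_allowed('/p/project-zero/issues/detail?id=1139')` is true because the `Allow: /p/*/issues/detail?id=*` rule is longer than `Disallow: /`, while a URL with a second query parameter matches the even longer `Disallow: /p/*/issues/detail?id=*&*` and is blocked.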
Java Wayback's robots.txt parser does not understand wildcards or Allow: directives.
URL reported now plays back on web.archive.org (uses new robots.txt parser).
Navigate to: https://web.archive.org/web/http://bugs.chromium.org/p/project-zero/issues/detail?id=1139
See that the Wayback Machine says it is blocked by robots.txt.
See that the robots.txt for that domain (quoted above), while complicated, specifically allows that type of URL.
Expect that complex robots.txt files are parsed and matched correctly by the Wayback Machine.