Question on robots.txt #529

Answered by ato
oschihin asked this question in Q&A
Heritrix does not currently support sitemaps (although there's a draft pull request adding it: #262), and it does not support wildcards in Disallow lines (feature request #250). I haven't tested it, but I would guess the rule Disallow: /*?* will be interpreted literally, matching only paths that actually start with the literal string /*?. It will not match /index.html?foo.
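
To make the difference concrete, here is a rough Python sketch contrasting the two interpretations. It is not Heritrix's code and I haven't checked it against Heritrix's behaviour. The function names literal_prefix_match and wildcard_match are made up for illustration, the literal version assumes the whole Disallow value is used as a plain string prefix, and the wildcard version assumes the common convention where * matches any run of characters and a trailing $ anchors the end of the path.

```python
import re


def literal_prefix_match(rule: str, path: str) -> bool:
    """Hypothetical: treat the Disallow value as a plain prefix, no wildcards."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Hypothetical: '*' matches any run of characters, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start of the path, like a robots.txt rule.
    return re.match(pattern, path) is not None


rule = "/*?*"
for path in ["/index.html?foo", "/*?*bar", "/index.html"]:
    print(path, literal_prefix_match(rule, path), wildcard_match(rule, path))
```

Under the literal reading only a path that really begins with the characters /*?* is blocked, so /index.html?foo stays crawlable. Under the wildcard reading any path containing a ? is blocked.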

Update (2022): Sitemaps are now supported.

Answer selected by ato
This discussion was converted from issue #371 on September 30, 2022 00:52.