We only support trailing `*` wildcards at present. Ideally we should support wildcards as defined in https://developers.google.com/search/reference/robots_txt

The code to modify would be:
heritrix3/modules/src/main/java/org/archive/modules/net/RobotsDirectives.java
Lines 40 to 42 in 0581170
Matching the wildcards themselves is not that difficult, but getting the rule precedence right is harder. Perhaps we could use a standard library, e.g. the crawler commons code?
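To make the precedence question concrete, here is a rough sketch of how matching could work, assuming the rule described in the Google document linked above: `*` matches any run of characters, a trailing `$` anchors the end of the path, the matching rule with the longest path wins, and an Allow beats a Disallow of equal length. The class and method names (`RobotsWildcardSketch`, `addAllow`, `addDisallow`, `isAllowed`) are made up for illustration. This is not the existing `RobotsDirectives` API and not the crawler commons implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch only: not Heritrix's RobotsDirectives and not crawler commons.
final class RobotsWildcardSketch {

    // One Allow/Disallow line, with its path pattern precompiled to a regex.
    private static final class Rule {
        final String path;
        final boolean allow;
        final Pattern regex;

        Rule(String path, boolean allow) {
            this.path = path;
            this.allow = allow;
            this.regex = toRegex(path);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void addAllow(String path) {
        add(path, true);
    }

    void addDisallow(String path) {
        add(path, false);
    }

    private void add(String path, boolean allow) {
        // An empty path matches nothing (e.g. "Disallow:" permits everything), so skip it.
        if (!path.isEmpty()) {
            rules.add(new Rule(path, allow));
        }
    }

    // Translate a robots.txt path pattern into a regex that must match the whole path.
    static Pattern toRegex(String path) {
        boolean anchored = path.endsWith("$");
        String body = anchored ? path.substring(0, path.length() - 1) : path;
        StringBuilder sb = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '*') {
                sb.append(".*"); // wildcard: any run of characters
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        if (!anchored) {
            sb.append(".*"); // without '$' the rule is a prefix match
        }
        return Pattern.compile(sb.toString());
    }

    // Longest matching rule wins; on equal length, Allow wins; no match means allowed.
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!r.regex.matcher(path).matches()) {
                continue;
            }
            if (best == null
                    || r.path.length() > best.path.length()
                    || (r.path.length() == best.path.length() && r.allow)) {
                best = r;
            }
        }
        return best == null || best.allow;
    }
}
```

With `Disallow: /fish*.php` and `Allow: /fish/salmon.php`, a request for `/fish/salmon.php` matches both rules and the longer Allow takes precedence, while `/fishheads/catfish.php` only matches the Disallow. A real implementation would also need to handle percent-encoding and per-user-agent groups, which this sketch ignores.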