Features

SATO Kentaro edited this page Jan 11, 2022 · 2 revisions

Features and Behaviors

The following table shows how ranvis/robots-txt-processor behaves. Results for the other modules may not be accurate. See the list at the bottom for what each behavior means.

NOT UPDATED: It needs to be possible to specify PHP versions so that modules do not emit compile errors. Recently released modules should also be added...

rv=ranvis/robots-txt-processor, b4=bee4/robots.txt, dg=diggin/diggin-robotrules, m6=m6web/roboxt, t1=t1gor/robots-txt-parser, tv=tomverran/robots-txt-checker

Feature/Behavior rv b4 dg m6 t1 tv
AcceptLws X - - - - -
AcceptLwsCrlf X - -
AcceptSpaceBeforeColon X X X X
ComplementLeadingSlash X X
ComplementRecordSeparator X X X X X
IgnoreDirectiveCase X X X X X
IgnoreMiddleUserAgent X
IgnorePathCase X X
IgnorePathTrailingSpaces X X X X X
IgnoreRecordSeparator X X X X X
IgnoreURIReserved - X - - X
IgnoreUserAgentTrailingSpaces X X X X X
KeepEmptyRecord - - - - X -
KeepPathTrailingSpaces X
KeepUserAgentTrailingSpaces X
LineCr X X X
LineCrLf X X X X X
LineLf X X X X X
LongerPathFirst X X X X
PeDecodeNoMeta X - - - - -
PeDecodeWildcard - X - - X
PeDecodeWildcardDollarMiddle - X - -
PeDecodeWildcardDollarMultiple - X - - X
PeDecodeWildcardDollarTrailing - X - - X
UserAgentLeftMatch X X
Wildcard X X X X X
WildcardDollar X X X X - X
WildcardDollarMiddle X -
WildcardDollarMultiple X X X X - X
X:Yes, -:Not applicable

The following table shows the features when Filter is applied before each parser.

Feature/Behavior f:rv f:b4 f:dg f:m6 f:t1 f:tv
AcceptLws X X X X - X
AcceptLwsCrlf X X X X - X
AcceptSpaceBeforeColon X X X X X
ComplementLeadingSlash X X X X X
ComplementRecordSeparator X X X X X X
IgnoreDirectiveCase X X X X X
IgnorePathCase X X
IgnorePathTrailingSpaces X X X X X
IgnoreRecordSeparator X X X X X
IgnoreURIReserved - X - - X
IgnoreUserAgentTrailingSpaces X X X X X
KeepEmptyRecord - - - - X -
KeepPathTrailingSpaces X
KeepUserAgentTrailingSpaces X
LineCr X X X X X
LineCrLf X X X X X
LineLf X X X X X
LongerPathFirst X X X X
PeDecodeWildcard X - X - - X
PeDecodeWildcardDollarMiddle - X - -
PeDecodeWildcardDollarMultiple - X - - X
PeDecodeWildcardDollarTrailing - X - - X
Wildcard X X X X X
WildcardDollar X X X X - X
WildcardDollarMiddle X -
WildcardDollarMultiple X X X X - X
X:Yes, -:Not applicable
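
To make the kind of normalization behind the second table more concrete, here is a minimal sketch in Python of what a pre-parse filter could do to raw robots.txt text. It is not the code of ranvis/robots-txt-processor or of any module listed above; the function names (`split_lines`, `unfold_lws`, `parse_line`, `normalize`) are hypothetical, and joining a continuation line with a single space is an assumption.

```python
import re


def split_lines(text):
    """Split on CR LF, a lone CR, or a lone LF (LineCrLf, LineCr, LineLf)."""
    return re.split(r"\r\n|\r|\n", text)


def unfold_lws(lines):
    """Join indented continuation lines onto the previous line (AcceptLws).

    Assumption: a continuation is a non-blank line starting with a space
    or tab, and it is joined with a single space.
    """
    out = []
    for line in lines:
        if out and line[:1] in (" ", "\t") and line.strip():
            out[-1] = out[-1].rstrip() + " " + line.strip()
        else:
            out.append(line)
    return out


def parse_line(line):
    """Return (directive, value), or None if the line has no directive."""
    line = line.split("#", 1)[0]           # drop a trailing comment
    name, sep, value = line.partition(":")
    if not sep:
        return None
    # strip() tolerates spaces before the colon (AcceptSpaceBeforeColon) and
    # trailing spaces in the value; lower() gives IgnoreDirectiveCase.
    return name.strip().lower(), value.strip()


def normalize(text):
    """Return all (directive, value) pairs; blank lines are dropped here."""
    pairs = (parse_line(line) for line in unfold_lws(split_lines(text)))
    return [pair for pair in pairs if pair is not None]
```

For example, `normalize("User-Agent : *\r\nDisallow: /a\r  /b\n")` yields `[("user-agent", "*"), ("disallow", "/a /b")]`. A parser that handles only LF line endings and exact directive names could then consume the normalized pairs.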
  • AcceptLws: Can parse LWS in LF/CR/CRLF
  • AcceptLwsCrlf: Can parse LWS (line continuation by indenting)
  • AcceptSpaceBeforeColon: Allow spaces between directive and colon
  • ComplementLeadingSlash: Complements leading slash of path rule if one is missing
  • ComplementRecordSeparator: Handling of records without blank line
  • IgnoreDirectiveCase: Directives are case-insensitive
  • IgnoreMiddleUserAgent: Handling of records without blank line
  • IgnorePathCase: Paths are treated as case-insensitive, although RFC 7230 section 2.7.3 says they are case-sensitive
  • IgnorePathTrailingSpaces: Handling of trailing spaces in path
  • IgnoreRecordSeparator: Ignore blank lines
  • IgnoreURIReserved: %5B is not the same as [ according to RFC 7230 2.7.3
  • IgnoreUserAgentTrailingSpaces: Handling of trailing spaces in User-agent
  • KeepEmptyRecord: Treatment of the User-agent only record with the following record
  • KeepPathTrailingSpaces: Handling of trailing spaces in path
  • KeepUserAgentTrailingSpaces: Handling of trailing spaces in User-agent
  • LineCr: Treat CR as a line separator
  • LineCrLf: Treat sequence of CR LF as a line separator
  • LineLf: Treat LF as a line separator
  • LongerPathFirst: A longer path rule is evaluated earlier than a shorter one
  • PeDecodeNoMeta: URL encoded meta characters are treated as escaped
  • PeDecodeWildcard: %2A is treated as a wildcard meta character
  • PeDecodeWildcardDollarMiddle: %24 in the middle is treated as an end-of-path meta character
  • PeDecodeWildcardDollarMultiple: Multiple %24 are treated as end-of-path meta characters
  • PeDecodeWildcardDollarTrailing: %24 is treated as an end-of-path meta character
  • UserAgentLeftMatch: User-agent is matched by prefix (starts-with match)
  • Wildcard: * is treated as a wildcard character
  • WildcardDollar: $ is treated as an end-of-path meta character
  • WildcardDollarMiddle: $ in the middle is treated as an end-of-path meta character
  • WildcardDollarMultiple: Treatment of multiple $'s as an end-of-path meta character
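
As an illustration of how Wildcard, WildcardDollar, ComplementLeadingSlash and LongerPathFirst can interact, the following Python sketch matches a request path against a list of rules. It is only a sketch under stated assumptions, not the implementation of any module compared above: `rule_to_regex` and `is_allowed` are hypothetical names, only a trailing `$` is treated as an end-of-path marker (a `$` elsewhere is taken literally), and rules with an empty path are skipped.

```python
import re


def rule_to_regex(path):
    """Compile a robots.txt path rule into a regular expression."""
    if not path.startswith(("/", "*")):
        path = "/" + path                  # ComplementLeadingSlash
    anchored = path.endswith("$")          # WildcardDollar (trailing only)
    if anchored:
        path = path[:-1]
    # Wildcard: each '*' matches any run of characters; everything else,
    # including a '$' in the middle, is matched literally (assumption).
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of (allow, rule_path) pairs for one user agent."""
    # LongerPathFirst: try longer rules first; the first match decides.
    ordered = sorted(
        (rule for rule in rules if rule[1]),   # skip empty rule paths
        key=lambda rule: len(rule[1]),
        reverse=True,
    )
    for allow, rule_path in ordered:
        if rule_to_regex(rule_path).match(path):
            return allow
    return True                                # no rule matched


rules = [
    (False, "/private"),
    (True, "/private/public"),
    (False, "/*.php$"),
    (False, "tmp"),
]
```

With these rules, `is_allowed(rules, "/private/public/a")` is true because the longer Allow rule is tried first, while `is_allowed(rules, "/index.php")` is false and `is_allowed(rules, "/index.php?x=1")` is true because of the trailing `$`.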