Features
SATO Kentaro edited this page Jan 11, 2022 · 2 revisions
The following table shows how ranvis/robots-txt-processor behaves. Results for the other modules may not be accurate. See the list after the tables for what each behavior means.
NOT UPDATED: It needs to be possible to specify PHP versions so that a module does not emit compile errors. Recently released modules should also be added...
rv=ranvis/robots-txt-processor, b4=bee4/robots.txt, dg=diggin/diggin-robotrules, m6=m6web/roboxt, t1=t1gor/robots-txt-parser, tv=tomverran/robots-txt-checker
Feature/Behavior | rv | b4 | dg | m6 | t1 | tv |
---|---|---|---|---|---|---|
AcceptLws | X | - | - | - | - | - |
AcceptLwsCrlf | X | - | - | | | |
AcceptSpaceBeforeColon | X | X | X | X | | |
ComplementLeadingSlash | X | X | | | | |
ComplementRecordSeparator | X | X | X | X | X | |
IgnoreDirectiveCase | X | X | X | X | X | |
IgnoreMiddleUserAgent | X | | | | | |
IgnorePathCase | X | X | | | | |
IgnorePathTrailingSpaces | X | X | X | X | X | |
IgnoreRecordSeparator | X | X | X | X | X | |
IgnoreURIReserved | - | X | - | - | X | |
IgnoreUserAgentTrailingSpaces | X | X | X | X | X | |
KeepEmptyRecord | - | - | - | - | X | - |
KeepPathTrailingSpaces | X | | | | | |
KeepUserAgentTrailingSpaces | X | | | | | |
LineCr | X | X | X | | | |
LineCrLf | X | X | X | X | X | |
LineLf | X | X | X | X | X | |
LongerPathFirst | X | X | X | X | | |
PeDecodeNoMeta | X | - | - | - | - | - |
PeDecodeWildcard | - | X | - | - | X | |
PeDecodeWildcardDollarMiddle | - | X | - | - | | |
PeDecodeWildcardDollarMultiple | - | X | - | - | X | |
PeDecodeWildcardDollarTrailing | - | X | - | - | X | |
UserAgentLeftMatch | X | X | | | | |
Wildcard | X | X | X | X | X | |
WildcardDollar | X | X | X | X | - | X |
WildcardDollarMiddle | X | - | | | | |
WildcardDollarMultiple | X | X | X | X | - | X |
X:Yes, -:Not applicable |
The following table shows features if Filter is applied before each parser.
Feature/Behavior | f:rv | f:b4 | f:dg | f:m6 | f:t1 | f:tv |
---|---|---|---|---|---|---|
AcceptLws | X | X | X | X | - | X |
AcceptLwsCrlf | X | X | X | X | - | X |
AcceptSpaceBeforeColon | X | X | X | X | X | |
ComplementLeadingSlash | X | X | X | X | X | |
ComplementRecordSeparator | X | X | X | X | X | X |
IgnoreDirectiveCase | X | X | X | X | X | |
IgnorePathCase | X | X | | | | |
IgnorePathTrailingSpaces | X | X | X | X | X | |
IgnoreRecordSeparator | X | X | X | X | X | |
IgnoreURIReserved | - | X | - | - | X | |
IgnoreUserAgentTrailingSpaces | X | X | X | X | X | |
KeepEmptyRecord | - | - | - | - | X | - |
KeepPathTrailingSpaces | X | | | | | |
KeepUserAgentTrailingSpaces | X | | | | | |
LineCr | X | X | X | X | X | |
LineCrLf | X | X | X | X | X | |
LineLf | X | X | X | X | X | |
LongerPathFirst | X | X | X | X | | |
PeDecodeWildcard | X | - | X | - | - | X |
PeDecodeWildcardDollarMiddle | - | X | - | - | | |
PeDecodeWildcardDollarMultiple | - | X | - | - | X | |
PeDecodeWildcardDollarTrailing | - | X | - | - | X | |
Wildcard | X | X | X | X | X | |
WildcardDollar | X | X | X | X | - | X |
WildcardDollarMiddle | X | - | | | | |
WildcardDollarMultiple | X | X | X | X | - | X |
X:Yes, -:Not applicable |
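
The second table has more X marks for LineCr and AcceptLws than the first, which suggests that a pre-processing pass can smooth over line-ending and line-continuation differences before a parser sees the text. A minimal sketch of that idea follows. It is written in Python for brevity; `normalize_lines` is a hypothetical helper, not the actual Filter implementation, and folding a continuation line into the previous one with a single space is an assumption.

```python
import re


def normalize_lines(source):
    """Split robots.txt text into logical lines (hypothetical helper).

    - CRLF, a lone CR, and LF all end a line (LineCrLf, LineCr, LineLf).
    - A non-blank line starting with a space or tab continues the
      previous non-blank line (AcceptLws). Joining with one space is
      an assumption made for this sketch.
    """
    # Try CRLF first so that it is not split into two line breaks.
    physical = re.split(r"\r\n|\r|\n", source)
    logical = []
    for line in physical:
        is_continuation = line[:1] in (" ", "\t") and line.strip() != ""
        if is_continuation and logical and logical[-1].strip() != "":
            # Fold the indented line into the previous logical line.
            logical[-1] = logical[-1].rstrip() + " " + line.strip()
        else:
            logical.append(line)
    return logical


if __name__ == "__main__":
    text = "User-agent: a\r\nDisallow:\r\n /x\rAllow: /z\n\nUser-agent: b"
    for logical_line in normalize_lines(text):
        print(repr(logical_line))
```

Feeding the resulting lines, joined with LF, to a parser that only understands LF line endings would let it read input it could not otherwise split correctly.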
- AcceptLws: Can parse LWS (line continuation by indenting)
- AcceptLwsCrlf: Can parse LWS in LF/CR/CRLF
- AcceptSpaceBeforeColon: Allows spaces between the directive name and the colon
- ComplementLeadingSlash: Complements the leading slash of a path rule if it is missing
- ComplementRecordSeparator: Handling of records without a blank line (the missing record separator is complemented)
- IgnoreDirectiveCase: Directives are case-insensitive
- IgnoreMiddleUserAgent: Handling of records without a blank line (a User-agent line in the middle of a record is ignored)
- IgnorePathCase: Treats paths as case-insensitive, although RFC 7230 section 2.7.3 says they are case-sensitive
- IgnorePathTrailingSpaces: Handling of trailing spaces in a path (they are ignored)
- IgnoreRecordSeparator: Ignores blank lines
- IgnoreURIReserved: %5B is not the same as [ according to RFC 7230 section 2.7.3
- IgnoreUserAgentTrailingSpaces: Handling of trailing spaces in User-agent (they are ignored)
- KeepEmptyRecord: Treatment of a User-agent-only record that is followed by another record
- KeepPathTrailingSpaces: Handling of trailing spaces in a path (they are kept)
- KeepUserAgentTrailingSpaces: Handling of trailing spaces in User-agent (they are kept)
- LineCr: Treat CR as a line separator
- LineCrLf: Treat sequence of CR LF as a line separator
- LineLf: Treat LF as a line separator
- LongerPathFirst: A longer path rule is evaluated earlier than a shorter one
- PeDecodeNoMeta: URL encoded meta characters are treated as escaped
- PeDecodeWildcard: %2A is treated as a wildcard meta character
- PeDecodeWildcardDollarMiddle: %24 in the middle is treated as an end-of-path meta character
- PeDecodeWildcardDollarMultiple: Multiple %24 are treated as end-of-path meta characters
- PeDecodeWildcardDollarTrailing: %24 is treated as an end-of-path meta character
- UserAgentLeftMatch: User-agent is matched by prefix (starts-with match)
- Wildcard: * is treated as a wildcard character
- WildcardDollar: $ is treated as an end-of-path meta character
- WildcardDollarMiddle: $ in the middle is treated as an end-of-path meta character
- WildcardDollarMultiple: Treatment of multiple $'s as an end-of-path meta character
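
To make the Wildcard, WildcardDollar, and LongerPathFirst entries above more concrete, here is a minimal sketch of how a path rule could be matched. It is written in Python for brevity and does not show how any of the listed modules is implemented. `rule_to_regex` and `is_allowed` are hypothetical names, and three choices are assumptions made for illustration: only a trailing `$` is an end-of-path marker, a path matching no rule is allowed, and rules of equal length keep their original order.

```python
import re


def rule_to_regex(pattern):
    """Compile one path rule into a regular expression (hypothetical helper).

    '*' matches any run of characters (Wildcard). A trailing '$' anchors
    the end of the path (WildcardDollar). A '$' elsewhere is matched
    literally, which is an assumption of this sketch.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Decide whether path is allowed by a list of (allow, pattern) pairs.

    Longer patterns are tried first (LongerPathFirst) and the first
    matching rule decides. With no matching rule the path is allowed
    (assumption of this sketch).
    """
    ordered = sorted(rules, key=lambda rule: len(rule[1]), reverse=True)
    for allow, pattern in ordered:
        # re.match anchors at the start, so rules match as path prefixes.
        if rule_to_regex(pattern).match(path):
            return allow
    return True


if __name__ == "__main__":
    rules = [
        (False, "/private"),
        (True, "/private/public"),
        (False, "/*.php$"),
    ]
    for path in ("/private/secret", "/private/public/page",
                 "/index.php", "/index.php?x=1", "/about"):
        print(path, is_allowed(rules, path))
```

With these rules, `/private/public/page` is allowed because the longer Allow rule is tried before `/private`, while `/index.php` is disallowed and `/index.php?x=1` is allowed because the trailing `$` requires the path to end in `.php`.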