Optimize robotparser for long list of rules #149381

@serhiy-storchaka

Description

Previously, robotparser implemented an old pre-standard specification which nobody uses anymore. It returned the result after finding the first matching rule, which gave incorrect results in many cases (see #83368). After #138907 it follows the longest-path rule.

The code can be optimized, for example by sorting rules by path length, matching them from longest to shortest, and stopping as soon as a match is found, since no remaining (shorter) path can produce a longer match. This only works for paths that do not contain the metacharacters * and $.
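A minimal sketch of that idea, assuming literal rules only (no * or $). The `Rule` class and `find_longest_match` function are illustrative names, not robotparser's actual internals:

```python
# Hypothetical sketch of the early-exit idea for literal rules
# (paths without the * and $ metacharacters).  Rule and
# find_longest_match are illustrative, not robotparser's real API.


class Rule:
    def __init__(self, path, allowance):
        self.path = path
        self.allowance = allowance


def find_longest_match(rules, url):
    """Return the allowance of the longest literal rule matching url.

    rules must be pre-sorted by path length, longest first, so the
    first prefix match is guaranteed to be the longest one and the
    scan can stop there.
    """
    for rule in rules:  # longest path first
        if url.startswith(rule.path):
            return rule.allowance  # no longer match can follow
    return True  # no rule matched: allowed by default


rules = sorted(
    [Rule("/private/", False), Rule("/", True), Rule("/private/data/", False)],
    key=lambda r: len(r.path),
    reverse=True,
)
print(find_longest_match(rules, "/private/data/x"))  # False
print(find_longest_match(rules, "/public"))          # True
```

Because the list is sorted longest-first, the loop can also stop early once the matched prefix length exceeds the length of the next rule's path, which is what makes the average case cheap.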

Other optimizations are also possible, for example a trie-like structure, which could also handle paths with metacharacters. But this would significantly complicate the code.
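A rough sketch of the trie idea for literal paths only (handling * and $ is exactly the complication mentioned above and is omitted). This is my own illustration, not code from the linked PR: each character of a rule path is a trie edge, and a node that ends a rule stores its allowance; lookup walks the URL once and remembers the allowance of the deepest rule seen, giving longest-match semantics in O(len(url)) regardless of the number of rules:

```python
# Hypothetical character-level trie for literal robots.txt paths.
# Longest-match lookup: the deepest node with a stored allowance wins.


class TrieNode:
    __slots__ = ("children", "allowance")

    def __init__(self):
        self.children = {}
        self.allowance = None  # set if a rule path ends at this node


def insert(root, path, allowance):
    node = root
    for ch in path:
        node = node.children.setdefault(ch, TrieNode())
    node.allowance = allowance


def lookup(root, url):
    node, result = root, True  # allowed by default
    for ch in url:
        node = node.children.get(ch)
        if node is None:
            break  # no rule extends this far
        if node.allowance is not None:
            result = node.allowance  # deeper (longer) match wins
    return result


root = TrieNode()
insert(root, "/private/", False)
insert(root, "/private/data/", True)
print(lookup(root, "/private/data/x"))  # True (longest rule allows)
print(lookup(root, "/private/other"))   # False
print(lookup(root, "/public"))          # True (no matching rule)
```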

I am not actually sure that such an optimization is necessary. In most cases the number of rules should not be too large, which is why I did not include it in the previous PR. We need to collect some data first, so I am publishing my code as a draft.

Labels: performance (Performance or resource usage), stdlib (Standard Library Python modules in the Lib/ directory), type-feature (A feature request or enhancement)
