WWW::RobotRules/LWP::RobotUA Does Not Respect Crawl-delay: [rt.cpan.org #19539] #128

Open
oalders opened this issue Mar 30, 2017 · 0 comments

@oalders
Member

oalders commented Mar 30, 2017

Migrated from rt.cpan.org#19539 (status was 'new')

Requestors:

From imacat@cpan.org on 2006-05-28 13:38:41:

Hi.  This is imacat from Taiwan.  I was trying LWP::RobotUA and
found that WWW::RobotRules does not respect the Crawl-delay: directive.
The test script (an exact copy of the example in WWW::RobotRules's POD) is:

==========
#! /usr/bin/perl -w

use WWW::RobotRules;
my $rules = WWW::RobotRules->new('MOMspider/1.0');

use LWP::Simple qw(get);

my $url = "http://sourceforge.net/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
==========

The output I got is:

==========
imacat@rinse ~/tmp % ./test.pl
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 10
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 2
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 2
imacat@rinse ~/tmp %
==========
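The warnings show that the parser treats every `Crawl-delay:` line as unrecognized. For illustration only, here is a minimal sketch of what recognizing the directive could look like. It is not part of WWW::RobotRules: the helper name `crawl_delay_for` is hypothetical, and it assumes the common conventions that the value is in seconds, that it applies to the User-agent group it appears in, and that the `*` group is the fallback. Crawl-delay is a nonstandard extension, so crawlers interpret it differently.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of WWW::RobotRules): return the Crawl-delay,
# in seconds, that applies to $agent in the robots.txt text $txt, or undef.
# Assumes Crawl-delay belongs to the User-agent group it appears in and that
# the "*" group is the fallback.
sub crawl_delay_for {
    my ($txt, $agent) = @_;
    # Match on the product token only, e.g. "momspider" from "MOMspider/1.0".
    (my $token = lc $agent) =~ s{/.*}{};
    my (%delay, @group_agents, $in_rules);
    for my $line (split /\r?\n/, $txt) {
        $line =~ s/#.*//;             # strip comments
        next unless $line =~ /\S/;    # skip blank lines
        if ($line =~ /^\s*User-agent\s*:\s*(.*?)\s*$/i) {
            # A User-agent line that follows rule lines starts a new group.
            @group_agents = () if $in_rules;
            $in_rules = 0;
            push @group_agents, lc $1;
        }
        elsif ($line =~ /^\s*Crawl-delay\s*:\s*(\d+(?:\.\d+)?)\s*$/i) {
            my $secs = $1;
            $in_rules = 1;
            $delay{$_} = $secs for @group_agents;
        }
        else {
            $in_rules = 1;    # Disallow:, Allow:, or any other rule line
        }
    }
    # Prefer the longest group name contained in this robot's token,
    # otherwise fall back to the "*" group (undef if there is none).
    my ($best) = sort { length $b <=> length $a }
                 grep { $_ ne '*' && $_ ne '' && index($token, $_) >= 0 }
                 keys %delay;
    return defined $best ? $delay{$best} : $delay{'*'};
}
```

With the test script above, `crawl_delay_for($robots_txt, 'MOMspider/1.0')` would return the value from whichever group matches that robot, or undef when the file sets no delay for it.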

Crawl-delay: is a widely used directive, and it is obeyed by Yahoo, MSN
and many other robots.  A program built on LWP::RobotUA that prints this
warning every time it reads such a robots.txt cannot be used in practice,
which makes LWP::RobotUA of little use.  Besides, if a website specifies
Crawl-delay:, LWP::RobotUA should respect it instead of its own
$ua->delay().  Could you look into this and fix it soon?  Thank you.
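Note that `$ua->delay()` in LWP::RobotUA is measured in minutes (the default is 1), while Crawl-delay values are conventionally given in seconds, so using the site's value in its place needs a unit conversion. A minimal sketch of that choice, using the hypothetical helper name `effective_delay_minutes`:

```perl
use strict;
use warnings;

# Hypothetical helper: choose the delay, in minutes (the unit used by
# LWP::RobotUA's delay() method), for requests to one site.
# $crawl_delay_secs is the site's Crawl-delay in seconds, or undef if
# robots.txt gave none; $ua_delay_mins is the agent's own delay setting.
sub effective_delay_minutes {
    my ($crawl_delay_secs, $ua_delay_mins) = @_;
    # No directive from the site: keep the agent's own setting.
    return $ua_delay_mins unless defined $crawl_delay_secs;
    # The site specified a delay: use it instead, converted to minutes.
    return $crawl_delay_secs / 60;
}
```

A caller could then write `$ua->delay(effective_delay_minutes($secs, $ua->delay));`. Because `delay()` is a single setting per LWP::RobotUA object rather than per host, a crawler visiting several sites with different Crawl-delay values would need either one agent per host or support inside the module itself.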