GitHub - IanBod/WWW-Crawl: Perl module to crawl a single website

WWW-Crawl

The WWW::Crawl module provides a simple web crawling utility for extracting links and other resources from web pages within a single domain. It can be used to recursively explore a website and retrieve URLs, including those found in HTML href attributes, form actions, external JavaScript files, and JavaScript window.open links.

WWW::Crawl will not stray outside the supplied domain.
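
As an illustration of the idea described above, here is a minimal sketch of same-site link extraction using only core Perl. It is not the module's actual implementation: the helper names extract_candidates and same_site, the sample HTML, and the host example.com are all assumptions made for this example, and the regular expressions are deliberately simple rather than a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Crawl): collect candidate URLs from
# the four places the README mentions.
sub extract_candidates {
    my ($html) = @_;
    my @patterns = (
        qr/<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/is,       # HTML href attributes
        qr/<form\b[^>]*?\baction\s*=\s*(["'])(.*?)\1/is,  # form actions
        qr/<script\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/is,   # external JavaScript files
        qr/window\.open\(\s*(["'])(.*?)\1/is,             # window.open links
    );
    my @found;
    for my $re (@patterns) {
        # Group 1 is the quote character, group 2 is the URL itself.
        while ($html =~ /$re/g) {
            push @found, $2;
        }
    }
    return @found;
}

# Hypothetical helper: true if a link stays on the starting host.
sub same_site {
    my ($link, $base_host) = @_;
    # Links without a scheme or leading // are relative, so they stay on site.
    return 1 unless $link =~ m{^[a-z][a-z0-9+.-]*:|^//}i;
    # Absolute http(s) links must name exactly the starting host.
    if ($link =~ m{^(?:https?:)?//([^/:?#]+)}i) {
        return lc($1) eq lc($base_host) ? 1 : 0;
    }
    return 0;    # mailto:, javascript: and other schemes are dropped
}

my $html = <<'HTML';
<a href="/about">About</a>
<a href="https://example.com/contact">Contact</a>
<a href="https://other.example.org/">Elsewhere</a>
<form action="/search"></form>
<script src="/js/app.js"></script>
<script>window.open('/popup.html');</script>
HTML

my @keep = grep { same_site($_, 'example.com') } extract_candidates($html);
print "$_\n" for @keep;
```

In this sketch the link to other.example.org is discarded because its host differs from the starting host. A real crawler would also resolve relative links against the page URL and remember which URLs it has already visited.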

INSTALLATION & TESTING

To run author tests, set the environment variable RELEASE_TESTING.

Installation tests are only run if Test::Mock::HTTP::Tiny is installed. To run the full set of tests, make sure this module is installed before installing WWW::Crawl.
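
The two conditions above follow a common pattern in Perl test files. The sketch below shows one way such gating can be written with the core Test::More module. It is an assumption about the general technique, not a copy of this distribution's test files, and the variable names and skip messages are invented for the example.

```perl
use strict;
use warnings;
use Test::More;

# Author-only checks run when RELEASE_TESTING is set to a true value.
my $author_tests = $ENV{RELEASE_TESTING} ? 1 : 0;

# Optional dependency: try to load it and remember whether that worked.
my $have_mock = eval { require Test::Mock::HTTP::Tiny; 1 } ? 1 : 0;

SKIP: {
    # Skip one test when the mocking module is unavailable.
    skip 'Test::Mock::HTTP::Tiny is not installed', 1 unless $have_mock;
    pass('mocked HTTP tests would run here');
}

SKIP: {
    # Skip one test unless the author has opted in.
    skip 'set RELEASE_TESTING to run author tests', 1 unless $author_tests;
    pass('author tests would run here');
}

done_testing();
```

Skipped tests are reported as skipped rather than failed, so an installation without the optional module still passes "make test".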

To install this module, run the following commands:

	perl Makefile.PL
	make
	make test
	make install

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc WWW::Crawl

You can also look for information at:

    RT, CPAN's request tracker (report bugs here)
        https://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Crawl

    Search CPAN
        https://metacpan.org/release/WWW-Crawl


LICENSE AND COPYRIGHT

This software is Copyright (c) 2023 by Ian Boddison.

This program is released under the following license:

  Perl