BehatCrawler is an extension for Behat, MinkExtension and Selenium2Driver that crawls a given URL and executes user-defined functions on each crawled page.
Several crawling options are available; see the options table below.
composer require piopi/behatcrawler
Start by importing the extension into your Feature Context (or any of your contexts):
use Behat\Crawler\Crawler;
Note: the crawler is currently compatible only with Selenium2Driver.
Create your Crawler object with the default configuration:
// $crawler = new Crawler($behatSession);
$crawler = new Crawler($this->getSession());
For custom settings (passed as an array), see the following table for all the available options.
$crawler = new Crawler($this->getSession(), ['internalLinksOnly' => true, 'HTMLOnly' => true, 'MaxCrawl' => 20]);
Option | Description | Default Value |
---|---|---|
Depth | Maximum depth that can be crawled from the initial URL | 0 (unlimited) |
MaxCrawl | Maximum number of pages to crawl | 0 (unlimited) |
HTMLOnly | Crawl only HTML/XHTML pages | true |
internalLinksOnly | Crawl internal links only (links with the same domain name as the initial URL) | true |
waitForCrawl | Wait for the crawler to finish before throwing any exception raised by the user-defined functions, then report all exceptions found along with the location of each | false |
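The `waitForCrawl` behaviour described in the table can be pictured as a collect-then-report loop. The following is only a minimal sketch under stated assumptions, not the library's code: `runOnPages`, the URL list and the `$check` closure are hypothetical names invented for this illustration.

```php
<?php
// Hypothetical stand-in, NOT part of BehatCrawler: runs $callback for each URL and,
// when $waitForCrawl is true, defers failures until every URL has been visited.
function runOnPages(array $urls, callable $callback, bool $waitForCrawl): void
{
    $failures = [];
    foreach ($urls as $url) {
        try {
            $callback($url);
        } catch (Exception $e) {
            if (!$waitForCrawl) {
                throw $e; // fail fast on the first failing page
            }
            // Remember where the failure happened and keep going.
            $failures[] = $url . ': ' . $e->getMessage();
        }
    }
    if ($failures !== []) {
        // Report every failure at once, one per line.
        throw new RuntimeException(implode("\n", $failures));
    }
}

// Example check (an assumption for the demo) that rejects URLs containing "broken".
$check = function (string $url): void {
    if (strpos($url, 'broken') !== false) {
        throw new Exception('check failed');
    }
};

try {
    runOnPages(['/home', '/broken-1', '/about', '/broken-2'], $check, true);
} catch (RuntimeException $e) {
    $report = $e->getMessage(); // contains both failing URLs
}
```

With the last argument set to `false`, the same call would stop at `/broken-1` and never visit the remaining URLs.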
Options can be set either in the constructor or with the appropriate getters/setters:
$crawler = new Crawler($this->getSession(), ['MaxCrawl' => 10]);
// or
$crawler->setMaximumCrawl(10);
After creating and configuring the crawler, start crawling by passing your function as a callable argument.
Please refer to the PHP Callables documentation for more details.
Examples:
Closure::fromCallable is used to pass a private method as a parameter:
// function1 is a private method
$crawler->startCrawling(Closure::fromCallable([$this, 'function1']));
// function2 is a public method
$crawler->startCrawling([$this, 'function2']);
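The difference between the two forms comes from PHP's visibility rules rather than from the crawler itself. The sketch below shows it in isolation; `invokeFromOutside` and `ExampleContext` are hypothetical names used only for this illustration and are not part of BehatCrawler.

```php
<?php
// Hypothetical stand-in for any code outside the class that receives a callback,
// as a crawler would. It is NOT BehatCrawler's implementation.
function invokeFromOutside(callable $callback)
{
    return $callback();
}

class ExampleContext
{
    private function privateCheck(): string
    {
        return 'private ok';
    }

    public function publicCheck(): string
    {
        return 'public ok';
    }

    public function run(): array
    {
        return [
            // A private method is wrapped in a Closure while still inside the class scope,
            // so code outside the class can call it.
            invokeFromOutside(Closure::fromCallable([$this, 'privateCheck'])),
            // A public method can be passed as a plain array callable.
            invokeFromOutside([$this, 'publicCheck']),
        ];
    }
}

$results = (new ExampleContext())->run();
```

Wrapping a public method in `Closure::fromCallable` also works, so using it for every callback is a safe default.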
For functions that take one or more arguments, pass the arguments as an array in the second parameter:
$crawler->startCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);
$crawler->startCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);
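This README does not describe how the crawler forwards the argument array, so the following is a sketch under an assumption: each array element becomes one positional argument, which is how PHP's `call_user_func_array` behaves. `invokeWithArgs` and `$describe` are hypothetical names for illustration only.

```php
<?php
// Hypothetical stand-in, NOT BehatCrawler's code: forwards an argument array
// to the callback, one array element per positional parameter.
function invokeWithArgs(callable $callback, array $args = [])
{
    return call_user_func_array($callback, $args);
}

// Example callback taking two arguments (names chosen for the demo).
$describe = function (string $selector, int $minimum): string {
    return "expect at least $minimum of $selector";
};

// 'h1' is bound to $selector and 1 to $minimum, in array order.
$message = invokeWithArgs($describe, ['h1', 1]);
```

Under that assumption, the order of the array elements must match the order of the function's parameters.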
use Behat\Crawler\Crawler;
// Crawler with custom settings
$crawler = new Crawler($this->getSession(), ['internalLinksOnly' => true, 'HTMLOnly' => true, 'MaxCrawl' => 20, 'waitForCrawl' => true]);
// Function without arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function1'])); // starts crawling
// Function with one or more arguments
$crawler->startCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);
In a Behat step function:
/**
 * @Given /^I crawl the website with a maximum of (\d+) level$/
 */
public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)
{
    $crawler = new Crawler($this->getSession(), ['Depth' => $arg1]);
    $crawler->startCrawling([$this, 'test']);
}
Copyright (c) 2020 Mostapha El Sabah elsabah.mostapha@gmail.com
Mostapha El Sabah Piopi