Simple and extendable web scraper using CSS selectors

Diglett Web Scraper

Diglett is an extended web crawler based on the Symfony DomCrawler component. It lets you use extended and custom CSS selectors to easily extract data from a web page.

Requirements

  • PHP 7.1.18 or higher

How to use

Diglett ships with a web client that returns a Diglett instance, but you can also inject your own Symfony Crawler object into the Diglett class. On the Diglett object, you can then call the methods that support the specialized CSS selector functions.

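// Fetch the page with the built-in web client; this returns a Diglett instance.
$diglett = \Jerodev\Diglett\WebClient::get('https://www.tabletopfinder.eu/');
// Read the text of the first paragraph using the :first() selector function.
$firstParagraph = $diglett->getText("p:first()");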

Built-in selector functions

| Function | Description | Example |
| --- | --- | --- |
| :containsregex(str) | Get the elements whose text content matches a regular expression | div p:containsregex([Hh]el+o) |
| :containstext(str) | Get the elements whose text content contains this substring | div p:containstext(Hello World) |
| :first() | Get the first element in a collection | ul li:first() |
| :last() | Get the last element in a collection | ul li:last() |
| :next() | Get the next sibling of the current element, if available | ul.test:next() li |
| :nth(x) | Get the nth element in a collection (starting at 1) | ul li:nth(3) |
| :prev() | Get the previous sibling of the current element, if available | ul li:last():prev() |
| :text(str) | Get the elements whose innerText is exactly this string | ul li:text(Hello World) |
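
To make the difference between the collection and text-matching functions concrete, here is a minimal sketch in plain PHP. It does not use Diglett and is not how the library is implemented: the helper names (selectNth, selectFirst, selectLast, filterContainsText, filterContainsRegex, filterText) are hypothetical, and a plain array of strings stands in for the text content of the matched elements. Case-sensitive matching is also an assumption.

```php
<?php
// Illustration only: hypothetical helpers that mimic the documented behaviour
// of the selector functions on an array of element text contents.

// :nth(x): 1-based position in the collection; null when out of range.
function selectNth(array $items, int $x): ?string
{
    $items = array_values($items);
    return $items[$x - 1] ?? null;
}

// :first(): same as :nth(1).
function selectFirst(array $items): ?string
{
    return selectNth($items, 1);
}

// :last(): same as :nth(count).
function selectLast(array $items): ?string
{
    return selectNth($items, count($items));
}

// :containstext(str): keep items that contain the substring.
function filterContainsText(array $items, string $needle): array
{
    return array_values(array_filter($items, function (string $text) use ($needle): bool {
        return strpos($text, $needle) !== false;
    }));
}

// :containsregex(str): keep items that match the pattern anywhere in the text.
function filterContainsRegex(array $items, string $pattern): array
{
    // Wrap the bare pattern in "~" delimiters, escaping any "~" it contains.
    $regex = '~' . str_replace('~', '\~', $pattern) . '~';
    return array_values(array_filter($items, function (string $text) use ($regex): bool {
        return preg_match($regex, $text) === 1;
    }));
}

// :text(str): keep items whose text is exactly the given string.
function filterText(array $items, string $expected): array
{
    return array_values(array_filter($items, function (string $text) use ($expected): bool {
        return $text === $expected;
    }));
}

$items = ['Hello World', 'Hellllo', 'Goodbye'];

echo selectNth($items, 3), "\n";                                    // Goodbye
echo selectFirst($items), "\n";                                     // Hello World
echo implode(', ', filterContainsRegex($items, '[Hh]el+o')), "\n";  // Hello World, Hellllo
echo implode(', ', filterContainsText($items, 'World')), "\n";      // Hello World
echo count(filterText($items, 'Hello')), "\n";                      // 0
```

Note that :nth(3) picks the third item rather than the fourth, and that :text(Hello) matches nothing here because it requires the whole text to be equal, whereas :containstext(Hello) would match the first two items.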