Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements

Goutte works with PHP 5.3.

Installation

Installing Goutte is as easy as it can get: download the goutte.phar file and you're done!

Usage

Require the Goutte phar file to use Goutte in a script:

require_once '/path/to/goutte.phar';

Create a Goutte Client instance (which extends Symfony\Components\BrowserKit\Client):

use Goutte\Client;

$client = new Client();

Make requests with the request() method:

$crawler = $client->request('GET', 'http://www.symfony-project.org/');

The method returns a Crawler object (Symfony\Components\DomCrawler\Crawler).

Click on links:

$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
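For illustration only, here is a rough plain-PHP approximation of the lookup selectLink() performs. It uses the built-in DOM extension rather than DomCrawler and returns the href of the first <a> element whose whitespace-normalized text contains the given label. The findLinkHref() helper and the sample HTML are hypothetical. Unlike selectLink() and link(), it ignores image alt text and does not resolve relative URLs against the current page.

```php
<?php
// Hypothetical helper, not part of Goutte: a plain-DOM approximation of
// $crawler->selectLink($label) followed by reading the link's href.
function findLinkHref($html, $label)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Collapse runs of whitespace so "Browse\n  Plugins" still matches "Plugins".
        $text = trim(preg_replace('/\s+/', ' ', $anchor->textContent));
        if (strpos($text, $label) !== false) {
            // Returned exactly as written in the page; relative URLs are NOT resolved.
            return $anchor->getAttribute('href');
        }
    }

    return null; // no link text contains the label
}

$html = '<html><body><a href="/about">About</a> <a href="/plugins/">Browse Plugins</a></body></html>';
echo findLinkHref($html, 'Plugins'), "\n"; // /plugins/
```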

Submit forms:

$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
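Submitting a form combines the field values already present in the page with the values passed to submit(). The hypothetical collectFormValues() helper below sketches that merge with PHP's DOM extension. It is a simplification (no <select> elements, no file uploads, and the clicked button's own name/value is left out), not a description of how Goutte actually builds its request.

```php
<?php
// Hypothetical helper, not Goutte's API. It approximates the parameters a form
// submission sends: the first form's current field values, with the caller's
// values layered on top. Simplified: <input> and <textarea> only.
function collectFormValues($html, array $overrides)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = array();
    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form !== null) {
        foreach ($form->getElementsByTagName('input') as $input) {
            $name = $input->getAttribute('name');
            $type = strtolower($input->getAttribute('type'));
            if ($name === '' || in_array($type, array('submit', 'button', 'image', 'reset', 'file'))) {
                continue; // unnamed fields, buttons and uploads are skipped
            }
            $isToggle = ($type === 'checkbox' || $type === 'radio');
            if ($isToggle && !$input->hasAttribute('checked')) {
                continue; // unchecked boxes are not submitted
            }
            // A checked box without a value attribute is sent as "on".
            $values[$name] = ($isToggle && !$input->hasAttribute('value'))
                ? 'on'
                : $input->getAttribute('value');
        }
        foreach ($form->getElementsByTagName('textarea') as $textarea) {
            if ($textarea->getAttribute('name') !== '') {
                $values[$textarea->getAttribute('name')] = $textarea->textContent;
            }
        }
    }

    // Values passed by the caller win over the defaults found in the page.
    foreach ($overrides as $name => $value) {
        $values[$name] = $value;
    }

    return $values;
}

$html = '<form action="/login" method="post">'
    . '<input type="hidden" name="signin[_csrf_token]" value="abc123">'
    . '<input type="text" name="signin[username]">'
    . '<input type="password" name="signin[password]">'
    . '<input type="checkbox" name="signin[remember]">'
    . '<input type="submit" value="sign in">'
    . '</form>';

print_r(collectFormValues($html, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx')));
```

Note how the hidden token field is carried along even though the caller never mentions it, while the unchecked checkbox and the submit button are left out.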

Extract data:

$nodes = $crawler->filter('.error_list');
if ($nodes->count())
{
  die(sprintf("Authentication error: %s\n", $nodes->text()));
}

printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
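filter() accepts a CSS selector, which the CssSelector component translates into XPath before it is evaluated against the parsed document. As a rough sketch of that idea, the code below runs hand-written XPath approximations of .error_list and #nb_tasks through PHP's built-in DOMXPath. The firstText() helper and the sample HTML are hypothetical; like text(), the helper reads only the first match, and it returns null when nothing matches.

```php
<?php
// Hypothetical helper: trimmed text of the first node matching an XPath expression, or null.
function firstText(DOMXPath $xpath, $expression)
{
    $nodes = $xpath->query($expression);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$html = '<html><body>'
    . '<ul class="error_list"><li>Invalid password.</li></ul>'
    . '<p>Tasks: <span id="nb_tasks">12</span></p>'
    . '</body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($doc);

// Roughly what ".error_list" becomes: "error_list" as a whole word in the class attribute.
$error = firstText($xpath, "//*[contains(concat(' ', normalize-space(@class), ' '), ' error_list ')]");
// Roughly what "#nb_tasks" becomes: an exact match on the id attribute.
$count = firstText($xpath, "//*[@id = 'nb_tasks']");

if ($error !== null) {
    printf("Authentication error: %s\n", $error);
}
printf("Nb tasks: %d\n", $count);
```

The concat()/normalize-space() wrapper is what makes the class match word-based, so class="error_list_item" would not match while class="form error_list" would.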

More Information

Read the documentation of the BrowserKit and DomCrawler Symfony Components for more information about what you can do with Goutte.

Technical Information

Goutte is a thin wrapper around the following fine PHP libraries:

  • Symfony Components: BrowserKit, DomCrawler, CssSelector, and Process

  • Zend libraries: Date, Uri, Http, and Validate

License

Goutte is licensed under the MIT license.

Part of this package (Diggin_Http_Response_Charset_Detector_Html and Diggin_Scraper_Adapter_Htmlscraping) is borrowed from HTMLScraping (LGPL): http://www.rcdtokyo.com/etc/htmlscraping/
