
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP


iserter/php-goose

scotteh/php-goose is no longer maintained, so I created this alternative, which supports recent PHP versions.

There may still be some issues, but so far it is working OK. Feel free to contribute.

  • Extracts title, description, canonical URL, main image, and cleaned article text
  • Minimal dependencies; works in any PHP app (framework-agnostic)
  • DOMDocument + XPath heuristics similar to Goose/Readability techniques
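
To make the "DOMDocument + XPath" bullet concrete, here is a minimal sketch of that style of lookup using only PHP's bundled DOM extension. It is not php-goose's actual code: `extractBasicMetadata()` and the sample HTML are hypothetical, and the XPath expressions are just common conventions (`<title>`, `meta[name=description]`, `link[rel=canonical]`, `og:image`).

```php
<?php
// Illustrative only: NOT php-goose's implementation. extractBasicMetadata()
// is a hypothetical helper built on PHP's bundled DOM extension.

function extractBasicMetadata(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parse warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Evaluate an XPath expression as a string; '' means nothing matched.
    $first = static function (string $query) use ($xpath): string {
        return trim((string) $xpath->evaluate("string($query)"));
    };

    return [
        'title'       => $first('//head/title'),
        'description' => $first('//meta[@name="description"]/@content'),
        'canonical'   => $first('//link[@rel="canonical"]/@href'),
        'image'       => $first('//meta[@property="og:image"]/@content'),
    ];
}

// Hypothetical sample document.
$html = <<<'HTML'
<html><head>
  <title>Example article</title>
  <meta name="description" content="A short summary.">
  <link rel="canonical" href="https://example.com/some-article">
  <meta property="og:image" content="https://example.com/cover.jpg">
</head><body><p>Body text.</p></body></html>
HTML;

print_r(extractBasicMetadata($html));
```

This only shows the lookup pattern for metadata. Picking the main image and cleaning the article text presumably involve further heuristics in the library itself.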

Quick start

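require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Iserter\Goose\Goose;

$goose = new Goose();

// Fetch the page and extract its content
$article = $goose->extract('https://example.com/some-article');

echo $article->getTitle();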

You can also pass raw HTML:

$article = $goose->extract($html, 'https://iserter.com');

Installation

While developing locally, add a path repository pointing at your checkout of this package to your root composer.json, then require dev-main:

composer require iserter/php-goose dev-main
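
For reference, a path repository entry in the root composer.json might look like the fragment below. The `../php-goose` location is an assumption; point `url` at wherever your local checkout lives. `dev-main` assumes that checkout is on the `main` branch.

```json
{
    "repositories": [
        { "type": "path", "url": "../php-goose" }
    ],
    "require": {
        "iserter/php-goose": "dev-main"
    }
}
```

With a path repository, Composer symlinks the package into vendor/ where it can, so local changes to the checkout take effect without reinstalling.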

License

MIT
