Fork of sleeping-owl/apist with updated dependencies.
Glook Apist is a small library that lets you access any website in an API-like style by parsing its HTML.
Install from Packagist:
composer require glook/apist
Create a class that extends Apist and define your API methods using blueprints — arrays that map keys to CSS selectors
with extraction chains:
use glook\apist\Apist;

class WikiApi extends Apist
{
    public function getBaseUrl(): ?string
    {
        return 'https://en.wikipedia.org';
    }

    public function index()
    {
        return $this->get('/wiki/Main_Page', [
            'welcome_message' => Apist::filter('#mp-topbanner div:first')->text()->mb_substr(0, -1),
            'portals' => Apist::filter('a[title^="Portal:"]')->each([
                'link' => Apist::current()->attr('href')->call(function ($href)
                {
                    // Prefix relative links with the base URL defined above
                    return $this->getBaseUrl() . $href;
                }),
                'label' => Apist::current()->text()
            ]),
            'languages' => Apist::filter('#p-lang li a[title]')->each([
                'label' => Apist::current()->text(),
                'lang' => Apist::current()->attr('title'),
                'link' => Apist::current()->attr('href')->call(function ($href)
                {
                    // Language links are protocol-relative, so add the scheme
                    return 'https:' . $href;
                })
            ]),
            'sister_projects' => Apist::filter('#mp-sister b a')->each()->text(),
            'featured_article' => Apist::filter('#mp-tfa')->html()
        ]);
    }
}

Result:
{
    "welcome_message": "Welcome to Wikipedia",
    "portals": [
        {
            "link": "https:\/\/en.wikipedia.org\/wiki\/Portal:Arts",
            "label": "Arts"
        },
        {
            "link": "https:\/\/en.wikipedia.org\/wiki\/Portal:Biography",
            "label": "Biography"
        }
    ],
    "languages": [
        {
            "label": "Simple English",
            "lang": "Simple English",
            "link": "https:\/\/simple.wikipedia.org\/wiki\/"
        }
    ],
    "sister_projects": [
        "Commons",
        "MediaWiki"
    ],
    "featured_article": "<div style=\"float: left;\">...</div>"
}

A blueprint is an array (or a single selector) that describes how to extract structured data from HTML. Each value
in the blueprint is an ApistSelector — a chain of CSS selector + extraction methods.
// Array blueprint: returns an associative array
$this->get('/page', [
    'title' => Apist::filter('h1')->text(),
    'content' => Apist::filter('.body')->html(),
]);
// Single selector blueprint: returns a single value
$this->get('/page', Apist::filter('h1')->text());
// No blueprint: returns raw HTML content
$this->get('/page');

- Apist::filter('.selector') creates an ApistSelector bound to a CSS selector
- Apist::current() references the current element inside each() iterations
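
To make the idea of a blueprint more concrete, the following sketch reproduces roughly what an array blueprint with a nested each() yields, using only PHP's built-in DOM extension. It is not how the library is implemented: XPath expressions stand in for CSS selectors, and the sample markup and key names are made up for illustration.

```php
<?php
// Sample markup standing in for a fetched page (hypothetical).
$html = '<html><body><h1> Hello </h1>'
      . '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Comparable to: 'title' => Apist::filter('h1')->text()->trim()
$title = trim($xpath->query('//h1')->item(0)->textContent);

// Comparable to: 'links' => Apist::filter('li a')->each([...])
// Each matched element becomes one associative array in the list.
$links = [];
foreach ($xpath->query('//li/a') as $a) {
    $links[] = [
        'label' => $a->textContent,          // like Apist::current()->text()
        'href'  => $a->getAttribute('href'), // like Apist::current()->attr('href')
    ];
}

// The blueprint keys become the keys of the returned array.
$result = ['title' => $title, 'links' => $links];
```

The point of the comparison is the shape of the output: every blueprint key maps to one extracted value, and each() turns a set of matched elements into a list.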
All standard HTTP methods are supported:
$this->get($url, $blueprint, $options);
$this->post($url, $blueprint, $options);
$this->put($url, $blueprint, $options);
$this->patch($url, $blueprint, $options);
$this->delete($url, $blueprint, $options);
$this->head($url, $blueprint, $options);

The $options array is passed directly to Guzzle's request options.
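
As a rough illustration of the options argument, the sketch below builds an array from three standard Guzzle request options (query, headers, timeout). The concrete values, the path, and the $blueprint variable mentioned in the trailing comment are hypothetical; only the shape of the array matters.

```php
<?php
// Guzzle request options; the concrete values are illustrative assumptions.
$options = [
    // Sent as the URL query string: ?search=PHP&limit=10
    'query'   => ['search' => 'PHP', 'limit' => 10],
    // Extra request headers
    'headers' => ['User-Agent' => 'my-scraper/1.0'],
    // Total request timeout in seconds
    'timeout' => 5,
];

// Inside an Apist subclass this array would be the third argument, e.g.:
// return $this->get('/w/index.php', $blueprint, $options);
```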
Extraction methods available in selector chains:
| Method | Description |
|---|---|
| text() | Get text content |
| html() | Get inner HTML |
| attr('name') | Get attribute value |
| hasAttr('name') | Check if attribute exists |
| exists() | Check if element exists |

Navigation:
| Method | Description |
|---|---|
| first() | First matched element |
| last() | Last matched element |
| eq($index) | Element at index |
| next() | Next sibling |
| prev() | Previous sibling |
| children() | Child elements |
| closest($selector) | Closest ancestor matching selector |

Iteration:
| Method | Description |
|---|---|
| each($blueprint) | Iterate over matched elements |
| each() | Returns collection for chaining |

Transformation:
| Method | Description |
|---|---|
| call($callback) | Apply custom callback |
| check($condition, $then) | Conditional logic |
| then($blueprint) | Apply blueprint if truthy |
| trim() | Trim whitespace |
| intval() | Cast to integer |
| floatval() | Cast to float |
| str_replace($search, $replace) | String replacement |
| mb_substr($start, $length) | Multibyte substring |

Any global PHP function can also be used in the chain (e.g., strtoupper, strip_tags).
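
To show what such a chain computes, here is a plain-PHP sketch of the steps behind a chain like ->text()->trim()->str_replace(',', '')->intval(). It does not use the library. The sample string is made up, and the assumption that the extracted value is passed as the subject of str_replace is not something stated above.

```php
<?php
// Raw text as text() might return it for a counter element (hypothetical sample).
$raw = "  1,299 items \n";

// ->trim(): strip surrounding whitespace
$trimmed = trim($raw);

// ->str_replace(',', ''): drop the thousands separator
$noCommas = str_replace(',', '', $trimmed);

// ->intval(): keep the leading integer part and discard the rest
$count = intval($noCommas);
```

Each step receives the output of the previous one, so the order of the methods in the chain matters: calling intval() before removing the comma would stop parsing at the comma and yield 1.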
By default, HTTP errors are suppressed and returned as structured error responses. To have exceptions thrown instead:
$api->setSuppressExceptions(false); // throw exceptions on HTTP errors

After a request, you can access the underlying Guzzle response:
$result = $api->index();
$response = $api->getLastMethod()->getResponse();
$statusCode = $response->getStatusCode();

Run the test suite with:
composer test

This package has been tested on the following PHP versions:
- PHP 7.4
- PHP 8.0
- PHP 8.1
- PHP 8.2
- PHP 8.3
- PHP 8.4
- PHP 8.5
Originally written by Sleeping Owl and released under the MIT License.
Fork maintained by Andrey Polyakov.
See the LICENSE file for details.