Readme.md: 39 additions & 17 deletions
@@ -23,23 +23,25 @@ npm install x-ray
 
 - **Flexible schema:** Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.
 
+- **Composable:** The API is entirely composable, giving you great flexibility in how you scrape each page.
+
 - **Pagination support:** Paginate through websites, scraping each page. X-ray also supports a request `delay` and a pagination `limit`. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped.
 
 - **Crawler support:** Start on one page and move to the next easily. The flow is predictable, following
 a breadth-first crawl through each of the pages.
 
 - **Responsible:** X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly.
 
-- **Composable:** The API is entirely composable, giving you great flexibility in how you scrape each page.
-
 - **Pluggable drivers:** Swap in different scrapers depending on your needs. Currently supports HTTP and [PhantomJS driver](http://github.com/lapwinglabs/x-ray-phantom) drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.
 
 ## Selector API
 
 ### xray(url, selector)(fn)
 
 Scrape the `url` for the following `selector`, returning an object in the callback `fn`.
 
-The `selector` takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is `selector@attribute`. If you do not supply an attribute, the default is selecting the `innerText`. Here are a few examples:
+The `selector` takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is `selector@attribute`. If you do not supply an attribute, the default is selecting the `innerText`.
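
To make the `selector@attribute` syntax described in the Selector API text concrete, here is a minimal sketch of how such a string could be split into its two parts, falling back to `innerText` when no attribute is given. The helper name `parseSelector` and the shape of its return value are assumptions made for illustration only; this is not x-ray's actual implementation, and it makes no network requests.

```javascript
// Hypothetical helper (not part of x-ray): splits a "selector@attribute"
// string into its parts, defaulting the attribute to "innerText".
function parseSelector(input) {
  // Split on the last "@" so the selector part is everything before it.
  const at = input.lastIndexOf('@');
  if (at === -1) {
    // No attribute supplied: fall back to the documented default.
    return { selector: input.trim(), attribute: 'innerText' };
  }
  return {
    selector: input.slice(0, at).trim(),
    // An empty attribute (e.g. "a@") also falls back to the default.
    attribute: input.slice(at + 1).trim() || 'innerText'
  };
}

console.log(parseSelector('a.title@href')); // { selector: 'a.title', attribute: 'href' }
console.log(parseSelector('title'));        // { selector: 'title', attribute: 'innerText' }
```

Splitting on the last `@` keeps the sketch short, but it would mis-parse a selector whose attribute value itself contains `@`, such as `a[href="mailto:me@example.com"]`. A real parser would need to account for quoted values.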