Commit 8ce66a6

small cleanup
1 parent 2603c0b commit 8ce66a6

1 file changed: 39 additions, 17 deletions

Readme.md

Lines changed: 39 additions & 17 deletions
@@ -23,23 +23,25 @@ npm install x-ray

 - **Flexible schema:** Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.

+- **Composable:** The API is entirely composable, giving you great flexibility in how you scrape each page.
+
 - **Pagination support:** Paginate through websites, scraping each page. X-ray also supports a request `delay` and a pagination `limit`. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped.

 - **Crawler support:** Start on one page and move to the next easily. The flow is predictable, following
 a breadth-first crawl through each of the pages.

 - **Responsible:** X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly.

-- **Composable:** The API is entirely composable, giving you great flexibility in how you scrape each page.
-
 - **Pluggable drivers:** Swap in different scrapers depending on your needs. Currently supports HTTP and [PhantomJS driver](http://github.com/lapwinglabs/x-ray-phantom) drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.

 ## Selector API

 ### xray(url, selector)(fn)

 Scrape the `url` for the following `selector`, returning an object in the callback `fn`.
-The `selector` takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is `selector@attribute`. If you do not supply an attribute, the default is selecting the `innerText`. Here are a few examples:
+The `selector` takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is `selector@attribute`. If you do not supply an attribute, the default is selecting the `innerText`.
+
+Here are a few examples:

 - Scrape a single tag
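The `selector@attribute` syntax described in the hunk above can be illustrated with a small sketch. This is not x-ray's actual parser; `parseSelector` is a hypothetical helper, and it assumes the last `@` in the string separates the selector from the attribute, falling back to `innerText` when no attribute is given.

```javascript
// Hypothetical helper, not part of x-ray: split a "selector@attribute"
// string into its two parts, defaulting to the element's inner text.
// Assumption: the last "@" is the separator, so a selector that itself
// contains "@" inside an attribute value would need real parsing.
function parseSelector(str) {
  var at = str.lastIndexOf('@');
  if (at === -1) {
    // No attribute supplied: fall back to the documented default.
    return { selector: str.trim(), attribute: 'innerText' };
  }
  return {
    selector: str.slice(0, at).trim(),
    attribute: str.slice(at + 1).trim()
  };
}

console.log(parseSelector('title'));
console.log(parseSelector('#gbar a@href'));
```

With these inputs, `'title'` maps to the `innerText` of `title`, and `'#gbar a@href'` maps to the `href` attribute of `#gbar a`.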

@@ -84,21 +86,33 @@ x(html, 'body', 'h2', function(err, header) {

 ## API

-### xray.driver(fn)
-
+### xray.driver(driver)

+Specify a `driver` to make requests through.

 ### xray.paginate(selector)

 Select a `url` from an `selector` and visit that page.

 ### xray.limit(n)

-Limits the amount of pagination to `n`
+Limit the amount of pagination to `n` requests.
+
+### xray.delay(ms)
+
+Delay the next crawl to `ms` milliseconds
+
+### xray.concurrency(n)
+
+Set a concurrency to `n`. Defaults to `Infinity`.
+
+### xray.throttle(n, ms)
+
+Throttle the requests to `n` requests per `ms` milliseconds.

-### xray.delay(n)
+### xray.timeout (ms)

-Delay the next crawl to `n` milliseconds
+Specify a timeout of `ms` milliseconds for each request.

 ## Collections
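The `xray.throttle(n, ms)` wording added in the hunk above ("`n` requests per `ms` milliseconds") can be read in more than one way. The sketch below shows one possible reading, a fixed-window schedule, and is an assumption rather than x-ray's implementation; `throttleSchedule` is a hypothetical name.

```javascript
// Hypothetical helper, not x-ray's implementation: compute the earliest
// start offset (in milliseconds) for each of `count` requests when at most
// `n` requests are released per window of `ms` milliseconds.
function throttleSchedule(count, n, ms) {
  var offsets = [];
  for (var i = 0; i < count; i++) {
    // Requests 0..n-1 start in the first window, n..2n-1 in the second, etc.
    offsets.push(Math.floor(i / n) * ms);
  }
  return offsets;
}

console.log(throttleSchedule(5, 2, 1000)); // [ 0, 0, 1000, 1000, 2000 ]
```

Under this reading, `throttle(2, 1000)` lets two requests start immediately and holds each following pair back by another second.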

@@ -120,7 +134,12 @@ x('http://google.com', {
   main: 'title',
   image: x('#gbar a@href', 'title'),
 })(function(err, obj) {
-  obj // => { main: 'Google', image: 'Google Images' }
+/*
+  {
+    main: 'Google',
+    image: 'Google Images'
+  }
+*/
 })
 ```

@@ -137,14 +156,17 @@ x('http://mat.io', {
     description: '.item-content section'
   }])
 })(function(err, obj) {
-  obj // => { title: 'mat.io',
-  // => items: [
-  // => {
-  // => title: 'The 100 Best Children\'s Books of All Time',
-  // => description: 'Relive your childhood with TIME\'s list...'
-  // => }
-  // => ]
-  // => }
+/*
+  {
+    title: 'mat.io',
+    items: [
+      {
+        title: 'The 100 Best Children\'s Books of All Time',
+        description: 'Relive your childhood with TIME\'s list...'
+      }
+    ]
+  }
+*/
 })
 ```
