Closed as not planned
Description
Goals
This module should make handling pagination simpler across all engines.
Context
Currently, pagination needs to be handled manually. One approach is the following:
import { config, getJson } from "serpapi";

config.api_key = process.env.API_KEY;

const num = 10; // Number of results per page
let start = 0; // Results offset
const links = [];
while (start < 50) { // Get up to 50 results
  const json = await getJson("google", {
    q: "coffee",
    location: "Austin, Texas",
    start,
    num,
  });
  // organic_results can be missing when a page has no results
  const pageLinks = (json["organic_results"] ?? []).map((r) => r.link);
  if (pageLinks.length === 0) break; // No more results to fetch
  links.push(...pageLinks);
  start += num;
}
console.log(links);
This works for engines that fetch results by an offset + size, such as google in the example above.
However, not all engines rely on this offset + size concept. For example,
- Ebay - _pgn refers to the page number and not the results offset.
- Google Play Store - next_page_token is used as a token for the next page.
- Some others have been mentioned here: Pagination iterator doesn't work for APIs with token-based pagination google-search-results-python#22
For these less common approaches, users need to be aware of the difference and update their code accordingly.
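The two less common styles can be sketched side by side. This is only an illustration under stated assumptions: fetchPage is a hypothetical stand-in for a call such as getJson, and the response fields results and next_page_token are made up for the example rather than taken from any engine's real response.

```javascript
// fetchPage(params) is a hypothetical stand-in for getJson(engine, params).
// The response fields `results` and `next_page_token` are assumed for illustration.

// Page-number style (e.g. ebay's _pgn): increment the page, not an offset.
async function collectByPageNumber(fetchPage, params, maxPages) {
  const results = [];
  for (let page = 1; page <= maxPages; page++) {
    const json = await fetchPage({ ...params, _pgn: page });
    const items = json.results ?? [];
    if (items.length === 0) break; // Ran out of pages
    results.push(...items);
  }
  return results;
}

// Token style (e.g. google_play's next_page_token): the response says what
// to send next, so there is nothing to compute on the client side.
async function collectByToken(fetchPage, params, maxPages) {
  const results = [];
  let token; // undefined on the first request
  for (let i = 0; i < maxPages; i++) {
    const request = token ? { ...params, next_page_token: token } : params;
    const json = await fetchPage(request);
    results.push(...(json.results ?? []));
    token = json.next_page_token;
    if (!token) break; // No further pages
  }
  return results;
}
```

Neither loop can reuse the `start += num` arithmetic from the offset example, which is the burden the proposed module would take off users.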
Full list of engines that support pagination
There are 7 types:
Offset only, e.g. google_jobs, yahoo
Engine | Param | Type | Description |
---|---|---|---|
google_jobs | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
google_reverse_image | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
google_maps | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 20 is the 2nd page of results, 40 is the 3rd page of results, etc.). |
google_events | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
yahoo | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) is the first page of results, 11 is the 2nd page of results, 21 is the 3rd page of results, etc.). |
yahoo_images | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) starts from the first result, 61 starts from the 61st result, 121 starts from the 121st result, etc.). |
yahoo_videos | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) starts from the first result, 61 starts from the 61st result, 121 starts from the 121st result, etc.). |
duckduckgo | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. When pagination is not being used (initial search request), the number of organic_results can vary between 26 and 30. When pagination is being used (value of the start parameter is greater than 0), organic_results returns 50 results. |
yelp | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
yelp_reviews | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
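The offsets in this table differ in both their starting value (0 or 1) and their step, so any helper has to take both into account. A small sketch of the arithmetic follows; the function name offsetForPage is illustrative and not part of serpapi.

```javascript
// Compute the offset value for a 1-based page number.
// firstOffset: offset of the very first result (0 for google_jobs, 1 for yahoo).
// pageSize: how many results each page holds.
function offsetForPage(page, pageSize, firstOffset = 0) {
  return firstOffset + (page - 1) * pageSize;
}

console.log(offsetForPage(3, 10)); // google_jobs, 3rd page: 20
console.log(offsetForPage(3, 10, 1)); // yahoo, 3rd page: 21
console.log(offsetForPage(2, 60, 1)); // yahoo_images, 2nd page: 61
console.log(offsetForPage(3, 20)); // google_maps, 3rd page: 40
```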
Page only, e.g. yandex, apple_reviews
Engine | Param | Type | Description |
---|---|---|---|
google (google images only) | ijn | string | Parameter defines the page number for Google Images. There are 100 images per page. This parameter is equivalent to start (offset) = ijn * 100. This parameter works only for Google Images (set tbm to isch). |
yandex | p | string | Parameter defines page number. Pagination starts from 0. |
yandex_images | p | string | Parameter defines the page number. Pagination starts from 0, and it can return up to 30 results. |
yandex_videos | p | string | Parameter defines the page number. Pagination starts from 0, and it can return up to 30 results. |
walmart_product_reviews | page | string | Value is used to get the reviews on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
apple_reviews | page | string | Parameter is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
Offset + size, e.g. google, bing, baidu
Engine | Param | Type | Description |
---|---|---|---|
google | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
google | num | string | Parameter defines the maximum number of results to return. (e.g., 10 (default) returns 10 results, 40 returns 40 results, and 100 returns 100 results). |
google_scholar | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
google_scholar | num | string | Parameter defines the maximum number of results to return, limited to 20. (e.g., 10 (default) returns 10 results, 20 returns 20 results). |
google_scholar_author | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 20 is the 2nd page of results, 40 is the 3rd page of results, etc.). |
google_scholar_author | num | string | Parameter defines the number of results to return. (e.g., 20 (default) returns 20 results, 40 returns 40 results, etc.). Maximum number of results to return is 100. |
bing | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
bing | count | string | Parameter controls the number of results per page. Minimum: 1, Maximum: 50. This parameter is only a suggestion and might not reflect actual results returned. |
bing_news | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
bing_news | count | string | Parameter controls the number of results per page. This parameter is only a suggestion and might not reflect actual results returned. |
bing_images | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
bing_images | count | string | Parameter controls the number of results per page. This parameter is only a suggestion and might not reflect the returned results. |
baidu | pn | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
baidu | rn | string | Parameter defines the maximum number of results to return, limited to 50. (e.g., 10 (default) returns 10 results, 30 returns 30 results, and 50 returns 50 results). This parameter is only available for desktop and tablet searches. |
baidu_news | pn | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
baidu_news | rn | string | Parameter defines the maximum number of results to return, limited to 50. (e.g., 10 (default) returns 10 results, 30 returns 30 results, and 50 returns 50 results). |
Offset + page, e.g. google_product
Engine | Param | Type | Description |
---|---|---|---|
google_product | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.) This parameter works only for Google Online Sellers and Reviews. |
google_product | page | string | Parameter defines the page number for Google Online Sellers and Reviews. There are 10 results per page. This parameter is equivalent to start (offset) = page * 10. This parameter works only for Google Online Sellers and Reviews. |
Page + size, e.g. ebay, walmart
Engine | Param | Type | Description |
---|---|---|---|
ebay | _pgn | string | Parameter defines the page number. It’s used for pagination. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
ebay | _ipg | string | Parameter defines the maximum number of results to return. There are a total of four options: 25, 50 (default), 100 and 200 results. |
walmart | page | string | Value is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). Maximum page value is 100. |
walmart | ps | number | Determines the number of items per page. There are scenarios where Walmart overrides the ps value. By default Walmart returns 40 results. |
apple_app_store | num | string | Parameter defines the number of results you want to get per page. It defaults to 10. Maximum number of results you can get per page is 200. Any number greater than the maximum defaults to 200. |
apple_app_store | page | string | Parameter is used to get the items on a specific page. (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.). |
Offset + page + size, e.g. yahoo_shopping, home_depot
Engine | Param | Type | Description |
---|---|---|---|
yahoo_shopping | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) is the first page of results, 60 is the 2nd page of results, 120 is the 3rd page of results, etc.). |
yahoo_shopping | limit | number | Parameter defines the maximum number of results to return. (e.g., 10 (default) returns 10 results, 40 returns 40 results, and 100 returns 100 results). |
yahoo_shopping | page | string | The page parameter does the start parameter math for you! Just define the page number you want. Pagination starts from 1. |
home_depot | nao | string | Defines offset for products result. A single page contains 24 products. First page offset is 0, second -> 24, third -> 48 and so on. |
home_depot | page | string | Value is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
home_depot | ps | number | Determines the number of items per page. There are scenarios where Home Depot overrides the ps value. By default Home Depot returns 24 results. |
naver | start | number | Parameter controls the offset of the organic results. This parameter defaults to 1 (except for the web). The formula for all searches except the web is start = (page number * 10) - 9, e.g. page number 3 gives (3 * 10) - 9 = 21. The formula for the web is start = (page number * 15) - 29, e.g. page number 3 gives (3 * 15) - 29 = 16. |
naver | num | string | Parameter defines the maximum number of results to return. 50 (default) returns 50 results. Maximum number of results to return is 100. Parameter can only be used with Naver Images API. |
naver | page | string | The page parameter does the start parameter math for you! Just define the page number you want. Pagination starts from 1. |
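Where an engine accepts both an offset and a page, the two are related by simple arithmetic. The sketch below transcribes the formulas from the naver and home_depot rows; the function names are illustrative only, and the web formula is implemented exactly as stated in the table.

```javascript
// naver: start = (page * 10) - 9 for all searches except the web,
// and start = (page * 15) - 29 for the web, as stated in the table.
function naverStart(page, isWeb = false) {
  return isWeb ? page * 15 - 29 : page * 10 - 9;
}

// home_depot: 24 products per page, first page offset is 0.
function homeDepotNao(page) {
  return (page - 1) * 24;
}

console.log(naverStart(3)); // 21
console.log(naverStart(3, true)); // 16
console.log(homeDepotNao(3)); // 48
```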
Token only, e.g. google_scholar_profiles, google_play
Engine | Param | Type | Description |
---|---|---|---|
google_scholar_profiles | after_author | string | Parameter defines the next page token. It is used for retrieving the next page results. The parameter has the precedence over before_author parameter. |
google_scholar_profiles | before_author | string | Parameter defines the previous page token. It is used for retrieving the previous page results. |
google_maps_photos | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. 20 results are returned per page. |
google_maps_reviews | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. Usage of start parameter (results offset) has been deprecated by Google. |
google_play | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. |
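One way to picture the tables above is as a small set of strategies keyed by engine. The sketch below covers a handful of engines only. STRATEGIES and nextParams are hypothetical names, the default sizes are read off the tables, and the next_page_token response field is an assumption made for the example.

```javascript
// Hypothetical per-engine pagination descriptors, based on the tables above.
const STRATEGIES = {
  google: { type: "offset", param: "start", first: 0, sizeParam: "num", defaultSize: 10 },
  yahoo: { type: "offset", param: "b", first: 1, defaultSize: 10 },
  ebay: { type: "page", param: "_pgn", first: 1 },
  google_play: { type: "token", param: "next_page_token" },
};

// Return the params for the next request, or null when there is no next page.
// `response.next_page_token` is an assumed field name for token engines.
function nextParams(engine, params, response = {}) {
  const s = STRATEGIES[engine];
  if (!s) throw new Error(`No pagination strategy for engine: ${engine}`);
  switch (s.type) {
    case "offset": {
      const requested = s.sizeParam ? params[s.sizeParam] : undefined;
      const size = Number(requested ?? s.defaultSize);
      const current = Number(params[s.param] ?? s.first);
      return { ...params, [s.param]: current + size };
    }
    case "page": {
      const current = Number(params[s.param] ?? s.first);
      return { ...params, [s.param]: current + 1 };
    }
    case "token":
      return response.next_page_token
        ? { ...params, [s.param]: response.next_page_token }
        : null;
    default:
      throw new Error(`Unknown pagination type: ${s.type}`);
  }
}

console.log(nextParams("google", { q: "coffee", start: 15 })); // start becomes 25
console.log(nextParams("ebay", { _nkw: "coffee" })); // _pgn becomes 2
```

Keeping the per-engine differences in data rather than in user code is what would let any of the approaches below expose one interface across engines.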
Possible approaches
The key question is how we might abstract the pagination logic in a manner that makes using SerpApi simpler and more ergonomic.
Approach 1: New function
- New function getPaginatedJson that can be looped over.
const organicResults = [];
for await (const page of getPaginatedJson("google", { q: "coffee", start: 15 })) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
}
Pros
- Types are clean.
- Iterating over the function to get multiple page results is nice.
Cons
- New function, might be confusing.
- Not very ergonomic: getting the next page requires calling a different function from the one used for the first page.
- Does not support callbacks.
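For illustration, Approach 1 could be built on an async generator. This is a minimal sketch for offset-style engines only, not the proposed implementation. It takes an extra first argument, fetchJson, which is a hypothetical injected stand-in for getJson so the generator can run without network access, and it assumes the offset advances by the number of organic_results received.

```javascript
// Sketch only: `fetchJson(engine, params)` is an injected, hypothetical
// stand-in for getJson. Offset-style pagination is assumed.
async function* getPaginatedJson(fetchJson, engine, params) {
  let start = params.start ?? 0;
  while (true) {
    const page = await fetchJson(engine, { ...params, start });
    const results = page.organic_results ?? [];
    if (results.length === 0) return; // Stop when a page comes back empty
    yield page;
    start += results.length; // Advance the offset by what was received
  }
}
```

Because the generator is lazy, a `break` in the consuming `for await` loop stops further requests from being made.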
Approach 2: Next method
- getJson returns the results object that includes a next method.
const organicResults = [];
let page = await getJson("google", { q: "coffee", start: 15 });
while (page) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
  page = await page.next();
}
Pros
- Ergonomic since if you want to get the next page, you can just call the next method on the result object.
- Not a breaking change to existing implementations that use getJson.
- Simpler to understand than using a brand new function.
- Supports callbacks.
Cons
- Cannot iterate over it to get multiple page results.
  - Though technically you can create an async iterable wrapper.
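The async iterable wrapper mentioned above could look roughly like this. It is a sketch under the assumption that next() resolves to the following page, or to a falsy value once there are no more pages, as the while (page) loop in the example implies. The name iteratePages is illustrative.

```javascript
// Wrap a first page that exposes next() in an async iterable.
// Assumes next() resolves to the following page, or to a falsy value
// when there are no more pages.
async function* iteratePages(firstPage) {
  let page = firstPage;
  while (page) {
    yield page;
    page = typeof page.next === "function" ? await page.next() : undefined;
  }
}
```

With this in place, `for await (const page of iteratePages(await getJson(...)))` would give the looping style of Approach 1 on top of the next method.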
Approach 3: Magic?
- getJson returns the results object as per normal.
- If looped over, then it returns each page's results.
// calling once works
await getJson("google", { q: "coffee", start: 15 });
// calling within a loop works too
const organicResults = [];
for await (const page of await getJson("google", { q: "coffee", start: 15 })) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
}
Pros
- Works for single calls or when called in a loop.
- Iterating over the function to get multiple page results is nice.
- Not a breaking change to existing implementations that use getJson.
- Simpler to understand than using a brand new function.
Cons
- Does not support callbacks.
- There are 2 awaits in the loop, might be confusing.
  - This is required because getJson returns a Promise that needs to be awaited to return an object that contains the fetched results and also the instructions necessary to continue the async loop, i.e. it returns an async iterable object.
- Types are a little strange as it includes a [Symbol.asyncIterator] key which is required for the loop to work.
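To make the [Symbol.asyncIterator] point concrete, here is a rough sketch of how a result object could double as a loop source. It is not the proposed implementation: getJsonIterable is a hypothetical name, fetchJson is an injected stand-in for the real request function, and offset-style pagination is assumed.

```javascript
// Sketch: attach [Symbol.asyncIterator] to the fetched page so the same
// object works as a plain result and as the source of a for await loop.
// `fetchJson` is a hypothetical injected fetcher; offset pagination assumed.
async function getJsonIterable(fetchJson, engine, params) {
  const first = await fetchJson(engine, params);
  Object.defineProperty(first, Symbol.asyncIterator, {
    enumerable: false, // Keeps object spread from copying the iterator
    value: async function* () {
      let page = first;
      let start = params.start ?? 0;
      while ((page.organic_results ?? []).length > 0) {
        yield page;
        start += page.organic_results.length;
        page = await fetchJson(engine, { ...params, start });
      }
    },
  });
  return first;
}
```

The first await resolves the Promise to the decorated result object, and `for await` then consumes its iterator, which is where the two awaits in the loop come from.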