Better support for pagination #2

@sebastianquek

Goals

This module should make handling pagination simpler across all engines.

Context

Currently, pagination needs to be handled manually. One approach is the following:

import { config, getJson } from "serpapi";

config.api_key = process.env.API_KEY;

const num = 10; // Number of results per page
let start = 0; // Results offset

const links = [];

while (start < 50) { // Get up to 50 results
  const json = await getJson("google", {
    q: "coffee",
    location: "Austin, Texas",
    start,
    num,
  });
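  // Guard against pages without organic results so the loop cannot throw
  const pageLinks = (json["organic_results"] ?? []).map((r) => r.link);
  if (pageLinks.length === 0) break; // No more results, stop early
  links.push(...pageLinks);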
  start += num;
}

console.log(links);

This works for engines that fetch results by an offset + size, for example google, bing, and baidu.

However, not all engines rely on this offset + size concept. For example, yandex takes only a page number, and google_play takes only a next-page token.

For these less common approaches, users need to be aware of the differences and update their code accordingly.
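
To illustrate how the manual loop changes from engine to engine, here is a minimal sketch of a helper that advances the request parameters for three pagination styles. The name `advanceParams` and the `style` labels are hypothetical and not part of the serpapi package; the parameter names `start`, `p`, and `next_page_token` are taken from the engine list in this issue. Where the next-page token is read from in the response is left to the caller, since that is not specified here.

```javascript
// Hypothetical helper: returns the parameters for the next request,
// or null when there is nothing further to fetch.
function advanceParams(style, params, { pageSize = 10, nextToken } = {}) {
  switch (style) {
    case "offset": // e.g. google: `start` is a result offset
      return { ...params, start: (params.start ?? 0) + pageSize };
    case "page": // e.g. yandex: `p` is a page number that starts from 0
      return { ...params, p: String(Number(params.p ?? 0) + 1) };
    case "token": // e.g. google_play: opaque token from the previous response
      if (!nextToken) return null; // no token means no further pages
      return { ...params, next_page_token: nextToken };
    default:
      throw new Error(`Unknown pagination style: ${style}`);
  }
}

console.log(advanceParams("offset", { q: "coffee", start: 10 }));
console.log(advanceParams("page", { text: "coffee" }));
console.log(advanceParams("token", { q: "coffee" }, { nextToken: "abc" }));
```

Even in this reduced form, the caller has to know which style an engine uses, which is the burden the proposal aims to remove.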

Full list of engines that support pagination

There are 7 types:

Offset only, e.g. google_jobs, yahoo

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| google_jobs | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| google_reverse_image | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| google_maps | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 20 is the 2nd page of results, 40 is the 3rd page of results, etc.). |
| google_events | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| yahoo | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) is the first page of results, 11 is the 2nd page of results, 21 is the 3rd page of results, etc.). |
| yahoo_images | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) starts from the first result, 61 starts from the 61st result, 121 starts from the 121st result, etc.). |
| yahoo_videos | b | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) starts from the first result, 61 starts from the 61st result, 121 starts from the 121st result, etc.). |
| duckduckgo | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. When pagination is not being used (initial search request), the number of organic_results can vary between 26 and 30. When pagination is being used (value of the start parameter is greater than 0), organic_results returns 50 results. |
| yelp | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| yelp_reviews | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |

Page only, e.g. yandex, apple_reviews

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| google (google images only) | ijn | string | Parameter defines the page number for Google Images. There are 100 images per page. This parameter is equivalent to start (offset) = ijn * 100. This parameter works only for Google Images (set tbm to isch). |
| yandex | p | string | Parameter defines page number. Pagination starts from 0. |
| yandex_images | p | string | Parameter defines the page number. Pagination starts from 0, and it can return up to 30 results. |
| yandex_videos | p | string | Parameter defines the page number. Pagination starts from 0, and it can return up to 30 results. |
| walmart_product_reviews | page | string | Value is used to get the reviews on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
| apple_reviews | page | string | Parameter is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |

Offset + size, e.g. google, bing, baidu

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| google | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| google | num | string | Parameter defines the maximum number of results to return. (e.g., 10 (default) returns 10 results, 40 returns 40 results, and 100 returns 100 results). |
| google_scholar | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| google_scholar | num | string | Parameter defines the maximum number of results to return, limited to 20. (e.g., 10 (default) returns 10 results, 20 returns 20 results). |
| google_scholar_author | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 20 is the 2nd page of results, 40 is the 3rd page of results, etc.). |
| google_scholar_author | num | string | Parameter defines the number of results to return. (e.g., 20 (default) returns 20 results, 40 returns 40 results, etc.). Maximum number of results to return is 100. |
| bing | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
| bing | count | string | Parameter controls the number of results per page. Minimum: 1, Maximum: 50. This parameter is only a suggestion and might not reflect actual results returned. |
| bing_news | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
| bing_news | count | string | Parameter controls the number of results per page. This parameter is only a suggestion and might not reflect actual results returned. |
| bing_images | first | string | Parameter controls the offset of the organic results. This parameter defaults to 1. (e.g., first=10 will move the 10th organic result to the first position). |
| bing_images | count | string | Parameter controls the number of results per page. This parameter is only a suggestion and might not reflect the returned results. |
| baidu | pn | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| baidu | rn | string | Parameter defines the maximum number of results to return, limited to 50. (e.g., 10 (default) returns 10 results, 30 returns 30 results, and 50 returns 50 results). This parameter is only available for desktop and tablet searches. |
| baidu_news | pn | string | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). |
| baidu_news | rn | string | Parameter defines the maximum number of results to return, limited to 50. (e.g., 10 (default) returns 10 results, 30 returns 30 results, and 50 returns 50 results). |

Offset + page, e.g. google_product

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| google_product | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.). This parameter works only for Google Online Sellers and Reviews. |
| google_product | page | string | Parameter defines the page number for Google Online Sellers and Reviews. There are 10 results per page. This parameter is equivalent to start (offset) = page * 10. This parameter works only for Google Online Sellers and Reviews. |

Page + size, e.g. ebay, walmart

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| ebay | _pgn | string | Parameter defines the page number. It’s used for pagination. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
| ebay | _ipg | string | Parameter defines the maximum number of results to return. There are a total of four options: 25, 50 (default), 100 and 200 results. |
| walmart | page | string | Value is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). Maximum page value is 100. |
| walmart | ps | number | Determines the number of items per page. There are scenarios where Walmart overrides the ps value. By default Walmart returns 40 results. |
| apple_app_store | num | string | Parameter defines the number of results you want to get per page. It defaults to 10. The maximum number of results you can get per page is 200. Any number greater than the maximum will default to 200. |
| apple_app_store | page | string | Parameter is used to get the items on a specific page. (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.). |

Offset + page + size, e.g. yahoo_shopping, home_depot

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| yahoo_shopping | start | number | Parameter defines the result offset. It skips the given number of results. It’s used for pagination. (e.g., 1 (default) is the first page of results, 60 is the 2nd page of results, 120 is the 3rd page of results, etc.). |
| yahoo_shopping | limit | number | Parameter defines the maximum number of results to return. (e.g., 10 (default) returns 10 results, 40 returns 40 results, and 100 returns 100 results). |
| yahoo_shopping | page | string | The page parameter does the start parameter math for you! Just define the page number you want. Pagination starts from 1. |
| home_depot | nao | string | Defines offset for products result. A single page contains 24 products. First page offset is 0, second -> 24, third -> 48 and so on. |
| home_depot | page | string | Value is used to get the items on a specific page. (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.). |
| home_depot | ps | number | Determines the number of items per page. There are scenarios where Home Depot overrides the ps value. By default Home Depot returns 24 results. |
| naver | start | number | Parameter controls the offset of the organic results. This parameter defaults to 1 (except for the web). The formula for all searches except the web is start = (page number * 10) - 9, e.g. page number 3 gives (3 * 10) - 9 = 21. The formula for the web is start = (page number * 15) - 29, e.g. page number 3 gives (3 * 15) - 29 = 16. |
| naver | num | string | Parameter defines the maximum number of results to return. 50 (default) returns 50 results. Maximum number of results to return is 100. Parameter can only be used with Naver Images API. |
| naver | page | string | The page parameter does the start parameter math for you! Just define the page number you want. Pagination starts from 1. |

Token only, e.g. google_scholar_profiles, google_play

| Engine | Parameter | Type | Description |
| --- | --- | --- | --- |
| google_scholar_profiles | after_author | string | Parameter defines the next page token. It is used for retrieving the next page results. The parameter has the precedence over before_author parameter. |
| google_scholar_profiles | before_author | string | Parameter defines the previous page token. It is used for retrieving the previous page results. |
| google_maps_photos | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. 20 results are returned per page. |
| google_maps_reviews | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. Usage of start parameter (results offset) has been deprecated by Google. |
| google_play | next_page_token | string | Parameter defines the next page token. It is used for retrieving the next page results. |
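
Several of the descriptions above state a formula for converting a page number into an offset. As a worked example, the sketch below writes those formulas out as small functions. The function names are illustrative only; the arithmetic is copied from the parameter descriptions for google (Google Images), google_product, home_depot, and naver.

```javascript
// Page-to-offset conversions quoted in the parameter descriptions.
// All function names are hypothetical.

// google (Google Images): start = ijn * 100
const googleImagesOffset = (ijn) => ijn * 100;

// google_product: start = page * 10
const googleProductOffset = (page) => page * 10;

// home_depot: 24 products per page; page 1 -> nao 0, page 2 -> nao 24
const homeDepotOffset = (page) => (page - 1) * 24;

// naver: start = page * 10 - 9 for all searches except the web,
// and start = page * 15 - 29 for the web
const naverOffset = (page, { web = false } = {}) =>
  web ? page * 15 - 29 : page * 10 - 9;

console.log(naverOffset(3)); // 21
console.log(naverOffset(3, { web: true })); // 16
console.log(homeDepotOffset(3)); // 48
```

The differing base values (0 or 1) and page sizes are part of what a pagination abstraction would need to hide from users.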

Possible approaches

The key question is how we might abstract the pagination logic in a manner that makes using SerpApi simpler and more ergonomic.

Approach 1: New function

  • New function getPaginatedJson that can be looped over.
const organicResults = [];
for await (const page of getPaginatedJson("google", { q: "coffee", start: 15 })) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
}

Pros

  • Types are clean.
  • Iterating over the function to get multiple page results is nice.

Cons

  • New function, might be confusing.
  • Not very ergonomic since if you want to get the next page, you need to call a different function.
  • Does not support callbacks.
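
One way Approach 1 might be built is as an async generator. The sketch below is an assumption-laden illustration, not the proposed implementation: it handles offset-based engines only, and it takes a `fetchPage` function as an extra first argument (standing in for getJson) so that it runs without network access. `fakeFetch` and `collect` are hypothetical names used for the demonstration.

```javascript
// Sketch of Approach 1 for offset-based engines only.
// `fetchPage` stands in for getJson; it is injected so the example runs offline.
async function* getPaginatedJson(fetchPage, engine, params, pageSize = 10) {
  let start = params.start ?? 0;
  while (true) {
    const page = await fetchPage(engine, { ...params, start });
    const results = page.organic_results ?? [];
    if (results.length === 0) return; // stop when a page comes back empty
    yield page;
    start += pageSize;
  }
}

// Fake fetcher that serves 25 numbered results in slices of up to 10.
async function fakeFetch(_engine, { start }) {
  const all = Array.from({ length: 25 }, (_, i) => ({ position: i + 1 }));
  return { organic_results: all.slice(start, start + 10) };
}

// Collects every page from the fake fetcher and returns the result count.
async function collect() {
  const organicResults = [];
  for await (const page of getPaginatedJson(fakeFetch, "google", { q: "coffee" })) {
    organicResults.push(...page.organic_results);
  }
  return organicResults.length;
}
```

Because the generator is lazy, breaking out of the `for await` loop early means later pages are never requested.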

Approach 2: Next method

  • getJson returns the results object that includes a next method.
const organicResults = [];
let page = await getJson("google", { q: "coffee", start: 15 });
while (page) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
  page = await page.next();
}

Pros

  • Ergonomic since if you want to get the next page, you can just call the next method on the result object.
  • Not a breaking change to existing implementations that use getJson.
  • Simpler to understand than using a brand new function.
  • Supports callbacks.

Cons

  • Cannot iterate over it to get multiple page results.
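
A minimal sketch of how Approach 2 could attach a next method to the result object follows. It is an illustration under stated assumptions: offset-based engines only, with an injected `fetchPage` function standing in for getJson. The names `getJsonWithNext`, `fakeFetch`, and `countAll` are hypothetical.

```javascript
// Sketch of Approach 2: each result object carries a `next` method.
// `fetchPage` stands in for getJson; offset-based engines only.
async function getJsonWithNext(fetchPage, engine, params, pageSize = 10) {
  const page = await fetchPage(engine, params);
  const nextParams = { ...params, start: (params.start ?? 0) + pageSize };
  // Non-enumerable, so Object.keys(page) still lists only the response fields.
  Object.defineProperty(page, "next", {
    enumerable: false,
    value: async () => {
      const following = await getJsonWithNext(fetchPage, engine, nextParams, pageSize);
      // Resolve to undefined when no results remain, ending `while (page)` loops.
      return (following.organic_results ?? []).length > 0 ? following : undefined;
    },
  });
  return page;
}

// Fake fetcher that serves 25 numbered results in slices of up to 10.
async function fakeFetch(_engine, { start = 0 }) {
  const all = Array.from({ length: 25 }, (_, i) => ({ position: i + 1 }));
  return { organic_results: all.slice(start, start + 10) };
}

// Walks all pages via `next` and returns the total number of results.
async function countAll() {
  let total = 0;
  let page = await getJsonWithNext(fakeFetch, "google", { q: "coffee" });
  while (page) {
    total += page.organic_results.length;
    page = await page.next();
  }
  return total;
}
```

Keeping `next` non-enumerable is one possible design choice; it keeps the returned object looking like the plain API response when it is logged or spread.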

Approach 3: Magic?

  • getJson returns the results object as per normal.
  • If looped over, then it returns each page's results.
// calling once works
await getJson("google", { q: "coffee", start: 15 });

// calling within a loop works too
const organicResults = [];
for await (const page of await getJson("google", { q: "coffee", start: 15 })) {
  organicResults.push(...page.organic_results);
  if (organicResults.length >= 50) break;
}

Pros

  • Works for single calls or when called in a loop.
  • Iterating over the function to get multiple page results is nice.
  • Not a breaking change to existing implementations that use getJson.
  • Simpler to understand than using a brand new function.

Cons

  • Does not support callbacks.
  • There are 2 awaits in the loop, which might be confusing.
    • This is required because getJson returns a Promise that must be awaited to obtain an object containing both the fetched results and the instructions needed to continue the async loop, i.e. an async iterable object.
  • Types are a little strange, as the result includes a [Symbol.asyncIterator] key, which is required for the loop to work.
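
To make the double await concrete, here is a rough sketch of how a result object could also be an async iterable. It is only an illustration: it covers offset-based engines, and `fetchPage` is injected in place of getJson so the code runs offline. The names `getJsonIterable`, `fakeFetch`, and `countAll` are hypothetical.

```javascript
// Sketch of Approach 3: the resolved result is a plain response object that
// can also be iterated with `for await` to walk the following pages.
async function getJsonIterable(fetchPage, engine, params, pageSize = 10) {
  const first = await fetchPage(engine, params);
  Object.defineProperty(first, Symbol.asyncIterator, {
    enumerable: false,
    value: async function* () {
      let start = params.start ?? 0;
      let page = first; // the first page is reused, not fetched again
      while ((page.organic_results ?? []).length > 0) {
        yield page;
        start += pageSize;
        page = await fetchPage(engine, { ...params, start });
      }
    },
  });
  return first;
}

// Fake fetcher that serves 25 numbered results in slices of up to 10.
async function fakeFetch(_engine, { start = 0 }) {
  const all = Array.from({ length: 25 }, (_, i) => ({ position: i + 1 }));
  return { organic_results: all.slice(start, start + 10) };
}

// The inner await resolves the first page; `for await` then drives the iterator.
async function countAll() {
  let total = 0;
  for await (const page of await getJsonIterable(fakeFetch, "google", { q: "coffee" })) {
    total += page.organic_results.length;
  }
  return total;
}
```

A single call still works as before, since the resolved object exposes `organic_results` directly.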
