Support pagination for getJson #3

Merged
sebastianquek merged 11 commits into master from support-pagination
Jan 18, 2023

@sebastianquek commented Jan 11, 2023

Handles #2

This PR implements "Approach 2: Next method" from the above issue. The main reasons are that it supports callbacks and that it can be used to build something similar to Approach 3.

Note that this PR doesn't cover pagination support for getJsonBySearchId.
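For readers unfamiliar with the "next method" idea, here is a minimal sketch of how such a method could be attached to a result and still work with callbacks. It is not the code in this PR: `getPage`, `Fetcher`, `Page`, `RawResponse` and `extractNextParams` are hypothetical names, and the fetcher stands in for the real HTTP call. The only detail taken from this PR is that the next page is derived from a `serpapi_pagination.next` property.

```typescript
// Illustrative sketch only: all names and types here are assumptions,
// not the library's actual API.
type Params = Record<string, string>;
type RawResponse = {
  serpapi_pagination?: { next?: string };
  [key: string]: unknown;
};
type Fetcher = (params: Params) => Promise<RawResponse>;
type PageCallback = (page: Page) => void;

interface Page {
  response: RawResponse;
  // Present only when the response points at a further page.
  next?: (callback?: PageCallback) => Promise<Page>;
}

// Turn the query string of the "next" URL into a params object.
function extractNextParams(response: RawResponse): Params | undefined {
  const nextUrl = response.serpapi_pagination?.next;
  if (!nextUrl) return undefined;
  const params: Params = {};
  new URL(nextUrl).searchParams.forEach((value, key) => {
    params[key] = value;
  });
  return params;
}

// Fetch one page and, if there is a next page, attach a `next` method
// that fetches it with the same fetcher and (by default) the same callback.
async function getPage(
  fetcher: Fetcher,
  params: Params,
  callback?: PageCallback,
): Promise<Page> {
  const response = await fetcher(params);
  const page: Page = { response };
  const nextParams = extractNextParams(response);
  if (nextParams) {
    page.next = (nextCallback: PageCallback | undefined = callback) =>
      getPage(fetcher, nextParams, nextCallback);
  }
  callback?.(page);
  return page;
}
```

With this shape, a caller can either await each page (`let page = await getPage(...); while (page.next) page = await page.next();`) or pass a callback that is invoked for every page.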

Tests

I've added tests for 6 of the 7 types of pagination approaches:

  • Offset only: google_maps
  • Page only: apple_reviews
  • Offset + size: baidu
  • Page + size: ebay
  • Offset + page + size: home_depot
  • Token only: google_scholar_profiles
  • Offset + page: No separate test. The only engine that uses this approach, google_product, paginates in a way that is similar to the token-only approach.

Additional notes

  • Pagination using the .next() method is currently only supported by engines that respond with a serpapi_pagination.next property.

    • E.g. google_jobs allows for pagination via the start offset param, but currently does not return a serpapi_pagination.next property. For these cases, you need to pass the start offset param manually.
    • There is one edge case where pagination.next is used instead: the google_scholar_profiles engine.
  • Although yahoo_shopping returns a serpapi_pagination.next property, I've excluded it as there are currently some issues with it.

  • There is a check for whether the next parameters are equal to the current parameters. This check is based on this issue: "[Ebay Search API] Pagination returns the same results" (public-roadmap#144).

    • I couldn't replicate it for Ebay, but I'll keep the check in as a safeguard.
    • I found that duckduckgo has a similar behaviour and have added tests for it.
    • duckduckgo has a quirk: it adds a default parameter (kl), which causes one extra call to be made. Notice the initial params are { q: "coffee", start: 30 } but the next params extracted from page1 are { kl: "us-en", q: "coffee", start: 30 }. Because these differ, page1.next is returned. It's only after obtaining page2 that the current params (which now include kl) and the next params match, implying there's no next page.
    • Sometimes duckduckgo oscillates between different next parameters:
      { q: "coffee", start: 30 }
      { kl: "us-en", q: "coffee", start: "27" }
      { kl: "us-en", q: "coffee", start: "50" }
      { kl: "us-en", q: "coffee", start: "25" }
      { kl: "us-en", q: "coffee", start: "33" }
      { kl: "us-en", q: "coffee", start: "27" }
      { kl: "us-en", q: "coffee", start: "50" }
      { kl: "us-en", q: "coffee", start: "25" }
      { kl: "us-en", q: "coffee", start: "33" }
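The "next params equal current params" safeguard and the duckduckgo quirk described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `haveSameParams` and `hasNextPage` are hypothetical helpers, and comparing values as strings (so that `30` and `"30"` match) is an assumption made here because parameters extracted from a URL are strings.

```typescript
// Illustrative sketch only: helper names are hypothetical.
type QueryParams = Record<string, string | number>;

// True when both objects have the same keys and the same values.
// Assumption: values are compared as strings, so 30 and "30" are equal.
function haveSameParams(a: QueryParams, b: QueryParams): boolean {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key, i) => key === keysB[i] && String(a[key]) === String(b[key]),
  );
}

// A next page is only offered when next params exist and differ from
// the params used for the current request.
function hasNextPage(
  current: QueryParams,
  next: QueryParams | undefined,
): boolean {
  return next !== undefined && !haveSameParams(current, next);
}

// The duckduckgo case from the notes above.
const initialParams: QueryParams = { q: "coffee", start: 30 };
const nextFromPage1: QueryParams = { kl: "us-en", q: "coffee", start: 30 };
const nextFromPage2: QueryParams = { kl: "us-en", q: "coffee", start: 30 };

// The default kl param makes the two objects differ, so page1 exposes next.
const page1HasNext = hasNextPage(initialParams, nextFromPage1);
// page2 was requested with nextFromPage1, which matches its own next params.
const page2HasNext = hasNextPage(nextFromPage1, nextFromPage2);
```

Under these assumptions `page1HasNext` is `true` (the extra call) and `page2HasNext` is `false`. A check of this kind only catches a next page that repeats the current parameters exactly; the oscillating case listed above never repeats consecutively, so it would not be stopped by it.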
      

@sebastianquek self-assigned this Jan 11, 2023
@sebastianquek force-pushed the support-pagination branch 29 times, most recently from f8a618e to 9f4a96d, January 17, 2023 09:16
@sebastianquek force-pushed the support-pagination branch 2 times, most recently from fd971a9 to b453ac7, January 17, 2023 10:11
@sebastianquek merged commit 02e1539 into master Jan 18, 2023
@sebastianquek deleted the support-pagination branch January 18, 2023 03:37