[DuckDuckGo Search API] serpapi_pagination.next doesn't take into account the current offset of results #619

Open
ilyazub opened this issue Feb 2, 2023 · 1 comment
Labels
status: freezer Something we don't want to work on yet type: bug Something is broken

Comments

@ilyazub

ilyazub commented Feb 2, 2023

serpapi_pagination.next doesn't take into account the current offset of results. (Search Inspect page.)

curl -s --compressed "https://serpapi.com/search?engine=duckduckgo&q=Coffee&kl=us-en&start=76&api_key=$API_KEY" | jq -r '.serpapi_pagination.next'
https://serpapi.com/search.json?engine=duckduckgo&kl=us-en&q=Coffee&start=29

Expected result

serpapi_pagination.next contains query parameter start=105.

Actual result

serpapi_pagination.next contains query parameter start=29.
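The arithmetic behind the expected value can be checked with a short sketch. It assumes that the `start=29` returned for this request is really a relative step that should have been added to the requested offset (76 + 29 = 105); that reading is inferred from the expected result above, not confirmed by the backend. The helper names `start_param` and `next_ignores_offset` are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def start_param(url: str) -> int:
    """Return the integer `start` query parameter of a URL (0 if absent)."""
    query = parse_qs(urlparse(url).query)
    return int(query.get("start", ["0"])[0])


def next_ignores_offset(request_url: str, next_url: str) -> bool:
    """True when the "next" link does not move past the requested offset."""
    return start_param(next_url) <= start_param(request_url)


# URLs from the report above (api_key omitted).
request_url = "https://serpapi.com/search?engine=duckduckgo&q=Coffee&kl=us-en&start=76"
actual_next = "https://serpapi.com/search.json?engine=duckduckgo&kl=us-en&q=Coffee&start=29"

current_offset = start_param(request_url)   # offset that was requested
returned_start = start_param(actual_next)   # value found in serpapi_pagination.next

# Assumption: the returned value is a step relative to the current offset.
expected_start = current_offset + returned_start

print(expected_start)
print(next_ignores_offset(request_url, actual_next))
```

A check like `next_ignores_offset` could be used in an integration test to flag this bug without hard-coding the page size.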

@ilyazub added the "status: queued" (Ready to work on) and "status: freezer" (Something we don't want to work on yet) labels, and removed the "status: queued" (Ready to work on) label, on Feb 2, 2023
ilyazub added a commit to serpapi/google-search-results-python that referenced this issue Feb 2, 2023
DuckDuckGo tests are failing because DuckDuckGo pagination doesn't take
into account an offset of current results:
serpapi/public-roadmap#619

Co-authored-by: Dimitry <dimitry@serpapi.com>
jvmvik pushed a commit to serpapi/google-search-results-python that referenced this issue May 1, 2023
…client (#30)

* Use pagination parameters from SerpApi instead of calculating on the client

`start` and `num` parameters are not suitable for token-based
pagination. Such pagination is used on Google Maps, YouTube, Google
Scholar Authors, and other search engines.

This commit consumes URL query parameters for the next page. It stops
paginating when the parameters do not change.

Details: #22

Some tests are failing because `start` and `num` parameters are not supported
anymore. These tests will be fixed in the following commits.

* Add pagination tests for Bing, Baidu, and DuckDuckGo search API clients

* Fix typo in SerpApi name in documentation

* Add more pagination tests

All of the tests follow the same pattern: limit the number of pages,
iterate, and check for duplicates in the results. This makes sure
that pagination actually changes pages.

* Test pagination for Naver and HomeDepot

* Stop pagination when SerpApi backend doesn't update parameters

* Fix flake8 linting errors

Example errors: https://github.com/serpapi/google-search-results-python/runs/6659757610?check_suite_focus=true#step:5:37

* Lint code via `make lint`

Currently, the linting script exists only in the GitHub Action: `.github/workflows/python-package.yml`.

This commit wraps that script in a Makefile and invokes it from the Action.

* fix(tests): fix failing integration tests

DuckDuckGo tests are failing because DuckDuckGo pagination doesn't take
into account an offset of current results:
serpapi/public-roadmap#619

Co-authored-by: Dimitry <dimitry@serpapi.com>

* perf: run pytest in parallel

Sample output:

  platform linux -- Python 3.10.9, pytest-7.2.1, pluggy-1.0.0
  rootdir: /home/ilyazub/Workspace/google-search-results-python
  plugins: parallel-0.1.1
  collected 48

  pytest-parallel: 8 workers (processes), 6 tests per worker (threads)

The `py` dependency is added because pytest-parallel depends on it but
doesn't declare it as a requirement 😕
kevlened/pytest-parallel#118

Co-authored-by: Dimitry <dimitry@serpai.com>

* style: don't lint vendor packages with Flake8

Co-authored-by: Dimitry <dimitry@serpapi.com>

* docs: fix minor typos in documentation

Co-authored-by: Dimitry <dimitry@serpapi.com>

* ci: cache pip dependencies

Support Python 3.7+ based on the readme:
https://github.com/serpapi/google-search-results-python/blob/35e51c94e7243c29650ed7b630db4e4e6d0c61aa/README.md#L18

Co-authored-by: "dimitryzub <dmitriy@serpapi.com>"

---------

Co-authored-by: Dimitry <dimitry@serpapi.com>
Co-authored-by: Dimitry <dimitry@serpai.com>
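The pagination approach described in the commit message above (follow the next-page URL returned by the API, stop when its parameters stop changing, and check results for duplicates) can be sketched as follows. This is a minimal illustration, not the client's actual implementation: `paginate`, `has_duplicate_links`, and the `fetch` callable are hypothetical names, and the response is assumed to carry `serpapi_pagination.next` and `organic_results` entries with a `link` field.

```python
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlparse, parse_qsl


def query_params(url: str) -> Dict[str, str]:
    """Return the query string of a URL as a plain dict."""
    return dict(parse_qsl(urlparse(url).query))


def paginate(first_url: str, fetch: Callable[[str], dict], max_pages: int = 5) -> Iterator[dict]:
    """Yield pages by following serpapi_pagination.next.

    `fetch` is a caller-supplied function (hypothetical) that takes a URL
    without credentials, adds the API key, and returns the parsed JSON.
    Iteration stops after `max_pages`, when there is no next link, or when
    the next link carries the same parameters as the current request.
    """
    url = first_url
    for _ in range(max_pages):
        page = fetch(url)
        yield page
        next_url = page.get("serpapi_pagination", {}).get("next")
        if not next_url or query_params(next_url) == query_params(url):
            return
        url = next_url


def has_duplicate_links(pages: List[dict]) -> bool:
    """True when any organic result link appears more than once across pages."""
    links = [r["link"] for p in pages for r in p.get("organic_results", [])]
    return len(links) != len(set(links))
```

Comparing parsed parameters rather than raw URL strings keeps the stop condition independent of parameter order. Note that a backend returning a smaller `start` than the one requested, as reported in this issue, would still change the parameters, so the duplicate check is what catches that case.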
@hilmanski

I recently tested this.

The first page shows the wrong total number of organic_results:
Inspect 1st page
It displayed start=27 in the pagination link, while there were only 17 results.

The second page shows 50 results but has start=50 in the next pagination link, whereas I assume it should be 50+27.
Inspect 2nd page

@aciddjus aciddjus added the type: bug Something is broken label Jun 14, 2024