Skip to content

Remove 'secure' attribute from cookies in ProxyGenerator to ensure co…#581

Open
Luen wants to merge 5 commits intoscholarly-python-package:developfrom
Luen:unexpected-keyword-arg
Open

Remove 'secure' attribute from cookies in ProxyGenerator to ensure co…#581
Luen wants to merge 5 commits intoscholarly-python-package:developfrom
Luen:unexpected-keyword-arg

Conversation

@Luen
Copy link

@Luen Luen commented Feb 25, 2026

Fix cookie transfer after CAPTCHA: strip secure before httpx Cookies.set()

When Google Scholar returns a CAPTCHA, the flow uses a browser to solve it, then copies cookies from the browser (Selenium) back into the httpx session so later requests use the same session. That copy step was raising:
TypeError: Cookies.set() got an unexpected keyword argument 'secure'
After that, the code retried with a new session, so the solved CAPTCHA session was effectively lost.

In _proxy_generator.py, _handle_captcha2() does:
Wait for the user to solve the CAPTCHA in the browser.
Call self._get_webdriver().get_cookies() (Selenium), which returns dicts that include browser-specific keys such as httpOnly, expiry, sameSite, and secure.
Strip only httpOnly, expiry, and sameSite, then call self._session.cookies.set(**cookie) (httpx).
httpx’s Cookies.set() does not accept a secure keyword. Passing the Selenium cookie dict as **cookie therefore included secure and triggered the TypeError.

Before calling self._session.cookies.set(**cookie), remove the secure key from each cookie dict, in the same way as the other non-httpx fields:
In _handle_captcha2(), add: cookie.pop("secure", None) after the existing pop calls for httpOnly, expiry, and sameSite.
With this, only supported arguments are passed to Cookies.set(), the cookie transfer completes, and the session continues to work after CAPTCHA solving without retrying due to this error.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes cookie handoff from Selenium → httpx after CAPTCHA solving by removing unsupported cookie fields before calling httpx.Cookies.set(), preventing the solved CAPTCHA session from being discarded due to an exception.

Changes:

  • Strip the secure attribute from Selenium cookie dicts before passing them to httpx.Cookies.set().
  • Update test setup to skip proxy-dependent tests when no working free proxy is available.
  • Adjust proxy initialization logic in ProxyGenerator._new_session() to derive an httpx.Client(proxy=...) value from the proxy dict.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
test_module.py Adds exception handling to skip tests when free proxies cannot be established; adds imports for proxy-related exceptions.
scholarly/_proxy_generator.py Filters Selenium cookies to avoid passing unsupported kwargs to httpx; updates proxy initialization behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 481 to +486
if self._proxy_works:
init_kwargs["proxies"] = proxies #.get("http", None)
self._proxies = proxies
# httpx uses proxy= (single URL), not proxies= (dict)
proxy_url = proxies.get("https://") or proxies.get("http://")
if proxy_url:
init_kwargs["proxy"] = proxy_url
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _new_session(), proxy handling still relies on converting a proxies dict into a single proxy URL. Be careful about callers that pass only {'http://': ...} (e.g., ScraperAPI()), because the later conversion logic indexes the https:// entry and can raise KeyError. Suggested fix: normalize/validate the dict here (ensure both schemes exist) or choose https:// with a fallback to http:// when deriving the proxy URL.

Copilot uses AI. Check for mistakes.
Comment on lines +483 to +486
# httpx uses proxy= (single URL), not proxies= (dict)
proxy_url = proxies.get("https://") or proxies.get("http://")
if proxy_url:
init_kwargs["proxy"] = proxy_url
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focuses on stripping the secure cookie attribute, but this hunk also changes how proxy configuration is passed into httpx.Client (switching behavior around proxies vs proxy). Please either update the PR description to cover this additional behavior change or split it into a separate PR so it can be reviewed/validated independently.

Copilot uses AI. Check for mistakes.
ipeirotis and others added 2 commits March 23, 2026 23:40
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@ipeirotis
Copy link
Collaborator

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a563e1861

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +454 to +455
if "expiry" in cookie:
cookie_kwargs["expires"] = cookie["expiry"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unsupported expires mapping from captcha cookie copy

In _handle_captcha2, the new expiry -> expires mapping is passed into self._session.cookies.set(**cookie_kwargs), but httpx.Cookies.set only supports name, value, domain, and path. When Selenium returns cookies that include expiry (common for persistent cookies), this still raises TypeError (now for expires) and prevents transferring the solved CAPTCHA session back to httpx.

Useful? React with 👍 / 👎.

@ipeirotis
Copy link
Collaborator

@claude review

@ipeirotis
Copy link
Collaborator

@copilot open a new pull request to apply changes based on the comments in this thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants