Force specific User Agent per crawl #461

Closed
paulgirard opened this issue May 9, 2022 · 0 comments

Comments

@paulgirard
Member

paulgirard commented May 9, 2022

Hyphe gets a random user agent for each crawl task from a web service.
For some websites, one might need to pin the user agent used by the crawler.
For instance, a website protected by Cloudflare requires a cookie which is only valid for the user agent used to generate it.
Therefore, for such websites, one needs to:

  • visit the website in a web browser and solve the captcha, if any
  • retrieve the cookie that was created and the user agent of the web browser used
  • set both the cookie and the user agent in the crawl config panel of this web entity in Hyphe

So far, setting the cookie is possible but not the user agent.
One enhancement would be to add this parameter per crawl, the same way as the cookie.
A user agent set at the crawl level would take precedence over the automatic random mechanism.
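
The precedence rule described above could be sketched roughly as follows. This is only an illustration, not Hyphe's actual code: the function names `pick_user_agent` and `build_headers`, the `crawl_user_agent` parameter and the `FALLBACK_USER_AGENTS` pool are all hypothetical, and the static pool merely stands in for the web service that currently provides random user agents.

```python
import random
from typing import Dict, List, Optional

# Hypothetical stand-in for the web service that returns random user agents.
FALLBACK_USER_AGENTS: List[str] = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
]


def pick_user_agent(crawl_user_agent: Optional[str] = None,
                    pool: Optional[List[str]] = None) -> str:
    """Return the crawl-level user agent if one is set, else a random one."""
    # A non-empty crawl-level setting takes precedence over the random mechanism.
    if crawl_user_agent and crawl_user_agent.strip():
        return crawl_user_agent.strip()
    return random.choice(pool if pool is not None else FALLBACK_USER_AGENTS)


def build_headers(cookie: Optional[str] = None,
                  crawl_user_agent: Optional[str] = None) -> Dict[str, str]:
    """Build request headers from the (hypothetical) per-crawl settings."""
    headers = {"User-Agent": pick_user_agent(crawl_user_agent)}
    if cookie:
        # The cookie is sent together with the user agent that generated it.
        headers["Cookie"] = cookie
    return headers
```

With such a helper, a crawl configured with both values would send them together, for example `build_headers(cookie="cf_clearance=abc", crawl_user_agent="MyUA/1.0")` (placeholder values), while a crawl without a configured user agent would keep the current random behaviour.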
