add extra useragent. #152

Closed
fazialnjd opened this issue Nov 14, 2022 · 4 comments

Comments

@fazialnjd

fazialnjd commented Nov 14, 2022

Hi.
My crawling process makes many requests, and despite using a fake user agent, my IP still gets blocked.
Please add more fake user agents, for example for iPhone and Android devices.
For example, look at the fake useragents on this site:

Please also add the ability to delete a user agent from the list of fake user agents, so that it is not used again and I can avoid being blocked.
Thanks.

@melroy89
Collaborator

melroy89 commented Nov 14, 2022

Maybe you should also try to limit the number of requests you make per second or per minute. Since your IP is already banned, no fake user agent string will help you with that.

If you are using the Scrapy framework, for example, you have an option like DOWNLOAD_DELAY:

The amount of time (in secs) that the downloader should wait before downloading consecutive pages from the same website. This can be used to throttle the crawling speed to avoid hitting servers too hard.

See also another Scrapy option called CONCURRENT_REQUESTS_PER_DOMAIN.
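
As a rough illustration, the two options named above could be set in a Scrapy project's settings.py along these lines. The values are assumptions chosen for the example, not recommendations from this project, and AUTOTHROTTLE_ENABLED is an additional Scrapy setting that was not mentioned in this thread.

```python
# settings.py of a Scrapy project (values are illustrative assumptions)

# Wait about 2 seconds between consecutive requests to the same website.
DOWNLOAD_DELAY = 2.0

# Allow only one request at a time per domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Optional extra, not mentioned above: let Scrapy adapt the delay
# to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
```

Lower concurrency combined with a delay slows the crawl down, which is the point: fewer requests per minute makes a block less likely.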

If, however, you use your own scripts without Scrapy, consider adding sleeps to your crawling process.
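
A minimal sketch of what adding sleeps could look like in a hand-written script. The function name `crawl_politely`, its parameters, and the delay values are hypothetical; `fetch` stands in for whatever function actually downloads a page in your code.

```python
import random
import time
from typing import Callable, Iterable, List


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing a random time between requests."""
    results: List[str] = []
    for index, url in enumerate(urls):
        if index > 0:
            # A randomized pause avoids a perfectly regular request rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` is what you want.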

@melroy89
Collaborator

Also, are you using Amazon AWS?

@fazialnjd
Author

> Also are you using Amazon AWS?

No. I am not.

I use the Python googlesearch library, which is based on requests and BeautifulSoup, and I have also used time.sleep.
I have an API that receives almost 200 Google page links per request, and I get blocked when I make more requests.
The IP is blocked for a few hours, after which I can send requests again. (The exact duration of the block is not known.)

I am trying to prevent IP bans by using fake user agents and a proxy.
Your list contains 260 fake user agents, and I choose among them randomly, so some of them may be used several times. That is why I need more fake user agents.
I wish the number could be increased to 500.
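
Until something like this exists in the library, the "pick randomly, then remove a user agent from the list" behaviour described here could be approximated on the caller's side. This is only a sketch: the `UserAgentPool` class and the placeholder strings are hypothetical and are not part of this project's API.

```python
import random
from typing import Iterable, List


class UserAgentPool:
    """A caller-side pool of user agent strings with random pick and removal."""

    def __init__(self, user_agents: Iterable[str]) -> None:
        # Drop duplicates while keeping the original order.
        self._agents: List[str] = list(dict.fromkeys(user_agents))

    def __len__(self) -> int:
        return len(self._agents)

    def pick(self) -> str:
        """Return a random user agent that is still in the pool."""
        if not self._agents:
            raise LookupError("no user agents left in the pool")
        return random.choice(self._agents)

    def discard(self, user_agent: str) -> None:
        """Remove a user agent so it is never picked again."""
        if user_agent in self._agents:
            self._agents.remove(user_agent)


# Placeholder strings, not real user agents.
pool = UserAgentPool(["example-agent-1", "example-agent-2", "example-agent-3"])
agent = pool.pick()
pool.discard(agent)  # for example, after a request with this agent was blocked
```

Removing an agent only prevents it from being picked again; it does nothing about a block that is applied to the IP address.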

Thanks for the help.

@melroy89
Collaborator

Related to: #109
and: #61

We want to switch to another source and also add mobile platforms.
