Many public pages are not returning posts even when they are available #195
Comments
Hi, I think there might be something wrong with your network. I'm getting the data with this code.
I had the same problem. When I change my IP, the posts download.
Try passing cookies as per #28 (comment).
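Passing cookies needs a logged-in session exported from a browser, usually as a Netscape-format `cookies.txt`. How facebook_scraper accepts cookies may differ between versions, so check the signature of `get_posts` in your install. As a neutral starting point, here is a minimal standard-library sketch that loads such a file into a name-to-value dict (the helper `load_cookies` and the file name are hypothetical):

```python
from __future__ import annotations

from http.cookiejar import MozillaCookieJar


def load_cookies(path: str) -> dict[str, str]:
    """Load a Netscape-format cookies.txt file into a {name: value} dict.

    load_cookies is a hypothetical helper, not part of facebook_scraper.
    The file is assumed to come from a browser logged in to Facebook.
    """
    jar = MozillaCookieJar(path)
    # Keep session cookies (no expiry) and ignore expiry dates,
    # so the dict mirrors exactly what is in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in jar}
```

For example, `load_cookies("cookies.txt")` returns a plain dict, which `requests.get(url, cookies=...)` also accepts directly.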
Is there a limit to the number of requests that can be made? I have added proxy and user-agent rotation to the HTTP request made in the `get` function in facebook_scraper.py, but it still returns an empty response for a while, and after about an hour it starts returning results again for the same pages. I am not passing any credentials with the requests.
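Since the empty responses described above clear up after roughly an hour, one option is to back off instead of retrying at a fixed rate. A minimal sketch of exponential backoff, assuming a 60-second base delay doubling up to a one-hour cap (these numbers are guesses matched to the recovery time reported above, not thresholds Facebook documents; both helper names are hypothetical):

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 60.0, factor: float = 2.0,
                   cap: float = 3600.0) -> list[float]:
    """Seconds to wait before each retry: base, base*factor, ..., never above cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch: Callable[[], T], is_empty: Callable[[T], bool],
                       retries: int = 7,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); while the result looks empty, wait with growing delays and retry."""
    result = fetch()
    for delay in backoff_delays(retries):
        if not is_empty(result):
            return result
        sleep(delay)
        result = fetch()
    # Out of retries: hand back whatever the last attempt produced.
    return result
```

With facebook_scraper this might look like `fetch_with_backoff(lambda: list(get_posts("reidandtaylor", pages=1)), lambda posts: not posts)`. The default seven retries wait 60, 120, 240, 480, 960, 1920 and finally 3600 seconds.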
It would seem so, yes. If you scrape too hard, Facebook starts serving the message "You're Temporarily Blocked" in the HTML. I even prepped some code like `if "Temporarily Blocked" in raw_page.text: logger.error("Temporarily blocked by Facebook")`, but I realised it wouldn't do much good for the average user, as the default log handler is …
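The check above can be sketched as a small helper. Matching on "Temporarily Blocked" rather than the full sentence avoids depending on whether the apostrophe in "You're" is straight or curly. To make sure the message actually reaches the user, the sketch offers an opt-in `logging.basicConfig` call instead of relying on the library's default handler. The logger name and both helper names are hypothetical:

```python
import logging

# Hypothetical logger name; records propagate up to the root logger.
logger = logging.getLogger("facebook_scraper.block_check")
BLOCK_MARKER = "Temporarily Blocked"


def check_not_blocked(html: str) -> bool:
    """Return False (and log an error) if Facebook served its temporary-block page."""
    if BLOCK_MARKER in html:
        logger.error("Temporarily blocked by Facebook")
        return False
    return True


def enable_console_logging(level: int = logging.INFO) -> None:
    """Attach a stderr handler to the root logger so these records are visible.

    basicConfig does nothing if the root logger already has handlers.
    """
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
```

A script would call `enable_console_logging()` once at startup and then `check_not_blocked(raw_page.text)` after each fetch.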
Could this problem be solved with an IP proxy?
Yes.
Even with HTTP proxy rotation plus user-agent rotation (fake user agent), I still get "You're Temporarily Blocked" in the HTML. I am not using any login credentials or cookies. The more than 22 proxy addresses I rotate through are valid: I tested them with the proxy checker package and confirmed that each proxy is actually applied by sending a request to "https://ipinfo.io".
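For reference, the kind of rotation described above can be sketched as a generator of keyword arguments for `requests.get` (`proxies`, `headers` and `timeout` are real `requests` parameters; the proxy URLs, User-Agent strings and the helper name are placeholders). As the next reply suggests, rotation alone may not be enough without cookies or a lower request rate:

```python
from __future__ import annotations

import itertools
import random
from typing import Any, Iterator


def rotating_request_kwargs(proxies: list[str], user_agents: list[str],
                            timeout: float = 10.0,
                            rng: random.Random | None = None) -> Iterator[dict[str, Any]]:
    """Yield keyword arguments for requests.get(): proxies are used round-robin,
    and each request gets a randomly chosen User-Agent header."""
    rng = rng or random.Random()
    for proxy in itertools.cycle(proxies):
        yield {
            # requests picks the proxy by URL scheme, so map both schemes.
            "proxies": {"http": proxy, "https": proxy},
            "headers": {"User-Agent": rng.choice(user_agents)},
            "timeout": timeout,
        }
```

Usage: create `kwargs = rotating_request_kwargs(PROXIES, USER_AGENTS)` once, then call `requests.get(url, **next(kwargs))` for each page.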
I think you must be scraping too hard anonymously. Either reduce the number of requests you're making per hour or pass cookies.
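Reducing the request rate can be as simple as a rolling-window limiter called before each page fetch. A minimal sketch; the threshold Facebook actually applies is not known, so `max_requests` is something to tune, and `HourlyRateLimiter` is a hypothetical helper:

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class HourlyRateLimiter:
    """Allow at most max_requests in any rolling window of window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self._sent: deque[float] = deque()  # timestamps of recent requests

    def wait(self) -> None:
        """Block until another request is allowed, then record it."""
        now = self.clock()
        # Forget requests that have left the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) >= self.max_requests:
            # Sleep until the oldest request in the window expires.
            self.sleep(self.window - (now - self._sent[0]))
            self._sent.popleft()
            now = self.clock()
        self._sent.append(now)
```

Calling `limiter.wait()` right before each request spreads the requests out instead of sending them in bursts.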
I have the same issue (#245). I guess it's just how FB works: if you visit a page in a new browser or in incognito mode it may work, but after a while it stops working and requires you to log in to view the page, even for public pages.
Could you please explain how you added the proxy? I have no idea why it doesn't work for me. I'm getting this error: `HTTPConnectionPool(host='178.212.54.137', port=8080): Max retries exceeded with url: http://ifconfig.co/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f800ab2de48>, 'Connection to 178.212.54.137 timed out. (connect timeout=5)'))` Many thanks,
This proxy seems to trip some sort of Cloudflare protection on ifconfig.co. Give this a try: 43cecdd
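The error above shows the connection to the proxy at 178.212.54.137:8080 timing out after 5 seconds, so with a pool of proxies it may be worth skipping failing ones instead of stopping. A minimal failover sketch; both helper names are hypothetical, and `requests_fetch` assumes the `requests` library, whose exceptions subclass `OSError`:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_via_first_working_proxy(url: str, proxies: Iterable[str],
                                  fetch: Callable[[str, str], T]) -> T:
    """Try each proxy in order and return the first successful fetch.

    OSError covers TimeoutError and ConnectionError, and the exceptions raised by
    requests (RequestException subclasses IOError, an alias of OSError).
    """
    errors: List[str] = []
    for proxy in proxies:
        try:
            return fetch(url, proxy)
        except OSError as exc:
            errors.append(f"{proxy}: {exc}")
    raise RuntimeError("all proxies failed:\n" + "\n".join(errors))


def requests_fetch(url: str, proxy: str) -> str:
    """Example fetch function using the requests library (imported lazily)."""
    import requests

    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=5)
    # Treat 4xx/5xx responses (such as a blocking or challenge page) as failures.
    resp.raise_for_status()
    return resp.text
```

For example, `fetch_via_first_working_proxy("http://ifconfig.co/", PROXIES, requests_fetch)` returns the body fetched through the first proxy that responds.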
Hi adarsh2104, in general, what you should do is something like this:
Many public pages are returning empty post lists even though many recent posts have been published on their official pages.
For example:
1. reidandtaylor (https://m.facebook.com/reidandtaylor/posts/)
2. zodiacclothing
3. TataHitachiCorporate (https://m.facebook.com/TataHitachiCorporate/posts/)