Many public pages are not returning posts even when they are available #195

Closed
adarsh2104 opened this issue Apr 1, 2021 · 13 comments
Comments

@adarsh2104

Many public pages return empty post lists even though recent posts are published on the official page.
E.g.:
1. reidandtaylor (https://m.facebook.com/reidandtaylor/posts/)
2. zodiacclothing
3. TataHitachiCorporate (https://m.facebook.com/TataHitachiCorporate/posts/)

@DerekChia

Hi, I think there might be something wrong with your network. I'm getting the data with this code:

from facebook_scraper import get_posts

for post in get_posts("reidandtaylor", pages=10):
	print(post)
{'post_id': '1976083782410298', 'text': 'Presenting the Autumn/Winter 18 Collection by Reid & Taylor.', 'post_text': 'Presenting the Autumn/Winter 18 Collection by Reid & Taylor.', 'shared_text': '', 'time': datetime.datetime(2018, 6, 29, 14, 57, 3), 'image': None, 'images': None, 'video': 'https://scontent.fsin2-1.fna.fbcdn.net/v/t42.9040-4/36367831_265117740909488_355914310502842368_n.mp4?_nc_cat=100&ccb=1-3&_nc_sid=985c63&efg=eyJ2ZW5jb2RlX3RhZyI6InN2ZV9zZCJ9&_nc_ohc=ASqzkokET7UAX_dbujb&_nc_ht=scontent.fsin2-1.fna&oh=ac7a849020aa1e28426931662e2712a1&oe=60661AE6', 'video_thumbnail': 'https://scontent.fsin2-1.fna.fbcdn.net/v/t15.5256-10/cp0/e15/q65/s320x320/34835304_1976086299076713_5308380875489017856_n.jpg?_nc_cat=105&ccb=1-3&_nc_sid=ccf8b3&_nc_ohc=xu-7D_uRJEkAX_vpev9&_nc_ht=scontent.fsin2-1.fna&tp=9&oh=a93bc9814976995baa438dfc4b64f2e7&oe=6089E2E4', 'video_id': '1976053362413340', 'likes': 496, 'comments': 11, 'shares': 0, 'post_url': 'https://facebook.com/reidandtaylor/videos/1976053362413340', 'link': None, 'user_id': '163066417045386', 'username': 'Reid & Taylor', 'is_live': False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_post_url': None, 'available': True, 'comments_full': None}
{'post_id': '4077918145560174', 'text': 'Comfortable, lightweight neutrals', 'post_text': 'Comfortable, lightweight neutrals', 'shared_text': '', 'time': datetime.datetime(2021, 3, 23, 20, 44, 13), 'image': 'https://scontent.fsin2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164072753_4077917345560254_1277089394854544074_n.jpg?_nc_cat=102&ccb=1-3&_nc_sid=8024bb&_nc_ohc=YejSQ1pHEyYAX-_uLf0&_nc_ht=scontent.fsin2-1.fna&tp=14&oh=2a65a92ebed753fe227d9eaa1239b43f&oe=608D6380', 'images': ['https://scontent.fsin2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/164072753_4077917345560254_1277089394854544074_n.jpg?_nc_cat=102&ccb=1-3&_nc_sid=8024bb&_nc_ohc=YejSQ1pHEyYAX-_uLf0&_nc_ht=scontent.fsin2-1.fna&tp=14&oh=2a65a92ebed753fe227d9eaa1239b43f&oe=608D6380'], 'video': None, 'video_thumbnail': None, 'video_id': None, 'likes': 19, 'comments': 1, 'shares': 0, 'post_url': 'https://facebook.com/reidandtaylor/posts/4077918145560174', 'link': None, 'user_id': '163066417045386', 'username': 'Reid & Taylor', 'is_live': False, 'factcheck': None, 'shared_post_id': None, 'shared_time': None, 'shared_user_id': None, 'shared_username': None, 'shared_post_url': None, 'available': True, 'comments_full': None}
...

@xobius

xobius commented Apr 2, 2021

I had the same problem. When I changed my IP, the posts were downloaded.

@neon-ninja
Collaborator

Try passing cookies, as per #28 (comment).

@adarsh2104
Copy link
Author

adarsh2104 commented Apr 7, 2021

Is there a limit on the number of requests that can be made? I have added proxy and user agent rotation to the HTML request made in the get function in facebook_scraper.py. Still, it returns an empty response for a while, and after about an hour it starts returning responses for the same pages again. I am not passing any credentials with the requests.

@neon-ninja
Collaborator

neon-ninja commented Apr 7, 2021

It would seem so, yes. If you scrape too hard, Facebook starts to serve the message "You're Temporarily Blocked" in the HTML. I even prepped some code like

if "Temporarily Blocked" in raw_page.text:
    logger.error("Temporarily blocked by Facebook")

But I realised it wouldn't do much good for the average user, as the default log handler is NullHandler.
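
For anyone who wants to see such a message anyway, here is a minimal sketch of the idea using only the standard `logging` module. The helper names `is_temporarily_blocked` and `enable_scraper_logging` are hypothetical, and the logger name `"facebook_scraper"` is an assumption (it presumes the library logs under its package name), so adjust it if that turns out not to be the case.

```python
import logging

# Phrase quoted in the comment above; Facebook could change the wording.
BLOCK_MARKER = "Temporarily Blocked"


def is_temporarily_blocked(html: str) -> bool:
    """Return True if the page text contains the block notice."""
    return BLOCK_MARKER in html


def enable_scraper_logging(logger_name: str = "facebook_scraper",
                           level: int = logging.INFO) -> logging.Logger:
    """Attach a stderr handler so records sent to this logger become visible.

    The default logger name is an assumption, not something confirmed here.
    """
    logger = logging.getLogger(logger_name)
    # NullHandler is not a StreamHandler, so it does not count as "already set up".
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    logger = enable_scraper_logging()
    sample_html = "<title>You're Temporarily Blocked</title>"  # made-up sample page
    if is_temporarily_blocked(sample_html):
        logger.error("Temporarily blocked by Facebook")
```

Calling `enable_scraper_logging()` once at the start of a script should be enough; calling it again does not add a second handler.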

@sunboy123

Could this problem be solved with an IP proxy?

@neon-ninja
Collaborator

Yes

@adarsh2104
Author

adarsh2104 commented Apr 22, 2021

Even after using HTTP proxy rotation plus user agent rotation (fake user agent), I am still not able to prevent "You're Temporarily Blocked" in the HTML. I am not using any login credentials or cookies. The 22 proxy addresses I use for rotation are valid: I tested them with the proxy checker package and verified that each proxy is applied by sending a request to "https://ipinfo.io".
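
For reference, the kind of rotation described here can be sketched roughly as below. This is not the code actually used in this comment: the function name `rotating_request_settings` is hypothetical, the proxy addresses and user agent strings are placeholders, and the `{"http": ..., "https": ...}` dict is simply the shape that the `requests` library accepts for its `proxies` argument. Note that, as reported in this comment, rotation alone did not prevent the block.

```python
import itertools
from typing import Dict, Iterator, Sequence, Tuple

Settings = Tuple[Dict[str, str], Dict[str, str]]


def rotating_request_settings(proxies: Sequence[str],
                              user_agents: Sequence[str]) -> Iterator[Settings]:
    """Return an endless iterator of (proxies, headers) pairs.

    Proxies and user agents are cycled independently, round-robin.
    """
    if not proxies or not user_agents:
        raise ValueError("need at least one proxy and one user agent")
    pairs = zip(itertools.cycle(proxies), itertools.cycle(user_agents))
    return (
        ({"http": proxy, "https": proxy}, {"User-Agent": agent})
        for proxy, agent in pairs
    )


if __name__ == "__main__":
    # Placeholder values for illustration only (documentation IP range).
    demo_proxies = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]
    demo_agents = ["ExampleAgent/1.0", "ExampleAgent/2.0", "ExampleAgent/3.0"]
    settings = rotating_request_settings(demo_proxies, demo_agents)
    for proxy_map, headers in itertools.islice(settings, 3):
        # Each pair would be passed to one outgoing request.
        print(proxy_map["http"], headers["User-Agent"])
```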

@neon-ninja
Collaborator

I think you must be scraping too hard anonymously. Either reduce the number of requests you're making per hour or pass cookies.
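
One way to reduce the request rate is to enforce a minimum delay between requests. The sketch below is only an illustration: the `Throttle` class and `throttled` helper are hypothetical names, not part of facebook_scraper, and a safe interval is not known, so any value you pick is a guess to tune.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep just long enough to honour min_interval; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment this call was allowed to proceed.
        self._last = now + delay
        return delay


def throttled(items: Iterable[T], throttle: Throttle) -> Iterator[T]:
    """Yield items one by one, waiting on the throttle before each."""
    for item in items:
        throttle.wait()
        yield item
```

For example, `for page in throttled(["reidandtaylor", "zodiacclothing"], Throttle(60.0)):` would leave at least 60 seconds between the starts of two consecutive iterations; the scraping call itself would go inside the loop body.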

@ghost

ghost commented May 9, 2021

I have the same issue: #245
Changing IP helps, but after some time it starts returning nothing again.

I guess it's how FB works: if you visit a page with a new browser or in incognito mode it may work, but after some time it stops working and requires you to log in to view the page, even for public pages.

@enaserianhanzaei

@adarsh2104

Could you please explain how you added the proxy? I have no idea why it doesn't work for me. I'm getting this error:

HTTPConnectionPool(host='178.212.54.137', port=8080): Max retries exceeded with url: http://ifconfig.co/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f800ab2de48>, 'Connection to 178.212.54.137 timed out. (connect timeout=5)'))

Many thanks,

@neon-ninja
Collaborator

This proxy seems to trip some sort of Cloudflare protection on ifconfig.co. Give this a try: 43cecdd

@vcuspinera

Hi adarsh2104,
I had the same problem until I noticed that the FacebookScraper class has a method for logging into your Facebook account: login(self, email: str, password: str). See line 959 of the Python script of the facebook_scraper library.

So, in general what you should do is something like this:

# call Class from library
from facebook_scraper import FacebookScraper

# Create a class object
my_scrapy = FacebookScraper()

# login on facebook
my_scrapy.login(email="write_your_email_here", password="write_your_password_here")

# now we do not have problems to get posts from "reidandtaylor"
posts_bk = my_scrapy.get_posts("reidandtaylor", pages=3)

i = 0
for post in posts_bk:
    if i<3:
        print(post,"\n")
        i = i+1
    else:
        break


6 participants