Unable to collect posts beyond a certain number #285

Closed · chenxingyuzealken opened this issue May 24, 2021 · 5 comments

chenxingyuzealken commented May 24, 2021

Hi, I'm wondering how I can solve this issue: I only get about 2,000 posts from the code below. I know there are more posts, but I'm not getting them.

from datetime import datetime
from facebook_scraper import get_posts

opt = {}
opt['daterange'] = False  # set to True if you want to limit your search by startDate and endDate
opt['demo'] = False  # when True, log out-of-range posts instead of stopping
opt['startDate'] = datetime.strptime('01/05/19 00:00:00', '%m/%d/%y %H:%M:%S')  # change the time ranges to what you want
opt['endDate'] = datetime.strptime('12/06/22 00:00:00', '%m/%d/%y %H:%M:%S')  # change the time ranges to what you want

page_name = 'ChannelNewsAsia'
fbcookies = {...}  # cookie details
lst = []

for post in get_posts(page_name, cookies=fbcookies, pages=1000000, options={"allow_extra_requests": False}):
    if opt['daterange']:
        if post['time'] < opt['startDate']:
            print('post is earlier than', opt['startDate'], '- stopping collection')
            if not opt['demo']:
                break
        if post['time'] > opt['endDate']:
            print('post is later than', opt['endDate'], '- stopping collection')
            if not opt['demo']:
                break
    lst.append(post)

It stopped collecting posts in 2020, which seems odd to me.

@neon-ninja (Collaborator)

Try increasing the posts_per_page option
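
For reference, a minimal sketch of passing that option (the page name and cookie file are placeholders; the value 200 matches the test code below):

from facebook_scraper import get_posts

# Request more posts per pagination request, so far fewer requests are
# needed to walk back through the page's history.
for post in get_posts("ChannelNewsAsia", cookies="cookies.txt",
                      options={"posts_per_page": 200}):
    print(post["time"])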


neon-ninja commented May 25, 2021

I added some code to retry pagination requests on error (1cc8064). With that change, and this test code:

import time
from facebook_scraper import get_posts

start = time.time()
posts = []
try:
    for post in get_posts("ChannelNewsAsia", cookies="cookies.txt", pages=200, timeout=60, options={"allow_extra_requests": False, "posts_per_page": 200}):
        posts.append(post)
except:
    print(f"{len(posts)} posts retrieved in {round(time.time() - start)}s. Oldest post: {posts[-1].get('time')}")

I get

14201 posts retrieved in 910s. Oldest post: 2013-12-12 13:05:00


neon-ninja commented May 25, 2021

717d522 might also be useful for resuming from the last cursor that errored out; see #287 (comment) for usage
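
A rough sketch of resuming from a saved cursor, assuming the start_url parameter from 717d522 and a pagination URL copied out of the debug log (the cursor value here is a placeholder):

import logging
from facebook_scraper import get_posts, enable_logging

enable_logging(logging.DEBUG)  # pagination URLs appear in the debug output

cursor = "..."  # placeholder: paste the last pagination URL from the log output here

# Resume pagination from that URL instead of starting over from the page's top.
for post in get_posts("ChannelNewsAsia", cookies="cookies.txt",
                      start_url=cursor,
                      options={"allow_extra_requests": False, "posts_per_page": 200}):
    print(post["time"])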

@chenxingyuzealken (Author)

Thanks! I think the combination of:

from facebook_scraper import *
import pandas as pd
import ast
import time
from datetime import datetime

import requests
import logging

enable_logging(logging.DEBUG)

start = time.time()
posts = []
try:
    for post in get_posts("ChannelNewsAsia", cookies="cookies.txt", pages=200, timeout=60, options={"allow_extra_requests": False, "posts_per_page": 200}):
        posts.append(post)
except:
    print(f"{len(posts)} posts retrieved in {round(time.time() - start)}s. Oldest post: {posts[-1].get('time')}")

and this:

cursor = "some URL from the logging output"

posts = []
try:
    for post in get_posts("ChannelNewsAsia", cookies="cookies.txt", pages=200, timeout=60, start_url=cursor, options={"allow_extra_requests": False, "posts_per_page": 200}):
        posts.append(post)
except:
    print(f"{len(posts)} posts retrieved. Oldest post: {posts[-1].get('time')}")

has helped make the process more robust.

Thanks for the help! Your project is amazing!

@neon-ninja (Collaborator)

#291 might be useful
