Comment extraction improvements #225

Merged 3 commits on Apr 27, 2021

Collaborator

@neon-ninja neon-ninja commented Apr 26, 2021

This PR narrows the comment scope so that comments from related posts are no longer extracted by mistake.
It also adds inline extraction of replies (threaded comments). These show up as a 'replies' list within the top-level comment. Sample output:

[{'available': True,
  'comments': 0,
  'comments_full': [{'comment_id': '3246783635424072',
                     'comment_text': 'You should be editing your things before '
                                     'posting',
                     'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Sebastian Moses Moyo',
                     'commenter_url': 'https://facebook.com/sebastian.mosesmoyo?fref=nf&rc=p&refid=52&__tn__=R',
                     'replies': [{'comment_id': '3247712881997814',
                                  'comment_text': 'Sebastian Moses Moyo HELO '
                                                  'MOKA',
                                  'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                                  'commenter_meta': None,
                                  'commenter_name': 'Muhammadh Fashaan',
                                  'commenter_url': 'https://facebook.com/fashanzak?fref=nf&rc=p&__tn__=R'}]},
                    {'comment_id': '3249390275163408',
                     'comment_text': 'Silver all the way',
                     'comment_time': datetime.datetime(2021, 4, 25, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Wakisa Nyondo',
                     'commenter_url': 'https://facebook.com/wakisa.nyondo.9?fref=nf&rc=p&refid=52&__tn__=R'},
                    {'comment_id': '3246798338755935',
                     'comment_text': '21 points from 13 games osati zanuzo',
                     'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Westom Jaguar Bika',
                     'commenter_url': 'https://facebook.com/westom.weruzani?fref=nf&rc=p&refid=52&__tn__=R'},
                    {'comment_id': '3246800085422427',
                     'comment_text': 'Mafco 21 games from 13 games what do u '
                                     'mean?',
                     'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Tsegula Time',
                     'commenter_url': 'https://facebook.com/tsegula.time.9?fref=nf&rc=p&refid=52&__tn__=R'},
                    {'comment_id': '3246908292078273',
                     'comment_text': 'Bulets ma poits angat',
                     'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Lloyd Sato',
                     'commenter_url': 'https://facebook.com/lloyd.sato.378?fref=nf&rc=p&refid=52&__tn__=R'},
                    {'comment_id': '3247896295312806',
                     'comment_text': 'Koma yea who teach u this',
                     'comment_time': datetime.datetime(2021, 4, 24, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': 'Patrick Magombo',
                     'commenter_url': 'https://facebook.com/patrick.magombo.963?fref=nf&rc=p&refid=52&__tn__=R'},
                    {'comment_id': '3248769051892197',
                     'comment_text': 'Mussah Funsani',
                     'comment_time': datetime.datetime(2021, 4, 25, 0, 0),
                     'commenter_meta': None,
                     'commenter_name': '',
                     'commenter_url': 'https://facebook.com/ufi/reaction/profile/browser/?ft_ent_identifier=3246777818757987_3248769051892197&av=100022709408081&refid=52&__tn__=R'}],
  'factcheck': None,
  'image': None,
  'images': None,
  'is_live': False,
  'likes': 0,
  'link': None,
  'post_id': '3246777818757987',
  'post_text': 'TNM SUPER LEAGUE UPDATE\n'
               '\n'
               'League leaders, Silver Strikers will host Chitipa United at '
               'Silver Stadium on Sunday. The Bankers top the standings with '
               '28 points from 12 games.\n'
               '\n'
               'Chitipa United, who lost 1-0 to Kamuzu Barracks on Saturday at '
               'Civo Stadium, are on thirteenth position with 13 points from '
               '14 games.\n'
               '\n'
               'Blue Eagles, who are third from the bottom on the 16 '
               'member-log-table with 12 points from 13 games, will host Mafco '
               'FC at Nankhaka Stadium.\n'
               '\n'
               'Mafco are on 4th position with 21 games from 13 games as well. '
               '#MBCNewsLive',
  'post_url': 'https://facebook.com/story.php?story_fbid=3246777818757987&id=315802248522240',
  'reactors': None,
  'shared_post_id': None,
  'shared_post_url': None,
  'shared_text': '',
  'shared_time': None,
  'shared_user_id': None,
  'shared_username': None,
  'shares': 0,
  'text': 'TNM SUPER LEAGUE UPDATE\n'
          '\n'
          'League leaders, Silver Strikers will host Chitipa United at Silver '
          'Stadium on Sunday. The Bankers top the standings with 28 points '
          'from 12 games.\n'
          '\n'
          'Chitipa United, who lost 1-0 to Kamuzu Barracks on Saturday at Civo '
          'Stadium, are on thirteenth position with 13 points from 14 games.\n'
          '\n'
          'Blue Eagles, who are third from the bottom on the 16 '
          'member-log-table with 12 points from 13 games, will host Mafco FC '
          'at Nankhaka Stadium.\n'
          '\n'
          'Mafco are on 4th position with 21 games from 13 games as well. '
          '#MBCNewsLive',
  'time': datetime.datetime(2021, 4, 25, 5, 42),
  'user_id': '315802248522240',
  'username': 'MBC Malawi',
  'video': None,
  'video_id': None,
  'video_thumbnail': None,
  'w3_fb_url': None}]

As per #220
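
For anyone consuming this output, here is a minimal sketch of walking the nested structure shown above. It relies only on the keys visible in the sample ('comments_full', 'replies', 'commenter_name', 'comment_text'). The helper name `iter_comments` is hypothetical and not part of facebook-scraper, and the `post` dict is a trimmed copy of the sample rather than a live scrape. In the sample, 'replies' appears only on comments that have replies, so the sketch treats it as optional.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_comments(post: Dict[str, Any]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, comment) pairs: depth 0 for top-level comments, 1 for replies.

    Hypothetical helper, not part of facebook-scraper.
    """
    for comment in post.get("comments_full") or []:
        yield 0, comment
        # 'replies' is assumed optional: in the sample output it is only
        # present on comments that actually have threaded replies.
        for reply in comment.get("replies") or []:
            yield 1, reply


# Trimmed copy of the sample output above (most fields omitted).
post = {
    "post_id": "3246777818757987",
    "comments_full": [
        {
            "comment_id": "3246783635424072",
            "commenter_name": "Sebastian Moses Moyo",
            "comment_text": "You should be editing your things before posting",
            "replies": [
                {
                    "comment_id": "3247712881997814",
                    "commenter_name": "Muhammadh Fashaan",
                    "comment_text": "Sebastian Moses Moyo HELO MOKA",
                }
            ],
        },
        {
            "comment_id": "3249390275163408",
            "commenter_name": "Wakisa Nyondo",
            "comment_text": "Silver all the way",
        },
    ],
}

flat = list(iter_comments(post))
for depth, comment in flat:
    # Indent replies under their parent comment.
    print("  " * depth + f"{comment['commenter_name']}: {comment['comment_text']}")
```

Flattening with a depth marker keeps the parent/reply ordering intact, which is handy when writing the comments out to a CSV or a table. A post with 'comments_full' set to None simply yields nothing.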

neon-ninja changed the title from "Adjust comment scope to fix extracting comments from other posts when unauthenticated" to "Comment extraction improvements" on Apr 27, 2021
Contributor

fashan7 commented Apr 27, 2021

@neon-ninja when will it get merged?
@kevinzg

@neon-ninja
Collaborator Author

@fashandatafields if you don't want to wait, you can install this PR like so: pip install git+https://github.com/kevinzg/facebook-scraper.git@refs/pull/225/merge

@kevinzg kevinzg merged commit 0785331 into kevinzg:master Apr 27, 2021
Owner

kevinzg commented Apr 27, 2021

@fashandatafields it has been released as 0.2.30
