# 1. Script for extracting FB data

## Set Up

In [None]:
from facebook_scraper import get_posts
import numpy as np
import pandas as pd
import requests
from time import sleep
import datetime

In [None]:
pd.set_option('display.max_colwidth', None)

## Extracting comments

### Import selected FB posts data

As mentioned in the Reddit extraction script, I only wanted posts from Facebook and Reddit that shared the same Straits Times or Channel News Asia article reporting Ministry of Health (MOH) announcements on the imposition or tightening of restrictions. Links to the Facebook posts with matching Reddit posts were entered manually into `posts_reddit_fb_selected.csv`.

In [None]:
fb_selected = pd.read_csv('../data/posts_reddit_fb_selected.csv')

In [None]:
fb_selected

Since there are repeated links in the `full_link_fb` column, I will get the set of unique links and output it as a list.

In [None]:
fb_links = list(fb_selected['full_link_fb'].unique())
print(len(fb_links))
print(fb_links[0])
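One caveat with the step above: `unique()` keeps a missing value as an entry if the column has any blank cells. A minimal sketch on made-up data, assuming blank links should be dropped (the `mock` frame and example URLs are illustrative, not taken from the real CSV):

```python
import pandas as pd

# Made-up stand-in for the `full_link_fb` column: one repeat, one missing value.
mock = pd.DataFrame({
    "full_link_fb": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/a",
        None,
    ]
})

# dropna() removes the missing entry; unique() keeps first-appearance order.
unique_links = mock["full_link_fb"].dropna().unique().tolist()
print(unique_links)
```

If the real column has no missing values, this gives the same result as the cell above.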

### Get comments for each post

I used the [facebook-scraper Python package](https://pypi.org/project/facebook-scraper/) to extract all the comments from each post using the URL for the post.

To better understand how the results were structured, I extracted comments from the first post in the list:

In [None]:
# import json
# import logging

# from facebook_scraper import get_posts, enable_logging

# enable_logging(logging.DEBUG)
# logging.basicConfig(filename="logs.txt", filemode='w', level=logging.DEBUG)

In [None]:
posts = get_posts(post_urls=[fb_links[0]], 
                  cookies='from_browser', 
                  options={'comments': True, 'allow_extra_requests': False, 'posts_per_page': 200})

for p in posts:
    print(p)

In [None]:
posts = get_posts(post_urls=[fb_links[0]], 
                  cookies='from_browser', 
                  options={'comments': True, 'allow_extra_requests': False, 'posts_per_page': 200})

for p in posts:
    print(p['comments'])
    print(len(p['comments_full']))

From the above we can see that:
- According to the `comments` key, the first post has 502 comments.
- Data on each comment is stored as nested JSON under the `comments_full` key. `comments_full` holds only 252 comments, so these are the top-level comments; the remainder are replies nested under the `replies` key of each comment in `comments_full`.
- Replies would have to be extracted by iterating through each comment in `comments_full`. This is extremely time-consuming and tricky because Facebook tightly restricts scraping behaviour, so long sleep times are needed between each comment/reply extraction to avoid an account ban. As such, I will **not** extract replies and will extract only top-level comments.
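The relationship between the total count and the nested structure can be sketched on mock data. The dictionary below is invented for illustration; only the key names (`comments`, `comments_full`, `comment_id`, `comment_text`, `replies`) follow the structure described above, and `count_comments` is a hypothetical helper, not part of facebook-scraper.

```python
# Mock post shaped like the scraper output described above (values are made up).
mock_post = {
    "comments": 5,  # total reported for the post, replies included
    "comments_full": [
        {"comment_id": "1", "comment_text": "First", "replies": [
            {"comment_id": "1a", "comment_text": "Reply A"},
            {"comment_id": "1b", "comment_text": "Reply B"},
        ]},
        {"comment_id": "2", "comment_text": "Second", "replies": []},
        {"comment_id": "3", "comment_text": "Third", "replies": []},
    ],
}


def count_comments(post):
    """Return (number of top-level comments, number of nested replies)."""
    top_level = list(post["comments_full"])
    # Treat a missing or empty `replies` value as zero replies.
    n_replies = sum(len(c.get("replies") or []) for c in top_level)
    return len(top_level), n_replies


n_top, n_replies = count_comments(mock_post)
print(n_top, n_replies, n_top + n_replies == mock_post["comments"])
```

On the real first post, the same check would compare the 252 top-level comments plus their replies against the reported 502, assuming the replies were actually fetched.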

The following code extracts top-level comments from the 29 Facebook posts, five posts at a time. The cell below covers the first batch (links 0 to 4).

In [None]:
comments = {'comment_id': [], 'text': []}

# Process one batch of links per run; this run covers links 0 to 4.
batch_start, batch_end = 0, 5

for link in fb_links[batch_start:batch_end]:
    for post in get_posts(post_urls=[link],
                          cookies='from_browser',
                          timeout=180,
                          options={'comments': 'generator', 'progress': True,
                                   'allow_extra_requests': False, 'posts_per_page': 200}):

        # With 'comments': 'generator', comments are fetched lazily while iterating.
        for comment in post['comments_full']:
            comments['comment_id'].append(str(comment['comment_id']))
            comments['text'].append(str(comment['comment_text']))
            sleep(3)  # pause between comments to reduce the risk of a ban

    sleep(30)  # longer pause between posts
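The loop above handles a single batch of five links. A minimal sketch of how the index bounds for every batch could be computed, assuming the later batches were run the same way with different bounds (`batch_bounds` is a hypothetical helper, not part of the original script):

```python
def batch_bounds(n_items, batch_size):
    """Return (start, stop) index pairs that together cover range(n_items)."""
    return [
        (start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]


# 29 links in batches of 5 gives six batches; the last one holds 4 links.
bounds = batch_bounds(29, 5)
for batch_number, (start, stop) in enumerate(bounds, start=1):
    print(batch_number, start, stop)
```

Each `(start, stop)` pair can be used as a slice, for example `fb_links[start:stop]`, and the batch number as the suffix of the output CSV.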

In [None]:
comments_df_1 = pd.DataFrame(comments)
# File name follows the comments_fb_<batch>.csv pattern read back in the concatenation step
comments_df_1.to_csv('../data/comments_fb_1.csv', encoding='utf-8-sig')

In [None]:
comments_df_1.head()

### Concatenating all comments

In [None]:
comments_df_1 = pd.read_csv('../data/comments_fb_1.csv', index_col=0)
comments_df_2 = pd.read_csv('../data/comments_fb_2.csv', index_col=0)
comments_df_3 = pd.read_csv('../data/comments_fb_3.csv', index_col=0)
comments_df_4 = pd.read_csv('../data/comments_fb_4.csv', index_col=0)
comments_df_5 = pd.read_csv('../data/comments_fb_5.csv', index_col=0)
comments_df_6 = pd.read_csv('../data/comments_fb_6.csv', index_col=0)

In [None]:
# Only the six batch files loaded above are concatenated (29 posts in batches of 5)
comments_df_list = [comments_df_1, comments_df_2, comments_df_3,
                    comments_df_4, comments_df_5, comments_df_6]
comments_dfs = pd.concat(comments_df_list, axis=0, ignore_index=True)

In [None]:
comments_dfs.shape

In [None]:
comments_dfs.to_csv('../data/comments_fb_all.csv', encoding='utf-8-sig')
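The load-and-concatenate steps can also be written as a loop over batch numbers, which avoids naming each DataFrame by hand. This is a sketch under assumptions: `load_batches` is a hypothetical helper, the demonstration runs on throwaway files in a temporary directory rather than the real data, and reading `comment_id` as a string is a suggested safeguard for long numeric IDs.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_batches(data_dir, n_batches):
    """Read comments_fb_1.csv ... comments_fb_<n_batches>.csv and stack them."""
    frames = [
        pd.read_csv(
            Path(data_dir) / f"comments_fb_{i}.csv",
            index_col=0,
            encoding="utf-8-sig",
            dtype={"comment_id": str},  # keep IDs as text, not integers
        )
        for i in range(1, n_batches + 1)
    ]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demonstration on made-up batch files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(1, 4):
        pd.DataFrame(
            {"comment_id": [f"{i}0", f"{i}1"], "text": ["a", "b"]}
        ).to_csv(Path(tmp) / f"comments_fb_{i}.csv", encoding="utf-8-sig")
    demo = load_batches(tmp, 3)

print(demo.shape)
```

With the real data, the call would presumably be `load_batches('../data', 6)`.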