<h1 style="text-align: center;">Scraping Facebook group posts with the <i><u>facebook_scraper</u></i> library</h1>

* Aitrouga Abdelilah
* Bastos Otmane
* Bayacine Jamal
* Ouomar Abdessamad

Data scraping, also known as web scraping, is the process of automatically extracting data from websites or other sources. This data can be used for a variety of purposes, such as **research**, **analysis**, and **machine learning**.

Many tools and techniques exist for data scraping, including web scraping libraries and frameworks such as Beautiful Soup and Scrapy. In this work we use **facebook_scraper** to extract data from Facebook groups for social network analysis.

**facebook_scraper** is a Python library that allows developers to easily extract and manipulate data from Facebook.

However, keep in mind that data scraping can potentially **violate Facebook's terms of service**, and it is essential to be mindful of ethical considerations when scraping data from the platform.

Facebook scraper link:
https://pypi.org/project/facebook-scraper/

In [None]:
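# Import only the functions used in this notebook instead of a wildcard import
from facebook_scraper import (
    get_friends,
    get_group_info,
    get_posts,
    get_profile,
    use_persistent_session,
)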
import numpy as np
import pandas as pd

In [None]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [None]:
# If the group is private, log in with your email and password

em = "e_mail@email.com"
pw = "password"
use_persistent_session(email=em, password=pw, cookies_file_path="cookies.pckl")
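
Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the credentials are stored in two environment variables; the variable names `FB_EMAIL` and `FB_PASSWORD` and the helper `load_credentials` are made up for this example and are not part of facebook_scraper:

```python
import os


def load_credentials(env=os.environ):
    """Return (email, password) read from environment variables.

    FB_EMAIL and FB_PASSWORD are hypothetical names chosen for this sketch.
    """
    email = env.get("FB_EMAIL")
    password = env.get("FB_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set FB_EMAIL and FB_PASSWORD before logging in")
    return email, password
```

The returned pair can then be passed to `use_persistent_session` in place of the literal `em` and `pw` values.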

In [None]:
# IDs or URLs of the groups that you want to scrape

url = ["ID1", "ID2"]  # add more group IDs or URLs as needed
groups = []

# Let's get some information about the groups

for u in url:
    group = get_group_info(group=u)
    groups.append(group)
groups_info = pd.DataFrame(groups)
groups_info

In [None]:
# Now let's get the posts of each group. get_posts returns a generator, so the
# scraping itself happens when the posts are iterated and may take several minutes

options = {"comments": True, "progress": True, "reactors": True}
posts1 = get_posts(group=groups_info['id'][0], pages=100, options=options)
posts2 = get_posts(group=groups_info['id'][1], pages=100, options=options)
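
Because the posts arrive lazily, it can be useful to pull only a handful first and inspect them before committing to a long scrape. A small sketch of that idea using `itertools.islice`; `fake_posts` is a stand-in generator invented for this example so the code runs without network access, and `take` is a hypothetical helper:

```python
from itertools import islice


def fake_posts(n):
    """Stand-in for a lazy post generator; yields n dicts one at a time."""
    for i in range(n):
        yield {"post_id": str(i), "text": f"post {i}"}


def take(posts, limit):
    """Consume at most `limit` items from an iterator and return them as a list."""
    return list(islice(posts, limit))


sample = take(fake_posts(1000), 5)
print(len(sample))  # 5
```

With a real generator, `take(posts1, 5)` would consume the first five posts from `posts1`, so the generator should be recreated before a full run.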

In [None]:
# Extract posts. Each generator is consumed on its own: iterating both with
# zip() would stop at the shorter one and silently drop posts from the other

all_posts_grp1 = list(posts1)
all_posts_grp2 = list(posts2)
print("Number of posts found for group {}: {}".format(groups_info['id'][0], len(all_posts_grp1)))
print("Number of posts found for group {}: {}".format(groups_info['id'][1], len(all_posts_grp2)))

In [None]:
data_posts1 = pd.DataFrame(all_posts_grp1)
data_posts2 = pd.DataFrame(all_posts_grp2)

In [None]:
data_posts1

In [None]:
data_posts2

In [None]:
# Check how many null values we've got from scraping posts

print("--- Group ID : {} ---\n".format(groups_info['id'][0]))
print(data_posts1.isna().sum())
print("\n--- Group ID : {} ---\n".format(groups_info['id'][1]))
print(data_posts2.isna().sum())
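
The same check can be run on the raw list of post dictionaries before a DataFrame is built, which helps when deciding which fields are worth keeping. A rough sketch, assuming each post is a plain `dict` and that a missing field shows up either as `None` or as an absent key; `missing_counts` is a hypothetical helper written for this example:

```python
def missing_counts(posts):
    """Count, per key, how many post dicts have None for it or lack it entirely."""
    keys = set()
    for post in posts:
        keys.update(post)
    counts = {key: 0 for key in keys}
    for post in posts:
        for key in keys:
            if post.get(key) is None:
                counts[key] += 1
    return counts
```

For example, `missing_counts(all_posts_grp1)` would return a dictionary mapping each field name to its number of missing values.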

In [None]:
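# If you want to get information about the users who posted in either group

# Collect each distinct poster once (assumes both DataFrames have a 'user_id' column)
user_ids = pd.concat([data_posts1['user_id'], data_posts2['user_id']]).dropna().unique()
profils = []
for i in user_ids:
    profils.append(get_profile(str(i)))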

In [None]:
data_profiles = pd.DataFrame(profils)
data_profiles
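
Fetching one profile per post can send many requests in a short time and repeat work for users who posted more than once. A hedged sketch of a loop that skips duplicates, pauses between calls, and keeps going when a single profile fails; `fetch_profiles` and its `delay_seconds` parameter are assumptions made for this example, and the fetch function is passed in so the helper can be tried without network access:

```python
import time


def fetch_profiles(user_ids, fetch, delay_seconds=1.0):
    """Call fetch(user_id) once per distinct id, pausing between calls."""
    profiles = []
    seen = set()
    for uid in user_ids:
        if uid is None or uid in seen:
            continue
        seen.add(uid)
        try:
            profiles.append(fetch(str(uid)))
        except Exception as exc:  # keep going if one profile cannot be read
            print(f"skipped {uid}: {exc}")
        time.sleep(delay_seconds)
    return profiles
```

A possible call would be `fetch_profiles(data_posts1['user_id'].dropna(), get_profile)`; the right delay depends on how aggressively Facebook rate-limits the account.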

In [None]:
# If you want to get the friends of a user; this only works if the friends list is public

frnds = get_friends("USER_ID")
list(frnds)
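
For social network analysis, a friends list is usually converted into an edge list. A minimal sketch, assuming each friend is returned as a dictionary with an `"id"` key; that key name is an assumption about the scraper's output and should be checked against real results, and `friend_edges` is a hypothetical helper:

```python
def friend_edges(user_id, friends):
    """Turn an iterable of friend dicts into (user_id, friend_id) pairs."""
    edges = []
    for friend in friends:
        friend_id = friend.get("id")  # assumed key name, verify on real data
        if friend_id is not None:
            edges.append((str(user_id), str(friend_id)))
    return edges
```

The resulting pairs can be written to a file or loaded into a graph library for further analysis.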

### Saving data

In [None]:
from pathlib import Path

# One CSV file per group; create the parent directories if they do not exist
filepath1 = Path('group1.csv')
filepath2 = Path('group2.csv')
filepath1.parent.mkdir(parents=True, exist_ok=True)
filepath2.parent.mkdir(parents=True, exist_ok=True)
data_posts1.to_csv(filepath1)
data_posts2.to_csv(filepath2)
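
CSV stores every cell as text, so nested values such as lists of comments or reactors end up as their string representation and are awkward to parse back. A sketch of a complementary export to JSON Lines using only the standard library; `save_jsonl` and `load_jsonl` are hypothetical helpers, and values that JSON cannot represent (for example `datetime` objects) are converted to strings:

```python
import json


def save_jsonl(posts, path):
    """Write one JSON object per line; non-JSON types are stored as strings."""
    with open(path, "w", encoding="utf-8") as fh:
        for post in posts:
            fh.write(json.dumps(post, ensure_ascii=False, default=str) + "\n")


def load_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For instance, `save_jsonl(all_posts_grp1, "group1.jsonl")` would keep the nested structure of each post intact; the file name is only an illustration.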