Analysis of Facebook recommended (related) pages

This project extracts data from Facebook pages and mines their divergent attitudes towards climate change. It is part of my girlfriend's thesis, for which I helped scrape and process the data. Because the code was not expected to be reused, little care was given to its cleanliness and readability (tbh, it is ugly). 😥 The thesis is available here, unfortunately only in Czech. The English abstract is available here. In short, the thesis tries to indirectly analyze Facebook's page recommendation algorithm and its effect on the creation of information bubbles in relation to the climate crisis.

Summary of steps

1. For each of the 20 initially selected pages, all recommended pages were scraped.
2. Step 1 was repeated for the scraped pages, i.e., two rounds of scraping were done.
3. Pages that were not relevant to climate change were removed, using a custom tf-idf-like score and a hand-picked threshold.
4. The remaining climate change-related pages were manually annotated with their attitude towards climate change.
5. A simple analysis of these annotated pages and their relationships was performed to see whether FB's recommendation algorithm helps break information bubbles.
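
The relevance filter in step 3 could look roughly like the sketch below. This is not the score used in the notebooks (that one is custom and not described here); it is a plain smoothed tf-idf sum over a keyword list. The names `KEYWORDS`, `THRESHOLD` and `relevance_scores` and all their values are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical keyword list and threshold; the project's real values are not given here.
KEYWORDS = {"climate", "warming", "emissions"}
THRESHOLD = 0.05


def relevance_scores(page_texts):
    """Return {page: score}, where score sums tf * idf over the keywords."""
    tokens = {page: text.lower().split() for page, text in page_texts.items()}
    n_pages = len(tokens)
    # Document frequency: in how many pages each keyword occurs.
    df = Counter(kw for words in tokens.values() for kw in KEYWORDS & set(words))
    scores = {}
    for page, words in tokens.items():
        counts = Counter(words)
        total = len(words) or 1  # avoid division by zero for empty pages
        scores[page] = sum(
            # Term frequency times a smoothed inverse document frequency.
            (counts[kw] / total) * (math.log((1 + n_pages) / (1 + df[kw])) + 1)
            for kw in KEYWORDS
        )
    return scores


pages = {
    "page_a": "climate warming is real climate",
    "page_b": "football scores today",
}
scores = relevance_scores(pages)
relevant = [page for page, score in scores.items() if score >= THRESHOLD]
```

With this toy input only `page_a` passes the threshold. In practice the threshold has to be picked by hand after looking at the score distribution, as was done in the project.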

Result

It's not that bad. FB tends to recommend pages that do not deny climate change a little more often.
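
A comparison like this can be computed by counting how often recommendations point to pages with each label. The sketch below is only an illustration of that idea, not the analysis from the notebooks; the function name, the label names and the toy data are assumptions.

```python
from collections import Counter


def recommended_label_shares(edges, labels):
    """Share of recommendations that point to a page with each label.

    edges  -- iterable of (page, recommended_page) pairs
    labels -- dict mapping page -> manually assigned label
    """
    # Only count recommendations whose target page has been annotated.
    counts = Counter(labels[target] for _, target in edges if target in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Toy data with hypothetical label names.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
labels = {"b": "denial", "c": "non-denial"}
shares = recommended_label_shares(edges, labels)
```

Here two of the three recommendations lead to a "non-denial" page, so its share is 2/3.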

Future work

There is plenty of room for improvement. The project relies on some simplifying assumptions; for example, all recommended pages are treated as equivalent. However, to get to the last recommended page, the user has to click through, so the recommendations are clearly not equivalent and should be reweighted.
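
One possible way to reweight, sketched here under the assumption that the order of the recommended pages is known, is a geometric decay by position. Nothing like this is implemented in the repository; `position_weights` and the `decay` parameter are hypothetical, and the right decay value would have to be justified separately.

```python
def position_weights(n_recommended, decay=0.8):
    """Weights for recommendations at positions 0..n-1, normalized to sum to 1.

    Each later position gets `decay` times the weight of the previous one,
    reflecting that the user has to click further to see it.
    """
    raw = [decay ** i for i in range(n_recommended)]
    total = sum(raw)
    return [w / total for w in raw]


weights = position_weights(3, decay=0.5)
```

For three recommendations and `decay=0.5` the raw weights are 1, 0.5 and 0.25, so the first page gets 4/7 of the total weight and the last one 1/7.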

Repository description:

data

init_page_label.csv: The initial 20 pages, manually annotated by their relation to climate change.
labeled_pages.csv: Pages recommended by FB for the pages in init_page_label, manually annotated by their relation to climate change.
labeled_posts.csv: Posts of the relevant pages, downloaded using CrowdTangle and labeled by their stance on climate change.
uniq_links1.txt: List of unique initial pages (their URLs).
uniq_links2.txt: List of unique pages recommended for the uniq_links1 pages (their URLs).
uniq_relation_data1.csv: Initial pages and their related (recommended) pages. The first column contains the initial pages and the remaining columns contain the pages recommended for them.
uniq_relation_data2.csv: Pages recommended for the initial pages, together with the pages recommended for those. The first column contains the pages recommended for the initial pages and the remaining columns contain the pages recommended for them. For a better understanding, see the data collection scheme image (Sber dat) below.
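
The two relation files share the "first column is the source page, the rest are its recommendations" layout, so they can be turned into an edge list with a few lines. The sketch below assumes comma-separated rows without a header line, which may not match the real files exactly; `read_relations` is a hypothetical helper, not part of the repository.

```python
import csv
import io


def read_relations(lines):
    """Yield (source, recommended) pairs from rows of 'source,rec1,rec2,...'."""
    for row in csv.reader(lines):
        if not row:
            continue
        source, *recommended = (cell.strip() for cell in row)
        for target in recommended:
            if target:  # rows may be padded with empty cells
                yield source, target


# Toy input standing in for a relation file.
sample = io.StringIO("pageA,pageB,pageC\npageB,pageD,\n")
edges = list(read_relations(sample))
```

For a real file, pass an open file object instead of the `io.StringIO` sample, e.g. `open(path, newline="")`.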

src

notebooks: directory with various experiments and calculations

10-data-wrangling-analysis.ipynb: various data wrangling, simple analysis and csv files preparation
20-page-classification.ipynb: calculation of relevancy scores and choosing only the climate change relevant pages
30-visualize.ipynb: initial attempts at plotly graph visualizations
3*-visualize-labeled-pages*.ipynb: final plotly graph visualizations
experimental-crowdtangle-posts.ipynb: experimental notebook - looking at the data downloaded from CrowdTangle
experimental-text-analysis.ipynb: experimental notebook - text analysis playground (not used in the end)

scrapers: directory with scrapers

pages_content.py: script for scraping the posts for given pages
related_pages.py: script for scraping the relations (recommendations) between the pages
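
The two rounds of scraping from the summary of steps amount to a breadth-first walk with a fixed depth. The sketch below shows only that control flow and is not the code of related_pages.py. The actual scraping is hidden behind `fetch_related`, a hypothetical callable that returns the recommended pages for a given page.

```python
def crawl_related(seeds, fetch_related, rounds=2):
    """Collect (page, recommended_page) pairs over a fixed number of rounds."""
    edges = []
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(rounds):
        next_frontier = []
        for page in frontier:
            for rec in fetch_related(page):
                edges.append((page, rec))
                if rec not in seen:  # visit each page at most once
                    seen.add(rec)
                    next_frontier.append(rec)
        frontier = next_frontier
    return edges


# Toy graph instead of real scraping.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
edges = crawl_related(["a"], lambda page: graph.get(page, []), rounds=2)
```

With `rounds=2` the page `d` is discovered but its own recommendations are not fetched, which mirrors stopping after the second round.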

text: directory with text and figures of diploma thesis
interactive_graph_labeled.html: several versions of interactive plotly graph visualization of pages relations

Data collection scheme
