# Fact-Checking Facebook Politics Pages: Analysis

This notebook is a copy of [the original notebook in the GitHub repo][1], provided to help you get started with this dataset. Click "Fork Notebook" to modify or extend any part of it.

See [this page](https://www.kaggle.com/buzzfeed/fact-checking-facebook-politics-pages) for additional context.


  [1]: https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/blob/master/notebooks/facebook-fact-check.ipynb

## Prepare data

In [None]:
import pandas as pd

In [None]:
def percentify(x):
    return (x * 100).round(1).astype(str) + "%"  # fractions -> strings like "12.3%"

In [None]:
posts = pd.read_csv("../input/facebook-fact-check.csv")

In [None]:
len(posts)

In [None]:
ENGAGEMENT_COLS = [
    "share_count",
    "reaction_count",
    "comment_count"
]

In [None]:
RATINGS = ["mostly false", "mixture of true and false", "mostly true", "no factual content"]
FACTUAL_RATINGS = ["mostly false", "mixture of true and false", "mostly true"]

In [None]:
category_grp = posts.groupby("Category")
page_grp = posts.groupby([ "Category", "Page" ])
type_grp = posts.groupby([ "Category", "Page", "Post Type" ])

## Rating by category

Counts:

In [None]:
rating_by_category = category_grp["Rating"].value_counts().unstack()[RATINGS].fillna(0)
rating_by_category["total"] = rating_by_category.sum(axis=1)
rating_by_category

Percentages, of all posts:

In [None]:
(rating_by_category[RATINGS].T / rating_by_category[RATINGS].sum(axis=1)).pipe(percentify)

Percentages, of posts not rated "no factual content":

In [None]:
(rating_by_category[FACTUAL_RATINGS].T / rating_by_category[FACTUAL_RATINGS].sum(axis=1)).pipe(percentify)
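
The percentage cells above transpose the counts table and divide it by the row totals, which leaves ratings on the rows and categories on the columns. As a minimal sketch on made-up counts (the `toy` frame and its numbers are hypothetical, not taken from the dataset), the same shares can also be computed without transposing, using `DataFrame.div(..., axis=0)`:

```python
import pandas as pd

# Hypothetical counts, for illustration only (not from the dataset).
toy = pd.DataFrame(
    {"mostly false": [1, 3], "mostly true": [9, 7]},
    index=pd.Index(["a", "b"], name="Category"),
)

row_totals = toy.sum(axis=1)

# Notebook approach: transpose, then divide; the columns of the
# transposed table align with the index of the totals.
shares_transposed = toy.T / row_totals

# Alternative: divide each row by its own total, keeping the orientation.
shares = toy.div(row_totals, axis=0)

print(shares)
```

Both forms give the same numbers; they differ only in whether the categories end up on the rows or on the columns.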

## Rating by page

Counts:

In [None]:
rating_by_page = page_grp["Rating"].value_counts().unstack()[RATINGS].fillna(0)
rating_by_page["total"] = rating_by_page.sum(axis=1)
rating_by_page

Percentages, of all posts:

In [None]:
(rating_by_page[RATINGS].T / rating_by_page[RATINGS].sum(axis=1)).pipe(percentify)

Percentages, of posts not rated "no factual content":

In [None]:
(rating_by_page[FACTUAL_RATINGS].T / rating_by_page[FACTUAL_RATINGS].sum(axis=1)).pipe(percentify)

## Number of posts by date

Counts:

In [None]:
posts_by_date_by_category = category_grp["Date Published"].value_counts().unstack()
posts_by_date_by_category["Avg. Per Day"] = posts_by_date_by_category.mean(axis=1).round(0)
posts_by_date_by_category

In [None]:
posts_by_date_by_page = page_grp["Date Published"].value_counts().unstack()
posts_by_date_by_page["Avg. Per Day"] = posts_by_date_by_page.mean(axis=1).round(0)
posts_by_date_by_page
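
One thing to keep in mind when reading "Avg. Per Day": `unstack()` leaves `NaN` for any date on which a group has no posts, and `mean()` skips `NaN` by default, so the figure averages only over days with at least one post. Whether any group in this dataset actually has such gaps is not checked here. A minimal sketch with made-up counts (the `toy` frame, its page names, and its day labels are hypothetical) shows how this differs from treating missing days as zero:

```python
import numpy as np
import pandas as pd

# Hypothetical daily post counts; NaN means no posts on that day.
toy = pd.DataFrame(
    {"day_1": [4.0, 2.0], "day_2": [6.0, np.nan]},
    index=["page_a", "page_b"],
)

avg_active_days = toy.mean(axis=1)         # NaN skipped: page_b -> 2.0
avg_all_days = toy.fillna(0).mean(axis=1)  # NaN as 0:    page_b -> 1.0

print(avg_active_days, avg_all_days, sep="\n")
```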

## Rating by post type

In [None]:
rating_by_post_type = type_grp["Rating"].value_counts().unstack()[RATINGS].fillna(0)
rating_by_post_type["total"] = rating_by_post_type.sum(axis=1)
rating_by_post_type

# Engagement

Count of missing engagement figures:

In [None]:
posts[ENGAGEMENT_COLS].isnull().sum()

## Median engagement by page

In [None]:
page_grp[ENGAGEMENT_COLS].median().round()

## Average engagement by page

In [None]:
page_grp[ENGAGEMENT_COLS].mean().round()
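
The notebook reports both medians and averages. A likely reason (an assumption, not stated in the source) is that engagement counts tend to be heavily skewed, so a few viral posts can pull the mean far above what a typical post receives. A minimal sketch with made-up share counts:

```python
import pandas as pd

# Hypothetical share counts: four ordinary posts and one viral post.
shares = pd.Series([10, 12, 15, 20, 5000])

print(shares.median())  # middle value, unaffected by the outlier
print(shares.mean())    # pulled upward by the single viral post
```

When the two figures for a page differ widely, the median is usually the better guide to a typical post.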

## Engagement by truthfulness

In [None]:
grp = posts.groupby([ "Category", "Page", "Rating" ])

Counts:

In [None]:
# Group sizes do not depend on the selected columns, so no column selection is needed.
grp.size().unstack().fillna(0)

Medians:

In [None]:
grp[ENGAGEMENT_COLS].median().round()

Averages:

In [None]:
grp[ENGAGEMENT_COLS].mean().round()

## Engagement by post type

Medians:

In [None]:
type_grp[ENGAGEMENT_COLS].median().round()

Averages:

In [None]:
type_grp[ENGAGEMENT_COLS].mean().round()

## Shares by factual vs. no factual content

In [None]:
grp = posts.groupby([ "Category", "Page", posts["Rating"] == "no factual content" ])
pd.DataFrame({
    "median": grp["share_count"].median(),
    "average": grp["share_count"].mean()
}).round().unstack().stack(level=0).rename(columns={True: "no factual content", False: "factual content"})
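
The grouping above passes a boolean Series (`posts["Rating"] == "no factual content"`) as a groupby key alongside column names, so each page is split into a `True` bucket and a `False` bucket. A minimal sketch on made-up rows (the `toy` frame and its values are hypothetical, not from the dataset):

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "Page": ["p1", "p1", "p1", "p2"],
    "Rating": ["mostly true", "no factual content", "mostly true", "no factual content"],
    "share_count": [10, 100, 30, 50],
})

# A boolean Series can sit next to column names in the list of keys.
is_nfc = toy["Rating"] == "no factual content"
medians = toy.groupby(["Page", is_nfc])["share_count"].median()

# Move the boolean level into the columns and give it readable labels.
table = medians.unstack().rename(
    columns={True: "no factual content", False: "factual content"}
)
print(table)
```

A page with no rows in one of the buckets (here `p2` has no factual posts) shows `NaN` in that column after `unstack()`.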

## Shares for mostly-true vs. others for partisan pages

In [None]:
grp = posts.groupby([ "Category", "Page", posts["Rating"] == "mostly true" ])
pd.DataFrame({
    "median": grp["share_count"].median(),
    "average": grp["share_count"].mean()
}).round().unstack().stack(level=0).rename(columns={True: "mostly true", False: "everything else"})\
    [[ "mostly true", "everything else" ]].loc[["left", "right"]]
