<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Importing-data" data-toc-modified-id="Importing-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Importing data</a></span></li><li><span><a href="#Manipulating-dates" data-toc-modified-id="Manipulating-dates-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Manipulating dates</a></span></li><li><span><a href="#Personal-stats-(digital-wellbeing)" data-toc-modified-id="Personal-stats-(digital-wellbeing)-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Personal stats (digital wellbeing)</a></span><ul class="toc-item"><li><span><a href="#Time-spent-using-Facebook" data-toc-modified-id="Time-spent-using-Facebook-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Time spent using Facebook</a></span></li><li><span><a href="#Time-spent-watching-ads" data-toc-modified-id="Time-spent-watching-ads-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Time spent watching ads</a></span></li><li><span><a href="#Most-seen-sources" data-toc-modified-id="Most-seen-sources-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Most seen sources</a></span></li></ul></li><li><span><a href="#Exploratory-Data-Analysis" data-toc-modified-id="Exploratory-Data-Analysis-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Exploratory Data Analysis</a></span><ul class="toc-item"><li><span><a href="#Trending-posts-on-your-feed-based-on-likes" data-toc-modified-id="Trending-posts-on-your-feed-based-on-likes-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Trending posts on your feed based on likes</a></span></li><li><span><a href="#Posts-seen-more-often-and-how-did-they-rank-in-your-feed" data-toc-modified-id="Posts-seen-more-often-and-how-did-they-rank-in-your-feed-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Posts seen more often and how did they rank in your feed</a></span></li><li><span><a href="#Wordclouds" data-toc-modified-id="Wordclouds-4.3"><span 
class="toc-item-num">4.3&nbsp;&nbsp;</span>Wordclouds</a></span></li></ul></li></ul></div>

# Importing data

To run any script, and to query the API in general, you will need a token. A new token is generated every time you install the facebook.tracking.exposed browser extension.

You can use the test one or enter your own. Read this if you don't know how to get your token: link.

Note: to get your data you can use the script fb_download.py or fb_update.py in scripts/, then locate the csv file in outputs/.

In [None]:
summary_path = '../sample_data/users/user_a.csv'

Import the necessary libraries. In this example we commented out the hierarchical configuration used to call scripts from the command line.

In [None]:
import datetime
import os
import sys

import pandas as pd

# Make the repository root importable before importing from src.
sys.path.insert(0, '../')

from src import tools

# from src.lib.config import config

print('Done!')

Now we read the csv downloaded with facebook_download.py. Remember that you can choose the number of entries to retrieve with the parameters --amount and --skip.

In [None]:
df = pd.read_csv(summary_path)
print('Done!')

This is what the data looks like:

In [None]:
from IPython.display import display

display(df.head())

# Manipulating dates

Now you can check the timeframe of the data you pulled.

In [None]:
df = tools.setDatetimeIndex(df)
maxDate = df.index.max()
minDate = df.index.min()
print('Information for timeframe: '+str(minDate)[:-6]+' to '+str(maxDate)[:-6])
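
The helper tools.setDatetimeIndex is project code that is not shown in this notebook. As a rough sketch of the same idea in plain pandas, assuming the csv has an impressionTime column with ISO 8601 timestamps (the column name is borrowed from the charts further down, the rows are made up):

```python
import pandas as pd

# Made-up rows standing in for the real csv; only the column name is borrowed.
sample = pd.DataFrame({
    'impressionTime': ['2019-03-02T10:15:00+00:00',
                       '2019-03-01T08:00:00+00:00',
                       '2019-03-03T21:30:00+00:00'],
    'source': ['Page A', 'Page B', 'Page A'],
})

# Parse the strings into timezone-aware timestamps and use them as a sorted index.
sample.index = pd.to_datetime(sample['impressionTime'], utc=True)
sample = sample.sort_index()

min_date = sample.index.min()
max_date = sample.index.max()
print('Information for timeframe:', min_date, 'to', max_date)
```

With a timezone-aware index like this one, str(min_date) ends in a six-character offset such as +00:00, which is what a [:-6] slice removes.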

Optionally, you can also cut it down to, in this example, the last 7 days only.

In [None]:
start = maxDate-datetime.timedelta(days=7)
end = maxDate
df = tools.setTimeframe(df, str(start), str(end))
print('From '+str(start)+' to '+str(end)+'\n')
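
The helper tools.setTimeframe is also project code. A minimal sketch of an equivalent cut with plain pandas, assuming the frame already has a sorted DatetimeIndex (the daily sample data is invented):

```python
import datetime
import pandas as pd

# Made-up daily index covering ten days, 2019-03-01 to 2019-03-10.
idx = pd.date_range('2019-03-01', periods=10, freq='D', tz='UTC')
sample = pd.DataFrame({'postId': range(10)}, index=idx)

end = sample.index.max()
start = end - datetime.timedelta(days=7)

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
window = sample.loc[start:end]
print(len(window), 'rows between', start, 'and', end)
```

Because both endpoints are included, a 7-day window over daily data keeps 8 rows.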

# Personal stats (digital wellbeing)

## Time spent using Facebook

You can get useful insights for yourself. For example, you can estimate the time you spent on Facebook during that timeframe.

In [None]:
timelines = df.timeline.unique()
total = pd.to_timedelta(0)

for t in timelines:
    ndf = tools.filter(t, df=df, what='timeline', kind='or')
    timespent = ndf.index.max() - ndf.index.min()
    total += timespent

print('Time spent on Facebook in this timeframe: '+str(total))
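
The loop above adds up, for each timeline, the gap between its first and last impression. The same estimate can be sketched with a groupby instead of tools.filter, assuming timeline identifies one browsing session and the index holds the impression timestamps (sample values are made up):

```python
import pandas as pd

sample = pd.DataFrame(
    {'timeline': ['t1', 't1', 't1', 't2', 't2']},
    index=pd.to_datetime([
        '2019-03-01 10:00:00', '2019-03-01 10:04:00', '2019-03-01 10:10:00',
        '2019-03-02 18:00:00', '2019-03-02 18:05:30',
    ]),
)

# Turn the index into a Series so it can be grouped by the timeline column.
stamps = sample.index.to_series()
grouped = stamps.groupby(sample['timeline'])

# Duration of each timeline: last impression minus first impression.
per_timeline = grouped.max() - grouped.min()
total_spent = per_timeline.sum()
print('Time spent on Facebook in this timeframe:', total_spent)
```

A timeline with a single impression contributes zero, so this kind of estimate tends to undercount short sessions.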

## Time spent watching ads

In [None]:
nature = df.nature.value_counts()

# value_counts() omits categories that never occur, so default to 0.
sponsored = nature.get('sponsored', 0)
# Share of sponsored posts among all posts seen.
share = sponsored / nature.sum() if nature.sum() else 0.0

print('{:.2f}% of the posts are sponsored posts.'.format(share * 100))

# total_seconds() keeps whole days, which .seconds would drop.
timeads = int(round(total.total_seconds() * share))
print('You spent an estimate of '+str(datetime.timedelta(seconds=timeads))
      + ' watching ads on Facebook.')
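
To make the arithmetic behind this estimate explicit, here is a small worked sketch. It assumes that every impression takes the same amount of time and that the sponsored share is measured against all impressions; the function name and the numbers are hypothetical.

```python
import datetime


def ad_time_estimate(total_seconds, sponsored, organic):
    """Scale total browsing time by the fraction of sponsored impressions."""
    posts = sponsored + organic
    share = sponsored / posts if posts else 0.0
    return share, datetime.timedelta(seconds=round(total_seconds * share))


# One hour of browsing with 10 sponsored and 90 organic impressions.
share, ad_time = ad_time_estimate(total_seconds=3600, sponsored=10, organic=90)
print('{:.1f}% sponsored, about {} watching ads'.format(share * 100, ad_time))
```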

## Most seen sources

You can also check which sources appear most often in your feed.

In [None]:
n = 5
top = df.source.value_counts().nlargest(n)
print('Top '+str(n)+' sources of information are: \n'+top.to_string())
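
If relative shares are more useful than raw counts, value_counts accepts normalize=True. A short sketch on invented source names:

```python
import pandas as pd

# Invented sources standing in for df.source.
sources = pd.Series(['Page A', 'Page B', 'Page A', 'Page C', 'Page A', 'Page B'])

# Fraction of impressions per source, keeping the two largest.
top_share = sources.value_counts(normalize=True).nlargest(2)
print((top_share * 100).round(1).to_string())
```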

Of course, you can display this data graphically. (Run the cell twice if the chart does not show up.)

In [None]:
%matplotlib inline

top.plot.pie(autopct='%.2f', fontsize=13, figsize=(6, 6))

# Exploratory Data Analysis

You can change the x and y values easily to other columns to see different patterns. Note: You might need to trim the data first.
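
One possible reason to trim is that Altair, by default, refuses to render a chart built from more than 5000 rows. Assuming that is the limit in question, a sketch of a hypothetical helper that keeps only the plotted columns and the most recent rows:

```python
import pandas as pd


def trim_for_chart(frame, columns, max_rows=5000):
    """Keep only the plotted columns and at most max_rows of the latest rows."""
    return frame[columns].tail(max_rows)


# Invented frame that is larger than the limit.
sample = pd.DataFrame({'LIKE': range(6000), 'source': 'Page A', 'unused': 0})
small = trim_for_chart(sample, ['LIKE', 'source'])
print(small.shape)
```

Dropping unused columns also shrinks the data that gets embedded in the notebook output.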

## Trending posts on your feed based on likes

In [None]:
import altair as alt

# for the notebook only (not for JupyterLab) run this command once per session
alt.renderers.enable('notebook')
alt.Chart(df).mark_circle().encode(
    x='impressionTime:T',
    y='LIKE:Q',
    color=alt.Color('source:N', legend=None),
    tooltip=['source:N', 'url:N']
).properties(
    width=800,
    height=400
).interactive()

## Posts seen more often and how did they rank in your feed


In [None]:
df['count'] = df.groupby('postId')['postId'].transform('count')

alt.Chart(df).transform_calculate(
    url='https://www.facebook.com' + alt.datum.permaLink
).mark_circle(size=50).encode(
    y='count:Q',
    x='average(impressionOrder):Q',
    color=alt.Color('source:N', legend=None),
    href='url:N',
    tooltip=['source:N', 'url:N']
).properties(
    width=800,
    height=400
).interactive()
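
To see what the chart aggregates, the sketch below reproduces the count column on a few invented rows and computes the mean impressionOrder per post with pandas instead of Altair:

```python
import pandas as pd

sample = pd.DataFrame({
    'postId': [11, 22, 11, 33, 11, 22],
    'impressionOrder': [1, 2, 5, 3, 9, 4],
})

# Same transform as in the cell above: how many times each post was seen.
sample['count'] = sample.groupby('postId')['postId'].transform('count')

# One row per post: times seen and average position in the feed.
summary = sample.groupby('postId').agg(
    count=('count', 'first'),
    mean_order=('impressionOrder', 'mean'),
)
print(summary)
```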

## Wordclouds

In [None]:
# import necessary modules
import re
import matplotlib
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from stop_words import get_stop_words

matplotlib.rcParams['figure.figsize'] = [20, 10]

stop_words = get_stop_words('es') + get_stop_words('it') + \
    get_stop_words('nl')+get_stop_words('en')


def generate_wordcloud(text):
    wordcloud = WordCloud(font_path='../src/fonts/DejaVuSans.ttf',
                          relative_scaling=1.0,
                          stopwords=set(stop_words),  # set of words to ignore
                          width=2000,
                          height=1000
                          ).generate(text)
    plt.imshow(wordcloud)
    # The figure size comes from matplotlib.rcParams['figure.figsize'] set above.
    plt.axis("off")
    plt.show()


text = df.texts.str.join(sep='').reset_index()
text.columns = ['date', 'words']
text = text.words.str.cat(sep=' ')
text = re.sub(r'\W+', ' ', text)

generate_wordcloud(text)
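
A word cloud sizes each word by how often it occurs. To inspect the underlying counts directly, a minimal sketch using only the standard library (the function name, sample text, and stop words are made up):

```python
import re
from collections import Counter


def top_words(text, stopwords, n=3):
    """Return the n most frequent words that are not stop words."""
    words = re.sub(r'\W+', ' ', text).lower().split()
    return Counter(w for w in words if w not in stopwords).most_common(n)


sample_text = 'The feed shows news. News about the feed, and more news!'
print(top_words(sample_text, stopwords={'the', 'and', 'about'}))
```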