![Hack the feed insights banner](images/hackthefeed.png)

# Hack the Feed: Insights from Social Media Data

**Author:** Aregbesola Samuel

Playhouse Communication stands as a prominent digital marketing agency in Nigeria, known for merging design and media strategy with state-of-the-art technological innovations to transform the landscape of marketing. Their clientele includes both international industry giants and agile small to medium-sized enterprises, all of whom are reshaping their respective fields.

This project explores exclusive social media data from a high-profile client to surface insights that can inform the client's marketing decisions.

In the [data_cleaning workbook](data_cleaning.ipynb), the raw data was processed and cleaned for all four social media platforms. 

This Python notebook contains the code used to explore the datasets and generate insights.

In [2]:
# imports
import pandas as pd # library for data manipulation and analysis
import numpy as np # library for scientific computing
import matplotlib.pyplot as plt # library for visualizing data
import seaborn as sns # library for visualizing data
import missingno as msno # library for visualizing missing values

import warnings # library to ignore warnings
warnings.filterwarnings('ignore')

# set option to display all the columns of the dataframe
pd.set_option('display.max_columns', None)

In [9]:
# import helper functions
from helper_functions import feature_engineering_functions, time_series_functions

In [7]:
# import data
facebook = pd.read_csv('./data/facebook_clean.csv')
twitter = pd.read_csv('./data/twitter_clean.csv')
instagram = pd.read_csv('./data/instagram_clean.csv')
linkedin = pd.read_csv('./data/linkedin_clean.csv')

In [8]:
# Check dataframe dimensions
def print_df_shapes(dfs):
    for name, df in dfs.items():
        print(f"{name} ({df.shape[0]}, {df.shape[1]})")

# dataframes dictionary
dataframes = {'facebook': facebook, 'instagram': instagram, 'linkedin': linkedin, 'twitter': twitter}

print_df_shapes(dataframes)

facebook (8893, 45)
instagram (8516, 15)
linkedin (6332, 16)
twitter (7841, 26)


In [10]:
# Create a copy of the dataframes
facebook_copy = facebook.copy()
instagram_copy = instagram.copy()
linkedin_copy = linkedin.copy()
twitter_copy = twitter.copy()

In [16]:
# return the columns shared by all four dataframes (in the order of the first one)
def similar_columns(df1, df2, df3, df4):
    df1_cols = df1.columns
    df2_cols = df2.columns
    df3_cols = df3.columns
    df4_cols = df4.columns
    similar_cols = []
    for col in df1_cols:
        if col in df2_cols and col in df3_cols and col in df4_cols:
            similar_cols.append(col)
    return similar_cols
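
The same intersection can be written for any number of dataframes. The sketch below is an alternative to the helper above, not a replacement the notebook depends on; `common_columns` and the toy frames are hypothetical names used only for illustration.

```python
import pandas as pd


def common_columns(*dfs):
    """Return the columns present in every dataframe, ordered as in the first one."""
    if not dfs:
        return []
    first, *rest = dfs
    return [col for col in first.columns if all(col in df.columns for df in rest)]


# Toy frames (hypothetical data) with partially overlapping columns
a = pd.DataFrame({'Date': ['2023-01-01'], 'Likes': [3], 'Shares': [1]})
b = pd.DataFrame({'Likes': [5], 'Date': ['2023-01-02'], 'Saves': [2]})
c = pd.DataFrame({'Date': ['2023-01-03'], 'Likes': [7]})

shared = common_columns(a, b, c)
print(shared)  # ['Date', 'Likes']
```

Keeping the order of the first dataframe means the concatenated table has a predictable column layout regardless of how the other platforms order their exports.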

In [19]:
selected_columns = similar_columns(facebook, instagram, twitter, linkedin)
print(selected_columns)

['Date', 'Network', 'Content Type', 'Sent by', 'Post', 'Impressions', 'Engagement Rate (per Impression)', 'Engagements', 'Reactions', 'Likes', 'Comments']


In [20]:
# Select the shared columns in the four dataframes
facebook_select = facebook[selected_columns]
twitter_select = twitter[selected_columns]
instagram_select = instagram[selected_columns]
linkedin_select = linkedin[selected_columns]

In [21]:
# Concatenate the four dataframes
social_media = pd.concat([facebook_select, twitter_select, instagram_select, linkedin_select]).reset_index(drop = True)

In [25]:
print('Rows:', social_media.shape[0])
print('Columns:', social_media.shape[1])

Rows: 31582
Columns: 11


In [29]:
# check the data types of the columns
pd.DataFrame(social_media.dtypes, columns=['data_type'])

| | data_type |
|---|---|
| Date | object |
| Network | object |
| Content Type | object |
| Sent by | object |
| Post | object |
| Impressions | float64 |
| Engagement Rate (per Impression) | float64 |
| Engagements | float64 |
| Reactions | float64 |
| Likes | float64 |


In [26]:
# check for missing values
social_media.isnull().sum()

Date                                   0
Network                                0
Content Type                           0
Sent by                                0
Post                                 190
Impressions                            0
Engagement Rate (per Impression)    1433
Engagements                            0
Reactions                              0
Likes                                  0
Comments                               0
dtype: int64

In [48]:
social_media[social_media['Engagement Rate (per Impression)'].isnull()]['Network'].value_counts()

Facebook     1149
Instagram     284
Name: Network, dtype: int64
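
All 1,433 missing engagement rates belong to Facebook and Instagram. In the Facebook rows printed in the advertisement section, the reported rate matches `Engagements / Impressions * 100` (for example, 5230 / 95300 ≈ 5.49), so one option is to recompute the rate where it is missing. The sketch below assumes that formula holds for every platform, which the notebook does not confirm, and uses toy values rather than the real data.

```python
import numpy as np
import pandas as pd

RATE = 'Engagement Rate (per Impression)'

# Toy rows (hypothetical values) with two missing rates
toy = pd.DataFrame({
    'Network': ['Facebook', 'Facebook', 'Instagram'],
    'Impressions': [95300.0, 0.0, 200.0],
    'Engagements': [5230.0, 0.0, 10.0],
    RATE: [5.49, np.nan, np.nan],
})

# Recompute the rate as a percentage; zero impressions become NaN
# so the division never produces inf.
recomputed = toy['Engagements'] / toy['Impressions'].replace(0, np.nan) * 100

# Fill only the rows where the reported rate is missing.
toy[RATE] = toy[RATE].fillna(recomputed.round(2))
print(toy)
```

Rows with zero impressions stay missing, which is arguably the honest value: a rate per impression is undefined when there were no impressions.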

### Facebook Advertisement analysis

This section compares the performance of promoted content against non-promoted content. A post is treated as promoted when its total impressions differ from its organic impressions.

In [13]:
post_with_ads = facebook[(facebook['Impressions'] != facebook['Organic Impressions'])]

In [14]:
post_with_ads.head(2)

Unnamed: 0,Date,Network,Content Type,Sent by,Post,Impressions,Organic Impressions,Viral Impressions,Non-viral Impressions,Fan Impressions,Fan Organic Impressions,Non-fan Impressions,Non-fan Organic Impressions,Reach,Organic Reach,Viral Reach,Non-viral Reach,Fan Reach,Engagement Rate (per Impression),Engagement Rate (per Reach),Engagements,Reactions,Likes,Love Reactions,Haha Reactions,Wow Reactions,Sad Reactions,Angry Reactions,Comments,Shares,Click-Through Rate,Other Post Clicks,Post Clicks (All),Answers,Negative Feedback,Engaged Users,Engaged Fans,Users Talking About This,Unique Reactions,Unique Comments,Unique Shares,Unique Answers,Unique Post Clicks,Unique Other Post Clicks,Unique Negative Feedback
12,2017-09-17 11:37:00,Facebook,Photo,Unknown,"This EPL #Supersunday, it's the Blues against ...",95300.0,59484.0,35816.0,59484.0,57181.0,57181.0,38119.0,2303.0,56094.0,38717.0,18384.0,38717.0,35590.0,5.49,9.32,5230.0,641.0,631.0,6.0,0.0,4.0,0.0,0.0,1050.0,96.0,0.0,3443.0,3443.0,0.0,7.0,2431.0,1974.0,1398.0,620.0,955.0,95.0,0.0,1948.0,1628.0,6.0
15,2019-09-03 11:43:00,Facebook,Photo,Aramide Salami,Every human life is a precious gift to humanit...,70855.0,70787.0,35336.0,35449.0,37836.0,37836.0,33019.0,32951.0,52185.0,52185.0,24829.0,29127.0,30207.0,5.25,7.12,3717.0,669.0,622.0,23.0,2.0,0.0,2.0,20.0,186.0,187.0,0.0,2675.0,2675.0,0.0,1.0,2573.0,1591.0,852.0,649.0,120.0,170.0,0.0,1993.0,1679.0,1.0


In [33]:
print(f'There are {post_with_ads.shape[0]} posts promoted with ads')

There are 244 posts promoted with ads


In [34]:
post_without_ads = facebook[(facebook['Impressions'] == facebook['Organic Impressions'])]

In [38]:
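# extract the posting year for posts with ads
# 'Date' is loaded as text, so parse it before using the .dt accessor
post_with_ads = post_with_ads.copy()  # work on a copy of the filtered slice
post_with_ads['year'] = pd.to_datetime(post_with_ads['Date']).dt.year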

In [36]:
print(f'There are {post_without_ads.shape[0]} posts without ads')

There are 8649 posts without ads


In [37]:
facebook['Content Type'].value_counts() 

Photo    7615
Video     909
Text      275
Link       94
Name: Content Type, dtype: int64

In [39]:
# mean click-through rate of link posts with ads
post_with_ads[post_with_ads['Content Type'] == 'Link']['Click-Through Rate'].mean()

0.34500000000000003

In [40]:
# mean click-through rate of link posts without ads
post_without_ads[post_without_ads['Content Type'] == 'Link']['Click-Through Rate'].mean()

0.39568181818181825
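
A mean on its own hides how many posts it rests on, and only 94 Facebook posts are links, so either group may be small. The sketch below reports the mean and the post count side by side. The `group` column and the toy values are hypothetical; only the column names `Content Type`, `Impressions`, `Organic Impressions`, and `Click-Through Rate` come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy posts (hypothetical values)
toy = pd.DataFrame({
    'Content Type': ['Link', 'Link', 'Link', 'Photo', 'Link'],
    'Impressions': [1000.0, 500.0, 800.0, 900.0, 400.0],
    'Organic Impressions': [600.0, 500.0, 800.0, 900.0, 100.0],
    'Click-Through Rate': [0.30, 0.50, 0.40, 0.00, 0.20],
})

# Same rule as the notebook: a post has ads when some impressions were not organic.
has_ads = toy['Impressions'] != toy['Organic Impressions']
toy['group'] = np.where(has_ads, 'with ads', 'without ads')

# Mean CTR and number of link posts per group
links = toy[toy['Content Type'] == 'Link']
ctr_summary = links.groupby('group')['Click-Through Rate'].agg(['mean', 'count'])
print(ctr_summary)
```

If one group turns out to contain only a handful of link posts, the difference between the two means should be read as indicative rather than conclusive.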

In [None]:
# mean engagement rates of posts with ads
pd.DataFrame(post_with_ads[['Engagement Rate (per Impression)', 'Engagement Rate (per Reach)']].mean())

| | 0 |
|---|---|
| Engagement Rate (per Impression) | 6.646311 |
| Engagement Rate (per Reach) | 9.463443 |


In [41]:
# mean engagement rates of posts without ads
pd.DataFrame(post_without_ads[['Engagement Rate (per Impression)', 'Engagement Rate (per Reach)']].mean())

| | 0 |
|---|---|
| Engagement Rate (per Impression) | 5.180753 |
| Engagement Rate (per Reach) | 5.504623 |
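
Placing the two sets of averages next to each other makes the gap easier to read. The sketch below reuses the four means printed above; the column labels `with_ads`, `without_ads`, `difference`, and `ratio` are chosen here for illustration and do not appear in the notebook.

```python
import pandas as pd

# Means copied from the two outputs above
means = pd.DataFrame(
    {
        'with_ads': [6.646311, 9.463443],
        'without_ads': [5.180753, 5.504623],
    },
    index=['Engagement Rate (per Impression)', 'Engagement Rate (per Reach)'],
)

# Absolute gap in percentage points and relative gap as a ratio
means['difference'] = means['with_ads'] - means['without_ads']
means['ratio'] = means['with_ads'] / means['without_ads']
print(means.round(2))
```

On these figures, posts with ads average roughly 1.28 times the per-impression engagement rate and 1.72 times the per-reach rate of posts without ads. The groups are very uneven (244 posts against 8,649), so this is a descriptive comparison rather than evidence of a causal effect.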
