### Dataset Description
The variability of consumer engagement is analysed through a Principal Component Analysis, highlighting the changes induced by the use of Facebook Live. The seasonal component is analysed by averaging the engagement metrics over different time-frames (hourly, daily and monthly). Finally, we identify statistical outlier posts, which are then analysed qualitatively in terms of their selling approach and activities.
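
The time-frame averages mentioned above can be sketched as follows. This is only an illustration, not the original analysis: it rebuilds the five rows shown by `dataset.head()` in this notebook by hand, assumes `status_published` follows the month/day/year layout seen in those rows, and uses the hypothetical names `posts`, `published` and `metrics`.

```python
import pandas as pd

# The five rows displayed by dataset.head(), rebuilt by hand for illustration.
posts = pd.DataFrame({
    "status_published": [
        "4/22/2018 6:00", "4/21/2018 22:45", "4/21/2018 6:17",
        "4/21/2018 2:29", "4/18/2018 3:22",
    ],
    "num_reactions": [529, 150, 227, 111, 213],
    "num_comments": [512, 0, 236, 0, 0],
})

# Assumption: timestamps are month/day/year hour:minute, as in the rows above.
published = pd.to_datetime(posts["status_published"], format="%m/%d/%Y %H:%M")
metrics = ["num_reactions", "num_comments"]

# Average each engagement metric per hour of day, weekday and month.
hourly = posts.groupby(published.dt.hour)[metrics].mean()
daily = posts.groupby(published.dt.day_name())[metrics].mean()
monthly = posts.groupby(published.dt.month)[metrics].mean()

print(hourly)
print(daily)
print(monthly)
```

On the full dataset the same grouping would run on `dataset` directly; with only five rows the averages are not meaningful and serve only to show the mechanics.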



#### Descriptive statistics computed for the data:
* Mean
* Median
* Standard Deviation
* Quartiles
* Outliers (approximated here by the 1st and 99th percentiles)

In [2]:
import numpy as np
import pandas as pd

dataset = pd.read_csv('datasets/facebook_data.csv')
# drop the empty placeholder columns
dataset = dataset.drop(columns=['Column1', 'Column2', 'Column3', 'Column4'])

dataset.head()

Unnamed: 0,status_id,status_type,status_published,num_reactions,num_comments,num_shares,num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys
0,1,video,4/22/2018 6:00,529,512,262,432,92,3,1,1,0
1,2,photo,4/21/2018 22:45,150,0,0,150,0,0,0,0,0
2,3,video,4/21/2018 6:17,227,236,57,204,21,1,1,0,0
3,4,photo,4/21/2018 2:29,111,0,0,111,0,0,0,0,0
4,5,photo,4/18/2018 3:22,213,0,0,204,9,0,0,0,0


### Descriptive Statistics for the Number of Reactions

In [3]:
mean = dataset['num_reactions'].mean()
median = dataset['num_reactions'].median()
std = dataset['num_reactions'].std()
quartiles = dataset['num_reactions'].quantile([0.25, 0.5, 0.75])
# the 1st and 99th percentiles serve as rough outlier cut-offs
outliers = dataset['num_reactions'].quantile([0.01, 0.99])
# describe() already reports the mean, std and quartiles
print(dataset.num_reactions.describe())
print("\nMedian:\n", median)
print("\nOutliers:\n", outliers)

count    7050.000000
mean      230.117163
std       462.625309
min         0.000000
25%        17.000000
50%        59.500000
75%       219.000000
max      4710.000000
Name: num_reactions, dtype: float64

Median:
 59.5

Outliers:
 0.01       0.00
0.99    2306.06
Name: num_reactions, dtype: float64
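
The "Outliers" values printed above are the 1st and 99th percentiles, which mark cut-offs rather than individual outlying posts. One common alternative, sketched below, is the interquartile-range (Tukey) rule. This is a swapped-in technique, not the method used in this notebook; the helper name `iqr_outliers`, the multiplier `k = 1.5` and the toy values are assumptions chosen for illustration.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries of `values` that fall outside the Tukey fences."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up toy values, not taken from the dataset.
toy = pd.Series([0, 17, 59, 60, 219, 220, 4710], name="num_reactions")
flagged = iqr_outliers(toy)
print(flagged.tolist())
```

Applied to the quartiles reported above (17 and 219), the same rule would place the upper fence at 219 + 1.5 × 202 = 522 reactions.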


### Descriptive Statistics for the Number of Comments

In [4]:
mean = dataset['num_comments'].mean()
median = dataset['num_comments'].median()
std = dataset['num_comments'].std()
quartiles = dataset['num_comments'].quantile([0.25, 0.5, 0.75])
outliers = dataset['num_comments'].quantile([0.01, 0.99])

print(dataset.num_comments.describe())
print("\nMedian:\n", median)
print("\nOutliers:\n", outliers)


count     7050.000000
mean       224.356028
std        889.636820
min          0.000000
25%          0.000000
50%          4.000000
75%         23.000000
max      20990.000000
Name: num_comments, dtype: float64

Median:
 4.0

Outliers:
 0.01       0.00
0.99    4338.04
Name: num_comments, dtype: float64


### Descriptive Statistics for the Number of Shares

In [5]:
mean = dataset['num_shares'].mean()
median = dataset['num_shares'].median()
std = dataset['num_shares'].std()
quartiles = dataset['num_shares'].quantile([0.25, 0.5, 0.75])
outliers = dataset['num_shares'].quantile([0.01, 0.99])

print(dataset.num_shares.describe())
print("\nMedian:\n", median)
print("\nOutliers:\n", outliers)

count    7050.000000
mean       40.022553
std       131.599965
min         0.000000
25%         0.000000
50%         0.000000
75%         4.000000
max      3424.000000
Name: num_shares, dtype: float64

Median:
 0.0

Outliers:
 0.01      0.0
0.99    607.0
Name: num_shares, dtype: float64


### Descriptive Statistics for the Number of Likes

In [6]:
mean = dataset['num_likes'].mean()
median = dataset['num_likes'].median()
std = dataset['num_likes'].std()
quartiles = dataset['num_likes'].quantile([0.25, 0.5, 0.75])
outliers = dataset['num_likes'].quantile([0.01, 0.99])

print(dataset.num_likes.describe())
print("\nMedian:\n", median)
print("\nOutliers:\n", outliers)

count    7050.000000
mean      215.043121
std       449.472357
min         0.000000
25%        17.000000
50%        58.000000
75%       184.750000
max      4710.000000
Name: num_likes, dtype: float64

Median:
 58.0

Outliers:
 0.01       0.0
0.99    2297.0
Name: num_likes, dtype: float64
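
The four cells above repeat the same steps for each metric. A minimal sketch of a shared helper is given below; `summarize` is a hypothetical name, and the small frame rebuilds two columns of the `dataset.head()` output only so that the block runs on its own.

```python
import pandas as pd

def summarize(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Mean, median, std and selected percentiles for the given columns."""
    subset = df[columns]
    basic = subset.agg(["mean", "median", "std"])
    pct = subset.quantile([0.01, 0.25, 0.75, 0.99])
    # Relabel the percentile rows, e.g. 0.25 -> "p25".
    pct.index = [f"p{round(q * 100):02d}" for q in pct.index]
    return pd.concat([basic, pct])

# Two columns from the five rows shown by dataset.head(), for illustration.
sample = pd.DataFrame({
    "num_reactions": [529, 150, 227, 111, 213],
    "num_comments": [512, 0, 236, 0, 0],
})
summary = summarize(sample, ["num_reactions", "num_comments"])
print(summary)
```

With the full data, `summarize(dataset, ['num_reactions', 'num_comments', 'num_shares', 'num_likes'])` would cover all four metrics in one call.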


### Descriptive Statistics for all the attributes

In [16]:
# Keep only the numeric engagement columns first: with the text columns
# (status_type, status_published) still present, dataset.mean() can raise
# a TypeError in newer pandas versions (2.0+).
dataset = dataset[['num_reactions', 'num_comments', 'num_shares', 'num_likes']]
# mean
mean = dataset.mean()
# median
median = dataset.median()
# standard deviation
std = dataset.std()
# quartiles
quartiles = dataset.quantile([0.25, 0.5, 0.75])
# 1st and 99th percentiles, used as rough outlier cut-offs
outliers = dataset.quantile([0.01, 0.99])
# describe() reports all of the above in a single table
dataset.describe(percentiles=[0.01, 0.25, 0.5, 0.75, 0.99])


Unnamed: 0,num_reactions,num_comments,num_shares,num_likes
count,7050.0,7050.0,7050.0,7050.0
mean,230.117163,224.356028,40.022553,215.043121
std,462.625309,889.63682,131.599965,449.472357
min,0.0,0.0,0.0,0.0
1%,0.0,0.0,0.0,0.0
25%,17.0,0.0,0.0,17.0
50%,59.5,4.0,0.0,58.0
75%,219.0,23.0,4.0,184.75
99%,2306.06,4338.04,607.0,2297.0
max,4710.0,20990.0,3424.0,4710.0


In [25]:
# load the saved text rendering of the summary table and display it
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# each line of the file is read as one string in a single column,
# so the result is the raw ASCII table rather than a numeric frame
data = pd.read_csv('graphs/Out_16.txt', sep='\t', header=None)
data


Unnamed: 0,0
0,+-----+-------------+------------+-----------+...
1,| |num_reactions|num_comments|num_shares |...
2,+-----+-------------+------------+-----------+...
3,|count|7050.000000 |7050.000000 |7050.000000|...
4,|mean |230.117163 |224.356028 |40.022553 |...
5,|std |462.625309 |889.636820 |131.599965 |...
6,|min |0.000000 |0.000000 |0.000000 |...
7,|1% |0.000000 |0.000000 |0.000000 |...
8,|25% |17.000000 |0.000000 |0.000000 |...
9,|50% |59.500000 |4.000000 |0.000000 |...
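
As the output above shows, reading the file this way yields one string per line rather than numeric columns. A minimal sketch of parsing such a bordered table into numbers is given below. The helper `parse_ascii_table` is hypothetical, and the sample text is a shortened excerpt limited to the three columns and two rows fully visible above, with padding assumed; the actual layout of `graphs/Out_16.txt` may differ.

```python
def parse_ascii_table(lines):
    """Parse a '|'-delimited ASCII table into {row_label: {column: value}}."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip border lines such as +-----+-----+ and blank lines.
        if not line.startswith("|"):
            continue
        # Drop the outer pipes, then split the remaining cells.
        rows.append([cell.strip() for cell in line[1:-1].split("|")])
    header, body = rows[0], rows[1:]
    return {
        row[0]: {name: float(value) for name, value in zip(header[1:], row[1:])}
        for row in body
    }

# Shortened excerpt for illustration; column padding is an assumption.
sample_text = """\
+-----+-------------+------------+-----------+
|     |num_reactions|num_comments|num_shares |
+-----+-------------+------------+-----------+
|count|7050.000000  |7050.000000 |7050.000000|
|mean |230.117163   |224.356028  |40.022553  |
+-----+-------------+------------+-----------+
"""

table = parse_ascii_table(sample_text.splitlines())
print(table["mean"]["num_reactions"])
```

The resulting nested dictionary can be turned into a frame with `pd.DataFrame.from_dict(table, orient="index")` if a pandas table is preferred.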
