# Example of Processing Facebook Reaction Data

by Max Woolf (@minimaxir)

*This notebook is licensed under the MIT License. If you use the code or data visualization designs contained within this notebook, it would be greatly appreciated if proper attribution is given back to this notebook and/or myself. Thanks! :)*

In [34]:
source("Rstart.R")

library(tidyr)
library(viridis)

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] viridis_0.3.4      tidyr_0.4.1        stringr_1.0.0      digest_0.6.9      
 [5] RColorBrewer_1.1-2 scales_0.4.0       extrafont_0.17     ggplot2_2.1.0     
 [9] dplyr_0.4.3        readr_0.2.2       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4      Rttf2pt1_1.3.3   magrittr_1.5     munsell_0.4.3   
 [5] uuid_0.1-2       colorspace_1.2-6 R6_2.1.2         plyr_1.8.3      
 [9] tools_3.3.0      parallel_3.3.0   gtable_0.2.0     DBI_0.4         
[13] extrafontdb_1.0  lazyeval_0.1.10  assertthat_0.1   gridExtra_2.2.1 
[17] IRdisplay_0.3    repr_0.4         base64enc_0.1-3  IRkernel_0.5    
[21] evaluate_0.9     rzmq_0.7.7       stringi_1.0-1    j

In [3]:
df <- read_csv("cnn_facebook_statuses.csv") %>% filter(status_published > '2016-02-24 00:00:00')

print(head(df))
nrow(df)

Source: local data frame [6 x 15]

                     status_id
                         (chr)
1 5550296508_10154919083226509
2 5550296508_10154919005411509
3 5550296508_10154918925156509
4 5550296508_10154918906011509
5 5550296508_10154918844706509
6 5550296508_10154918803531509
Variables not shown: status_message (chr), link_name (chr), status_type (chr),
  status_link (chr), status_published (time), num_reactions (int), num_comments
  (int), num_shares (int), num_likes (int), num_loves (int), num_wows (int),
  num_hahas (int), num_sads (int), num_angrys (int)


In [31]:
# Bucket statuses by calendar day (first 10 characters of the timestamp),
# then sum each reaction type within the day.
df_agg <- df %>% group_by(date = as.Date(substr(status_published, 1, 10))) %>%
                summarize(total_likes=sum(num_likes),
                          total_loves=sum(num_loves),
                          total_wows=sum(num_wows),
                          total_hahas=sum(num_hahas),
                          total_sads=sum(num_sads),
                          total_angrys=sum(num_angrys)) %>%
                arrange(date)

print(head(df_agg))

Source: local data frame [6 x 7]

        date total_likes total_loves total_wows total_hahas total_sads
      (date)       (int)       (int)      (int)       (int)      (int)
1 2016-02-24      215784       12366       9699        6670       2699
2 2016-02-25      183785        8280       4879       12300       2049
3 2016-02-26      191436        6445       6141       14510       1874
4 2016-02-27      144926        8828       2300        1004       1984
5 2016-02-28      140882        6593       1627        3657       3654
6 2016-02-29      286802       13716       4404        5899       4410
Variables not shown: total_angrys (int)
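
The grouping key in the cell above is built by cutting each timestamp down to its `YYYY-MM-DD` prefix and parsing that as a date. A minimal sketch of that step on its own, using made-up timestamps (the `stamps` vector is hypothetical, not taken from the dataset):

```r
# Hypothetical timestamps in the same "YYYY-MM-DD HH:MM:SS" layout.
stamps <- c("2016-02-24 00:15:00", "2016-02-24 23:59:59", "2016-02-25 08:30:00")

# Keep the first 10 characters (the date part) and parse them as Date.
days <- as.Date(substr(stamps, 1, 10))

# Count how many timestamps fall into each daily bucket.
print(table(days))
```

Because the cut works on the text of the timestamp, no time zone conversion happens: a status is assigned to whichever day its timestamp string names.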


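ggplot2 maps one column to each aesthetic, so the aggregated table must be reshaped from wide format (one column per reaction) to long format (one row per date and reaction).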

In [62]:
df_agg_long <- df_agg %>% gather(key=reaction, value=count, total_likes:total_angrys) %>%
                        mutate(reaction=factor(reaction))

print(head(df_agg_long,20))

Source: local data frame [20 x 3]

         date    reaction  count
       (date)      (fctr)  (int)
1  2016-02-24 total_likes 215784
2  2016-02-25 total_likes 183785
3  2016-02-26 total_likes 191436
4  2016-02-27 total_likes 144926
5  2016-02-28 total_likes 140882
6  2016-02-29 total_likes 286802
7  2016-03-01 total_likes 197091
8  2016-03-02 total_likes 204942
9  2016-03-03 total_likes 198320
10 2016-03-04 total_likes 113997
11 2016-03-05 total_likes 154004
12 2016-03-06 total_likes 219300
13 2016-03-07 total_likes 140551
14 2016-03-08 total_likes 161067
15 2016-03-09 total_likes 104399
16 2016-03-10 total_likes 158898
17 2016-03-11 total_likes 212756
18 2016-03-12 total_likes  98536
19 2016-03-13 total_likes  91079
20 2016-03-14 total_likes 155147
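
To see the reshape in isolation, here is a minimal sketch on a two-day, two-reaction slice of the aggregated table. The values are copied from the `df_agg` output earlier in this notebook; the names `wide` and `long` are only for illustration.

```r
library(tidyr)

# Two days and two reaction columns, copied from the printed df_agg rows.
wide <- data.frame(
  date = as.Date(c("2016-02-24", "2016-02-25")),
  total_likes = c(215784L, 183785L),
  total_loves = c(12366L, 8280L)
)

# Stack the reaction columns: the column name goes into `reaction`,
# the cell value into `count`, and `date` is repeated for each pair.
long <- gather(wide, key = reaction, value = count, total_likes:total_loves)
print(long)
```

Two dates times two reactions gives four rows, which is the shape `ggplot()` needs in order to map `reaction` to color and fill.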


Create a stacked bar chart, with each day's bar filled to 100%.

In [64]:
plot <- ggplot(df_agg_long, aes(x=date, y=count, color=reaction, fill=reaction)) +
            geom_bar(size=0.25, position="fill", stat="identity") +
            fte_theme() +
            scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b %Y")) +
            scale_y_continuous(labels=percent) +
            theme(legend.title = element_blank(),
                  legend.position="top",
                  legend.direction="horizontal",
                  legend.key.width=unit(0.5, "cm"),
                  legend.key.height=unit(0.25, "cm"),
                  legend.margin=unit(0,"cm")) +
            scale_color_viridis(discrete=T) +
            scale_fill_viridis(discrete=T) +
            labs(title="Daily Breakdown of Facebook Reactions on CNN's FB Posts",
                 x="Date Status Posted",
                 y="% Reaction Marketshare")

max_save(plot, "reaction-example-1", "Facebook")

![](reaction-example-1.png)
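
What `position="fill"` does to each bar can be reproduced by hand: every count is divided by that day's total, so the segments always sum to 1. A rough sketch with invented counts for a single day (the numbers are hypothetical, chosen only so that likes dominate):

```r
# Hypothetical reaction counts for a single day (not taken from the dataset).
counts <- c(likes = 800, loves = 100, wows = 50, hahas = 30, sads = 10, angrys = 10)

# position = "fill" rescales each stack so its segments sum to 1.
shares <- counts / sum(counts)
print(round(shares, 2))

# Dropping likes and renormalizing lets the remaining reactions spread out.
others <- counts[names(counts) != "likes"]
other_shares <- others / sum(others)
print(round(other_shares, 2))
```

In this toy case likes take 80% of the bar, leaving the other five reactions to share the remaining 20%; once likes are removed, the same five counts fill the whole bar.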

Likes dominate the totals and squeeze every other reaction into a thin band. Redraw the chart without them.

In [65]:
plot <- ggplot(df_agg_long %>% filter(reaction!="total_likes"), aes(x=date, y=count, color=reaction, fill=reaction)) +
            geom_bar(size=0.25, position="fill", stat="identity") +
            fte_theme() +
            scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b %Y")) +
            scale_y_continuous(labels=percent) +
            theme(legend.title = element_blank(),
                  legend.position="top",
                  legend.direction="horizontal",
                  legend.key.width=unit(0.5, "cm"),
                  legend.key.height=unit(0.25, "cm"),
                  legend.margin=unit(0,"cm")) +
            scale_color_viridis(discrete=T) +
            scale_fill_viridis(discrete=T) +
            labs(title="Daily Breakdown of Facebook Reactions on CNN's FB Posts",
                 x="Date Status Posted",
                 y="% Reaction Marketshare")

max_save(plot, "reaction-example-2", "Facebook")

![](reaction-example-2.png)

That visualization might still be too crowded. Instead, compute each reaction's share of the daily total directly and plot the shares as lines. The data also switches to the NYTimes page for comparison.

In [76]:
df <- read_csv("nytimes_facebook_statuses.csv") %>% filter(status_published > '2016-02-24 00:00:00')

# total_reactions leaves out likes, so each perc_* column is that
# reaction's share of the day's non-like reactions.
df_agg <- df %>% group_by(date = as.Date(substr(status_published, 1, 10))) %>%
                summarize(total_reactions=sum(num_loves)+sum(num_wows)+sum(num_hahas)+sum(num_sads)+sum(num_angrys),
                          perc_loves=sum(num_loves)/total_reactions,
                          perc_wows=sum(num_wows)/total_reactions,
                          perc_hahas=sum(num_hahas)/total_reactions,
                          perc_sads=sum(num_sads)/total_reactions,
                          perc_angrys=sum(num_angrys)/total_reactions) %>%
                select(-total_reactions) %>%
                arrange(date)

print(head(df_agg))

Source: local data frame [6 x 6]

        date perc_loves  perc_wows perc_hahas  perc_sads perc_angrys
      (date)      (dbl)      (dbl)      (dbl)      (dbl)       (dbl)
1 2016-02-24  0.3930676 0.17360566 0.08621367 0.09740770  0.24970542
2 2016-02-25  0.1919722 0.08666052 0.29210694 0.09332671  0.33593362
3 2016-02-26  0.1435334 0.18946182 0.10831220 0.17396450  0.38472809
4 2016-02-27  0.2736496 0.13627639 0.06443652 0.27570606  0.24993145
5 2016-02-28  0.7713515 0.08522014 0.04054117 0.03737970  0.06550746
6 2016-02-29  0.3399680 0.08842370 0.12708762 0.11256005  0.33196065
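
Since every share is divided by the same daily total, the five columns in each row should add up to 1. A quick sanity check against the first two printed rows (values copied from the output above, so expect rounding in the last digits; the matrix name `shares` is only for illustration):

```r
# Shares copied from the first two rows of the printed output.
shares <- rbind(
  "2016-02-24" = c(loves = 0.3930676, wows = 0.17360566, hahas = 0.08621367,
                   sads = 0.09740770, angrys = 0.24970542),
  "2016-02-25" = c(loves = 0.1919722, wows = 0.08666052, hahas = 0.29210694,
                   sads = 0.09332671, angrys = 0.33593362)
)

# Each row should sum to 1, up to print rounding.
row_totals <- rowSums(shares)
print(row_totals)
stopifnot(all(abs(row_totals - 1) < 1e-6))
```

The same check could be run on the full `df_agg` before reshaping; a row that does not sum to 1 would point to a day with missing or zero reaction counts.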


In [77]:
df_agg_long <- df_agg %>% gather(key=reaction, value=count, perc_loves:perc_angrys) %>%
                        mutate(reaction=factor(reaction))

print(head(df_agg_long,20))

Source: local data frame [20 x 3]

         date   reaction      count
       (date)     (fctr)      (dbl)
1  2016-02-24 perc_loves 0.39306756
2  2016-02-25 perc_loves 0.19197220
3  2016-02-26 perc_loves 0.14353339
4  2016-02-27 perc_loves 0.27364957
5  2016-02-28 perc_loves 0.77135153
6  2016-02-29 perc_loves 0.33996797
7  2016-03-01 perc_loves 0.34061714
8  2016-03-02 perc_loves 0.24681208
9  2016-03-03 perc_loves 0.35172992
10 2016-03-04 perc_loves 0.19499779
11 2016-03-05 perc_loves 0.14512737
12 2016-03-06 perc_loves 0.40097144
13 2016-03-07 perc_loves 0.30259557
14 2016-03-08 perc_loves 0.36623147
15 2016-03-09 perc_loves 0.21422640
16 2016-03-10 perc_loves 0.31396083
17 2016-03-11 perc_loves 0.33173516
18 2016-03-12 perc_loves 0.06377902
19 2016-03-13 perc_loves 0.25712914
20 2016-03-14 perc_loves 0.33751152


In [78]:
plot <- ggplot(df_agg_long, aes(x=date, y=count, color=reaction)) +
            geom_line(size=0.5, stat="identity") +
            fte_theme() +
            scale_x_date(breaks = date_breaks("1 month"), labels = date_format("%b %Y")) +
            scale_y_continuous(labels=percent) +
            theme(legend.title = element_blank(),
                  legend.position="top",
                  legend.direction="horizontal",
                  legend.key.width=unit(0.5, "cm"),
                  legend.key.height=unit(0.25, "cm"),
                  legend.margin=unit(0,"cm")) +
            scale_color_viridis(discrete=T) +
            scale_fill_viridis(discrete=T) +
            labs(title="Daily Breakdown of Facebook Reactions on NYTimes's FB Posts",
                 x="Date Status Posted",
                 y="% Reaction Marketshare")

max_save(plot, "reaction-example-3", "Facebook")

![](reaction-example-3.png)

# The MIT License (MIT)

Copyright (c) 2016 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.