# Classifying the Emotions of Facebook Posts Using Reactions Data

by Max Woolf (@minimaxir)

*This notebook is licensed under the MIT License. If you use the code or data visualization designs contained within this notebook, it would be greatly appreciated if proper attribution is given back to this notebook and/or myself. Thanks! :)*

In [297]:
options(warn=1)

source("Rstart.R")

library(plotly)
library(htmlwidgets)
library(tidyr)

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] tidyr_0.4.1        htmlwidgets_0.6    viridis_0.3.4      plotly_3.6.0      
 [5] stringr_1.0.0      digest_0.6.9       RColorBrewer_1.1-2 scales_0.4.0      
 [9] extrafont_0.17     ggplot2_2.1.0      dplyr_0.4.3        readr_0.2.2       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4      plyr_1.8.3       base64enc_0.1-3  tools_3.3.0     
 [5] uuid_0.1-2       jsonlite_0.9.19  evaluate_0.9     gtable_0.2.0    
 [9] IRdisplay_0.3    DBI_0.4          yaml_2.1.13      IRkernel_0.5    
[13] parallel_3.3.0   rzmq_0.7.7       gridExtra_2.2.1  Rttf2pt1_1.3.3  
[17] repr_0.4         httr_1.1.0       R6_2.1.2         extrafontdb_1.0 
[21] magrittr_1.5  

In [277]:
df <- read_csv("cnn_facebook_statuses.csv") %>% filter(status_published > '2016-02-24 00:00:00')

df %>% head() %>% print()
df %>% nrow() %>% print()

Source: local data frame [6 x 15]

                     status_id
                         (chr)
1 5550296508_10154953526846509
2 5550296508_10154953465071509
3 5550296508_10154953390456509
4 5550296508_10154952135926509
5 5550296508_10154952030666509
6 5550296508_10154952002266509
Variables not shown: status_message (chr), link_name (chr), status_type (chr),
  status_link (chr), status_published (time), num_reactions (int), num_comments
  (int), num_shares (int), num_likes (int), num_loves (int), num_wows (int),
  num_hahas (int), num_sads (int), num_angrys (int)
[1] 4629


Check the correlations between the reaction counts to see whether any of the variables are redundant.

In [278]:
df %>% select(num_likes,num_loves,num_wows,num_hahas,num_sads,num_angrys) %>%
        cor() %>%
        round(2) %>%
        print()

           num_likes num_loves num_wows num_hahas num_sads num_angrys
num_likes       1.00      0.80     0.46      0.23     0.27       0.12
num_loves       0.80      1.00     0.22      0.15     0.29       0.03
num_wows        0.46      0.22     1.00      0.08     0.12       0.09
num_hahas       0.23      0.15     0.08      1.00    -0.04       0.09
num_sads        0.27      0.29     0.12     -0.04     1.00       0.20
num_angrys      0.12      0.03     0.09      0.09     0.20       1.00
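As a rough way to read the matrix above, the sketch below rebuilds it from the printed values and lists the pairs whose correlation exceeds a cutoff. The cutoff of 0.5 is an arbitrary assumption chosen for illustration, not something this notebook uses.

```r
vars <- c("num_likes", "num_loves", "num_wows", "num_hahas", "num_sads", "num_angrys")

# Values copied from the printed correlation matrix above.
cors <- matrix(c(1.00, 0.80, 0.46, 0.23,  0.27, 0.12,
                 0.80, 1.00, 0.22, 0.15,  0.29, 0.03,
                 0.46, 0.22, 1.00, 0.08,  0.12, 0.09,
                 0.23, 0.15, 0.08, 1.00, -0.04, 0.09,
                 0.27, 0.29, 0.12, -0.04, 1.00, 0.20,
                 0.12, 0.03, 0.09, 0.09,  0.20, 1.00),
               nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

cutoff <- 0.5  # assumed threshold, for illustration only

# Look only at the upper triangle so each pair is reported once.
idx <- which(abs(cors) >= cutoff & upper.tri(cors), arr.ind = TRUE)
high <- data.frame(var1 = vars[idx[, "row"]],
                   var2 = vars[idx[, "col"]],
                   r = cors[idx],
                   stringsAsFactors = FALSE)
print(high)
```

With this cutoff, the only pair reported is `num_likes` and `num_loves`.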


Hardcode the reaction colors. (Normally you could use factors, but that gets complicated because some reactions may be missing for some Pages.)

In [279]:
reactions <- c("Love","Wow","Haha","Sad","Angry")

colors <- tbl_df(data.frame(react_type=reactions, color=c("#e74c3c","#f1c40f","#e67e22","#2980b9","#8e44ad")))
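One possible alternative to the lookup table is a named character vector, sketched below. The name `reaction_colors` is hypothetical; the hex values are the ones defined above. Indexing by name returns `NA` for a missing reaction type, and because no join is involved, it might also sidestep the factor/character coercion warning that shows up later in this notebook.

```r
# Named vector: names are reaction types, values are the hex colors from above.
reaction_colors <- c(Love  = "#e74c3c",
                     Wow   = "#f1c40f",
                     Haha  = "#e67e22",
                     Sad   = "#2980b9",
                     Angry = "#8e44ad")

# Example reaction types, including a missing one.
react_type <- c("Sad", "Love", NA, "Angry")

# Look up colors by name; NA stays NA.
looked_up <- unname(reaction_colors[react_type])
print(looked_up)
```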

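Helper function to get the title of a post (`link_name`); if there is no title, or the title is the generic "Timeline Photos", fall back to the first 60 characters of the message (`status_message`) instead.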

In [280]:
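# x is one row of df, coerced to a character vector by apply():
# x[2] is status_message and x[3] is link_name.
get_tooltip_title <- function(x) {
    # Fall back to the truncated message when there is no usable link title.
    ifelse(is.na(x[3]) | x[3] == "Timeline Photos",
           ifelse(is.na(x[2]), "[No Message]", paste0(substr(x[2],1,60),"...")), x[3])
}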

df$tooltip_title <- as.character(apply(df, 1, get_tooltip_title))

df %>% head() %>% select(tooltip_title) %>% print()

Source: local data frame [6 x 1]

                                                               tooltip_title
                                                                       (chr)
1 Selena Gomez, Yoko Ono and Lady Gaga sign open letter to stop gun violence
2                                  Tornado in China kills at least 51 people
3                                               Supreme Court cases to watch
4                                   GOP congressman: 'This isn't about guns'
5                         Democrats continue their sit-in on the House floor
6                                     Rep. John Lewis goes back to his roots
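The same fallback logic can also be written column-wise instead of row by row. The sketch below is one way to do it; `tooltip_title_vec` and `max_chars` are hypothetical names, and the toy vectors are made up for illustration.

```r
# Vectorized variant: takes the two columns directly instead of a whole row.
tooltip_title_vec <- function(status_message, link_name, max_chars = 60) {
  # Text to use when the link title is missing or generic.
  fallback <- ifelse(is.na(status_message),
                     "[No Message]",
                     paste0(substr(status_message, 1, max_chars), "..."))
  use_fallback <- is.na(link_name) | link_name == "Timeline Photos"
  ifelse(use_fallback, fallback, link_name)
}

# Toy inputs (not from the CNN data).
messages <- c("Hello world", NA, "A photo caption")
links    <- c("Some headline", NA, "Timeline Photos")

titles <- tooltip_title_vec(messages, links)
print(titles)
```

On a real data frame this could be called as `tooltip_title_vec(df$status_message, df$link_name)`, which avoids depending on column positions.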


Helper functions:

* Get the largest reaction percentage in a post.
* Get the reaction type that percentage corresponds to.
* Construct the tooltip text from the two values above.

In [281]:
reactions <- c("Love","Wow","Haha","Sad","Angry")

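# Each helper receives one row of df_agg as a character vector (apply() coerces mixed rows):
# x[1] = tooltip_title, x[2:6] = reaction percentages, x[7] = total_reactions,
# x[8] = max_perc, x[9] = react_type.

# Largest of the five reaction percentages.
get_max_perc <- function(x) {
    max(x[2],x[3],x[4],x[5],x[6])
}

# Name of the reaction whose percentage equals max_perc; NA when the post has no reactions.
get_max_reaction_type <- function(x) {
    if (x[7]==0) {return (NA)}
    reactions[which(c(x[2],x[3],x[4],x[5],x[6])==x[8])]
}

# HTML tooltip text, e.g. "test<br>73.6% Love out of 5,353 Reactions".
get_tooltip <- function(x) {
    sprintf("%s<br>%0.1f%% %s out of %s Reactions", x[1],as.numeric(x[8])*100,x[9],format(as.numeric(x[7]), big.mark=","))
}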



get_max_perc(c("test", 0.73584906,0.01886792,0.00000000,0.207547170,0.03773585,53))
get_max_reaction_type(c("test", 0.73584906,0.01886792,0.00000000,0.207547170,0.03773585,53,0.73584906))
get_tooltip(c("test", 0.73584906,0.01886792,0.00000000,0.207547170,0.03773585,5353,0.73584906,"Love"))
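One thing to keep in mind with these helpers: `apply()` on a data frame that contains a text column passes every row as a character vector, so `max()` compares strings rather than numbers. The sketch below illustrates the difference and shows a numeric variant; `get_max_perc_num` is a hypothetical name and is not used elsewhere in this notebook.

```r
# A mixed data frame: apply() hands each row to the function as character.
toy <- data.frame(title = c("a", "b"), perc = c(0.5, 0.25), stringsAsFactors = FALSE)
row_types <- apply(toy, 1, function(x) class(x))
print(row_types)

# String comparison can disagree with numeric comparison,
# for example when one value is written in scientific notation.
as_strings <- max(c("0.5", "9e-05"))              # compared as text
as_numbers <- max(as.numeric(c("0.5", "9e-05")))  # compared as numbers
print(as_strings)
print(as_numbers)

# Numeric variant of the helper: convert the five percentages before taking the max.
get_max_perc_num <- function(x) {
  max(as.numeric(x[2:6]))
}

max_val <- get_max_perc_num(c("test", "0.73584906", "0.01886792", "0.00000000",
                              "0.207547170", "0.03773585", "53"))
print(max_val)
```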

Get percentages, then process row-wise using the helper functions above.

In [282]:
df_agg <- df %>% mutate(total_reactions=num_loves+num_wows+num_hahas+num_sads+num_angrys,
                          perc_loves=num_loves/total_reactions,
                          perc_wows=num_wows/total_reactions,
                          perc_hahas=num_hahas/total_reactions,
                          perc_sads=num_sads/total_reactions,
                          perc_angrys=num_angrys/total_reactions) %>%
                select(tooltip_title, perc_loves,perc_wows,perc_hahas,perc_sads,perc_angrys, total_reactions)

df_agg$max_perc <- apply(df_agg, 1, get_max_perc)
df_agg$react_type <- as.character(apply(df_agg, 1, get_max_reaction_type))
df_agg$tooltip <- as.character(apply(df_agg, 1, get_tooltip))

df_agg %>% select(-tooltip_title,-perc_loves) %>% head() %>% print()

Source: local data frame [6 x 8]

   perc_wows perc_hahas  perc_sads perc_angrys total_reactions     max_perc
       (dbl)      (dbl)      (dbl)       (dbl)           (int)        (chr)
1 0.00000000 0.12500000 0.00000000 0.125000000               8  0.750000000
2 0.36391437 0.01223242 0.61162080 0.003058104             327 0.6116207951
3 0.36363636 0.09090909 0.09090909 0.090909091              11  0.363636364
4 0.03169014 0.07746479 0.03521127 0.683098592             284 0.6830985915
5 0.02885683 0.05401406 0.01960784 0.173140954            2703  0.724380318
6 0.00896861 0.14349776 0.01345291 0.044843049             223  0.789237668
Variables not shown: react_type (chr), tooltip (chr)
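For comparison, the dominant reaction and its share can also be computed on a numeric matrix without row-wise `apply()`. This is only a sketch on made-up counts: the toy data and all variable names other than `reactions` are assumptions, and ties are resolved by taking the first reaction in column order.

```r
# Toy reaction counts (made-up numbers, not from the CNN data).
counts <- data.frame(
  num_loves  = c(6,   1, 0),
  num_wows   = c(0, 119, 0),
  num_hahas  = c(1,   4, 0),
  num_sads   = c(0, 200, 0),
  num_angrys = c(1,   1, 0)
)
reactions <- c("Love", "Wow", "Haha", "Sad", "Angry")

m <- as.matrix(counts)   # numeric matrix, one row per post
total <- rowSums(m)      # total reactions per post

# Column index of the largest count in each row (first column wins ties).
idx <- max.col(m, ties.method = "first")

# Posts without any reactions get NA for both the type and the share.
react_type <- ifelse(total > 0, reactions[idx], NA_character_)
max_perc   <- ifelse(total > 0, apply(m, 1, max) / total, NA_real_)

print(data.frame(total, react_type, max_perc))
```

Because everything stays numeric, `max_perc` does not need to be converted back with `as.numeric()` afterwards.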


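Add color information and keep only posts where a single reaction type makes up at least 75% of the reactions. Because small counts are noisy, a post must also have more than 20 reactions and pass a [one-sided z-test on the proportion](https://onlinecourses.science.psu.edu/stat200/node/53) at the 1% significance level.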

In [286]:
df_agg_long <- df_agg %>%
                    select(react_type, max_perc, tooltip,total_reactions) %>%
                    left_join(colors) %>%
                    mutate(react_type=factor(react_type), max_perc=as.numeric(max_perc)) %>%
                    filter(max_perc >= 0.75,
                           total_reactions > 20,
                           (max_perc - 0.75)/sqrt(max_perc*(1-max_perc)/total_reactions) > qnorm(0.99))

#write.csv(df_agg_long,"df_agg_long.csv",row.names=F)
df_agg_long %>% select(react_type, max_perc, color) %>% head() %>% print()

Joining by: "react_type"


Source: local data frame [6 x 3]

  react_type  max_perc   color
      (fctr)     (dbl)  (fctr)
1        Wow 0.8750000 #f1c40f
2        Sad 0.9237805 #2980b9
3        Wow 0.8690808 #f1c40f
4       Haha 0.9019992 #e67e22
5       Haha 0.7905687 #e67e22
6        Sad 0.8496241 #2980b9
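To make the significance filter above easier to follow, here is the same test statistic pulled out into a small function. `passes_threshold`, `p0`, and `alpha` are hypothetical names; the arithmetic mirrors the `filter()` expression, which uses the observed proportion in the standard error (the textbook one-proportion z-test uses the hypothesized proportion there instead).

```r
# One-sided z-test: is the dominant share significantly above p0?
passes_threshold <- function(max_perc, total_reactions, p0 = 0.75, alpha = 0.01) {
  # Standard error based on the observed proportion, as in the filter above.
  se <- sqrt(max_perc * (1 - max_perc) / total_reactions)
  z <- (max_perc - p0) / se
  z > qnorm(1 - alpha)
}

# Made-up examples: the same 80% share with few and with many reactions,
# and a 90% share with few reactions.
result <- passes_threshold(max_perc = c(0.80, 0.80, 0.90),
                           total_reactions = c(50, 2000, 50))
print(result)
```

An 80% share from 50 reactions does not clear the bar (z is about 0.88), while the same share from 2,000 reactions does (z is about 5.6), which is why the sample-size term matters.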


In [306]:
plot <- ggplot(data=df_agg_long, aes(x=react_type, y=max_perc, color=color)) +
            geom_point(position = position_jitter(width = 0.5), size=1, shape=1) +
            fte_theme() +
            coord_flip() +
            scale_x_discrete() +
            scale_y_continuous(limits=c(0.75,1.02), breaks=seq(0.75,1,by=0.05), labels=percent) +
            scale_color_identity() +
            theme(axis.title.y=element_blank()) +
            labs(y="% of Type of Facebook Reactions on Post", title="User Reactions on CNN's Facebook Posts")

max_save(plot, "facebook-reaction-1", "Facebook")

![](facebook-reaction-1.png)

In [307]:
theme_color <- "#f7f8fa"

i_plot <- ggplot(data=df_agg_long, aes(x=react_type, y=max_perc, color=color, text=tooltip)) +
            geom_point(position = position_jitter(width = 0.75), size=2, shape=1) +
            fte_theme() +
            coord_flip() +
            scale_x_discrete() +
            scale_y_continuous(limits=c(0.75,1.02), breaks=seq(0.75,1.00,by=0.05), labels=percent) +
            scale_color_identity() +
            theme(axis.title.y=element_blank(),
                  axis.title.x = element_text(size = 10),
                  axis.text.x = element_text(size = 14, family = "Source Sans Pro"),
                  axis.text.y = element_text(size = 14, family = "Source Sans Pro"),
                  plot.title=element_text(size = 18)) +
            theme(plot.background=element_rect(fill=theme_color), 
                  panel.background=element_rect(fill=theme_color),
                  panel.border=element_rect(color=theme_color),
                  strip.background=element_rect(fill=theme_color)) +
            labs(y="% of Type of Facebook Reactions on Post", title="User Reactions on CNN's Facebook Posts")

## plot.ly settings
i_plot <- i_plot %>%
    ggplotly(tooltip=c("text")) %>%
    config(displaylogo = F, scrollZoom = F, modeBarButtonsToRemove = list('sendDataToCloud', 'toImage'))


## Cannot use as.widget() due to bad defaults; must createWidget the hard way
createWidget(name="plotly",x=plotly_build(i_plot), sizingPolicy=sizingPolicy(browser.padding = 0, 
            browser.fill = F, defaultWidth = "100%", defaultHeight = 400)) %>%
saveWidget("cnn-header-fb.html", selfcontained=F, libdir="plotly")

## Turn the Chart into a Single Function

In [264]:
make_mood_matrix <- function(app_id, app_secret, page_id, get_data=T) {
    
    if (get_data) {
    system(sprintf("python get_fb_posts_fb_page_mod.py %s %s %s", app_id, app_secret, page_id))
    }
    
    df <- read_csv(sprintf("%s_facebook_statuses.csv", page_id)) %>% filter(status_published > '2016-02-24 00:00:00')
    if (nrow(df) == 0) {return(NA)}
    
    df$tooltip_title <- as.character(apply(df, 1, get_tooltip_title))
    
    df_agg <- df %>% mutate(total_reactions=num_loves+num_wows+num_hahas+num_sads+num_angrys,
                          perc_loves=num_loves/total_reactions,
                          perc_wows=num_wows/total_reactions,
                          perc_hahas=num_hahas/total_reactions,
                          perc_sads=num_sads/total_reactions,
                          perc_angrys=num_angrys/total_reactions) %>%
                select(tooltip_title, perc_loves,perc_wows,perc_hahas,perc_sads,perc_angrys, total_reactions)

    df_agg$max_perc <- apply(df_agg, 1, get_max_perc)
    df_agg$react_type <- as.character(apply(df_agg, 1, get_max_reaction_type))
    df_agg$tooltip <- as.character(apply(df_agg, 1, get_tooltip))
    
    df_agg_long <- df_agg %>%
                    select(react_type, max_perc, tooltip,total_reactions) %>%
                    left_join(colors) %>%
                    mutate(react_type=factor(react_type), max_perc=as.numeric(max_perc)) %>%
                    filter(max_perc >= 0.75,
                           total_reactions > 20,
                           (max_perc - 0.75)/sqrt(max_perc*(1-max_perc)/total_reactions) > qnorm(0.99))
    
    i_plot <- ggplot(data=df_agg_long, aes(x=react_type, y=max_perc, color=color, text=tooltip)) +
            geom_point(position = position_jitter(width = 0.75), size=2, shape=1) +
            fte_theme() +
            coord_flip() +
            scale_x_discrete() +
            scale_y_continuous(limits=c(0.75,1), breaks=seq(0.75,1.00,by=0.05), labels=percent) +
            scale_color_identity() +
            theme(axis.title.y=element_blank(),
                  axis.title.x = element_text(size = 10, color="#525252"),
                  axis.text.x = element_text(size = 14, family = "Source Sans Pro", color="#737373"),
                  axis.text.y = element_text(size = 14, family = "Source Sans Pro", color="#737373"),
                  plot.title=element_text(size = 18)) +
            labs(y="% of Type of Facebook Reactions on Post", title=sprintf("User Reactions on %s's Facebook Posts", page_id))

    i_plot <- i_plot %>%
        ggplotly(tooltip=c("text")) %>%
        config(displaylogo = F, scrollZoom = F, modeBarButtonsToRemove = list('sendDataToCloud', 'toImage'))

    createWidget(name="plotly",x=plotly_build(i_plot), sizingPolicy=sizingPolicy(browser.padding = 0, 
                browser.fill = F, defaultWidth = "100%", defaultHeight = 400)) %>%
    saveWidget(sprintf("%s-interactive-fb.html", page_id), selfcontained=F, libdir="plotly")

}

In [167]:
app_id <- "<FILL IN>"
app_secret <- "<FILL IN>" # DO NOT SHARE WITH ANYONE!
page_id <- "nytimes"

make_mood_matrix(app_id, app_secret, page_id)
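Inside `make_mood_matrix()`, the scraper is launched by pasting the arguments into a single shell command. If a page ID or credential ever contains spaces or shell metacharacters, quoting each argument may be safer. The sketch below shows one way to build the arguments for `system2()`; `build_scraper_args` is a hypothetical helper, and the example values are placeholders.

```r
# Quote each argument separately so the shell treats it as a single token.
build_scraper_args <- function(app_id, app_secret, page_id,
                               script = "get_fb_posts_fb_page_mod.py") {
  c(shQuote(script), shQuote(app_id), shQuote(app_secret), shQuote(page_id))
}

# Placeholder values; note the space in the page ID.
args <- build_scraper_args("123", "abc", "ny times")
print(args)

# The actual call would then look like this (not run here):
# system2("python", args)
```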

In [226]:
make_mood_matrix(app_id, app_secret, "berniesanders", F)

Joining by: "react_type"
In left_join_impl(x, y, by$x, by$y): joining factor and character vector, coercing into character vector

In [261]:
pages <- unlist(read_delim("top_fb_pages.txt", "\n", col_names=F))

print(pages[1:10])

                 X11                  X12                  X13 
         "cristiano"               "cavs"          "vindiesel" 
                 X14                  X15                  X16 
               "nba"     "huffingtonpost"            "abcnews" 
                 X17                  X18                  X19 
    "bleacherreport" "goodmorningamerica"            "foxnews" 
                X110 
      "jasonstatham" 


In [None]:
start_index <- 41

invisible(lapply(pages[start_index:length(pages)], function(x) make_mood_matrix(app_id, app_secret, x)))
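With a plain `lapply()`, an error on any single page stops the whole batch, which is presumably why `start_index` exists for restarting partway through. A possible alternative is to catch errors per page and record the outcome, as sketched below. `run_pages_safely` and `fake_make_mood_matrix` are hypothetical; the stub stands in for `make_mood_matrix()` so the example runs without Facebook credentials.

```r
# Run f on every page, recording "ok" or the error message instead of stopping.
run_pages_safely <- function(pages, f) {
  results <- lapply(pages, function(p) {
    tryCatch({
      f(p)
      "ok"
    }, error = function(e) paste("failed:", conditionMessage(e)))
  })
  setNames(unlist(results), unname(pages))
}

# Stub that fails for one page, standing in for make_mood_matrix().
fake_make_mood_matrix <- function(page_id) {
  if (page_id == "bad_page") stop("no data")
  invisible(NULL)
}

status <- run_pages_safely(c("cnn", "bad_page", "nytimes"), fake_make_mood_matrix)
print(status)
```

In the notebook's setting, the function argument would be something like `function(x) make_mood_matrix(app_id, app_secret, x)`.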

# The MIT License (MIT)

Copyright (c) 2016 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.