Botany2020tweets_emojtest.Rmd

---
title: "A summary of the tweets generated at Botany 2020 (#Botany2020)"
author: "E.J. Rollinson"
output:
  html_document:
    toc: yes
  pdf_document:
    toc: yes
editor_options: 
  chunk_output_type: console
---  

```{r, echo=FALSE, results='hide', message=FALSE, warning=FALSE}

library(extrafont)
#extrafont::loadfonts(device="win") #run once to import fonts; code below assumes Open Sans installed locally
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr)
library(RColorBrewer)
library(stringr)
#devtools::install_github("dill/emoGG")
#library(emoGG)
#devtools::install_github("coolbutuseless/ggpattern")
library(ggpattern)
library(magick)

#The comments below show how to use rtweet to download Twitter data. You will need to create an app at https://dev.twitter.com/ to get Twitter API OAuth values that fill into the spaces indicated below.

#load libraries
# library(rtweet)
# library(tidyverse)
# library(lubridate)
# library(tidytext)
# 
# #pass your keys to API
# appname <- "YourAppName"
# key <- "YourKey"
# secret <- "YourSecretKey"
# access_token <- "YourAccessToken"
# access_secret <- "YourSecretAccessToken"
# 
# twitter_token <-create_token(
#   app = appname,
#   consumer_key = key,
#   consumer_secret = secret,
#   access_token = access_token,
#   access_secret = access_secret)
# 
# tweets<-search_tweets(q = "#Botany2020",
#                       n=10000, retryonratelimit = TRUE)
# save_as_csv(tweets, "Botany2020FullWeek.csv")


dt_tweets <- read.csv("Botany2020FullWeek.csv", stringsAsFactors = FALSE)
dt_tweets$created_at <- as.Date(dt_tweets$created_at)
uniques <- dt_tweets %>%
  filter(!is_retweet) #is_retweet is read in as logical, so negate it directly

```

Emoji figure test

```{r}

#this one works sort of


flags <- c(rep("C:/Users/erollinson/Documents/GitHub Projects/erollinson.github.io/ramenbar.jpg",7))

ggplot(mpg, aes(class)) +
  geom_bar_pattern(
    aes(
      pattern_filename = class
    ),
    pattern         = 'image',
    pattern_type    = 'none',
    fill            = 'white',
    colour          = 'white',
    pattern_scale   = -2,
    pattern_filter  = 'point',
    pattern_gravity = 'west'
  ) +
  theme_classic(18) +
  labs(
    title = "ggpattern::geom_bar_pattern() + coord_flip()",
    subtitle = "pattern = 'image'"
  ) +
  theme(legend.position = 'none') +
  scale_pattern_filename_discrete(choices = flags) +
  coord_flip() +
  scale_pattern_discrete(guide = guide_legend(nrow = 1))


#try it on tweet graphs

top_users <- dt_tweets %>% group_by(screen_name) %>%
  summarize(total_tweets = n(),
            Retweets = sum(is_retweet),
            Original = sum(!is_retweet),
            .groups="drop_last") %>%
  arrange(desc(total_tweets)) %>%
  slice(1:50) %>%
  gather(type, n_tweets, -screen_name, -total_tweets)

top_users$screen_name <- reorder(top_users$screen_name,
                                top_users$total_tweets,
                                function(x) sum(x))

imagefill <- c(rep("C:/Users/erollinson/Documents/GitHub Projects/erollinson.github.io/coffeebar.png",nrow(top_users)/2))


#pattern_filename must be mapped to a discrete variable; screen_name gives one image slot per bar
ggplot(filter(top_users, type=="Original")) + 
  geom_bar_pattern(aes(x = screen_name, y = n_tweets, pattern_filename = screen_name),
    stat            = "identity",
    pattern         = 'image',
    pattern_type    = 'none',
    fill            = 'white',
    colour          = 'white',
    pattern_scale   = -2,
    pattern_filter  = 'point',
    pattern_gravity = 'west') +
  ylab("Number of tweets") +
  scale_pattern_filename_discrete(choices = imagefill) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = 'none') +
  labs(title="Most #Botany2020 Tweets (excluding retweets)", subtitle = "Mon - Fri, Top 50 Users") +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank(),
        plot.title= element_text(size=14, family="Open Sans", color="black"),
        plot.subtitle = element_text(size=12, family="Open Sans", color="black"))
```


## About this document
This document was produced by [Emily J. Rollinson](http://rollinsonecology.com) ([\@ejrollinson](https://twitter.com/ejrollinson) on Twitter). 

The code generating this document was originally developed by [Francois Michonneau](https://github.com/fmichonneau) ([\@fmic_](https://twitter.com/fmic_) on Twitter) for the 2015 Evolution meeting and can be found [here](https://github.com/fmichonneau/evol2015-tweets). 

I originally adapted this code for the 2015 Ecological Society of America meeting (#ESA100) in Baltimore, MD, and have since reproduced it for a series of ESA and Botany meetings. I also modified it in 2020 to use the rtweet package instead of twitteR to aggregate tweets, updated the code used to generate the wordcloud, and modified the formatting of the plots.
 
Tweets using the hashtag #Botany2020 were aggregated from Twitter using the R package [rtweet](https://cran.r-project.org/web/packages/rtweet/index.html) and the Twitter API. The summary statistics are static as of 8:00 AM EDT Aug 01 2020 (additional tweets, retweets, and likes after that point are not included).

This document was generated using RMarkdown, and the source is [available on GitHub](https://github.com/erollinson/erollinson.github.io).

This document (and associated code) is released under a CC0 licence.


## Basic summary

Of the `r nrow(dt_tweets)` tweets tagged #Botany2020 between `r min(dt_tweets$created_at)` and `r max(dt_tweets$created_at)`:

|Description | n |
|------------|---|
|Total of original tweets (no retweets): | `r sum(!dt_tweets$is_retweet)`|
|Number of users who tweeted (including retweeting): | `r n_distinct(dt_tweets$screen_name)`|
|Number of users who tweeted (no retweets): | `r n_distinct(uniques$screen_name)`|
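
As a rough illustration of how the counts in this table are derived, here is a minimal sketch on an invented toy data frame. The names `toy_tweets`, `n_total`, `n_original`, `n_users_all`, and `n_users_orig` are hypothetical and are not part of the document's code; only the column names `screen_name` and `is_retweet` come from the real data.

```r
library(dplyr)

# Toy stand-in for the tweet table; the values are invented for illustration.
toy_tweets <- data.frame(
  screen_name = c("alice", "bob", "alice", "carol", "bob"),
  is_retweet  = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

n_total     <- nrow(toy_tweets)                    # all tweets, retweets included
n_original  <- sum(!toy_tweets$is_retweet)         # original tweets only
n_users_all <- n_distinct(toy_tweets$screen_name)  # anyone who tweeted or retweeted

# Users with at least one original tweet
n_users_orig <- toy_tweets %>%
  filter(!is_retweet) %>%
  summarize(n = n_distinct(screen_name)) %>%
  pull(n)
```

In this toy example only one user posts original tweets, so the two user counts differ even though three accounts appear in the data.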


## The 5 most liked tweets

```{r top-fav, echo=FALSE, results='asis', warning=FALSE}
top_fav <- dt_tweets %>%
  filter(!is_retweet) %>%
  arrange(desc(favorite_count)) %>%
  slice(1:5)
b<-as.data.frame(top_fav)
b$idlink <-str_sub(b$status_id, 2, nchar(b$status_id))

#render one row of dt as an HTML blockquote; uses the row argument rather than a global loop index
render_tweet <- function(dt, row) {
    screen_name <- dt[row, "screen_name"]
    id <- format(dt[row, "idlink"], scientific = FALSE)
    txt <- dt[row, "text"]
    created <- format(dt[row, "created_at"], "%Y-%m-%d")
    n_fav <- dt[row, "favorite_count"]
    n_retweets <- dt[row, "retweet_count"]
    cat("<blockquote class=\"twitter-tweet\" data-lang=\"en\"> \n",
        "<p lang=\"en\" dir=\"ltr\">",
        txt,
        "</p>&mdash; ",
        "<a href=\"https://twitter.com/", screen_name, "\">", screen_name, "</a>", "&nbsp;|&nbsp;",
        "<a href=\"https://twitter.com/",
        screen_name, "/status/", id, "\"> ", created, "</a> &nbsp;|&nbsp;",
        n_retweets, " retweets, ",  n_fav, " likes </blockquote>",
        "\n \n",
        sep = "")
}

for (i in seq_len(nrow(b))) {
    render_tweet(b, i)
}

```
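
The chunk above drops the first character of `status_id` before building each link, which suggests the ids in the CSV carry a one-character prefix. A minimal sketch of that step, assuming the prefix is a leading "x" (an assumption; the values and the names `id_clean` and `tweet_url` are invented for illustration):

```r
library(stringr)

# Hypothetical values; the leading "x" mimics the assumed prefix in the CSV.
status_id   <- "x1288500000000000001"
screen_name <- "example_user"

# Drop the first character and keep the rest of the id as text.
id_clean <- str_sub(status_id, 2)

# Build the link to the tweet from the user name and the cleaned id.
tweet_url <- paste0("https://twitter.com/", screen_name, "/status/", id_clean)
```

Keeping the id as a character string avoids the loss of precision that can occur when a 19-digit id is stored as a double.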


## The 5 most retweeted tweets

```{r top-rt, echo=FALSE, results='asis', warning=FALSE}
top_rt <- dt_tweets %>%
  filter(!is_retweet) %>%
  arrange(desc(retweet_count)) %>%
  slice(1:5)
c<-as.data.frame(top_rt)
c$idlink <-str_sub(c$status_id, 2, nchar(c$status_id))

for (i in seq_len(nrow(c))) {
    render_tweet(c, i)
}

```

## Top tweeters

All generated tweets (including retweets)

```{r top-users-all, echo=FALSE, fig.height=10, warning=FALSE}
top_users <- dt_tweets %>% group_by(screen_name) %>%
  summarize(total_tweets = n(),
            Retweets = sum(is_retweet),
            Original = sum(!is_retweet),
            .groups="drop_last") %>%
  arrange(desc(total_tweets)) %>%
  slice(1:50) %>%
  gather(type, n_tweets, -screen_name, -total_tweets)

top_users$screen_name <- reorder(top_users$screen_name,
                                top_users$total_tweets,
                                function(x) sum(x))

pal <- brewer.pal(9, "YlGnBu")

ggplot(top_users) + 
  geom_bar(aes(x = screen_name, y = n_tweets, fill = type), 
                             position = position_stack(reverse = TRUE), 
                             stat = "identity") +
  ylab("Number of tweets") +
  coord_flip() +
  scale_fill_manual(values = pal[c(7, 4)]) +
  theme_bw() +
  labs(title="Most #Botany2020 Tweets (including retweets)", subtitle = "Mon - Fri, Top 50 Users") +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank(),
        plot.title= element_text(size=14, family="Open Sans", color="black"),
        plot.subtitle = element_text(size=12, family="Open Sans", color="black"))
```
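
The stacked bars rely on reshaping the per-user counts from one row per user to one row per user and tweet type. The chunk above does this with `gather()`; the sketch below shows the same reshaping with `tidyr::pivot_longer()`, offered only as a possible alternative. The data and the names `toy_counts` and `toy_long` are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Toy per-user counts in wide form; the values are invented.
toy_counts <- tibble(
  screen_name  = c("alice", "bob"),
  total_tweets = c(5L, 3L),
  Retweets     = c(2L, 3L),
  Original     = c(3L, 0L)
)

# One row per user and tweet type, which is the shape the fill aesthetic expects.
toy_long <- toy_counts %>%
  pivot_longer(cols = c(Retweets, Original),
               names_to = "type", values_to = "n_tweets")
```

The row order differs from `gather()` (rows are grouped by user rather than by type), which does not matter for a stacked bar chart.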

Only for original tweets (retweets excluded)

```{r top-users-orig, echo=FALSE, fig.height=10, warning=FALSE}
top_orig_users <- dt_tweets %>% group_by(screen_name) %>%
  summarize(total_tweets = n(),
            Retweet = sum(is_retweet),
            Original = sum(!is_retweet),
            .groups="drop_last") %>%
  arrange(desc(Original)) %>%
  slice(1:50)

top_orig_users$screenName <- reorder(top_orig_users$screen_name,
                                     top_orig_users$Original,
                                     function(x) sum(x))

pal <- brewer.pal(9, "YlGnBu")

library(ggpattern)

ggplot(top_orig_users)  + geom_bar(aes(x = screenName, y = Original), stat = "identity", fill = pal[4]) +
  ylab("Number of tweets") +
  coord_flip() +
    theme_bw() +
    labs(title="Most #Botany2020 Tweets (excluding retweets)", subtitle = "Mon - Fri, Top 50 Users") +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank(),
        plot.title= element_text(size=14, family="Open Sans", color="black"),
        plot.subtitle = element_text(size=12, family="Open Sans", color="black"))

coffee <-"C:/Users/Emily/Documents/Github Projects/erollinson.github.io/coffee.png"

coffeelist <-rep(c("C:/Users/Emily/Documents/Github Projects/erollinson.github.io/coffee.png"),each=50)

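#pattern and pattern_type are constants, so they are set outside aes(); stat = "identity" is needed because y is supplied
ggplot(top_orig_users, aes(screenName, Original)) +
  geom_bar_pattern(aes(fill = screenName, pattern_filename = coffee),
                   stat = "identity", pattern = "image", pattern_type = "tile") +
  scale_pattern_filename_discrete(choices = coffeelist)

flags <- c(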
  system.file("img", "flag", "au.png", package = "ggpattern"),
  system.file("img", "flag", "dk.png", package = "ggpattern"),
  system.file("img", "flag", "gb.png", package = "ggpattern"),
  system.file("img", "flag", "gr.png", package = "ggpattern"),
  system.file("img", "flag", "no.png", package = "ggpattern"),
  system.file("img", "flag", "se.png", package = "ggpattern"),
  system.file("img", "flag", "us.png", package = "ggpattern")
)

p <- ggplot(mpg, aes(class)) +
  geom_bar_pattern(
    aes(
      pattern_filename = class
    ),
    pattern         = 'image',
    pattern_type    = 'tile',
    fill            = 'white',
    colour          = 'black',
    pattern_filter  = 'box',
    pattern_scale   = -1
  ) +
  theme_bw(18) +
  labs(
    title = "ggpattern::geom_bar_pattern()",
    subtitle = "pattern = 'image'"
  ) +
  theme(legend.position = 'none') +
  scale_pattern_filename_discrete(choices = flags) +
  coord_fixed(ratio = 1/15) +
  scale_pattern_discrete(guide = guide_legend(nrow = 1))

p
```


## Most favorited/retweeted users

The figures below only include users who tweeted 5+ times, and don't include retweets.
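
A minimal sketch of how a five-tweet threshold of this kind could be applied to per-user totals, using an invented toy data frame (`toy_tweets` and `toy_impact` are hypothetical names, and the counts are made up):

```r
library(dplyr)

# Toy data: alice has five original tweets, bob has two.
toy_tweets <- data.frame(
  screen_name    = c(rep("alice", 5), rep("bob", 2)),
  favorite_count = c(1, 2, 3, 4, 5, 10, 20),
  is_retweet     = FALSE
)

toy_impact <- toy_tweets %>%
  filter(!is_retweet) %>%                 # drop retweets first
  group_by(screen_name) %>%
  summarize(n_tweets = n(),
            n_fav    = sum(favorite_count),
            mean_fav = mean(favorite_count),
            .groups  = "drop") %>%
  filter(n_tweets >= 5)                   # keep users with 5 or more original tweets
```

Here bob is dropped despite having the higher mean, which is the point of the threshold: it keeps a single popular tweet from dominating the per-tweet averages.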

### Number of favorites received by users

```{r, fig.height=10, echo=FALSE, warning=FALSE}
impact <- dt_tweets %>% filter(!is_retweet) %>%
  group_by(screen_name) %>%
  summarize(n_tweets = n(),
            n_fav = sum(favorite_count),
            n_rt =  sum(retweet_count),
            mean_fav = mean(favorite_count),
            mean_rt = mean(retweet_count),
            .groups="drop_last") %>%
  filter(n_tweets >= 5) #restrict to users with 5+ original tweets, as stated in the text above
### Most favorited
most_fav <- impact %>%
  arrange(desc(n_fav)) %>%
  slice(1:50)

most_fav$screen_name <- reorder(most_fav$screen_name,
                               most_fav$n_fav,
                               sort)

ggplot(most_fav) + geom_bar(aes(x = screen_name, y = n_fav),
                            stat = "identity", fill = pal[4]) +
  coord_flip() + ylab("Total number of likes") +
    theme_bw() +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank())
```

### Number of retweets received by users

```{r, fig.height=10, echo=FALSE, warning=FALSE}
## Most retweeted

most_rt <- impact %>%
  arrange(desc(n_rt)) %>%
  slice(1:50)

most_rt$screen_name <- reorder(most_rt$screen_name,
                              most_rt$n_rt,
                              sort)

ggplot(most_rt) + geom_bar(aes(x = screen_name, y = n_rt),
                           stat = "identity", fill = pal[4]) +
  coord_flip() + ylab("Total number of retweets") +
  theme_bw() +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank())
```

### Mean numbers of likes received

```{r, fig.height=10, echo=FALSE, warning=FALSE}

### Mean likes

hi_mean_fav <- impact %>%
  arrange(desc(mean_fav)) %>%
  slice(1:50)

hi_mean_fav$screen_name <- reorder(hi_mean_fav$screen_name,
                                  hi_mean_fav$mean_fav,
                                  sort)

ggplot(hi_mean_fav) + geom_bar(aes(x = screen_name, y = mean_fav),
                           stat = "identity", fill = pal[4]) +
  coord_flip() + ylab("Number of likes / tweets") +
  theme_bw() +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank())

```

### Mean numbers of retweets received

```{r, fig.height=10, echo=FALSE, warning=FALSE}

### Mean retweets

hi_mean_rt <- impact %>%
  arrange(desc(mean_rt)) %>%
  slice(1:50)

hi_mean_rt$screen_name <- reorder(hi_mean_rt$screen_name,
                                 hi_mean_rt$mean_rt,
                                 sort)

ggplot(hi_mean_rt) + geom_bar(aes(x = screen_name, y = mean_rt),
                           stat = "identity", fill =pal[5]) +
  coord_flip() + xlab("User") + ylab("Number of retweets / tweets") +
  theme_bw() +
  theme(axis.text = element_text(size = 11, family = "Open Sans", color="black"),
        legend.text = element_text(size = 12, family="Open Sans"),
        legend.title = element_blank(),
        axis.title.x = element_text(size = 12, family = "Open Sans"),
        axis.title.y = element_blank())


```

## Word cloud

The top 100 words among the original tweets, excluding retweets, hashtags, mentions, URLs, and common English words ("the", "&", etc.).


```{r word-cloud, echo=FALSE, message=FALSE, warning=FALSE}
library(wordcloud)
library(tm)
library(stringr)
library(ghibli)
library(qdapRegex)

palcloud <-brewer.pal(9, "YlGnBu")

cleaned <- dt_tweets %>%
  filter(!is_retweet) %>%
  .$text %>%
  str_replace_all("\\n", " ") %>% #replace every line break with a space so adjacent words are not joined
  rm_twitter_url() %>%
  rm_url() %>%
  str_remove_all("#\\S+") %>%
  str_remove_all("@\\S+") %>%
  str_to_lower() %>% #lowercase first so capitalized stopwords ("The", "This") are also matched
  removeWords(stopwords("english")) %>%
  removeNumbers() %>%
  stripWhitespace() %>%
  removePunctuation() %>%
  removeWords(c("amp", "use", "its", "edt")) %>% #whole-word removal, so "because" or "items" are left intact
  str_remove("uff") %>%
  str_remove("â")

#stopwords() removes common English words; the second removeWords() call drops a few remaining common words, and the str_remove() lines clean up leftover encoding junk
corpus <- Corpus(VectorSource(cleaned)) %>%
  TermDocumentMatrix() %>%
  as.matrix()

corpus <- sort(rowSums(corpus), decreasing=TRUE)
corpus <- data.frame(word=names(corpus), freq=corpus, row.names=NULL)
corpus <- corpus %>%
  filter(word != "its" & word != "edt" & word != "the" & word != "this" & word != "i€™m" & word !="and") #catching common words that were for whatever reason not removed by stopwords and str_remove above

wordcloud(corpus$word, corpus$freq, max.words=100, colors = palcloud[5:9], random.order = FALSE, scale = c(4, 0.9), rot.per=0, fixed.asp=FALSE)

```
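
The cleaning steps can be hard to follow from the pipeline alone, so here is a minimal sketch of the same idea on two invented tweets, using only stringr. It is a simplified stand-in rather than the pipeline used above: it skips stopword removal, and the regular expressions and the names `toy_text`, `x`, and `toy_bounded` are assumptions made for illustration.

```r
library(stringr)

# Two invented tweets containing a hashtag, a mention, a URL, an HTML entity, and a line break.
toy_text <- c("Great talk on #Botany2020 by @someone https://t.co/abc123",
              "Ferns &amp; mosses\nare underrated!")

x <- str_replace_all(toy_text, "\\n", " ")   # line breaks become spaces
x <- str_remove_all(x, "https?://\\S+")      # strip URLs
x <- str_remove_all(x, "[#@]\\S+")           # strip hashtags and mentions
x <- str_remove_all(x, "&amp;")              # strip the escaped ampersand
x <- str_to_lower(x)
x <- str_remove_all(x, "[[:punct:]]")        # strip remaining punctuation
x <- str_squish(x)                           # collapse repeated whitespace

# Word boundaries keep "use" from being cut out of longer words such as "because".
toy_bounded <- str_remove_all("because we use keys", "\\buse\\b")
```

The last line shows why removing a short word as a plain substring is risky: without the `\\b` anchors, "because" would lose its middle letters.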

-----

<p xmlns:dct="http://purl.org/dc/terms/" xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <a rel="license"
     href="http://creativecommons.org/publicdomain/zero/1.0/">
    <img src="http://i.creativecommons.org/p/zero/1.0/88x31.png" style="border-style: none;" alt="CC0" />
  </a>
  <br />
  To the extent possible under law,
  <a rel="dct:publisher"
     href="https://github.com/erollinson/erollinson.github.io">
    <span property="dct:title">Emily J. Rollinson</span></a>
  has waived all copyright and related or neighboring rights to
  <span property="dct:title">Summary of tweets at Botany 2020</span>.
This work is published from:
<span property="vcard:Country" datatype="dct:ISO3166"
      content="US" about="https://github.com/erollinson/erollinson.github.io">
  United States</span>.
</p>