# academictwitteR

The release of the new [Twitter v2 API](https://developer.twitter.com/en/support/twitter-api/v2) with its [academic research access](https://developer.twitter.com/en/products/twitter-api/academic-research) option has sparked the creation of several new `R` packages. Among those, [`academictwitteR`](https://github.com/cjbarrie/academictwitteR) is, in our view, the best documented and the easiest to use for collecting Twitter data. Hence, we will focus on that package in this notebook. Another new `R` package that makes use of the v2 API is [`voson.tcn`](https://vosonlab.github.io/voson.tcn/) (from [*VOSON Lab*](http://vosonlab.net/), who also created [`vosonSML`](https://vosonlab.github.io/vosonSML/)). However, `voson.tcn` was created for analyzing conversation networks on Twitter, so its unit of analysis is the conversation rather than the user.

*Note*: The first version of `academictwitteR` was based on a [gist](https://gist.github.com/schochastics/1ff42c0211916d73fc98ba8ad0dcb261) by [David Schoch](https://gist.github.com/schochastics) (who now also works at GESIS).

## Load packages

In addition to the `academictwitteR` package, we load the packages from the [`tidyverse`](https://www.tidyverse.org/) for some (minor) data wrangling.

In [None]:
library(academictwitteR)
library(tidyverse)

## Authentication

Before we can collect data via the Twitter v2 API, we need to set up our credentials.

**NB**: You should treat all information relating to your API access like a password and never share it or post it publicly anywhere. Although nobody except you should be able to access your personal instance of this notebook, if you want to be extra cautious, you can delete your API access information from the following cell after running it once (and save the notebook again after that).

In [None]:
# enter a bearer token for your v2 API app here
bearer_token <- ""
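An alternative that keeps the token out of the notebook entirely is to read it from an environment variable. The following is a minimal sketch using only base `R`; the variable name `TWITTER_BEARER` is an assumption made for this illustration, and the token would have to be stored under that name beforehand (e.g., in your `.Renviron` file).

```r
# Assumption: the token was stored beforehand in an environment variable
# called TWITTER_BEARER (e.g., via a line in ~/.Renviron).
bearer_token <- Sys.getenv("TWITTER_BEARER")

# Sys.getenv() returns an empty string if the variable is not set
if (!nzchar(bearer_token)) {
    message("TWITTER_BEARER is not set; API calls will fail without a token.")
}
```

With this approach, the saved notebook never contains the token itself.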

## Import list of accounts

The file [twitter_accounts.csv](./data/twitter_accounts.csv) in the `data` folder of this repository contains the Twitter screen names of three accounts that we will use in the following examples: [*GESIS - Leibniz Institute for the Social Sciences*](https://twitter.com/gesis_org), [*GESIS Training*](https://twitter.com/gesistraining/), and the [*Social Data Science Lab*](https://twitter.com/socdatalab) at *Cardiff University*.

In [None]:
users_df <- read_csv("./data/twitter_accounts.csv")
users <- users_df %>%
    pull(Screen_Name)
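If you want to use your own list of accounts, the following sketch shows the kind of file layout the cell above expects. It assumes a single column named `Screen_Name` with one screen name per row; the real file may contain additional columns. The objects `example_path` and `example_users` are only used for this illustration.

```r
library(readr)
library(dplyr)

# Assumed layout: a header row followed by one screen name per line
example_path <- tempfile(fileext = ".csv")
writeLines(c("Screen_Name", "gesis_org", "gesistraining", "socdatalab"),
           example_path)

# Read the file and extract the screen names as a character vector
example_users <- read_csv(example_path,
                          col_types = cols(Screen_Name = col_character())) %>%
    pull(Screen_Name)

example_users
```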

## Collecting tweets from specific users

We now collect all tweets sent from these accounts between January 1st, 2021, and June 22nd, 2022. We cap the number of tweets at 10,000 and save the resulting `JSON` files in the `data` folder.

In [None]:
tweets_df <- get_all_tweets(
    users = users,
    start_tweets = "2021-01-01T00:00:00Z",
    end_tweets = "2022-06-22T00:00:00Z",
    n = 10000,
    data_path = "./data",
    bearer_token = bearer_token
  )
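The `start_tweets` and `end_tweets` arguments in the cell above are written as UTC timestamps of the form `YYYY-MM-DDTHH:MM:SSZ`. If you prefer to start from plain dates, a small helper like the following can build such strings. This is a sketch in base `R`; `to_api_timestamp()` is a hypothetical helper defined here and not part of `academictwitteR`.

```r
# Hypothetical helper: turn a date (string or Date) into a UTC timestamp
# string in the same format as the values used in the cell above
to_api_timestamp <- function(date) {
    format(as.POSIXct(date, tz = "UTC"), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

start_example <- to_api_timestamp("2021-01-01")
end_example <- to_api_timestamp("2022-06-22")

start_example
end_example
```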

From these `JSON` files, we can now create an `R` dataframe.

In [None]:
tweets <- bind_tweets(data_path = "data/",
                      output_format = "tidy")

We can now have a first look at the data.

In [None]:
glimpse(tweets)

## User information

If we want to get some (additional) information about the accounts, we can use the `get_user_profile` function. To use this, we need the user ID (instead of the screen name).

In [None]:
profiles <- get_user_profile(unique(tweets$author_id),
                             bearer_token)

We can also check what these look like.

In [None]:
glimpse(profiles)

We can now combine the profile information with the tweets data. In the following cell, we just use the variables `name`, `username`, and `location` from the profile information.

In [None]:
tweets_combined <- tweets_df %>% 
    left_join(select(profiles, id, name, username, location),
              by = c("author_id" = "id"))
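To see what this join does without calling the API, here is a small sketch with made-up data. All values and the `toy_` objects are invented for illustration; only the column names mirror those used above. Note that the IDs are stored as character strings: Twitter IDs are too large to be represented exactly as `R` numerics, so converting them to numbers could break the match.

```r
library(dplyr)

# Made-up tweets: two by one author, one by another
toy_tweets <- tibble(
    author_id = c("11", "11", "22"),
    text = c("first tweet", "second tweet", "third tweet")
)

# Made-up profiles with one row per account
toy_profiles <- tibble(
    id = c("11", "22"),
    name = c("Account A", "Account B"),
    username = c("account_a", "account_b"),
    location = c("Cologne", "Cardiff"),
    description = c("not needed", "not needed")
)

# Every tweet keeps its row and gains the selected profile columns
toy_combined <- toy_tweets %>%
    left_join(select(toy_profiles, id, name, username, location),
              by = c("author_id" = "id"))

toy_combined
```

Because this is a left join, the result has exactly one row per tweet, and tweets from an account without a matching profile would get `NA` in the added columns.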

## Followed accounts

If we are (also) interested in network data, we can make use of functions from the `academictwitteR` package that allow us to collect information about the accounts a user follows or is followed by. In the following example, we will gather information about the accounts that our three example accounts follow. Again, we need the user IDs for this.

In [None]:
ids <- profiles %>% 
    pull(id)

followed <- get_user_following(ids, bearer_token)

In [None]:
glimpse(followed)
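One simple way to use such data for network analysis is to treat each row as an edge from a following account to a followed account and to count how many of our accounts follow the same target. The sketch below uses made-up data; the column names `from_id`, `id`, and `username` are assumptions for this illustration and should be checked against the output of `glimpse(followed)`.

```r
library(dplyr)

# Made-up edge list: from_id follows id (column names are assumptions)
toy_followed <- tibble(
    from_id = c("11", "11", "22", "33"),
    id = c("100", "200", "100", "100"),
    username = c("target_a", "target_b", "target_a", "target_a")
)

# Count how many accounts in our sample follow each target account
shared_targets <- toy_followed %>%
    count(id, username, name = "n_sample_followers", sort = TRUE)

shared_targets
```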

## Process & save results

If we check the types of columns in the `tweets_combined` object, we can see that some of them are lists or dataframes.

In [None]:
sapply(tweets_combined, class)

For example, the column `public_metrics` that contains information on how often a tweet has been liked, retweeted, quoted, or replied to is a dataframe.

In [None]:
glimpse(tweets_combined$public_metrics)

If we want to save our results as a `.csv` file using the `readr` function `write_csv()` (or the base `R` function `write.csv()`), the object cannot include list- or dataframe-columns. To solve this issue, we could split the list columns into "regular" vector columns (e.g., using the [`unnest()` function](https://tidyr.tidyverse.org/reference/nest.html) from the [`tidyr` package](https://tidyr.tidyverse.org/index.html)) and/or append the dataframe columns (using `bind_cols()` from `dplyr` or `cbind()` from base `R`). However, to keep it simple here, we will just remove all dataframe- and list-columns so that we can save the resulting object in a `.csv` file.

In [None]:
tweets_combined_df <- tweets_combined %>% 
    select(where(~ !is.data.frame(.x) & !is.list(.x)))

Before we save the resulting dataframe as a `.csv` file, we can check it to make sure that it no longer contains any list- or dataframe-columns.

In [None]:
glimpse(tweets_combined_df)

In [None]:
write_csv(tweets_combined_df, "./data/tweets_r.csv")
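Dropping the nested columns also drops the information they hold, such as the engagement counts in `public_metrics`. As a sketch of the flattening alternative mentioned above, the following example works on made-up data. It uses `unpack()` from `tidyr` (instead of `unnest()`) because `unpack()` is designed for dataframe-columns, and it collapses a list-column into a single string per row. The `hashtags` column and all values are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up data with one dataframe-column and one list-column
toy_nested <- tibble(
    id = c("1", "2"),
    text = c("first tweet", "second tweet"),
    public_metrics = tibble(
        retweet_count = c(3L, 0L),
        like_count = c(10L, 2L)
    ),
    hashtags = list(c("rstats", "api"), character(0))
)

toy_flat <- toy_nested %>%
    # Spread the dataframe-column into regular columns with a name prefix
    unpack(public_metrics, names_sep = "_") %>%
    # Collapse each list element into one semicolon-separated string
    mutate(hashtags = vapply(hashtags, paste, character(1), collapse = ";"))

# All columns are now atomic vectors and can be written to a .csv file
sapply(toy_flat, class)
```

The trade-off is that collapsed list-columns have to be split again after reading the file back in, so this only pays off for columns that are actually needed later.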

The data on the followed accounts also needs to be reduced to "regular" columns before we can save it, so we filter out its dataframe-columns in the same way.

In [None]:
followed <- followed %>% 
    select(where(~ !is.data.frame(.x)))

write_csv(followed, "./data/followed_r.csv")