# vosonSML

The examples in this notebook are adapted from the [`vosonSML` *GitHub* repository](https://github.com/vosonlab/vosonSML).

*Note*: Another good option for collecting network data via the [Twitter API v1.1](https://developer.twitter.com/en/docs/twitter-api/v1) is the [`twittercrawler` package](https://github.com/AndrewCarr24/twittercrawler).

## Load packages

In addition to `vosonSML`, we will load the [`magrittr` package](https://magrittr.tidyverse.org/index.html) so that we can use its [pipe operator `%>%`](https://magrittr.tidyverse.org/reference/pipe.html), as well as the [`dplyr` package](https://dplyr.tidyverse.org/) for some (minor) data wrangling.

In [None]:
library(vosonSML)
library(magrittr)
library(dplyr)

## Authentication

To use `vosonSML` for collecting Twitter data, you need to create an access token using the API credentials for the app you created.

**NB**: You should treat all information relating to your API key like a password and never share it or post it publicly anywhere. Nobody except you should be able to access your personal instance of this notebook, and your edits will not persist unless you use a *GESIS Notebooks* user account. Still, if you want to be extra cautious, you can delete your API access information from the following cell after running it once (and save the notebook again after that).

In [None]:
twitterAuth <- Authenticate("twitter", 
                            appName = "My App",
                            apiKey = "xxxxxxxxxxxx",
                            apiSecret = "xxxxxxxxxxxx",
                            accessToken = "xxxxxxxxxxxx",
                            accessTokenSecret = "xxxxxxxxxxxx")
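
If you prefer not to type credentials into the notebook at all, one option is to read them from environment variables. The following is a minimal sketch and not part of the original example: the helper `read_credential()` and the variable names (`TWITTER_API_KEY` etc.) are hypothetical and assume that you have set these variables yourself (e.g., in an `.Renviron` file).

```r
# Hypothetical helper: read one credential from an environment variable
# and stop with a clear message if it is missing or empty.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Usage with assumed variable names (uncomment once the variables are set):
# twitterAuth <- Authenticate("twitter",
#                             appName = "My App",
#                             apiKey = read_credential("TWITTER_API_KEY"),
#                             apiSecret = read_credential("TWITTER_API_SECRET"),
#                             accessToken = read_credential("TWITTER_ACCESS_TOKEN"),
#                             accessTokenSecret = read_credential("TWITTER_ACCESS_TOKEN_SECRET"))
```

With this approach, the notebook itself never contains the secrets, so there is nothing to delete from the cell afterwards.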

## Import list of users

The file [twitter_accounts.csv](./data/twitter_accounts.csv) in the `data` folder of this repository contains the Twitter screen names of [*GESIS - Leibniz Institute for the Social Sciences*](https://www.gesis.org/en/home) and the [*Social Data Science Lab*](http://socialdatalab.net/) at *Cardiff University* which we will use in the following examples.

In [None]:
users_df <- read.csv("./data/twitter_accounts.csv")
users <- as.character(users_df$Screen.Name)
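
A column name such as `Screen.Name` typically arises when the header in the file contains a space: by default (`check.names = TRUE`), `read.csv()` turns headers into syntactically valid R names. The sketch below illustrates this with inline data. The header `Screen Name` and the two screen names are made-up placeholders, not the actual content of `twitter_accounts.csv`.

```r
# Inline stand-in for the CSV file (placeholder values, assumed header)
csv_text <- "Screen Name\naccount_one\naccount_two"

demo_df <- read.csv(text = csv_text)
names(demo_df)  # the space in the header has been replaced by a dot

# Extract the screen names as a character vector
demo_users <- as.character(demo_df$Screen.Name)
```

If the column in your own file is named differently, `names(users_df)` shows which name to use after the `$`.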

## Build a search query

In the next cell, we will construct a search query for getting tweets from the accounts in our user list by using basic string operations and a [regular expression](https://en.wikipedia.org/wiki/Regular_expression).

In [None]:
query <- gsub('.{4}$', '', paste0("from:", users, " OR ", collapse = ""))
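
To see what the cell above does, it can help to run it step by step on a small example. The sketch below uses two hypothetical screen names (`account_one`, `account_two`) and also shows an alternative that passes the separator to the `collapse` argument of `paste0()`, so that no trailing `" OR "` has to be removed afterwards.

```r
# Hypothetical screen names, used only for illustration
demo_users <- c("account_one", "account_two")

# Step 1: append " OR " to every term and glue everything together
raw_query <- paste0("from:", demo_users, " OR ", collapse = "")

# Step 2: the regular expression '.{4}$' matches the last four
# characters (the trailing " OR "), which gsub() replaces with ""
query_regex <- gsub('.{4}$', '', raw_query)

# Alternative: let paste0() insert " OR " only between the terms
query_collapse <- paste0("from:", demo_users, collapse = " OR ")

identical(query_regex, query_collapse)
```

Both variants produce the same query string, so which one to use is mostly a matter of taste.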

## Search & collect tweets

Using the credentials and the search query we created before, we can now search for and collect tweets. In the call to `Collect()` below, we set the maximum number of tweets to 100 and include retweets.

In [None]:
user_tweets <- twitterAuth %>%
  Collect(searchTerm = query,
          searchType = "recent",
          numTweets = 100,
          includeRetweets = TRUE,
          retryOnRateLimit = TRUE,
          writeToFile = FALSE,
          verbose = TRUE)

**NB**: This function uses the [standard search API](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/overview), which only returns tweets from the last 7 days.

Instead of searching for tweets from specific accounts, we can also search, for example, for tweets that use a specific hashtag.

In [None]:
hashtag_tweets <- twitterAuth %>%
  Collect(searchTerm = "#rstats",
          searchType = "recent",
          numTweets = 100,
          includeRetweets = FALSE,
          retryOnRateLimit = TRUE,
          writeToFile = FALSE,
          verbose = TRUE)

It is, of course, possible to construct other (and more complex) [search queries](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query) using the query options that the Twitter API offers.

## Process & save results

The `user_tweets` object carries additional class attributes. These matter if we want to work with it further using the `vosonSML` package (e.g., for network analysis).

In [None]:
class(user_tweets)

If we look at the types of the columns included in the `user_tweets` object, we see that it includes a couple of list-columns. These cannot be saved in a `.csv` file with the base `R` function `write.csv()`.

In [None]:
sapply(user_tweets, class)

We could change the list-columns into vectors (e.g., using the [`unnest()` function](https://tidyr.tidyverse.org/reference/nest.html) from the [`tidyr` package](https://tidyr.tidyverse.org/index.html)). However, to keep it simple here, we will just convert the `user_tweets` object to a plain data frame and remove all list-columns so that we can save the resulting object in a `.csv` file.

In [None]:
user_tweets_df <- user_tweets %>%
  as.data.frame() %>%
  select(where(~ !is.list(.x)))

In [None]:
write.csv(user_tweets_df, "./data/voson_tweets.csv")
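
If you would rather keep the information stored in the list-columns than drop it, one option (different from the `unnest()` approach mentioned above) is to collapse each list element into a single string before saving. The following is a minimal sketch in base `R` on a made-up data frame. The column names `id` and `tags`, the `;` separator, and the object `toy_df` are illustrative assumptions, not part of the Twitter data.

```r
# Toy data frame with one regular column and one list-column
toy_df <- data.frame(id = 1:2)
toy_df$tags <- list(c("a", "b"), "c")

# Identify the list-columns
list_cols <- vapply(toy_df, is.list, logical(1))

# Collapse every element of each list-column into one string,
# so that the column becomes an ordinary character vector
toy_df[list_cols] <- lapply(toy_df[list_cols], function(col) {
  vapply(col, function(x) paste(x, collapse = ";"), character(1))
})

sapply(toy_df, class)
```

After this step, all columns are atomic vectors and the data frame can be written with `write.csv()`. Note that the collapsed strings have to be split again (e.g., with `strsplit()`) if the individual values are needed later.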