---
title: "Example: Social media analysis (Twitter)"
author: "Kohei Watanabe"
output:
  html_document:
    toc: true
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "##"
)
```
Using `fcm()` from **quanteda** and `textplot_network()` from **quanteda.textplots**, you can visually analyse social media posts in terms of co-occurrences of hashtags or usernames in a few steps. The dataset for this example contains only 10,000 Twitter posts, but you can easily analyse more than one million posts on a laptop computer.
```{r, message = FALSE}
library(quanteda)
```
## Load sample data
```{r}
load("data/data_corpus_tweets.rda")
```
## Construct a document-feature matrix of Twitter posts
```{r}
dfmat_tweets <- tokens(data_corpus_tweets, remove_punct = TRUE) |>
  dfm()
head(dfmat_tweets)
```
# Hashtags
## Extract most common hashtags
```{r}
dfmat_tag <- dfm_select(dfmat_tweets, pattern = "#*")
toptag <- names(topfeatures(dfmat_tag, 50))
head(toptag)
```
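The selection step above relies on `dfm_select()` treating `pattern` as a glob by default, so `"#*"` keeps every feature that starts with `#`. As a rough base R illustration of the same idea, the sketch below filters and ranks a small named vector of counts. The counts and the names `toy_counts`, `is_tag` and `toy_toptag` are invented for this example and are not part of the dataset or of quanteda.

```{r}
# Hypothetical feature counts, as might be obtained by summing the columns of a dfm
toy_counts <- c("#news" = 12, "the" = 40, "@bbc" = 7, "#eu" = 5, "vote" = 9)

# glob2rx() translates the glob pattern "#*" into an equivalent regular expression
is_tag <- grepl(glob2rx("#*"), names(toy_counts))

# Keep only the hashtags and rank them by frequency, most frequent first
toy_toptag <- names(sort(toy_counts[is_tag], decreasing = TRUE))
toy_toptag
```

Swapping the pattern for `"@*"` would select the username in the same way.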
## Construct a feature co-occurrence matrix of hashtags
```{r}
library("quanteda.textplots")
fcmat_tag <- fcm(dfmat_tag)
head(fcmat_tag)
fcmat_toptag <- fcm_select(fcmat_tag, pattern = toptag)
textplot_network(fcmat_toptag, min_freq = 0.1, edge_alpha = 0.8, edge_size = 5)
```
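To make the idea of a document-level co-occurrence count concrete, the base R sketch below counts, for four toy posts, how many posts contain each pair of hashtags. The posts and the names `toy_tags`, `toy_incidence` and `toy_cooc` are made up for illustration. The cross-product is a simplified stand-in for this kind of count, not quanteda's implementation, and `fcm()` may differ in details such as how repeated features are counted and whether only one triangle of the matrix is returned.

```{r}
# Hypothetical toy data: the hashtags found in four posts
toy_tags <- list(
  c("#a", "#b"),
  c("#a", "#b", "#c"),
  c("#b", "#c"),
  c("#a")
)

# Post-by-hashtag incidence matrix (1 if the post contains the hashtag, else 0)
tags <- sort(unique(unlist(toy_tags)))
toy_incidence <- t(sapply(toy_tags, function(x) as.integer(tags %in% x)))
colnames(toy_incidence) <- tags

# Cross-product: cell [i, j] is the number of posts containing both i and j
toy_cooc <- crossprod(toy_incidence)
diag(toy_cooc) <- 0  # ignore the pairing of a hashtag with itself
toy_cooc
```

Each non-zero cell corresponds to an edge in a network plot, and larger counts indicate hashtags that appear together in more posts.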
# Usernames
## Extract most frequently mentioned usernames
```{r}
dfmat_users <- dfm_select(dfmat_tweets, pattern = "@*")
topuser <- names(topfeatures(dfmat_users, 50))
head(topuser)
```
## Construct a feature co-occurrence matrix of usernames
```{r}
fcmat_users <- fcm(dfmat_users)
head(fcmat_users)
fcmat_users <- fcm_select(fcmat_users, pattern = topuser)
textplot_network(fcmat_users, min_freq = 0.1, edge_color = "orange", edge_alpha = 0.8, edge_size = 5)
```