-
Notifications
You must be signed in to change notification settings - Fork 9
/
03e-facebook-data-collection.Rmd
95 lines (73 loc) · 2.61 KB
/
03e-facebook-data-collection.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
title: "Scraping data from Facebook"
author: "Pablo Barbera"
date: "June 28, 2017"
output: html_document
---
### Scraping web data from Facebook
To scrape data from Facebook's API, we'll use the `Rfacebook` package.
```{r}
library(Rfacebook)
```
To get access to the Facebook API, you need an OAuth code. You can get yours going to the following URL: [https://developers.facebook.com/tools/explorer](https://developers.facebook.com/tools/explorer)
Once you're there:
1. Click on "Get Access Token"
2. Copy the long code ("Access Token") and paste it here below, substituting the fake one I wrote:
```{r}
fb_oauth = 'YOUR_TOKEN_HERE'
```
Now try running the following line:
```{r}
getUsers("me", token=fb_oauth, private_info=TRUE)
```
Does it return your Facebook public information? Yes? Then we're ready to go. See also `?fbOAuth` for information on how to get a long-lived OAuth token.
At the moment, the only information that can be scraped from Facebook is the content of public pages.
The following line downloads the ~200 most recent posts on the facebook page of Donald Trump
```{r}
page <- getPage("DonaldTrump", token=fb_oauth, n=20, reactions=TRUE, api="v2.9")
```
What information is available for each of these posts?
```{r}
page[1,]
```
Which post got more likes, more comments, and more shares?
```{r}
page[which.max(page$likes_count),]
page[which.max(page$comments_count),]
page[which.max(page$shares_count),]
```
What about other reactions?
```{r}
page[which.max(page$love_count),]
page[which.max(page$haha_count),]
page[which.max(page$wow_count),]
page[which.max(page$sad_count),]
page[which.max(page$angry_count),]
```
We can also subset by date. For example, imagine we want to get all the posts from early November 2012 on Barack Obama's Facebook page
```{r}
page <- getPage("barackobama", token=fb_oauth, n=100,
since='2012/11/01', until='2012/11/10')
page[which.max(page$likes_count),]
```
And if we need to, we can also extract the specific comments from each post.
```{r}
post_id <- page$id[which.max(page$likes_count)]
post <- getPost(post_id, token=fb_oauth, n.comments=1000, likes=FALSE)
```
This is how you can view those comments:
```{r}
comments <- post$comments
head(comments)
```
Also, note that users can like comments! What is the comment that got the most likes?
```{r}
comments[which.max(comments$likes_count),]
```
This is how you get nested comments:
```{r}
page <- getPage("barackobama", token=fb_oauth, n=1)
post <- getPost(page$id, token=fb_oauth, comments=TRUE, n=100, likes=FALSE)
comment <- getCommentReplies(post$comments$id[1],
token=fb_oauth, n=500, likes=TRUE)
```