add get_comment_reddit #1
Great! RedditExtractoR seems to be FUBAR now. Would you consider adding an upvote-percentage column for the comments?
As far as I'm aware, that information is not available to us; only the net number of votes (the score) is.
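To illustrate why a percentage cannot be recovered from the net score alone, here is a minimal R sketch with made-up vote counts. The `ups` and `downs` columns are hypothetical, since the data only provides the net score. Three comments with very different upvote ratios all report the same net score of 3.

```r
# Hypothetical vote tallies; only the net score (ups - downs) is actually available.
votes <- data.frame(
  ups   = c(3, 10, 52),
  downs = c(0, 7, 49)
)

# Net score: identical for all three comments.
votes$score <- votes$ups - votes$downs

# Upvote percentage: differs widely, but needs the separate up/down counts.
votes$upvote_pct <- votes$ups / (votes$ups + votes$downs)

print(votes)
```

An upvote-percentage filter therefore needs the separate up and down counts, not just the score.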
RedditExtractoR was doing that up until yesterday. Is that just a pushshift.io limitation? Also, I just built your PR, and the scores all come back as 1.
It might just be a pushshift.io limitation. The reason you are getting scores of 1 is that the comments you pulled are too recent to have been voted on. If you look a little further back in time or narrow your search, you start seeing other scores:
rreddit::get_comment_reddit("catstandingup")$score
#> ✔ #1: collected 507 posts
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [24] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
#> [47] 1 1 2 2 2 1 1 3 1 1 3 1 1 1 2 2 2 3 3 2 3 5 3
#> [70] 1 2 0 0 1 2 2 2 1 -2 6 3 1 2 2 1 1 2 3 3 5 6 1
#> [93] 1 1 2 3 3 2 2 1 5 1 1 5 4 -3 3 1 2 -5 1 1 3 1 2
#> [116] 2 1 1 1 2 1 2 2 1 1 2 2 3 1 3 2 0 1 1 2 0 1 2
#> [139] 1 1 1 1 1 1 2 2 1 1 1 1 2 1 2 1 1 2 2 2 1 3 2
#> [162] 1 4 6 3 4 3 1 1 1 1 1 1 1 1 1 1 1 4 3 2 3 5 4
#> [185] 1 1 1 2 2 2 2 1 1 1 1 3 4 1 1 2 2 2 1 1 1 2 2
#> [208] 1 0 2 2 2 3 2 3 1 1 2 3 2 1 1 1 2 1 2 1 2 2 3
#> [231] 1 2 1 2 3 2 2 2 1 1 2 3 3 1 3 0 1 1 1 6 1 4 1
#> [254] 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1
#> [277] 1 2 2 0 1 1 1 1 1 1 0 2 1 2 2 3 3 1 2 2 1 0 0
#> [300] 3 0 2 2 1 2 2 3 2 3 2 2 3 3 2 2 2 5 2 1 2 1 1
#> [323] 1 0 2 3 3 2 3 2 3 4 1 1 1 1 1 2 1 1 1 1 2 3 0
#> [346] 3 2 2 1 1 2 1 1 2 1 0 1 3 3 5 9 -2 1 3 1 3 1 3
#> [369] 1 3 3 2 2 2 2 2 2 1 2 2 2 2 1 1 2 2 2 1 2 1 2
#> [392] 1 1 2 2 2 2 2 3 3 2 2 1 1 2 2 2 2 2 2 1 2 1 2
#> [415] 1 1 1 1 1 1 1 1 0 2 2 3 1 1 2 1 2 1 3 1 2 1 1
#> [438] 2 3 1 1 1 1 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 1 2
#> [461] 2 1 3 1 1 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 2 3 4
#> [484] 1 1 2 1 1 2 1 2 3 1 1 1 1 2 1 2 1 1 1 1 1 1 1
#> [507] 1
Created on 2019-01-05 by the reprex package (v0.2.1)
I'm not sure how pushshift.io works, but at the time of scraping almost all of the comments had votes. It must be that pushshift caches things right away but waits before updating them. Also, I got RedditExtractoR working again. I think I was just banned, but I'm not sure, since its documentation and error handling are poor. I will update the issue there now. I'm glad this project exists in case the other one goes down. I'm scraping comments for sentiment analysis, and my filters are based on each comment's upvote percentage and on the overall upvote percentage of the subreddit (which I work out manually for each sub).
I added a function get_comment_reddit(), based on get_r_reddit(), that returns comments instead of submissions.