## Sample Reddit comments over time

Collect data to build a classifier that can identify toxic comments.

The script periodically scans comments in all posts within a list of subreddits.

It stores each text comment with the following features:
- comment text
- comment ID#
- post ID#
- parent ID#
- user name
- sub name
- comment post time
- vote score list (5 min intervals)
- controversial flag state
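
The "vote score list" feature implies re-reading each comment's score on a schedule and keeping one list per comment. A minimal sketch of that bookkeeping, assuming a hypothetical `record_scores` helper and plain `(comment_id, score)` pairs in place of live PRAW objects (the IDs and scores are made up):

```python
from collections import defaultdict

# comment ID -> list of scores, one entry per sampling pass
score_history = defaultdict(list)

def record_scores(history, samples):
    """Append one score per comment for a single sampling pass."""
    for comment_id, score in samples:
        history[comment_id].append(score)
    return history

# Two passes, nominally 5 minutes apart.
record_scores(score_history, [("c1", 1), ("c2", 3)])
record_scores(score_history, [("c1", 4), ("c2", 2)])
print(dict(score_history))
```

A comment first seen in a later pass ends up with a shorter list, so a real collector would probably also store the time of each sample.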

Note: to install PRAW, run `pip install praw`.

About upvote_ratio: "upvote_ratio is a separate API call. It's annoying when you want to get a list of posts, since it slows the process down 101 times"

Calculating up and down votes (`ratio` is the submission's `upvote_ratio`):
- `ups = int(round((ratio*submission.score)/(2*ratio - 1)) if ratio != 0.5 else round(submission.score/2))`
- `downs = ups - submission.score`
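
The two formulas above follow from assuming `score = ups - downs` and `ratio = ups / (ups + downs)`; solving for `ups` gives `ratio * score / (2 * ratio - 1)`. A small sketch of the same calculation as a function (the name `estimate_votes` is illustrative, and it takes plain numbers, not a PRAW submission):

```python
def estimate_votes(score, ratio):
    """Estimate (ups, downs) from a post's net score and upvote ratio.

    Assumes score = ups - downs and ratio = ups / (ups + downs).
    """
    if ratio == 0.5:
        # Equal ups and downs: the formula would divide by zero,
        # so fall back to the score/2 rule used in the notes.
        ups = round(score / 2)
    else:
        ups = round((ratio * score) / (2 * ratio - 1))
    downs = ups - score
    return int(ups), int(downs)

ups, downs = estimate_votes(score=60, ratio=0.8)
print(ups, downs)
```

Because Reddit reports a rounded ratio, the result is an estimate, and it becomes less stable as the ratio approaches 0.5.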


See also Pushshift API:

- https://www.reddit.com/r/pushshift/comments/9l8n1i/new_to_pushshift_read_this_faq_etc/
- https://github.com/pushshift/api


## my login details

From Reddit app register:

personal use script id 7BHzw3jn54Hm7Q

secret = Qw9lMWDx99daGcJ1vX6xX_peL3c




In [4]:
# remove warnings
import warnings
warnings.filterwarnings('ignore')
# ---

import pandas as pd
import numpy as np
import datetime
import time
import csv

## create reddit instance and log in

In [5]:
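import os
import praw

# credentials are read from environment variables rather than hard-coded;
# the variable names below are placeholders, not a PRAW convention
reddit = praw.Reddit(client_id=os.environ['REDDIT_CLIENT_ID'],
                     client_secret=os.environ['REDDIT_CLIENT_SECRET'],
                     password=os.environ['REDDIT_PASSWORD'],
                     user_agent='testscript',
                     username=os.environ['REDDIT_USERNAME'])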

print(reddit.user.me())


quill65


## Collecting posts and comments 

This cell collects all comments from the 100 'top'-sorted posts in each subreddit of a given list. Each collected comment includes features such as the time of the comment, user info, number of replies, and voting score.

The comments are written to a CSV file.
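
Once written, the CSV can be read back for analysis. A minimal sketch using only the standard library, assuming a hypothetical `load_comments` helper and a made-up two-row sample with a subset of the notebook's column names:

```python
import csv
import io

# Made-up sample data; a real run would open the file named by csvfilename.
sample = io.StringIO(
    "comment_ID,sub_name,controversy,score,text\n"
    "abc123,politics,0,12,first comment\n"
    "def456,politics,1,-3,second comment\n"
)

def load_comments(fileobj):
    """Read comment rows into dicts, converting numeric fields to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["score"] = int(row["score"])
        row["controversy"] = int(row["controversy"])
        rows.append(row)
    return rows

comments = load_comments(sample)
print([c["comment_ID"] for c in comments if c["controversy"] == 1])
```

For a real file, pass `open(csvfilename, newline='', encoding='utf-8')` in place of the in-memory sample.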

NOTE 1: On 2/19/19 the `replace_more` limit was changed from 0 (delete all MoreComments) to None (expand all MoreComments). Data sampled before that date may therefore differ.
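
The two settings in the note can be wrapped in one helper to make the choice explicit. This is a sketch only: `flatten_comments` and its `expand_all` flag are hypothetical names, and `submission` is assumed to be a PRAW `Submission` object.

```python
def flatten_comments(submission, expand_all=True):
    """Return a flat list of a submission's comments.

    expand_all=True  -> replace_more(limit=None): fetch every MoreComments stub
    expand_all=False -> replace_more(limit=0): drop the stubs without fetching
    """
    submission.comments.replace_more(limit=None if expand_all else 0)
    return submission.comments.list()
```

Expanding everything needs extra API requests on large threads, so `expand_all=False` is the faster option when a partial sample is acceptable.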


In [None]:
# column names for the output CSV; order must match the writerow() call below
header = ['comment_ID', 'sub_name', 'post_ID', 'parent_ID',
          'time', 'age_re_post', 'age_re_now',
          'u_id', 'u_name', 'u_created', 'u_comment_karma', 'u_link_karma',
          'num_replies', 'controversy', 'score', 'text']

# give list of subreddit names to sample from
# subnames = ['politics', 'democrats', 'republicans']
subnames = ['politics']
# subnames = ['aww']
# subnames = ['photography']
# subnames = ['todayilearned']

# create output filename, appending unique time string
csvfilename = ('comment_sample_' + '_'.join(subnames) + '_' +
               datetime.datetime.now().strftime('%y%m%d_%H%M%S') + '.csv')

# number of posts ('submissions') to sample from each subreddit
numsubs = 100

with open(csvfilename, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)

    # sample from each sub in sublist
    for subname in subnames:
        # sample all comments from each of numsubs posts
        posts = reddit.subreddit(subname).top(limit=numsubs)
        for i, sub in enumerate(posts, start=1):
            print('post %d : %s' % (i, sub.title))
            # expand comment tree to include all comments
            sub.comments.replace_more(limit=None)
            comments = sub.comments.list()
            print(' ', len(comments), 'comments...')
            for com in comments:
                try:
                    # skip comments whose score is still hidden
                    if not com.score_hidden:
                        # drop non-ASCII characters from the comment text
                        text = com.body.encode().decode('ascii', errors='ignore')
                        writer.writerow([com.id, subname, sub.id, com.parent_id,
                                         com.created_utc,
                                         com.created_utc - sub.created_utc,
                                         # time.time() is seconds since the epoch (UTC)
                                         time.time() - com.created_utc,
                                         com.author.id, com.author.name,
                                         com.author.created_utc,
                                         com.author.comment_karma,
                                         com.author.link_karma,
                                         len(com.replies.list()),
                                         com.controversiality,
                                         com.score,
                                         text])
                except Exception as err:
                    # e.g. com.author is None when the account was deleted;
                    # such comments are reported and skipped
                    print('**error:', type(err).__name__, com.body)

            # write this post's comments to disk
            csvfile.flush()
print('\n\ndone')



post 1 : Kim Davis, clerk who refused to sign marriage licenses for gay couples, loses to Democrat
  2581 comments...
**writerow error A lot of people seem to forget she was a Democrat when the story first broke.

[Source](https://www.google.ca/amp/s/amp.cnn.com/cnn/2015/09/26/politics/kim-davis-no-longer-democrat/index.html)
**writerow error Kim Davis got that Stephen A Smith hairline
**writerow error looks like such a bs retarx, no such thing as w/l tho
**writerow error [deleted]
**writerow error [deleted]
**writerow error Kim Davis is ugly
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error Golly, never saw that coming, huh?
**writerow error Did anyone think she was actually gonna win lol
**writerow error [removed]
**writerow error Good fuck that fucking bitch
**writerow error [deleted]
**writerow error Wow she must be pissed! Losing is bad enough, but to a Democrat?  Lolz. Thx, Lady Karma.
**writerow error Com

**writerow error You should care about them being able to have the same rights as everyone else and not be subject to hate crimes. 
**writerow error >I really don’t care about the LGBTQ community. 

Bruh you’ve already identified yourself as a republican. We already know this dumbass. 
**writerow error Seriously each person running got like 5000 votes. It’s a county clerk
**writerow error Then these children need better supervision.
**writerow error [deleted]
**writerow error [deleted]
**writerow error My religion says I'm allowed to take things from the grocery store without paying and if you dont respect that then you're denying me my religious freedom! /s For real though, if you try to apply her logic to any other scenario she sounds like a nutcase.
**writerow error [deleted]
**writerow error [deleted]
**writerow error [removed]
**writerow error I like the way you roll
**writerow error Hey look! The morality police. To what do I owe the visitation of Gabriel the arc angel? 
**writer

**writerow error [deleted]
**writerow error "Find the leakers!"

"But Mr. President, you ARE the leakers!"
**writerow error [deleted]
**writerow error **Trump fires the Attorney General:**

*"Highly irregular but he's technically allowed to do that"*

**Trump fires the FBI Director:**

*"Highly irregular but that's technically within his purview"*

**Trump leaks classified intel to the Russians:**

*"Well, he just compromised foreign intelligence assets and damaged the credibility of America's intelligence services, but the president is technically allowed to declassify information so its not illegal"*

**Trump nukes the world:**

*"Highly irregular, but if we're speaking in strictly legal terms he's technically allowed to do that!"*

You have to draw the line somewhere. The legality of these acts is no longer the issue. The President could legally blow up the world; that doesn't mean he should. Trump has demonstrated that he is unfit to be President of the United States and needs to b

**writerow error It's treason then.
**writerow error [removed]
**writerow error [deleted]
**writerow error [deleted]
**writerow error Theory time:

1. The primary source is TASS, who from what I have read was in the room.
1. WaPo knows Russian journalists get killed so won't say that.
1. WaPo confirms with top intelligence officials. Uses them as the source.
1. TASS was there with the 2 ambassadors at the request of Putin.
1. Putin wanted to give Trump a reminder of their agreements.
1. Putin has TASS leak info to WaPo.
1. Now Trump either has to deny that he did talk classified intel with Russians, which now give Putin blackmail material.
1. Or he has to come out and admit that he talked about classified intel and know it will look bad.

Remember the WH was angry about those pictures.
**writerow error Seems like WaPo is baiting the WH, they've held back part of the story.
**writerow error I am a Trump supporter, and assuming that this is real, I am mad. 
**writerow error [deleted]
**w

**writerow error I want to hear McConnell try to spin this away from treason. 
**writerow error "You'd be in jail"

 - Donald Trump
**writerow error Republican Voters : Fake News

Republican Leaders : But who did the leaking? That's the important part. I am deeply troubled by his actions of clearly colluding and working with Russia, but I will still continue to support the president everyday in hopes that he will change.
**writerow error This is insane. Hey, GOP dick brains, time to do something for the country.
**writerow error [deleted]
**writerow error And now 100 days into his presidency, the White House is going to be operating with zero intelligence for however longer they stay active, because now for sure without a shadow of a doubt do all the intelligence agencies think he's compromised, even if the republicans get up there and bullshit, that's probably what's going on behind the scences. 
**writerow error [deleted]
**writerow error Somebody in the trump white house had to have

**writerow error Sitting on the west coast here I was thinking trump made it through the day not fucking something up....nope. Not even sure why I thought it was possible. 
**writerow error Yeah... don't do that
**writerow error [deleted]
**writerow error Its just his private company CEO mentality at play again..

These are his business partners, why can't they see the commercially sensitive material? 

F'en tool he is.

**writerow error Interesting coincidence (?): The Trump/Kislyak meeting took place on Wednesday. Thursday morning, Burr (Intelligence Committee Chair), and Warner (Intelligence Committee Vice-Chair) scrambled out of a televised committee hearing for a meeting with Rosenstein (Deputy AG) that they "couldn't push off." 

The meeting was pre-scheduled, but is often cancelled (three busy guys), but Rosenstein's office told the other two this one was mandatory, despite the televised committee meeting (which are usually not missed for political purposes).
**writerow error Th

**writerow error So what are grounds for impeachment? 
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [removed]
**writerow error Trumps a toothless moron, but he's no tyrant. Tyrants aren't met with univeral outcry, but thunderous applause. 


**writerow error [deleted]
**writerow error [deleted]
**writerow error *No one said classified info wasn't leaked, they said sources and methods weren't leaked, so the details gained from those sources and methods may have been leaked. *McMasters denied things that were not in the article.
**writerow error [deleted]
**writerow error His name was Seth Rich. 
**writerow error [deleted]
**writerow error [removed]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error So if the only people in the room were US Officials and the Russian foreign minister and Ambassador... who the hell leaked this information

**writerow error [deleted]
**writerow error [deleted]
**writerow error Fake news. Muh Russians 
**writerow error Our President is sharing information with a potential ally. Not a big deal. Trump has talked about wanting to work together to combat terrorism in the middle east. Can't work together if you don't help each other but i forgot RUSSIA IS THE BOOGEYMAN.
**writerow error [removed]
**writerow error [removed]
**writerow error since when do liberals care about classified information?  Oh i see...this is one of those things that liberals only care about when it makes the other side look bad.  Got it.  
**writerow error It's a shame we could've gotten a man as decent as Ted Cruz instead of this slimeball who's no better then the folks who try to sell you shitty overpriced stereo equipment for a "great deal" on the side of the road. 
**writerow error [deleted]
**writerow error [deleted]
**writerow error I've been reading all week that they were going to limit laptops on flights in Eur

**writerow error [deleted]
**writerow error I don't like these kinds of news.

1. I don't like Trump.
2. The source is, as always, random US official with no name, no info. Why would I believe WaPo? Why would I take what they say at face value just because they make these claims about Trump?

Excuse me for not believing everything US mainstream media says.
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error If only Reddit could have been this upset about Hillary rigging the primaries, getting Americans killed in Benghazi, selling Uranium to the Russians or revealing classified information of her own and then destroying all evidence.

If Hillary wasn't prosecuted for anything, Trump never will be. Liberals have a huge double standard going with this one. Good Luck!
**writerow error White house confirmed this is fake news.   Sad liberals and their fake news
**writerow error [deleted]
**writerow error ~~why are you hiding this thread from the 

**writerow error Wait ya mean our president that can be baited with simple humble-brags like "well we heard from one our more reliable sources about this situation and X Y, and Z happened.. but I'm sure you don't worry yourself with such things, considering your American-first rhetoric."

Trump: "oh let me tell you about this Intel that not even fucking the U.K. And France now about. It's from this guy that -----"
**writerow error If you're the president and you are not getting great intel, there is something SERIOUSLY fucked up with your intel. 
**writerow error I got the best intel folks
**writerow error "Nobody has better intel than me." -Trump
**writerow error Why do you get angry at what someone said he said? You dont even know you super smart cookie you
**writerow error [deleted]
**writerow error [Individuals who are "extremely careless" with classified information should be denied further access to such info.](https://twitter.com/SpeakerRyan/status/751106162574053376) - Paul Rya

**writerow error Eww Pence 🤢
**writerow error All they'd have to say is "you probably wouldn't know anything about this...but have you heard about ..." and he would jump on it just to prove he knows about everything. That's how much of a child out president is currently. 
**writerow error I doubt Lavrov had to do or say anything.
**writerow error This is why fucking nothing will happen. Because it's not illegal per se.
**writerow error No, he's really just deeply stupid and insecure.
**writerow error This is SO much bigger than the Comey firing. I have never seen such a big, juicy, nasty piece of news in my life. There is no going back after this.
**writerow error We already have a dead Navy Seal from one of his stunts.
**writerow error when you think "is this it" look at congress and realize they aint doing shit about it. 
**writerow error Yeah, I agree. I kind of want some of those things to be true but if he leaves office and doesn't brag about knowing this stuff within a few years 

**writerow error I think it's Jordan. 
**writerow error Somewhere towards the middle.

It's truly remarkable that the shaved orangutan could pay a ghostwriter anything when you think about it
**writerow error [deleted]
**writerow error [deleted]
**writerow error this could easily set a new record. i bet 300k+. just wait until it reaches the front page.

edit: strangely it doesnt appear on the frontpage even when not logged in
**writerow error It's at the top of /r/all. Of course it's gonna get brigaded.
**writerow error Jeffrey Lord is the biggest asshat on TV.
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error [deleted]
**writerow error It's like if your girlfriend accuses you of cheating on her and you respond, "This is ludicrous... At no point did I take any money out of your purse".

The fact that he won't actually address the claims being made pretty much guarantees that the story is true. 
**writerow error I don't know but please hel

**writerow error Let's hope we don't have to go there. 
**writerow error That's gross, goddamnit. 
**writerow error They did not say they were in the room. They said they received information from current and former officials. That means a current official could have spoken with a former official and confirmed the story with the reporter.
**writerow error Why are we questioning the source itself instead of questioning the source's validity? Who cares who it comes from if it's true?
**writerow error I don't think whoever is in charge of uap's would ever disclose to trump if there is anything to disclose.
**writerow error They're reporting that the White House denies the claims. Obviously Fox would never be caught reporting actual information that is negative about Republicans. They blindly toe the Republican line. 
**writerow error The GOP controls the house, the senate, the presidency, the Supreme Court, the majority of state legislatures and the majority of governorships... even if Tr

**writerow error Well considering that Trump just admitted to this on twitter....what's silly again?

Oh right. The fact that you drink in right wing propaganda like a stupid milkshake.
**writerow error Hypernormalisation 
**writerow error Because they are fucking desperate 
**writerow error mods work for the post. 
**writerow error This is the same source that said there was a basement at Planet Pizza
**writerow error No.

Impeachment is a political, not a legal process.

This wasn't illegal.

In full context it wasn't even necessarily bad to share with the Russians the existence of an isis plot to put a laptop bomb on a passenger plane.

So unless you think this not illegal and not necessarily bad thing is enough to cause the Republican majority in Congress to draw articles of impeachment, then no.
**writerow error Nope. Go back to Commiefornia
**writerow error No.

Let's use the rule of law to remove this traitor.
**writerow error Wait, so now we don't care about leaking classified 

**writerow error Because we have 1 named source vs 1000 unnamed / anonymous sources
**writerow error [deleted]
**writerow error [deleted]
**writerow error The "denials" released by the White House do not deny ANYWHERE that Trump released classified information. They deny that "sources and methods" were compromised, which is NOT what the WaPo story alleges at all. That is not the same thing, at all. "Sources and methods" /= classified info
**writerow error [deleted]
**writerow error wtf?
**writerow error [deleted]
**writerow error Nonsense. You can't claim that we're all "numb to all these emergencies" while simultaneously arguing that tons of people are having an outburst. Who are these people having an outburst if everybody is numb to the stories? 

Isn't it funny how this argument never gets raised against conservatives. They spent years claiming that Obama was a secret communist muslim and now if you go out to Trump country all the Trumplicans claim that this is true. So even lies e

**writerow error [deleted]
**writerow error As all reports are repeatedly stating, NOWHERE in that statement is a denial that Trump gave away information. "Sources and methods" /= classified info
**writerow error A big part of a classification is how it came to be known. Even the acknowledgement of information can be dangerous. Trump clearly has no concept of OPSEC
**writerow error "The story as reported is false." Means one detail was wrong, could be a misspelled name or anything, he doesn't say what was false.

"At no time were any intelligence sources or methods discussed." The WaPo article never said they were.

"And the President did not disclose any military operations that were not all ready known." Says nothing about intelligence operations.

"I was in the room, it didn't happen." Define it, he denied things that were never claimed in the article.

**writerow error Even though everything he refuted is refuted by WaPo and he didn't even mention the issue of sharing shared intell

**writerow error If only there was an article or something!
**writerow error the Clinton Foundation aka Murder Incorporated. 
**writerow error Trump has more chance of being impeached over eating too much ice cream.
**writerow error Along with [Reuters](http://www.reuters.com/article/us-usa-trump-russia-idUSKCN18B2MX) and the [New York Times](https://www.nytimes.com/2017/05/15/us/politics/trump-russia-classified-information-isis.html)?
**writerow error Why? Provide proof of when they've published false information without correction. Proof, not spin. Proof.
**writerow error It is in the article that it isn't illegal, what it is, is a further erosion of trust for our allies.
**writerow error It's over buddy, Republicans in Congress are turning on him now. Adios!  
**writerow error [This comment in a nutshell](https://en.wikipedia.org/wiki/False_equivalence)
**writerow error Leaking shit that an ally asked you not to?

Is that legit in your view?

And can I ask: is Russia a democracy?
**

In [None]:
# if you interrupt the kernel, run this cell to close the open output file
csvfile.close()