# Scraping Single Posts

Reddit serves any post as JSON.

Example: https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/

Appending `.json` to the end of a post's URL returns the post as JSON.

Example: https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/.json
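
The URL rewrite can be sketched as a small helper. This is a minimal sketch, not part of the original notebook: `to_json_url` is a hypothetical name, and it assumes the input is an ordinary post permalink. It drops any query string or fragment and appends `.json` to the path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(post_url: str) -> str:
    """Return the JSON endpoint for a Reddit post permalink (hypothetical helper)."""
    parts = urlsplit(post_url)
    path = parts.path
    # Leave the path alone if it already points at the JSON endpoint
    if not path.endswith(".json"):
        # Normalise to exactly one trailing slash, then append ".json"
        path = path.rstrip("/") + "/.json"
    # Drop the query string and fragment; keep scheme and host
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


post_url = "https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/"
print(to_json_url(post_url))
```

The helper only builds the URL; the request itself is made separately.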

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from seaborn import set_style
set_style("whitegrid")

import requests

In [23]:
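# Reddit tends to rate-limit the default requests user agent, so send a descriptive one (placeholder value)
headers = {"User-Agent": "single-post-scraper/0.1"}
response = requests.get(url="https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/.json", headers=headers)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error page
data = response.json()
data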

[{'kind': 'Listing',
  'data': {'after': None,
   'dist': 1,
   'modhash': '',
   'geo_filter': '',
   'children': [{'kind': 't3',
     'data': {'approved_at_utc': None,
      'subreddit': 'bipolar',
      'selftext': 'I wish I had just two minutes to talk to my 15 year old self. When I was diagnosed with bipolar disorder 11 years ago I thought my life was over.  I barely remember my high school years because I was just so fucking unstable. The turning point in my life was being apart of an IOP program around 21 years old.  I was stabilized and remained stable through hard work and consistently asking for help when I needed it. None of what I’ve accomplished would’ve been possible without the love and support from my family and husband.  \nI want the world to know that you can live a happy successful high quality life despite having a high stigmatized disorder, like bipolar disorder.  \nTimes might be dark right now, but you’re 4.0 or whatever your light at the end of the tunnel looks 

In [77]:
# The post is the only child of the first listing; its fields live under 'data'
post = data[0]['data']['children'][0]['data']
post['selftext']

'I wish I had just two minutes to talk to my 15 year old self. When I was diagnosed with bipolar disorder 11 years ago I thought my life was over.  I barely remember my high school years because I was just so fucking unstable. The turning point in my life was being apart of an IOP program around 21 years old.  I was stabilized and remained stable through hard work and consistently asking for help when I needed it. None of what I’ve accomplished would’ve been possible without the love and support from my family and husband.  \nI want the world to know that you can live a happy successful high quality life despite having a high stigmatized disorder, like bipolar disorder.  \nTimes might be dark right now, but you’re 4.0 or whatever your light at the end of the tunnel looks like… it’s there and it exists.  Don’t give up.'
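
The lookup above can be wrapped in a small function that pulls several fields at once. A minimal sketch, assuming the response keeps the shape shown in the output: a list whose first element is a `Listing` with a single `t3` child holding the post. `extract_post` is a hypothetical helper, and `sample` is a hand-made stand-in for `data`, not real Reddit output.

```python
def extract_post(data, fields=("subreddit", "selftext")):
    """Pull selected fields out of a post's .json payload (hypothetical helper)."""
    # data[0] is the Listing for the post itself; its only child is the post (kind "t3")
    post = data[0]["data"]["children"][0]["data"]
    # Missing keys become None instead of raising KeyError
    return {field: post.get(field) for field in fields}


# Hand-made stand-in with the same nesting as the real response
sample = [
    {
        "kind": "Listing",
        "data": {
            "children": [
                {"kind": "t3", "data": {"subreddit": "bipolar", "selftext": "example text"}}
            ]
        },
    }
]

print(extract_post(sample))
```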

# Reddit API

To access the Reddit API, you need to complete the following steps:

1. Sign up for or log in to your Reddit account: https://www.reddit.com
2. Create an app to get a client ID and client secret: https://www.reddit.com/prefs/apps
    * This walkthrough covers the remaining setup: https://www.geeksforgeeks.org/scraping-reddit-using-python/
    * PRAW's quick-start guide: https://praw.readthedocs.io/en/latest/getting_started/quick_start.html
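
The notebook imports three getter functions from a local `my_praw_info` module whose contents are not shown. One possible shape for such a module is sketched below. Everything in it is an assumption: the environment variable names are hypothetical, and the real module may simply return hard-coded strings.

```python
import os


def _require(name: str) -> str:
    """Return an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def get_client_id() -> str:
    # REDDIT_CLIENT_ID is an assumed variable name, not one Reddit or PRAW defines
    return _require("REDDIT_CLIENT_ID")


def get_client_secret() -> str:
    # Assumed variable name; holds the secret shown on the app settings page
    return _require("REDDIT_CLIENT_SECRET")


def get_user_agent() -> str:
    # Assumed variable name; Reddit asks for a descriptive, unique user agent
    return _require("REDDIT_USER_AGENT")
```

Reading the values from the environment keeps the credentials out of the notebook itself.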

In [2]:
# install the python reddit API wrapper if you haven't yet

#!pip install praw

Collecting praw
  Downloading praw-7.7.1-py3-none-any.whl.metadata (9.8 kB)
Collecting prawcore<3,>=2.1 (from praw)
  Downloading prawcore-2.4.0-py3-none-any.whl.metadata (5.0 kB)
Collecting update-checker>=0.18 (from praw)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Downloading praw-7.7.1-py3-none-any.whl (191 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 191.0/191.0 kB 753.7 kB/s eta 0:00:00
Downloading prawcore-2.4.0-py3-none-any.whl (17 kB)
Downloading update_checker-0.18.0-py3-none-any.whl (7.0 kB)
Installing collected packages: update-checker, prawcore, praw
Successfully installed praw-7.7.1 prawcore-2.4.0 update-checker-0.18.0


In [8]:
# import Reddit API packages

import praw

In [17]:
# import credential helpers from a local module

from my_praw_info import get_client_id, get_client_secret, get_user_agent

In [34]:
# create a Reddit read-only instance
reddit = praw.Reddit(client_id=get_client_id(),
                     client_secret=get_client_secret(),
                     user_agent=get_user_agent())

In [12]:
# Choose subreddit to scrape

subreddit = reddit.subreddit("BPD")

# Display the name of the Subreddit
print("Display Name:", subreddit.display_name)
 
# Display the title of the Subreddit
print("Title:", subreddit.title)
 
# Display the description of the Subreddit
print("Description:", subreddit.description)

Display Name: BPD
Title: Borderline Personality Disorder
Description: **If you are feeling suicidal, please call 911 or one of these hotlines:**

* [US Hotline (And Chat)](http://www.suicidepreventionlifeline.org/gethelp.aspx)
* International Hotlines: [1](http://www.iasp.info/resources/Crisis_Centres/) and [2](http://www.suicide.org/international-suicide-hotlines.html)


This is a place for those who have Borderline Personality Disorder, their family members and friends, and anyone else who is interested in learning more about it. We ask that you be kind, empathetic, respectful, and non-judgmental.  Language that dehumanizes, personal attacks, and trolling will not be tolerated. 

**[Please read our subreddit rules HERE before posting](https://www.reddit.com/r/BPD/about/rules/).** **Our rules and guidelines are discussed in more depth [in our wiki](https://www.reddit.com/r/BPD/wiki/index).**

#This is not the place to ask for a diagnosis for yourself or anyone else

Only a mental heal

#### Documentation

Attributes for subreddit submissions/posts: https://praw.readthedocs.io/en/stable/code_overview/models/submission.html

In [35]:
# Get new posts in subreddit

posts = subreddit.new() # with no limit argument, PRAW returns up to 100 posts

In [36]:
# Create post dictionary

posts_dict = {'Title':[],'Post Text':[], 
              'Author Flair':[], 'Original Content':[],
              'ID':[], 'Score':[], 
              'Total Comments':[], 'Post URL':[]}

# Scrape submissions in posts

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Author flair text - None if no flair
    posts_dict['Author Flair'].append(post.author_flair_text)

    # Check for original content
    posts_dict['Original Content'].append(post.is_original_content)
     
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Save the data in a dataframe

new_posts = pd.DataFrame(posts_dict)
new_posts


Unnamed: 0,Title,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
0,Do yourself a favor and get off the internet,Title pretty much sums it up. This sub is fill...,,False,1cst5ax,1,0,https://www.reddit.com/r/BPD/comments/1cst5ax/...
1,Lost interested in sex now that you’re in a he...,So I’ve always been pretty sexually active and...,,False,1cst4sd,1,0,https://www.reddit.com/r/BPD/comments/1cst4sd/...
2,how do you guys switch your mindsets,I used to be amazing at being avoidant and sel...,,False,1cst3rm,1,0,https://www.reddit.com/r/BPD/comments/1cst3rm/...
3,Relationship Anxiety,"Okay so, I’m 19 going away to school next Sept...",,False,1csspxg,1,0,https://www.reddit.com/r/BPD/comments/1csspxg/...
4,I feel so empty Idek how it feels to be whole ...,"I feel empty as a water can in summer heat , t...",,False,1cssouc,1,0,https://www.reddit.com/r/BPD/comments/1cssouc/...
...,...,...,...,...,...,...,...,...
95,I have feelings for my best friend and I want ...,"I make friends, get close, most of the time gr...",,False,1cse7tr,1,3,https://www.reddit.com/r/BPD/comments/1cse7tr/...
96,Some days are so much harder than others,Its been two weeks since me and my boyfriend b...,,False,1csdwo3,4,1,https://www.reddit.com/r/BPD/comments/1csdwo3/...
97,How do you trust love again?,I am at a hopeless rock bottom right now and i...,,False,1cscrek,18,4,https://www.reddit.com/r/BPD/comments/1cscrek/...
98,How to get ex-boyfriend to stop?,"I am very exhausted, even more as is usuall. I...",,False,1cscr9m,45,21,https://www.reddit.com/r/BPD/comments/1cscr9m/...
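
The per-attribute loop above can be condensed by mapping column names to submission attribute names. A minimal sketch under that assumption: `COLUMNS` and `posts_to_records` are hypothetical names, and a `SimpleNamespace` object stands in for a PRAW submission so the example runs without credentials.

```python
from types import SimpleNamespace

# Column name -> submission attribute, matching the columns used above
COLUMNS = {
    "Title": "title",
    "Post Text": "selftext",
    "Author Flair": "author_flair_text",
    "Original Content": "is_original_content",
    "ID": "id",
    "Score": "score",
    "Total Comments": "num_comments",
    "Post URL": "url",
}


def posts_to_records(posts, columns=COLUMNS):
    """Turn an iterable of submission-like objects into a list of row dicts."""
    return [
        {column: getattr(post, attr, None) for column, attr in columns.items()}
        for post in posts
    ]


# Stand-in for a real submission (made-up values)
fake_post = SimpleNamespace(
    title="Example", selftext="Body", author_flair_text=None,
    is_original_content=False, id="abc123", score=1, num_comments=0,
    url="https://www.reddit.com/r/BPD/comments/abc123/example/",
)
records = posts_to_records([fake_post])
print(records[0]["Title"], records[0]["Score"])
```

Passing `records` to `pd.DataFrame` would build a table with the same columns, and adding a column becomes a one-line change to `COLUMNS`.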


In [37]:
new_posts.sort_values('Author Flair')

Unnamed: 0,Title,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
5,Does anyone else enjoy driving? I love it.,"I can discover new places outside of the city,...",user has bpd,False,1css6c3,3,2,https://www.reddit.com/r/BPD/comments/1css6c3/...
7,F28 can't stop wishing my high school ex would...,I self sabotaged this relationship a few years...,user has bpd,False,1csrsnb,1,0,https://www.reddit.com/r/BPD/comments/1csrsnb/...
14,what to do is fp is suicidal and made up his mind,i met him 4 years ago online and he's been suc...,user has bpd,False,1csrkaa,1,0,https://www.reddit.com/r/BPD/comments/1csrkaa/...
29,I'm in love,I'm in love with my partner. I've been in a fe...,user has bpd,False,1cspyjl,4,2,https://www.reddit.com/r/BPD/comments/1cspyjl/...
30,How can I be positive about myself if I hate m...,I've been told by a few people that I need to ...,user has bpd,False,1csptbi,6,3,https://www.reddit.com/r/BPD/comments/1csptbi/...
...,...,...,...,...,...,...,...,...
93,How do I get a diagnosis for BPD or another di...,PS I live in Toronto.\n\nCould I just go to a ...,,False,1cser1t,2,0,https://www.reddit.com/r/BPD/comments/1cser1t/...
95,I have feelings for my best friend and I want ...,"I make friends, get close, most of the time gr...",,False,1cse7tr,1,3,https://www.reddit.com/r/BPD/comments/1cse7tr/...
96,Some days are so much harder than others,Its been two weeks since me and my boyfriend b...,,False,1csdwo3,4,1,https://www.reddit.com/r/BPD/comments/1csdwo3/...
97,How do you trust love again?,I am at a hopeless rock bottom right now and i...,,False,1cscrek,18,4,https://www.reddit.com/r/BPD/comments/1cscrek/...


In [38]:
# how many authors have flair?

new_posts['Author Flair'].value_counts()

Author Flair
user has bpd                   14
user knows someone with bpd     1
Name: count, dtype: int64

In [48]:
# pull comments from a single post as well -- post 5 in data frame for example

from praw.models import MoreComments

submission = reddit.submission(url=new_posts['Post URL'][5])

comments = {'Comments':[]}

for comment in submission.comments:
    # skip "load more comments" placeholders, which have no body
    if isinstance(comment, MoreComments):
        continue
    comments['Comments'].append(comment.body)

comments_df = pd.DataFrame(comments)
comments_df



Unnamed: 0,Comments
0,"That's awesome dude, I love driving too. Maybe..."
1,"Yes love driving alone with music, especially ..."
2,I drive for Uber so yeah. Getting paid to dri...
3,"I love driving, especially on the motorway on ..."
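
The loop above reads only top-level comments and skips `MoreComments` placeholders. To collect replies as well, PRAW's comment forest offers `replace_more` and `list`. The sketch below assumes those two methods; `all_comment_bodies` is a hypothetical helper, and it needs a real `submission` object to return actual data.

```python
def all_comment_bodies(submission, replace_limit=0):
    """Return the body of every loaded comment in a thread, replies included.

    replace_limit=0 drops MoreComments placeholders without extra requests;
    replace_limit=None asks PRAW to load them all, which can be slow.
    """
    submission.comments.replace_more(limit=replace_limit)
    # .list() flattens the comment forest (top-level comments and their replies)
    return [comment.body for comment in submission.comments.list()]
```

With the objects from the cell above, `pd.DataFrame({'Comments': all_comment_bodies(submission)})` would give one row per comment.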


In [47]:
new_posts['Post URL'][5]

'https://www.reddit.com/r/BPD/comments/1css6c3/does_anyone_else_enjoy_driving_i_love_it/'

In [None]:
# create a Reddit read-only instance
reddit = praw.Reddit(client_id=get_client_id(),
                     client_secret=get_client_secret(),
                     user_agent=get_user_agent())

In [50]:
# Choose subreddit to scrape

subreddit = reddit.subreddit("BPD")

# Get new posts in subreddit

posts = subreddit.new(limit=500) # PRAW pages through results 100 at a time; Reddit caps a listing at roughly 1,000 posts

# Create post dictionary

posts_dict = {'Title':[], 'Post Date':[],
              'Post Text':[], 
              'Author Flair':[], 'Original Content':[],
              'ID':[], 'Score':[], 
              'Total Comments':[], 'Post URL':[]}

# Scrape submissions in posts

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)

    # Date of post in Unix time (seconds since the epoch, UTC)
    posts_dict['Post Date'].append(post.created_utc)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Author flair text - None if no flair
    posts_dict['Author Flair'].append(post.author_flair_text)

    # Check for original content
    posts_dict['Original Content'].append(post.is_original_content)
     
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Save the data in a dataframe

new_posts_1 = pd.DataFrame(posts_dict)
new_posts_1


Unnamed: 0,Title,Post Date,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
0,My partner (21m) recently got diagnosed with b...,1.715812e+09,"I started dating my boyfriend a month ago, tho...",,False,1csxkls,1,0,https://www.reddit.com/r/BPD/comments/1csxkls/...
1,I don’t know…,1.715811e+09,I don’t know what’s wrong with me.. I need to ...,,False,1csxbod,1,0,https://www.reddit.com/r/BPD/comments/1csxbod/...
2,Help,1.715811e+09,Someone close to me has diagnosed BPD and just...,,False,1csx4ul,1,0,https://www.reddit.com/r/BPD/comments/1csx4ul/...
3,Should I end my friendship with my FP?,1.715811e+09,My best friend has been my favourite person fo...,,False,1csx41m,2,0,https://www.reddit.com/r/BPD/comments/1csx41m/...
4,Disappointed with my BPD support group,1.715810e+09,In February I joined a BPD support group for m...,,False,1cswxvp,2,1,https://www.reddit.com/r/BPD/comments/1cswxvp/...
...,...,...,...,...,...,...,...,...,...
495,Breakup advice?,1.715559e+09,Hi! I've never posted on reddit before and in ...,user has bpd,False,1cqm11w,2,0,https://www.reddit.com/r/BPD/comments/1cqm11w/...
496,I wish I had someone to hug and feel safe with...,1.715558e+09,I wish I had a person to love. I wish I could ...,,False,1cqli46,132,25,https://www.reddit.com/r/BPD/comments/1cqli46/...
497,The good bad and ugly. tell me it gets easier....,1.715558e+09,I was born in 1996. to a very unstable mother ...,,False,1cqlg9p,14,4,https://www.reddit.com/r/BPD/comments/1cqlg9p/...
498,Anyone else?,1.715557e+09,Has anyone else experienced this?\n\nI am spli...,,False,1cqlg1x,2,2,https://www.reddit.com/r/BPD/comments/1cqlg1x/...


In [52]:
new_posts_1.sort_values('Author Flair')

Unnamed: 0,Title,Post Date,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
151,Certain things making me feel invalid and unhe...,1.715739e+09,PwBpd here and I’m having a hard time wrapping...,,False,1cs9x60,1,4,https://www.reddit.com/r/BPD/comments/1cs9x60/...
23,DAE fall ‘in love’ with almost everyone?,1.715805e+09,"Although I’m bisexual, I’ve always gravitated ...",user has bpd,False,1csusag,7,2,https://www.reddit.com/r/BPD/comments/1csusag/...
243,i am so selfish,1.715697e+09,"i am such a brat about everything, it’s sick. ...",user has bpd,False,1crtdxh,10,1,https://www.reddit.com/r/BPD/comments/1crtdxh/...
244,school and motivation,1.715663e+09,how do i not stop doing everything. i’m actual...,user has bpd,False,1crk8n8,1,0,https://www.reddit.com/r/BPD/comments/1crk8n8/...
250,Advanced Dbt for impulsivity and addictions?,1.715693e+09,One of the more severe symptoms I experience i...,user has bpd,False,1crs6da,1,1,https://www.reddit.com/r/BPD/comments/1crs6da/...
...,...,...,...,...,...,...,...,...,...
494,need some advice!!,1.715560e+09,uaggh i dont have the means to get diagnosed w...,,False,1cqm77d,2,0,https://www.reddit.com/r/BPD/comments/1cqm77d/...
496,I wish I had someone to hug and feel safe with...,1.715558e+09,I wish I had a person to love. I wish I could ...,,False,1cqli46,132,25,https://www.reddit.com/r/BPD/comments/1cqli46/...
497,The good bad and ugly. tell me it gets easier....,1.715558e+09,I was born in 1996. to a very unstable mother ...,,False,1cqlg9p,14,4,https://www.reddit.com/r/BPD/comments/1cqlg9p/...
498,Anyone else?,1.715557e+09,Has anyone else experienced this?\n\nI am spli...,,False,1cqlg1x,2,2,https://www.reddit.com/r/BPD/comments/1cqlg1x/...


In [53]:
new_posts_1['Author Flair'].value_counts()

Author Flair
user has bpd                   58
user knows someone with bpd     3
                                1
user is curious about bpd       1
Name: count, dtype: int64
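
`value_counts` leaves out missing values by default, so posts with no flair do not appear in the tally, and the blank row is presumably a flair that is empty or whitespace only. A minimal sketch that groups both cases under one label, using made-up values; `count_flairs` and the `(no flair)` label are hypothetical.

```python
from collections import Counter


def count_flairs(flairs, missing_label="(no flair)"):
    """Count flair strings, grouping None, NaN, and blank strings under one label."""
    cleaned = [
        flair.strip() if isinstance(flair, str) and flair.strip() else missing_label
        for flair in flairs
    ]
    return Counter(cleaned)


# Made-up values for illustration only
flairs = ["user has bpd", None, "", "user has bpd", "user knows someone with bpd", None]
counts = count_flairs(flairs)
print(counts.most_common())
```

In pandas, `value_counts(dropna=False)` keeps the missing values in the count, though it still lists blank strings separately.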

In [1]:
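# working on converting Unix time to a sortable datetime
# needs new_posts_1 from the scraping cell above; in a fresh kernel this raises NameError

from datetime import datetime, timezone

# utcfromtimestamp is deprecated since Python 3.12; use a timezone-aware conversion instead
datetime.fromtimestamp(new_posts_1['Post Date'][0], tz=timezone.utc).day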

NameError: name 'new_posts_1' is not defined
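
A sketch of the conversion for a whole column rather than a single value, using only the standard library. `to_utc_datetimes` is a hypothetical helper, and the timestamps are made-up values in the same range as the `Post Date` column.

```python
from datetime import datetime, timezone


def to_utc_datetimes(timestamps):
    """Convert Unix timestamps (seconds) to timezone-aware UTC datetimes."""
    return [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]


# Made-up timestamps for illustration
post_dates = [1715812000.0, 1715557000.0, 1715697000.0]
converted = sorted(to_utc_datetimes(post_dates))
print(converted[0].isoformat(), converted[-1].day)
```

For a DataFrame column, `pd.to_datetime(new_posts_1['Post Date'], unit='s', utc=True)` performs the same conversion and yields a sortable datetime column.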

### To-do

* Turn scraping into a function of (number of posts, category, time delay)
* Write DataFrame to .csv
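
One way the two to-do items might look is sketched below. It is a sketch under stated assumptions, not the notebook's implementation: `scrape_posts` and `write_rows_to_csv` are hypothetical names, "category" is read as the name of a listing method such as `new`, `hot`, or `top`, and "time delay" is read as a pause in seconds after each post.

```python
import csv
import time


def scrape_posts(subreddit, n_posts=100, category="new", delay=0.0):
    """Collect posts from one listing of a PRAW-style subreddit object."""
    # Look up the listing method by name, e.g. subreddit.new(limit=n_posts)
    listing = getattr(subreddit, category)(limit=n_posts)
    rows = []
    for post in listing:
        rows.append({
            "Title": post.title,
            "Post Date": post.created_utc,
            "Post Text": post.selftext,
            "Author Flair": post.author_flair_text,
            "Original Content": post.is_original_content,
            "ID": post.id,
            "Score": post.score,
            "Total Comments": post.num_comments,
            "Post URL": post.url,
        })
        # Assumed meaning of "time delay": pause after each post
        if delay:
            time.sleep(delay)
    return rows


def write_rows_to_csv(rows, path):
    """Write a list of row dicts to a CSV file with a header row."""
    if not rows:
        return  # nothing to write, so no file is created
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `scrape_posts(reddit.subreddit("BPD"), n_posts=500, category="new", delay=0.1)` would reproduce the scrape above. If the rows are loaded into a DataFrame first, `DataFrame.to_csv(path, index=False)` writes the file in one step.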