# Scraping Single Posts

We can access Reddit posts as JSONs. 

Example: https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/

Adding .json at the end of the post accesses it as a JSON.

Example: https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/.json

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from seaborn import set_style
set_style("whitegrid")

import requests

In [13]:
response = requests.get(url="https://www.reddit.com/r/bipolar/comments/1cpnhjf/bipolar_disorder_does_not_define_your_future/.json")
data = response.json()
data

[{'kind': 'Listing',
  'data': {'after': None,
   'dist': 1,
   'modhash': '',
   'geo_filter': '',
   'children': [{'kind': 't3',
     'data': {'approved_at_utc': None,
      'subreddit': 'bipolar',
      'selftext': 'I wish I had just two minutes to talk to my 15 year old self. When I was diagnosed with bipolar disorder 11 years ago I thought my life was over.  I barely remember my high school years because I was just so fucking unstable. The turning point in my life was being apart of an IOP program around 21 years old.  I was stabilized and remained stable through hard work and consistently asking for help when I needed it. None of what I‚Äôve accomplished would‚Äôve been possible without the love and support from my family and husband.  \nI want the world to know that you can live a happy successful high quality life despite having a high stigmatized disorder, like bipolar disorder.  \nTimes might be dark right now, but you‚Äôre 4.0 or whatever your light at the end of the tunnel 

In [4]:
df = pd.DataFrame(data[0]['data']['children'][0])
df.loc['selftext']['data']

'I wish I had just two minutes to talk to my 15 year old self. When I was diagnosed with bipolar disorder 11 years ago I thought my life was over.  I barely remember my high school years because I was just so fucking unstable. The turning point in my life was being apart of an IOP program around 21 years old.  I was stabilized and remained stable through hard work and consistently asking for help when I needed it. None of what I‚Äôve accomplished would‚Äôve been possible without the love and support from my family and husband.  \nI want the world to know that you can live a happy successful high quality life despite having a high stigmatized disorder, like bipolar disorder.  \nTimes might be dark right now, but you‚Äôre 4.0 or whatever your light at the end of the tunnel looks like‚Ä¶ it‚Äôs there and it exists.  Don‚Äôt give up.'

# Reddit API

To access the Reddit API, you need to complete the following steps:

1. Sign-up for or log into your Reddit account: https://www.reddit.com
2. Create an app to get a developer key: https://www.reddit.com/prefs/apps
    * The following has a nice walkthrough of how to get things set up from here: https://www.geeksforgeeks.org/scraping-reddit-using-python/
    * Another quick-start guide: https://praw.readthedocs.io/en/latest/getting_started/quick_start.html

In [5]:
# install the python reddit API wrapper if you haven't yet

#!pip install praw

In [14]:
# import Reddit API packages

import praw

In [15]:
# create a read-only Reddit instance

from my_praw_info import get_client_id, get_client_secret, get_user_agent

In [16]:
# create a Reddit read-only instance
reddit = praw.Reddit(client_id=get_client_id(),
                     client_secret=get_client_secret(),
                     user_agent=get_user_agent())

In [17]:
# Choose subreddit to scrape

subreddit = reddit.subreddit("BPD")

# Display the name of the Subreddit
print("Display Name:", subreddit.display_name)
 
# Display the title of the Subreddit
print("Title:", subreddit.title)
 
# Display the description of the Subreddit
print("Description:", subreddit.description)

Display Name: BPD
Title: Borderline Personality Disorder
Description: **If you are feeling suicidal, please call 911 or one of these hotlines:**

* [US Hotline (And Chat)](http://www.suicidepreventionlifeline.org/gethelp.aspx)
* International Hotlines: [1](http://www.iasp.info/resources/Crisis_Centres/) and [2](http://www.suicide.org/international-suicide-hotlines.html)


This is a place for those who have Borderline Personality Disorder, their family members and friends, and anyone else who is interested in learning more about it. We ask that you be kind, empathetic, respectful, and non-judgmental.  Language that dehumanizes, personal attacks, and trolling will not be tolerated. 

**[Please read our subreddit rules HERE before posting](https://www.reddit.com/r/BPD/about/rules/).** **Our rules and guidelines are discussed in more depth [in our wiki](https://www.reddit.com/r/BPD/wiki/index).**

#This is not the place to ask for a diagnosis for yourself or anyone else

Only a mental heal

#### Documentation

Attributes for subreddit submissions/posts: https://praw.readthedocs.io/en/stable/code_overview/models/submission.html

In [10]:
# Get new posts in subreddit

posts = subreddit.new() # I've found it maxes out at 100 posts in a single pull

In [18]:
# Create post dictionary

posts_dict = {'Title':[],'Post Text':[], 
              'Author Flair':[], 'Original Content':[],
              'ID':[], 'Score':[], 
              'Total Comments':[], 'Post URL':[]}

# Scrape submissions in posts

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Author flair text - None if no flair
    posts_dict['Author Flair'].append(post.author_flair_text)

    # Check for original content
    posts_dict['Original Content'].append(post.is_original_content)
     
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Save the data in a dataframe

new_posts = pd.DataFrame(posts_dict)
new_posts


Unnamed: 0,Title,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
0,My relationship with music,I've noticed that it really does affect my moo...,,False,1cudr5i,1,0,https://www.reddit.com/r/BPD/comments/1cudr5i/...
1,"If I Can‚Äôt Have Him, I‚Äôll Die",No I won‚Äôt but it feels like it. My best frien...,,False,1cudh5m,2,1,https://www.reddit.com/r/BPD/comments/1cudh5m/...
2,jealousy over celebrities,i struggle so much with jealousy i have seriou...,,False,1cud4ye,2,0,https://www.reddit.com/r/BPD/comments/1cud4ye/...
3,My fp isn't my bf,I feel guilty about my bf not being my favorit...,user has bpd,False,1cucttj,1,0,https://www.reddit.com/r/BPD/comments/1cucttj/...
4,Self sabotage and finances,I feel like I‚Äôve hit rock bottom with my self ...,,False,1cucms6,1,0,https://www.reddit.com/r/BPD/comments/1cucms6/...
...,...,...,...,...,...,...,...,...
95,What character do you relate to that doesn‚Äôt s...,"To elaborate more, I relate heavily to Rei Aya...",,False,1ctvy7z,4,7,https://www.reddit.com/r/BPD/comments/1ctvy7z/...
96,Just got DC with BPD?,Edit: Dxed. Didn't realize it was autocorrecte...,,False,1ctvu1r,1,1,https://www.reddit.com/r/BPD/comments/1ctvu1r/...
97,Where do I go to get diagnosed??,I‚Äôm been trying to find a place I could get a ...,,False,1ctv4qu,1,1,https://www.reddit.com/r/BPD/comments/1ctv4qu/...
98,does anyone else get hallucinations?,Ive never really had strong hallucinations bef...,,False,1ctuj2j,4,5,https://www.reddit.com/r/BPD/comments/1ctuj2j/...


In [19]:
new_posts.sort_values('Author Flair')

Unnamed: 0,Title,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
3,My fp isn't my bf,I feel guilty about my bf not being my favorit...,user has bpd,False,1cucttj,1,0,https://www.reddit.com/r/BPD/comments/1cucttj/...
11,why does it feel so weird to meeee. üò≠,"Today, I started feeling like my husband was j...",user has bpd,False,1cuaxkf,1,0,https://www.reddit.com/r/BPD/comments/1cuaxkf/...
12,vent/rant/scream into the void,(not sure if this should be labeled nsfw? but ...,user has bpd,False,1cuaob1,1,1,https://www.reddit.com/r/BPD/comments/1cuaob1/...
19,Advice needed on not letting your frustration ...,One of the most constant criticisms I get from...,user has bpd,False,1cu9wj1,2,0,https://www.reddit.com/r/BPD/comments/1cu9wj1/...
24,my life is going to end soon,marking as venting but feel free to send suppo...,user has bpd,False,1cu9bpe,2,3,https://www.reddit.com/r/BPD/comments/1cu9bpe/...
...,...,...,...,...,...,...,...,...
95,What character do you relate to that doesn‚Äôt s...,"To elaborate more, I relate heavily to Rei Aya...",,False,1ctvy7z,4,7,https://www.reddit.com/r/BPD/comments/1ctvy7z/...
96,Just got DC with BPD?,Edit: Dxed. Didn't realize it was autocorrecte...,,False,1ctvu1r,1,1,https://www.reddit.com/r/BPD/comments/1ctvu1r/...
97,Where do I go to get diagnosed??,I‚Äôm been trying to find a place I could get a ...,,False,1ctv4qu,1,1,https://www.reddit.com/r/BPD/comments/1ctv4qu/...
98,does anyone else get hallucinations?,Ive never really had strong hallucinations bef...,,False,1ctuj2j,4,5,https://www.reddit.com/r/BPD/comments/1ctuj2j/...


In [20]:
# how many authors have flair?

new_posts['Author Flair'].value_counts()

Author Flair
user has bpd    10
Name: count, dtype: int64

In [23]:
# pull comments from a single post as well -- post 5 in data frame for example

from praw.models import MoreComments

submission = reddit.submission(url=new_posts['Post URL'][95])

comments = {'Comments':[]}

for comment in submission.comments:
    if type(comment) == MoreComments:
        continue
    comments['Comments'].append(comment.body)

comments_df = pd.DataFrame(comments)
comments_df



Unnamed: 0,Comments
0,Chihiro from Spirited Away
1,Kaneki Ken - Tokyo Ghoul.
2,Motoko Kusanagi - Ghost in the Shell
3,Sawako Kuronuma - Kimi Ni Todoke
4,niles crane!
5,Buddy from Buddy simulator 1984


In [24]:
new_posts['Post URL'][95]

'https://www.reddit.com/r/BPD/comments/1ctvy7z/what_character_do_you_relate_to_that_doesnt_seem/'

In [26]:
new_posts[new_posts['Post Text'].str.contains('treatment')]
new_posts[new_posts['Post Text'].str.contain('diagnosis')]

Unnamed: 0,Title,Post Text,Author Flair,Original Content,ID,Score,Total Comments,Post URL
38,Splitting cost me my relationship,I'll start this off with‚Äî I'm undiagnosed. How...,,False,1cu65xy,1,1,https://www.reddit.com/r/BPD/comments/1cu65xy/...
50,He makes suicide threats?!,Edit: Just found out he‚Äôs now in a relationshi...,,False,1cu2lkv,15,24,https://www.reddit.com/r/BPD/comments/1cu2lkv/...
59,i feel like life itself wants me to give up //...,"tw // rape , sexism , abuse , cancer , suicide...",,False,1cu05ii,2,2,https://www.reddit.com/r/BPD/comments/1cu05ii/...


In [None]:
# create a Reddit read-only instance
reddit = praw.Reddit(client_id=get_client_id(),
                     client_secret=get_client_secret(),
                     user_agent=get_user_agent())

In [27]:
# Choose subreddit to scrape

subreddit = reddit.subreddit("BPD")

# Get new posts in subreddit

posts = subreddit.new(limit=750) # I've found it maxes out at 100 posts in a single pull

# Create post dictionary

posts_dict = {'Title':[], 'Post Date':[],
              'Post Text':[], 'Author':[],
              'Author Flair':[], 'Original Content':[],
              'ID':[], 'Score':[], 
              'Total Comments':[], 'Post URL':[]}

# Scrape submissions in posts

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)

    # Date of past in Unix Time
    posts_dict['Post Date'].append(post.created_utc)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Text inside a post
    posts_dict["Author"].append(post.author)

    # Author flair text - None if no flair
    posts_dict['Author Flair'].append(post.author_flair_text)

    # Check for original content
    posts_dict['Original Content'].append(post.is_original_content)
     
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Save the data in a dataframe

new_posts_1 = pd.DataFrame(posts_dict)
new_posts_1


Unnamed: 0,Title,Post Date,Post Text,Author,Author Flair,Original Content,ID,Score,Total Comments,Post URL
0,How do you just have sex?,1.715974e+09,I never understood how people can just have se...,getdemvitamins,,False,1cue1fb,1,0,https://www.reddit.com/r/BPD/comments/1cue1fb/...
1,Struggling and I don‚Äôt even know what I need o...,1.715974e+09,I went through a breakup I was not expecting a...,badpunsbin,,False,1cudvm3,1,1,https://www.reddit.com/r/BPD/comments/1cudvm3/...
2,Jumping from suicidal idealisation to health a...,1.715974e+09,I wondered if anyone else gets this? I've suff...,MannerGreedy6380,user has bpd,False,1cudvf9,1,0,https://www.reddit.com/r/BPD/comments/1cudvf9/...
3,My relationship with music,1.715974e+09,I've noticed that it really does affect my moo...,bluecoat99,,False,1cudr5i,1,0,https://www.reddit.com/r/BPD/comments/1cudr5i/...
4,"If I Can‚Äôt Have Him, I‚Äôll Die",1.715973e+09,No I won‚Äôt but it feels like it. My best frien...,Calm_Arrival5033,,False,1cudh5m,3,1,https://www.reddit.com/r/BPD/comments/1cudh5m/...
...,...,...,...,...,...,...,...,...,...,...
745,Severe Bad Self image tips ?,1.715561e+09,I have really bad self image since middle scho...,Either_Snow_5621,,False,1cqmlgm,5,1,https://www.reddit.com/r/BPD/comments/1cqmlgm/...
746,Broke it off with a whirlwind lover and feelin...,1.715561e+09,"Cw for sex, sex addiction, self-sabotage/harm,...",coomquing,,False,1cqmkb3,1,1,https://www.reddit.com/r/BPD/comments/1cqmkb3/...
747,I just did something so psychotic tw self harm,1.715561e+09,My partner and I had a disagreement on the wee...,sweetangel622,,False,1cqmg4p,1,1,https://www.reddit.com/r/BPD/comments/1cqmg4p/...
748,WHY AM I CLINICALLY DRAMATIC,1.715561e+09,I usually consider myself pretty self-aware an...,Orchid_Dull,,False,1cqmfrt,52,6,https://www.reddit.com/r/BPD/comments/1cqmfrt/...


In [28]:
new_posts_1.sort_values('Author Flair')

Unnamed: 0,Title,Post Date,Post Text,Author,Author Flair,Original Content,ID,Score,Total Comments,Post URL
420,Certain things making me feel invalid and unhe...,1.715739e+09,PwBpd here and I‚Äôm having a hard time wrapping...,TheoFtM98765,,False,1cs9x60,1,4,https://www.reddit.com/r/BPD/comments/1cs9x60/...
2,Jumping from suicidal idealisation to health a...,1.715974e+09,I wondered if anyone else gets this? I've suff...,MannerGreedy6380,user has bpd,False,1cudvf9,1,0,https://www.reddit.com/r/BPD/comments/1cudvf9/...
514,Advanced Dbt for impulsivity and addictions?,1.715693e+09,One of the more severe symptoms I experience i...,No-Protection3185,user has bpd,False,1crs6da,1,1,https://www.reddit.com/r/BPD/comments/1crs6da/...
508,school and motivation,1.715663e+09,how do i not stop doing everything. i‚Äôm actual...,troyfucktoy,user has bpd,False,1crk8n8,1,0,https://www.reddit.com/r/BPD/comments/1crk8n8/...
507,i am so selfish,1.715697e+09,"i am such a brat about everything, it‚Äôs sick. ...",killakittybaby,user has bpd,False,1crtdxh,12,1,https://www.reddit.com/r/BPD/comments/1crtdxh/...
...,...,...,...,...,...,...,...,...,...,...
745,Severe Bad Self image tips ?,1.715561e+09,I have really bad self image since middle scho...,Either_Snow_5621,,False,1cqmlgm,5,1,https://www.reddit.com/r/BPD/comments/1cqmlgm/...
746,Broke it off with a whirlwind lover and feelin...,1.715561e+09,"Cw for sex, sex addiction, self-sabotage/harm,...",coomquing,,False,1cqmkb3,1,1,https://www.reddit.com/r/BPD/comments/1cqmkb3/...
747,I just did something so psychotic tw self harm,1.715561e+09,My partner and I had a disagreement on the wee...,sweetangel622,,False,1cqmg4p,1,1,https://www.reddit.com/r/BPD/comments/1cqmg4p/...
748,WHY AM I CLINICALLY DRAMATIC,1.715561e+09,I usually consider myself pretty self-aware an...,Orchid_Dull,,False,1cqmfrt,52,6,https://www.reddit.com/r/BPD/comments/1cqmfrt/...


In [29]:
new_posts_1['Author Flair'].value_counts()

Author Flair
user has bpd                   79
user knows someone with bpd     4
user is curious about bpd       3
                                1
Name: count, dtype: int64

In [31]:
new_posts_1[new_posts_1['Post Text'].str.contains('treatment')]
new_posts_1[new_posts_1['Post Text'].str.contains('diagnosis')]

Unnamed: 0,Title,Post Date,Post Text,Author,Author Flair,Original Content,ID,Score,Total Comments,Post URL
12,University Pressure?,1715967000.0,I finished my degree before getting diagnosed ...,AOhasthingstoSayo,,False,1cub6oy,2,0,https://www.reddit.com/r/BPD/comments/1cub6oy/...
26,Everything is too much,1715951000.0,I (25F) have been diagnosed with BPD back in 2...,nowayyoutt,,False,1cu4m75,1,0,https://www.reddit.com/r/BPD/comments/1cu4m75/...
33,more info on EUPD/rant I suppose,1715960000.0,Pre warning I‚Äôm dyslexic sorry for the spellin...,AdProper4382,,False,1cu8f38,1,0,https://www.reddit.com/r/BPD/comments/1cu8f38/...
58,Question Reguarding BPD,1715938000.0,TLDR: can one with bpd maintain a CDL-A\n.\nHe...,FireyZeflo,,False,1cu0z3j,2,1,https://www.reddit.com/r/BPD/comments/1cu0z3j/...
99,Just got DC with BPD?,1715917000.0,Edit: Dxed. Didn't realize it was autocorrecte...,taki-noboru-desu,,False,1ctvu1r,1,1,https://www.reddit.com/r/BPD/comments/1ctvu1r/...
100,Where do I go to get diagnosed??,1715915000.0,I‚Äôm been trying to find a place I could get a ...,MoonAnimeBaby95,,False,1ctv4qu,1,1,https://www.reddit.com/r/BPD/comments/1ctv4qu/...
105,New here..,1715911000.0,Not asking for diagnosis\n \nI was sent over h...,Dry_Possible_1792,,False,1ctty4w,3,6,https://www.reddit.com/r/BPD/comments/1ctty4w/...
115,please help me.,1715904000.0,"hi, i am a 20 y/o female newly diagnosed with ...",applewasmyidea_,,False,1ctrlgq,11,14,https://www.reddit.com/r/BPD/comments/1ctrlgq/...
129,"I was just diagnosed, but I'm nothing like my ...",1715898000.0,I'm just trying to process all this. I was tol...,Aggravating_Yak9580,,False,1ctpq60,1,1,https://www.reddit.com/r/BPD/comments/1ctpq60/...
135,Do you ever feel like you may be autistic?,1715895000.0,(F20)Having BPD I 100% know how being neurodiv...,N0nameN0facejoedoe,,False,1ctomt5,190,96,https://www.reddit.com/r/BPD/comments/1ctomt5/...


In [33]:
# working on converting Unix Time to sortable datetime

from datetime import datetime 

print(datetime.utcfromtimestamp(new_posts_1['Post Date'][0]).year,
    datetime.utcfromtimestamp(new_posts_1['Post Date'][0]).month,
    datetime.utcfromtimestamp(new_posts_1['Post Date'][0]).day)

2024 5 17


In [35]:
# write to csv
new_posts_1.to_csv('test.csv')

### To-do

* Turn scraping into a function of (number of posts, category, time delay)
* Write DataFrame to .csv