<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# DSI-Project 3: Web-APIs-Classification : Depression Diagnosis based on Natural Language Processing 

---

### Problem Statement 

---

Depression (major depressive disorder) is a common and serious medical illness that negatively affects how you feel, the way you think and how you act. Fortunately, it is also treatable. Depression causes feelings of sadness and/or a loss of interest in activities you once enjoyed. It can lead to a variety of emotional and physical problems and can decrease your ability to function at work and at home. 

According to a page reviewed by physician Felix Torres, an estimated one in six people (16.6%) will experience depression at some time in their life, and it typically first appears during the late teens to mid-20s.
[Source](https://psychiatry.org/patients-families/depression/what-is-depression) Many people therefore look for a safe space in which to express how they feel.

Reddit is a social news site where users create and share content. The site has communities called subreddits for different interests and any user can create a subreddit.
As users browse, they can choose to go to specific communities. Their front page features posts from all the communities they follow. They can also browse r/all, which draws popular posts from subreddits all over Reddit.

Users in these subreddits tend to be very supportive of others in their community. Each subreddit also has rules that users must abide by, and moderators will ban or suspend anyone who tries to put others down. As a result, many of these support communities serve as safe spaces for those most vulnerable. [Source](https://www.internetmatters.org/hub/news-blogs/what-is-reddit-what-parents-need-to-know/)

However, for some people, expressing their depression is not enough. Depression can develop into anxiety, self-harm, or suicide, and can lead to violence in society.

This leads us to the problem: **"How can we differentiate between people who are depressed and people who simply want to get something off their chest?"**

### Data Collection

---

In [4]:
# Import Packages
import pandas as pd
import numpy as np
import requests
import time

#### Pulling Data
---
Reddit's JSON listing returns 25 posts per request by default. I will therefore loop the requests, passing the `after` token from each response to fetch the next page of 25 posts, and pause for one second between requests.
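
To make the looping described above concrete, here is a minimal offline sketch of cursor-style paging. `FAKE_LISTING`, `fetch_page`, and `collect` are hypothetical names invented for illustration; they only imitate the shape of a listing response (`children` plus an `after` cursor) and make no network calls.

```python
# Offline sketch of cursor-style paging. FAKE_LISTING and fetch_page are
# hypothetical stand-ins for a JSON listing, not part of Reddit's API.
FAKE_LISTING = [f"t3_post{i}" for i in range(60)]
PAGE_SIZE = 25


def fetch_page(after=None, limit=PAGE_SIZE):
    """Return one page of items plus the cursor for the next page."""
    start = 0 if after is None else FAKE_LISTING.index(after) + 1
    items = FAKE_LISTING[start:start + limit]
    # The cursor is the last item returned, or None once the listing is exhausted
    has_more = start + limit < len(FAKE_LISTING)
    next_after = items[-1] if items and has_more else None
    return {"children": items, "after": next_after}


def collect(max_requests):
    """Follow the `after` cursor for at most `max_requests` pages."""
    collected, after = [], None
    for _ in range(max_requests):
        page = fetch_page(after)
        collected.extend(page["children"])
        after = page["after"]
        if after is None:  # no further pages to read
            break
    return collected


posts = collect(max_requests=5)
print(len(posts))
```

Each call hands back the cursor needed for the next one, so the loop only has to remember a single value between requests.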

In [5]:
# Create scraping function to pull subreddit posts
def get_data(urls, num_req):

    # Set a custom User-agent header to prevent request errors
    headers = {"User-agent" : "GA SG Bot 0.1"}

    # Get posts as list of dictionaries, each containing data on one post
    posts = []
    for url in urls:
        after = None

        # Make up to num_req requests per URL; each request returns one page of up to 25 posts
        for i in range(num_req):
            print(f"Page of number request:{i} of {url}")
            # The first request has no params; later ones pass the "after" token
            params = {} if after is None else {"after" : after}
            res = requests.get(url, params=params, headers=headers)
            if res.status_code == 200:
                the_json = res.json()
                posts.extend(the_json["data"]["children"])
                after = the_json["data"]["after"]
                # Stop when the listing has no further pages (avoids re-reading page one)
                if after is None:
                    break
            else:
                print(res.status_code)
                break
            time.sleep(1)

    return posts
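
Because the default listing is ranked live, the same post could appear on more than one page while paging. A minimal sketch of dropping such repeats, assuming each child carries an identifier in a `name` field under `data` (the helper `dedupe_posts` and the sample values are hypothetical, not part of the notebook):

```python
def dedupe_posts(children):
    """Keep the first occurrence of each post, keyed on its `name` field."""
    seen = set()
    unique = []
    for child in children:
        post_name = child["data"]["name"]
        if post_name not in seen:
            seen.add(post_name)
            unique.append(child)
    return unique


# Hypothetical sample in which the first post was returned twice
sample = [
    {"data": {"name": "t3_aaa", "title": "First"}},
    {"data": {"name": "t3_bbb", "title": "Second"}},
    {"data": {"name": "t3_aaa", "title": "First"}},
]

unique_posts = dedupe_posts(sample)
print(len(unique_posts))
```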

In [6]:
# Scrape r/offmychest and r/depression posts
# We want about 1000 rows per subreddit
# num_req = 1000 posts / 25 posts per request = 40
raw_data = get_data([
            "https://www.reddit.com/r/offmychest/.json",
            "https://www.reddit.com/r/depression/.json"], num_req=40)

Page of number request:0 of https://www.reddit.com/r/offmychest/.json
Page of number request:1 of https://www.reddit.com/r/offmychest/.json
Page of number request:2 of https://www.reddit.com/r/offmychest/.json
Page of number request:3 of https://www.reddit.com/r/offmychest/.json
Page of number request:4 of https://www.reddit.com/r/offmychest/.json
Page of number request:5 of https://www.reddit.com/r/offmychest/.json
Page of number request:6 of https://www.reddit.com/r/offmychest/.json
Page of number request:7 of https://www.reddit.com/r/offmychest/.json
Page of number request:8 of https://www.reddit.com/r/offmychest/.json
Page of number request:9 of https://www.reddit.com/r/offmychest/.json
Page of number request:10 of https://www.reddit.com/r/offmychest/.json
Page of number request:11 of https://www.reddit.com/r/offmychest/.json
Page of number request:12 of https://www.reddit.com/r/offmychest/.json
Page of number request:13 of https://www.reddit.com/r/offmychest/.json
Page of number r
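
The request count used in the call above follows from simple arithmetic, sketched here. `requests_needed` is a hypothetical helper, and the page size of 25 is taken from the notebook's own comment; rounding up covers targets that are not an exact multiple of the page size.

```python
import math


def requests_needed(target_rows, page_size=25):
    """Number of page requests needed to collect at least `target_rows` posts."""
    return math.ceil(target_rows / page_size)


per_subreddit = requests_needed(1000)
print(per_subreddit)          # requests per subreddit
print(per_subreddit * 2)      # total requests for the two subreddits
print(requests_needed(1010))  # a partial last page still costs one request
```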

In [7]:
# Build a DataFrame from the scraped data
# Keep only the keys title, selftext, subreddit
def get_dataframe(data):
    title = [] 
    selftext = [] 
    subreddit = [] 
    for i in range(len(data)):
        title.append(data[i]["data"]["title"])
        selftext.append(data[i]["data"]["selftext"])
        subreddit.append(data[i]["data"]["subreddit"])
        
    data = pd.DataFrame({
                        "title" : title,
                        'selftext': selftext,
                        'subreddit': subreddit,
                        })
    return data
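
As a quick check of the reshaping step, the same idea can be tried on a small hand-made payload. This is only a sketch: `mock_children` is invented data, and `children_to_frame` is a hypothetical variant that uses `dict.get` so that a missing key becomes an empty string instead of raising `KeyError`.

```python
import pandas as pd

# Hypothetical two-post payload shaped like the "children" list of a listing
mock_children = [
    {"kind": "t3", "data": {"title": "First", "selftext": "Body one", "subreddit": "offmychest"}},
    {"kind": "t3", "data": {"title": "Second", "subreddit": "depression"}},  # no selftext key
]


def children_to_frame(children, keys=("title", "selftext", "subreddit")):
    """Build a DataFrame from listing children, keeping only `keys`."""
    rows = [{key: child["data"].get(key, "") for key in keys} for child in children]
    return pd.DataFrame(rows, columns=list(keys))


frame = children_to_frame(mock_children)
print(frame.shape)
```

Building a list of row dictionaries first keeps the column selection in one place, which makes it easy to add or drop a key later.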

In [8]:
# Call function get_dataframe
df = get_dataframe(raw_data)

# View
df.head()

| | title | selftext | subreddit |
|---|---|---|---|
| 0 | We have persistent scammers preying on this co... | Folks, a reminder that [Rule 3](/r/offmychest/... | offmychest |
| 1 | it's my friend's birthday | everyone is posting pictures with her. when sh... | offmychest |
| 2 | I slapped my boyfriend last night | At 12 am I was woken up by him playing music r... | offmychest |
| 3 | It is my birthday today | That is all its just my birthday :) | offmychest |
| 4 | Parents: we (teachers) are not asking you to b... | Take away their phone. Lord knows that’s part ... | offmychest |


In [11]:
# Save the raw data scraped from the API as a CSV file
df.to_csv('../data/df_raw_o_d.csv', index=False)

---