[banner]: b1.png
![banner]

# Welcome to Destiny Subreddit Analyzer

## How it works
Below you will find a few charts already displayed; the rest are hidden behind switches that let you toggle them on and off. Some charts take a minute or longer to load, so it's easier to load the ones you want one at a time rather than all at once. Some charts let you adjust their input using widgets; these widgets appear after you switch a chart "On".

For charts that let you change the input parameters, please be patient: the chart has to reload with the new parameters. The old chart stays on screen until the new one has finished loading, and then it is swapped out.

In [1]:
#imports
import json
import requests
import pandas as pd
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
import numpy as np
import plotly.express as px
import ipywidgets as widgets

In [180]:
#functions

def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
    
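def format_comment(text):
    # insert a line break after every 20th word so long comments wrap
    l = text.split()
    n = len(l)
    if n == 0:
        return ''
    i=1
    result = l[0]
    while i < n:
        result = result + ' ' + l[i]
        # check the running word count, not the total word count
        if ((i + 1) % 20) == 0:
            result = result + '\n'
        i += 1
    return result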

def get_top_5_submissions():
    #returns a list of dictionaries, each representing one of the top 5 submissions (by comment count) from r/destinythegame over the last day
    submission_url = "https://api.pushshift.io/reddit/search/submission/"
    paramdict = {'after': "1d", "sort_type":"num_comments","subreddit":"destinythegame","size":5, "sort":"desc"}
    response = requests.get(url=submission_url, params= paramdict).json()
    response_data = response["data"]
    results = []
    for x in response_data:
        temp = {}
        temp["title"]=x["title"]
        temp["author"]=x["author"]
        temp["url"] = x["url"]
        temp["num_comments"]=x["num_comments"]
        temp["id"]=x["id"]
        comment_json = requests.get('https://api.pushshift.io/reddit/submission/comment_ids/{}'.format(x["id"])).json()
        comment_array = comment_json['data']
        temp['comment_ids']=comment_array
        results.append(temp)
    return results

def get_submissions(**kwargs):
    #returns the raw list of submission dictionaries matching the given pushshift query parameters
    submission_url = "https://api.pushshift.io/reddit/search/submission/"
    response = requests.get(url=submission_url, params=kwargs).json()
    response_data = response["data"]
    return response_data

def get_comments(**kwargs):
    #returns the raw list of comment dictionaries matching the given pushshift query parameters
    comment_url = "https://api.pushshift.io/reddit/search/comment/"
    response = requests.get(url=comment_url, params=kwargs).json()
    response_data = response["data"]
    return response_data


def get_comments_from_ids(comment_id_array):
    #returns a list of comment bodies (strings) for a set of comment ids
    comment_url = "https://api.pushshift.io/reddit/search/comment/"
    params_dict = {"ids": ", ".join(str(x) for x in comment_id_array), "fields":"body"}
    result = requests.get(url=comment_url, params=params_dict).json()
    data = result['data']
    final_result  = []
    for x in data:
        final_result.append(x["body"])
    return final_result

def analyze_comment(comment):
    #returns the polarity and subjectivity of a comment as a tuple
    blob = TextBlob(comment)
    polarity, subjectivity = blob.sentiment
    return polarity, subjectivity

def daily_submissions():
    #returns a list of dictionaries, each representing a submission to r/destinythegame from the last day
    submission_url = "https://api.pushshift.io/reddit/search/submission/"
    paramdict = {'after': "1d","subreddit":"destinythegame"}
    response = requests.get(url=submission_url, params= paramdict).json()
    response_data = response["data"]
    results = []
    for x in response_data:
        temp = {}
        temp["title"]=x["title"].strip()
        temp["author"]=x["author"].strip()
        temp["url"] = x["url"].strip()
        temp["num_comments"]=x["num_comments"]
        temp["id"]=x["id"].strip()
        temp['text'] = x['selftext'].strip()
        results.append(temp)
    return results

def text_splitter(text):
    l = text.split()
    n = len(l)
    i = 0
    x = 1
    results = {'results1':'', 'results2':'', 'results3':'', 'results4':''}
    while i < n:
        if (n- i) >= 20:
            r = 'results'+str(x)
            results[r]=' '.join(l[i:i+20])
            i = i+20
            x += 1
        else:
            r = 'results'+str(x)
            results[r]= ' '.join(l[i:i+(n-i)])
            break
    return results['results1'], results['results2'], results['results3'], results['results4']
         

def get_submissions_df(subreddit, **kwargs):
    #returns a DataFrame of a subreddit's text submissions with polarity, subjectivity, and sentiment columns
    submission_url = "https://api.pushshift.io/reddit/search/submission/"
    kwargs['subreddit']=subreddit
    response = requests.get(url=submission_url, params=kwargs).json()
    response_data = response["data"]
    results = []
    for x in response_data:
        try:
            if isinstance(x['selftext'], str):
                temp = {}
                temp["title"]=x["title"].strip()
                temp["author"]=x["author"].strip()
                temp["url"] = x["url"].strip()
                temp["num_comments"]=x["num_comments"]
                temp["id"]=x["id"].strip()
                temp['text'] = x['selftext'].strip()
                temp['post_length'] = len(x['selftext'].split())
                temp['date'] = pd.to_datetime(x['created_utc'], unit='s')
                results.append(temp)
        except Exception:
            # skip submissions with missing or malformed fields
            pass
    df = pd.DataFrame.from_dict(results)
    df = df[df.text != '[removed]']
    df = df[df.post_length > 10]
    df['polarity'], df['subjectivity'] = zip(*df['text'].map(analyze_comment))
    df['sentiment'] = df.apply(lambda x: 'positive' if x['polarity'] > 0 else 'negative', axis=1)
    df['subreddit'] = subreddit
    return df

def get_submissions_df_handles_none(subreddit, **kwargs):
    #same as get_submissions_df, but returns None when the query returns no submissions
    submission_url = "https://api.pushshift.io/reddit/search/submission/"
    kwargs['subreddit']=subreddit
    response = requests.get(url=submission_url, params=kwargs).json()
    response_data = response["data"]
    if len(response_data) < 1:
        print(response_data)
        return None
    results = []
    for x in response_data:
        try:
            if isinstance(x['selftext'], str):
                temp = {}
                temp["title"]=x["title"].strip()
                temp["author"]=x["author"].strip()
                temp["url"] = x["url"].strip()
                temp["num_comments"]=x["num_comments"]
                temp["id"]=x["id"].strip()
                temp['text'] = x['selftext'].strip()
                temp['post_length'] = len(x['selftext'].split())
                temp['date'] = pd.to_datetime(x['created_utc'], unit='s')
                results.append(temp)
        except Exception:
            # skip submissions with missing or malformed fields
            pass
    df = pd.DataFrame.from_dict(results)
    df = df[df.text != '[removed]']
    df = df[df.post_length > 10]
    df['polarity'], df['subjectivity'] = zip(*df['text'].map(analyze_comment))
    df['sentiment'] = df.apply(lambda x: 'positive' if x['polarity'] > 0 else 'negative', axis=1)
    df['subreddit'] = subreddit
    return df

def wipe_then_display_out(fig):
    #requires a figure input and an Output widget named `out` defined before calling
    out.clear_output()
    with out:
        fig.show()

In [182]:
df = get_submissions_df("destinythegame", after = "1d", size = 500)
df2 = get_submissions_df("destiny2", after = "1d", size = 500)
#dfraid = get_submissions_df("raidsecrets", after = "1d", size = 500)
#dfcrucible = get_submissions_df("crucibleplaybook", after = "1d", size = 500)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
df3 = pd.concat([df, df2])

fig1 = px.scatter(df, 
                  x='date',
                  y='polarity',
                 hover_data=["author", "id"],
                marginal_x='histogram',
               color_discrete_sequence=["violet", "lightslategrey"], # colors to use
               color="sentiment", # what should the color depend on?
               size="num_comments", # the more comments, the bigger the circle
               #size_max=10, # not too big
               labels={"polarity": "Post Polarity", "date": "Date Posted"}, # axis names
               title="Sentiment analysis of r/destinythegame posts from previous 24 hours", # title of figure
 )

fig2 = px.scatter(df2, 
                  x='date',
                  y='polarity',
                 hover_data=["author", "id"],
                marginal_x='histogram',
               color_discrete_sequence=["violet", "lightslategrey"], # colors to use
               color="sentiment", # what should the color depend on?
               size="num_comments", # the more comments, the bigger the circle
               #size_max=10, # not too big
               labels={"polarity": "Post Polarity", "date": "Date Posted"}, # axis names
               title="Sentiment analysis of r/destiny2 posts from previous 24 hours", # title of figure
 )




fig3 = px.scatter(df3, 
                  x='date',
                  y='polarity',
                 hover_data=['subreddit', "author","url", "id"],
                marginal_x='histogram',
               color_discrete_sequence=["blue", "green"], # colors to use
               color="subreddit", # what should the color depend on?
               #size="num_comments", # the more votes, the bigger the circle
               #size_max=10, # not too big
               labels={"polarity": "Post Polarity", "date": "Date Posted"},
                  # order values must match the lowercase names stored in the 'subreddit' column
                  category_orders = {'subreddit': ['destinythegame', 'destiny2']},
                  facet_col = 'subreddit',
               title="r/destinythegame vs r/destiny2 posts from previous 24 hours", # title of figure
 )


# bar graph showing the share of posts that are positive vs. negative and subjective vs. objective, for dtg and destiny2
labels = ['+polarity %', '-polarity %', 'subjective %', 'objective %']
dtg0 = [len(df[df.polarity >= 0]) / len(df), len(df[df.polarity < 0])/len(df), len(df[df.subjectivity >= .5])/len(df),
      len(df[df.subjectivity < .5])/len(df)]
d20 = [len(df2[df2.polarity >= 0]) / len(df2), len(df2[df2.polarity < 0])/len(df2), len(df2[df2.subjectivity >= .5])/len(df2),
      len(df2[df2.subjectivity < .5])/len(df2)]
dtg = dict(zip(labels, dtg0))
d2 = dict(zip(labels,d20))

labels2 = ['subreddit', 'rating', 'polarity/subjectvity', 'percent']
df4list = [dict(zip(labels2,['destinythegame', 'positive polarity', 'polarity', dtg0[0]])),
           dict(zip(labels2,['destinythegame', 'negative polarity', 'polarity', dtg0[1]])),
           dict(zip(labels2,['destinythegame', 'subjective', 'subjectivity', dtg0[2]])),
           dict(zip(labels2,['destinythegame', 'objective', 'subjectivity', dtg0[3]])),
           dict(zip(labels2,['destiny2', 'positive polarity', 'polarity', d20[0]])),
           dict(zip(labels2,['destiny2', 'negative polarity', 'polarity', d20[1]])),
           dict(zip(labels2,['destiny2', 'subjective', 'subjectivity', d20[2]])),
           dict(zip(labels2,['destiny2', 'objective', 'subjectivity', d20[3]]))
        ]


df4 = pd.DataFrame(df4list)
fig4 = px.bar(df4, x='polarity/subjectvity',
              y ='percent',
              color='rating',
              barmode='group',
              category_orders = {'subreddit': ['destinythegame', 'destiny2']},
              labels={'percent':'percent of total posts', 'polarity/subjectvity': 'sentiment analysis'},
              facet_col = 'subreddit',
             title='Comparison of post sentiment and subjectivity over last 24 hours for r/destinythegame vs r/destiny2')

fig1.show()
fig2.show()
fig3.show()
fig4.show()

## Sentiment Analysis of Posts for Last 30 Days of Selected Subreddit
**WARNING: THIS CHART TAKES A LONG TIME (2+ mins) TO LOAD FOR CERTAIN SUBREDDITS BUT IT'S WORTH IT**

This shows all the posts from the last 30 days of a selected subreddit, plotted
by their polarity. The size of each dot represents the number of comments on that post.
This way you can see how well positive and negative posts do at attracting commenters.
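
The 30 days are fetched one day at a time, which is presumably a big part of the wait. A minimal sketch of how those one-day windows can be generated (`day_windows` is a hypothetical helper, not part of the notebook; the `after`/`before` offsets mirror the ones the notebook passes to `get_submissions_df`):

```python
def day_windows(days=30):
    """Return (after, before) offset pairs, newest window first."""
    # the most recent 24 hours needs no upper bound
    windows = [("1d", None)]
    for older in range(2, days + 1):
        # e.g. ("2d", "1d") covers posts between 48 and 24 hours old
        windows.append((f"{older}d", f"{older - 1}d"))
    return windows


windows = day_windows(30)
print(len(windows))
print(windows[:3])
```

Each pair would then become one request, so a 30-day chart means 30 round trips to the API.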

The bar-chart-looking thing on top is a histogram. It shows the total number of positive posts versus the total
number of negative posts for the span of time directly under each bar.
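
To make the histogram concrete, here is a rough sketch of the counting it does. It assumes one bin per day (plotly picks its own bin width) and uses made-up sample data; `sentiment_counts_per_day` is a hypothetical name. The positive/negative rule is the one the notebook uses for its `sentiment` column, so a polarity of exactly 0 counts as negative.

```python
from collections import Counter
from datetime import datetime, timezone


def sentiment_counts_per_day(posts):
    """Count posts per (day, sentiment); posts are (created_utc, polarity) pairs."""
    counts = Counter()
    for created_utc, polarity in posts:
        day = datetime.fromtimestamp(created_utc, tz=timezone.utc).date().isoformat()
        # same labeling rule as the notebook's 'sentiment' column
        sentiment = "positive" if polarity > 0 else "negative"
        counts[(day, sentiment)] += 1
    return counts


# made-up timestamps and polarities, for illustration only
sample = [(0, 0.4), (3600, -0.2), (86400, 0.1), (90000, 0.0)]
for key, value in sorted(sentiment_counts_per_day(sample).items()):
    print(key, value)
```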

In [179]:
#####OUTPUT 1#####
#functions---------------------
def sub_data(subreddit, days=30):
    i = 2
    j = 1
    df = get_submissions_df(subreddit, after = '1d', size = 500)
    while i <= days:
        temp = get_submissions_df(subreddit, after = ''+str(i)+'d',before=''+str(j)+'d', size = 500)
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, temp])
        i += 1
        j += 1
    fig = px.scatter(df, 
                  x='date',
                  y='polarity',
                 hover_data=["author", "id"],
                marginal_x='histogram',
               color_discrete_sequence=["violet", "lightslategrey"], # colors to use
               color="sentiment", # what should the color depend on?
               size="num_comments", # the more comments, the bigger the circle
               #size_max=10, # not too big
               labels={"polarity": "Post Polarity", "date": "Date Posted"}, # axis names
               title=f"Sentiment analysis of r/{subreddit} last {days} days", # title of figure
                 )
    return fig.show()



def display_out1(x=False):
    if x:
        with out1:
            o = widgets.interactive(sub_data, subreddit=subreddit_select_widget, days=widgets.fixed(30))
            display(o)
    else:
        out1.clear_output()
        
#widgets------------------------
subreddit_select_widget = widgets.RadioButtons(
    options=['destinythegame', 'destiny2', 'lowsodiumdestiny', 'raidsecrets', 'crucibleplaybook'],
    value = 'destinythegame',
    description='subreddit',
    disabled=False)

out1 = widgets.Output(layout={'border': '4px solid black'})

wout1 = widgets.ToggleButtons( options=[('Off', False),('On',True)],
                          description='Load Chart?',
                          disabled=False,
                          button_style='success')


#display-------------------------
widgets.interact(display_out1, x=wout1)
display(out1)


## Compare last 500 posts from selected subreddits
Hit "On" to load the widget, then select a different subreddit from either one of the
selection boxes to display the chart.
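
The chart boils each subreddit down to four numbers. A minimal sketch of that calculation on plain Python data (`sentiment_shares` is a hypothetical helper and the sample scores are made up; the thresholds are the ones used in the notebook: polarity at or above 0 counts as positive, subjectivity at or above 0.5 counts as subjective):

```python
def sentiment_shares(scores):
    """scores: list of (polarity, subjectivity) pairs -> dict of fractions between 0 and 1."""
    total = len(scores)
    if total == 0:
        # nothing to summarize; also avoids dividing by zero
        return {}
    return {
        "positive polarity": sum(p >= 0 for p, _ in scores) / total,
        "negative polarity": sum(p < 0 for p, _ in scores) / total,
        "subjective": sum(s >= 0.5 for _, s in scores) / total,
        "objective": sum(s < 0.5 for _, s in scores) / total,
    }


# made-up (polarity, subjectivity) pairs, for illustration only
sample = [(0.5, 0.8), (-0.25, 0.3), (0.0, 0.5), (0.1, 0.2)]
print(sentiment_shares(sample))
```

Note that the values are fractions of the total, even though the chart axis calls them percentages.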

In [175]:
#####Output 2#####

#functions---------------------
    
def compare_subreddit_last_500_sidebyside(one,two):
    count = 10
    dflist = []
    subs = [one,two]
    #get posts for subs
    for sub in subs:
        i = 1
        j = 0
        x = 0
        #use the variant that returns None on an empty response so the check below can trigger
        df = get_submissions_df_handles_none(sub, after = '1d', size = 500)
        while df is None:
            i+=1
            j+=1
            df = get_submissions_df_handles_none(sub, after = str(i)+'d', before=str(j)+'d', size = 500)
            print('df was none ' + str(sub) +' ' + 'i= '+ str(i) + ' j = '+str(j))
        x = len(df)
        
        if x > count:
            df = df[:500]
        else:
            while x < count and i < count:
                if i == count-1:
                    print("i got to count")
                    return None
                i+=1
                j+=1
                temp = get_submissions_df_handles_none(sub, after = str(i)+'d', before=str(j)+'d', size = 500)
                while temp is None:
                    i+=1
                    j+=1
                    temp = get_submissions_df_handles_none(sub, after = str(i)+'d', before=str(j)+'d', size = 500)
                size = len(temp)
                if (x + size) < count:
                    #append returned a new frame without changing df (and is gone in pandas 2.0), so reassign
                    df = pd.concat([df, temp])
                    x += size
                else:
                    rest = count - x
                    df = pd.concat([df, temp[:rest]])
                    x = 1000
        #get polarity for sub:
        measures = [len(df[df.polarity >= 0]) / len(df), len(df[df.polarity < 0])/len(df),
                    len(df[df.subjectivity >= .5])/len(df),
                    len(df[df.subjectivity < .5])/len(df)]
        labels = ['subreddit', 'rating', 'polarity/subjectvity', 'percent']
        templist = [dict(zip(labels,[sub, 'positive polarity', 'polarity', measures[0]])),
               dict(zip(labels,[sub, 'negative polarity', 'polarity', measures[1]])),
               dict(zip(labels,[sub, 'subjective', 'subjectivity', measures[2]])),
               dict(zip(labels,[sub, 'objective', 'subjectivity', measures[3]]))]
        for thing in templist:
            dflist.append(thing)
    #convert dataframe
    df = pd.DataFrame(dflist)
    fig = px.bar(df, x='polarity/subjectvity',
              y ='percent',
              color='rating',
              barmode='group',
              category_orders = {'subreddit': ['destinythegame','destiny2', 'lowsodiumdestiny',
                                               'raidsecrets', 'crucibleplaybook']},
              labels = {'percent':'percent of total posts', 'polarity/subjectvity': 'sentiment analysis'},
              facet_col = 'subreddit',
             title='Comparison of post sentiment and subjectivity over last 500 posts for various destiny subreddits')
    fig.show()
    

def display_out2(x=False):
    if x:
        with out2:
            o = widgets.interactive(compare_subreddit_last_500_sidebyside,
                                           one=toggle_subs1, two=toggle_subs2)
            display(o)
    else:
        out2.clear_output()

#widgets---------------------------------------------------
out2 = widgets.Output(layout={'border': '4px solid blue'})

toggle_subs1 = widgets.Dropdown( options=[('DestinyTheGame', 'destinythegame'),
                                               ('Destiny2', 'destiny2'),
                                               ('LowSodiumDestiny', 'lowsodiumdestiny'),
                                              ('RaidSecrets', 'raidsecrets'),
                                              ('CruciblePlaybook', 'crucibleplaybook')],
                                value='destinythegame',
                          description='Subreddit 1',
                          disabled=False)

toggle_subs2 = widgets.Dropdown( options=[('DestinyTheGame', 'destinythegame'),
                                               ('Destiny2', 'destiny2'),
                                               ('LowSodiumDestiny', 'lowsodiumdestiny'),
                                              ('RaidSecrets', 'raidsecrets'),
                                              ('CruciblePlaybook', 'crucibleplaybook')],
                          description='Subreddit 2',
                            value = 'destiny2',
                          disabled=False)

#build the container only after both dropdowns exist, otherwise this raises a NameError
boxv = widgets.VBox([toggle_subs1, toggle_subs2])

wout2 = widgets.ToggleButtons( options=[('Off', False),('On',True)],
                          description='Load Chart?',
                          disabled=False,
                          button_style='success')

widgets.interact(display_out2, x=wout2)

display(out2)



# About

## Why?
Well, I noticed we have all kinds of nice tools to analyze in-game stats, but I didn't know of any tools to analyze
the Destiny community itself outside of the game. So I figured I'd play around with what I know and build up a set of
tools that can be used to examine different aspects of the community, starting with the Destiny subreddits.

This is currently just a quick demo to get it up and running, but I hope to soon (before the end of January) have features that allow for customized search and data analysis of different aspects of the subreddits. This is my first time creating something like this, so any feedback or feature requests would be appreciated.

## Things I'd like to add soon

- Charts showing the number of PvP-focused posts vs. PvE-focused posts over time
- User search to see stats about a user's posting habits: overall attitude (positive or negative), favorite curse words, favorite day to post, posts / comments per subreddit, and whatever else people can think of that they might want
- Post features most likely to get a Bungie response (PvP or PvE focused, specific words, time of post, etc.)
- Whatever else I can think up or is suggested to me

## How it works

This is all done using Python and Jupyter notebooks, with Voila to create a dashboard from the notebook and
Binder to deploy it to the web from GitHub.

### Libraries and Packages

- **numpy** - matrix creation
- **pandas** - data manipulation and cleaning
- **requests** - accessing apis
- **plotly** - creation of interactive plots
- **textblob** - sentiment analysis of posts and comments
- **voila** - creation of standalone web apps from jupyter notebooks [link](https://github.com/voila-dashboards/voila)
- **ipywidgets** - widget creation

### APIs
**pushshift** - accessing reddit data [link](https://github.com/pushshift/api)
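
For reference, a request to the submission endpoint used in this notebook is just that URL plus query parameters. A minimal sketch that only builds the URL and sends nothing (`build_query` is a hypothetical helper; the parameter names are the ones the notebook already passes, and whether the endpoint still answers is not something this sketch checks):

```python
from urllib.parse import urlencode

# endpoint taken from the notebook's functions
SUBMISSION_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_query(base_url, **params):
    """Return the full request URL for the given query parameters."""
    return f"{base_url}?{urlencode(params)}"


url = build_query(SUBMISSION_URL, subreddit="destinythegame", after="1d", size=500)
print(url)
```

Passing the same dictionary to `requests.get(..., params=...)` produces an equivalent URL, which is what the notebook does.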

### Services
**binder** - turn a GitHub repo into an interactive notebook environment [link](https://mybinder.org/)


## Contact Me

Send complaints and suggestions to destiny_sub_analyzer@gmx.com

In [None]:
#to-do : create a widget for each graph so that you can enter an id number and get the full text of the post or comment