# Static analysis

## Intro: imports and data loading

In [30]:
import pandas as pd
import pickle
import numpy as np

In [31]:
with open("data/comments_cleaned", 'rb') as file:
    comments = pickle.load(file)
    
with open("data/submissions_cleaned", 'rb') as file:
    submissions = pickle.load(file)

In [32]:
submissions

Unnamed: 0,id,url,permalink,author,created_utc,subreddit,subreddit_id,num_comments,score,over_18,distinguished,domain,stickied,locked,hide_score
0,0,http://www.ignorancedenied.com/viewthread.php?...,/r/reddit.com/comments/648oo/brain_disease_is_...,DITUS,1199145615,reddit.com,t5_6,1,0,False,,ignorancedenied.com,False,False,False
1,1,http://www.flascience.org/wp/?p=363,/r/science/comments/648op/three_more_florida_c...,rmuser,1199145634,science,t5_mouw,5,20,False,,flascience.org,False,False,False
2,2,http://hosted.ap.org/dynamic/stories/O/ODD_SHO...,/r/reddit.com/comments/648or/nude_couple_grapp...,zorno,1199145709,reddit.com,t5_6,1,3,False,,hosted.ap.org,False,False,False
3,3,http://www.sltrib.com/opinion/ci_7846101?sourc...,/r/politics/comments/648os/apparently_bushs_pr...,rmuser,1199145735,politics,t5_2cneq,2,0,False,,sltrib.com,False,False,False
4,4,http://hosted.ap.org/dynamic/stories/O/ODD_RAR...,/r/reddit.com/comments/648ot/diners_find_rare_...,zorno,1199145735,reddit.com,t5_6,0,0,False,,hosted.ap.org,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2044805,2044805,http://ventaboutsports.blogspot.com/2008/12/so...,/r/funny/comments/7mq3n/some_extremely_corny_j...,themightymidget,1230767909,funny,t5_2qh33,0,1,False,,ventaboutsports.blogspot.com,False,False,False
2044806,2044806,http://www.pbs.org/mormons/etc/genealogy.html,/r/news/comments/7mq3o/pbs_looks_at_the_massiv...,Tom22,1230767926,news,t5_2qh3l,0,0,False,,pbs.org,False,False,False
2044807,2044807,http://www.narutogames.biz,/r/reddit.com/comments/7mq3q/naruto_games/,bixiebix,1230767937,reddit.com,t5_6,7,1,False,,narutogames.biz,False,False,False
2044808,2044808,http://www.youtube.com/watch?v=gdQH1CI4LHY&amp...,/r/politics/comments/7mq3r/ron_paul_on_recent_...,middkidd,1230767963,politics,t5_2cneq,3,1,False,,youtube.com,False,False,False


In [33]:
comments

Unnamed: 0,id,author,link_id,parent_id,created_utc,subreddit,subreddit_id,score,distinguished,gilded,controversiality
0,0,Haven,t3_648oh,t1_c02s9rv,1199145604,reddit.com,t5_6,4,,0,0
1,1,lilmiss2,t3_648oh,t1_c02s9rv,1199145620,reddit.com,t5_6,2,,0,0
2,2,EverybodysAnAsshole,t3_648et,t1_c02s976,1199145644,reddit.com,t5_6,2,,0,0
3,3,generalk,t3_647yd,t1_c02s8md,1199145647,programming,t5_2fwo,13,,0,0
4,4,seeker135,t3_6483n,t3_6483n,1199145650,politics,t5_2cneq,4,,0,0
...,...,...,...,...,...,...,...,...,...,...,...
4873684,4873684,CommodoreGuff,t3_7k1l5,t1_c06vpzj,1229579674,programming,t5_2fwo,1,,0,0
4873685,4873685,wolfzero,t3_7k4if,t1_c06vs7l,1229579675,technology,t5_2qh16,4,,0,0
4873686,4873686,Morgin_Black,t3_7k3w5,t3_7k3w5,1229579679,comics,t5_2qh0s,0,,0,0
4873687,4873687,onezerozeroone,t3_7k2bc,t1_c06vrvz,1229579685,atheism,t5_2qh2p,1,,0,0


## Actual start of the analysis

### Q1: How many unique subreddits occur? Which has the most comments, and which has the most active users?

This breaks down into three steps:

    1. Count the unique subreddits across submissions and comments
    2. Group the comments by subreddit and count them
    3. Group by subreddit and count the unique users

> The grouping could have used *subreddit_id*, but the subreddit name is also unique.
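The claim that names and ids are interchangeable can be checked rather than assumed. The following is a minimal sketch on a made-up toy frame (the frame `toy` and its rows are illustrative, not taken from the dataset); the same two `groupby` calls should work on `submissions` or `comments`, assuming both columns are present as in the tables above.

```python
import pandas as pd

# Toy stand-in for the real frames; the rows are illustrative only.
toy = pd.DataFrame({
    "subreddit":    ["science", "science", "politics", "reddit.com"],
    "subreddit_id": ["t5_mouw", "t5_mouw", "t5_2cneq", "t5_6"],
})

# Count distinct ids per name and distinct names per id.
ids_per_name = toy.groupby("subreddit")["subreddit_id"].nunique()
names_per_id = toy.groupby("subreddit_id")["subreddit"].nunique()

# The mapping is one-to-one when neither count ever exceeds 1.
is_one_to_one = bool((ids_per_name.max() == 1) and (names_per_id.max() == 1))
print(is_one_to_one)
```

If the check came back `False` on the real data, grouping by name and grouping by id would give different subreddit counts.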


In [34]:
subreddits_authors = pd.concat([submissions[['subreddit', 'author']], comments[['subreddit', 'author']]], ignore_index=True)
subreddits_authors = subreddits_authors.groupby('subreddit').agg({'author': "nunique"})
subreddits_authors = subreddits_authors.sort_values(by="author", ascending=False)
subreddits_authors

Unnamed: 0_level_0,author
subreddit,Unnamed: 1_level_1
reddit.com,163779
politics,38374
pics,29753
technology,28337
funny,28186
...,...
VGC,1
Vacations,1
Vanhomeless,1
VenezuelaReddit,1
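The submissions and comments are concatenated before calling `nunique` so that a user who both submits and comments in the same subreddit is counted once. A minimal sketch on made-up data (the frames `toy_submissions` and `toy_comments` and the author names are illustrative assumptions) shows the difference from counting each frame separately and adding the results:

```python
import pandas as pd

# Illustrative toy data, not taken from the dataset.
toy_submissions = pd.DataFrame({
    "subreddit": ["science", "politics"],
    "author":    ["alice", "bob"],
})
toy_comments = pd.DataFrame({
    "subreddit": ["science", "science", "politics"],
    "author":    ["alice", "carol", "dave"],
})

# Approach used in the notebook: stack both frames, then count unique authors.
combined = pd.concat([toy_submissions, toy_comments], ignore_index=True)
users_per_subreddit = combined.groupby("subreddit")["author"].nunique()

# Naive alternative: count per frame and add, which counts "alice" twice.
naive_sum = (
    toy_submissions.groupby("subreddit")["author"].nunique()
    .add(toy_comments.groupby("subreddit")["author"].nunique(), fill_value=0)
)

print(users_per_subreddit)
print(naive_sum)
```

In the toy data, `science` has two distinct users after concatenation, while the naive sum reports three.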


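This one table answers two of the questions: its row count is the number of unique subreddits, and its top rows are the subreddits with the most users.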

In [35]:
print("Number of Subreddits:\n\n", subreddits_authors.shape[0])

Number of Subreddits:

 4359


In [36]:
print("Top 10 with most users:\n\n", subreddits_authors.iloc[:10])

Top 10 with most users:

                author
subreddit            
reddit.com     163779
politics        38374
pics            29753
technology      28337
funny           28186
entertainment   26360
science         25854
programming     25819
business        25253
worldnews       24937


In [37]:
comments_size = comments[['subreddit']].groupby('subreddit').size().reset_index(name='counts')
comments_size = comments_size.sort_values(by="counts", ascending=False)
comments_size

Unnamed: 0,subreddit,counts
1809,reddit.com,1143183
1755,politics,801396
1777,programming,345997
1741,pics,286192
1867,science,238291
...,...,...
1547,maculardegeneration,1
704,Shailendra,1
700,ServerSupport,1
1551,makhanhar,1


In [38]:
print("Subreddits with the most comments:\n\n", comments_size.iloc[:10])

Subreddits with the most comments:

         subreddit   counts
1809   reddit.com  1143183
1755     politics   801396
1777  programming   345997
1741         pics   286192
1867      science   238291
2144    worldnews   228793
836           WTF   187876
1305        funny   175547
1993   technology   149803
73      AskReddit   139760
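The group, size, and sort steps can also be written with `Series.value_counts`, which counts occurrences and sorts them in descending order by default. This is only a sketch of an equivalent formulation on made-up data (the frame `toy_comments` is an illustrative assumption), not the method used above:

```python
import pandas as pd

# Illustrative toy data: one row per comment.
toy_comments = pd.DataFrame({
    "subreddit": ["politics", "science", "politics", "pics", "politics", "science"],
})

# value_counts returns counts sorted from most to least frequent.
top = toy_comments["subreddit"].value_counts().head(2)
print(top)
```

Unlike the `reset_index` version, the result is a Series indexed by subreddit name rather than a frame with a `counts` column.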


### Q2: Average number of users per subreddit?

In [39]:
print("Average users per subreddit:\n\n", round(subreddits_authors['author'].mean(), 5))

Average users per subreddit:

 148.66506
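The per-subreddit user counts are heavily skewed: a few subreddits have tens of thousands of users while many have a single one, so the mean is pulled up by the largest subreddits. As a hedged illustration on made-up numbers (the series `toy_users` is an assumption, not dataset values), comparing the mean with the median shows the effect:

```python
import pandas as pd

# Illustrative user counts for five hypothetical subreddits.
toy_users = pd.Series(
    [1000, 40, 3, 1, 1],
    index=["big", "mid", "small", "tiny_a", "tiny_b"],
    name="author",
)

mean_users = toy_users.mean()      # sensitive to the one large subreddit
median_users = toy_users.median()  # the middle value of the sorted counts

print(mean_users, median_users)
```

On the real data the same comparison would be `subreddits_authors['author'].median()` next to the mean printed above.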


### Q3: Users with the most submissions, users with the most comments

In [50]:
authors_of_submissions = submissions[['author']].groupby('author').size().reset_index(name='counts')
authors_of_submissions = authors_of_submissions.sort_values(by="counts", ascending=False)
# authors_of_submissions

In [59]:
print("Users with the most submissions:\n\n", authors_of_submissions.iloc[:10])

Users with the most submissions:

                   author  counts
84823                gst   18870
141813             qgyh2   12238
147359            rmuser    9822
173691            twolf1    8597
13172   IAmperfectlyCalm    8308
141766         qazamisan    6927
54960          charlatan    5998
90683           igeldard    5373
130852          noname99    5334
64933       democracy101    5332


In [52]:
authors_of_comments = comments[['author']].groupby('author').size().reset_index(name='counts')
authors_of_comments = authors_of_comments.sort_values(by="counts", ascending=False)
# authors_of_comments

In [53]:
print("Users with the most comments:\n\n", authors_of_comments.iloc[:10])

Users with the most comments:

                  author  counts
12598   NoMoreNicksLeft   13480
56871        malcontent   12159
57872            matts2   11672
58883        mexicodoug    9169
650                7oby    9161
21027         aletoledo    8085
61554          mutatron    7771
65056         otakucode    7759
69965  redditcensoredme    7468
43578            h0dg3s    7439
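The same group, count, and sort pattern appears for subreddits, submission authors, and comment authors, so it could be factored into one function. The helper below, `top_n_by_count`, is a hypothetical name introduced for illustration and is not part of the notebook; it is shown on made-up data:

```python
import pandas as pd

def top_n_by_count(frame, column, n=10):
    """Return the n most frequent values of `column` with their row counts."""
    counts = frame.groupby(column).size().reset_index(name="counts")
    return counts.sort_values(by="counts", ascending=False).head(n)

# Illustrative toy data: one row per submission.
toy = pd.DataFrame({"author": ["gst", "rmuser", "gst", "zorno", "gst", "rmuser"]})

top_authors = top_n_by_count(toy, "author", n=2)
print(top_authors)
```

With such a helper, the two rankings in this section would reduce to calls like `top_n_by_count(submissions, 'author')` and `top_n_by_count(comments, 'author')`.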
