## UDA Assignment 3 - Crowdsourced Recommender System
Created by: Aman Bhardwaj, Blake DeLong, Apurva Harsulkar, Colby Meline, Joel Nail, and Rahul Rangarao

In [43]:
import numpy as np
import pandas as pd
import math
import nltk
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/amanbhardwaj/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### Creating a dataframe from our scraped data

In [56]:
# load the scraped posts and drop any rows with missing values
reviews = pd.read_csv('all_posts_V3.csv')
reviews.dropna(inplace=True)
reviews.reset_index(drop=True, inplace=True)

# build the set of English stop words, downloading the corpus on first use
try:
    stop_words = set(stopwords.words('english'))
except LookupError:
    nltk.download('stopwords')
    stop_words = set(stopwords.words('english'))

# prepend the title to the post body so both contribute to the sentiment score
reviews['Post Text'] = reviews['Title'] + ' ' + reviews['Post Text']
reviews.head()

Unnamed: 0.1,Unnamed: 0,Title,Post Text,ID,Score,Total Comments,Post URL
0,119,What’s up with scammers at campus lately?,What’s up with scammers at campus lately? Yest...,z88joo,36,1,https://www.reddit.com/r/UTAustin/comments/z88...
1,121,The new course evaluation surveys have no ques...,The new course evaluation surveys have no ques...,z90zby,34,2,https://www.reddit.com/r/UTAustin/comments/z90...
2,122,Anyone know where to find the routes and times...,Anyone know where to find the routes and times...,yx8h48,33,17,https://www.reddit.com/r/UTAustin/comments/yx8...
3,123,How do you feel about peoples feet up in class...,How do you feel about peoples feet up in class...,ykepni,34,8,https://www.reddit.com/r/UTAustin/comments/yke...
4,125,Why do dorms outlaw space heaters?,Why do dorms outlaw space heaters? Are space h...,yuwl13,33,23,https://www.reddit.com/r/UTAustin/comments/yuw...


In [57]:
# load the scraped comments and drop any rows with missing values
commentReviews = pd.read_csv('all_comments_v3.csv')
commentReviews.dropna(inplace=True)
commentReviews.reset_index(drop=True, inplace=True)

# rebuild the stop-word set so this cell can run on its own
try:
    stop_words = set(stopwords.words('english'))
except LookupError:
    nltk.download('stopwords')
    stop_words = set(stopwords.words('english'))

commentReviews.head()

Unnamed: 0.1,Unnamed: 0,Post ID,User,Body,Score
0,0,z88joo,samureiser,[Report 'em.](https://police.utexas.edu/servic...,28
1,1,yx574u,AHH-bbyshark,doesn’t everyone say the union smells like shi...,57
2,2,yx574u,federuiz22,The gender-neutral bathrooms in the San Jac lo...,30
3,3,yx574u,K3tchupm4n,3rd floor of the McCombs CBA building FOR SURE...,21
4,4,yx574u,greenieweenie714,On the floor you enter on in the tower there's...,9


### Sentiment Analysis

In [58]:
# VADER analyzer; requires the 'vader_lexicon' resource (nltk.download('vader_lexicon'))
sentiment = SentimentIntensityAnalyzer()
# individual words can be neutralized by setting their lexicon score to 0 (left disabled here)
#sentiment.lexicon.update({'combat':0})

# return a text's sentiment as the mean VADER compound score over its sentences,
# rather than scoring the whole text in one pass
def get_sent(text):
    # stop words are deliberately not removed: the NLTK English list contains
    # negations such as 'not' and 'no', which VADER uses to flip polarity
    text = str(text).lower()
    scores = [sentiment.polarity_scores(sent)['compound'] for sent in sent_tokenize(text)]
    # np.mean of an empty list warns and returns NaN, so return NaN directly
    return np.mean(scores) if scores else np.nan
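
The averaging step in `get_sent` can be illustrated without NLTK. The sketch below is only an illustration: `TOY_LEXICON`, `toy_score`, and `mean_sentence_score` are hypothetical names, the lexicon values are made up, and the regex sentence splitter is a crude stand-in for `sent_tokenize`. It shows how each sentence is scored separately and the scores are averaged, so one strongly worded sentence does not decide the score for the whole post.

```python
import re
from statistics import mean

# Hypothetical toy lexicon standing in for VADER; the values are made up.
TOY_LEXICON = {"love": 0.6, "great": 0.5, "hate": -0.6, "awful": -0.5}

def toy_score(sentence):
    """Average the lexicon values of the known words in one sentence (0.0 if none)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(hits) if hits else 0.0

def mean_sentence_score(text, score=toy_score):
    """Split text on '.', '!' or '?' and average the per-sentence scores."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", str(text)) if s.strip()]
    if not sentences:
        return float("nan")  # nothing to score
    return mean(score(s) for s in sentences)

# One positive and one negative sentence roughly cancel out.
print(mean_sentence_score("I love this campus. The parking is awful!"))
```

Swapping `toy_score` for `lambda s: sentiment.polarity_scores(s)['compound']` would give the same structure as the notebook's function, assuming the analyzer has been created as above.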

In [59]:
reviews['sentiment'] = reviews['Post Text'].apply(get_sent)  # sentiment for each post (title + body)

In [60]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,Title,Post Text,ID,Score,Total Comments,Post URL,sentiment
0,119,What’s up with scammers at campus lately?,What’s up with scammers at campus lately? Yest...,z88joo,36,1,https://www.reddit.com/r/UTAustin/comments/z88...,0.21075
1,121,The new course evaluation surveys have no ques...,The new course evaluation surveys have no ques...,z90zby,34,2,https://www.reddit.com/r/UTAustin/comments/z90...,-0.1779
2,122,Anyone know where to find the routes and times...,Anyone know where to find the routes and times...,yx8h48,33,17,https://www.reddit.com/r/UTAustin/comments/yx8...,0.403633
3,123,How do you feel about peoples feet up in class...,How do you feel about peoples feet up in class...,ykepni,34,8,https://www.reddit.com/r/UTAustin/comments/yke...,0.031038
4,125,Why do dorms outlaw space heaters?,Why do dorms outlaw space heaters? Are space h...,yuwl13,33,23,https://www.reddit.com/r/UTAustin/comments/yuw...,-0.056


In [61]:
reviews.sort_values(by='sentiment',ascending=True)

Unnamed: 0.1,Unnamed: 0,Title,Post Text,ID,Score,Total Comments,Post URL,sentiment
1076,2180,College is nothing like Pitch Perfect,College is nothing like Pitch Perfect I watche...,ylo17m,284,18,https://www.reddit.com/r/UTAustin/comments/ylo...,-0.9393
917,1625,"senior wasting my life away, looking for advice","senior wasting my life away, looking for advic...",rbjli2,42,6,https://www.reddit.com/r/UTAustin/comments/rbj...,-0.8960
717,1323,human feces on the ut west mall station NB,human feces on the ut west mall station NB the...,xoxkir,81,16,https://www.reddit.com/r/UTAustin/comments/xox...,-0.8748
1067,2139,Class of 23 Moment,Class of 23 Moment > Lose the most amount of c...,xif6tq,296,25,https://www.reddit.com/r/UTAustin/comments/xif...,-0.8343
1174,2557,Who is lighting cars on fire,Who is lighting cars on fire Seriously wtf is ...,d3fein,207,14,https://www.reddit.com/r/UTAustin/comments/d3f...,-0.7845
...,...,...,...,...,...,...,...,...
1161,2504,If anyone’s thinking about dropping a course b...,If anyone’s thinking about dropping a course b...,l2a704,217,29,https://www.reddit.com/r/UTAustin/comments/l2a...,0.8266
8,130,is there any reason why an RA would knock on s...,is there any reason why an RA would knock on s...,ytlm4e,33,5,https://www.reddit.com/r/UTAustin/comments/ytl...,0.8316
706,1311,Texas Women’s Basketball taking on defending N...,Texas Women’s Basketball taking on defending N...,tpngaz,80,3,https://www.reddit.com/r/UTAustin/comments/tpn...,0.8478
1131,2412,please for the love of god take a shower,please for the love of god take a shower thank...,yeyjei,232,29,https://www.reddit.com/r/UTAustin/comments/yey...,0.8779


In [68]:
commentReviews['comment_sentiment'] = commentReviews['Body'].apply(get_sent)  # sentiment for each comment


In [71]:
# average the comment sentiment within each post, indexed by 'Post ID'
commentsSentimentsGrouped = commentReviews.groupby('Post ID')['comment_sentiment'].mean().to_frame()
commentsSentimentsGrouped

Post ID,comment_sentiment
14pige,0.181725
198ml7,-0.006312
19evb1,0.129952
1tjlke,0.196120
25ua4e,0.155255
...,...
za00tv,0.231100
za07sy,-0.029425
za5suy,0.047033
za670h,0.099593
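
A per-post mean hides how many comments it is based on, and a mean over one comment is much noisier than a mean over thirty. As a minimal sketch on made-up data (the frame `toy_comments` and its values are hypothetical, not taken from the scraped files), the same `groupby` can return the comment count alongside the mean:

```python
import pandas as pd

# Toy comment data; the IDs and scores are made up for illustration.
toy_comments = pd.DataFrame({
    "Post ID": ["a1", "a1", "b2"],
    "comment_sentiment": [0.5, -0.1, 0.3],
})

# Mean sentiment and number of comments per post, indexed by 'Post ID'.
toy_grouped = toy_comments.groupby("Post ID")["comment_sentiment"].agg(["mean", "count"])
print(toy_grouped)
```

The `count` column makes it possible to filter out or down-weight posts with very few comments before comparing post and comment sentiment.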


In [73]:
# left-join the mean comment sentiment onto the posts, matching post 'ID' to 'Post ID'
combined_df = reviews.set_index('ID').join(commentsSentimentsGrouped)
combined_df

ID,Unnamed: 0,Title,Post Text,Score,Total Comments,Post URL,sentiment,comment_sentiment
z88joo,119,What’s up with scammers at campus lately?,What’s up with scammers at campus lately? Yest...,36,1,https://www.reddit.com/r/UTAustin/comments/z88...,0.210750,-0.246950
z90zby,121,The new course evaluation surveys have no ques...,The new course evaluation surveys have no ques...,34,2,https://www.reddit.com/r/UTAustin/comments/z90...,-0.177900,0.077200
yx8h48,122,Anyone know where to find the routes and times...,Anyone know where to find the routes and times...,33,17,https://www.reddit.com/r/UTAustin/comments/yx8...,0.403633,0.256528
ykepni,123,How do you feel about peoples feet up in class...,How do you feel about peoples feet up in class...,34,8,https://www.reddit.com/r/UTAustin/comments/yke...,0.031038,0.069318
yuwl13,125,Why do dorms outlaw space heaters?,Why do dorms outlaw space heaters? Are space h...,33,23,https://www.reddit.com/r/UTAustin/comments/yuw...,-0.056000,0.114292
...,...,...,...,...,...,...,...,...
8z2rxm,2732,"Quadruple majoring in computer science, electr...","Quadruple majoring in computer science, electr...",178,28,https://www.reddit.com/r/UTAustin/comments/8z2...,0.218950,0.062553
vzw4am,2734,Here is a list of study spots on and off campu...,Here is a list of study spots on and off campu...,182,12,https://www.reddit.com/r/UTAustin/comments/vzw...,0.304860,0.275163
ngt6ha,2736,Horrible grading policy: 98.8% still A-,Horrible grading policy: 98.8% still A- I made...,180,63,https://www.reddit.com/r/UTAustin/comments/ngt...,0.019450,0.104415
wyqhkh,2741,You survived your first week - well done!,You survived your first week - well done! That...,178,14,https://www.reddit.com/r/UTAustin/comments/wyq...,0.345000,0.188991


Post ID,comment_sentiment
z90zby,0.0772
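
`DataFrame.join` defaults to a left join on the index, so every post is kept and any post without scored comments gets `NaN` in `comment_sentiment`. The sketch below uses made-up frames (`toy_posts` and `toy_comment_means` are hypothetical stand-ins for `reviews` and `commentsSentimentsGrouped`) to show how such posts could be found after the join:

```python
import pandas as pd

# Toy stand-ins for the posts and the per-post comment means; values are made up.
toy_posts = pd.DataFrame({"ID": ["a1", "b2", "c3"], "sentiment": [0.2, -0.4, 0.1]})
toy_comment_means = pd.DataFrame(
    {"comment_sentiment": [0.2, 0.3]},
    index=pd.Index(["a1", "b2"], name="Post ID"),
)

# Left join on the index: all three posts survive, 'c3' has no comment score.
toy_combined = toy_posts.set_index("ID").join(toy_comment_means)

# Posts with no scored comments appear as NaN in 'comment_sentiment'.
missing = toy_combined[toy_combined["comment_sentiment"].isna()]
print(missing.index.tolist())
```

Whether to drop those rows or keep them depends on the later steps, which this notebook section does not show.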
