# Project 3: 'AskFeminists' vs. 'MensRights'

## Part B. Data Cleaning

I kept the AskFeminists and MensRights dataframes separate during cleaning. For submissions, I combined the 'title' and 'selftext' content into a single 'text' column. I created columns to flag posts/comments that were removed (by a moderator) or deleted (by the poster). AskFeminists had 1,847 removed posts/comments, about twice as many as MensRights (942). I then stripped links, numbers, other non-letter characters, and the '[removed]'/'[deleted]' markers from the text and lemmatized it (keeping stopwords in). After lemmatizing, I removed contraction leftovers (e.g. 't', 've') so they wouldn't skew the vectorizer counts.
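Some sample rows further down contain fused tokens such as 'supremacyfrom' and 'accusationswhat', which suggests the title and body were joined without a separator. A minimal sketch of joining them with a space, assuming the raw submissions have 'title' and 'selftext' columns (the `posts` dataframe and its rows are made up for illustration):

```python
import pandas as pd

# Hypothetical raw submissions; a missing selftext stands in for a link-only or title-only post.
posts = pd.DataFrame({
    "title": ["What is Feminist political stance?", "what really annoys me."],
    "selftext": [None, "I'm a female abuse victim"],
})

# Join with a space so the last word of the title and the first word of the body stay separate tokens.
posts["text"] = (posts["title"].fillna("") + " " + posts["selftext"].fillna("")).str.strip()
print(posts["text"].tolist())
```

The `.str.strip()` call only tidies the trailing space left when there is no body text.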

In [1]:
# Import libraries
import pandas as pd
import regex as re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer

pd.set_option('display.max_colwidth', None)  # None = no truncation (-1 is rejected by pandas >= 1.0)
pd.options.display.max_columns = 999

In [2]:
# read in the two csvs
fem = pd.read_csv('./data/askfeminists_all')
men = pd.read_csv('./data/mensrights_all')

### Custom Columns

In [3]:
# drop unnamed column for both
fem.drop(columns='Unnamed: 0', inplace = True)
men.drop(columns='Unnamed: 0', inplace = True)

In [4]:
# making text column string
fem['text'] = fem['text'].astype('str')
men['text'] = men['text'].astype('str')

##### Making Columns for Removed/Deleted posts/comments

In [5]:
# making column for 'removed'
fem['removed'] = [1 if '[removed]' in i else 0 for i in fem['text']]
# making column for 'deleted'
fem['deleted'] = [1 if '[deleted]' in i else 0 for i in fem['text']]

In [6]:
# making column for 'removed'
men['removed'] = [1 if '[removed]' in i else 0 for i in men['text']]
# making column for 'deleted'
men['deleted'] = [1 if '[deleted]' in i else 0 for i in men['text']]

##### Removing [Removed]/[Deleted]

In [7]:
# for fem: strip the literal markers (regex=False so the brackets are not read as a character class)
fem['text'] = fem['text'].str.replace('[removed]', '', regex=False)
fem['text'] = fem['text'].str.replace('[deleted]', '', regex=False)

In [8]:
# for men: strip the literal markers (regex=False so the brackets are not read as a character class)
men['text'] = men['text'].str.replace('[removed]', '', regex=False)
men['text'] = men['text'].str.replace('[deleted]', '', regex=False)
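The order of the two steps above matters: the flags have to be computed before the markers are stripped. A small sketch on a made-up dataframe (`toy` and its rows are illustrative, not project data), using the vectorized `str.contains` as an alternative to the list comprehensions:

```python
import pandas as pd

toy = pd.DataFrame({"text": ["[removed]", "[deleted]", "Title here[removed]", "an ordinary comment"]})

# Flag first: once the markers are stripped there is nothing left to detect.
toy["removed"] = toy["text"].str.contains("[removed]", regex=False).astype(int)
toy["deleted"] = toy["text"].str.contains("[deleted]", regex=False).astype(int)

# regex=False treats the brackets literally, so no escaping is needed.
for marker in ("[removed]", "[deleted]"):
    toy["text"] = toy["text"].str.replace(marker, "", regex=False)

print(toy)
```

Rows that consisted only of a marker end up with an empty 'text', which is why empty strings show up when the cleaned frames are sorted later on.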

#### Creating column for cleaned data

In [9]:
# loosely based on Matt's code to clean text
def review_to_words(raw_review):
    no_links = re.sub(r"https?\S+", " ", raw_review) # from Maurie, remove http/https links
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_links) # remove non-letters (digits, punctuation, emoji)
    lower = letters_only.lower() # convert to lowercase
    return lower
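To see what this cleaning step does to a single string, here is a self-contained sketch. `clean_text` is a hypothetical, compact variant of the function above and the sample sentence is made up:

```python
import re

def clean_text(raw):
    """Strip links and non-letters, then lowercase."""
    no_links = re.sub(r"https?\S+", " ", raw)            # drop http/https links
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_links)   # digits, punctuation, emoji -> spaces
    return letters_only.lower()

sample = "I've read https://example.com/page 10 times!"
tokens = clean_text(sample).split()
print(tokens)  # ['i', 've', 'read', 'times']
```

The apostrophe becomes a space, so "I've" splits into 'i' and 've'. That is where the contraction leftovers handled later come from.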

In [10]:
# create column for clean text with stop words
fem['clean_text_stop'] = fem['text'].apply(lambda x: review_to_words(str(x)))
men['clean_text_stop'] = men['text'].apply(lambda x: review_to_words(str(x)))

### Checking Removed/Deleted

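Here I check how many posts/comments were removed or deleted in each subreddit. AskFeminists has about twice as much removed content (1,847 vs. 942), which suggests either more rule-breaking posts (e.g. trolling) or stricter moderators. MensRights, by contrast, has far more content deleted by the posters themselves (847 vs. 112).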

In [11]:
# Checking how many removed/deleted posts in each
fem['removed'].value_counts()

0    27458
1    1847 
Name: removed, dtype: int64

In [12]:
men['removed'].value_counts()

0    30504
1    942  
Name: removed, dtype: int64

In [13]:
fem['deleted'].value_counts()

0    29193
1    112  
Name: deleted, dtype: int64

In [14]:
men['deleted'].value_counts()

0    30599
1    847  
Name: deleted, dtype: int64
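Because the two dataframes differ in size, the raw counts are easier to compare as rates. A quick calculation using the numbers copied from the `value_counts()` outputs above (the dictionary layout is just for illustration):

```python
# Counts copied from the value_counts() outputs above.
counts = {
    "AskFeminists": {"total": 27458 + 1847, "removed": 1847, "deleted": 112},
    "MensRights": {"total": 30504 + 942, "removed": 942, "deleted": 847},
}

rates = {
    sub: {kind: c[kind] / c["total"] for kind in ("removed", "deleted")}
    for sub, c in counts.items()
}

for sub, r in rates.items():
    print(f"{sub}: removed {r['removed']:.2%}, deleted {r['deleted']:.2%}")
# AskFeminists: removed 6.30%, deleted 0.38%
# MensRights: removed 3.00%, deleted 2.69%
```

As a share of all rows, AskFeminists content is removed roughly twice as often, while MensRights content is deleted by posters about seven times as often.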

### Lemmatizing Words

In [15]:
fem.head()

Unnamed: 0,text,type,subreddit,removed,deleted,clean_text_stop
0,"Should students still be shown Neil DeGrasse Tyson's videos?Hi, feminists. I'm a middle school science teacher. Astronomy is part of my curriculum. \n\nOver the past 10 years, Neil DeGrasse Tyson seems to have devoted his life to appearing in every damn astronomy video made. Like, he really is in all of them! And, for the most part, the videos are good and can be valuable to kids' education.\n\nHe was accused this past week, by several different women, of various forms of sexual misconduct. The accusations range from creepiness to rape.\n\nI've tried my best to be supportive of the metoo movement and all that's come with it. \n\nSo there's my dilemma. Should I continue to show videos to my students that are hosted by Neil DeGrasse Tyson?",post,AskFeminists,0,0,should students still be shown neil degrasse tyson s videos hi feminists i m a middle school science teacher astronomy is part of my curriculum over the past years neil degrasse tyson seems to have devoted his life to appearing in every damn astronomy video made like he really is in all of them and for the most part the videos are good and can be valuable to kids education he was accused this past week by several different women of various forms of sexual misconduct the accusations range from creepiness to rape i ve tried my best to be supportive of the metoo movement and all that s come with it so there s my dilemma should i continue to show videos to my students that are hosted by neil degrasse tyson
1,Does games like BFV and Call of Duty where there are female avatars promote/normalise violence towards women ?in the game women who get shot also cry in pain,post,AskFeminists,0,0,does games like bfv and call of duty where there are female avatars promote normalise violence towards women in the game women who get shot also cry in pain
2,What is Feminist political stance?,post,AskFeminists,1,0,what is feminist political stance
3,"What makes some victim blaming okay, and some not okay?For example if someone leaved their wallet on a bench, you'd victim blame them.",post,AskFeminists,0,0,what makes some victim blaming okay and some not okay for example if someone leaved their wallet on a bench you d victim blame them
4,how do you feel that you are sandwich makers and dishwashers?,post,AskFeminists,1,0,how do you feel that you are sandwich makers and dishwashers


In [16]:
# setting up tokenizer and lemmatizer
tokenizer = RegexpTokenizer(r'\w+')
lemmatizer = WordNetLemmatizer()

In [17]:
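# function to lemmatize
# note: WordNetLemmatizer needs the WordNet corpus (nltk.download('wordnet'))
# and treats every token as a noun by default, so 'was' becomes 'wa' and 'does' becomes 'doe'
def lemma(text):
    tokens = tokenizer.tokenize(str(text))
    lems = [lemmatizer.lemmatize(i) for i in tokens]
    return " ".join(lems)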

In [18]:
# add column for lemmatized words for fems
fem['lems'] = fem['clean_text_stop'].apply(lambda x: lemma(x))

In [19]:
# add column for lemmatized words for men
men['lems'] = men['clean_text_stop'].apply(lambda x: lemma(x))

##### Cleaning up contraction leftovers

In [20]:
# function to remove hanging contraction leftovers ('re', 've', 'll', 'd', 't', 'm', 's')
# note: only matches leftovers with a space on both sides, so one at the very start or end of a string is kept
def nocontract(x):
    # drop the leftover and its leading space; the lookahead keeps the trailing space as the separator
    return re.sub(r" (?:re|ve|ll|d|t|m|s)(?= )", "", x)

In [21]:
# applying no contractions function to both dfs
fem['lems'] = fem['lems'].apply(lambda x: nocontract(x))
men['lems'] = men['lems'].apply(lambda x: nocontract(x))
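The leftover removal above requires a space on both sides of the token, so a leftover at the very start or end of a string survives. A hedged alternative sketch using word boundaries; `strip_leftovers` is a hypothetical helper, not the function used in this notebook:

```python
import re

# Same leftover tokens as above, but matched on word boundaries instead of literal spaces.
LEFTOVERS = re.compile(r"\b(?:re|ve|ll|d|t|m|s)\b")

def strip_leftovers(text):
    without = LEFTOVERS.sub(" ", text)
    return " ".join(without.split())  # collapse the doubled spaces left behind

print(strip_leftovers("i ve tried my best they aren t"))  # i tried my best they aren
```

Either version also drops the 't' of negations, so "aren t" and "can t" become "aren" and "can". That is worth keeping in mind when reading the vectorizer output.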

In [22]:
fem.head()

Unnamed: 0,text,type,subreddit,removed,deleted,clean_text_stop,lems
0,"Should students still be shown Neil DeGrasse Tyson's videos?Hi, feminists. I'm a middle school science teacher. Astronomy is part of my curriculum. \n\nOver the past 10 years, Neil DeGrasse Tyson seems to have devoted his life to appearing in every damn astronomy video made. Like, he really is in all of them! And, for the most part, the videos are good and can be valuable to kids' education.\n\nHe was accused this past week, by several different women, of various forms of sexual misconduct. The accusations range from creepiness to rape.\n\nI've tried my best to be supportive of the metoo movement and all that's come with it. \n\nSo there's my dilemma. Should I continue to show videos to my students that are hosted by Neil DeGrasse Tyson?",post,AskFeminists,0,0,should students still be shown neil degrasse tyson s videos hi feminists i m a middle school science teacher astronomy is part of my curriculum over the past years neil degrasse tyson seems to have devoted his life to appearing in every damn astronomy video made like he really is in all of them and for the most part the videos are good and can be valuable to kids education he was accused this past week by several different women of various forms of sexual misconduct the accusations range from creepiness to rape i ve tried my best to be supportive of the metoo movement and all that s come with it so there s my dilemma should i continue to show videos to my students that are hosted by neil degrasse tyson,should student still be shown neil degrasse tyson video hi feminist i a middle school science teacher astronomy is part of my curriculum over the past year neil degrasse tyson seems to have devoted his life to appearing in every damn astronomy video made like he really is in all of them and for the most part the video are good and can be valuable to kid education he wa accused this past week by several different woman of various form of sexual misconduct the accusation range from creepiness to rape i tried my best to be supportive of the metoo movement and all that come with it so there my dilemma should i continue to show video to my student that are hosted by neil degrasse tyson
1,Does games like BFV and Call of Duty where there are female avatars promote/normalise violence towards women ?in the game women who get shot also cry in pain,post,AskFeminists,0,0,does games like bfv and call of duty where there are female avatars promote normalise violence towards women in the game women who get shot also cry in pain,doe game like bfv and call of duty where there are female avatar promote normalise violence towards woman in the game woman who get shot also cry in pain
2,What is Feminist political stance?,post,AskFeminists,1,0,what is feminist political stance,what is feminist political stance
3,"What makes some victim blaming okay, and some not okay?For example if someone leaved their wallet on a bench, you'd victim blame them.",post,AskFeminists,0,0,what makes some victim blaming okay and some not okay for example if someone leaved their wallet on a bench you d victim blame them,what make some victim blaming okay and some not okay for example if someone leaved their wallet on a bench you victim blame them
4,how do you feel that you are sandwich makers and dishwashers?,post,AskFeminists,1,0,how do you feel that you are sandwich makers and dishwashers,how do you feel that you are sandwich maker and dishwasher


In [23]:
men.head()

Unnamed: 0,text,type,subreddit,removed,deleted,clean_text_stop,lems
0,"E-Trade Implies Support for Female SupremacyFrom [https://us.etrade.com/knowledge/thematic-investing/gender-diversity](https://us.etrade.com/knowledge/thematic-investing/gender-diversity). You may need an etrade account to see that page. \n\nFrom that page, emphasis mine:\n\n&gt;Looking to invest in ways that support the values of gender diversity and efforts to achieve it? The fund below invests in companies that are leaders in areas such as equitable hiring, equal pay, and the ***advancement of women into managerial, executive, and director positions***.\n\nScreenshot of the page:\n\n[https://imgur.com/a/etn0B3g](https://imgur.com/a/etn0B3g)\n\nThey are in my opinion making an argument for female supremacy by only emphasizing ""powerful"" jobs. If they were advocating for equal opportunity in all jobs then that would be OK. They aren't though, they are just saying, ""advance women into managerial, executive, and director positions.""",post,MensRights,0,0,e trade implies support for female supremacyfrom you may need an etrade account to see that page from that page emphasis mine gt looking to invest in ways that support the values of gender diversity and efforts to achieve it the fund below invests in companies that are leaders in areas such as equitable hiring equal pay and the advancement of women into managerial executive and director positions screenshot of the page they are in my opinion making an argument for female supremacy by only emphasizing powerful jobs if they were advocating for equal opportunity in all jobs then that would be ok they aren t though they are just saying advance women into managerial executive and director positions,e trade implies support for female supremacyfrom you may need an etrade account to see that page from that page emphasis mine gt looking to invest in way that support the value of gender diversity and effort to achieve it the fund below invests in company that are leader in area such a equitable hiring equal pay and the advancement of woman into managerial executive and director position screenshot of the page they are in my opinion making an argument for female supremacy by only emphasizing powerful job if they were advocating for equal opportunity in all job then that would be ok they aren though they are just saying advance woman into managerial executive and director position
1,Public comment period now open for proposed Title IX regulations - PLEASE send in comments on the .gov website!,post,MensRights,1,0,public comment period now open for proposed title ix regulations please send in comments on the gov website,public comment period now open for proposed title ix regulation please send in comment on the gov website
2,Question regarding your opinions on the punishment for false rape accusationsWhat do you guys think would be a fair punishment for a woman falsely accusing a man of rape?,post,MensRights,0,0,question regarding your opinions on the punishment for false rape accusationswhat do you guys think would be a fair punishment for a woman falsely accusing a man of rape,question regarding your opinion on the punishment for false rape accusationswhat do you guy think would be a fair punishment for a woman falsely accusing a man of rape
3,what really annoys me.I'm a female abuse victim and I've had woman get mad when I tell them that not all men are trash. I have met one of the trashiest men in the world (aka my abuser) and not for one second did I believe all men are trash. \nby Brain can't even begin to think how you can find all men trash. \n\n,post,MensRights,0,0,what really annoys me i m a female abuse victim and i ve had woman get mad when i tell them that not all men are trash i have met one of the trashiest men in the world aka my abuser and not for one second did i believe all men are trash by brain can t even begin to think how you can find all men trash,what really annoys me i a female abuse victim and i had woman get mad when i tell them that not all men are trash i have met one of the trashiest men in the world aka my abuser and not for one second did i believe all men are trash by brain can even begin to think how you can find all men trash
4,New Proposed Title IX Guidelines - Comment period now open,post,MensRights,1,0,new proposed title ix guidelines comment period now open,new proposed title ix guideline comment period now open


In [24]:
men.sort_values(by='lems', ascending=True).head()

Unnamed: 0,text,type,subreddit,removed,deleted,clean_text_stop,lems
10470,,comment,MensRights,1,0,,
23597,😂😂,comment,MensRights,0,0,,
29057,,comment,MensRights,0,1,,
16355,,comment,MensRights,0,1,,
20359,https://www.health.harvard.edu/newsletter_article/marriage-and-mens-health,comment,MensRights,0,0,,


In [25]:
fem.sort_values(by='lems', ascending=True).head()

Unnamed: 0,text,type,subreddit,removed,deleted,clean_text_stop,lems
5098,,comment,AskFeminists,1,0,,
29065,,comment,AskFeminists,1,0,,
5366,,comment,AskFeminists,1,0,,
26686,,comment,AskFeminists,1,0,,
23315,,comment,AskFeminists,1,0,,


##### Saving to csv

In [26]:
men.to_csv('./data/men_clean_lem')

In [27]:
fem.to_csv('./data/fem_clean_lem')
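Two details of the CSV round trip are worth knowing before the files are read back in: `to_csv` writes the index by default (which is where an 'Unnamed: 0' column comes from on reload), and empty strings come back as NaN. A small sketch with an in-memory buffer and made-up rows (`toy` is illustrative only):

```python
import io
import pandas as pd

toy = pd.DataFrame({"text": ["!!!", "Not all men"], "lems": ["", "not all men"]})

# Rows whose text was emptied by cleaning (emoji-only, link-only, removed/deleted).
n_empty = int(toy["lems"].str.strip().eq("").sum())

buf = io.StringIO()
toy.to_csv(buf, index=False)   # index=False avoids an 'Unnamed: 0' column on reload
buf.seek(0)
reloaded = pd.read_csv(buf)

reloaded_cols = list(reloaded.columns)
n_missing = int(reloaded["lems"].isna().sum())   # the empty string is read back as NaN
reloaded["lems"] = reloaded["lems"].fillna("")   # restore strings before vectorizing
print(n_empty, reloaded_cols, n_missing)
```

If a downstream notebook already drops 'Unnamed: 0', keeping the default index is fine. The `fillna("")` step matters either way, since a vectorizer expects strings rather than NaN.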