# Matching "routine" comments to the image poster
The comment describing each user's hair care routine had to be downloaded separately from the image posts, and I downloaded every comment on each post that I scraped. For each post, I therefore have to determine which comment was written by the user who uploaded the image.

In [66]:
import pandas as pd
import numpy as np

In [67]:
# Read in the master CSV file
curly_df = pd.read_csv('curlyhair.csv')

# Keep only 'before and after' and 'hair victory' posts
curly_df = curly_df[curly_df['flair'].isin(['before and after', 'hair victory'])]
# Drop posts that have no comments at all
curly_df = curly_df[curly_df['n_comments'] != 0]
curly_df = curly_df.drop(['subreddit', 'flair'], axis=1)
curly_df = curly_df.reset_index(drop=True)

#curly_df.head(10)
len(curly_df)

11445

## Note about matching comments to posts
A comment's parent_id is either the ID of another comment or the same as its link_id. Every top-level comment has a parent_id equal to its link_id. Every reply to another comment has a parent_id that starts with t1_, because its direct parent is a comment and not the submission itself.

To summarize:

A comment's link_id always identifies the submission it belongs to.

A comment's parent_id identifies either the submission (a t3 object) or another comment (a t1 object).
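The prefix rule above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `parent_kind` and `is_top_level` and the comment IDs `c1` and `c2` are made up for the example and do not come from the scraped data.

```python
def parent_kind(parent_id: str) -> str:
    """Classify a Reddit ID by its type prefix (t3_ = submission, t1_ = comment)."""
    if parent_id.startswith("t3_"):
        return "submission"
    if parent_id.startswith("t1_"):
        return "comment"
    return "unknown"


def is_top_level(link_id: str, parent_id: str) -> bool:
    """A comment is top-level when its parent is the submission itself."""
    return link_id == parent_id


# Hypothetical comments on one post; the comment IDs are invented for illustration
comments = [
    {"id": "c1", "link_id": "t3_4lmx9r", "parent_id": "t3_4lmx9r"},  # reply to the post
    {"id": "c2", "link_id": "t3_4lmx9r", "parent_id": "t1_c1"},      # reply to comment c1
]

top_level_ids = [c["id"] for c in comments if is_top_level(c["link_id"], c["parent_id"])]
print(top_level_ids)
```

Comparing link_id with parent_id directly, rather than only checking for a t3_ prefix, also guards against a parent_id that points at a different submission.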

In [68]:
comm_df = pd.read_csv('comments_curlyhair.csv')

# Keep only top-level comments, i.e. direct replies to the post.
# Their parent_id equals their link_id ('t3_' + the sub_id of the original post);
# replies to other comments have a parent_id starting with 't1_' instead.
# Note: is_subm is sometimes NaN, so it is not a good selector here.
comm_df = comm_df[comm_df['link_id'] == comm_df['parent_id']]

# Optional: keep only comments that mention 'routine'
#mask = comm_df['text'].str.contains('routine', na=True, case=False)
#comm_df = comm_df[mask]
comm_df = comm_df.reset_index(drop=True)

print(len(comm_df))

302170


In [69]:
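# Add a new column for the text of the poster's own comment
# (object dtype so that strings can be stored alongside NaN)
curly_df['comm_text'] = pd.Series(np.nan, index=curly_df.index, dtype='object')
comm_text_col = curly_df.columns.get_loc('comm_text')

# For each post, find the top-level comment written by the post's author
for i, (sub_id, author) in enumerate(zip(curly_df['sub_id'], curly_df['author'])):

    # Exact match on link_id; a substring/regex match could hit unintended IDs
    temp = comm_df[(comm_df['link_id'] == 't3_' + sub_id) & (comm_df['author'] == author)]

    if not temp.empty:
        # If the author left several top-level comments, keep the first one.
        # Assigning through curly_df.iloc avoids chained assignment, which may
        # silently write to a copy instead of curly_df.
        curly_df.iloc[i, comm_text_col] = temp['text'].values[0]

curly_df.head(25)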

Unnamed: 0,sub_id,image_url,permalink,text,author,created,n_comments,comm_text
0,4lmx9r,http://imgur.com/a/Qo6NW,/r/curlyhair/comments/4lmx9r/before_and_after_...,Before and After Deva Cut!,moe-money,2016-05-29 19:55:49,3,Wasn't planning on doing a big chop for my fir...
1,4lwpbs,https://www.reddit.com/r/curlyhair/comments/4l...,/r/curlyhair/comments/4lwpbs/awesome_hair_day/,Awesome Hair Day!,MyMelancholyBaby,2016-05-31 14:27:18,0,
2,4lws1g,http://i.imgur.com/tZJAaPv.jpg,/r/curlyhair/comments/4lws1g/a_little_frizzy_t...,A little frizzy (the way I like it) but it's m...,leeloospanties,2016-05-31 14:41:16,12,
3,4lwvs7,http://imgur.com/gd5lj1b,/r/curlyhair/comments/4lwvs7/im_finally_lettin...,I'm finally letting my hair be curly and I lov...,phoeniix,2016-05-31 15:00:28,0,
4,4lx3cw,https://imgur.com/a/Ttc96,/r/curlyhair/comments/4lx3cw/thanks_to_those_o...,Thanks to those of you who recommended Kinky C...,brodyqat,2016-05-31 15:41:14,12,I was browsing /r/curlyhair recently as I was ...
5,4lmx9r,http://imgur.com/a/Qo6NW,/r/curlyhair/comments/4lmx9r/before_and_after_...,Before and After Deva Cut!,moe-money,2016-05-29 19:55:49,3,Wasn't planning on doing a big chop for my fir...
6,4lwpbs,https://www.reddit.com/r/curlyhair/comments/4l...,/r/curlyhair/comments/4lwpbs/awesome_hair_day/,Awesome Hair Day!,MyMelancholyBaby,2016-05-31 14:27:18,0,
7,4lws1g,http://i.imgur.com/tZJAaPv.jpg,/r/curlyhair/comments/4lws1g/a_little_frizzy_t...,A little frizzy (the way I like it) but it's m...,leeloospanties,2016-05-31 14:41:16,12,
8,4lwvs7,http://imgur.com/gd5lj1b,/r/curlyhair/comments/4lwvs7/im_finally_lettin...,I'm finally letting my hair be curly and I lov...,phoeniix,2016-05-31 15:00:28,0,
9,4lx3cw,https://imgur.com/a/Ttc96,/r/curlyhair/comments/4lx3cw/thanks_to_those_o...,Thanks to those of you who recommended Kinky C...,brodyqat,2016-05-31 15:41:14,12,I was browsing /r/curlyhair recently as I was ...
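The row-by-row search above scans the full comments table once per post. As a possible alternative, the same matching could be expressed as a single left merge on (sub_id, author). The sketch below uses small made-up frames (`posts`, `comments`, and every ID, author, and text in them are hypothetical stand-ins for curly_df and comm_df) and assumes the same column names as the real data.

```python
import pandas as pd

# Toy stand-in for curly_df (hypothetical values)
posts = pd.DataFrame({
    "sub_id": ["aaa111", "bbb222", "ccc333"],
    "author": ["alice", "bob", "carol"],
})

# Toy stand-in for the raw comments table (hypothetical values)
comments = pd.DataFrame({
    "link_id": ["t3_aaa111", "t3_aaa111", "t3_bbb222", "t3_ccc333"],
    "parent_id": ["t3_aaa111", "t3_aaa111", "t3_bbb222", "t1_xyz"],
    "author": ["alice", "dave", "erin", "carol"],
    "text": ["My routine: ...", "Looks great!", "Love it", "Thanks!"],
})

# Keep top-level comments only, then strip the 't3_' prefix to recover sub_id
top_level = comments[comments["link_id"] == comments["parent_id"]].copy()
top_level["sub_id"] = top_level["link_id"].str[len("t3_"):]

# One comment per (post, author): keep the first, like temp.text.values[0]
first = top_level.drop_duplicates(subset=["sub_id", "author"], keep="first")

# Left merge keeps every post; posts without a matching comment get NaN
matched = posts.merge(
    first[["sub_id", "author", "text"]].rename(columns={"text": "comm_text"}),
    on=["sub_id", "author"],
    how="left",
)
print(matched)
```

In this toy data only alice's post gets a comm_text: bob's post has a top-level comment from someone else, and carol's only comment is a reply to another comment. Because the right-hand side is unique per (sub_id, author), the merge does not add or drop rows from the posts table.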


In [70]:
# Save the matched comments/posts to new file
curly_df.to_csv('matched_posts.csv', index=False)