### free-marketing-watch
Search social media for mentions of brands and collect the comments/tweets/etc.
Count mentions of each and perform sentiment analysis on the strings.
- [x] Start with r/malefashionadvice and fashion brands dataframe.
- [ ] Add matches of brands in brand column.

In [1]:
import praw
import pandas as pd
from secrets import *
from pathlib import Path
from brands import fashion

In [2]:
reddit = praw.Reddit(client_id=client_id,
               client_secret=client_secret,
               user_agent=user_agent)

Now to get the comments data, put it in a dataframe, and clean the data to get what we want.

In [3]:
def create_comments_df(subreddit_):
    """Returns a pandas df with the information about comments from this year.

    Inputs
    -----
    str: subreddit to be searched.
    Return
    ------
    Pandas dataframe with all the data from the praw object.m
    """
    subreddit = reddit.subreddit("malefashionadvice")
    submission_list = subreddit.top(
        time_filter="year", limit=1000
    )  # generator of submissions in the subreddit
    comment_list = []
    for submission in submission_list:
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            comment_list.append(comment)

    df = pd.DataFrame([vars(comment) for comment in comment_list])
    df2 = df.loc[:,['link_id','id','score','body']]
    df2['Subreddit'] = subreddit
    return df2


In [4]:
def brand_check(df,brandlist):
    """Checks comment body against a list of brands to see if it mentions any.
       Adds what brand was found if any in the brands column.
       
       Inputs
       ------
       Dataframe you will search over and a list of brands in a separate file. 
       Return
       ------
       Dataframe with column indicating what brand was found in the values.
       """
    for brand in brandlist:
        df['Brand'] = df.body.str.extract(pat=brandlist, expand=False)
    return df

### This takes a long time, probably around 30 minutes per 100,000 comments.

In [43]:
df = create_comments_df('malefashionadvice')
df

Unnamed: 0,link_id,id,score,body,Subreddit
0,t3_ems9z1,fdqql3j,1160,"the warm weather ""jacket"" that's two separate ...",malefashionadvice
1,t3_ems9z1,fdqnsu8,103,Guess I need more knitwear,malefashionadvice
2,t3_ems9z1,fdqn7cd,1101,I don’t get people complaining that this is bo...,malefashionadvice
3,t3_ems9z1,fdr800k,68,"A fan of the freedom I see.. given the detail,...",malefashionadvice
4,t3_ems9z1,fdqmq0l,253,"My Version of the ""Basic Bastard"" Wardrobe but...",malefashionadvice
...,...,...,...,...,...
93827,t3_hhtglp,fwdy6x2,1,"""I love you enough to bite you playfully."" Oh,...",malefashionadvice
93828,t3_hhtglp,fwd94u0,0,Very cool I do all this already,malefashionadvice
93829,t3_hhtglp,fwdt931,2,Vogue Runaway wasn't it?,malefashionadvice
93830,t3_hhtglp,fwdu6v8,1,"Must be an apple exclusive, I'm not finding it...",malefashionadvice


Run to export the df to csv. Careful about overwriting. Use the mode = 'a' line to add to an existing file.


In [45]:
p = Path.cwd() / 'data' / 'commentdf.csv'
df.to_csv(path_or_buf = p)
#df.to_csv(path_or_buf = p, mode = 'a', header=False)

In [5]:
p = Path.cwd() / 'data' / 'commentdf.csv'
df = pd.read_csv(p)

In [6]:
#df['body'] = df['body'].str.lower()
df = df.drop(columns =['Unnamed: 0'])

In [7]:
df2 = brand_check(df,fashion)

In [12]:
df2['Brand'].value_counts()

Uniqlo             1134
Patagonia           505
H&M                 274
Gap                 249
Carhartt            244
Everlane            206
Brooks Brothers     197
Target              190
Zara                129
JCrew               128
Apple               109
Levis                55
Name: Brand, dtype: int64