## Exploratory Analysis of Skincare Subreddits

For this exploratory analysis, we look at skincare trends on Reddit. Reddit hosts many skincare communities; searching "Skincare on Reddit" returns these as the top three subreddits:

1. r/SkincareAddiction with 4.3m members
2. r/AsianBeauty with 2.9m members
3. r/30PlusSkinCare with 2.1m members

Additional country-specific subreddits:
1. r/SkincareAddictionUK with 484k members
2. r/IndianSkincareAddicts with 242k members
3. r/AusSkincare with 177k members

There are other large subreddits such as:
1. r/Skincare_Addiction with 1.8m members
2. r/SkincareAddicts with 1m members

However, these two subreddits are likely spin-offs of r/SkincareAddiction. While the other subreddits target more niche groups (Asian brands, over-30s, UK, India, Australia), the demographic of these two is likely similar to that of r/SkincareAddiction, so they are not used.

The exploratory data analysis covers the six subreddits in the first two lists. We first explore the top posts of each subreddit.

The planned outputs are:
1. A table of the top 100 posts
2. A table of how often certain ingredients are referenced across the subreddits
3. A table of the top posts that reference these ingredients in the title

### Step 1: Importing necessary libraries and setting up PRAW

In [5]:
import datetime
import praw 
import pandas as pd

client_id = '5KkxQHtUgHzvz6pPMTbvSw'
client_secret = 'mWXvZcxvcpyEheEt_gM_3ODTvOBw7g'
user_agent = 'cryinginpython98'

reddit = praw.Reddit(client_id=client_id,client_secret=client_secret,user_agent=user_agent)

#create a list of subreddits 
#create empty list for the posts 
#loop through - take title, body, upvotes, comment, created 
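
Hard-coding `client_id` and `client_secret` in a shared notebook exposes them to anyone who reads it. One common alternative, sketched here, is to read them from environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT`, and the helper `load_reddit_credentials`, are hypothetical choices for this example and not part of PRAW.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Read PRAW credentials from environment variables (names are hypothetical)."""
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    # Fail early with a clear message instead of sending empty credentials
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


# Usage (requires praw to be installed and the three variables to be set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

The returned keys match the keyword arguments already passed to `praw.Reddit` above, so the rest of the notebook would not need to change.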

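#### Step 1.1 Checking if the Reddit API key is working
`reddit.read_only` should print `True`. This confirms that the client was created in read-only mode (no username or password supplied); the first actual request, such as the subreddit lookups in Step 2, is what confirms that the credentials are accepted.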

In [10]:
print(reddit.read_only) #check if it is working, needs to output == True

True


### Step 2: Selecting Subreddits 
This exploratory analysis looks at six skincare subreddits. As mentioned above, these are a general skincare subreddit (which is also the most popular) and more niche subreddits targeted at people who like specific brands (Asian Beauty), people from particular countries (UK, Aus/NZ, India), or a different age group (>30).

In [15]:
subreddit1 = reddit.subreddit('SkincareAddiction')
subreddit2 = reddit.subreddit('AsianBeauty')
subreddit3 = reddit.subreddit('30PlusSkinCare')
subreddit4 = reddit.subreddit('SkincareAddictionUK')
subreddit5 = reddit.subreddit('IndianSkincareAddicts')
subreddit6 = reddit.subreddit('AusSkincare')
subreddits = [subreddit1,subreddit2,subreddit3,subreddit4,subreddit5,subreddit6]

for subreddit in subreddits:
    # Display the name of the Subreddit
    print("Display Name:", subreddit.display_name)
    # Display the title of the Subreddit
    print("Title:", subreddit.title)

Display Name: SkincareAddiction
Title: For anything and everything having to do with skincare!
Display Name: AsianBeauty
Title: AsianBeauty
Display Name: 30PlusSkinCare
Title: Skin care for people over 30
Display Name: SkincareAddictionUK
Title: A UK-centric skincare subreddit.
Display Name: IndianSkincareAddicts
Title: IndianSkincareAddicts
Display Name: AusSkincare
Title: Australian & New Zealand Skincare


### Step 3: Loading of Subreddit Data into DataFrame
Initial runs of this exploratory analysis looked only at the top posts. However, due to the casual nature of these forums, many of the top posts are jokes (memes), which is not conducive to studying the skincare content itself. Hence, we look at specific ingredients and skin concerns instead. Popular or trending skincare ingredients were identified with a Google search, and a couple of ingredients were added manually based on our own knowledge. Similarly, common skin concerns were identified with a Google search.

#### Step 3.1 Defining Skin Concerns and Ingredients

In [43]:
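# Note: the uppercase entries ('AHA', 'BHA', 'PHA', 'SPF') never match in the
# extraction loop, because titles and bodies are lowercased before comparison
ing = [
    'retinol', 'vitamin c', 'hyaluronic', 'niacinamide', 'salicylic',
    'benzoyl peroxide', 'glycerin', 'peptide', 'ceramide',
    'bakuchiol', 'vitamin e', 'glycolic', 'AHA', 'BHA', 'PHA',
    'squalene', 'jojoba', 'azelaic', 'hydroquinone', 'lactic', 'SPF'
]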
concerns = [
    'acne', 'dry', 'dull', 'redness', 'dark circles', 'eye bags', 
    'wrinkle', 'aging', 'uneven', 'rough', 'hyperpigmentation', 'sunscreen'
]
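
Plain substring checks on these lists can produce false positives: `'rough'` is a substring of "through" and `'dry'` of "laundry". A minimal sketch of whole-word, case-insensitive matching with the standard `re` module follows. The helper names `compile_terms` and `find_terms` are hypothetical, and the short term lists are cut down for illustration.

```python
import re


def compile_terms(terms):
    """Map each term to a case-insensitive, whole-word regular expression."""
    return {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }


def find_terms(text, patterns):
    """Return every term whose pattern occurs in text, in list order."""
    return [term for term, pattern in patterns.items() if pattern.search(text)]


concern_patterns = compile_terms(["dry", "rough", "acne"])
ingredient_patterns = compile_terms(["AHA", "SPF", "vitamin c"])

print(find_terms("I went through my laundry routine", concern_patterns))  # []
print(find_terms("Best SPF for acne-prone skin?", ingredient_patterns))   # ['SPF']
```

The trade-off is that whole-word matching no longer catches plurals or inflections: `\bwrinkle\b` does not match "wrinkles". Short acronyms can still misfire, for example `AHA` on "aha moment".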

#### Step 3.2 Looping through the subreddits to extract the relevant data

In [45]:
info = []
for subreddit in subreddits:
    # Loop through the top 1000 posts
    for sub in subreddit.top(limit=1000):
        title = sub.title.lower()  # Make title lowercase for case-insensitive matching
        body = sub.selftext.lower()

        # Only the last matching term from each list is kept for a post
        ing_pres = None
        concern_pres = None

        for ingredient in ing:
            if ingredient in title or ingredient in body:
                ing_pres = ingredient

        for concern in concerns:
            if concern in title or concern in body:
                concern_pres = concern

        # Keep the post only if it mentions a tracked ingredient or concern
        if ing_pres or concern_pres:
            sub_data = {
                'subreddit': subreddit.display_name,
                'title': sub.title,
                'body': sub.selftext,
                'upvotes': sub.score,
                'num_comments': sub.num_comments,
                'url': sub.url,
                'ingredient': ing_pres,
                'concern': concern_pres
            }
            info.append(sub_data)

print(len(info))

2106
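
Because `ing_pres` and `concern_pres` are overwritten on every match, a post that mentions several ingredients or concerns is recorded with only the last one in list order. A minimal sketch of a variant that keeps every match is shown here. The helpers `matched_terms` and `build_row` are hypothetical, and a `SimpleNamespace` stands in for a PRAW submission so the example runs without network access.

```python
from types import SimpleNamespace


def matched_terms(title, body, terms):
    """Return all terms found in the title or body (case-insensitive substring match)."""
    title, body = title.lower(), body.lower()
    return [t for t in terms if t.lower() in title or t.lower() in body]


def build_row(subreddit_name, post, ingredients, concerns):
    """Build one record, or return None if the post mentions no tracked term."""
    found_ing = matched_terms(post.title, post.selftext, ingredients)
    found_con = matched_terms(post.title, post.selftext, concerns)
    if not found_ing and not found_con:
        return None
    return {
        "subreddit": subreddit_name,
        "title": post.title,
        "upvotes": post.score,
        "ingredients": found_ing,  # list of every match, not just the last
        "concerns": found_con,
    }


# Stand-in for a PRAW submission (only the attributes used above)
fake_post = SimpleNamespace(
    title="Retinol and niacinamide for acne and redness", selftext="", score=10
)
row = build_row(
    "SkincareAddiction", fake_post,
    ["retinol", "niacinamide", "SPF"], ["acne", "redness"],
)
print(row["ingredients"])  # ['retinol', 'niacinamide']
print(row["concerns"])     # ['acne', 'redness']
```

Lowercasing each term inside `matched_terms` also lets uppercase entries such as `'SPF'` match lowercased text. Storing lists changes the shape of the resulting columns, so later counting steps would need to flatten them, for example with `DataFrame.explode`.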


#### Step 3.3 Loading into DataFrame

In [47]:
df=pd.DataFrame(info)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2106 entries, 0 to 2105
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   subreddit     2106 non-null   object
 1   title         2106 non-null   object
 2   body          2106 non-null   object
 3   upvotes       2106 non-null   int64 
 4   num_comments  2106 non-null   int64 
 5   url           2106 non-null   object
 6   ingredient    717 non-null    object
 7   concern       1850 non-null   object
dtypes: int64(2), object(6)
memory usage: 131.8+ KB
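
The `ingredient` column is populated for only 717 of the 2106 rows, so any ingredient tally has to skip missing values. A minimal sketch of the planned "how often is each ingredient referenced" table follows, using `collections.Counter` on records shaped like the entries of `info`. The sample records are made up for illustration and are not taken from the scraped data.

```python
from collections import Counter

# Made-up records with the same keys as the dictionaries appended to `info`
records = [
    {"subreddit": "SkincareAddiction", "ingredient": "retinol"},
    {"subreddit": "SkincareAddiction", "ingredient": None},
    {"subreddit": "AsianBeauty", "ingredient": "retinol"},
    {"subreddit": "AsianBeauty", "ingredient": "niacinamide"},
    {"subreddit": "AsianBeauty", "ingredient": "retinol"},
]

# Count (subreddit, ingredient) pairs, skipping posts with no ingredient match
counts = Counter(
    (rec["subreddit"], rec["ingredient"])
    for rec in records
    if rec["ingredient"] is not None
)

for (subreddit, ingredient), n in counts.most_common():
    print(subreddit, ingredient, n)
```

On the DataFrame itself, `df.groupby(['subreddit', 'ingredient']).size()` should give the same kind of tally, since `groupby` drops missing keys by default.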


In [51]:
df.head(10)

| | subreddit | title | body | upvotes | num_comments | url | ingredient | concern |
|---|---|---|---|---|---|---|---|---|
| 0 | SkincareAddiction | Posted here over a month ago showing how [acne... | | 17350 | 257 | https://i.redd.it/dtm3c3p277z41.jpg | | acne |
| 1 | SkincareAddiction | [Anti-Aging] I may have used too much retinol ... | | 15802 | 115 | https://i.redd.it/r8g7c71mti3a1.jpg | retinol | sunscreen |
| 2 | SkincareAddiction | [Selfie] 2 year transformation and glow up. Cy... | | 11640 | 287 | https://i.redd.it/8l4z6jzyogb51.jpg | | acne |
| 3 | SkincareAddiction | [Personal] My Mother at 53 years old. She's th... | | 11229 | 272 | http://i.imgur.com/Ph4JiDD.jpg | | sunscreen |
| 4 | SkincareAddiction | [B&A] [Selfie] 3 microneedling sessions, 1 las... | | 11147 | 340 | https://i.redd.it/ruermpk6cwg31.jpg | hyaluronic | |
| 5 | SkincareAddiction | [Before&After] Finding the right dermatologist... | | 11143 | 484 | https://www.reddit.com/gallery/nn7th5 | | sunscreen |
| 6 | SkincareAddiction | [PSA] SKIN CARE FOR PROTESTERS | \nFOR PEPPER SPRAY: \n\n-Don’t touch the expos... | 10867 | 333 | https://www.reddit.com/r/SkincareAddiction/com... | | sunscreen |
| 7 | SkincareAddiction | Puberty is making [Acne] hit hard but we’re tr... | | 10508 | 435 | https://i.redd.it/o74bastn0iq41.jpg | | rough |
| 8 | SkincareAddiction | [B&A] I posted my acne scar treatment progress... | | 10306 | 350 | https://i.redd.it/g7dlglcqhn011.jpg | | acne |
| 9 | SkincareAddiction | [Acne] One year apart ✨ | | 9532 | 259 | https://www.reddit.com/gallery/ltlqvl | | acne |


In [53]:
df.tail(10)

| | subreddit | title | body | upvotes | num_comments | url | ingredient | concern |
|---|---|---|---|---|---|---|---|---|
| 2096 | AusSkincare | Thoughts on reuseable silicone eye gels? | What the title says - I’ve been looking into g... | 42 | 19 | https://i.redd.it/oc22e1t7kjya1.jpg | | redness |
| 2097 | AusSkincare | I need help with blackheads | I'm currently using the hydro boost as my clea... | 38 | 44 | https://i.redd.it/0z85d7qo9jv91.jpg | | rough |
| 2098 | AusSkincare | Meccas Sunscreen serum ☀️ | Has anyone tried this yet? It looks so interes... | 43 | 9 | https://i.redd.it/yp6u27kfaqj91.jpg | | sunscreen |
| 2099 | AusSkincare | We can now drop off all brands of empty beauty... | | 40 | 2 | https://i.redd.it/wfnxr6c4tdg91.png | | aging |
| 2100 | AusSkincare | Your favourite SPF50+ long lasting sunscreen | I’ve just got a job as a Traffic Controller wh... | 41 | 21 | https://www.reddit.com/r/AusSkincare/comments/... | | sunscreen |
| 2101 | AusSkincare | Whats the point of having actives in cleansers... | Unpopular opinion i know. I'm not new to skinc... | 42 | 15 | https://www.reddit.com/r/AusSkincare/comments/... | vitamin c | rough |
| 2102 | AusSkincare | Hey what’s everyone’s experience with these tw... | | 43 | 72 | https://www.reddit.com/gallery/jsqrds | | sunscreen |
| 2103 | AusSkincare | Priceline 3 Day Sale (40% off Skincare, Suncar... | Priceline is doing another big 3 day sale! \n\... | 45 | 57 | https://www.reddit.com/r/AusSkincare/comments/... | jojoba | |
| 2104 | AusSkincare | PSA: Moo Goo has launched a 1% bakuchiol serum! | So I was on Moo Goo's website, clicking around... | 41 | 21 | https://www.reddit.com/r/AusSkincare/comments/... | bakuchiol | |
| 2105 | AusSkincare | Best Of/ Holy Grail Products: EXFOLIANTS | Hi there and welcome to the Best Of/ Holy Grai... | 42 | 19 | https://www.reddit.com/r/AusSkincare/comments/... | | rough |


#### Step 3.4 Export into CSV

In [56]:
df.to_csv('SkincareReddit.csv')
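
By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again with `pd.read_csv`. A small sketch of the difference follows, using an in-memory buffer and a made-up two-row frame (`df_demo` is hypothetical) rather than the real export.

```python
import io

import pandas as pd

# Made-up two-row frame standing in for the scraped data
df_demo = pd.DataFrame(
    {"subreddit": ["AusSkincare", "AsianBeauty"], "upvotes": [42, 1200]}
)

# Default export: the row index is written as an unnamed first column
with_index = io.StringIO()
df_demo.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so only the data columns come back
without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'subreddit', 'upvotes']
print(list(reloaded_clean.columns))       # ['subreddit', 'upvotes']
```

Whether to pass `index=False` here depends on whether the row numbers are needed downstream; for this dataset the default `RangeIndex` carries no extra information.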