# Project 3 - Reddit Web APIs & NLP Classification

I will use submission data collected from Reddit to build a classification model that predicts which of two subreddits, r/Anticonsumption or r/minimalism, a given submission belongs to, in order to improve the campaign performance of environmental organizations.

# Part 1 - Reddit Data Collection

This notebook focuses on collecting posts from two subreddits using the Python Reddit API Wrapper (PRAW).
The subreddits are listed below:

1) **/r/Anticonsumption**  
189k members  
This subreddit is primarily for criticizing, questioning, and discussing consumerism and current consumption standards. Topics include consumerism, planned obsolescence, economic materialism, inefficiency, marketing, advertising and branding, sustainability, exploitation, conspicuous consumption, intellectual property, and more.  

2) **/r/minimalism**  
381k members  
This subreddit is primarily for sharing insights about the minimalist lifestyle, including decluttering of possessions and thoughts, as well as minimal art, design, and music.

Before starting the project, my assumption about these two subreddits is that both advise similar things: reducing consumption, getting rid of possessions we don't need, and living a simpler, less materialistic lifestyle. However, the difference between the two is saving the world vs. saving yourself. Anticonsumption is focused on the environment: stopping climate change, respecting nature, and leaving a legacy to future generations. Minimalism is more about living a quality lifestyle: getting rid of financial burdens, tuning out the noise in your life, and living based on experiences rather than worldly possessions.  
To summarize, anticonsumption is activism, while minimalism is a movement.

Let's see whether my assumption is correct or whether these subreddits are about something else.
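One quick way to probe this assumption before any modeling is to compare the most frequent words in each subreddit's titles. The sketch below is a hypothetical helper, not part of the original notebook; it uses only the standard library, a crude regex tokenizer, and made-up sample titles standing in for the collected data.

```python
import re
from collections import Counter


def top_words(texts, n=10, stopwords=frozenset()):
    """Return the n most common lowercase words across a list of texts."""
    counts = Counter()
    for text in texts:
        # crude tokenizer: runs of lowercase letters and apostrophes
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Hypothetical sample titles, for illustration only
anticon_titles = ["Planned obsolescence is killing the planet",
                  "Climate change and fast fashion"]
minimal_titles = ["Decluttering my closet changed my life",
                  "My minimalist desk setup"]

stop = frozenset({"the", "and", "is", "my"})
print(top_words(anticon_titles, n=3, stopwords=stop))
print(top_words(minimal_titles, n=3, stopwords=stop))
```

With real data, the stopword list would need to be much larger for the comparison to say anything about topics.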

In [1]:
# Imports
import pandas as pd
import requests, json, time
from datetime import datetime

import praw

import os

In [2]:
# read credentials file
# expected keys: id, secret, user, pass

# define credentials dictionary from the JSON file; the with-block closes the file
with open('creds.json', 'r') as creds_file:
    reddit_creds = json.load(creds_file)

The Reddit credentials file contains a dictionary with key-value pairs for the client id, client secret, username, and password (stored under the keys `id`, `secret`, `user`, and `pass`), obtained via a Reddit user account. For further information, please check the README.md document.
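A minimal sketch of a stricter loader, assuming the file is a flat JSON object with the four keys that cell [3] reads (`id`, `secret`, `user`, `pass`). The function name `load_reddit_creds` is hypothetical; it closes the file automatically and raises a clear error when a key is missing, instead of failing later inside the `praw.Reddit(...)` call.

```python
import json

# Expected file shape (placeholder values):
# {"id": "<client id>", "secret": "<client secret>", "user": "<username>", "pass": "<password>"}
REQUIRED_KEYS = {"id", "secret", "user", "pass"}  # the keys read in cell [3]


def load_reddit_creds(path="creds.json"):
    """Load the credentials dict and fail early if a required key is missing."""
    with open(path, "r") as f:
        creds = json.load(f)
    missing = REQUIRED_KEYS - set(creds)
    if missing:
        raise KeyError(f"creds file is missing keys: {sorted(missing)}")
    return creds
```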

In [3]:
# instantiate Reddit object using praw library

reddit = praw.Reddit(
    client_id     = reddit_creds['id'],
    client_secret = reddit_creds['secret'],
    username      = reddit_creds['user'],
    password      = reddit_creds['pass'],
    user_agent    = 'pink panther'
)
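As an alternative to a credentials file, the same keyword arguments can be read from environment variables, which keeps secrets out of the notebook directory. The variable names below (`REDDIT_CLIENT_ID` and so on) and the helper `praw_kwargs_from_env` are assumptions for illustration, not something PRAW requires.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def praw_kwargs_from_env(user_agent="pink panther"):
    """Build keyword arguments for praw.Reddit from environment variables.

    Raises KeyError if any of the variables is not set.
    """
    kwargs = {arg: os.environ[var] for arg, var in ENV_VARS.items()}
    kwargs["user_agent"] = user_agent
    return kwargs

# usage (requires praw and the variables to be set):
# reddit = praw.Reddit(**praw_kwargs_from_env())
```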

In [4]:
# get first subreddit "Anticonsumption"

anticonsumption = reddit.subreddit('Anticonsumption')

In [5]:
# check a few attributes

print(f"Created UTC: {anticonsumption.created_utc} \nDescription: {anticonsumption.description.split('. ')[0]}")

Created UTC: 1253909397.0 
Description: Anticon is a sub for criticizing and questioning current consumption standards


In [6]:
# get second subreddit "minimalism"

minimalism = reddit.subreddit('minimalism')

In [7]:
# check a few attributes

print(f"Created UTC: {minimalism.created_utc} \nDescription: {minimalism.description.split('.')[0]}")

Created UTC: 1245864140.0 
Description: ****

# For those who appreciate simplicity in any form
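The two cells above split the description differently (`'. '` versus `'.'`), and the minimalism description appears to open with Markdown markup, which is why only `****` is printed. A hypothetical helper that skips leading Markdown symbols and blank lines before taking the first sentence could make both checks consistent. The sentence split is a simple regex, so abbreviations such as "e.g." would end the sentence early.

```python
import re


def first_sentence(text):
    """Return the first sentence of a Markdown description.

    Leading Markdown markup (#, *, _, >) and blank lines are skipped first,
    so a description that opens with a rule or heading still yields text.
    """
    for line in text.splitlines():
        cleaned = line.strip().lstrip("#*_> ").strip()
        if cleaned:
            # split at the first period followed by whitespace or end of line
            return re.split(r"\.(?:\s|$)", cleaned, maxsplit=1)[0]
    return ""

# usage:
# print(first_sentence(minimalism.description))
```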


In [8]:
# create a list of the newest Anticonsumption submissions
# (limit=1500 is requested, but Reddit listings typically return at most about 1000 items)

new_anticons = list(anticonsumption.new(limit=1500))
# hot_anticons = list(anticonsumption.hot(limit=1500))

In [9]:
# create a list of the newest minimalism submissions
# (limit=1500 is requested, but Reddit listings typically return at most about 1000 items)

new_minimals = list(minimalism.new(limit=1500))
# hot_minimals = list(minimalism.hot(limit=1500))
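The commented-out `hot` listings hint at collecting more than the roughly 1000 posts a single listing returns. If both listings are combined, the same submission can appear in each, so duplicates should be dropped. A minimal sketch, assuming only that every item has an `id` attribute (PRAW `Submission` objects do); `merge_unique` is a hypothetical name.

```python
def merge_unique(*listings):
    """Concatenate listings, keeping the first occurrence of each submission id."""
    seen = set()
    merged = []
    for listing in listings:
        for post in listing:
            if post.id not in seen:
                seen.add(post.id)
                merged.append(post)
    return merged

# usage (if the hot listings are enabled):
# all_anticons = merge_unique(new_anticons, hot_anticons)
```

Keeping the first occurrence means posts from the first listing passed in (here `new`) take precedence.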

In [10]:
# define a function to convert UTC to local date time format

def convert_utc_to_datetime(utc):
    return datetime.fromtimestamp(utc)
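`datetime.fromtimestamp(utc)` returns a naive datetime in the machine's local time zone, so the `created` column depends on where the notebook runs. If a time-zone-independent value is preferred, passing `tz=timezone.utc` yields an aware UTC datetime; the function name below is hypothetical. Applied to the `created_utc` values printed in cells [5] and [7]:

```python
from datetime import datetime, timezone


def convert_utc_to_aware_datetime(utc):
    """Convert a Unix timestamp to a timezone-aware datetime in UTC."""
    return datetime.fromtimestamp(utc, tz=timezone.utc)


print(convert_utc_to_aware_datetime(1253909397.0))  # → 2009-09-25 20:09:57+00:00
print(convert_utc_to_aware_datetime(1245864140.0))  # → 2009-06-24 17:22:20+00:00
```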

In [11]:
# create an Anticonsumption dictionary with column name (key) and list of values (value)
# columns: Subreddit, Title, Body, Created, Total Comments, Total Upvotes

anticon_dict = {
    'subreddit': [post.subreddit.display_name for post in new_anticons],
    'title': [post.title for post in new_anticons],
    'body': [post.selftext for post in new_anticons],
    'created': [convert_utc_to_datetime(post.created_utc) for post in new_anticons],
    'total_comments': [post.num_comments for post in new_anticons],
    'total_upvotes': [post.score for post in new_anticons]
}

# convert anticon dictionary to dataframe

anticon_df = pd.DataFrame(anticon_dict)

In [12]:
# create a minimalism dictionary with column name (key) and list of values (value)
# columns: Subreddit, Title, Body, Created, Total Comments, Total Upvotes

minimal_dict = {
    'subreddit': [post.subreddit.display_name for post in new_minimals],
    'title': [post.title for post in new_minimals],
    'body': [post.selftext for post in new_minimals],
    'created': [convert_utc_to_datetime(post.created_utc) for post in new_minimals],
    'total_comments': [post.num_comments for post in new_minimals],
    'total_upvotes': [post.score for post in new_minimals]
}

# convert minimal dictionary to dataframe

minimal_df = pd.DataFrame(minimal_dict)
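Cells [11] and [12] repeat the same column logic for each subreddit. Below is a sketch of a hypothetical `posts_to_records` helper that builds one dict per submission with the same columns; passing its result to `pd.DataFrame` should give the same frame layout without duplicating the comprehension blocks. The attribute names are the ones already used above (`subreddit`, `title`, `selftext`, `created_utc`, `num_comments`, `score`).

```python
from datetime import datetime


def posts_to_records(posts):
    """Turn submissions into one dict per post, matching the columns of cells [11]-[12]."""
    return [
        {
            "subreddit": str(post.subreddit),  # subreddit name as plain text
            "title": post.title,
            "body": post.selftext,
            "created": datetime.fromtimestamp(post.created_utc),  # local time, as in cell [10]
            "total_comments": post.num_comments,
            "total_upvotes": post.score,
        }
        for post in posts
    ]

# usage in the notebook:
# anticon_df = pd.DataFrame(posts_to_records(new_anticons))
# minimal_df = pd.DataFrame(posts_to_records(new_minimals))
```

Iterating once per post (instead of once per column) also avoids walking each list six times.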

In [13]:
# define a function to save csv files with a timestamp for uniqueness

def create_csv(df, subreddit):
    timestamp = datetime.now().strftime('_%Y-%m-%d-%H-%M-%S')
    # create the output folder if it does not exist yet
    os.makedirs('./csv_folder', exist_ok=True)
    df.to_csv(os.path.join('./csv_folder', subreddit + timestamp + '.csv'), index=False)

In [14]:
# save anticonsumption csv

create_csv(anticon_df, 'anticonsumption')

In [15]:
# save minimalism csv

create_csv(minimal_df, 'minimalism')
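Because `create_csv` appends a zero-padded `_%Y-%m-%d-%H-%M-%S` timestamp, sorting the file names alphabetically also sorts them chronologically. A later notebook could therefore pick up the most recent file per subreddit with a small helper like the hypothetical `latest_csv` below, a sketch assuming the default `./csv_folder` location.

```python
from pathlib import Path


def latest_csv(subreddit, folder="./csv_folder"):
    """Return the newest CSV saved by create_csv for a subreddit, or None if there is none."""
    # zero-padded timestamps make lexicographic order equal to chronological order
    files = sorted(Path(folder).glob(f"{subreddit}_*.csv"))
    return files[-1] if files else None

# usage:
# minimal_df = pd.read_csv(latest_csv('minimalism'))
```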