## Extract, Transform, Load 
This notebook connects to the Reddit API, extracts data, and stores it automatically. It also uses the Python library yfinance to gather Yahoo Finance stock data.

The goal is to pull stock data through yfinance, extract post content from Reddit, transform and clean the data automatically, and append it to a MongoDB database (via pymongo).

Ultimately, the whole process can be automated end to end.
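
The three stages described above can be sketched as separate functions. This is a minimal outline under stated assumptions, not the notebook's actual implementation: the function names, the input dictionary keys, the MongoDB URI, and the database and collection names (`etl_db`, `reddit_posts`) are all hypothetical. The yfinance and pymongo imports are deferred so the cleaning step can be exercised without a network connection or a running database.

```python
def transform(raw_posts):
    """Clean raw post dicts into documents ready for MongoDB."""
    documents = []
    for post in raw_posts:
        title = post["title"].strip()
        if not title:  # skip posts with an empty title
            continue
        documents.append({
            "title": title,
            "created_utc": float(post["created_utc"]),
            "upvote_ratio": float(post["upvote_ratio"]),
            "num_comments": int(post["num_comments"]),
        })
    return documents


def extract_prices(ticker="GME", period="1mo"):
    """Fetch recent daily prices through yfinance (needs network access)."""
    import yfinance as yf  # imported lazily so transform() works without it
    return yf.Ticker(ticker).history(period=period).reset_index()


def load(documents, uri="mongodb://localhost:27017",
         db_name="etl_db", collection_name="reddit_posts"):
    """Append documents to a MongoDB collection; return how many were written."""
    import pymongo  # imported lazily; URI and names above are assumptions
    if not documents:  # insert_many rejects an empty list
        return 0
    client = pymongo.MongoClient(uri)
    result = client[db_name][collection_name].insert_many(documents)
    return len(result.inserted_ids)
```

Keeping `transform` free of I/O means it can be checked on hand-written sample posts before any credentials or database are involved.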

In [146]:
# Import dependencies
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import pymongo
import requests
import json
import praw
from config import KEY, CLIENT_ID, PW
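
The `config` module imported above is not shown in the notebook. One possible way to supply `KEY`, `CLIENT_ID`, and `PW` without committing secrets is to read them from environment variables. The sketch below assumes that approach; the environment variable names and the `load_credentials` helper are hypothetical.

```python
import os

# Hypothetical environment variable names; the notebook does not show config.py.
REQUIRED = {
    "CLIENT_ID": "REDDIT_CLIENT_ID",
    "KEY": "REDDIT_CLIENT_SECRET",
    "PW": "REDDIT_PASSWORD",
}


def load_credentials(env=None):
    """Return the three credentials, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in REQUIRED.items()}
```

A `config.py` built this way could call `load_credentials()` once and expose the three values as module-level names, so the import in the cell above would keep working unchanged.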

In [147]:
# Create variables for API credentials
client_id = CLIENT_ID
client_k = KEY
usr_agent = 'etlAPP'
username = 'joechancey11'
pw = PW

In [148]:
# Build an authenticated PRAW client from the credentials above
def reddit_request():
    reddit = praw.Reddit(client_id=client_id, client_secret=client_k, user_agent=usr_agent, username=username, password=pw)
    return reddit

In [149]:
# Create the Reddit client
reddit = reddit_request()

In [150]:
# Choose the subreddit to search (any other subreddit name can be swapped in)
subreddit = reddit.subreddit("wallstreetbets")

In [151]:
# # Skip this cell. It is a sample search for inspecting the keys the Reddit API returns; PRAW exposes them as attributes, which makes this step unnecessary.
# first_search = subreddit.search("GME", limit=5, sort='top')
# # Commented out because the response is long. Uncomment to view the keys.
# [vars(x) for x in first_search]

In [152]:
# Create an empty DataFrame to add our data
df = pd.DataFrame(columns=['Title', 'Date', 'Upvote Ratio', 'Total Comments'])
df

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments


In [153]:
# Query the Reddit API for submissions matching "GME".
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = [{'Title': s.title, 'Date': s.created_utc, 'Upvote Ratio': s.upvote_ratio,
         'Total Comments': s.num_comments} for s in subreddit.search("GME", limit=50)]
df = pd.DataFrame(rows, columns=['Title', 'Date', 'Upvote Ratio', 'Total Comments'])
df

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments
0,"Daily Popular Tickers Thread for September 16,...",1631790000.0,0.93,12391
1,"Daily Popular Tickers Thread for September 15,...",1631707000.0,0.92,7229
2,I just quit my job so that I could roll over m...,1630590000.0,0.82,2079
3,Today is the day. Over 2M in my favorite stock...,1631101000.0,0.89,1348
4,"Daily Popular Tickers Thread for September 20,...",1632132000.0,0.92,2139
5,GME GANG IS BACK,1629831000.0,0.85,1526
6,"Daily Popular Tickers Thread for September 21,...",1632218000.0,0.92,1780
7,"I made a lot of money on GME and quit my job, ...",1630343000.0,0.77,2961
8,My GME gain from Tuesday. Went all in with my ...,1629889000.0,0.85,1445
9,"Daily Popular Tickers Thread for September 07,...",1631016000.0,0.91,2882


In [164]:
# Keep only the rows whose title mentions GME (case-insensitive).
# A mask built from df.columns would test the column names and leave every row in place.
gme_posts = df[df['Title'].str.contains('GME', case=False, na=False)]
gme_posts

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments
0,"Daily Popular Tickers Thread for September 16,...",1631790000.0,0.93,12391
1,"Daily Popular Tickers Thread for September 15,...",1631707000.0,0.92,7229
2,I just quit my job so that I could roll over m...,1630590000.0,0.82,2079
3,Today is the day. Over 2M in my favorite stock...,1631101000.0,0.89,1348
4,"Daily Popular Tickers Thread for September 20,...",1632132000.0,0.92,2139
5,GME GANG IS BACK,1629831000.0,0.85,1526
6,"Daily Popular Tickers Thread for September 21,...",1632218000.0,0.92,1780
7,"I made a lot of money on GME and quit my job, ...",1630343000.0,0.77,2961
8,My GME gain from Tuesday. Went all in with my ...,1629889000.0,0.85,1445
9,"Daily Popular Tickers Thread for September 07,...",1631016000.0,0.91,2882
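
The cell above is meant to keep only posts whose title mentions GME. A minimal sketch of the difference between a row-wise and a column-wise mask follows. The `posts` DataFrame and its titles are made up for illustration and are not data from this notebook.

```python
import pandas as pd

# Illustrative data only; the notebook builds its DataFrame from Reddit submissions.
posts = pd.DataFrame({
    "Title": ["GME GANG IS BACK", "Weekend discussion thread", "My gme gain from Tuesday"],
    "Total Comments": [1526, 7229, 1445],
})

# Row-wise mask: True where the title contains "GME", ignoring case.
# na=False treats missing titles as non-matches instead of propagating NaN.
title_mask = posts["Title"].str.contains("GME", case=False, na=False)
gme_posts = posts[title_mask].reset_index(drop=True)

# Column-wise mask for comparison: it inspects the column names,
# so every row survives and only columns named like "GME" would be dropped.
column_mask = ~posts.columns.str.contains("GME", case=False)
unchanged = posts.loc[:, column_mask]

print(gme_posts)
```

Because no column is named after the ticker, the column-wise version returns the frame untouched, which is easy to mistake for a filter that simply found nothing to remove.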


In [None]:
# Convert the Unix timestamps (seconds) in 'Date' to datetimes
df['Date'] = pd.to_datetime(df['Date'], unit='s')
df

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments
0,"Daily Popular Tickers Thread for September 16,...",2021-09-16 10:56:14,0.93,12391
1,"Daily Popular Tickers Thread for September 15,...",2021-09-15 12:00:49,0.92,7229
2,I just quit my job so that I could roll over m...,2021-09-02 13:32:16,0.82,2079
3,Today is the day. Over 2M in my favorite stock...,2021-09-08 11:34:14,0.89,1348
4,"Daily Popular Tickers Thread for September 20,...",2021-09-20 10:00:22,0.92,2139
5,"Daily Popular Tickers Thread for September 21,...",2021-09-21 10:00:23,0.92,1780
6,GME GANG IS BACK,2021-08-24 18:48:49,0.85,1526
7,"I made a lot of money on GME and quit my job, ...",2021-08-30 17:04:15,0.77,2961
8,"Daily Popular Tickers Thread for September 07,...",2021-09-07 12:01:42,0.91,2882
9,"Daily Popular Tickers Thread for September 22,...",2021-09-22 10:00:23,0.91,1455
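
Reddit's `created_utc` field holds seconds since the Unix epoch in UTC, so the converted values above are UTC times without an explicit timezone. As a quick cross-check on a single value, the same conversion can be sketched with the standard library. The helper name `epoch_to_utc` is hypothetical, and the sample timestamp was chosen to match the first row shown above.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds (as in created_utc) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


stamp = epoch_to_utc(1631789774)
print(stamp.isoformat())  # 2021-09-16T10:56:14+00:00
```

Carrying the timezone explicitly avoids ambiguity if the post times are later joined against stock prices quoted in a different timezone.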
