## Extract, Transform, Load 
This notebook connects to the Reddit API, extracts post data, and stores it automatically. It also uses the Python library yfinance to gather Yahoo Finance stock data.

The goal is to pull stock data through the yfinance library, extract post content from Reddit, automatically transform and clean the data, and append it to a MongoDB database (via pymongo).

Ultimately, this process has the potential to be automated.
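
As a rough illustration of the load step described above, here is a minimal sketch. The helper name `load_records` is hypothetical, and `collection` is assumed to be a pymongo `Collection` (or any object exposing `insert_many`); the notebook's actual database and collection names are not shown in this section.

```python
def load_records(collection, records):
    """Append cleaned records to a collection and return how many were sent.

    `collection` is assumed to behave like a pymongo Collection,
    i.e. it provides insert_many(list_of_dicts).
    """
    docs = [dict(r) for r in records]  # copy so the caller's dicts are not mutated
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

With a DataFrame such as the one built later in this notebook, the records could be produced with `df.to_dict("records")` before being passed to the helper.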

In [45]:
# Import dependencies
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import pymongo
import requests
import json
import praw
from config import KEY, CLIENT_ID, PW

In [46]:
# Create variables for API credentials
client_id = CLIENT_ID
client_k = KEY
usr_agent = 'etlAPP'
username = 'joechancey11'
pw = PW

In [47]:
# Build an authenticated PRAW client from the credentials above
def reddit_request():
    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_k,
        user_agent=usr_agent,
        username=username,
        password=pw,
    )
    return reddit

In [48]:
# Instantiate the Reddit client
reddit = reddit_request()

In [51]:
# Choose our subreddit - Can be swapped
subreddit = reddit.subreddit("wallstreetbets")

In [58]:
# Sample search to inspect the available keys and see how the Reddit API returns results. PRAW exposes these fields as submission attributes, so this step is for exploration only.
first_search = subreddit.search("GME", limit=5, sort='top')
[vars(x) for x in first_search]

[{'comment_limit': 2048,
  'comment_sort': 'confidence',
  '_reddit': <praw.reddit.Reddit at 0x17d6d03a4c0>,
  'approved_at_utc': None,
  'subreddit': Subreddit(display_name='wallstreetbets'),
  'selftext': '',
  'author_fullname': 't2_49l8qytq',
  'saved': False,
  'mod_reason_title': None,
  'gilded': 127,
  'clicked': False,
  'title': 'GME YOLO update — Jan 28 2021',
  'link_flair_richtext': [{'e': 'text', 't': 'YOLO'}],
  'subreddit_name_prefixed': 'r/wallstreetbets',
  'hidden': False,
  'pwls': 7,
  'link_flair_css_class': 'yolo',
  'downs': 0,
  'top_awarded_type': None,
  'hide_score': False,
  'name': 't3_l78uct',
  'quarantine': False,
  'link_flair_text_color': 'light',
  'upvote_ratio': 0.97,
  'author_flair_background_color': '',
  'subreddit_type': 'public',
  'ups': 284317,
  'total_awards_received': 8757,
  'media_embed': {},
  'author_flair_template_id': None,
  'is_original_content': False,
  'user_reports': [],
  'secure_media': None,
  'is_reddit_media_domain': Tru

In [100]:
# Create an empty DataFrame to add our data
df = pd.DataFrame(columns=['Title', 'Date', 'Upvote Ratio', 'Total Comments'])
df

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments


In [101]:
# Query the Reddit API for submissions that include GME.
# DataFrame.append was removed in pandas 2.0, so collect the rows first.
rows = [{'Title': s.title, 'Date': s.created_utc,
         'Upvote Ratio': s.upvote_ratio, 'Total Comments': s.num_comments}
        for s in subreddit.search("GME", limit=50)]
df = pd.DataFrame(rows, columns=df.columns)
df

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments
0,"Daily Popular Tickers Thread for September 16,...",1631790000.0,0.93,12393
1,"Daily Popular Tickers Thread for September 15,...",1631707000.0,0.92,7229
2,I just quit my job so that I could roll over m...,1630590000.0,0.82,2079
3,Today is the day. Over 2M in my favorite stock...,1631101000.0,0.89,1348
4,"Daily Popular Tickers Thread for September 20,...",1632132000.0,0.92,2137
5,GME GANG IS BACK,1629831000.0,0.85,1526
6,"Daily Popular Tickers Thread for September 21,...",1632218000.0,0.92,1780
7,My GME gain from Tuesday. Went all in with my ...,1629889000.0,0.85,1445
8,"I made a lot of money on GME and quit my job, ...",1630343000.0,0.77,2961
9,"Daily Popular Tickers Thread for September 07,...",1631016000.0,0.91,2882
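
The `Date` column above holds `created_utc` values, which are seconds since the Unix epoch. As part of the transform step, these could be converted to timezone-aware datetimes. The sketch below uses only the standard library; `epoch_to_utc` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch into a UTC-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Row 5 of the table above ("GME GANG IS BACK"); the displayed value is rounded.
posted = epoch_to_utc(1629831000.0)
print(posted.isoformat())
```

On the DataFrame itself, the equivalent pandas call would be `pd.to_datetime(df['Date'], unit='s', utc=True)`.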


In [102]:
# Show only the rows whose title contains "GME"
df[df['Title'].str.contains("GME")]

Unnamed: 0,Title,Date,Upvote Ratio,Total Comments
0,"Daily Popular Tickers Thread for September 16,...",1631790000.0,0.93,12393
1,"Daily Popular Tickers Thread for September 15,...",1631707000.0,0.92,7229
2,I just quit my job so that I could roll over m...,1630590000.0,0.82,2079
3,Today is the day. Over 2M in my favorite stock...,1631101000.0,0.89,1348
4,"Daily Popular Tickers Thread for September 20,...",1632132000.0,0.92,2137
5,GME GANG IS BACK,1629831000.0,0.85,1526
6,"Daily Popular Tickers Thread for September 21,...",1632218000.0,0.92,1780
7,My GME gain from Tuesday. Went all in with my ...,1629889000.0,0.85,1445
8,"I made a lot of money on GME and quit my job, ...",1630343000.0,0.77,2961
9,"Daily Popular Tickers Thread for September 07,...",1631016000.0,0.91,2882
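
The check above is a case-sensitive substring match, so it would also match "GME" inside a longer token and would miss lowercase mentions. A stricter variant, sketched here with the standard library, matches the ticker as a whole word regardless of case. `mentions_ticker` is a hypothetical helper, offered as one possible approach rather than the notebook's method.

```python
import re


def mentions_ticker(title, ticker):
    """Return True if `ticker` appears in `title` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(ticker) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


titles = ["GME GANG IS BACK", "My gme gain from Tuesday", "Daily Popular Tickers Thread"]
matching = [t for t in titles if mentions_ticker(t, "GME")]
```

In pandas, a comparable filter would be `df[df['Title'].str.contains(r'\bGME\b', case=False, regex=True)]`.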
