## Extract, Transform, Load 
This notebook is responsible for connecting to the Reddit API, extracting data, and storing it automatically. It will also use the Python library yfinance to gather Yahoo Finance stock data.

The goal is to pull stock data through yfinance, extract post content from Reddit, automatically transform and clean both, and append the result to a MongoDB database (via pymongo).
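
As a rough illustration of the load step, here is a minimal sketch that writes cleaned post documents to a collection without duplicating them on repeated runs. The function name `upsert_posts` and the `id` key are assumptions made for this example, not code from the notebook; `replace_one(..., upsert=True)` is the pymongo call it relies on.

```python
def upsert_posts(collection, documents, key='id'):
    """Insert or replace each document, matching on `key`; return how many were written."""
    written = 0
    for doc in documents:
        if doc.get(key) is None:
            continue  # skip rows that could not be matched again on a later run
        # upsert=True inserts the document when no existing one matches the filter
        collection.replace_one({key: doc[key]}, doc, upsert=True)
        written += 1
    return written

# Hypothetical usage (database and collection names are placeholders):
# from pymongo import MongoClient
# posts = MongoClient()['etl_demo']['reddit_posts']
# upsert_posts(posts, [{'id': 'abc123', 'title': 'Example post'}])
```

Matching on a stable key means the pipeline can run on a schedule and overwrite a post it has already seen rather than appending a second copy.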

Ultimately, this process has the potential to be automated.

In [1]:
# Import API auth requirements
from config import KEY, CLIENT_ID, PW
import requests

In [2]:
# Pass API auth through to reddit
reddit_auth = requests.auth.HTTPBasicAuth(CLIENT_ID, KEY)

In [3]:
# Pass login credentials
reddit_login = {
    'grant_type': 'password',
    'username': 'joechancey11',
    'password': PW
}

In [4]:
# Pass header argument for API (Reddit expects a descriptive User-Agent)
reddit_api_headers = {'User-Agent': 'etlAPI/0.0.1'}

In [5]:
# Send the request to (1) check that the login works and (2) obtain a temporary access token
reddit_request = requests.post('https://www.reddit.com/api/v1/access_token', auth=reddit_auth, data=reddit_login, headers=reddit_api_headers)

In [6]:
# Store temp access token 
reddit_token = reddit_request.json()['access_token']
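
Indexing `['access_token']` directly raises a bare `KeyError` when the login fails. A small guard makes the failure easier to read. This is a sketch: the helper name `extract_access_token` is made up for this example, and it assumes a failed request comes back as JSON carrying an `error` field instead of a token.

```python
def extract_access_token(payload):
    """Return the token from a parsed token response, or raise with the reported error."""
    if 'access_token' not in payload:
        # Assumption: on failure the body holds an 'error' field instead of a token
        reason = payload.get('error', payload)
        raise RuntimeError(f'Reddit did not return an access token: {reason}')
    return payload['access_token']

# Hypothetical usage with the response from the cell above:
# reddit_token = extract_access_token(reddit_request.json())
```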

In [7]:
# We will need to add this token to the header so we can use API
reddit_api_headers['Authorization'] = f'bearer {reddit_token}'

In [8]:
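# Check connection. Pass headers by keyword (the second positional argument of
# requests.get is params) and call oauth.reddit.com; the 429 recorded below came
# from the original positional call to www.reddit.com, which sent no headers.
requests.get('https://oauth.reddit.com/api/v1/me', headers=reddit_api_headers)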

<Response [429]>

In [9]:
# Pull the current "hot" listing from r/wallstreetbets and inspect the raw JSON
grab_post = requests.get('https://oauth.reddit.com/r/wallstreetbets/hot', headers=reddit_api_headers)
grab_post.json()

{'kind': 'Listing',
 'data': {'after': 't3_pteqgn',
  'dist': 27,
  'modhash': None,
  'geo_filter': None,
  'children': [{'kind': 't3',
    'data': {'approved_at_utc': None,
     'subreddit': 'wallstreetbets',
     'selftext': 'What are your moves tomorrow? Please keep the shitposting at a slow boil. \n\n^Navigate ^WSB|^We ^recommend ^best ^daily ^DD\n:--|:--\n**Discussion** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADiscussion) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADiscussion&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADiscussion&amp;restrict_sr=on&amp;t=week)\n**DD** | [All](https://reddit.com/r/wallstreetbets/search?sort=new&amp;restrict_sr=on&amp;q=flair%3ADD) / [**Best Daily**](https://www.reddit.com/r/wallstreetbets/search?sort=top&amp;q=flair%3ADD&amp;restrict_sr=on&amp;t=day) / [Best Weekly](https://www.red
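
The listing nests each post under `data` → `children` → `data`, so a transform step can flatten it into one record per post. The sketch below is one way to do that. The function name `flatten_listing` is hypothetical, and the `id`, `title`, `score` and `created_utc` fields are assumed to be present; only `subreddit` and `selftext` appear in the truncated output above.

```python
from datetime import datetime, timezone

def flatten_listing(listing, fields=('id', 'title', 'selftext', 'score', 'created_utc')):
    """Return one flat dict per post found in a Reddit 'Listing' payload."""
    rows = []
    for child in listing.get('data', {}).get('children', []):
        if child.get('kind') != 't3':  # keep posts only; 't3' is the kind shown above
            continue
        post = child.get('data', {})
        row = {name: post.get(name) for name in fields}  # missing fields become None
        if row.get('created_utc') is not None:
            # Convert the epoch timestamp into a timezone-aware datetime
            row['created_at'] = datetime.fromtimestamp(row['created_utc'], tz=timezone.utc)
        rows.append(row)
    return rows

# Hypothetical usage with the response from the cell above:
# posts = flatten_listing(grab_post.json())
```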

## API Check is complete

I had never used the Reddit API before this project, so this notebook was for my own practice and learning. Now that I have made a connection and successfully pulled data, I want to automate this process.

It's one thing to have it run in a notebook cell by cell, but I'd like to streamline and automate this process by putting this into a callable function. See "etl-reddit-finance.ipynb" for the complete data collection.
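
As a rough idea of what that callable function might look like, the sketch below folds the cells above into one call. It is not the code from "etl-reddit-finance.ipynb": the name `fetch_hot_posts` and the injectable `http` parameter are assumptions made for this example, with `http` there so the function can be exercised with a stand-in client.

```python
def fetch_hot_posts(subreddit, client_id, secret, username, password,
                    user_agent='etlAPI/0.0.1', http=None):
    """Authenticate with the password grant and return the 'hot' listing as a dict."""
    if http is None:
        import requests  # imported lazily so a stand-in client can be passed instead
        http = requests

    headers = {'User-Agent': user_agent}
    token_response = http.post(
        'https://www.reddit.com/api/v1/access_token',
        auth=(client_id, secret),  # requests treats a 2-tuple as HTTP Basic auth
        data={'grant_type': 'password', 'username': username, 'password': password},
        headers=headers,
    )
    token_response.raise_for_status()
    token = token_response.json()['access_token']

    # Add the bearer token, then query the OAuth host
    headers['Authorization'] = f'bearer {token}'
    listing_response = http.get(
        f'https://oauth.reddit.com/r/{subreddit}/hot',
        headers=headers,
    )
    listing_response.raise_for_status()
    return listing_response.json()

# Hypothetical usage, reusing the names imported from config in the first cell:
# listing = fetch_hot_posts('wallstreetbets', CLIENT_ID, KEY, 'my_username', PW)
```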