# 23 - Accessing the Reddit API using the `praw` library

In [9]:
# In chapter 13 you learned to work with the Wikipedia REST API. Many other sites also
# have an API, such as Reddit. The Reddit API requires authentication, which makes it a bit harder
# to use than the Wikipedia API. That's why it's easier to let a library handle the difficult parts.
# We're going to use the 'praw' library. To install it, open a terminal and type 'pip install praw'.
# If you're working on Windows with the Anaconda package, use the 'Anaconda Prompt' to do this.
#
# Before you can use the API you need a Reddit account, and then you need to create credentials for your app.
# To do that, follow this tutorial (http://www.storybench.org/how-to-scrape-reddit-with-python/)
# until the heading about the 'shebang line'.
#
# To use praw, we first need to import the library
import praw

# Also, import pandas for later in this chapter
import pandas as pd

In [10]:
# We need to define two 'keys' to use with the Reddit API. These are supposed to be secret:
# enter them here, but don't share them with anyone! We also need a 'user agent string'.
# Simply replace 'YOUR_NAME' with your Reddit username.
#
# Note how we CAPITALIZE the variable names. This is a convention
# to indicate that these are 'constants' that shouldn't be changed.
CLIENT_ID = "CLIENT_ID_HERE"
CLIENT_SECRET = "CLIENT_SECRET_HERE"
USER_AGENT = f"python:{CLIENT_ID}:0.1 (by /u/YOUR_NAME)"
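
Since the keys are meant to stay secret, one common alternative to typing them into the notebook is to read them from environment variables. This is a minimal sketch, assuming you have set the variables `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USERNAME` yourself. Those names are made up for this example, and the fallbacks are the same placeholders used in this chapter.

```python
import os

# Hypothetical environment variable names: pick any names you like
# and set them in your shell before starting the notebook.
CLIENT_ID = os.environ.get("REDDIT_CLIENT_ID", "CLIENT_ID_HERE")
CLIENT_SECRET = os.environ.get("REDDIT_CLIENT_SECRET", "CLIENT_SECRET_HERE")
USERNAME = os.environ.get("REDDIT_USERNAME", "YOUR_NAME")

# Same user agent format as in the cell above
USER_AGENT = f"python:{CLIENT_ID}:0.1 (by /u/{USERNAME})"
print(USER_AGENT)
```

This way the notebook can be shared without the keys being visible in it.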

In [11]:
# Okay! Now we can create an instance of the api by using this command
api = praw.Reddit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    user_agent=USER_AGENT
)

In [12]:
# To check the api instance, see if this returns 'True'. We didn't pass a username
# and password, so the instance is read-only
api.read_only

True

In [13]:
# Let's try getting the 3 'hottest' submissions on the popular 'Ask Reddit' subreddit.
# 'Hot' means they're the most popular at the moment.
# Note the 'limit' argument: this is a keyword argument, which we pass with an equals sign, just like assigning a variable
submissions = api.subreddit('askReddit').hot(limit=3)
submissions

<praw.models.listing.generator.ListingGenerator at 0x106fb70f0>

In [14]:
# As you can see this is a 'generator', a kind of iterator. We can use 'for' to loop through these submissions
for sub in submissions:
    # Note how these objects don't use dictionaries: you access
    # data with dot notation. These data points are called object attributes.
    print(sub.title)

    # You can even get a nested attribute this way. The author is None
    # when the account has been deleted, so check for that first
    print(sub.author.name if sub.author else '[deleted]')

What is the worst casting decision in the history of film or tv?
judgejb63
Make-A-Wish recipients and workers, what were some wishes you HAD to say no to?
abeannis
[Serious] What's something dark you've had to acknowledge about yourself?
Cormsterr


In [15]:
# Note that when you want to use the data from the praw library in pandas you first need to convert it to
# a list of dictionaries
results = api.subreddit('askReddit').hot(limit=10)
submissions = []
for result in results:
    submissions.append({
        "title" : result.title,
        "score" : result.score,
        "comments" : result.num_comments
    })
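
The same conversion can also be written as a list comprehension. This is a minimal sketch that uses stand-in objects instead of real Reddit submissions, so it runs without credentials. The `fake_results` list and its values are made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for praw submissions: made-up values, with only the
# three attributes used in this chapter
fake_results = [
    SimpleNamespace(title="First question?", score=120, num_comments=45),
    SimpleNamespace(title="Second question?", score=80, num_comments=12),
]

# One dictionary per submission, with the same keys as in the loop
rows = [
    {"title": r.title, "score": r.score, "comments": r.num_comments}
    for r in fake_results
]
print(rows)
```

Passing `rows` to `pd.DataFrame()` gives one row per dictionary, with the keys as column names.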

In [16]:
df = pd.DataFrame(submissions)
df.head()

   comments  score                                              title
0     15738  24143  What is the worst casting decision in the hist...
1      4917  20555  Make-A-Wish recipients and workers, what were ...
2      1488   1434  [Serious] What's something dark you've had to ...
3      5846  36005  What’s something that’s really useful on the i...
4      3692   8045  What is your most NSFW story about walking in ...


In [17]:
# There are lots of properties ('attributes') that come with every submission. You can use Python's built-in
# vars() function to check them out. We need to convert the submissions to a list first to do that. We also need to
# request the submissions again, because a 'generator' is used up after one pass
submissions = api.subreddit('askReddit').hot(limit=3)
submissions = list(submissions)
vars(submissions[0])

{'_reddit': <praw.reddit.Reddit at 0x106fa2908>,
 'approved_at_utc': None,
 'subreddit': Subreddit(display_name='AskReddit'),
 'selftext': '',
 'author_fullname': 't2_zvcg7',
 'saved': False,
 'mod_reason_title': None,
 'gilded': 0,
 'clicked': False,
 'title': 'What is the worst casting decision in the history of film or tv?',
 'link_flair_richtext': [],
 'subreddit_name_prefixed': 'r/AskReddit',
 'hidden': False,
 'pwls': 6,
 'link_flair_css_class': None,
 'downs': 0,
 'parent_whitelist_status': 'all_ads',
 'hide_score': False,
 'name': 't3_9wtijh',
 'quarantine': False,
 'link_flair_text_color': 'dark',
 'author_flair_background_color': None,
 'subreddit_type': 'public',
 'ups': 24141,
 'domain': 'self.AskReddit',
 'media_embed': {},
 'author_flair_template_id': None,
 'is_original_content': False,
 'user_reports': [],
 'secure_media': None,
 'is_reddit_media_domain': False,
 'is_meta': False,
 'category': None,
 'secure_media_embed': {},
 'link_flair_text': None,
 'can_mod_post': F
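
A generator can only be looped through once, and that is not specific to praw: every Python generator works this way. A small sketch with a made-up generator function (`count_to` is only an illustration) shows why the submissions had to be requested again.

```python
def count_to(n):
    # A generator function: yields the numbers 1..n one at a time
    for i in range(1, n + 1):
        yield i

numbers = count_to(3)
first_pass = list(numbers)   # consumes the generator
second_pass = list(numbers)  # nothing left to yield

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

Converting to a list once, as in the cell above, keeps the items around so you can index into them or loop over them again.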

In [18]:
# Another way is to use the .json 'endpoint' that is available for most Reddit URLs. For example,
# this page:
# < https://www.reddit.com/r/askreddit >
# is also available in the JSON format like this:
# < https://www.reddit.com/r/askreddit.json?limit=10 >
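
As a rough sketch of how that JSON could be read without praw, the code below uses only the standard library. The helper names `fetch_listing` and `titles_from_listing` are made up for this example. The sample payload is a trimmed, hand-written imitation of the listing layout (a `data` object holding a `children` list), not real Reddit output. The fetch function is not called here and has not been checked against the live site, which may rate-limit or refuse requests.

```python
import json
import urllib.request

def fetch_listing(url, user_agent):
    # Download a .json URL and decode it into Python dictionaries and lists
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

def titles_from_listing(listing):
    # Each child wraps one submission in its own 'data' dictionary
    return [child["data"]["title"] for child in listing["data"]["children"]]

# Hand-written sample in the same shape, so this runs offline
sample = {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"title": "Example question one?"}},
    {"kind": "t3", "data": {"title": "Example question two?"}},
]}}
print(titles_from_listing(sample))
```

Note that this data comes back as nested dictionaries, so you use square brackets here instead of the dot notation that praw objects use.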

In [19]:
# To get the comments for a submission you can use the 'comments' attribute of a submission.
# We do need a little magic to filter out the entries that aren't really comments
# but 'load more comments' placeholders.
# We're just going to take the first submission from the list as an example
for comment in submissions[0].comments:
    # Check if this is a 'more comments' node and skip those.
    # 'continue' jumps straight to the next iteration of the loop
    if isinstance(comment, praw.models.MoreComments):
        continue
        
    # Limit the body text to the first 100 chars
    print(comment.body[0:100])

The fake baby on American Sniper.
Imagine this. It's the mid-'80s. You're making a badass action movie about an immortal Scottish warr
Will Smith’s kid in Day the Earth Stood Still
remember that time on that 70s show where they replaced laurie and didn’t expect anyone to notice?
Didn't actually end up getting cast, but Nicolas Cage was almost cast as Aragorn in Peter Jackson's 
Remember that movie where Josh Peck and Chris Hemsworth were supposed to be brothers?

edit: typo
Abigail Breslin as Baby in the abc Dirty Dancing remake. It was fucking awful.
For a cameo, Jimmy Fallon in Band of Brothers. Takes me right out of the scene.

Oddly enough, you'd
Mickey Rooney as Mr. Yunioshi in Breakfast at Tiffany's
John Wayne as Genghis Khan in The Conquerer
Everyone in The Last Air Bender, bonus points for also having terrible screen writer and director de
Moisés Arias as Bonzo in Enders Game. The bullying scenes don't work when Bonzo is half the size of 
Dakota Johnson and Jamie Dornan in Fift
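
praw also offers `replace_more()` on a submission's comments, which can take the place of the `isinstance` check. Calling it with `limit=0` drops the 'more comments' placeholders without fetching what is behind them. This is a minimal sketch wrapped in a made-up helper, `top_level_bodies`. The `_Fake...` classes are not part of praw: they only imitate the two things the helper touches, so the example runs without credentials.

```python
def top_level_bodies(submission, max_chars=100):
    # Remove the 'more comments' placeholders, then read what is left
    submission.comments.replace_more(limit=0)
    return [comment.body[:max_chars] for comment in submission.comments]

# --- stand-ins so the sketch runs offline; not part of praw ---
class _FakeComment:
    def __init__(self, body):
        self.body = body

class _FakeForest(list):
    def replace_more(self, limit=32):
        # Keep only the entries that have a body, like real comments
        self[:] = [c for c in self if hasattr(c, "body")]

class _FakeSubmission:
    def __init__(self, comments):
        self.comments = _FakeForest(comments)

fake = _FakeSubmission([_FakeComment("x" * 150), object(), _FakeComment("short")])
print(top_level_bodies(fake))
```

With a real submission from the cells above, the call would be `top_level_bodies(submissions[0])`.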