In [1]:
# !pip install praw (Python Reddit API Wrapper)
import praw

# !pip install tweepy (Twitter API client)
import tweepy

import pandas as pd

# Introduction to Social Media Data

Data from social media platforms are an increasingly important source for research. Fact-checking matters just as much, because misinformation, hate speech, bots, and trolling are increasingly prevalent. [Reddit](https://www.reddit.com/) and [Twitter](https://twitter.com/) are two of the most popular social media websites.

# Reddit

Let's follow this tutorial to see how you can connect to Reddit's [API](https://en.wikipedia.org/wiki/Application_programming_interface): 

https://pythonprogramming.net/introduction-python-reddit-api-wrapper-praw-tutorial/

[The praw documentation is also very useful](https://praw.readthedocs.io/en/latest/getting_started/quick_start.html)!

1. Visit https://www.reddit.com/
2. Click Sign Up/Log In to create an account or log in
3. Click the drop-down menu next to your username and select "Visit Old Reddit"
4. Click Preferences --> Apps --> Create New App
5. Click the "script" radio button
6. Give your project a name and description
7. Enter an "about URL" if you choose (such as your project name)
8. Enter http://localhost:8080 for the "redirect uri"

# Authenticate!

Create an instance and add your information: 

- client_id = the code under your project name in the upper-left
- client_secret = the "secret" string shown on your app's page
- password = your password
- user_agent = put something like: 'dhe 1.0 by /u/dh_example'
- username = your username
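
Typing these five values straight into a notebook makes it easy to share them by accident. One alternative, sketched below, is to read them from environment variables. The variable names (`REDDIT_CLIENT_ID` and so on) and the helper `load_reddit_credentials` are hypothetical and not part of praw; use whatever names suit your setup.

```python
import os

# Hypothetical environment variable names; pick any names you like.
REDDIT_ENV_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "password": "REDDIT_PASSWORD",
    "user_agent": "REDDIT_USER_AGENT",
    "username": "REDDIT_USERNAME",
}


def load_reddit_credentials(environ=os.environ):
    """Collect the five praw.Reddit keyword arguments from environment variables."""
    missing = [var for var in REDDIT_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in REDDIT_ENV_VARS.items()}


# Usage (requires praw and the variables above to be set):
# import praw
# reddit = praw.Reddit(**load_reddit_credentials())
```

The helper raises a `KeyError` that names every missing variable, so a forgotten value shows up before any request is sent to Reddit.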

# Create a Reddit API instance

In [2]:
# We need these 5 things - let's overwrite them with our own!
reddit = praw.Reddit(client_id='clientid',
                     client_secret='secret', password='password',
                     user_agent='PrawTut', username='username')

In [3]:
reddit = praw.Reddit(client_id='????????',
                     client_secret='????????', password='????????',
                     user_agent='dhe 1.0 by /u/dh_example', username='dh_example')

In [4]:
# Check out a subreddit
subreddit = reddit.subreddit('BlackLivesMatter')

In [5]:
# Sort the BlackLivesMatter subreddit by hot posts
hot_blm = subreddit.hot()

In [None]:
# Check out reddit methods (type "reddit." and press Tab, or list them with dir)
dir(reddit)

In [None]:
# Check out subreddit methods (type "subreddit." and press Tab, or list them with dir)
dir(subreddit)

In [6]:
# Iterate to get the object IDs
for submission in hot_blm:
    print(submission)

gswtuh
gyywij
gyt2gq
gz1mnj
gyy56b
gyqw5w
gyfsgy
gymee8
gyxuya
gyrjrg
gyk7z3
gz4578
gyxom0
gyr88a
gylgp0
gyyhap
gyzn9l
gz0s81
gyzw17
gyie43
gyqm4p
gydwol
gyq441
gyrcl3
gyj8f8
gz0idm
gytx6h
gz46pr
gz2ntq
gz3qdg
gypxuu
gz127m
gz23v1
gyhtpi
gz3415
gz1n3s
gys2st
gyw1wi
gygw6r
gz3oyb
gywfas
gyu6q5
gyn4vd
gz0056
gz5kpw
gymfjr
gyzlow
gywwpf
gz4y7q
gz0n5v
gykono
gz21es
gz4fw4
gz065g
gytecq
gz497n
gz415m
gz1ijt
gz3uxl
gyrzv7
gz2x65
gz2udf
gyfck7
gy5zwq
gyo0pa
gz2a49
gz29ft
gyzkbr
gz1s58
gz1n8g
gz1m5x
gyz4kr
gz5iqs
gz5ch8
gz5b4p
gz58xa
gz580a
gz578h
gyvsvv
gyvs6w
gz4vys
gye4s7
gz4mjj
gz4hda
gy6f23
gz06hv
gypsev
gyinns
gz028n
gyzzem
gyv3nu
gz401c
gyvs0v
gyebjw
gz3ngd
gyq15z
gz3kfo
gz3ju3
gz3ijp
gynw7k


In [7]:
# Return just the first 5 and print their titles
hot_blm = subreddit.hot(limit = 5)
for submission in hot_blm:
    print(submission.title)

If you are looking for a local protest, try searching on facebook!
Open Letter to Steve Huffman and the Board of Directors of Reddit, Inc– If you believe in standing up to hate and supporting black lives, you need to act
In LA today. Biggest protests yet.
Great Energy Here in London UK during the Protests tiktok (Lovechild1999)
Apple Maps satellite view updated to show Black Lives Matter street painting.


In [8]:
# Define a blank dictionary to store the metadata
conversedict = {}

# Get more information
hot_blm = subreddit.hot(limit = 5)
for submission in hot_blm:
    if not submission.stickied:
        print('Title: {}, ups: {}, downs: {}, Have we visited?: {}'.format(submission.title,
                                                                           submission.ups,
                                                                           submission.downs,
                                                                           submission.visited))
        
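        # Drop the "load more comments" placeholders, then walk every comment
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            if comment.id not in conversedict:
                # Store [comment text, {reply_id: [ups, reply text]}]
                conversedict[comment.id] = [comment.body,{}]
                # A reply (its parent is not the submission) is also filed under its parent
                if comment.parent() != submission.id:
                    parent = str(comment.parent())
                    conversedict[parent][1][comment.id] = [comment.ups, comment.body]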

Title: In LA today. Biggest protests yet., ups: 2002, downs: 0, Have we visited?: False
Title: Great Energy Here in London UK during the Protests tiktok (Lovechild1999), ups: 169, downs: 0, Have we visited?: False
Title: Apple Maps satellite view updated to show Black Lives Matter street painting., ups: 66, downs: 0, Have we visited?: False
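
To see what `conversedict` ends up holding, here is a small offline sketch of the same bookkeeping. It swaps the praw comment objects for made-up tuples of `(comment_id, parent_id, ups, body)`, so every ID and text below is invented for illustration. It also assumes, as the cell above does, that a parent comment is always listed before its replies.

```python
# Stand-in records: (comment_id, parent_id, ups, body). All values are made up.
SUBMISSION_ID = "post1"
comments = [
    ("c1", "post1", 10, "Top-level comment"),
    ("c2", "post1", 4, "Another top-level comment"),
    ("c3", "c1", 7, "First reply to c1"),
    ("c4", "c1", 2, "Second reply to c1"),
    ("c5", "c3", 1, "Reply to a reply"),
]

conversedict = {}
for comment_id, parent_id, ups, body in comments:
    if comment_id not in conversedict:
        # Each entry is [comment text, {reply_id: [ups, reply text]}]
        conversedict[comment_id] = [body, {}]
        # Top-level comments have the submission as their parent, so skip them
        if parent_id != SUBMISSION_ID:
            conversedict[parent_id][1][comment_id] = [ups, body]

print(conversedict["c1"])
```

Every comment gets its own entry, and each reply is stored a second time inside its parent's dictionary of replies. That is why a reply to a reply (`c5` here) appears under `c3` rather than under `c1`.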


In [9]:
for post_id in conversedict:
    message = conversedict[post_id][0]
    replies = conversedict[post_id][1]
    if len(replies) > 1:
        print('Original Message: {}'.format(message))
        print(35*'_')
        print('Replies:')
        for reply in replies:
            print(replies[reply])

Original Message: Image how big it could be if a lot of people weren't still rightfully afraid of being in large crowds with no distancing like this.
___________________________________
Replies:
[25, 'I live in L.A. and have been involved in a lot of progressive community activism in L.A. I know there are a ton of people sitting out over COVID fears and to still see this turnout is so fucking wonderful.']
[22, "Then they'd all be busy with their wage slavery"]
[1, 'Or afraid of the police themselves. I’m asthmatic, and worry pepper spray or tear gas could end my life. I’m sure I’m not alone having difficulty weighing these decisions.']
Original Message: I can’t help but consider that COVID-19 is still going on. I’m proud of them for standing up for their rights despite the risk, I wish I could take that risk to. Unfortunately immuno compromised
___________________________________
Replies:
[32, "You DON'T have to protest to make a difference! Talk with family and friends about the probl

In [10]:
# Define a blank list for export to data frame
reddit_output = []

for post_id in conversedict:
    message = conversedict[post_id][0]
    replies = conversedict[post_id][1]
    if len(replies) > 1:
        print('Original Message: {}'.format(message))
        print(35*'_')
        print('Replies:')
        for reply in replies:
            reddit_output.append(replies[reply])

Original Message: Image how big it could be if a lot of people weren't still rightfully afraid of being in large crowds with no distancing like this.
___________________________________
Replies:
Original Message: I can’t help but consider that COVID-19 is still going on. I’m proud of them for standing up for their rights despite the risk, I wish I could take that risk to. Unfortunately immuno compromised
___________________________________
Replies:
Original Message: Black Lives Matter, Smash the patriarchy, and crush the police along with capitalism. I've never been so proud of my country's people. I love you all.
___________________________________
Replies:
Original Message: I feel like this is getting out of control . It’s literally global now . I sincerely hope that it all works out. And even more , I sincerely hope this doesn’t spread corona like wildfire.
___________________________________
Replies:
Original Message: I'm flabbergasted that all the messages about Covid-19 are being

In [11]:
# View the output of the variable
reddit_output

[[25,
  'I live in L.A. and have been involved in a lot of progressive community activism in L.A. I know there are a ton of people sitting out over COVID fears and to still see this turnout is so fucking wonderful.'],
 [22, "Then they'd all be busy with their wage slavery"],
 [1,
  'Or afraid of the police themselves. I’m asthmatic, and worry pepper spray or tear gas could end my life. I’m sure I’m not alone having difficulty weighing these decisions.'],
 [32,
  "You DON'T have to protest to make a difference! Talk with family and friends about the problems, write letters and call legislators, vote especially for local elections, sign petitions like the mail in voting one going around, donate to ACLU NAACP etc! There's so much work to be done and we can all play a part!"],
 [21,
  'I would rather be at an outdoor protest in the sunshine, with a mask on, than in an indoor bar or restaurant without one'],
 [6,
  "I'm immunocompromised too and there are definitely  a  ton of ways to get i

# Convert to data frame

In [12]:
reddit_df = pd.DataFrame(reddit_output, columns = ["Upvotes", "Text"])
reddit_df.head()

   Upvotes  Text
0       25  I live in L.A. and have been involved in a lot...
1       22  Then they'd all be busy with their wage slavery
2        1  Or afraid of the police themselves. I’m asthma...
3       32  You DON'T have to protest to make a difference...
4       21  I would rather be at an outdoor protest in the...
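
The data frame above keeps each reply's upvotes and text but drops the message it was replying to. If you want to keep that link, one option is to build one row per reply that also carries the parent's ID and text. The sketch below uses a made-up dictionary in the same `[text, {reply_id: [ups, text]}]` shape as `conversedict`; the function name `flatten_replies` and the column names are assumptions for illustration. Unlike the loop above, it keeps every reply rather than only those under messages with more than one reply.

```python
def flatten_replies(conversedict):
    """Turn {id: [text, {reply_id: [ups, text]}]} into one row per reply."""
    rows = []
    for parent_id, (message, replies) in conversedict.items():
        for reply_id, (upvotes, text) in replies.items():
            rows.append({
                "ParentID": parent_id,
                "ParentText": message,
                "ReplyID": reply_id,
                "Upvotes": upvotes,
                "Text": text,
            })
    return rows


# Made-up example data in the same shape as conversedict
example = {
    "c1": ["Top-level comment", {"c3": [7, "First reply"], "c4": [2, "Second reply"]}],
    "c3": ["First reply", {}],
}
rows = flatten_replies(example)

# A list of dictionaries can be passed straight to pandas:
# reddit_df = pd.DataFrame(rows)
```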


In [13]:
# Save the replies to a CSV file
reddit_df.to_csv("blm reddit.csv")

# Twitter

Twitter works similarly, but you have to fill out more information to get permission to use their API. 

Here is a nice Tweepy walkthrough: 

https://realpython.com/twitter-bot-python-tweepy/

[The Tweepy docs are also very useful!](http://docs.tweepy.org/en/latest/)

1. Visit https://twitter.com/ and create an account or log in
2. Visit the Twitter Developer site: https://developer.twitter.com/en and create an account or log in (you can skip step 1 if you wish)
3. Click the dropdown menu and click "Apps" --> "Create an app"
4. Fill out the information for your app and confirm your email to get authenticated.
5. Click "Tokens and keys" and generate your access keys

# Authenticate!

In [14]:
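# Similar to reddit! Pass your consumer key and consumer secret...
auth = tweepy.OAuthHandler("????????", "????????")
# ...then your access token and access token secret
auth.set_access_token("????????", "????????")

# What do these arguments do? wait_on_rate_limit makes Tweepy sleep until the
# rate limit resets instead of raising an error; wait_on_rate_limit_notify
# prints a message while it waits (Tweepy 3.x only; it was removed in Tweepy 4.0)
api = tweepy.API(auth, wait_on_rate_limit=True,
    wait_on_rate_limit_notify=True)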

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

In [15]:
# Sanity check - did you authenticate correctly?
try:
    api.verify_credentials()
    print("Success!")
except Exception as err:
    # Show the reason instead of silently swallowing every error
    print("Invalid authentication:", err)

Success!


In [None]:
# tweepy methods! (type "tweepy." and press Tab, or list them with dir)
dir(tweepy)

In [None]:
# Maybe the built-in help can provide better explanations? (the trailing ? only works in IPython/Jupyter)
tweepy.API?

In [16]:
# Look up a user and their followers
user = api.get_user(screen_name="billnye")

print("User details:")
print(user.name)
print(user.description)
print(user.location)

print("Last 20 Followers:")
for follower in user.followers():
    print(follower.name)

User details:
Bill Nye
Everyone you will ever meet knows something you don’t.
Los Angeles, CA
Last 20 Followers:
Supreme
Samuel
__lane__
Penta News
Philo
Amy Chip
Anderson Toribio
VanadiumCoffeecup
Matilda Butner
Derrick M
Mirka
Vanessa Pinheiro 🛰
Omar Ulises López López
LaVanda T. Luna
Josie
veecee Victor
Zora Johnson
Rainman
The Nerd
aja ussrey


In [17]:
# Get the User object for a twitter handle...
user = api.get_user(screen_name='billnye')

In [18]:
print(user)

User(_api=<tweepy.api.API object at 0x114db7eb8>, _json={'id': 37710752, 'id_str': '37710752', 'name': 'Bill Nye', 'screen_name': 'BillNye', 'location': 'Los Angeles, CA', 'profile_location': None, 'description': 'Everyone you will ever meet knows something you don’t.', 'url': 'https://t.co/scBos2f6vs', 'entities': {'url': {'urls': [{'url': 'https://t.co/scBos2f6vs', 'expanded_url': 'http://billnye.com', 'display_url': 'billnye.com', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 6004779, 'friends_count': 124, 'listed_count': 20285, 'created_at': 'Mon May 04 17:42:04 +0000 2009', 'favourites_count': 150, 'utc_offset': None, 'time_zone': None, 'geo_enabled': True, 'verified': True, 'statuses_count': 2364, 'lang': None, 'status': {'created_at': 'Fri Jun 05 20:52:53 +0000 2020', 'id': 1269009321385320449, 'id_str': '1269009321385320449', 'text': 'Many of us are having a hard time looking up right now. It’s a painful moment. We here at The Plane

In [19]:
def tweets(user_name, num_tweets = 20):
          
    # Authorize yourself
    # consumer_key, consumer_secret
    auth = tweepy.OAuthHandler("????????", 
                                   "????????")
  
    # Provide your tokens
    # access_key, access_secret
    auth.set_access_token("????????", 
                              "????????")
  
    # Define an API instance
    api = tweepy.API(auth)
  
    # Get the most recent tweets (20 by default)
    timeline = api.user_timeline(screen_name = user_name, count = num_tweets)
  
    # Return just the text of each tweet
    return [tweet.text for tweet in timeline]

In [20]:
twitter_output = tweets("billnye")
print(twitter_output)

['Many of us are having a hard time looking up right now. It’s a painful moment. We here at The Planetary Society rec… https://t.co/oKOnWmlSLX', 'RT @neiltyson: "Reflections on the Color of my Skin"\n\n[Commentary: 2400 words]\n\nhttps://t.co/QBG784yg3R https://t.co/ZdA5ni777x', 'I’ve had a lot on my mind — I hope you have, too. https://t.co/CbvlFF1nY6', 'Our future in space looks even brighter today!', 'A US-built Falcon 9 rocket, launched from the historic Kennedy Space Center, took two astronauts to the Internation… https://t.co/a94vVfndGI', 'On behalf of @exploreplanets, a tremendous congratulations to the teams at @SpaceX and @NASA!! #SpaceX made history… https://t.co/1V5oxzjlaA', "RT @exploreplanets: Planetary Society Member roll call, who's watching the #NASA #CrewDragon broadcast right now? 🙌\n\nhttps://t.co/hFlZUq4N9L", "Today marks the beginning of a new era of human spaceflight! Watch the #SpaceX Crew Dragon's first astronaut flight… https://t.co/EOT6PoovH6", 'Tomorrow, @Spa
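
The function above returns only the tweet text. If you also want the author and the timestamp, you could pull a few more attributes from each tweet into a dictionary. The sketch below runs offline on a stand-in object with made-up values; the helper `tweet_to_record` is hypothetical, and the attribute names (`user.screen_name`, `created_at`, `text`) are assumed to match the Status objects that `api.user_timeline()` returns.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_to_record(tweet):
    """Pull a few fields from a tweet-like object into a flat dictionary."""
    return {
        "User": tweet.user.screen_name,
        "Created": tweet.created_at.isoformat(),
        "Tweet": tweet.text,
    }


# Stand-in object with made-up values, shaped like a Tweepy Status object
fake_tweet = SimpleNamespace(
    user=SimpleNamespace(screen_name="dh_example"),
    created_at=datetime(2020, 6, 5, 20, 52, 53, tzinfo=timezone.utc),
    text="Hello, world!",
)

records = [tweet_to_record(fake_tweet)]
print(records)
```

With real data you would build the list from the timeline instead, for example `[tweet_to_record(t) for t in api.user_timeline(screen_name="billnye")]`, and pass it to `pd.DataFrame`.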

In [21]:
# Convert to data frame
twitter_df = pd.DataFrame(twitter_output, columns = ["Tweet"])
twitter_df.head()

   Tweet
0  Many of us are having a hard time looking up r...
1  RT @neiltyson: "Reflections on the Color of my...
2  I’ve had a lot on my mind — I hope you have, t...
3  Our future in space looks even brighter today!
4  A US-built Falcon 9 rocket, launched from the ...


In [22]:
twitter_df.to_csv("billnye tweets.csv")