# Social Media Mining: Twitter API
### Vincent Malic - Spring 2018

## Part I. Getting Data with Twitter API

### 1.1 Create Authorization and Credentials for API
* Save API key and API secret in variables
* Import Tweepy library
* Make "auth" object containing credentials to access API
* Create "api" object using tweepy

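### Create the `api` object with `tweepy.API`, passing three arguments:
* the `auth` object
* `wait_on_rate_limit=True`, so tweepy sleeps until the rate limit resets instead of raising an error
* `wait_on_rate_limit_notify=True`, so tweepy prints a message whenever it has to wait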

In [1]:
API_KEY = ""
API_SECRET = ""

In [2]:
import tweepy

In [3]:
auth = tweepy.AppAuthHandler(API_KEY, API_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
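Hard-coding the key and secret in a notebook makes them easy to leak when the notebook is shared. A minimal sketch, assuming you export the credentials as environment variables named `TWITTER_API_KEY` and `TWITTER_API_SECRET` (hypothetical names, pick your own), reads them at runtime instead:

```python
import os

def load_credentials(key_var="TWITTER_API_KEY", secret_var="TWITTER_API_SECRET"):
    """Read the API key and secret from environment variables.

    The default variable names are hypothetical; use whatever names you export.
    Returns a (key, secret) tuple and raises RuntimeError if either is missing.
    """
    key = os.environ.get(key_var, "")
    secret = os.environ.get(secret_var, "")
    if not key or not secret:
        raise RuntimeError(f"Set {key_var} and {secret_var} before authenticating")
    return key, secret

# Usage in the notebook (requires tweepy and real credentials):
# API_KEY, API_SECRET = load_credentials()
# auth = tweepy.AppAuthHandler(API_KEY, API_SECRET)
```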

### Now we are ready to use api to get Twitter data
* Use the `get_user()` method with one argument: the screen name of the user to get data from
* It returns an object that represents the IUBloomington account
* Assign that output to a variable called `user`

In [4]:
user = api.get_user("IUBloomington")

### User object contains a lot of information
* Look at individual attributes of the user object, starting with `name`
* Other attributes: `location`, `time_zone`, `friends_count`, `followers_count`

In [5]:
user.name

'Indiana University'

In [6]:
user.location

'Bloomington, IN'

In [7]:
user.time_zone

'Eastern Time (US & Canada)'

In [8]:
user.friends_count

631

In [9]:
user.followers_count

210813

### Use `dir()` to list attributes and methods
* The `dir()` function lists all attributes and methods available on an object
* Alternatively, read a single attribute directly: `created_at` gives the date and time the account was created
* `user.name` returns the account's display name as a string

In [10]:
dir(user)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_api',
 '_json',
 'contributors_enabled',
 'created_at',
 'default_profile',
 'default_profile_image',
 'description',
 'entities',
 'favourites_count',
 'follow',
 'follow_request_sent',
 'followers',
 'followers_count',
 'followers_ids',
 'following',
 'friends',
 'friends_count',
 'geo_enabled',
 'has_extended_profile',
 'id',
 'id_str',
 'is_translation_enabled',
 'is_translator',
 'lang',
 'listed_count',
 'lists',
 'lists_memberships',
 'lists_subscriptions',
 'location',
 'name',
 'notifications',
 'parse',
 'parse_list',
 'profile_background_color',
 'profile_background_image_url',
 'profile_back

In [11]:
user.created_at

datetime.datetime(2008, 12, 29, 20, 18, 36)

In [12]:
user.name

'Indiana University'
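Most of the `dir()` listing above is Python's double-underscore machinery plus tweepy internals such as `_api` and `_json`. A small filter keeps only the public names; the `FakeUser` class below is a hypothetical stand-in so the sketch runs without the API, and in the notebook you would call `public_attributes(user)` instead:

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

class FakeUser:
    # Hypothetical stand-in for a tweepy User object, with two values
    # copied from the IUBloomington output above.
    def __init__(self):
        self.name = "Indiana University"
        self.location = "Bloomington, IN"
        self._json = {}

print(public_attributes(FakeUser()))  # ['location', 'name']
```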

# Summary Information about User and Followers
* The `user.followers()` method does not require any parameters
* It returns a list of `User` objects, one per follower, assigned to `iufollowers`; only the first page of followers (20 by default) comes back
* Iterate through this list of people who follow IUBloomington

In [13]:
iufollowers = user.followers()

for f in iufollowers[:10]:
    print(f.name)
    print(f.description)
    print("*"*50)

Gretchen Lightfoot
RYT-500, devoted yoga practitioner, hiker, daughter, wife, mother, avid reader
**************************************************
Brad Taylor
Mishawaka Football #39🏈
**************************************************
david vandeventer

**************************************************
Science on Tap
An educational outreach non-profit bringing scientists from IU to the greater Bloomington community through casual, 60-min.  science discussions over brew!🌱🍻
**************************************************
Micki Knuckles
~mom~wife~nurse~
**************************************************
Alice Burnette
Proverbs 3:5-6 • MHS • Live Life Laughing #KingdomWorker
**************************************************
Jess
vhs '19 // yike
**************************************************
Morgan Seeman
I’m just out here to have a good time with a stinker named Jake
**************************************************
Morgan Howard

************************************************

### Summary of Geospatial location of IUB Friends
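* Create an empty list to store location information
* Iterate through the accounts IUBloomington follows (its "friends"); `user.friends()` returns only the first page, 20 accounts by default
* Append each friend's free-text profile `location` to the `geolocations` list, then display it (the slice `[:25]` shows all 20 entries)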

In [14]:
geolocations = []

for f in user.friends():
    l = f.location 
    geolocations.append(l)
    

In [15]:
geolocations[:25]

['Bloomington, IN',
 'Bloomington, IN',
 'Bloomington IN',
 'Bloomington, IN',
 'Jacobs School of Music',
 'Bloomington, IN',
 '',
 'Indiana University',
 'Indiana, USA',
 'Cardwell, Montana',
 'Bloomington, Indiana',
 'Bloomington, IN',
 'Bloomington, IN',
 'Bloomington, IN',
 'Bloomington, IN',
 'Bloomington ',
 'Bloomington, IN',
 'Indiana University Bloomington',
 'Bloomington, IN',
 'Indiana, USA']
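The location field is free text, so the same place shows up as `'Bloomington, IN'`, `'Bloomington IN'` and `'Bloomington '`. A minimal sketch of a summary, assuming crude normalization (lowercase, drop commas, collapse whitespace) is good enough, counts the cleaned strings with `collections.Counter` and skips empty entries:

```python
from collections import Counter

def normalize_location(loc):
    """Lowercase, drop commas and collapse whitespace so that
    'Bloomington, IN' and 'Bloomington IN' count as the same place."""
    return " ".join(loc.replace(",", " ").lower().split())

def count_locations(locations):
    """Count normalized, non-empty location strings."""
    return Counter(normalize_location(l) for l in locations if l.strip())

# A few values copied from the geolocations output above
sample = ["Bloomington, IN", "Bloomington IN", "", "Indiana, USA",
          "Bloomington ", "Indiana, USA"]
counts = count_locations(sample)
print(counts.most_common())
```

This normalization still treats `'bloomington'` and `'bloomington in'` as different places; merging those would need a lookup table or a geocoding service.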

# Explore Tweet Attributes
* Behind the scenes, Twitter refers to tweets as "statuses"
* Use the `get_status()` method to get information about a particular status
* Pass the tweet's ID (the number at the end of its URL) as the argument, and save the returned status object
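Tweet URLs follow the pattern `https://twitter.com/<screen_name>/status/<id>`, so the ID passed to `get_status()` can be pulled out with a regular expression. A minimal sketch; the URL below is illustrative, built from the ID and author used in this notebook:

```python
import re

def status_id_from_url(url):
    """Extract the numeric tweet ID from a URL of the form
    https://twitter.com/<screen_name>/status/<id>."""
    match = re.search(r"/status/(\d+)", url)
    if match is None:
        raise ValueError(f"No status ID found in {url!r}")
    return int(match.group(1))

url = "https://twitter.com/IUBloomington/status/962421117850963968"
print(status_id_from_url(url))  # 962421117850963968
# In the notebook: status = api.get_status(status_id_from_url(url))
```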


In [16]:
status = api.get_status(962421117850963968)

In [17]:
dir(status)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_api',
 '_json',
 'author',
 'contributors',
 'coordinates',
 'created_at',
 'destroy',
 'entities',
 'favorite',
 'favorite_count',
 'favorited',
 'geo',
 'id',
 'id_str',
 'in_reply_to_screen_name',
 'in_reply_to_status_id',
 'in_reply_to_status_id_str',
 'in_reply_to_user_id',
 'in_reply_to_user_id_str',
 'is_quote_status',
 'lang',
 'parse',
 'parse_list',
 'place',
 'possibly_sensitive',
 'possibly_sensitive_appealable',
 'retweet',
 'retweet_count',
 'retweeted',
 'retweets',
 'source',
 'source_url',
 'text',
 'truncated',
 'user']

## Look at Tweet attributes
* `text` shows the actual text of the tweet
* `source` shows the client application the tweet was sent from

### Proxy for popularity of tweet: retweet and favorite counts
* `retweet_count` shows how many times the tweet was retweeted
* `favorite_count` shows how many times the tweet was favorited (liked)

In [18]:
status.text

"While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champ… https://t.co/OGXM6wGhjC"

In [19]:
status.source

'Twitter for iPad'

In [20]:
status.retweet_count

28

In [21]:
status.favorite_count

159
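The two counts can be combined into a single popularity score for ranking tweets. A minimal sketch, assuming an unweighted sum of retweets and favorites is an acceptable proxy; `SimpleNamespace` objects stand in for tweepy `Status` objects, and only the first one uses counts from the output above (the other two are hypothetical):

```python
from types import SimpleNamespace

def engagement(status):
    """Simple popularity proxy: retweets plus favorites, equally weighted."""
    return status.retweet_count + status.favorite_count

# Hypothetical stand-ins for tweepy Status objects
statuses = [
    SimpleNamespace(id=962421117850963968, retweet_count=28, favorite_count=159),
    SimpleNamespace(id=1, retweet_count=5, favorite_count=40),
    SimpleNamespace(id=2, retweet_count=60, favorite_count=10),
]
ranked = sorted(statuses, key=engagement, reverse=True)
print([s.id for s in ranked])
```

Retweets spread a tweet to new audiences while favorites do not, so you may prefer to weight `retweet_count` more heavily; the equal weighting here is only one choice.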

### Identify the Author of status (i.e., "tweet")
* The author is itself a `User` object, assigned to `u`
* Look at attributes such as `name` and `description`
* Attributes can be chained: from a status you can reach its author's `User` object and that object's attributes in one expression, e.g. `status.author.name`

In [22]:
u = status.author

In [23]:
u.name

'Indiana University'

In [24]:
u.description

'The official Twitter feed from the Bloomington campus of Indiana University. Go Hoosiers!'

In [25]:
status.author.name

'Indiana University'

# Examine Retweets
* Call the `retweets()` method on the status object and assign the result to `retweets`
* Iterate through the first five retweets with the variable `r` and print the text
* Print each retweet author's screen name

In [26]:
retweets = status.retweets()

for r in retweets[:5]:
    print(r.text)

RT @IUBloomington: While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champion a…
RT @IUBloomington: While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champion a…
RT @IUBloomington: While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champion a…
RT @IUBloomington: While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champion a…
RT @IUBloomington: While the world's eyes are on #PyeongChang2018, we want to wish a happy 68th birthday to our 9-time #Olympics champion a…


In [27]:
for r in retweets[:5]:
    print("Who:")
    print(r.author.screen_name)

Who:
Emiel34
Who:
farmbrat_
Who:
missesaunt
Who:
in_bureau
Who:
HoosierMick


# Searching the Twitter API: 
* Use the api object's `search()` method and save the search results
* The `q` argument holds the query, i.e. the search topic, e.g. `#IUBB` for IU Basketball
* The result is a list of statuses that can be iterated over with a `for` loop

### Language filter
* A search may return tweets in other languages
* Pass the `lang` argument with an ISO 639-1 language code to restrict results, e.g. `lang="en"` for English

In [28]:
search_results = api.search(q="#PyeongChang2018", lang="en")

for status in search_results[:5]:
    print(status.text)
    print(status.created_at)
    print("*"*50)

RT @CGTNOfficial: Live: Preview the opening ceremony of #PyeongChang2018 which takes place at 20:00 local time in subzero temperatures http…
2018-02-12 00:22:16
**************************************************
RT @tictoc: These are the biggest moments of the #PyeongChang2018 #Olympics opening ceremony https://t.co/FeOfCrrXJT https://t.co/HNpkW02xNx
2018-02-12 00:22:12
**************************************************
RT @Olympics: Good morning from PyeongChang! #PyeongChang2018 #Olympics https://t.co/3ceCVCTjAM
2018-02-12 00:22:12
**************************************************
What are we calling the camera used to film the #luge runs from the point of view of the sled as it goes down the c… https://t.co/jrHZT837J7
2018-02-12 00:22:10
**************************************************
RT @reuterspictures: Redmond Gerard wins USA’s first gold medal of #PyeongChang2018 with his victory in the snowboarding slopestyle competi…
2018-02-12 00:22:07
************************************

In [29]:
search_results = api.search(q="#opioids")

for status in search_results:
    print(status.text)
    print("*"*50)

RT @picardonhealth: #Kratom is hailed as a natural pain remedy, but it is chemically similar to #opioids, and it's deadly https://t.co/RyaQ…
**************************************************
RT @CannabisCulture: Medical Pot Is Our Best Hope to Fight the Opioid Epidemic #MMJ #addiction #opioids #epidemic https://t.co/VqoDUEpaIZ h…
**************************************************
RT @StormyVNV: #Military #Veterans defy Jeff Sessions, fight for #medical #marijuana to kick #opioids #Addiction https://t.co/lvs6Lqa3N6 #C…
**************************************************
RT @HollyKai2: "There is no recovery for someone falsely accused".   #ChronicPain patients are NOT addicts or "addicted" to #opioids becaus…
**************************************************
@realDonaldTrump u sure know how to pick them. Another judge #RoyMoore is what you had on Mitch @SenateMajLdr… https://t.co/tpJtyVu2Vz
**************************************************
RT @AAPainManage: Get the optimal management o

## Filter Search by Location: 
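* Pass `geocode` as an argument to the `search()` method
* Its value is one string with three comma-separated parts, `"latitude,longitude,radius"`, where the radius ends in `km` or `mi` (e.g. `"48.86,2.35,20km"`, a 20 km circle around central Paris)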
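Because the whole location is packed into one string, a typo (a missing unit, swapped coordinates) is easy to make. A minimal sketch of a hypothetical helper, `make_geocode`, that builds and sanity-checks the string before it is passed to `search()`:

```python
def make_geocode(lat, lon, radius, unit="km"):
    """Build the 'latitude,longitude,radius' string for the geocode
    argument, e.g. '48.86,2.35,20km'."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude must be in [-90, 90] and longitude in [-180, 180]")
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return f"{lat},{lon},{radius}{unit}"

print(make_geocode(48.86, 2.35, 20))  # 48.86,2.35,20km
# In the notebook: api.search(q="#rio", geocode=make_geocode(48.86, 2.35, 20))
```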

In [30]:
search_results = api.search(q="#rio", geocode="48.86,2.35,20km")

for status in search_results[:5]:
    print(status.text)
    print("*"*50)

RT @cwillem: Vous êtes dispo demain? Avec @manuelapromo et toute l’équipe on vous donne rdv pour les 25 ans de @alpes_1 ambiance #Rio #live…
**************************************************
85km en moto, des vagues immenses, une plage secrète, 1800 coups de soleil, du glitter partout #rio #day2
**************************************************
RT @cwillem: Préparation tournée #Rio #live :-) j’espère que week-end était cool. Profitez de cette soirée au chaud. Take care 😊
**************************************************
RT @cwillem: Préparation tournée #Rio #live :-) j’espère que week-end était cool. Profitez de cette soirée au chaud. Take care 😊
**************************************************
RT @cwillem: Préparation tournée #Rio #live :-) j’espère que week-end était cool. Profitez de cette soirée au chaud. Take care 😊
**************************************************
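Three of the five `#rio` results above are the same retweet. A minimal sketch of removing repeated texts while keeping the first-seen order, using the fact that `dict` preserves insertion order; the texts are shortened copies of the output above, and in the notebook you would pass `[s.text for s in search_results]`:

```python
def unique_texts(texts):
    """Drop repeated tweet texts while keeping the first-seen order."""
    return list(dict.fromkeys(texts))

# Shortened texts from the #rio search output above
texts = [
    "RT @cwillem: Vous êtes dispo demain?",
    "85km en moto, des vagues immenses",
    "RT @cwillem: Préparation tournée #Rio #live",
    "RT @cwillem: Préparation tournée #Rio #live",
    "RT @cwillem: Préparation tournée #Rio #live",
]
print(len(unique_texts(texts)))  # 3
```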


# Identifying Trends by Location (trends_place)
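* Locations are identified by a Yahoo! Where On Earth ID (WOEID); the ID for the whole world is `1`
* The WOEID for the US is `23424977`
* Tweepy returns the parsed JSON response: a list containing a single dictionary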

## Unpacking the JSON object
* Take the dictionary out of the list (`trending_us[0]`) and look at what's inside
* The dictionary has four keys: `as_of`, `created_at`, `locations`, `trends`
* Take the value stored under the `trends` key of `theobject`

In [31]:
trending_us = api.trends_place(23424977)

theobject = trending_us[0]

thetrends = theobject['trends']

### See Trends object as list of dictionaries
* After peeling away these layers, `thetrends` is a list with one dictionary per trend
* Look at the first trend and select its fields using bracket notation

In [32]:
thetrends

[{'name': 'Paul Pierce',
  'promoted_content': None,
  'query': '%22Paul+Pierce%22',
  'tweet_volume': 91439,
  'url': 'http://twitter.com/search?q=%22Paul+Pierce%22'},
 {'name': '#ThingsThatPushMyButton',
  'promoted_content': None,
  'query': '%23ThingsThatPushMyButton',
  'tweet_volume': None,
  'url': 'http://twitter.com/search?q=%23ThingsThatPushMyButton'},
 {'name': 'Frank Reich',
  'promoted_content': None,
  'query': '%22Frank+Reich%22',
  'tweet_volume': 24349,
  'url': 'http://twitter.com/search?q=%22Frank+Reich%22'},
 {'name': '#TypoYourResume',
  'promoted_content': None,
  'query': '%23TypoYourResume',
  'tweet_volume': None,
  'url': 'http://twitter.com/search?q=%23TypoYourResume'},
 {'name': 'Jan Maxwell',
  'promoted_content': None,
  'query': '%22Jan+Maxwell%22',
  'tweet_volume': None,
  'url': 'http://twitter.com/search?q=%22Jan+Maxwell%22'},
 {'name': '#NASCAR',
  'promoted_content': None,
  'query': '%23NASCAR',
  'tweet_volume': 12236,
  'url': 'http://twitter.com

In [33]:
first_trend = thetrends[0]
first_trend["name"]

'Paul Pierce'

In [34]:
first_trend['tweet_volume']

91439

In [35]:
first_trend['query']

'%22Paul+Pierce%22'

In [36]:
first_trend['url']

'http://twitter.com/search?q=%22Paul+Pierce%22'

## Iterate through trends
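* Loop over the first five trends and print each trend's name and tweet volume (`None` when Twitter reports no volume)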

In [37]:
for trend in thetrends[:5]:
    print(trend['name'], trend['tweet_volume'])

Paul Pierce 91439
#ThingsThatPushMyButton None
Frank Reich 24349
#TypoYourResume None
Jan Maxwell None
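Trends come back in Twitter's own order, not by volume, and some volumes are `None`. Sorting directly would fail in Python 3 because `None` cannot be compared with an integer, so a minimal sketch filters those out first. The `response` below is a hypothetical, trimmed copy of the US `trends_place` result shown above (the real one also carries `as_of`, `created_at` and `locations`):

```python
def top_trends_by_volume(response, n=3):
    """Unpack a trends_place-style response (a one-element list whose
    dict holds a 'trends' list) and return the n (name, volume) pairs
    with the largest tweet_volume. Trends with volume None are skipped."""
    trends = response[0]["trends"]
    with_volume = [t for t in trends if t["tweet_volume"] is not None]
    with_volume.sort(key=lambda t: t["tweet_volume"], reverse=True)
    return [(t["name"], t["tweet_volume"]) for t in with_volume[:n]]

# Hypothetical, trimmed response built from the US trends shown above
response = [{"trends": [
    {"name": "Paul Pierce", "tweet_volume": 91439},
    {"name": "#ThingsThatPushMyButton", "tweet_volume": None},
    {"name": "Frank Reich", "tweet_volume": 24349},
    {"name": "#NASCAR", "tweet_volume": 12236},
]}]
print(top_trends_by_volume(response, n=2))
```

In the notebook, `top_trends_by_volume(trending_us)` would apply the same logic to the full response.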


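## Combine all steps
* Fetch trends for Russia (WOEID `23424936`), then unpack the list and the `trends` key directly inside the `for` statement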

In [38]:
trends_russia = api.trends_place(23424936)

for trend in trends_russia[0]['trends'][:5]:
    print(trend['name'])
    

#Ан148
#СайризСосетВИнсту
Почты России
#Olympics
Евгения Медведева


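# ITERATING through large LISTS
* Each API call returns only one page of results: 15 tweets for `search()` and 20 users for `followers()`
* IUBloomington has 210,813 followers, so a single call retrieves only a tiny fraction of them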

In [39]:
search_results = api.search(q="StarTrek50")

In [40]:
len(search_results)

15

In [41]:
user = api.get_user("IUBloomington")
followers = user.followers()

len(followers)

20

In [43]:
user.followers_count

210813

## Built-in object called Cursor
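* Tweepy provides a `Cursor` class: specify how many results you want, and it requests successive pages automatically
* First create a cursor object with `tweepy.Cursor()` and assign it to the variable `c`
* Pass the API method itself (`api.search`, without parentheses) as the first argument, followed by the arguments that method should receive; this is an example of passing a function as an argument to another function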
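To see why the method is passed uncalled, here is a toy version of the pattern. This is not tweepy's implementation (tweepy handles the real paging details, which differ by endpoint); `fake_search` is a hypothetical paginated function returning 15 fake results per page, and `iterate_items` keeps calling whatever function it was given, page after page, until a page comes back empty:

```python
import itertools

def fake_search(q, page=1, per_page=15):
    """Hypothetical paginated API method: returns one page of fake
    result strings, and an empty list after page 40 (600 results)."""
    if page > 40:
        return []
    start = (page - 1) * per_page
    return [f"{q} result {i}" for i in range(start, start + per_page)]

def iterate_items(method, **kwargs):
    """Yield items one at a time from `method`, requesting page 1, 2, 3, ...
    until a page comes back empty. `method` is passed in uncalled."""
    for page in itertools.count(1):
        results = method(page=page, **kwargs)
        if not results:
            return
        yield from results

# Take the first 500 items, like c.items(500); only 34 pages are requested
items = list(itertools.islice(iterate_items(fake_search, q="#PyeongChang2018"), 500))
print(len(items), items[-1])  # 500 #PyeongChang2018 result 499
```

Because items are produced lazily by a generator, stopping at 500 means pages beyond the 34th are never requested.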

In [None]:
search_results = api.search(q="#PyeongChang2018", lang="en")

In [44]:
c = tweepy.Cursor(api.search, q="#PyeongChang2018", lang="en")

In [46]:
c.items(500)

<tweepy.cursor.ItemIterator at 0x10d2bb5f8>

### Iterate through items in Cursor object
* Initialize an empty list
* Iterate through the statuses in `c.items(500)`, which sets the number of items to retrieve
* For each of the 500 results, take the status text and append it to `tweet_store`


In [47]:
tweet_store = []

for status in c.items(500):
    statustext = status.text
    tweet_store.append(statustext)

In [48]:
len(tweet_store)

500

In [49]:
tweet_store[:100]

['Me: *watching Olympic freestyle skiing* \n\nNot a very sharp landing, that definitely costed her a tenth of a second.… https://t.co/bLctyD8c66',
 'RT @FlowerPrince_CY: Watch out for the K-pop kings EXO @weareoneEXO in #ClosingCeremony of #Olympics #PyeongChang2018 \nhttps://t.co/Vlu3QBi…',
 'RT @intelnews: Intel Drone Light Show Breaks Guinness World Records Title at Olympic Winter Games #PyeongChang2018 https://t.co/vFKTkD1NQM…',
 'RT @intel: #ExperienceTheMoment our Intel Shooting Star drones hit the slopes at #PyeongChang2018. More at https://t.co/Jkxn9vTpOt. https:/…',
 "[VID] Lim Hyojun, winner of South Korea's first golden medal at the #PyeongChang2018 games, recommends #BLACKPINK's… https://t.co/Iova0LS5fm",
 'RT @olympicchannel: Congratulations! Chris @mazdzer has won silver for #USA in #luge 🥈👏\n\n#PyeongChang2018 More events here: https://t.co/sM…',
 'RT @intel: #ExperienceTheMoment our Intel Shooting Star drones hit the slopes at #PyeongChang2018. More at https://t.co/Jkxn

### Getting a USER’s Tweets: user_timeline
* Use the `user_timeline` API method to get a user's most recent tweets
* Combined with `Cursor`, it retrieves a chosen number of tweets from a given user's timeline



In [64]:
c = tweepy.Cursor(api.user_timeline, id="IndianaMBB")

In [65]:
tweet_store = []

for status in c.items(100):
    statustext = status.text
    tweet_store.append(statustext)

In [66]:
print(len(tweet_store))

100


In [67]:
tweet_store[:5]

['RT @IndianaMBB: The ⚪️🔴 put on a show last night. #IUBB https://t.co/J2SS7Ur73f',
 'Wishing a Happy Birthday to Steve Bouchie today! 🎉 https://t.co/Ul6hWjGHgD',
 'The ⚪️🔴 put on a show last night. #IUBB https://t.co/J2SS7Ur73f',
 'RT @ToastyToast123: Thank you @Mcswain_Jr21 for taking some time out last night to make my sister’s day! She’s at her happiest attending al…',
 "RT @IndianaMBB: 5️⃣ of last night's top plays with Don Fischer on the mic 🎙 #FischersFive #IUBB https://t.co/piAmJwuC5x"]
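A timeline mixes the account's own tweets with retweets, including retweets of itself. A minimal sketch that keeps only original tweets by checking for the classic `RT @` prefix; the list is a shortened copy of the five texts above. On the `Status` objects themselves, a retweet also carries a `retweeted_status` attribute, which is a more reliable test than the text prefix:

```python
def original_tweets(texts):
    """Keep only texts that are not classic retweets (those start with 'RT @')."""
    return [t for t in texts if not t.startswith("RT @")]

# The five timeline texts shown above, shortened
tweet_store = [
    "RT @IndianaMBB: The ⚪️🔴 put on a show last night. #IUBB",
    "Wishing a Happy Birthday to Steve Bouchie today! 🎉",
    "The ⚪️🔴 put on a show last night. #IUBB",
    "RT @ToastyToast123: Thank you @Mcswain_Jr21 for taking some time out",
    "RT @IndianaMBB: 5️⃣ of last night's top plays with Don Fischer on the mic 🎙",
]
print(original_tweets(tweet_store))
```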

### LIMITATIONS
* The Twitter search API only returns tweets published in the past 7 days
* This places considerable restrictions on your final projects
* If you want a corpus of tweets from a while ago: no dice
* If you want a corpus of tweets produced over the next few months:
  * make sure you collect tweets within 7 days of each event, or they disappear from the search API
* This limit does not apply to user timelines, although `user_timeline` only reaches back through a user's most recent 3,200 tweets
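When planning collection around an event, it helps to compute the deadline explicitly. A minimal sketch, assuming a flat 7-day window and naive datetimes (time zones ignored); the event timestamp is hypothetical:

```python
from datetime import datetime, timedelta

SEARCH_WINDOW = timedelta(days=7)  # assumed flat 7-day search window

def still_searchable(created_at, now):
    """True if a tweet created at `created_at` is still inside the
    search window at time `now`."""
    return now - created_at <= SEARCH_WINDOW

def collection_deadline(event_time):
    """Latest time a search can still see tweets posted at event_time."""
    return event_time + SEARCH_WINDOW

opening = datetime(2018, 2, 9, 20, 0)   # hypothetical event timestamp
print(collection_deadline(opening))      # 2018-02-16 20:00:00
print(still_searchable(opening, datetime(2018, 2, 12)))  # True
print(still_searchable(opening, datetime(2018, 2, 20)))  # False
```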
