# Twitter Mining

---

### Objectives

* Understand that Twitter is influential in the **branding** of companies as well as individuals
* Use multiple Twitter APIs
  - `tweepy` = Python library for mining tweets through the Twitter API
  - Search API
  - Streaming API
  - Trends API
* Use NLP to analyze tweets
* Spot trends 
* Map tweets
* Store tweets



### Introduction

* Prediction is common
* Prediction is hard 
* Prediction is costly
* **Quality** prediction is highly desirable (i.e., \$\$\$\$\$\$)
  - Awards are great

> AI + Big data **improve** prediction

---
**Goal** - Gather a large quantity of data from Twitter and examine the sentiment of tweets.

### Data Mining

**Data mining** is the process of querying large amounts of data to find insights valuable to individuals and organizations.

Tweets can be used to **predict**:
* elections, 
* movie ticket sales, 
* product satisfaction

We will
* Connect to Twitter APIs via (cloud-based) web services
  - Twitter Search API
  - Streaming API
  - Trends API

* Collect flood of new tweets

### What is Twitter?
* Founded 2006
* Goldmine for data mining
* Microblogging site (originally 140 characters per tweet; 280 since 2017)
  - Tweets
* One-way "friendships" called "following"
  - no reciprocity needed
* Hundreds of millions of users
  - users with millions of followers
  - live streaming tweets is "drinking from the firehose"
* A favorite big data source


The quality of the data on which decisions are based is **mission-critical**.

### Tweepy library

* Authenticate API
* Users/Accounts API
* Trends API
* Tweets API (search)

### Restrictions

* Rate limits exist (see page 519)
  - On the free service, search covers only the last seven days and returns a limited number of tweets.
  - Limits are counted in 15-minute windows.
  - Tweepy can be configured to wait when a limit is reached (`wait_on_rate_limit=True`).

* **Scraping without prior consent is strictly prohibited.**

* Violation of terms will result in account termination
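
The 15-minute window can be made concrete with a small helper. This is a minimal sketch, not tweepy's internal logic: it assumes the reset time is available as a Unix epoch (Twitter reports it in the `x-rate-limit-reset` response header), and `seconds_until_reset` is a hypothetical name used only for illustration.

```python
import time

WINDOW_SECONDS = 15 * 60  # rate limits are counted per 15-minute window


def seconds_until_reset(reset_epoch, now=None):
    """Return how many seconds to sleep until the rate-limit window resets.

    reset_epoch: Unix time at which the window resets (assumed to come
                 from the x-rate-limit-reset header).
    now:         current Unix time; defaults to time.time().
    """
    if now is None:
        now = time.time()
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_epoch - now)


# Example: the window resets 10 minutes from a fixed "now".
print(seconds_until_reset(reset_epoch=1_000, now=400))
```

Passing `wait_on_rate_limit=True` when creating the `tweepy.Client` asks the library to perform this kind of wait for you.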



#### URLs
- [Terms of Service](https://twitter.com/tos)
- [Developer agreement](https://developer.twitter.com/en/developer-terms/agreement-and-policy.html)

- [Developer Policy](https://developer.twitter.com/en/developer-terms/policy.html)

- [Other Restrictions](https://developer.twitter.com/en/developer-terms/more-on-restricted-use-cases)

- [Rate Limits](https://developer.twitter.com/en/docs/basics/rate-limiting)

- [More Rate Limit Information](https://developer.twitter.com/en/docs/basics/rate-limits)



# Let's Go! 

1. Apply for Twitter Developer Account (13.3)
  
  - https://developer.twitter.com/en/apply-for-access

  - Read and agree to the terms to complete application, confirm email.
  - Approval is **NOT** guaranteed
  
2. After you have an account, **obtain credentials** (for interacting with APIs)

## Obtain credentials by creating an **app**.
- Each app has separate credentials
- Store your credentials: https://developer.twitter.com/en/docs/authentication/guides/authentication-best-practices
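
One common way to keep credentials out of a notebook is to read them from an environment variable. The sketch below is an illustration under assumptions, not the method this notebook uses (it uploads a `keys.py` file): the variable name `TWITTER_BEARER_TOKEN` and the helpers `load_bearer_token` and `mask` are hypothetical.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Read the bearer token from an environment variable (name is an assumption)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


def mask(secret, visible=4):
    """Hide all but the last few characters so a secret can be printed safely."""
    hidden = max(0, len(secret) - visible)
    return "*" * hidden + secret[hidden:]


# Demonstration with a made-up value; a real token would be set outside the notebook.
os.environ["TWITTER_BEARER_TOKEN"] = "abc123XYZ"
print(mask(load_bearer_token()))
```

Printing a masked value confirms the token loaded without exposing it in saved notebook output.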




# Twitter Anatomy

- APIs return JSON (JavaScript Object Notation), a structured data format.
  - A JSON object is much like a Python dictionary: `{key1: value1, key2: value2}`

Key attributes of a Tweet
1. user
2. text
3. retweet_count
4. coordinates (latitude/longitude)
5. id = unique identifier of the tweet
6. created_at (date/time)
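
To see how JSON maps onto a Python dictionary, the sketch below parses a small payload with the standard `json` module. The payload is hypothetical and heavily trimmed; it only borrows the attribute names listed above and is not a real API response.

```python
import json

# Hypothetical, trimmed tweet payload for illustration only.
raw = """
{
  "id": 1234567890,
  "created_at": "Fri Feb 18 14:05:07 +0000 2022",
  "text": "hello world",
  "retweet_count": 3,
  "coordinates": null,
  "user": {"screen_name": "example_user", "followers_count": 42}
}
"""

tweet = json.loads(raw)  # JSON object -> Python dict

print(tweet["id"], tweet["text"])
print(tweet["user"]["screen_name"])   # nested objects become nested dicts
print(tweet["coordinates"] is None)   # JSON null becomes Python None
```

Most tweets carry no coordinates, so code that maps tweets should check for `None` before using that field.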

See also:

https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json.html

https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet

https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research

# Documentation

https://developer.twitter.com/en/docs

https://docs.tweepy.org/en/stable/index.html


# Tweepy Library
Python library for interacting with Twitter APIs.  Processes JSON objects.  See the [Tweepy Documentation](http://docs.tweepy.org/en/latest) for more information.

* Install [tweepy](http://www.tweepy.org) (a Python library for accessing the Twitter API)

  `pip install tweepy==4.5`

* Install [geopy library](https://github.com/geopy/geopy): for Geo-location 

  `conda install -c conda-forge geopy`



In [None]:
#!pip install tweepy==4.5  # Terminal Command

In [None]:
import tweepy

# Upload Keys

In [None]:
# Interact with colab files and upload keys.py 
from google.colab import files
files.upload()


In [None]:
import keys

In [None]:
# Check it out (Replace 'bear' with what you called your bearer_token)
print(keys.bear)

## Create a Client object (API)


In [None]:
api = tweepy.Client(keys.bear, wait_on_rate_limit=True)

# Get User Data

In [None]:
snoop = api.get_user(username='snoopdogg')

In [None]:
snoop.data.id
snoop.data.name
snoop.data.username

'SnoopDogg'

In [None]:
snoop.data.name

'Snoop Dogg'

In [None]:
snoop_followers = api.get_users_followers(snoop.data.id, max_results = 14)

In [None]:
snoop_followers

Response(data=[<User id=1142478989493190657 name=Doc Wobley username=DocWobley>, <User id=1494650206830292996 name=Amadou Tahirou Diakité username=AmadouT02030270>, <User id=1486617114366652425 name=Haitian Nft username=HaitianNft>, <User id=1494328007061741579 name=HabibiYalllahhh username=habibiyallahhh>, <User id=1494435597829607424 name=aphelele toto username=TotoAphelele>, <User id=898887379255042048 name=Miguel username=zeiskii>, <User id=1482540145647759360 name=Darren username=Darren15807833>, <User id=1491976483270807556 name=Atoy Perry username=AtoyPerry3>, <User id=713355429535944704 name=Rajat_thool username=rajat_thool>, <User id=1494657205613834253 name=ZOOL COOL 💛🐾 username=ZOOLCOO14648961>], includes={}, errors=[], meta={'result_count': 10, 'next_token': 'P1PTLRQOQ7O1EZZZ'})

In [None]:
snoop_followers.meta['result_count']

14
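
The `meta` dictionary in the response above carries a `next_token`; sending it back as `pagination_token` requests the next page. The sketch below illustrates that loop offline. `fake_get_followers` and `collect_all` are hypothetical stand-ins so the code runs without credentials; with a real client the page fetch would be something like `api.get_users_followers(user_id, pagination_token=token)`.

```python
from types import SimpleNamespace

# Made-up pages keyed by the token that requests them.
_PAGES = {
    None: (["u1", "u2"], "tokA"),
    "tokA": (["u3", "u4"], "tokB"),
    "tokB": (["u5"], None),
}


def fake_get_followers(pagination_token=None):
    """Stand-in for a followers call; returns an object with .data and .meta."""
    data, next_token = _PAGES[pagination_token]
    meta = {"result_count": len(data)}
    if next_token:
        meta["next_token"] = next_token
    return SimpleNamespace(data=data, meta=meta)


def collect_all(fetch_page, max_pages=10):
    """Follow next_token until it disappears or max_pages is reached."""
    items, token = [], None
    for _ in range(max_pages):
        page = fetch_page(pagination_token=token)
        items.extend(page.data or [])
        token = page.meta.get("next_token")
        if token is None:
            break
    return items


followers = collect_all(fake_get_followers)
print(followers)
```

The `max_pages` cap matters in practice: an account with millions of followers would otherwise consume the rate limit for many windows. `tweepy.Paginator` wraps the same idea.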

# Search Tweets

In [None]:
tweets = api.search_recent_tweets(query = 'covid -is:retweet', max_results = 11)

In [None]:
tweets

Response(data=[<Tweet id=1494671069864939520 text=COVID-19罹患時に重症化しない為に、普段からビタミンD3＋納豆 https://t.co/siXOG0SXYi>, <Tweet id=1494671069621465090 text=Tatile gelip covid olmakta tam benim şanssızlığıma uygun zaten>, <Tweet id=1494671069348777987 text=@ReutersFacts Oh fuck, so these vaccines are giving people HIV now? If I were dooped into taking that "vaccine" that does absolutely nothing against covid and saw the "experts" telling me that it doesn't cause HIV I'd be seriously concerned because everything they say doesn't happen happens.>, <Tweet id=1494671068228898816 text=#COVID #Denmark.

Gleiches Spiel. Fallzahlen sinken, Einweisungen steigen, Belegung steigt. Die Rate nur noch leicht, so daß das Dunkelfeld nicht mehr größer wird. https://t.co/7QXzHRBvK4>, <Tweet id=1494671068123910145 text=Saat ini persentase keterisian tempat tidur pasien COVID-19 di Sumsel mencapai 33 persen. #publisherstory https://t.co/5sqvOfVQmw>, <Tweet id=1494671068086239233 text=Strict protocols to stay in effe

In [None]:
# print(tweets)
for tweet in tweets.data:
  print(tweet.id, tweet.text)
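
The query string passed to `search_recent_tweets` combines keywords with operators such as `from:`, `-is:retweet`, and `lang:`. The helper below is a minimal sketch for assembling such strings; `build_query` is a hypothetical name, and operator availability may depend on your access level, so treat the result as a starting point.

```python
def build_query(terms="", from_user=None, exclude_retweets=True, lang=None):
    """Assemble a search query string from a few common operators."""
    if not terms and not from_user:
        # Operators like -is:retweet or lang: cannot form a query on their own.
        raise ValueError("Provide search terms or a from_user.")
    parts = []
    if terms:
        parts.append(terms)
    if from_user:
        parts.append(f"from:{from_user}")
    if exclude_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


print(build_query("covid", lang="en"))
print(build_query(from_user="snoopdogg"))
```

Restricting by language is useful here because the unfiltered `covid` search above returns tweets in many languages, which complicates sentiment analysis.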

In [None]:
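# query to search for tweets
query = 'from:snoopdogg -is:retweet'

# Optional time window (ISO 8601, UTC). These values are not applied unless passed
# below as start_time=start_time, end_time=end_time; recent search only accepts
# times within the last seven days.
start_time = "2022-02-12T00:00:00Z"
end_time = "2022-02-18T00:00:00Z"

# get tweets from the API
# (user_fields only take effect together with expansions=["author_id"])
tweets = api.search_recent_tweets(query=query,
                                  tweet_fields=["created_at", "text", "source"],
                                  user_fields=["name", "username", "location", "verified", "description"],
                                  max_results=10)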

In [None]:
for tweet in tweets.data:
  print(tweet.id)
  print(tweet.text)

1494628348277911553
New theme song comin 🔜.  Party bear aka suga 🐻 🔥✨✨
1494455034880225289
Coming Soon 🎥✨✨💨 https://t.co/zusyagDZZR
1494341450158391299
I wanna change the game and reinvent the system. STASH BOXES  live 4 a limited time. #deathrowrecords #BODR #NFTs https://t.co/cgUSGn77xI
1494071263551111170
@SHAQ If you own the early access pass then you are automatically on the whitelist. https://t.co/ckEOApZwL6
1494070598841946113
If you won the early access pass the you are already on the whitelist.  https://t.co/ckEOApZwL6
1494053108988596224
Tha Doggies. On Tha Sandbox. My 1st ever 10,000 avatars droppin 2/22/22. Be ready. 👊🏿🔥🔥👊🏿👊🏿💪🏿🎤 checc it @TheSnoopAvatars #twotwotwentytwo https://t.co/zbad8YDrKZ
1494026987974762497
https://t.co/BbiPrNiile https://t.co/lXS4sQslUg
1493994288148926465
🔥🚀👊🏾 https://t.co/LxjJ9rXdxX
1493747528776380417
See yall at VeeCon. 👊🏿👊🏿 N.F.T. never 4get 2 pass it 💪🏿📈🎤🤣🔥🔥

https://t.co/w8g4GAPTNp
@veefriends @veeconference #SeeyouatVeeCon https://t.co/nSESF

In [None]:
for tweet in tweets.data:
  print(tweet.id)

1494628348277911553
1494455034880225289
1494341450158391299
1494071263551111170
1494070598841946113
1494053108988596224
1494026987974762497
1493994288148926465
1493747528776380417
1493725265280655362


# Tweet Volume (Count)

In [None]:
# Search query
query = 'from:gordongee -is:retweet'

counts = api.get_recent_tweets_count(query=query, granularity='day')

for count in counts.data:
    print(count)

{'end': '2022-02-12T00:00:00.000Z', 'start': '2022-02-11T14:05:07.000Z', 'tweet_count': 0}
{'end': '2022-02-13T00:00:00.000Z', 'start': '2022-02-12T00:00:00.000Z', 'tweet_count': 0}
{'end': '2022-02-14T00:00:00.000Z', 'start': '2022-02-13T00:00:00.000Z', 'tweet_count': 0}
{'end': '2022-02-15T00:00:00.000Z', 'start': '2022-02-14T00:00:00.000Z', 'tweet_count': 1}
{'end': '2022-02-16T00:00:00.000Z', 'start': '2022-02-15T00:00:00.000Z', 'tweet_count': 0}
{'end': '2022-02-17T00:00:00.000Z', 'start': '2022-02-16T00:00:00.000Z', 'tweet_count': 0}
{'end': '2022-02-18T00:00:00.000Z', 'start': '2022-02-17T00:00:00.000Z', 'tweet_count': 0}
{'end': '2022-02-18T14:05:07.000Z', 'start': '2022-02-18T00:00:00.000Z', 'tweet_count': 0}


In [None]:
count['tweet_count']

0

In [None]:
count['end']

'2022-02-18T14:05:07.000Z'
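
Note that `count` above is simply the last bucket left over from the loop. To summarize the whole period, the buckets can be aggregated. The sketch below uses plain dictionaries copied from the output above (the same shape as the items in `counts.data`), so it runs without calling the API.

```python
# Daily buckets copied from the recorded output above.
buckets = [
    {'end': '2022-02-12T00:00:00.000Z', 'start': '2022-02-11T14:05:07.000Z', 'tweet_count': 0},
    {'end': '2022-02-13T00:00:00.000Z', 'start': '2022-02-12T00:00:00.000Z', 'tweet_count': 0},
    {'end': '2022-02-14T00:00:00.000Z', 'start': '2022-02-13T00:00:00.000Z', 'tweet_count': 0},
    {'end': '2022-02-15T00:00:00.000Z', 'start': '2022-02-14T00:00:00.000Z', 'tweet_count': 1},
    {'end': '2022-02-16T00:00:00.000Z', 'start': '2022-02-15T00:00:00.000Z', 'tweet_count': 0},
    {'end': '2022-02-17T00:00:00.000Z', 'start': '2022-02-16T00:00:00.000Z', 'tweet_count': 0},
    {'end': '2022-02-18T00:00:00.000Z', 'start': '2022-02-17T00:00:00.000Z', 'tweet_count': 0},
    {'end': '2022-02-18T14:05:07.000Z', 'start': '2022-02-18T00:00:00.000Z', 'tweet_count': 0},
]

# Total tweets across all buckets.
total = sum(bucket['tweet_count'] for bucket in buckets)

# Bucket with the most tweets; the first 10 characters of 'start' are the date.
busiest = max(buckets, key=lambda bucket: bucket['tweet_count'])

print(total)
print(busiest['start'][:10], busiest['tweet_count'])
```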

# Get Snoop's Followers

In [None]:
snoop_id = snoop.data.id  # avoid shadowing the built-in id()

snoopers = api.get_users_followers(id=snoop_id,
                                   user_fields=['profile_image_url'],
                                   max_results=10)
for user in snoopers.data:
    print(user.id)

1494407697311150083
1494673353558401032
1224383199653924865
1494674278842724352
268264909
3277767690
1362763713107886080
1494671953256493061
1494582705366913024
1304121213380329476


# Who does Snoop follow?

In [None]:
snoop_id = snoop.data.id  # avoid shadowing the built-in id()

snooping = api.get_users_following(id=snoop_id, max_results=10)

for user in snooping.data:
    print(user.id, user.name)

109278089 Bux
1406017600652550149 Sass
1288572182444961793 Gala Games
1150325930 Jason Brink aka BitBender
1488953151914840064 Gala Music
50953406 Death Row Records
1384711969912102912 Garga.eth
96860504 KAWS
1374934393140117506 Altered State Machine
1486424622115565568 The Doggies


# Who Liked one of Snoop's Tweets?

In [None]:
tweet_id = '1494341450158391299'

who = api.get_liking_users(id = tweet_id)

# who.data is None when no liking users are returned
for user in who.data or []:
  print(user)

# Who retweeted Snoop's Tweet?

In [None]:
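tweet_id = '1494341450158391299'

# Note: with tweepy 4.5 this call logs "Unexpected parameter: max_results"
# (see the output below), but the request still returns retweeters.
whodis = api.get_retweeters(id=tweet_id, user_fields=['location', 'created_at'], max_results = 10)

for retweeter in whodis.data:
    print(retweeter, retweeter.location, retweeter.created_at)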

Unexpected parameter: max_results


RaymondTjahyad2 Jakarta Capital Region 2018-06-27 11:16:04+00:00
Gabriel58507901 None 2021-05-13 13:32:13+00:00
mejvineb None 2016-06-28 11:58:10+00:00
SoldiersMeta None 2022-01-28 00:07:44+00:00
LinaKim62089954 None 2021-12-04 10:03:41+00:00
kogayon39 None 2018-01-07 00:44:35+00:00
GOPALCH7396 None 2021-10-05 12:39:31+00:00
PabzSantos30 None 2021-08-24 05:00:25+00:00
JaimePe02372569 None 2021-05-11 20:28:47+00:00
RoseEAguilar2 Rialto, CA 2018-06-14 05:42:33+00:00


# Save Tweets (from Snoop) to File

In [None]:
query = 'from:snoopdogg -is:retweet'

# Name and path of the file where you want the Tweet IDs written to
file_name = 'tweets.txt'

# Append one tweet ID per line; Paginator follows next_token across pages
with open(file_name, 'a', encoding='utf-8') as f:
    for tweet in tweepy.Paginator(api.search_recent_tweets, query=query,
                                  tweet_fields=['context_annotations', 'created_at'],
                                  max_results=100).flatten(limit=1000):
        f.write(f'{tweet.id}\n')
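
The cell above stores only tweet IDs. To keep the text and timestamp as well, one option is the JSON Lines format (one JSON object per line). This is a sketch under assumptions: the records are made-up dictionaries standing in for values such as `tweet.id`, `tweet.created_at`, and `tweet.text`, and `save_jsonl` / `load_jsonl` are hypothetical helper names.

```python
import json
import os
import tempfile

# Hypothetical records; real ones would be built from Tweet objects.
records = [
    {"id": 1, "created_at": "2022-02-18T10:00:00+00:00", "text": "Coming Soon 🎥"},
    {"id": 2, "created_at": "2022-02-17T09:30:00+00:00", "text": "two\nlines"},
]


def save_jsonl(rows, path):
    """Append each row as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        for row in rows:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch leaves no files behind in the project.
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
save_jsonl(records, path)
loaded = load_jsonl(path)
print(len(loaded), loaded[0]["text"])
```

Writing with `encoding="utf-8"` and `ensure_ascii=False` keeps emoji and non-English text readable in the file.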

# Tweeping

In [None]:
# Read some tweets
query = 'critical race theory -is:retweet'

tweets = api.search_recent_tweets(query=query, max_results = 10)

In [None]:
tweets = api.search_recent_tweets(query = query, 
                                  tweet_fields = ["created_at", "text", "lang"],
                                  user_fields = ["name", "username", "location", "verified", "description"] ,
                                  expansions = ["geo.place_id"])

In [None]:
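tweets = api.search_recent_tweets(query = query, max_results = 10, expansions = ['author_id'])

# Tweet objects have no .name; author details arrive separately in tweets.includes['users']
users = {user.id: user for user in tweets.includes.get('users', [])}

for x in tweets.data or []:
  author = users.get(x.author_id)
  print(author.name if author else x.author_id)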

In [None]:
# places = {p['id']: p for p in tweets.includes['places']}
tweets.includes

{}

In [None]:
import re
from textblob import TextBlob

In [None]:
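for tweet in tweets.data:
  # Strip URLs before scoring; this one pattern covers links anywhere in the text
  text = re.sub(r'http\S+', '', tweet.text).strip()
  blob = TextBlob(text)
  # Tweet objects have no username attribute; author_id is None unless
  # the search was made with expansions=['author_id']
  print(tweet.id, tweet.author_id, text)
  print(blob.sentiment)
  print('\n')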
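
The cleaning step and the sentiment score can be separated into small, testable helpers. The sketch below uses only the standard library: `clean_tweet` removes URLs and @mentions, and `label_polarity` turns a polarity score (TextBlob reports polarity between -1 and 1) into a label. Both function names and the 0.1 threshold are assumptions chosen for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label (threshold is an assumption)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


sample = ("@SHAQ If you own the early access pass then you are "
          "automatically on the whitelist. https://t.co/ckEOApZwL6")
print(clean_tweet(sample))
print(label_polarity(0.5), label_polarity(-0.3), label_polarity(0.0))
```

With TextBlob installed, the label for a tweet would be `label_polarity(TextBlob(clean_tweet(tweet.text)).sentiment.polarity)`.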

# Location

In [1]:
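# The import fails below because no module named locationlistener is available in this
# environment; it appears to be a local helper file that would need to be uploaded
# first (as with keys.py).
from locationlistener import LocationListener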

ModuleNotFoundError: ignored

# Tweet on your account
You will need your consumer key and secret plus your access token and secret.


In [None]:
# tweepy.Client names the app credentials consumer_key / consumer_secret
client = tweepy.Client(consumer_key='REPLACE_ME',
                       consumer_secret='REPLACE_ME',
                       access_token='REPLACE_ME',
                       access_token_secret='REPLACE_ME')

response = client.create_tweet(text='hello world')

API v1.1 (`tweepy.API`) - Cannot get to work :(

In [None]:
#auth = tweepy.OAuthHandler(keys.api_key,keys.api_key_secret)
#auth.set_access_token(keys.access_token,keys.access_token_secret)

#api = tweepy.API(auth, wait_on_rate_limit = True)

In [None]:
# nasa = api.get_user(screen_name = 'nasa')  # tweepy.API (v1.1) takes screen_name, not username

## Other Resources

https://towardsdatascience.com/twitter-pulse-checker-an-interactive-colab-notebook-for-data-sciencing-on-twitter-76a27ec8526f