<a href="https://colab.research.google.com/github/mratanusarkar/twitter-sentiment-analysis/blob/feature%2Ftwitter-api/Notebooks/Twitter_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twitter API

The aim is to connect to Twitter via its API (v2) and pull tweets based on filters and conditions such as:
- hashtags (#)
- userId or mentions (@)
- keywords (string)

Using the retrieved data, sort and rank the tweets by parameters such as:
- number of likes
- number of comments
- number of retweets
- engagement or view count

The resulting data can then be used for sentiment analysis.

In a future version (v2) of this project, we could also dig into each "tweet thread" and "quote tweet (retweet)" to find the links and relations between them and run further analyses.

# Using "twitter-api v2"
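https://developer.twitter.com/en/docs/twitter-api

Note: the cells below authenticate with OAuth 1.0a and call `tweepy.API`, which wraps the v1.1 endpoints (`home_timeline`, `user_timeline`, `search`); the v2 endpoints are wrapped by `tweepy.Client`. `api.search` is the tweepy 3.x method name; tweepy 4.0 renamed it to `api.search_tweets`.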

## Import Packages

In [None]:
import tweepy
import configparser
import pandas as pd
from pprint import pprint

## Input secrets and keys

In [None]:
# secrets and keys
api_key = "<API_KEY>"
api_key_secret = "<API_KEY_SECRET>"
bearer_token = "<BEARER_TOKEN>"
access_token = "<ACCESS_TOKEN>"
access_token_secret = "<ACCESS_TOKEN_SECRET>"
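`configparser` is imported in the first cell but never used. A minimal sketch of loading the secrets from a file instead of pasting them into the notebook, assuming a local `config.ini` with a `[twitter]` section; the file name, section name and the helper `load_twitter_keys` are illustrative assumptions, not part of the original project:

```python
import configparser
import os
import tempfile

# Hypothetical layout of config.ini (keep the real file out of version control)
EXAMPLE_INI = """\
[twitter]
api_key = <API_KEY>
api_key_secret = <API_KEY_SECRET>
bearer_token = <BEARER_TOKEN>
access_token = <ACCESS_TOKEN>
access_token_secret = <ACCESS_TOKEN_SECRET>
"""

def load_twitter_keys(path):
    """Hypothetical helper: read the [twitter] section of an INI file into a dict."""
    config = configparser.ConfigParser()
    if not config.read(path):  # read() returns the list of files it managed to parse
        raise FileNotFoundError(path)
    return dict(config["twitter"])

# demo: write the example file to a temporary directory and load it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w") as f:
        f.write(EXAMPLE_INI)
    keys = load_twitter_keys(path)

print(sorted(keys))
```

In the notebook this would become `keys = load_twitter_keys("config.ini")` followed by `api_key = keys["api_key"]` and so on.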

## Connect to Twitter API

In [None]:
# set auth handler
auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)

# authenticate and get api handler
api = tweepy.API(auth)

## Pull tweets using various methods

In [None]:
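# get the latest tweets from the authenticated user's home timeline
# (without tweet_mode='extended', long tweets come back truncated in .text, as in the output below)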
limit = 1

tweets_from_public_timeline = api.home_timeline(count=limit)
# pprint(tweets_from_public_timeline[0]._json)
print(tweets_from_public_timeline[0].text)

A Review in @NatRevClinOncol summarizes the multidimensional cellular and molecular profiling technologies that hav… https://t.co/QVAh9RWj88


In [None]:
# get tweets from a specific user
user = 'mratanusarkar'
limit = 1

tweets_from_user_timeline = api.user_timeline(screen_name=user, count=limit, tweet_mode='extended')
# pprint(tweets_from_user_timeline[0]._json)
print(tweets_from_user_timeline[0].full_text)

Good to see latest technology getting into chess as well!! https://t.co/8k9jQtSB1K


In [None]:
# get tweets using keywords, #hashtags or @mentions

In [None]:
# get tweets from keywords
keywords = "physics"
limit = 1

tweets_from_keywords = api.search(q=keywords, count=limit, tweet_mode='extended')
# pprint(tweets_from_keywords[0]._json)
print(tweets_from_keywords[0].full_text)

RT @1MASHMARTIN: What do you remember in Physics?


In [None]:
# get tweets from #hashtags
keywords = "#physics"
limit = 1

tweets_from_keywords = api.search(q=keywords, count=limit, tweet_mode='extended', include_entities=True)
# pprint(tweets_from_keywords[0]._json)
print(tweets_from_keywords[0].full_text)

❤️‍🔥❤️‍🔥❤️‍🔥💛🤎💖 
 #science #education #biology #physics #chemistry #technology https://t.co/NN1UHwxt2d


In [None]:
# get tweets from @users or @mentions
keywords = "@3blue1brown"
limit = 1

tweets_from_keywords = api.search(q=keywords, count=limit, tweet_mode='extended', include_entities=True)
# pprint(tweets_from_keywords[0]._json)
print(tweets_from_keywords[0].full_text)

RT @michele_geraci: Domanda: Come fanno le reti neuronali e Intelligenza Artificiale ad imparare? 

Risposta: Gradient Descent

Secondo vid…
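Retweets come back truncated (`RT @michele_geraci: ... Secondo vid…`) even with `tweet_mode='extended'`, because in the v1.1 payload the complete text of a retweet lives on the embedded `retweeted_status` object. A small helper sketch; the name `get_full_text` is our own, and the `SimpleNamespace` objects only stand in for tweepy `Status` objects so the example runs without an API call:

```python
from types import SimpleNamespace

def get_full_text(status):
    """Hypothetical helper: untruncated text of a Status fetched with tweet_mode='extended'.

    For retweets the top-level full_text is cut off, so use the original tweet's
    full_text and re-add the "RT @user: " prefix.
    """
    retweeted = getattr(status, "retweeted_status", None)
    if retweeted is not None:
        return f"RT @{retweeted.user.screen_name}: {retweeted.full_text}"
    # fall back to .text for objects fetched without tweet_mode='extended'
    return getattr(status, "full_text", None) or status.text

# stand-ins shaped like tweepy Status objects (made-up values)
original = SimpleNamespace(full_text="A long original tweet that goes past the cut-off point",
                           user=SimpleNamespace(screen_name="someone"))
retweet = SimpleNamespace(full_text="RT @someone: A long original tw…", retweeted_status=original)
plain = SimpleNamespace(text="compat-mode tweet")

print(get_full_text(retweet))
```

With real data this would be `print(get_full_text(tweets_from_keywords[0]))`.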


In [None]:
# combination of all
query = "#math OR #mathematics AND @3blue1brown"
limit = 1

tweets_from_keywords = api.search(q=query, count=limit, tweet_mode='extended', include_entities=True)
# pprint(tweets_from_keywords[0]._json)
print(tweets_from_keywords[0].full_text)

RT @TheGaloisCxn: I’d love some outside opinion on or verification of this proof! 🧐 

Has anyone in the #Mathematics #YouTube community see…
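Writing these query strings by hand gets error-prone as filters pile up. A small sketch of a helper (the name `build_query` is hypothetical) that ORs together plain keywords, `#hashtags` and `@mentions`; it only builds the string, so the result could be passed to `api.search` here or to `TwitterSearchScraper` later:

```python
def build_query(keywords=(), hashtags=(), mentions=()):
    """Hypothetical helper: join keywords, #hashtags and @mentions with OR.

    A leading '#' or '@' in the inputs is optional; it is normalised here.
    """
    terms = list(keywords)
    terms += ["#" + tag.lstrip("#") for tag in hashtags]
    terms += ["@" + user.lstrip("@") for user in mentions]
    return " OR ".join(terms)

query = build_query(keywords=["physics"], hashtags=["math", "#mathematics"], mentions=["3blue1brown"])
print(query)  # physics OR #math OR #mathematics OR @3blue1brown
```

In Twitter search a plain space between terms already acts as AND, so an extra required term can be appended to the result with a space.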


In [None]:
# use a Cursor to paginate: a single search request returns at most 100 tweets (count=100)
query = "#math OR #mathematics AND @3blue1brown OR physics"
limit = 300

tweets = tweepy.Cursor(api.search, q=query, count=100, tweet_mode='extended').items(limit)
print(tweets)

<tweepy.cursor.ItemIterator object at 0x7fd11bd33e50>


In [None]:
# print(list(tweets)[0].full_text)
# the Cursor's ItemIterator can be consumed only once: after one pass it is exhausted,
# so collect the tweets into a DataFrame in a single pass instead
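`tweepy.Cursor(...).items(limit)` returns an iterator, and like any Python iterator it is exhausted after one pass, so a second `list(tweets)` comes back empty. A plain generator stands in for the cursor here (no API call; `fake_cursor` is a made-up name) to show the behaviour and the usual fix of materialising the results once:

```python
def fake_cursor(n):
    """Stand-in for tweepy.Cursor(...).items(n): yields n items lazily, once."""
    for i in range(n):
        yield f"tweet {i}"

tweets_iter = fake_cursor(3)
first_pass = list(tweets_iter)   # consumes the iterator
second_pass = list(tweets_iter)  # already exhausted, so this is []

# fix: materialise once, then iterate the list as often as needed
tweet_list = list(fake_cursor(3))
for i, tweet in enumerate(tweet_list):
    print(i, ":", tweet)
```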

In [None]:
# print all tweets (note: this consumes the iterator, leaving nothing for the DataFrame cell)
# for i, tweet in enumerate(tweets):
#     print(i, ":", tweet.full_text)

## Save the data in a DataFrame

In [None]:
# define the column names
columns = ["Time", "User", "Tweet"]
data = []

In [None]:
for tweet in tweets:
    data.append([tweet.created_at, tweet.user.screen_name, tweet.full_text])
dataframe = pd.DataFrame(data, columns=columns)
dataframe

|  | Time | User | Tweet |
|---|---|---|---|
| 0 | 2023-02-07 14:13:40 | Editordon2 | Don't struggle, we are here to help\n#book rev... |
| 1 | 2023-02-07 14:05:16 | russellmanthy | Richard Feynman’s path integral is both a powe... |
| 2 | 2023-02-07 14:04:18 | Earthworksjobs | Postdoc - Storyline scenarios of extreme event... |
| 3 | 2023-02-07 14:03:47 | YosleidiN | RT @dment37: d²=a²+b²+c² (Pythagoras in 3D): p... |
| 4 | 2023-02-07 14:02:09 | NewLeibniz | #math #physics I can't read music, but I insta... |
| ... | ... | ... | ... |
| 295 | 2023-02-06 06:07:54 | Legitwriters2 | Experts available and reliable to ace academic... |
| 296 | 2023-02-06 06:07:52 | Legitwriters2 | Experts available and reliable to ace academic... |
| 297 | 2023-02-06 06:07:28 | Legitwriters2 | Experts available and reliable to ace academic... |
| 298 | 2023-02-06 06:07:19 | Legitwriters2 | Experts available and reliable to ace academic... |
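The DataFrame above has only Time, User and Tweet columns, yet the aim is to sort by likes and retweets. In the v1.1 payload returned by `tweepy.API`, each `Status` carries `favorite_count` and `retweet_count` (reply and view counts are, to our knowledge, not exposed there). A sketch of a row builder that adds them; `status_to_row` and the column names are our own choices, and `SimpleNamespace` objects stand in for live API results:

```python
from datetime import datetime, timezone
from types import SimpleNamespace

import pandas as pd

ENGAGEMENT_COLUMNS = ["Time", "User", "Tweet", "Likes", "Retweets"]

def status_to_row(status):
    """Hypothetical helper: flatten one Status (tweet_mode='extended') into a row."""
    return [
        status.created_at,
        status.user.screen_name,
        status.full_text,
        status.favorite_count,  # the v1.1 payload calls likes "favorites"
        status.retweet_count,
    ]

# stand-ins shaped like tweepy Status objects (made-up values, no API call)
statuses = [
    SimpleNamespace(created_at=datetime(2023, 2, 7, 14, 13, 40, tzinfo=timezone.utc),
                    user=SimpleNamespace(screen_name="user_a"),
                    full_text="first tweet", favorite_count=3, retweet_count=1),
    SimpleNamespace(created_at=datetime(2023, 2, 7, 14, 5, 16, tzinfo=timezone.utc),
                    user=SimpleNamespace(screen_name="user_b"),
                    full_text="second tweet", favorite_count=10, retweet_count=0),
]

demo_df = pd.DataFrame([status_to_row(s) for s in statuses], columns=ENGAGEMENT_COLUMNS)
print(demo_df.sort_values("Likes", ascending=False))
```

With the cursor from above, the equivalent would be `pd.DataFrame([status_to_row(t) for t in tweets], columns=ENGAGEMENT_COLUMNS)`.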


## Export

In [None]:
# export data
# dataframe.to_json("tweets.json")
# dataframe.to_csv("tweets.csv")

# Using "snscrape"
https://github.com/JustAnotherArchivist/snscrape

## Import Packages

In [2]:
!pip install snscrape

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting snscrape
  Downloading snscrape-0.5.0.20230113-py3-none-any.whl (69 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m69.2/69.2 KB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: snscrape
Successfully installed snscrape-0.5.0.20230113


In [78]:
import snscrape.modules.twitter as sntwitter
import pandas as pd
from pprint import pprint
from tqdm.notebook import tqdm
import re

## Pull tweets using various methods

In [60]:
# list of available scrapers (TwitterTweetScraperMode is an enum of modes for TwitterTweetScraper, not a scraper)
[name for name in sntwitter.__all__ if "Scraper" in name]

['TwitterSearchScraper',
 'TwitterUserScraper',
 'TwitterProfileScraper',
 'TwitterHashtagScraper',
 'TwitterTweetScraperMode',
 'TwitterTweetScraper',
 'TwitterListPostsScraper',
 'TwitterTrendsScraper']

In [46]:
# using Search Scraper
query = "india"
twitter_search = sntwitter.TwitterSearchScraper(query).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/VenubInfra/status/1623575416450748416', 'date': datetime.datetime(2023, 2, 9, 6, 51, 58, tzinfo=datetime.timezone.utc), 'rawContent': 'Just posted a photo @ Nagpur, Maharashtra, India https://t.co/NpgbQnWUDx', 'renderedContent': 'Just posted a photo @ Nagpur, Maharashtra, India instagram.com/p/CobqeTEoFaF/…', 'id': 1623575416450748416, 'user': User(username='VenubInfra', id=1604334593137528832, displayname='Venub Infra', rawDescription='Construction🏗️   Interior🏠   Renovation🏚️', renderedDescription='Construction🏗️   Interior🏠   Renovation🏚️', descriptionLinks=None, verified=False, created=datetime.datetime(2022, 12, 18, 4, 37, 29, tzinfo=datetime.timezone.utc), followersCount=20, friendsCount=169, statusesCount=112, favouritesCount=0, listedCount=0, mediaCount=13, location='Nagpur, India', protected=False, link=None, profileImageUrl='https://pbs.twimg.com/profile_images/1604335741542158336/Ttw4-fhK_normal.jpg', profileBannerUrl=None, label=None), 'replyCou

In [50]:
# using User Scraper
user = "mratanusarkar"
twitter_search = sntwitter.TwitterUserScraper(user).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/mratanusarkar/status/1623329586602995712', 'date': datetime.datetime(2023, 2, 8, 14, 35, 8, tzinfo=datetime.timezone.utc), 'rawContent': "Most important ingredients in life are:\n- Love\n- Friendship\n- Family\n\nrest all comes later...\n\nOf what use is everything else when there is no happiness? Friends, Family and Love in life brings unparalleled happiness that even heaven can't find! I bet!", 'renderedContent': "Most important ingredients in life are:\n- Love\n- Friendship\n- Family\n\nrest all comes later...\n\nOf what use is everything else when there is no happiness? Friends, Family and Love in life brings unparalleled happiness that even heaven can't find! I bet!", 'id': 1623329586602995712, 'user': User(username='mratanusarkar', id=3060824998, displayname='Atanu Sarkar', rawDescription='"An engineer by profession, Physics lover by passion"\n\nEmbedded Systems | IoT | Robotics | Machine Learning | Deep Learning\n\n🏫 Adamite\n🎓 KIITian\n🏢 Boschler', 

In [61]:
# using Profile Scraper
user = "mratanusarkar"
twitter_search = sntwitter.TwitterProfileScraper(user).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/mratanusarkar/status/1508072930886168578', 'date': datetime.datetime(2022, 3, 27, 13, 26, 20, tzinfo=datetime.timezone.utc), 'rawContent': 'check out this cli progress bar npm module I created to track and monitor any long running job/process in a loop and alert the user with sound notification when the task ends!\n\nlink: https://t.co/DSdNTkLo9Z\n\n#npm #JavaScript #NodeJS https://t.co/ePVC2yLm0J', 'renderedContent': 'check out this cli progress bar npm module I created to track and monitor any long running job/process in a loop and alert the user with sound notification when the task ends!\n\nlink: npmjs.com/package/progre…\n\n#npm #JavaScript #NodeJS https://t.co/ePVC2yLm0J', 'id': 1508072930886168578, 'user': User(username='mratanusarkar', id=3060824998, displayname='Atanu Sarkar', rawDescription='"An engineer by profession, Physics lover by passion"\n\nEmbedded Systems | IoT | Robotics | Machine Learning | Deep Learning\n\n🏫 Adamite\n🎓 KIITian\n🏢 Bosch

In [63]:
# using Hashtag Scraper
hashtag = "india"
twitter_search = sntwitter.TwitterHashtagScraper(hashtag).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/kalkineau/status/1623583224848060419', 'date': datetime.datetime(2023, 2, 9, 7, 23, tzinfo=datetime.timezone.utc), 'rawContent': '#India is considering extending a #ban on #wheat exports  \nhttps://t.co/MxKBBRXnM2', 'renderedContent': '#India is considering extending a #ban on #wheat exports  \nzcu.io/VMbr', 'id': 1623583224848060419, 'user': User(username='kalkineau', id=1005789907, displayname='Kalkine Media Australia', rawDescription='Stay Apprised, Invest Wise with Kalkine', renderedDescription='Stay Apprised, Invest Wise with Kalkine', descriptionLinks=None, verified=True, created=datetime.datetime(2012, 12, 12, 7, 13, 56, tzinfo=datetime.timezone.utc), followersCount=3770, friendsCount=4601, statusesCount=65660, favouritesCount=9093, listedCount=140, mediaCount=15806, location='Sydney, Australia', protected=False, link=TextLink(text='kalkinemedia.com/au', url='https://kalkinemedia.com/au', tcourl='https://t.co/LqanNERjAa', indices=(0, 23)), profileIma

In [51]:
# using Tweet Scraper
tweetId = "1623329586602995712"
twitter_search = sntwitter.TwitterTweetScraper(tweetId).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/mratanusarkar/status/1623329586602995712', 'date': datetime.datetime(2023, 2, 8, 14, 35, 8, tzinfo=datetime.timezone.utc), 'rawContent': "Most important ingredients in life are:\n- Love\n- Friendship\n- Family\n\nrest all comes later...\n\nOf what use is everything else when there is no happiness? Friends, Family and Love in life brings unparalleled happiness that even heaven can't find! I bet!", 'renderedContent': "Most important ingredients in life are:\n- Love\n- Friendship\n- Family\n\nrest all comes later...\n\nOf what use is everything else when there is no happiness? Friends, Family and Love in life brings unparalleled happiness that even heaven can't find! I bet!", 'id': 1623329586602995712, 'user': User(username='mratanusarkar', id=3060824998, displayname='Atanu Sarkar', rawDescription='"An engineer by profession, Physics lover by passion"\n\nEmbedded Systems | IoT | Robotics | Machine Learning | Deep Learning\n\n🏫 Adamite\n🎓 KIITian\n🏢 Boschler', 

In [64]:
# using ListPosts Scraper
listName = "Physics"
twitter_search = sntwitter.TwitterListPostsScraper(listName).get_items()
tweet = next(twitter_search)
print(vars(tweet))

{'url': 'https://twitter.com/myshoeandshisha/status/1621613685834760192', 'date': datetime.datetime(2023, 2, 3, 20, 56, 45, tzinfo=datetime.timezone.utc), 'rawContent': 'science tier list:\nphysics\nchemistry\nbiology\n\npsychology', 'renderedContent': 'science tier list:\nphysics\nchemistry\nbiology\n\npsychology', 'id': 1621613685834760192, 'user': User(username='myshoeandshisha', id=1236299057582219264, displayname='bobby', rawDescription='im bengali and maeesha 🛒🎡', renderedDescription='im bengali and maeesha 🛒🎡', descriptionLinks=None, verified=False, created=datetime.datetime(2020, 3, 7, 14, 34, 12, tzinfo=datetime.timezone.utc), followersCount=127, friendsCount=157, statusesCount=3322, favouritesCount=15435, listedCount=0, mediaCount=177, location='Houston, TX', protected=False, link=TextLink(text='www.com', url='http://www.com', tcourl='https://t.co/m8cfO8t77v', indices=(0, 23)), profileImageUrl='https://pbs.twimg.com/profile_images/1585314912657719297/C458OLZJ_normal.jpg', pro

In [None]:
# using Trends Scraper (takes no query and yields Trend items, not Tweets)
twitter_trends = sntwitter.TwitterTrendsScraper().get_items()
trend = next(twitter_trends)
print(vars(trend))

On inspection, the scrapers most useful for our purpose are:
* TwitterSearchScraper
* TwitterUserScraper
* TwitterProfileScraper
* TwitterHashtagScraper
* TwitterTrendsScraper

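Since TwitterSearchScraper looks the most useful for our use case, let's study the tweet object it returns. The tweet scrapers above all yield the same `Tweet` objects, so the `tweet` fetched by the last executed scraper cell (the ListPosts example) serves as the sample: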

In [65]:
# look at all the keys of the object
vars(tweet).keys()

dict_keys(['url', 'date', 'rawContent', 'renderedContent', 'id', 'user', 'replyCount', 'retweetCount', 'likeCount', 'quoteCount', 'conversationId', 'lang', 'source', 'sourceUrl', 'sourceLabel', 'links', 'media', 'retweetedTweet', 'quotedTweet', 'inReplyToTweetId', 'inReplyToUser', 'mentionedUsers', 'coordinates', 'place', 'hashtags', 'cashtags', 'card', 'viewCount', 'vibe'])

In [71]:
useful_keys = ['id', 'date', 'user', 'rawContent', 'viewCount', 'likeCount', 'replyCount', 'retweetCount', 'quoteCount', 'url']
data = [str(vars(tweet).get(key)) for key in useful_keys]
data

['1621613685834760192',
 '2023-02-03 20:56:45+00:00',
 'https://twitter.com/myshoeandshisha',
 'science tier list:\nphysics\nchemistry\nbiology\n\npsychology',
 '156',
 '1',
 '2',
 '0',
 '0',
 'https://twitter.com/myshoeandshisha/status/1621613685834760192']

In [75]:
data = [
    tweet.id,
    tweet.date,
    tweet.user.username,
    tweet.rawContent,
    tweet.viewCount,
    tweet.likeCount,
    tweet.replyCount,
    tweet.retweetCount,
    tweet.quoteCount,
    tweet.url
]
data

[1621613685834760192,
 datetime.datetime(2023, 2, 3, 20, 56, 45, tzinfo=datetime.timezone.utc),
 'myshoeandshisha',
 'science tier list:\nphysics\nchemistry\nbiology\n\npsychology',
 156,
 1,
 2,
 0,
 0,
 'https://twitter.com/myshoeandshisha/status/1621613685834760192']

## Build the scraper script

In [77]:
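# scrape and build a dataframe
from itertools import islice

query = "mratanusarkar"
limit = 10
tweets = []
columns = []

twitter_search = sntwitter.TwitterSearchScraper(query).get_items()
# islice stops after `limit` tweets, so no extra tweet is fetched just to trigger the stop
for tweet in tqdm(islice(twitter_search, limit), total=limit):
    if not columns:
        # take column names from the first tweet, so they are set even if fewer than `limit` tweets exist
        columns = list(vars(tweet).keys())
    tweets.append(list(vars(tweet).values()))

df = pd.DataFrame(tweets, columns=columns)
df.head(1)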

|  | url | date | rawContent | renderedContent | id | user | replyCount | retweetCount | likeCount | quoteCount | ... | inReplyToTweetId | inReplyToUser | mentionedUsers | coordinates | place | hashtags | cashtags | card | viewCount | vibe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://twitter.com/mratanusarkar/status/16233... | 2023-02-08 14:35:08+00:00 | Most important ingredients in life are:\n- Lov... | Most important ingredients in life are:\n- Lov... | 1623329586602995712 | https://twitter.com/mratanusarkar | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  | 7 |  |


In [79]:
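from itertools import islice

def get_tweets(query, limit, keep_keys=None):
    """Scrape up to `limit` tweets matching `query` into a DataFrame.

    If `keep_keys` is given, keep only those Tweet attributes (as strings) and use
    their snake_case names as columns; otherwise keep every attribute.
    """
    keep_keys = keep_keys or []  # avoid a mutable default argument
    tweets = []
    # camelCase -> snake_case: insert '_' before each capital letter that is not at the start
    pattern = re.compile(r'(?<!^)(?=[A-Z])')
    columns = [pattern.sub('_', key).lower() for key in keep_keys]
    twitter_search = sntwitter.TwitterSearchScraper(query).get_items()
    for tweet in tqdm(islice(twitter_search, limit), total=limit):
        if keep_keys:
            tweets.append([str(vars(tweet).get(key)) for key in keep_keys])
        else:
            if not columns:
                # take all attribute names from the first tweet
                columns = list(vars(tweet).keys())
            tweets.append(list(vars(tweet).values()))

    df = pd.DataFrame(tweets, columns=columns)
    return df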
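The regex in `get_tweets` turns snscrape's camelCase attribute names into snake_case column names. `(?<!^)(?=[A-Z])` is a zero-width match at every position that is followed by an uppercase letter and is not the start of the string; `re.sub` inserts `_` there and `.lower()` finishes the job. The same pattern in isolation (the name `to_snake_case` is our own):

```python
import re

CAMEL_BOUNDARY = re.compile(r'(?<!^)(?=[A-Z])')

def to_snake_case(name):
    """'retweetCount' -> 'retweet_count' (same pattern as in get_tweets)."""
    return CAMEL_BOUNDARY.sub('_', name).lower()

for key in ['id', 'rawContent', 'viewCount', 'inReplyToTweetId']:
    print(key, '->', to_snake_case(key))
```

Runs of capitals are split letter by letter (`URL` becomes `u_r_l`), which does not affect snscrape's field names such as `sourceUrl`.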

In [81]:
useful_keys = ['id', 'date', 'user', 'rawContent', 'viewCount', 'likeCount', 'replyCount', 'retweetCount', 'quoteCount', 'url']
df = get_tweets("mratanusarkar", 10, useful_keys)
df

|  | id | date | user | raw_content | view_count | like_count | reply_count | retweet_count | quote_count | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1623329586602995712 | 2023-02-08 14:35:08+00:00 | https://twitter.com/mratanusarkar | Most important ingredients in life are:\n- Lov... | 7 | 0 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16233... |
| 1 | 1621363542137131008 | 2023-02-03 04:22:46+00:00 | https://twitter.com/mratanusarkar | Good to see latest technology getting into che... | 20 | 1 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16213... |
| 2 | 1620974119474044928 | 2023-02-02 02:35:20+00:00 | https://twitter.com/mratanusarkar | Be kind, selfless and generous to others,\nbut... | 14 | 1 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16209... |
| 3 | 1618173644068499457 | 2023-01-25 09:07:15+00:00 | https://twitter.com/mratanusarkar | @mochacoldcoffee 👍 | 22 | 0 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16181... |
| 4 | 1615401618236923906 | 2023-01-17 17:32:13+00:00 | https://twitter.com/mratanusarkar | @anishgiri Exactly 12 years later, on the exac... | 1896 | 3 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16154... |
| 5 | 1614994292279488514 | 2023-01-16 14:33:39+00:00 | https://twitter.com/mratanusarkar | India is rocking it 🤘\nProud to be born in suc... | 15 | 0 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16149... |
| 6 | 1614885381870346241 | 2023-01-16 07:20:52+00:00 | https://twitter.com/mratanusarkar | who am I? \nwho are you?\nwho are we?\nwhat's ... | 7 | 1 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16148... |
| 7 | 1611986810539999232 | 2023-01-08 07:22:59+00:00 | https://twitter.com/mratanusarkar | Every action has its consequences.\nAlso "you"... | 11 | 0 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16119... |
| 8 | 1611920785291214850 | 2023-01-08 03:00:37+00:00 | https://twitter.com/mratanusarkar | @ReheSamay What the hell ending was that!? 😂\n... | 25 | 1 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16119... |
| 9 | 1611335496848011265 | 2023-01-06 12:14:54+00:00 | https://twitter.com/mratanusarkar | Work like is a gaseous fluid. It takes up the ... | 9 | 0 | 0 | 0 | 0 | https://twitter.com/mratanusarkar/status/16113... |
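`get_tweets(..., keep_keys)` stringifies every value, so sorting its output directly compares text, where `'7'` ranks above `'1896'`. A sketch that converts the count columns back to numbers before ranking; `rank_by_engagement` is a hypothetical helper, and the sample rows are copied from the output above:

```python
import pandas as pd

COUNT_COLUMNS = ["view_count", "like_count", "reply_count", "retweet_count", "quote_count"]

def rank_by_engagement(df, by="view_count"):
    """Hypothetical helper: sort df (descending) by a count column after making counts numeric.

    'None' strings (e.g. a missing viewCount) become NaN via errors='coerce'.
    """
    df = df.copy()
    present = [c for c in COUNT_COLUMNS if c in df.columns]
    df[present] = df[present].apply(pd.to_numeric, errors="coerce")
    return df.sort_values(by, ascending=False)

# three rows from the get_tweets output above, kept as strings like get_tweets returns them
sample = pd.DataFrame({
    "id": ["1623329586602995712", "1621363542137131008", "1615401618236923906"],
    "view_count": ["7", "20", "1896"],
    "like_count": ["0", "1", "3"],
    "reply_count": ["0", "0", "0"],
    "retweet_count": ["0", "0", "0"],
    "quote_count": ["0", "0", "0"],
})

ranked = rank_by_engagement(sample)
print(ranked[["id", "view_count", "like_count"]])
```

Passing `by="like_count"` (or any other count column) ranks by that metric instead.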


## Export

In [82]:
# export data
# df.to_json("tweets.json")
# df.to_csv("tweets.csv")
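By default `to_csv` also writes the DataFrame index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. A sketch of a CSV round-trip that avoids it with `index=False`; the file names are arbitrary and the one-row `df_demo` stands in for the scraped `df`:

```python
import os
import tempfile

import pandas as pd

df_demo = pd.DataFrame({"id": ["1623329586602995712"], "like_count": ["0"]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "tweets_with_index.csv")
    without_index = os.path.join(tmp, "tweets.csv")
    df_demo.to_csv(with_index)                  # default: the index is written as the first column
    df_demo.to_csv(without_index, index=False)  # only the data columns
    cols_with_index = pd.read_csv(with_index).columns.tolist()
    # dtype=str keeps the ids as strings, matching what get_tweets produced
    reloaded = pd.read_csv(without_index, dtype=str)

print(cols_with_index)
print(reloaded.columns.tolist())
```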