# Euphoria Twitter Analysis 

## Analysis 

**Project Strategy:**
- Tweet Mining 
- Data Cleansing and Processing 
- Extracting Hashtags and Main Characters 
- Location Geocoding 
- Sentiment Analysis 
- Tableau Dashboard 

# Tweet Mining

## Notebook Scope

This notebook covers the Tweet Mining step of the analysis project. Over the course of seven weeks I collected tweets through the Twitter API in Python, using Tweepy as the main library and authenticating with my Twitter developer account credentials. Every tweet collected matched the Euphoria search query defined later in this notebook (the Euphoria hashtags or the phrase "euphoria hbo").

I ran this notebook once a week to collect the tweets about the show for that period.

Tweet scraping began on 17 January 2022.
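
The weekly collection window can be derived from the day the notebook is run. The sketch below is an assumption about how the dates could be computed, not the method used in this project. `weekly_window` is a hypothetical helper that returns the `since` and `until` strings for the seven days before a given run date. For a run on 1 March 2022 it reproduces the window used in the search cell of this notebook (`2022-02-22` to `2022-03-01`).

```python
from datetime import date, timedelta


def weekly_window(run_date, days=7):
    """Return (since, until) as ISO date strings for the `days` before run_date.

    Hypothetical helper, not part of the original notebook.
    """
    since = run_date - timedelta(days=days)
    return since.isoformat(), run_date.isoformat()


since, until = weekly_window(date(2022, 3, 1))
print(since, until)  # 2022-02-22 2022-03-01
```

Twitter's `until` operator excludes the given date, which is consistent with the newest tweet in the sample output being timestamped 2022-02-28 23:59:59.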

## Import Relevant Libraries

In [1]:
import pandas as pd
import tweepy
import numpy as np
import config

The API used for this project is the Twitter Premium API.

In [2]:
consumer_key = config.api_key
consumer_secret = config.api_secret
access_key = config.access_token
access_secret = config.token_secret
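
The `config` module itself is not shown in this notebook. One possible layout, assuming the four secrets are stored in environment variables, is sketched below. The environment variable names and the `load_credentials` helper are hypothetical; only the attribute names `api_key`, `api_secret`, `access_token` and `token_secret` come from the cell above.

```python
import os

# Maps the attribute names used in this notebook to environment variable names.
# The environment variable names are assumptions chosen for illustration.
ENV_NAMES = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "token_secret": "TWITTER_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return a dict of credentials, raising if any variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {attr: env[var] for attr, var in ENV_NAMES.items()}
```

Inside a `config.py` file, `globals().update(load_credentials())` would expose the values as `config.api_key` and so on. Keeping the secrets out of the notebook and out of version control avoids leaking them when the notebook is shared.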

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)

In [4]:
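# Sleep automatically when a rate limit is hit
# (wait_on_rate_limit_notify is a Tweepy 3.x parameter; it was removed in Tweepy 4.0)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)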



**Sanity check: confirm that the API credentials work and tweets can be extracted**

In [5]:

posts = api.user_timeline(screen_name='BillGates', count=100, lang='en', tweet_mode='extended')

In [6]:
for tweet in posts[0:5]:
    print(tweet.full_text)

I have never met anyone who was more passionate about reducing the world’s worst inequities in health than Paul Farmer. I continue to learn from my dear friend’s life and legacy today. https://t.co/Mk3HFoiqEn
More than 1B people globally still suffer from #NTDs, many caused by parasites. I got a firsthand look at these creepy crawlies at the Parasitic Museum in Tokyo. The more we understand them, the better we can address the diseases they cause. https://t.co/jDjFYCFVgv
This year's #BESummit2022 brought together investors, policymakers, and innovators to tackle one of the world’s toughest problems: climate change. https://t.co/nZ4SWXKrhk
Math shouldn’t be a gatekeeper, limiting a student’s dreams. It should be a gateway, helping students realize them. https://t.co/YjFMqGn5yz
TerraPower, the next-gen nuclear company I founded, just agreed with PacifiCorp to study the feasibility of building five Natrium reactors for its customers. https://t.co/S6F0ALOnFO


**The following cell extracts the tweets that match the terms in the `text_query` variable. The goal was to collect a maximum of 50,000 tweets each week.**

In [8]:
text_query = "\"euphoria hbo\" OR #EUPHORIA OR #EuphoriaHBO OR #EuphoriaHBOMax"

#text_query = "love island OR Love Island OR #LoveIsland OR #loveisland OR #loveIsland OR loveisland"

search_query = text_query + " -filter:retweets AND -filter:replies" # Exclude retweets, replies

max_tweets = 50000

# Build a cursor over the search results for the chosen date window
# (api.search is the Tweepy 3.x name; it was renamed search_tweets in Tweepy 4.0)
tweets = tweepy.Cursor(
    api.search,
    q=search_query,
    since="2022-02-22",
    until="2022-03-01",
    lang="en",
    tweet_mode="extended",
).items(max_tweets)

# Pull the fields of interest from each tweet in the iterable
# Add or remove tweet attributes in the list comprehension below as needed
tweets_list = [
    [
        tweet.full_text, tweet.created_at, tweet.id,
        tweet.user.name, tweet.user.screen_name, tweet.user.id_str,
        tweet.user.location, tweet.user.url, tweet.user.description,
        tweet.user.verified, tweet.user.followers_count, tweet.user.friends_count,
        tweet.user.favourites_count, tweet.user.statuses_count, tweet.user.listed_count,
        tweet.user.created_at, tweet.user.profile_image_url_https,
        tweet.user.default_profile, tweet.user.default_profile_image,
        tweet.retweet_count, tweet.favorite_count,
    ]
    for tweet in tweets
]

# Create a DataFrame from tweets_list
# Column names are omitted to keep the code simple
tweets_df = pd.DataFrame(tweets_list)
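
Because the DataFrame is created without column names, its columns are labelled 0 to 20. A minimal sketch of one way to name them follows, assuming the same 21 attributes in the same order as the list comprehension above. `FIELDS`, `COLUMN_NAMES`, `tweet_to_row` and `make_stand_in` are hypothetical names that are not part of the original notebook, and the stand-in object only mimics the shape of a Tweepy status.

```python
from operator import attrgetter
from types import SimpleNamespace

# Dotted attribute paths, in the same order as the list comprehension above.
FIELDS = [
    "full_text", "created_at", "id",
    "user.name", "user.screen_name", "user.id_str",
    "user.location", "user.url", "user.description",
    "user.verified", "user.followers_count", "user.friends_count",
    "user.favourites_count", "user.statuses_count", "user.listed_count",
    "user.created_at", "user.profile_image_url_https",
    "user.default_profile", "user.default_profile_image",
    "retweet_count", "favorite_count",
]

# Column labels derived from the paths, e.g. "user.name" -> "user_name".
COLUMN_NAMES = [field.replace(".", "_") for field in FIELDS]

_get_fields = attrgetter(*FIELDS)


def tweet_to_row(tweet):
    """Return the 21 selected attributes of a tweet-like object as a list."""
    return list(_get_fields(tweet))


def make_stand_in():
    """Build a fake tweet whose attribute values are their own dotted paths."""
    tweet = SimpleNamespace(user=SimpleNamespace())
    for field in FIELDS:
        target = tweet.user if field.startswith("user.") else tweet
        setattr(target, field.rsplit(".", 1)[-1], field)
    return tweet


row = tweet_to_row(make_stand_in())
print(len(row), COLUMN_NAMES[3], row[3])  # 21 user_name user.name
```

With real results, `pd.DataFrame([tweet_to_row(t) for t in tweets], columns=COLUMN_NAMES)` would give a labelled table, which makes the later cleansing steps easier to read than positional column numbers.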

Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 845
Rate limit reached. Sleeping for: 847
Rate limit reached. Sleeping for: 846
Rate limit reached. Sleeping for: 847
Rate limit reached. Sleeping for: 848
Rate limit reached. Sleeping for: 845
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 845
Rate limit reached. Sleeping for: 846
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 842
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 844
Rate limit reached. Sleeping for: 845
Rate limit reached. Sleeping for: 844
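
The eighteen pauses logged above are consistent with simple arithmetic, assuming the standard search limit of 180 requests per 15-minute window and the search endpoint's default of 15 tweets per page when `count` is not passed. Both figures are assumptions about the endpoint used here rather than values stated in this notebook, and `expected_sleeps` is a hypothetical helper.

```python
import math


def expected_sleeps(max_tweets, tweets_per_page, requests_per_window=180):
    """Estimate how many rate-limit pauses a paginated search will hit.

    Assumes every page is full; filtered searches can return shorter pages,
    so the real number of pauses may be higher.
    """
    pages = math.ceil(max_tweets / tweets_per_page)
    windows = math.ceil(pages / requests_per_window)
    # The first window needs no pause; each further window follows one sleep.
    return max(windows - 1, 0)


print(expected_sleeps(50_000, tweets_per_page=15))   # 18
print(expected_sleeps(50_000, tweets_per_page=100))  # 2
```

Under these assumptions, passing `count=100` to `tweepy.Cursor(api.search, ...)` could reduce a 50,000-tweet run from about 18 pauses of roughly 14 minutes each to about 2.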


In [10]:
tweets_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11,12,13,14,15,16,17,18,19,20
0,We deadass have so many unanswered questions f...,2022-02-28 23:59:59,1498447922664546304,(Not) Taylor ♡,folklorealbum,1365176606386491396,JMA♡ | She/Her,https://t.co/3Issz5SZP2,I love you and that's all I really know. (NOT ...,False,...,1406,5503,4431,16,2021-02-26 05:47:33,https://pbs.twimg.com/profile_images/149415705...,True,False,0,1
1,i can sleep peacefully now #euphoria https://t...,2022-02-28 23:59:58,1498447920756105216,ري,fuckinRii,773271568998920192,your dreams,,,False,...,114,95,5749,0,2016-09-06 21:28:04,https://pbs.twimg.com/profile_images/149305363...,True,False,0,1
2,Me watching yall tweet about Euphoria and I ha...,2022-02-28 23:59:58,1498447918222782470,MER✨😈,whomorgg,1416973349616488448,"Philadelphia, PA",,Nicetown📍CheyneyU’24,False,...,85,1494,80,0,2021-07-19 04:09:16,https://pbs.twimg.com/profile_images/147657367...,True,False,0,1
3,Farewell 🥺🥺🥺🥺\n#Euphoria #EuphoriaFinale #Euph...,2022-02-28 23:59:47,1498447873670889472,Rabiu Kabir🃏🎮😉,rabioukabeer,774786300457222144,Nigeria,https://t.co/26Zgi9aRny,"@rabioukabeer on all platforms\n\nLive Much, L...",False,...,826,14706,10634,0,2016-09-11 01:47:03,https://pbs.twimg.com/profile_images/149410555...,True,False,0,0
4,The fact that we wait 2 years for s2 of euphor...,2022-02-28 23:59:37,1498447829836046339,𝒜𝐿𝐿𝐼𝒜𝒴𝒜𝐻 🧜🏾‍♀️,AlliayahJ,570392528,504⚜️317🏁vegas🏜,,22💐 |sc•alliayah.xo| Pisces🌞Taurus🌑Gemini🌄 Bla...,False,...,461,39879,27469,0,2012-05-04 00:15:09,https://pbs.twimg.com/profile_images/148900835...,False,False,1,1


In [11]:
tweets_df.shape

(50000, 21)

## Final step: export the data to CSV, with a numeric label to differentiate the weekly `.csv` files

In [12]:
tweets_df.to_csv('euphoria07.csv')
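
The numeric label can be generated rather than typed by hand. A small sketch follows, assuming the label is a week number zero-padded to two digits as in `euphoria07.csv`. `weekly_csv_name` is a hypothetical helper, and the meaning of the label is an assumption, since the notebook does not state it.

```python
def weekly_csv_name(week, prefix="euphoria"):
    """Return a file name such as 'euphoria07.csv' for a given week number."""
    if week < 1:
        raise ValueError("week must be 1 or greater")
    return f"{prefix}{week:02d}.csv"


print(weekly_csv_name(7))  # euphoria07.csv
```

Calling `tweets_df.to_csv(weekly_csv_name(7))` would write the same file as the cell above. Passing `index=False` as well would keep the DataFrame index out of the file if it is not needed downstream.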