## Social Media Data Analysis Using Twitter Data

### Part 1: Get the Data
Tweepy authenticates with the Twitter API using the consumer key, consumer secret, access token, and access token secret.
Public tweets are then downloaded, and the time, retweet count, user, and text of each tweet are stored in a CSV file.

In [1]:
import tweepy
import pandas as pd

In [2]:
%run ./keys.ipynb

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token,access_token_secret)
api = tweepy.API(auth)
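
As an alternative to running a separate keys notebook, the four credentials could be read from environment variables. The following is a minimal sketch under that assumption; the variable names and the `load_twitter_credentials` helper are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust them to match your own setup.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in the same way as the variables loaded from `keys.ipynb`.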

In [4]:
public_tweets = api.home_timeline()

In [5]:
#Printing all tweets
#print(public_tweets)

In [6]:
# Print the text of each tweet in the home timeline
for tweet in public_tweets:
    print(tweet.text)

Health authorities around the world are investigating a mysterious increase in severe cases of hepatitis - inflamma… https://t.co/4VXkAxlg1z
U.S. military landlord put families at risk even after fraud plea, Senate probe says https://t.co/k5RfUZiERE https://t.co/lKEqADEIYs
Maersk says shipping boom will stabilise in H2, revises up profit guidance https://t.co/3XJKZSbGNx https://t.co/Xh3hQCObH0
UPS beats profit estimates as it rides e-commerce boom https://t.co/YptRTL4xDF https://t.co/kVpj04c6z7
One of Dorothy's dresses from the classic American film ‘The Wizard of Oz’ has been located and is set to be auctio… https://t.co/5CN6fId6HA
Kremlin says Gazprom working on implementing roubles-for-gas scheme https://t.co/2VBJEI4qZD https://t.co/NcQOU7nhcU
With giant mall, India's Reliance sets sights on next gold rush: luxury goods https://t.co/uPP64A8hkQ https://t.co/0E2k9wQQuo
GE sees 2022 earnings at lower end of forecast on supply chain, inflationary woes https://t.co/jSZDh3zmVn https://t.c

In [7]:
print(public_tweets[0].user.screen_name)
# tweet.retweet_count is also available on each tweet

Reuters


In [8]:
columns = ['Time', 'Retweets', 'User', 'Tweet']
data = []
for tweet in public_tweets:
    data.append([tweet.created_at, tweet.retweet_count, tweet.user.screen_name, tweet.text])

In [9]:
print(data)

[[datetime.datetime(2022, 4, 26, 12, 56, 37, tzinfo=datetime.timezone.utc), 3, 'Reuters', 'Health authorities around the world are investigating a mysterious increase in severe cases of hepatitis - inflamma… https://t.co/4VXkAxlg1z'], [datetime.datetime(2022, 4, 26, 12, 55, 21, tzinfo=datetime.timezone.utc), 0, 'Reuters', 'U.S. military landlord put families at risk even after fraud plea, Senate probe says https://t.co/k5RfUZiERE https://t.co/lKEqADEIYs'], [datetime.datetime(2022, 4, 26, 12, 50, 18, tzinfo=datetime.timezone.utc), 3, 'Reuters', 'Maersk says shipping boom will stabilise in H2, revises up profit guidance https://t.co/3XJKZSbGNx https://t.co/Xh3hQCObH0'], [datetime.datetime(2022, 4, 26, 12, 45, 20, tzinfo=datetime.timezone.utc), 3, 'Reuters', 'UPS beats profit estimates as it rides e-commerce boom https://t.co/YptRTL4xDF https://t.co/kVpj04c6z7'], [datetime.datetime(2022, 4, 26, 12, 45, tzinfo=datetime.timezone.utc), 3, 'Reuters', "One of Dorothy's dresses from the classic

In [10]:
df = pd.DataFrame(data, columns=columns)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   Time      20 non-null     datetime64[ns, UTC]
 1   Retweets  20 non-null     int64              
 2   User      20 non-null     object             
 3   Tweet     20 non-null     object             
dtypes: datetime64[ns, UTC](1), int64(1), object(2)
memory usage: 768.0+ bytes


In [11]:
print(df)

                        Time  Retweets     User  \
0  2022-04-26 12:56:37+00:00         3  Reuters   
1  2022-04-26 12:55:21+00:00         0  Reuters   
2  2022-04-26 12:50:18+00:00         3  Reuters   
3  2022-04-26 12:45:20+00:00         3  Reuters   
4  2022-04-26 12:45:00+00:00         3  Reuters   
5  2022-04-26 12:40:20+00:00         7  Reuters   
6  2022-04-26 12:35:21+00:00        10  Reuters   
7  2022-04-26 12:30:32+00:00         8  Reuters   
8  2022-04-26 12:30:00+00:00         5  Reuters   
9  2022-04-26 12:25:19+00:00        24  Reuters   
10 2022-04-26 12:25:00+00:00        30  Reuters   
11 2022-04-26 12:20:18+00:00        16  Reuters   
12 2022-04-26 12:20:00+00:00        14  Reuters   
13 2022-04-26 12:15:19+00:00        15  Reuters   
14 2022-04-26 12:15:00+00:00        39  Reuters   
15 2022-04-26 12:10:18+00:00         5  Reuters   
16 2022-04-26 12:05:17+00:00        43  Reuters   
17 2022-04-26 12:01:10+00:00        63  Reuters   
18 2022-04-26 12:00:09+00:00   

In [12]:
df.to_csv('tweets.csv')
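
When the CSV is read back later, the default `to_csv` call also writes the row index, which reappears as an extra `Unnamed: 0` column. The sketch below shows one way to avoid that and to restore the timestamp dtype. It uses two made-up rows and an in-memory buffer instead of `tweets.csv`, so it is only an illustration of the round trip.

```python
import io

import pandas as pd

# Two made-up rows standing in for the tweets DataFrame built above.
sample = pd.DataFrame(
    {
        "Time": pd.to_datetime(
            ["2022-04-26 12:56:37+00:00", "2022-04-26 12:55:21+00:00"]
        ),
        "Retweets": [3, 0],
        "User": ["example_user", "example_user"],
        "Tweet": ["first example tweet", "second example tweet"],
    }
)

buffer = io.StringIO()               # in-memory stand-in for 'tweets.csv'
sample.to_csv(buffer, index=False)   # index=False skips the row-index column
buffer.seek(0)

# parse_dates turns the Time column back into timestamps instead of strings.
restored = pd.read_csv(buffer, parse_dates=["Time"])
print(restored.columns.tolist())
print(restored.dtypes)
```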

In [13]:
# Note: the Twitter API only gives access to recent tweets (roughly the past few weeks); older tweets cannot be retrieved this way.

### Part 2:  Analyze the data

1. Social media data analysis
2. Data visualization
3. Sentiment analysis
4. Text mining

### Listing the top 10 most retweeted tweets

In [14]:
# Rank the tweets by retweet count, highest first
most_retweeted = df.loc[df.Retweets.nlargest(19).index]
print(most_retweeted)

                        Time  Retweets     User  \
17 2022-04-26 12:01:10+00:00        63  Reuters   
16 2022-04-26 12:05:17+00:00        43  Reuters   
14 2022-04-26 12:15:00+00:00        39  Reuters   
10 2022-04-26 12:25:00+00:00        30  Reuters   
9  2022-04-26 12:25:19+00:00        24  Reuters   
18 2022-04-26 12:00:09+00:00        20  Reuters   
11 2022-04-26 12:20:18+00:00        16  Reuters   
13 2022-04-26 12:15:19+00:00        15  Reuters   
12 2022-04-26 12:20:00+00:00        14  Reuters   
6  2022-04-26 12:35:21+00:00        10  Reuters   
7  2022-04-26 12:30:32+00:00         8  Reuters   
5  2022-04-26 12:40:20+00:00         7  Reuters   
19 2022-04-26 12:00:01+00:00         6  Reuters   
8  2022-04-26 12:30:00+00:00         5  Reuters   
15 2022-04-26 12:10:18+00:00         5  Reuters   
0  2022-04-26 12:56:37+00:00         3  Reuters   
2  2022-04-26 12:50:18+00:00         3  Reuters   
3  2022-04-26 12:45:20+00:00         3  Reuters   
4  2022-04-26 12:45:00+00:00   

In [15]:
# Renumber the rows and keep the top 10
most_retweeted.reset_index(drop=True).head(10)

|   | Time | Retweets | User | Tweet |
|---|------|----------|------|-------|
| 0 | 2022-04-26 12:01:10+00:00 | 63 | Reuters | Ukraine can win war with Russia, U.S. defense ... |
| 1 | 2022-04-26 12:05:17+00:00 | 43 | Reuters | From weed joke to agreed deal: Inside Musk's $... |
| 2 | 2022-04-26 12:15:00+00:00 | 39 | Reuters | U.S. Defense Secretary Lloyd Austin kicked off... |
| 3 | 2022-04-26 12:25:00+00:00 | 30 | Reuters | North Korea will speed up development of its n... |
| 4 | 2022-04-26 12:25:19+00:00 | 24 | Reuters | Fed up with COVID lockdown, bankers, fund mana... |
| 5 | 2022-04-26 12:00:09+00:00 | 20 | Reuters | A U.N. report called for urgent action to aver... |
| 6 | 2022-04-26 12:20:18+00:00 | 16 | Reuters | Indonesia may widen palm export ban to combat ... |
| 7 | 2022-04-26 12:15:19+00:00 | 15 | Reuters | Kremlin voices concern after blasts hit breaka... |
| 8 | 2022-04-26 12:20:00+00:00 | 14 | Reuters | U.S. President Joe Biden welcomed the Tampa Ba... |
| 9 | 2022-04-26 12:35:21+00:00 | 10 | Reuters | With giant mall, India's Reliance sets sights ... |
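
The top-N selection can also be written with `DataFrame.nlargest` directly on the column name. This is a minimal sketch on made-up retweet counts; the `top_retweeted` helper is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Made-up retweet counts standing in for the real DataFrame.
sample = pd.DataFrame(
    {
        "Retweets": [3, 63, 0, 43, 39],
        "Tweet": ["a", "b", "c", "d", "e"],
    }
)


def top_retweeted(frame, n=10):
    """Return the n rows with the highest 'Retweets', renumbered from 0."""
    return frame.nlargest(n, "Retweets").reset_index(drop=True)


top3 = top_retweeted(sample, n=3)
print(top3)
```

If the frame has fewer than `n` rows, `nlargest` simply returns all of them, sorted by retweet count.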


## Text Mining

### Obtaining tweets from a specific user account: IRCC (@CitImmCanada)

In [16]:
# Obtain tweets from a specific user account,
# in this case IRCC (@CitImmCanada)

user = "CitImmCanada"
limit = 200  # the API returns a maximum of 200 tweets per request

tweets_ircc = api.user_timeline(screen_name=user, count=limit, tweet_mode="extended")

# Create a DataFrame to store the tweets by IRCC
columns1 = ["User", "Tweet"]
data = []

for tweet in tweets_ircc:
    print(tweet.full_text)
    # Collect each tweet so the DataFrame below is not empty
    data.append([tweet.user.screen_name, tweet.full_text])

df = pd.DataFrame(data, columns=columns1)
#print(df)
df.info()  # info() prints its summary itself and returns None

Our online services were recently unavailable. Be sure to clear your internet browser cache: https://t.co/6CRa8svrxW
ONLINE SERVICE HELP: Reminder - Online services will be unavailable between 12:00 am and 5:30 am EDT.
@MissButlerEAL 2/2 If a client is in Afghanistan and not able to submit biometrics, officers should collect the Additional Background Information [IMM 0153 (PDF, 1.8 MB)] form in lieu of biometrics. https://t.co/jQuVxrB73O
@MissButlerEAL 1/2 Hi. Did you apply under the public policy for extended families of former Afghan interpreters?
@Lyuba_Petrenko 2/2 For case-specific information, please use our web form. The information we need you to include depends on the reason you’re contacting us https://t.co/V2e3FnXl21. You can also call us at 1-613-321-4243.
@Lyuba_Petrenko 1/2 Hi. CUAET electronic visa applications will be processed within 14 days of receipt of a complete application, for standard, non complex cases. Note: applications that include an open work permit, or st
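
Because each request returns at most 200 tweets, collecting more means asking for successive pages of older tweets. The sketch below illustrates that idea with `max_id`-style paging. The `fetch_page` callable is hypothetical: in practice it would wrap a call such as `api.user_timeline` with its `max_id` parameter, while here a fake data source keeps the example runnable offline. Tweepy also ships a `Cursor` helper for this kind of paging.

```python
from types import SimpleNamespace


def fetch_older_pages(fetch_page, max_pages=5):
    """Collect tweets page by page, walking backwards in time.

    fetch_page is a hypothetical callable that takes max_id (or None) and
    returns a list of objects with an integer .id, newest first.
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Next, ask only for tweets strictly older than the oldest one seen.
        max_id = page[-1].id - 1
    return collected


# Fake data source so the sketch runs without network access.
_FAKE_TWEETS = [SimpleNamespace(id=i) for i in range(10, 0, -1)]


def fake_fetch_page(max_id, page_size=4):
    eligible = [t for t in _FAKE_TWEETS if max_id is None or t.id <= max_id]
    return eligible[:page_size]


all_tweets = fetch_older_pages(fake_fetch_page)
print([t.id for t in all_tweets])
```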

In [17]:
# Sentiment analysis
# Reference: https://www.youtube.com/watch?v=uPKnSq6TaAk
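
As a placeholder for the sentiment analysis step, a very small lexicon-based polarity score might look like the sketch below. The word lists are invented purely for illustration and the `polarity` function is a hypothetical helper; a real analysis would rely on an established lexicon or a trained model.

```python
import re

# Tiny illustrative word lists, invented for this sketch only.
POSITIVE = {"good", "great", "win", "welcomed", "boom", "beats"}
NEGATIVE = {"bad", "risk", "fraud", "woes", "concern", "unavailable"}


def polarity(text):
    """Return (positive hits - negative hits) / total hits, or 0.0 if none."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(polarity("UPS beats profit estimates as it rides e-commerce boom"))
```

The score ranges from -1.0 (only negative words matched) to 1.0 (only positive words matched); text with no matches scores 0.0.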

In [None]:
# Ideas for further analysis:
#   1. Most retweeted words
#   2. IRCC tweets
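
One possible reading of "most retweeted words" is to add up, for every word, the retweet counts of the tweets that contain it. The sketch below takes that interpretation as an assumption; the `retweet_weighted_words` helper and the sample rows are made up for illustration.

```python
import re
from collections import Counter

# Matches links and @mentions so they can be stripped before counting words.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")


def retweet_weighted_words(rows, min_length=4):
    """Sum retweet counts per word over (retweets, text) pairs."""
    totals = Counter()
    for retweets, text in rows:
        cleaned = URL_OR_MENTION.sub(" ", text.lower())
        # set() so a word repeated within one tweet is counted only once
        for word in set(re.findall(r"[a-z]+", cleaned)):
            if len(word) >= min_length:
                totals[word] += retweets
    return totals


# Made-up rows in the same (Retweets, Tweet) shape as the DataFrame columns.
rows = [
    (5, "Shipping boom continues https://example.com/a"),
    (3, "@someone shipping costs rise"),
]
totals = retweet_weighted_words(rows)
print(totals.most_common(3))
```

With the DataFrame from Part 1, the rows could be supplied as `zip(df["Retweets"], df["Tweet"])`.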