## Social Media Data Analysis Using Twitter Data

### Part 1: Get the Data
Use tweepy to authenticate with the consumer key, consumer secret, access token, and access token secret.
Then download tweet data and store the time, retweet count, user, and tweet text in a CSV file.
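
The four credentials can also be kept in a plain config file rather than in a separate notebook, which is what the commented-out `configparser` import hints at. A minimal sketch, assuming a hypothetical INI layout with a `[twitter]` section; the section name, key names, and the `load_twitter_keys` helper are illustrative and not part of the original notebook:

```python
import configparser


def load_twitter_keys(config_text):
    """Parse the four credentials from INI-formatted text (hypothetical layout)."""
    # interpolation=None so that '%' characters inside a secret are read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["twitter"]
    return {
        "consumer_key": section["consumer_key"],
        "consumer_secret": section["consumer_secret"],
        "access_token": section["access_token"],
        "access_token_secret": section["access_token_secret"],
    }


# Placeholder values only; real keys would live in a file kept out of version control.
example_config = """
[twitter]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
"""

keys = load_twitter_keys(example_config)
print(sorted(keys))
```

The resulting dictionary values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the variables defined by `keys.ipynb`.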

In [1]:
import tweepy
import pandas as pd
#import configparser

In [2]:
%run ./keys.ipynb

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token,access_token_secret)
api = tweepy.API(auth)

In [4]:
public_tweets = api.home_timeline()

In [5]:
# Print the raw list of Status objects returned by the API
# print(public_tweets)

In [6]:
# Print the text of each tweet in the home timeline
for tweet in public_tweets:
    print(tweet.text)

Second Ebola patient dies in northwestern Congo, WHO says https://t.co/Xiim6EPqhh https://t.co/OCwjipf7gX
Russia accuses NATO of creating a serious risk of nuclear war. Ukraine prepares war crimes charges against Russian… https://t.co/pzBYB6WHgc
Fidelity to allow retirement savings allocation to bitcoin in 401(k) accounts https://t.co/WqijSHXVuM https://t.co/oB2kn9HFLr
Warner Bros Discovery adds 2 mln subscribers in first quarter https://t.co/p1dgl9JMFv https://t.co/CM6m9OAPbw
North Korea will speed up development of its nuclear arsenal, leader Kim Jong Un said while overseeing a huge milit… https://t.co/kDheiKFo15
As plumes of smoke rose above the Azovstal steel plant where Ukrainian fighters are holed up in the port city of Ma… https://t.co/hoHUdZWT8z
Moscow accused NATO of engaging in a proxy battle against Russia by arming Ukraine, saying this had created a serio… https://t.co/smoaiSX3bn
TPG Growth acquires majority stake in proxy firm Morrow Sodali https://t.co/1RkwHhIVaW https://

In [7]:
print(public_tweets[0].user.screen_name)
#retweet_count

Reuters


In [8]:
columns = ['Time', 'Retweets', 'User', 'Tweet']
data = []
for tweet in public_tweets:
    data.append([tweet.created_at, tweet.retweet_count, tweet.user.screen_name, tweet.text])

In [9]:
print(data)

[[datetime.datetime(2022, 4, 26, 13, 40, 19, tzinfo=datetime.timezone.utc), 13, 'Reuters', 'Second Ebola patient dies in northwestern Congo, WHO says https://t.co/Xiim6EPqhh https://t.co/OCwjipf7gX'], [datetime.datetime(2022, 4, 26, 13, 40, tzinfo=datetime.timezone.utc), 11, 'Reuters', 'Russia accuses NATO of creating a serious risk of nuclear war. Ukraine prepares war crimes charges against Russian… https://t.co/pzBYB6WHgc'], [datetime.datetime(2022, 4, 26, 13, 35, 20, tzinfo=datetime.timezone.utc), 9, 'Reuters', 'Fidelity to allow retirement savings allocation to bitcoin in 401(k) accounts https://t.co/WqijSHXVuM https://t.co/oB2kn9HFLr'], [datetime.datetime(2022, 4, 26, 13, 30, 22, tzinfo=datetime.timezone.utc), 7, 'Reuters', 'Warner Bros Discovery adds 2 mln subscribers in first quarter https://t.co/p1dgl9JMFv https://t.co/CM6m9OAPbw'], [datetime.datetime(2022, 4, 26, 13, 30, 10, tzinfo=datetime.timezone.utc), 21, 'Reuters', 'North Korea will speed up development of its nuclear ars

In [10]:
df = pd.DataFrame(data, columns=columns)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   Time      20 non-null     datetime64[ns, UTC]
 1   Retweets  20 non-null     int64              
 2   User      20 non-null     object             
 3   Tweet     20 non-null     object             
dtypes: datetime64[ns, UTC](1), int64(1), object(2)
memory usage: 768.0+ bytes


In [11]:
print(df)

                        Time  Retweets     User  \
0  2022-04-26 13:40:19+00:00        13  Reuters   
1  2022-04-26 13:40:00+00:00        11  Reuters   
2  2022-04-26 13:35:20+00:00         9  Reuters   
3  2022-04-26 13:30:22+00:00         7  Reuters   
4  2022-04-26 13:30:10+00:00        21  Reuters   
5  2022-04-26 13:30:00+00:00        10  Reuters   
6  2022-04-26 13:29:04+00:00        35  Reuters   
7  2022-04-26 13:25:18+00:00         6  Reuters   
8  2022-04-26 13:20:19+00:00        29  Reuters   
9  2022-04-26 13:15:21+00:00        10  Reuters   
10 2022-04-26 13:11:57+00:00         8  Reuters   
11 2022-04-26 13:10:19+00:00         8  Reuters   
12 2022-04-26 13:10:07+00:00        18  Reuters   
13 2022-04-26 13:10:00+00:00        10  Reuters   
14 2022-04-26 13:05:19+00:00        11  Reuters   
15 2022-04-26 13:01:10+00:00        36  Reuters   
16 2022-04-26 13:00:01+00:00        37  Reuters   
17 2022-04-26 12:56:37+00:00        32  Reuters   
18 2022-04-26 12:55:21+00:00   

In [12]:
df.to_csv('tweets.csv')
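
By default `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. A minimal sketch of a round trip that avoids this, using a small made-up DataFrame and an in-memory buffer standing in for `tweets.csv`:

```python
import io

import pandas as pd

# Made-up rows with the same columns as the notebook's DataFrame
sample = pd.DataFrame({
    "Time": pd.to_datetime(["2022-04-26 13:40:19", "2022-04-26 13:40:00"], utc=True),
    "Retweets": [13, 11],
    "User": ["Reuters", "Reuters"],
    "Tweet": ["first example tweet", "second example tweet"],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # index=False leaves the row index out of the file
buffer.seek(0)

restored = pd.read_csv(buffer)
# Timestamps come back as strings, so convert them to timezone-aware datetimes again
restored["Time"] = pd.to_datetime(restored["Time"], utc=True)
print(restored.dtypes)
```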

In [13]:
# The Twitter API only allows access to the past few weeks of tweets; it does not allow you to go back any further.

### Part 2: Analyze the Data

1. Social media data analysis
2. Data visualization
3. Sentiment analysis
4. Text mining

### Listing the top 10 most retweeted tweets

In [14]:
# Most retweeted tweets (nlargest(19) keeps 19 rows; pass 10 for a strict top 10)
most_retweeted = df.loc[df.Retweets.nlargest(19).index]
print(most_retweeted)

                        Time  Retweets     User  \
16 2022-04-26 13:00:01+00:00        37  Reuters   
15 2022-04-26 13:01:10+00:00        36  Reuters   
6  2022-04-26 13:29:04+00:00        35  Reuters   
17 2022-04-26 12:56:37+00:00        32  Reuters   
8  2022-04-26 13:20:19+00:00        29  Reuters   
4  2022-04-26 13:30:10+00:00        21  Reuters   
12 2022-04-26 13:10:07+00:00        18  Reuters   
0  2022-04-26 13:40:19+00:00        13  Reuters   
1  2022-04-26 13:40:00+00:00        11  Reuters   
14 2022-04-26 13:05:19+00:00        11  Reuters   
5  2022-04-26 13:30:00+00:00        10  Reuters   
9  2022-04-26 13:15:21+00:00        10  Reuters   
13 2022-04-26 13:10:00+00:00        10  Reuters   
2  2022-04-26 13:35:20+00:00         9  Reuters   
10 2022-04-26 13:11:57+00:00         8  Reuters   
11 2022-04-26 13:10:19+00:00         8  Reuters   
3  2022-04-26 13:30:22+00:00         7  Reuters   
18 2022-04-26 12:55:21+00:00         7  Reuters   
19 2022-04-26 12:50:18+00:00   

In [15]:
most_retweeted.reset_index(drop=True)

|   | Time | Retweets | User | Tweet |
|---|------|----------|------|-------|
| 0 | 2022-04-26 13:00:01+00:00 | 37 | Reuters | Today, April 26, marks the 36th anniversary of... |
| 1 | 2022-04-26 13:01:10+00:00 | 36 | Reuters | Russia's 'victory' in Mariupol turns city's dr... |
| 2 | 2022-04-26 13:29:04+00:00 | 35 | Reuters | Moscow accused NATO of engaging in a proxy bat... |
| 3 | 2022-04-26 12:56:37+00:00 | 32 | Reuters | Health authorities around the world are invest... |
| 4 | 2022-04-26 13:20:19+00:00 | 29 | Reuters | Karachi blast kills three Chinese, including C... |
| 5 | 2022-04-26 13:30:10+00:00 | 21 | Reuters | North Korea will speed up development of its n... |
| 6 | 2022-04-26 13:10:07+00:00 | 18 | Reuters | A previously unreported account of a 36-hour c... |
| 7 | 2022-04-26 13:40:19+00:00 | 13 | Reuters | Second Ebola patient dies in northwestern Cong... |
| 8 | 2022-04-26 13:40:00+00:00 | 11 | Reuters | Russia accuses NATO of creating a serious risk... |
| 9 | 2022-04-26 13:05:19+00:00 | 11 | Reuters | Beijing to test 20 mln for COVID in bid to ave... |


In [16]:
# Save the most-retweeted DataFrame to a CSV file
most_retweeted.to_csv("most_retweeted.csv")
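
To get exactly ten rows, as the heading of this part describes, the row count passed to `nlargest` can be set to 10. A sketch on a made-up DataFrame (the retweet counts and tweet texts are illustrative, not the notebook's data), using `DataFrame.nlargest`, which orders by the named column and keeps the first `n` rows:

```python
import pandas as pd

# Twelve made-up tweets with illustrative retweet counts
sample = pd.DataFrame({
    "Retweets": [13, 11, 9, 7, 21, 10, 35, 6, 29, 10, 8, 8],
    "Tweet": [f"example tweet {i}" for i in range(12)],
})

# Keep the ten rows with the highest retweet counts, highest first.
# reset_index returns a new DataFrame, so the result has to be assigned.
top10 = sample.nlargest(10, "Retweets").reset_index(drop=True)
print(top10)
```

Assigning the result matters here: calling `reset_index(drop=True)` without assignment only displays a renumbered copy and leaves the original index in place when the DataFrame is saved.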

## Text Mining

### Obtaining tweets from a specific user account: IRCC (@CitImmCanada)

In [17]:
# Obtain tweets from a specific user account,
# in this case IRCC (@CitImmCanada)

user = "CitImmCanada"
limit = 300  # the API returns at most 200 tweets per request, so Cursor pages through the results

tweets_ircc = tweepy.Cursor(api.user_timeline, screen_name=user, count=200, tweet_mode="extended").items(limit)

#tweets_ircc = api.user_timeline(screen_name=user, count=limit, tweet_mode="extended")

#Creating a DataFrame to store the tweets by IRCC
columns1 = ["User", "Tweet"]
data = []

for tweet in tweets_ircc:
    data.append([tweet.user.screen_name, tweet.full_text])
    #print(tweet.full_text)
    
ircc_df = pd.DataFrame(data,columns=columns1)
print(ircc_df)
#print(ircc_df.info())

             User                                              Tweet
0    CitImmCanada  @DenysProd 2/2 For case-specific information, ...
1    CitImmCanada  @DenysProd 1/2 Hi. CUAET electronic visa appli...
2    CitImmCanada  @eireenien 2/2 For case-specific information, ...
3    CitImmCanada  @eireenien 1/2 Hi. CUAET electronic visa appli...
4    CitImmCanada  Our online services were recently unavailable....
..            ...                                                ...
295  CitImmCanada  @Nazir64913160 1/2 Hi. Please check our websit...
296  CitImmCanada  Canada welcomes French-speaking skilled worker...
297  CitImmCanada  The first day of spring marks the beginning of...
298  CitImmCanada  #DYK? There are 14 Welcoming Francophone Commu...
299  CitImmCanada  #WelcomeAfghans: Today is the International Da...

[300 rows x 2 columns]


In [18]:
# Save the IRCC tweets to a CSV file
ircc_df.to_csv("ircc_df.csv")
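
Many of the IRCC tweets printed above are replies that begin with an `@` mention. For text mining it may help to separate those from original posts. A minimal sketch on made-up rows, assuming a reply can be recognized by a leading `@` (a heuristic based on the tweet text, not something the API guarantees):

```python
import pandas as pd

# Made-up rows shaped like ircc_df; handles and texts are illustrative
sample = pd.DataFrame({
    "User": ["CitImmCanada"] * 4,
    "Tweet": [
        "@someuser 1/2 Hi. Please check our website.",
        "Our online services were recently unavailable.",
        "@anotheruser 2/2 For case-specific information, contact us.",
        "#DYK? An example announcement.",
    ],
})

# Boolean mask: True where the tweet text starts with a mention
is_reply = sample["Tweet"].str.startswith("@")

replies = sample[is_reply]
original_posts = sample[~is_reply]
print(len(replies), "replies,", len(original_posts), "original posts")
```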

### Sentiment analysis
Reference: https://www.youtube.com/watch?v=uPKnSq6TaAk

Focus on analyzing:
1. Most retweeted words
2. IRCC tweets
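
As a starting point for the word-level analysis, word frequencies can be counted with the standard library alone. A rough sketch, assuming a hypothetical `tokenize` helper that lowercases the text and strips links and mentions; the stop-word list is a tiny illustrative one and the sample tweets and URL are made up:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "our"}


def tokenize(text):
    """Lowercase a tweet and return its words, without links, mentions, or stop words."""
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


tweets = [
    "Canada welcomes French-speaking skilled workers https://t.co/example",
    "@someuser Canada welcomes newcomers",
]

# Count every remaining word across all tweets
word_counts = Counter(word for tweet in tweets for word in tokenize(tweet))
print(word_counts.most_common(3))
```

The same counter could be fed with the `Tweet` column of `most_retweeted` or `ircc_df` once those DataFrames are loaded.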