# Social Media Analysis through the Twitter API

The detailed video course is available at https://codingwithmax.teachable.com/courses/. The course name is "Conquer Social Media through the Twitter API".

Note: the code is written in Python 3.
Remember to run it step by step, as the tutorial is one continuous flow.
In every step some of the code is changed. To track the changed code, look for a comment such as "###changes###" and then follow the code below that line.

This is a basic level tutorial for those who want to do some analysis on Twitter data. It gives you a practical idea of how you can use different APIs to extract your data from Twitter.
This tutorial covers the lectures below:

1) Sending the first request to the Twitter API

2) Getting tweets from a specific time

3) Getting all tweets by moving backward in time

4) Filtering for English tweets and picking keywords

5) Identifying relevant tweets and keeping track of data

6) Plotting the mined data from Twitter

7) Adjusting the maximum time and adding ticks to the graph

8) Streaming live Twitter data

The libraries required in this tutorial can be found at the links below.

http://docs.python-requests.org/en/master/

https://github.com/requests/requests-oauthlib

https://github.com/bear/python-twitter

To know more about installing Python packages, see the link below.

https://packaging.python.org/tutorials/installing-packages/

### Sending first request to Twitter API
First we import the required libraries. Then we build the URL that carries our query to the server. Authentication is required so that the server accepts the query and sends something back. Four credentials are needed for authentication: a consumer key, a consumer secret, an access token and an access token secret. You get them by logging in to your Twitter account and creating an app at https://apps.twitter.com/.
After getting these credentials we create our authentication variable with the help of the requests_oauthlib library. Both the URL and the authentication are passed to the server in the request. If everything goes well, the server responds with the information requested by the query in the URL.


In [None]:
#requests allows us to send HTTP messages to the Twitter API
import requests

#requests_oauthlib allows us to send the authentication with the request
import requests_oauthlib as ra

#in the query variable we set what we want to search for on the server
query="earthquake"
#looking for the tweets containing the word stored in query
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query)
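
The URL above is built by plain string concatenation, which works for a single word like "earthquake". As a minimal sketch, assuming you may later search for phrases or hashtags, the standard library can percent-encode the query for you. The helper name `build_search_url` is made up for this illustration and is not part of the course code.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

def build_search_url(query, **params):
    """Return the search URL with the query and any extra parameters percent-encoded."""
    # build_search_url is a hypothetical helper, not part of the original tutorial
    all_params = {"q": query}
    all_params.update(params)
    return SEARCH_URL + "?" + urlencode(all_params)

# a single word stays as it is
print(build_search_url("earthquake"))
# spaces and "#" are encoded so that the server reads them as part of the query
print(build_search_url("#taiwan quake", count=1))
```

The resulting string can be passed to `requests.get` exactly like the hand-built `URL` variable.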

In [None]:
#we get these credentials when we create an app on twitter
twitter_consumer_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXF"
twitter_consumer_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

twitter_access_token="XXXXXXX_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
twitter_access_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"


#setting our authentication to pass it with the request
authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
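
Writing the four credentials directly into the notebook makes it easy to share them by accident. The sketch below shows one possible alternative, assuming the credentials are stored in environment variables. The variable names and the helper `load_credentials` are hypothetical choices for this example.

```python
import os

# hypothetical environment variable names; any names can be used
CREDENTIAL_NAMES = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
]

def load_credentials(environ=os.environ):
    """Return the four credentials as a tuple, or raise if one is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_NAMES)

# demonstration with a plain dictionary standing in for the real environment
demo_environ = {name: "XXXX" for name in CREDENTIAL_NAMES}
print(len(load_credentials(demo_environ)))
```

With real environment variables set, the tuple can be unpacked into the same call used above: `authen = ra.OAuth1(*load_credentials())`.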



The requests library allows us to send HTTP requests. We send a GET request which contains the URL and the authentication.
Here we are sending the query "earthquake".
After the request is sent, Twitter gives us back the latest tweets containing the keyword "earthquake".
If the print statement shows response 200, everything is working. If the response is 401, something is wrong with the authentication keys. In general, a response in the 400 range means something is wrong on your side and a response in the 500 range means something is wrong on the server side. To know more about the response codes, see this link:
http://www.restapitutorial.com/httpstatuscodes.html


In [None]:
backresponse=requests.get(URL,auth=authen)

print(backresponse)
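
The rules of thumb about response codes can be written down as a small helper. This is only a sketch: the function name `describe_status` and the wording of the messages are made up for this example. The code 429 is the standard HTTP code for "Too Many Requests".

```python
def describe_status(status_code):
    """Return a short, human readable explanation of an HTTP status code."""
    # describe_status is a hypothetical helper, not part of the requests library
    if status_code == 200:
        return "OK: the request worked"
    if status_code == 401:
        return "Unauthorized: check the authentication keys"
    if status_code == 429:
        return "Too many requests: wait before sending more"
    if 400 <= status_code < 500:
        return "Client error: something is wrong on your side"
    if 500 <= status_code < 600:
        return "Server error: something is wrong on the server side"
    return "Other status code"

for code in (200, 401, 404, 503):
    print(code, describe_status(code))
```

In the notebook it could be called as `describe_status(backresponse.status_code)`, since every response object returned by requests carries a `status_code` attribute.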

## Dealing with Data Received in Response
Here we discuss what kind of data we get back in the response and how we can parse it to get the data we need.

The response we get is in JSON format. Let's print it and have a look at it.
It contains a lot of information, including the tweet text, time, profile information, hashtags used, retweet information etc.
The data is in the form of nested dictionaries and arrays.

In [None]:
print(backresponse.json())
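
The raw printout is hard to read. As a rough illustration, the sketch below pretty-prints a heavily trimmed, made-up response with the standard json module. The sample values are invented, and real responses carry many more keys than shown here.

```python
import json

# hypothetical, heavily trimmed response; the values are invented for illustration
sample_response = {
    "statuses": [
        {
            "created_at": "Thu Feb 08 15:59:00 +0000 2018",
            "id": 961335265041272832,
            "id_str": "961335265041272832",
            "text": "Example tweet about an #earthquake",
            "lang": "en",
            "entities": {"hashtags": [{"text": "earthquake"}]},
        }
    ],
    "search_metadata": {"count": 1, "query": "earthquake"},
}

# indent=2 puts every nesting level on its own indented line
pretty = json.dumps(sample_response, indent=2, sort_keys=True)
print(pretty)
```

The same call works on real data: `print(json.dumps(backresponse.json(), indent=2))`.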

### Number of tweets to retrieve
Keep in mind that we can extract up to 100 tweets in a single GET request. The data above may contain information about multiple tweets. Let's retrieve only one tweet so that we can further study what type of information it contains. To tell the server how many tweets we want to retrieve in a single request, we send the parameter "count" in our URL.

In [None]:
query="earthquake"

####changes#####

#the count variable controls the number of tweets we want to retrieve, here we set it to 1.
count="1"
#looking for a single tweet containing the word stored in query
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&count="+count)

###below is the same code which we have earlier used#####

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)

print(backresponse.json())

### Extracting tweet text and other information
Now we go deeper into the dictionary and extract the text of the tweet. We can extract any type of information related to this tweet. Here we extract the tweet text, the hashtags and the id of the tweet.


In [None]:
#######below is the same code which we have earlier used#####

query="earthquake"
count="1"
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&count="+count)

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()

###changes###

###getting Tweet####
#we get the array against the key "statuses", then in that array we take the dictionary at index zero and then
#in that dictionary we look up the value against the key "text".
tweet = tweetInformation["statuses"][0]["text"]
print(tweet)

####Getting HashTags#####
#to get a hashtag we again take the dictionary at index zero of the "statuses" array, look up the key "entities" in it,
#go one level deeper to the array against the key "hashtags", and finally read the key "text" of the first item
#in that array
hashtagsArray= tweetInformation["statuses"][0]["entities"]["hashtags"]
#to confirm whether there are hashtags or not we check whether the array contains something or is empty
if (len(hashtagsArray)>0):
    hashtags=hashtagsArray[0]["text"]
    print(hashtags)
else:
    print("No hashtags found")
    
### Getting Tweet id####
#every tweet has its own unique id; we call the variable tweet_id so that it does not shadow Python's built-in id()
tweet_id=tweetInformation["statuses"][0]["id"]

print(tweet_id)
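
The lookups above can be bundled into one function that also copes with tweets that have no hashtags. This is a minimal sketch on a made-up tweet dictionary; the helper name `extract_tweet_fields` and the sample values are assumptions for illustration.

```python
def extract_tweet_fields(status):
    """Return the text, all hashtags and the id of one tweet dictionary."""
    # .get() with a default avoids an error when "entities" or "hashtags" is absent
    hashtags = [tag["text"] for tag in status.get("entities", {}).get("hashtags", [])]
    return {"text": status["text"], "hashtags": hashtags, "id": status["id_str"]}

# hypothetical tweet dictionary with only the keys used in this tutorial
sample_status = {
    "id_str": "961335265041272832",
    "text": "Example tweet #earthquake #taiwan",
    "entities": {"hashtags": [{"text": "earthquake"}, {"text": "taiwan"}]},
}

print(extract_tweet_fields(sample_status))
```

On real data the function would be called as `extract_tweet_fields(tweetInformation["statuses"][0])`.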

### Tweet between specific interval of time
We insert "since" and "until" in the URL to get the tweets of a specific time interval.

In [None]:

#######below is the same code which we have earlier used#####
query="earthquake"
count="1"

####changes######

#in the since variable we store the date from which we want to start
since="2018-02-05"
#in the until variable we store the ending date, i.e. the date until which we want to extract
until="2018-02-08"
#looking for tweets between the specified interval
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&until="+until+"&count="+count)

#######below is the same code which we have earlier used#####

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()
print(tweetInformation)
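
The dates are passed as text in the form YYYY-MM-DD. As a small sketch, assuming you prefer to compute the interval instead of typing it, the standard datetime module can produce both strings. The helper name `date_window` is made up for this example.

```python
from datetime import date, timedelta

def date_window(end_day, days_back):
    """Return (since, until) strings in YYYY-MM-DD form for an interval ending at end_day."""
    start_day = end_day - timedelta(days=days_back)
    # isoformat() of a date object gives exactly the YYYY-MM-DD layout
    return start_day.isoformat(), end_day.isoformat()

# the same interval as in the cell above, computed instead of typed
since, until = date_window(date(2018, 2, 8), 3)
print(since, until)
```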


### Getting tweets up to a specific tweet id
We have already seen that every tweet has its own unique id. We can use it to get only the tweets at or before a given tweet. To do so we change the URL we send in our request and include the max_id parameter in it.

In [None]:

#######below is the same code which we have earlier used#####
query="earthquake"
count="100"
since="2018-02-05"

####changes####

#in max_id we are storing id of the ending tweet
max_id='961335265041272832'
#Looking for tweets between the starting time and maximum tweet id
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)

#######below is the same code which we have earlier used#####

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()
print(tweetInformation)


### Getting all tweets by moving backward in time
As we have seen above, we can extract only 100 tweets in one request. Now we want to change our URL dynamically so that we can extract all the tweets in a time frame. We take the id of the last tweet in each bundle of 100 tweets and send a new request to the server with this id as max_id. In that way we move backward in time and extract up to 100 tweets each time. The while loop goes on until a response contains no tweets. Inside the while loop there is a for loop, which unpacks the bundle and extracts each of the tweets from the JSON response.

In [None]:
###below is the same code in earlier step###
query="earthquake"
count="100"
since="2018-02-05"
max_id='961335265041272832'
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()


###changes###

## the loop goes on until a response contains no tweets
while (len(tweetInformation.get("statuses",[]))!=0):
    #getting the array of tweets against the key "statuses" from the json response
    alltweets=tweetInformation["statuses"]
    #as we have set count to 100, this loop goes through all the tweets returned by a single get request
    for eachtweet in range(len(alltweets)):
        #accessing tweets same as above 
        tweet = alltweets[eachtweet]["text"]
        print(tweet)
        #accessing id
        max_id=alltweets[eachtweet]["id_str"]
        print(max_id)
    
    #printing the date and time of the last tweet in every bundle of up to 100 tweets
    print(alltweets[eachtweet]["created_at"])    
    #max_id is inclusive, so we subtract 1 to avoid getting the last tweet again
    max_id=str(int(max_id)-1)
    #dynamically changing our URL
    URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)
    #the try and except block ends the loop cleanly if the request fails or the response is not valid json
    try:
        #sending get request with new id
        backresponse=requests.get(URL,auth=authen)
        #getting the new json data of the new request
        tweetInformation=backresponse.json()
    except (requests.RequestException, ValueError):
        break
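
The paging idea can be tried without any network access. The sketch below replaces the real request with a fake function that serves ten invented tweets, at most four per page, and treats max_id as inclusive in the same way as the search endpoint. The names `collect_all` and `fake_fetch_page` are hypothetical.

```python
def collect_all(fetch_page, max_id):
    """Collect tweets page by page while moving backward in time."""
    collected = []
    while True:
        statuses = fetch_page(max_id).get("statuses", [])
        # an empty page means there is nothing older left
        if not statuses:
            break
        collected.extend(statuses)
        # max_id is inclusive, so continue just below the smallest id seen on this page
        max_id = str(min(int(s["id_str"]) for s in statuses) - 1)
    return collected

# stand-in data for the network call: ten fake tweets, newest (highest id) first
FAKE_TWEETS = [{"id_str": str(i), "text": "tweet %d" % i} for i in range(10, 0, -1)]

def fake_fetch_page(max_id, page_size=4):
    """Return up to page_size fake tweets whose id is at most max_id."""
    older = [t for t in FAKE_TWEETS if int(t["id_str"]) <= int(max_id)]
    return {"statuses": older[:page_size]}

tweets = collect_all(fake_fetch_page, "10")
print(len(tweets))
```

Passing the page-fetching function as an argument keeps the loop independent of the network, so the same loop could later be given a function that calls `requests.get`.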
        

### Filtering language specific tweets and picking out keywords
The JSON data returned to us contains language information. The key used for the language is "lang", so we can filter tweets of a specific language. Below we keep only the English tweets. Moreover, we also look for the keyword "Taiwan" in each tweet. An earthquake occurred in Taiwan on 6 Feb 2018, and we want to cover the data about it.

In [None]:
###below is the same code in earlier step###
query="earthquake"
count="100"
since="2018-02-05"
max_id='961335265041272832'
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()

####changes####
#we will be looking for tweets mentioning taiwan, so basically we want to find all those tweets in which the earthquake of taiwan is mentioned
keywords=["Taiwan","taiwan"]


###below is the same code in earlier step###
while (len(tweetInformation.get("statuses",[]))!=0):
    alltweets=tweetInformation["statuses"]
    for eachtweet in range(len(alltweets)):
        tweet = alltweets[eachtweet]["text"]
        #print(tweet)
        max_id=alltweets[eachtweet]["id_str"]
        #print(max_id)
        
        
        ###changes###
        
        #getting language information
        lang=alltweets[eachtweet]["lang"]
        #keeping only tweets whose language is english
        if (lang=="en"):
            #checking all the keywords
            for word in keywords:
                #keeping only those tweets containing taiwan.
                if word in tweet:
                    print("Filtered tweet : " +tweet)
                    #break out of the keyword loop as a keyword is found
                    break
                    
            
        
    
    ###below is the same code in earlier step###
    
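    print(alltweets[eachtweet]["created_at"])    
    #max_id is inclusive, so we subtract 1 to avoid getting the last tweet again
    max_id=str(int(max_id)-1)
    URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)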
    try:
        backresponse=requests.get(URL,auth=authen)
        tweetInformation=backresponse.json()
    except:
        break
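
The keyword list above contains both "Taiwan" and "taiwan", but a spelling such as "TAIWAN" would still be missed. One possible alternative, sketched here with a made-up helper name `mentions_keyword`, is to compare everything in lower case.

```python
def mentions_keyword(text, keywords):
    """Return True if any keyword occurs in the text, ignoring upper and lower case."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in keywords)

# invented example texts
print(mentions_keyword("Strong quake hits TAIWAN", ["taiwan"]))
print(mentions_keyword("Earthquake reported in Japan", ["taiwan"]))
```

Note that this is still a plain substring check, so a word like "Taiwanese" also counts as a match.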
        

### Identifying relevant tweets and keeping track of data
Here we want to know how many tweets, and which tweets, were posted in a certain hour. For that we make three variables to keep track of these. The first variable "tweet_count_hour" is a dictionary of the number of tweets in each hour. The second variable "tweet_hour" is a dictionary of all the tweets in each hour. The third variable "all_hour" is an array of all the hours in which tweets occurred.
As we want to keep track of the data of every hour, we extract the day and hour from the time stamp of every tweet.

In [None]:
###below is the same code in earlier step###
query="earthquake"
count="100"
since="2018-02-05"
max_id='961335265041272832'
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()

####changes####
#"tweet_count_hour" is a dictionary of the number of tweets in each hour
tweet_count_hour={}
#"tweet_hour" is a dictionary of all tweets in each hour
tweet_hour={}
#"all_hour" is an array of all the hours in which tweets occurred
all_hour=[]

###below is the same code in earlier step###
keywords=["Taiwan","taiwan"]

while (len(tweetInformation)!=0):
    alltweets=tweetInformation["statuses"]
    for eachtweet in range(len(alltweets)):
        tweet = alltweets[eachtweet]["text"]
        #print(tweet)
        max_id=alltweets[eachtweet]["id_str"]
        #print(max_id)
        
        ###changes###
        
        #the time stamp starts like "Thu Feb 08 15:59:00". we don't need the weekday, month, minute and second information, so we skip that.
        #only the indices [8:13] are taken, which hold the day and hour, here "08 15".
        currentTime=alltweets[eachtweet]["created_at"][8:13]
        
        
        ###below is the same code in earlier step###
        
        lang=alltweets[eachtweet]["lang"]
        if (lang=="en"):
            for word in keywords:
                if word in tweet:
                    #print("Filtered tweet : " +tweet)
                               
                    ###changes###
                    
                    #if the current time is already in tweet_count_hour we increment the value against that time, meaning another
                    #tweet was posted in that hour. in this way we group tweets by hour
                    if currentTime in tweet_count_hour:
                        tweet_count_hour[currentTime]+=1 
                        #a tweet is already stored against this hour key, so we append the new tweet
                        #to that array.
                        tweet_hour[currentTime].append(tweet)
                    #if the current time is not in tweet_count_hour, that hour is not in the dictionary yet, so we add it
                    else:
                        tweet_count_hour[currentTime]=1
                        #here we save the first tweet of that hour. the tweet is saved inside an array because more
                        #than one tweet may occur in that hour.
                        tweet_hour[currentTime]=[tweet]
                        all_hour.append(currentTime)
                    #break out of the keyword loop as a keyword is found
                    break
                    
    #here we print all three variables to see what is in them
    print(tweet_hour)
    print(tweet_count_hour)
    print(all_hour)
    #this break stops the loop after the first bundle of up to 100 tweets, so that we can look at the data
    break
    
            
        
    
    ###below is the same code in earlier step###
    
    print(alltweets[eachtweet]["created_at"])    
    URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)
    try:
        backresponse=requests.get(URL,auth=authen)
        tweetInformation=backresponse.json()
        alltweets=tweetInformation["statuses"]
    except:
        break
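
Slicing the time stamp by position works as long as the layout never changes. As an alternative sketch, the time stamp can be parsed with the standard datetime module and the tweets counted with a Counter. This assumes the full "created_at" value looks like "Thu Feb 08 15:59:00 +0000 2018"; the sample time stamps and the helper name `hour_key` are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# assumed layout of "created_at": weekday, month, day, time, UTC offset, year
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def hour_key(created_at):
    """Return the day and hour of a tweet as a string like "08 15"."""
    parsed = datetime.strptime(created_at, CREATED_AT_FORMAT)
    return parsed.strftime("%d %H")

# invented time stamps standing in for real tweets
sample_times = [
    "Thu Feb 08 15:59:00 +0000 2018",
    "Thu Feb 08 15:02:10 +0000 2018",
    "Wed Feb 07 23:45:30 +0000 2018",
]

# Counter does the "increment or create" bookkeeping of tweet_count_hour for us
tweets_per_hour = Counter(hour_key(t) for t in sample_times)
print(tweets_per_hour)
```

The key produced by `hour_key` has the same form as the `[8:13]` slice, so both approaches can be mixed.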
        

### Plotting the number of tweets against a counter
On Feb 6, at 11:50 PM, an earthquake occurred in Taiwan. We want to draw a graph of how many tweets about that earthquake were posted on these days. For plotting we need to import matplotlib.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

###below is the same code in earlier step###
query="earthquake"
count="100"
since="2018-02-04"
max_id='961335265041272832'
URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)

authen= ra.OAuth1(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
backresponse=requests.get(URL,auth=authen)
tweetInformation=backresponse.json()

tweet_count_hour={}
tweet_hour={}
all_hour=[]
keywords=["Taiwan","taiwan"]
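while (len(tweetInformation)!=0):
    #the try and except stops the loop if the response carries no "statuses" key, e.g. when the rate limit is hit
    try:
        alltweets=tweetInformation["statuses"]
    except KeyError:
        print ("No data retrieved. Run again after 15 minutes")
        break
    #an empty array of tweets means we have reached the end of the results
    if (len(alltweets)==0):
        break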
        
    for eachtweet in range(len(alltweets)):
        tweet = alltweets[eachtweet]["text"]
        #print(tweet)
        max_id=alltweets[eachtweet]["id_str"]
        #print(max_id)
        currentTime=alltweets[eachtweet]["created_at"][8:13]
        
        lang=alltweets[eachtweet]["lang"]
        if (lang=="en"):
            for word in keywords:
                if word in tweet:
                    #print("Filtered tweet : " +tweet)     
                    if currentTime in tweet_count_hour:
                        tweet_count_hour[currentTime]+=1 
                        tweet_hour[currentTime].append(tweet)
                    else:
                        tweet_count_hour[currentTime]=1
                        tweet_hour[currentTime]=[tweet]
                        all_hour.append(currentTime)
                    break
                    
   
    #print(tweet_hour)
    #print(tweet_count_hour)
    #print(all_hour)
    
    #print(alltweets[eachtweet]["created_at"])    
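    #max_id is inclusive, so we subtract 1 to avoid getting the last tweet again
    max_id=str(int(max_id)-1)
    URL= ("https://api.twitter.com/1.1/search/tweets.json?q="+query+"&since="+since+"&max_id="+max_id+"&count="+count)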
    try:
        backresponse=requests.get(URL,auth=authen)
        tweetInformation=backresponse.json()
        alltweets=tweetInformation["statuses"]
    except:
        break
        
###########   Changes   ###############

#this array saves the number of tweets that occurred in each hour
tweet_counts=[]
#these two variables are just for creating the x-axis tick positions from a counter
tickcount=0
x_ax=[]
#this array saves the original time so that we can assign it to the ticks
time_ticks=[]

#all_hour was filled while moving backward in time, so the newest hour comes first
for t in all_hour :
    #appending the value against each hour key 
    tweet_counts.append(tweet_count_hour[t])
    #saving the time label that will overwrite the counter on the x-axis
    time_ticks.append(t)
    #appending counter to generate x-axis ticks
    x_ax.append(tickcount)
    tickcount+=1
    
#setting the figure size
plt.figure(figsize=(15,10))
#plotting the number of tweets in one hour against counter value
plt.plot(x_ax,  tweet_counts,label="Earthquake", c="red")
#changing the x-axis counter ticks into corresponding time 
plt.xticks(x_ax,time_ticks)

#setting a legend for the label window
plt.legend()
#showing the graph
plt.show()
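
Because the hours are collected while moving backward in time, the graph above runs from the newest hour on the left to the oldest on the right, and with many hours the tick labels can overlap. The sketch below prepares the same data in chronological order and keeps only every n-th tick label. It uses plain Python and invented sample counts; the helper name `build_plot_series` is hypothetical.

```python
def build_plot_series(all_hour, tweet_count_hour, tick_step=1):
    """Return x positions, counts, tick positions and tick labels, oldest hour first."""
    # all_hour is stored newest first, so reverse it to get chronological order
    hours = list(reversed(all_hour))
    counts = [tweet_count_hour[h] for h in hours]
    positions = list(range(len(hours)))
    # keep only every tick_step-th label so that the labels do not overlap
    tick_positions = positions[::tick_step]
    tick_labels = hours[::tick_step]
    return positions, counts, tick_positions, tick_labels

# invented sample data in the same shape as the variables built above
sample_hours = ["08 15", "08 14", "08 13", "08 12"]
sample_counts = {"08 15": 3, "08 14": 7, "08 13": 12, "08 12": 5}

positions, counts, tick_positions, tick_labels = build_plot_series(sample_hours, sample_counts, tick_step=2)
print(counts)
print(tick_labels)
```

The results can be handed to the same plotting calls as before: `plt.plot(positions, counts)` followed by `plt.xticks(tick_positions, tick_labels)`.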
        

## Live Data from Twitter
We can get live data from Twitter by using the python-twitter library. The link is already given at the top. To do anything through this library, read its documentation: http://python-twitter.readthedocs.io/en/latest/twitter.html

In [None]:
import twitter
#accessing the api class of twitter library. we pass our credentials for authentication
api= twitter.Api(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
keywords=["Taiwan","taiwan","earthquake","Earthquake"]
#the GetStreamFilter method of api class return public real time tweets. for more information read the api documentation.
for eachtweet in api.GetStreamFilter(track=keywords):
    print(eachtweet)
    break

To get only the tweet text, we read the value against the key "text".

For more information, consult the Twitter API documentation.

In [None]:
import twitter
api= twitter.Api(twitter_consumer_key,twitter_consumer_secret,twitter_access_token,twitter_access_secret)
keywords=["Trump","donald"]
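for eachtweet in api.GetStreamFilter(track=keywords):
    #some stream messages (for example delete notices) carry no "text" key, so we skip them
    if "text" in eachtweet:
        print(eachtweet["text"])
        break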
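
A live stream never ends by itself, and not every message in it is a tweet. As a minimal sketch, the generator below skips messages without a "text" key, and `islice` from the standard library stops after a fixed number of tweets. A made-up list of dictionaries stands in for the live stream here, and the name `tweet_texts` is hypothetical.

```python
from itertools import islice

def tweet_texts(stream):
    """Yield the text of every stream message that actually is a tweet."""
    for message in stream:
        # messages such as delete or limit notices carry no "text" key
        if "text" in message:
            yield message["text"]

# invented messages standing in for the live stream
fake_stream = [
    {"delete": {"status": {"id": 1}}},
    {"text": "first tweet"},
    {"limit": {"track": 1}},
    {"text": "second tweet"},
    {"text": "third tweet"},
]

# islice stops after two tweets, so the loop over the stream does not run forever
first_two = list(islice(tweet_texts(fake_stream), 2))
print(first_two)
```

With the real library the same generator could wrap the stream: `islice(tweet_texts(api.GetStreamFilter(track=keywords)), 2)`.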