## Twitter data analysis

Two projects are built using the Twitter API:
1. Extracting trending hashtags and their volume based on location. Also, visualising the appearance of trending hashtags at different locations.

2. Extracting and visualising the locations where a specific keyword or hashtag is trending, to capture the popularity or awareness of a specific thing (e.g. a product, policy, agenda or technology).

### I. Import libraries

In [None]:
pip install tweepy

In [6]:
import tweepy

In [3]:
import pandas as pd
import numpy as np
from pandas import DataFrame,Series 
import matplotlib.pyplot as plt
import seaborn as sns

### II. Connect with Twitter
#### Enter the following credentials:
 - access_token="enter value here"
 - access_token_secret="enter value here"
 - api_key="enter value here"
 - api_key_secret="enter value here"

In [4]:
access_token=""
access_token_secret=""
api_key=""
api_key_secret=""

In [7]:
auth=tweepy.OAuthHandler(consumer_key=api_key,consumer_secret=api_key_secret)
auth.set_access_token(access_token,access_token_secret)
api=tweepy.API(auth)
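
As an alternative to pasting the four values into the notebook, they could be read from environment variables. The following is a minimal sketch, not part of the original notebook; the variable names (`TWITTER_API_KEY` and so on) and the helper `load_credentials` are hypothetical and only used for illustration.

```python
import os

# Hypothetical environment variable names (assumptions, not defined by Twitter or tweepy).
CREDENTIAL_VARS = {
    "api_key": "TWITTER_API_KEY",
    "api_key_secret": "TWITTER_API_KEY_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, raising if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in the same way as the variables in the cell above.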

## *Project 1 :*

### 1. Get where on earth id (woeid)

In [118]:
# raw string so the backslash in the path is not treated as an escape sequence
woeid_csv=pd.read_csv(r'...\geoplanet_places_7.10.0.csv',sep='\t')
woeid_country=woeid_csv[woeid_csv['PlaceType']=='Country']

### 2. Get trending hashtags from a specific location
       - api.get_place_trends(woeid) returns the trending topics for a location as parsed JSON
       - Trending hashtags and the number of times each appeared (volume) at a specific location were extracted
       - Trending hashtags of India, USA, France, UK and Australia were extracted
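
To make the extraction step concrete without calling the API, here is a small sketch that turns a trends result into a dataframe. The payload `sample_result` is made-up data that only mimics the fields this notebook reads (`trends`, `name`, `tweet_volume`), and the helper `trends_to_frame` is hypothetical. It also assumes that `tweet_volume` can be missing for some trends, which is why missing values are sorted last.

```python
import pandas as pd

# Made-up sample mimicking the structure read in this notebook:
# a list whose first element holds a "trends" list of dicts.
sample_result = [{
    "trends": [
        {"name": "#example_a", "tweet_volume": 12000},
        {"name": "#example_b", "tweet_volume": None},
        {"name": "#example_c", "tweet_volume": 45000},
    ]
}]


def trends_to_frame(trend_result, country_name, top_n=10):
    """Build a hashtag/country/volume frame from the first top_n trends."""
    trends = trend_result[0]["trends"][:top_n]
    frame = pd.DataFrame({
        "hashtag": [t["name"] for t in trends],
        "country": country_name,
        "volume": [t["tweet_volume"] for t in trends],
    })
    # Highest volume first; trends without a volume go to the end.
    frame = frame.sort_values(by="volume", ascending=False, na_position="last")
    return frame.reset_index(drop=True)


toy_hashtags = trends_to_frame(sample_result, "India")
```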

In [119]:
country_info=woeid_country[woeid_country['Name'].isin(['India','United States','France','United Kingdom','Australia'])]
frames=[]  # one dataframe per country
for x in range(len(country_info)):
   woeid=str(country_info.iloc[x,0])
   country_name=str(country_info.iloc[x,2])
   trend_result=api.get_place_trends(woeid)
   trendlist1=[]
   trendlist2=[]
   # collect name and volume of the first 10 trends
   for trend in trend_result[0]["trends"][:10]:
        ##print(trend["name"])
        ##print(trend["tweet_volume"])
        trendlist1.append(trend["name"])
        trendlist2.append(trend["tweet_volume"])
   # build the per-country dataframe once, after the inner loop has finished
   hashtags=DataFrame(list(zip(trendlist1,trendlist2)),columns=['hashtag','volume'])
   hashtags.sort_values(by='volume',ascending=False,inplace=True)
   hashtags.insert(1,'country',country_name)
   frames.append(hashtags)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
hashtags_all=pd.concat(frames)

### 3. Get common hashtags

In [129]:
duplicate_hashtag=hashtags_all[hashtags_all.duplicated('hashtag',keep = False)]
duplicate_hashtag
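
The `duplicated(..., keep=False)` call marks every row whose hashtag occurs more than once, so a hashtag is kept only if it trends in at least two countries. A toy illustration with made-up data (the frame `toy` is not from the notebook):

```python
import pandas as pd

# Made-up data: "#a" trends in two countries, "#b" and "#c" in one each.
toy = pd.DataFrame({
    "hashtag": ["#a", "#b", "#a", "#c"],
    "country": ["India", "India", "France", "France"],
    "volume": [100, 50, 80, 20],
})

# keep=False flags all occurrences of a repeated hashtag, not just the later ones.
common = toy[toy.duplicated("hashtag", keep=False)]
```

Here `common` holds the two `#a` rows, one for India and one for France.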

### 4. Plot trending common hashtags and their corresponding volume

In [None]:
plt.figure(figsize=(8,5))
plt.xticks(rotation=0)
sns.barplot(data=duplicate_hashtag,x='hashtag',y='volume',hue='country')
plt.xlabel('Hashtag')
plt.ylabel('Volume')
plt.legend( loc='upper right',title='Country')
plt.show()

## *Project 2 :*

### 1. Get the location of the trending keyword 
       - Extract the profile location of the user who posted each tweet
       - This query searches for 100 recent popular tweets that contain the input word

In [125]:
choice=input('enter the keyword:')
tweets=tweepy.Cursor(api.search_tweets,q=choice,result_type='popular').items(100)
tweet_loc=[]
for tweet in tweets:
    # profile location of the tweet's author (free text, may be empty)
    tweet_loc.append(tweet.user.location)
tweet_loc_df=DataFrame(tweet_loc,columns=['location'])


enter the keyword:cancelo


### 2. Get the locations in a dataframe and visualise using bar plot
       - Replace empty location strings with NaN and drop those rows
       - Group the rows of dataframe based on the number of times a keyword has appeared at a specific location 
       - Sort the dataframe based on the tweet count (highest to lowest) and plot top 30 locations using seaborn
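
The counting and sorting steps can also be expressed with `value_counts`, which counts and sorts in one call. This is an alternative sketch on made-up data, not the approach used in the cells below; the frame `toy_loc` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up locations, including empty strings for users without a profile location.
toy_loc = pd.DataFrame({
    "location": ["London", "", "Delhi", "London", "", "Paris", "London", "Delhi"]
})

# Treat empty strings as missing and drop them.
cleaned = toy_loc["location"].replace("", np.nan).dropna()

# Count tweets per location (sorted highest first) and keep the top 30 locations.
top_locations = (
    cleaned.value_counts()
    .head(30)
    .rename_axis("location")
    .reset_index(name="Counts")
)
```

The resulting `top_locations` frame has one row per location with `location` and `Counts` columns, so it can be passed to `sns.barplot` directly.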

In [130]:
# assign the results back; without assignment dropna() has no effect
tweet_loc_df['location']=tweet_loc_df['location'].replace('', np.nan)
tweet_loc_df=tweet_loc_df.dropna()

In [127]:
tweet_loc_df['Counts'] = tweet_loc_df.groupby('location')['location'].transform('count')
# drop repeated locations first so that the slice keeps the top 30 locations, not the top 30 rows
tweet_loc_vol=tweet_loc_df.drop_duplicates().sort_values(by='Counts', ascending=False)
tweet_loc_trend=tweet_loc_vol.iloc[0:30,:]

In [None]:
plt.figure(figsize=(8,4))
plt.xticks(rotation=90)
sns.barplot(x='location',y='Counts',data=tweet_loc_trend,palette ='plasma')
plt.xlabel('Location')
plt.ylabel('Counts')
plt.show()