# Twitter Pulse Checker

![preview](https://cdn.pixabay.com/photo/2013/06/07/09/53/twitter-117595_960_720.png)

This is a quick and dirty way to get a sense of what's trending on Twitter related to a particular Topic. For my use case, I am focusing on the city of Seattle but you can easily apply this to any topic.

**Use the GPU for this notebook to speed things up:** select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE".

The code in this notebook does the following things:


*   Scrapes Tweets related to the Topic you are interested in.
*   Extracts relevant Tags from the text (NER: Named Entity Recognition).
*   Does Sentiment Analysis on those Tweets.
*   Provides some visualizations in an interactive format to get a 'pulse' of what's happening.

We use Tweepy to scrape Twitter data and Flair to do NER / Sentiment Analysis. We use Seaborn for visualizations and all of this is possible because of the wonderful, free and fast (with GPU) Google Colab.

**A bit about NER (Named Entity Recognition)** 

This is the process of extracting labels from text.

So, take an example sentence: 'George Washington went to Washington'. NER will allow us to extract labels such as Person for 'George Washington' and Location for 'Washington (state)'. It is one of the most common and useful applications in NLP and, using it, we can extract labels from Tweets and do analysis on them.

**A bit about Sentiment Analysis** 

Most commonly, this is the process of getting a sense of whether some text is Positive or Negative. More generally, you can apply it to any label of your choosing (Spam/No Spam etc.).

So, 'I hated this movie' would be classified as a negative statement but 'I loved this movie' would be classified as positive. Again - it is a very useful application as it allows us to get a sense of people's opinions about something (Twitter topics, Movie reviews etc). 

To learn more about these applications, check out the Flair Github homepage and Tutorials: https://github.com/zalandoresearch/flair


Note: You will need Twitter API keys (and of course a Twitter account) to make this work. You can get those by signing up here: https://developer.twitter.com/en/apps
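
If you would rather not paste keys directly into a shared notebook, one option is to read them from environment variables. This is a minimal sketch: the variable names `TWITTER_KEY` and `TWITTER_SECRET_KEY` and the helper `load_twitter_credentials` are assumptions made for illustration, not something Twitter or Tweepy requires.

```python
import os


def load_twitter_credentials(env=None):
    """Return (key, secret) read from environment variables.

    The variable names used here are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("TWITTER_KEY", "")
    secret = env.get("TWITTER_SECRET_KEY", "")
    if not key or not secret:
        raise ValueError("Set TWITTER_KEY and TWITTER_SECRET_KEY first")
    return key, secret
```

The returned pair can then be passed to the authentication cell in place of hard-coded strings.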

To get up and running, we need to import a bunch of stuff and install Flair. Run through the next few cells.

In [24]:
# import lots of stuff
import sys
import os
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob


import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from IPython.display import clear_output
from tqdm import tqdm

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS



In [25]:
# install Flair
!pip install --upgrade git+https://github.com/flairNLP/flair.git

clear_output()



In [26]:
# import Flair stuff
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')

clear_output()

In [27]:
#import Flair Classifier
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')

clear_output()

### Authenticate with Twitter API

In [28]:
#@title Enter Twitter Credentials
TWITTER_KEY = 'YOUR_TWITTER_API_KEY' #@param {type:"string"}
TWITTER_SECRET_KEY = 'YOUR_TWITTER_API_SECRET_KEY' #@param {type:"string"}

In [29]:
# Authenticate
auth = tweepy.AppAuthHandler(TWITTER_KEY, TWITTER_SECRET_KEY)

api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)

if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)




In [30]:
user = api.get_user('GillAtkinson11')
user._json


{'id': 1151131792482553856,
 'id_str': '1151131792482553856',
 'name': 'Gill Atkinson',
 'screen_name': 'GillAtkinson11',
 'location': 'Abuja, Nigeria',
 'profile_location': {'id': '00e55e2b4c491c5f',
  'url': 'https://api.twitter.com/1.1/geo/id/00e55e2b4c491c5f.json',
  'place_type': 'unknown',
  'name': 'Abuja, Nigeria',
  'full_name': 'Abuja, Nigeria',
  'country_code': '',
  'country': '',
  'contained_within': [],
  'bounding_box': None,
  'attributes': {}},
 'description': 'British Deputy High Commissioner in Abuja (stuck in 🇬🇧 Still ❤️🇳🇬 and will be back)',
 'url': None,
 'entities': {'description': {'urls': []}},
 'protected': False,
 'followers_count': 1196,
 'friends_count': 612,
 'listed_count': 7,
 'created_at': 'Tue Jul 16 14:09:41 +0000 2019',
 'favourites_count': 6315,
 'utc_offset': None,
 'time_zone': None,
 'geo_enabled': False,
 'verified': False,
 'statuses_count': 1511,
 'lang': None,
 'status': {'created_at': 'Thu Sep 17 15:39:43 +0000 2020',
  'id': 1306618845352

### Let's start scraping!

The Twitter scrape code here was taken from: https://bhaskarvk.github.io/2015/01/how-to-use-twitters-search-rest-api-most-effectively.

My thanks to the author.

We need to provide a Search term and a Max Tweet count. Twitter lets you request up to 45,000 tweets every 15 minutes, so any value below that works.
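
The scraping cell pages backwards through the results: each request asks only for tweets older than the oldest one seen so far. Here is a rough sketch of that idea, using a hypothetical in-memory `fake_search` function in place of the real API call (it is not part of Tweepy, and the IDs are made up).

```python
def fake_search(max_id=None, count=100, newest_id=250):
    """Stand-in for a search call: returns tweet IDs, newest first.

    Hypothetical helper for this sketch; it is not part of Tweepy.
    """
    start = newest_id if max_id is None else min(max_id, newest_id)
    return list(range(start, max(start - count, 0), -1))


def collect_ids(max_tweets, per_query=100):
    """Page backwards through results until max_tweets is reached."""
    collected = []
    max_id = None  # None means "start from the most recent tweet"
    while len(collected) < max_tweets:
        batch = fake_search(max_id=max_id, count=per_query)
        if not batch:
            break  # no older tweets left
        collected.extend(batch)
        # Ask only for tweets strictly older than the oldest one seen.
        max_id = batch[-1] - 1
    return collected


ids = collect_ids(max_tweets=300)
print(len(ids), ids[0], ids[-1])
```

Because whole batches are appended, the loop can return slightly more than `max_tweets`, and it stops early when the results run out.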

In [31]:
#@title Twitter Search API Inputs
#@markdown ### Enter Search Query:
searchQuery = 'Karachi' #@param {type:"string"}
#@markdown ### Enter Max Tweets To Scrape:
#@markdown #### The Twitter API Rate Limit (currently) is 45,000 tweets every 15 minutes.
maxTweets = 100 #@param {type:"slider", min:0, max:45000, step:100}
Filter_Retweets = True #@param {type:"boolean"}

tweetsPerQry = 100  # this is the max the API permits
tweet_lst = []

if Filter_Retweets:
  searchQuery = searchQuery + ' -filter:retweets'  # to exclude retweets

# If results from a specific ID onwards are required, set since_id to that ID.
# Otherwise default to no lower limit and go as far back as the API allows.
sinceId = None

# If only results below a specific ID are required, set max_id to that ID.
# Otherwise default to no upper limit and start from the most recent tweet matching the search query.
max_id = -10000000000

tweetCount = 0
print("Downloading max {0} tweets".format(maxTweets))
while tweetCount < maxTweets:
    try:
        if (max_id <= 0):
            if (not sinceId):
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, lang="en")
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        lang="en", since_id=sinceId)
        else:
            if (not sinceId):
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        lang="en", max_id=str(max_id - 1))
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                        lang="en", max_id=str(max_id - 1),
                                        since_id=sinceId)
        if not new_tweets:
            print("No more tweets found")
            break
        for tweet in new_tweets:
          if hasattr(tweet, 'reply_count'):
            reply_count = tweet.reply_count
          else:
            reply_count = 0
          if hasattr(tweet, 'retweeted'):
            retweeted = tweet.retweeted
          else:
            retweeted = "NA"
            
          # strip the retweet filter (if present) from the query to get the topic
          topic = searchQuery.replace(' -filter:retweets', '').strip().capitalize()
          
          # fixup date
          tweetDate = tweet.created_at.date()

    
          
          tweet_lst.append([tweetDate, topic, 
                      tweet.id, tweet.user.screen_name, tweet.user.name, tweet.user.followers_count, tweet.text, tweet.favorite_count, 
                    reply_count, tweet.retweet_count, retweeted, tweet.user.created_at, tweet.user.verified, tweet.user.location, tweet.user.statuses_count])
        
        tweetCount += len(new_tweets)
        print("Downloaded {0} tweets".format(tweetCount))
        max_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # Just exit if any error
        print("some error : " + str(e))
        break


print("Downloaded {0} tweets".format(tweetCount))



Downloading max 100 tweets
Downloaded 100 tweets
Downloaded 100 tweets


## Data Sciencing

Let's load the tweet data into a Pandas Dataframe so we can do Data Science to it. 

The data is also saved to a CSV file further down in case you want to download it.

In [32]:
pd.set_option('display.max_colwidth', None)

# load it into a pandas dataframe
tweet_df = pd.DataFrame(tweet_lst, columns=['tweet_dt', 'topic', 'id', 'username', 'name', 'followers', 'tweet', 'like_count', 'reply_count', 'retweet_count', 'retweeted', 'account_start_date', 'verified_user', 'location', 'number_of_tweets'])

tweet_df.head(10)

Unnamed: 0,tweet_dt,topic,id,username,name,followers,tweet,like_count,reply_count,retweet_count,retweeted,account_start_date,verified_user,location,number_of_tweets
0,2020-09-17,Karachi,1306739732869795851,UnivalesCom,Univales.com,189,"When will @reportpemra wake up to take strict notice against Nida Yasir, the host of a morning show on ARY? She inv… https://t.co/JeTFRwZ80o",0,0,0,False,2019-10-28 05:04:09,False,,2401
1,2020-09-17,Karachi,1306738129982889990,benishaan195,Baji,12573,"@HarounRashid2 They are also gone from Karachi Saddar, very few you see are cladded in Shalwar kameez",0,0,0,False,2012-03-08 04:47:38,False,We are also humans!!,141878
2,2020-09-17,Karachi,1306737197555482624,Diplomat_APAC,The Diplomat,166243,Recent anti-Shia demonstrations in Karachi raise uncomfortable questions about the possible involvement of the Paki… https://t.co/zm62W6LonL,0,0,0,False,2009-05-14 04:01:35,True,"Tokyo, Japan",83911
3,2020-09-17,Karachi,1306737172729470977,MaqsoodAsi,Maqsood Asi,4124,What Role Does the State Play in #Pakistan ’s Anti-Shia Hysteria?\n\nRecent anti-Shia demonstrations in #Karachi rais… https://t.co/1EwQLlqZbp,0,0,1,False,2013-03-22 21:51:52,False,"Oslo, Norge",253311
4,2020-09-17,Karachi,1306736836358926342,Tweeterist_,Tweeterist,556,#Karachi restaurants boycott delivery service @foodpanda_pk over commission policy https://t.co/so5c75N6sU,0,0,0,False,2018-02-08 11:54:03,False,Far Away,46138
5,2020-09-17,Karachi,1306736541163634688,TariqAKhan1905,Tariq Ali Khan,2860,My comments irritates those who are running on government funding!!! Because that's why they are paid. I can't igno… https://t.co/NWm0QbEjoo,0,0,0,False,2017-12-23 23:05:34,False,,62436
6,2020-09-17,Karachi,1306736316504305664,forster_keith,keith forster,892,@JLcab74 @JOEPUBLIC20171 Khan will win it.\nHe's made London the new Karachi and the appreciative postal voters will… https://t.co/frAM6HQ4Wi,0,0,0,False,2013-11-25 18:29:27,False,Lancashire,11870
7,2020-09-17,Karachi,1306735921908379650,whatsufiawhat,صوفیہ,381,She has learned ✨ Karachi ✨ https://t.co/tiq6z6FRla,0,0,0,False,2019-08-17 14:27:43,False,"in your head, rent free.",2631
8,2020-09-17,Karachi,1306734472918949895,am1halal,papi and 69 others,6,😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔karachi😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔i hate you😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔😔,3,0,0,False,2020-05-31 19:47:26,False,,88
9,2020-09-17,Karachi,1306734325417865217,apniwebcompk,Apni Web News,133,Rescue Operation in Karachi by PAK Army | Headlines 10 AM | 26 August 2020 | Express News | EN1 https://t.co/13oCb97WYF,0,0,0,False,2017-08-12 04:16:50,False,Pakistan,127065


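Unfortunately, Twitter does not let you filter by date when you request tweets. However, we can do this at this stage. By default (when both date fields are left empty), the cell below keeps yesterday's and today's Tweets. Note that a date range containing none of the downloaded tweets leaves an empty dataframe (shape `(0, 15)`), and the later aggregation cells will then fail.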
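
The same filtering logic can be sketched with the standard library alone. The helper names `resolve_range` and `filter_by_date` and the sample rows are made up for illustration; the notebook itself does this on the pandas dataframe.

```python
from datetime import date, timedelta


def resolve_range(start_text, end_text, today=None):
    """Turn two 'YYYY-MM-DD' strings into dates.

    Empty strings fall back to yesterday and today.
    """
    today = today or date.today()
    start = date.fromisoformat(start_text) if start_text else today - timedelta(days=1)
    end = date.fromisoformat(end_text) if end_text else today
    return start, end


def filter_by_date(rows, start, end):
    """Keep (tweet_date, text) rows whose date lies in [start, end]."""
    return [row for row in rows if start <= row[0] <= end]


# Made-up sample rows for illustration only.
rows = [(date(2020, 9, 16), "older"), (date(2020, 9, 17), "newer")]
start, end = resolve_range("", "", today=date(2020, 9, 17))
print(filter_by_date(rows, start, end))
```

Passing a range that contains none of the row dates returns an empty list, which mirrors the empty dataframe case.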

In [33]:
#@title Filter By Date Range
today = datetime.now().date()
yesterday = today - timedelta(1)

start_dt = '2020-09-01' #@param {type:"date"}
end_dt = '2020-09-14' #@param {type:"date"}

if start_dt == '':
  start_dt = yesterday
else:
  start_dt = datetime.strptime(start_dt, '%Y-%m-%d').date()

if end_dt == '':
  end_dt = today
else:
  end_dt = datetime.strptime(end_dt, '%Y-%m-%d').date()


tweet_df = tweet_df[(tweet_df['tweet_dt'] >= start_dt) 
                    & (tweet_df['tweet_dt'] <= end_dt)]
tweet_df.shape


(0, 15)

## NER and Sentiment Analysis

Now let's do some NER / Sentiment Analysis. We will use the Flair library: https://github.com/zalandoresearch/flair

### NER

We extract the Tags and append them as separate rows in our dataframe. This helps us later on to Group by Tags.

We also create a new 'Hashtag' Tag as Flair does not recognize it and it's a big one in this context.
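
As a small illustration of the Hashtag step, the sketch below pulls hashtags out of a tweet with the same regular expression the notebook uses and wraps each one in a dict shaped like the other Tags. The function name `extract_hashtag_tags` is made up for this example.

```python
import re


def extract_hashtag_tags(tweet_text):
    """Return one {'type': 'Hashtag', 'text': ...} dict per hashtag found."""
    return [{"type": "Hashtag", "text": tag}
            for tag in re.findall(r"#\w+", tweet_text)]


print(extract_hashtag_tags("#Karachi restaurants boycott delivery service"))
# prints [{'type': 'Hashtag', 'text': '#Karachi'}]
```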

### Sentiment Analysis

We use the Flair Classifier to get Polarity and Result and add those fields to our dataframe.

**Warning:** This can be slow if you have lots of tweets.
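
The classifier returns a label (POSITIVE or NEGATIVE) together with a confidence score. To compare tweets on one scale, the notebook flips the sign of the score for negative results. A minimal sketch of that step, with `signed_polarity` as a made-up helper name:

```python
def signed_polarity(result, score):
    """Return the score as-is for POSITIVE and negated for NEGATIVE."""
    return -score if result == "NEGATIVE" else score


print(signed_polarity("NEGATIVE", 0.9))  # prints -0.9
print(signed_polarity("POSITIVE", 0.8))  # prints 0.8
```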

In [34]:
# predict NER
nerlst = []

for index, row in tqdm(tweet_df.iterrows(), total=tweet_df.shape[0]):
  cleanedTweet = row['tweet'].replace("#", "")
  sentence = Sentence(cleanedTweet, use_tokenizer=True)
  
  # predict NER tags
  tagger.predict(sentence)

  # get ner
  ners = sentence.to_dict(tag_type='ner')['entities']
  
  # predict sentiment
  classifier.predict(sentence)
  
  label = sentence.labels[0]
  response = {'result': label.value, 'polarity':label.score}
  
  # get hashtags
  hashtags = re.findall(r'#\w+', row['tweet'])
  if len(hashtags) >= 1:
    for hashtag in hashtags:
      ners.append({ 'type': 'Hashtag', 'text': hashtag })
  
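  # negate the score for NEGATIVE tweets so that polarity is signed
  adj_polarity = response['polarity']
  if response['result'] == 'NEGATIVE':
    adj_polarity = response['polarity'] * -1

  for ner in ners:
    # entities without a 'type' key get an empty tag type
    ner.setdefault('type', '')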
    nerlst.append([ row['tweet_dt'], row['topic'], row['id'], row['username'], 
                   row['name'], row['followers'], row['tweet'], ner['type'], ner['text'], response['result'], 
                   response['polarity'], adj_polarity, row['like_count'], row['reply_count'], 
                  row['retweet_count'], row['account_start_date'], row['verified_user'], row['location'], row['number_of_tweets'] ])

clear_output()

Let's load the results into a dataframe. Later on, we filter out obvious tags like #Seattle that would show up for this search; you can comment that portion out or use different Tags for your list.

In [35]:
df_ner = pd.DataFrame(nerlst, columns=['tweet_dt', 'topic', 'id', 'username', 'name', 'followers', 'tweet', 'tag_type', 'tag', 'sentiment', 'polarity', 
                                      'adj_polarity', 'like_count', 'reply_count', 'retweet_count', 'account_start_date', 'verified_user', 'location', 'number_of_tweets'])



In [36]:
def days_before(date):
  if date and date != 0 and date != '0':
      date = str(date)
      date = date.split()[0]
      now = str(datetime.now())
      now = now.split()[0]
      d1 = datetime.strptime(date, "%Y-%m-%d")
      d2 = datetime.strptime(now, "%Y-%m-%d")
      return abs((d2 - d1).days)
  else:
    return ''





In [37]:
#This cell does main data analysis
df_ner_1 = df_ner
#Removes all the duplicate tweets
df_ner_1 = df_ner_1.drop_duplicates(subset = 'id')
df_ner_1 = df_ner_1.reset_index(drop = True)
del df_ner_1['polarity']
df_ner_1['mean polarity'] = df_ner_1['adj_polarity'].mean()
df_ner_1['polarity standard deviation'] = df_ner_1['adj_polarity'].std(axis = 0, ddof = 0)
df_ner_1['average daily tweets'] = 0
df_ner_1 = df_ner_1[['tweet_dt',	'topic',	'id',	'username',	'name',	'followers',	'tweet',	'tag', 'tag_type',	'average daily tweets', 'sentiment', 'adj_polarity',	'mean polarity',	'polarity standard deviation', 'like_count',	'reply_count',	'retweet_count', 'account_start_date', 'verified_user', 'location', 'number_of_tweets']]


df_ner_1['account_start_date'] = df_ner_1['account_start_date'].apply(days_before)
df_ner_1['average daily tweets'] = df_ner_1['number_of_tweets']/df_ner_1['account_start_date']
del df_ner_1['account_start_date']

df_ner_1.rename(columns = {'followers': 'number of followers', 'adj_polarity': 'polarity', 'number_of_tweets': 'total number of tweets'}, inplace = True)
# df_ner_1 = df_ner_1.sort_values(by = 'average daily tweets', ascending=False)
df_ner_1.tail()

Unnamed: 0,tweet_dt,topic,id,username,name,number of followers,tweet,tag,tag_type,average daily tweets,sentiment,polarity,mean polarity,polarity standard deviation,like_count,reply_count,retweet_count,verified_user,location,total number of tweets


In [38]:
#Saves as csv
file_name = 'karachi24_data.csv'
df_ner_1.to_csv(file_name, index = False)

In [39]:
# filter out obvious tags
banned_words = ['Seattle', 'WA', '#Seattle', '#seattle', 'Washington', 'SEATTLE', 'WASHINGTON',
                'seattle', 'Seattle WA', 'seattle wa','Seattle, WA', 'Seattle WA USA', 
                'Seattle, Washington', 'Seattle Washington', 'Wa', 'wa', '#Wa',
               '#wa', '#washington', '#Washington', '#WA', '#PNW', '#pnw', '#northwest']

df_ner = df_ner[~df_ner['tag'].isin(banned_words)]

Calculate Frequency, Likes, Replies, Retweets and Average Polarity per Tag.
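
To make clear what this aggregation computes, here is a plain-Python sketch of the same grouping, without pandas. The function name `summarize_tags` and the sample rows are made up for illustration; the notebook does the real work with `groupby` and `agg`.

```python
from collections import defaultdict


def summarize_tags(rows):
    """Aggregate rows of (tag, tag_type, polarity, likes, replies, retweets)."""
    groups = defaultdict(lambda: {"Frequency": 0, "polarity_sum": 0.0,
                                  "Total_Likes": 0, "Total_Replies": 0,
                                  "Total_Retweets": 0})
    for tag, tag_type, polarity, likes, replies, retweets in rows:
        g = groups[(tag, tag_type)]
        g["Frequency"] += 1
        g["polarity_sum"] += polarity
        g["Total_Likes"] += likes
        g["Total_Replies"] += replies
        g["Total_Retweets"] += retweets
    summary = []
    for (tag, tag_type), g in groups.items():
        summary.append({
            "tag": tag, "tag_type": tag_type,
            "Frequency": g["Frequency"],
            "Avg_Polarity": g["polarity_sum"] / g["Frequency"],
            "Total_Likes": g["Total_Likes"],
            "Total_Replies": g["Total_Replies"],
            "Total_Retweets": g["Total_Retweets"],
        })
    # Most frequent tags first, as in the notebook cell.
    return sorted(summary, key=lambda r: r["Frequency"], reverse=True)


# Made-up sample rows for illustration only.
sample = [
    ("#Karachi", "Hashtag", 0.5, 3, 0, 1),
    ("#Karachi", "Hashtag", -0.5, 1, 1, 0),
    ("Pakistan", "LOC", 0.75, 0, 0, 0),
]
for row in summarize_tags(sample):
    print(row)
```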

In [40]:
ner_groups = df_ner.groupby(['tag', 'tag_type']).agg({'tag': "count", 'adj_polarity': "mean", 'like_count': 'sum', 'reply_count': 'sum', 'retweet_count': 'sum'})
ner_groups = ner_groups.rename(columns={
   "tag": "Frequency",
   "adj_polarity": "Avg_Polarity",
   "like_count": "Total_Likes",
   "reply_count": "Total_Replies",
   "retweet_count": "Total_Retweets"
})
ner_groups = ner_groups.sort_values(['Frequency'], ascending  = False)
ner_groups = ner_groups.reset_index()
ner_groups['Polarity Standard Deviation'] = ner_groups['Avg_Polarity'].std()
ner_groups = ner_groups[['tag',	'tag_type',	'Frequency',	'Avg_Polarity',	'Polarity Standard Deviation', 'Total_Likes',	'Total_Replies', 'Total_Retweets']]
ner_groups.head(10)

DataError: No numeric types to aggregate

Create an overall Sentiment column based on the Average Polarity of the Tag.

In [41]:
ner_groups['Sentiment'] = np.where(ner_groups['Avg_Polarity']>=0, 'POSITIVE', 'NEGATIVE')
ner_groups.head()

NameError: name 'ner_groups' is not defined

Raw Data

In [42]:
df2 = df_ner_1.drop(columns = ['sentiment', 'polarity', 'mean polarity', 'polarity standard deviation'])
df2.head()

Unnamed: 0,tweet_dt,topic,id,username,name,number of followers,tweet,tag,tag_type,average daily tweets,like_count,reply_count,retweet_count,verified_user,location,total number of tweets


## Visualize!

We can get some bar plots for the Tags based on the following metrics:



*   Most Popular Tweets
*   Most Liked Tweets
*   Most Replied Tweets
*   Most Retweeted Tweets

By default, we do the analysis on all the Tags but we can also filter by Tag by checking the Filter_TAG box. 
This way we can further drill down into the metrics for Hashtags, Persons, Locations & Organizations.

We cut the plots by Sentiment i.e. the color of the bars tells us if the overall Sentiment was Positive or Negative.
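
The Tag filter relies on a small naming convention: Hashtags keep their full name, while Person, Location and Organization are stored under the three-letter codes PER, LOC and ORG. A short sketch of that mapping and the filter, where `tag_code`, `filter_by_tag` and the sample groups are made up for illustration:

```python
def tag_code(display_name):
    """Convert a menu label to the tag_type value stored in the dataframe."""
    return display_name if display_name == "Hashtag" else display_name[:3].upper()


def filter_by_tag(groups, display_name):
    """Keep only the groups whose tag_type matches the chosen label."""
    code = tag_code(display_name)
    return [g for g in groups if g["tag_type"] == code]


# Made-up sample groups for illustration only.
groups = [
    {"tag": "#Karachi", "tag_type": "Hashtag"},
    {"tag": "Pakistan", "tag_type": "LOC"},
]
print(filter_by_tag(groups, "Location"))
# prints [{'tag': 'Pakistan', 'tag_type': 'LOC'}]
```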


In [43]:
#@title Visualize Top TAGs
Filter_TAG = False #@param {type:"boolean"}
TAG = 'Location' #@param ["Hashtag", "Person", "Location", "Organization"]
#@markdown ###Pick how many tags to display per chart:
Top_N = 10 #@param {type:"integer"}

# get TAG value
if TAG != 'Hashtag':
  TAG = TAG[:3].upper()

if Filter_TAG:
  filtered_group = ner_groups[(ner_groups['tag_type'] == TAG)]
else:
  filtered_group = ner_groups

# plot the figures
fig = plt.figure(figsize=(20, 16))
fig.subplots_adjust(hspace=0.2, wspace=0.5)

ax1 = fig.add_subplot(321)
sns.barplot(x="Frequency", y="tag", data=filtered_group[:Top_N], hue="Sentiment")
ax2 = fig.add_subplot(322)
filtered_group = filtered_group.sort_values(['Total_Likes'], ascending=False)
sns.barplot(x="Total_Likes", y="tag", data=filtered_group[:Top_N], hue="Sentiment")
ax3 = fig.add_subplot(323)
filtered_group = filtered_group.sort_values(['Total_Replies'], ascending=False)
sns.barplot(x="Total_Replies", y="tag", data=filtered_group[:Top_N], hue="Sentiment")
ax4 = fig.add_subplot(324)
filtered_group = filtered_group.sort_values(['Total_Retweets'], ascending=False)
sns.barplot(x="Total_Retweets", y="tag", data=filtered_group[:Top_N], hue="Sentiment")

ax1.title.set_text('Most Popular')
ax2.title.set_text('Most Liked')
ax3.title.set_text('Most Replied')
ax4.title.set_text('Most Retweeted')

ax1.set_ylabel('')    
ax1.set_xlabel('')
ax2.set_ylabel('')    
ax2.set_xlabel('')
ax3.set_ylabel('')    
ax3.set_xlabel('')
ax4.set_ylabel('')    
ax4.set_xlabel('')

NameError: name 'ner_groups' is not defined

### Get the Average Polarity Distribution

In [44]:
fig = plt.figure(figsize=(12, 6))
sns.distplot(filtered_group['Avg_Polarity'], hist=False, kde_kws={"shade": True})

NameError: name 'filtered_group' is not defined

## Word Cloud

Let's build a Word Cloud based on these metrics. 

Since I am interested in Seattle, I am going to overlay the Seattle city skyline on my Word Cloud.
You can change this by selecting a different Mask option from the drop down.

Images for Masks can be found at:

http://clipart-library.com/clipart/2099977.htm

https://needpix.com
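
The Word Cloud is generated from a dictionary that maps each Tag to the chosen metric. The notebook replaces a metric value of 0 with 1, presumably so that every Tag still carries some weight in the cloud. A minimal sketch of building that dictionary, with `build_frequencies` and the sample groups made up for illustration:

```python
def build_frequencies(groups, metric):
    """Map each tag to its metric value, using 1 in place of 0."""
    return {g["tag"]: (g[metric] if g[metric] != 0 else 1) for g in groups}


# Made-up sample groups for illustration only.
groups = [
    {"tag": "#Karachi", "Total_Likes": 0},
    {"tag": "Pakistan", "Total_Likes": 5},
]
print(build_frequencies(groups, "Total_Likes"))
# prints {'#Karachi': 1, 'Pakistan': 5}
```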

In [45]:
# download mask images
!wget http://clipart-library.com/img/2099977.jpg -O seattle.jpg
!wget https://storage.needpix.com/rsynced_images/trotting-horse-silhouette.jpg -O horse.jpg
!wget https://storage.needpix.com/rsynced_images/black-balloon.jpg -O balloon.jpg
  
clear_output()

In [46]:
#@title Build Word Cloud For Top TAGs
Metric = 'Most Popular' #@param ["Most Popular", "Most Liked", "Most Replied", "Most Retweeted"]
#@markdown
Filter_TAG = False #@param {type:"boolean"}
##@markdown
TAG = 'Location' #@param ["Hashtag", "Person", "Location", "Organization"]
Mask = 'Rectangle' #@param ["Rectangle", "Seattle", "Balloon", "Horse"]

# get correct Metric value
if Metric == 'Most Popular':
   Metric = 'Frequency'
elif Metric == 'Most Liked':
   Metric = 'Total_Likes'
elif Metric == 'Most Replied':
   Metric = 'Total_Replies'
elif Metric == 'Most Retweeted':
   Metric = 'Total_Retweets'    

# get TAG value
if TAG != 'Hashtag':
  TAG = TAG[:3].upper()

if Filter_TAG:
  filtered_group = ner_groups[(ner_groups['tag_type'] == TAG)]
else:
  filtered_group = ner_groups

countDict = {}

for index, row in filtered_group.iterrows():
  if row[Metric] == 0:
    row[Metric] = 1
  countDict.update( {row['tag'] : row[Metric]} )
  
if Mask == 'Seattle':
  Mask = np.array(Image.open("seattle.jpg"))
elif Mask == 'Rectangle':
  Mask = np.array(Image.new('RGB', (800,600), (0, 0, 0)))
elif Mask == 'Horse':
  Mask = np.array(Image.open("horse.jpg"))
elif Mask == 'Balloon':
  Mask = np.array(Image.open("balloon.jpg"))

clear_output()

# Generate Word Cloud
wordcloud = WordCloud(
    max_words=100,
#     max_font_size=50,
    height=300,
    width=800,
    background_color = 'white',
    mask=Mask,
    contour_width=1,
    contour_color='steelblue',
    stopwords = STOPWORDS).generate_from_frequencies(countDict)
fig = plt.figure(
    figsize = (18, 18),
    )
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()

NameError: name 'ner_groups' is not defined