## Big waves of thoughts, globally or locally, in real time

Twitter storms are available for analysis in near real time. This means we can learn about the big waves of thoughts and moods around the world as they arise.
Like any place filled with riches, Twitter has security guards that stop us from laying our hands on the data right away ⛔️ A few (really straightforward) authentication steps are needed before we can call its APIs to collect data. Since our goal today is learning to extract insights from data, security has already given us a green pass ✅ Our data is ready to use in the datasets folder, so we can concentrate on the fun part!

In [None]:
# Loading the json module
import json

# Loading the WW_trends and US_trends data (the with-blocks close the files afterwards)
with open('datasets/WWTrends.json') as f:
    WW_trends = json.load(f)
with open('datasets/USTrends.json') as f:
    US_trends = json.load(f)

# Inspecting the data by printing the WW_trends and US_trends variables
print(WW_trends)
print(US_trends)

In [None]:
# Pretty-printing the results: first the WW trends, then the US trends

print("WW trends:")
print(json.dumps(WW_trends, indent=1))

print("\nUS trends:")
print(json.dumps(US_trends, indent=1))

## 3. Finding common trends

From the output, we can observe that:

* Each file holds a list whose first element stores an array of trend objects under the `trends` key. Each trend object has: the name of the trending topic, the query parameter that can be used to search for the topic on Twitter Search, the search URL, and the volume of tweets for the last 24 hours, if available. (The trends get updated every 5 minutes.)

* At query time, #BeratKandili, #GoodFriday and #WeLoveTheEarth were trending worldwide (WW).

* "tweet_volume" tells us that #WeLoveTheEarth was the most popular of the three.

* There are some trends which are unique to the US.
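To make that structure concrete, here is a minimal sketch built on a hand-made list of trend objects. The field names follow the description above; the topic names and volumes are hypothetical placeholders, not values from the dataset. Because `tweet_volume` can be missing (`null` in the JSON, `None` once loaded), the helper `most_popular` (a name introduced here for illustration) skips those entries before comparing volumes.

```python
# Hypothetical trend objects shaped like the ones described above;
# topic names and volumes are placeholders, not real data.
sample_trends = [
    {"name": "#TopicA", "query": "%23TopicA",
     "url": "http://twitter.com/search?q=%23TopicA", "tweet_volume": 1000},
    {"name": "#TopicB", "query": "%23TopicB",
     "url": "http://twitter.com/search?q=%23TopicB", "tweet_volume": 2000},
    {"name": "#TopicC", "query": "%23TopicC",
     "url": "http://twitter.com/search?q=%23TopicC", "tweet_volume": 3000},
    # tweet_volume is not always available
    {"name": "#TopicD", "query": "%23TopicD",
     "url": "http://twitter.com/search?q=%23TopicD", "tweet_volume": None},
]


def most_popular(trends):
    """Return the trend with the highest known tweet_volume, or None if no volume is known."""
    with_volume = [t for t in trends if t.get("tweet_volume") is not None]
    if not with_volume:
        return None
    return max(with_volume, key=lambda t: t["tweet_volume"])


top = most_popular(sample_trends)
print(top["name"], top["tweet_volume"])
```

On the notebook's data, `most_popular(WW_trends[0]['trends'])` would apply the same logic to the real worldwide trends.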

In [None]:
# Extracting all the WW trend names from WW_trends
world_trends = {trend['name'] for trend in WW_trends[0]['trends']}

# Extracting all the US trend names from US_trends
us_trends = {trend['name'] for trend in US_trends[0]['trends']}

# Getting the intersection of the two sets of trends
common_trends = world_trends.intersection(us_trends)

# Inspecting the data
print(world_trends, "\n")
print(us_trends, "\n")
print(len(common_trends), "common trends:", common_trends)

## 4. Exploring the hot trend

From the intersection, we can see that, out of the two sets of trends (each of size 50), 11 topics overlap. In particular, one common trend sounds very interesting: #WeLoveTheEarth. It is so good to see that Twitteratis are unanimously talking about loving Mother Earth! 💚
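Besides the intersection, the other set operations answer related questions, such as which trends are unique to the US. The sketch below uses small hypothetical sets in place of the notebook's `world_trends` and `us_trends`. With the numbers quoted above, the overlap share would be 11 / 50 = 22%.

```python
# Hypothetical trend-name sets standing in for world_trends and us_trends
world = {"#WeLoveTheEarth", "#GoodFriday", "#BeratKandili", "#TopicA"}
us = {"#WeLoveTheEarth", "#GoodFriday", "#TopicB", "#TopicC"}

common = world & us        # trending in both places (same as world.intersection(us))
us_only = us - world       # trending in the US but not worldwide
world_only = world - us    # trending worldwide but not in the US

# Share of US trends that also trend worldwide
overlap_share = len(common) / len(us)

print(sorted(common))
print(sorted(us_only))
print(f"{overlap_share:.0%} of US trends also trend worldwide")
```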

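We have found a hot trend, #WeLoveTheEarth. Now let's see what story it is screaming to tell us!
If we query Twitter's search API with this hashtag as the query parameter, we get back tweets related to it. That response is stored in the datasets folder as `WeLoveTheEarth.json`, so let's load it and dig into this trend.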

Image source: Official Music Video Cover, https://welovetheearth.org/video/
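A hashtag cannot be placed in a URL as-is, because `#` marks the start of a URL fragment. That is why the `query` field of a trend object holds a percent-encoded form of the topic. The sketch below reproduces the encoding with the standard library. The search-URL shape is an assumption modelled on the `url` field of the trend objects, not a documented endpoint.

```python
from urllib.parse import quote, urlencode

hashtag = "#WeLoveTheEarth"

# '#' is reserved in URLs, so it is percent-encoded as %23
query = quote(hashtag, safe="")
print(query)

# Assumed search-URL shape, modelled on the trend objects' 'url' field
search_url = "http://twitter.com/search?" + urlencode({"q": hashtag})
print(search_url)
```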

In [None]:
# Loading the tweets returned by the search API (same datasets folder as above)
with open('datasets/WeLoveTheEarth.json') as f:
    tweets = json.load(f)

# Inspecting the first two tweets
tweets[0:2]

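## Digging deeper

Printing the first two tweet items shows that there's a lot more to a tweet than what we normally think of as a tweet. Besides the short text, each tweet object carries metadata such as its language, its retweet count, the user who posted it, and the hashtags and user mentions it contains.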
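Nested fields such as `tweet['entities']['hashtags']` raise a `KeyError` if a tweet lacks them. The following minimal sketch uses a heavily trimmed, hypothetical tweet dict with only the fields used in this notebook, and reads the nested lists with `dict.get` defaults so that missing entities yield an empty list. The helper names `hashtags_of` and `mentions_of` are introduced here for illustration.

```python
# Heavily trimmed, hypothetical tweet; real tweet objects carry many more fields
tweet = {
    "text": "RT @someone: Loving this video #WeLoveTheEarth",
    "lang": "en",
    "retweet_count": 5,
    "entities": {
        "hashtags": [{"text": "WeLoveTheEarth"}],
        "user_mentions": [{"screen_name": "someone"}],
    },
}


def hashtags_of(tweet):
    """Hashtag texts of a tweet; empty list if 'entities' or 'hashtags' is missing."""
    return [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]


def mentions_of(tweet):
    """Screen names mentioned in a tweet; empty list if they are missing."""
    return [m["screen_name"] for m in tweet.get("entities", {}).get("user_mentions", [])]


print(hashtags_of(tweet))
print(mentions_of(tweet))
print(hashtags_of({"text": "no entities here"}))
```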

In [None]:
# Extracting the text of all the tweets from the tweet objects
texts = [tweet['text'] for tweet in tweets]

# Extracting the screen names of users mentioned in tweets about #WeLoveTheEarth
names = [user_mention['screen_name']
         for tweet in tweets
         for user_mention in tweet['entities']['user_mentions']]

# Extracting all the hashtags used when talking about this topic
hashtags = [hashtag['text']
            for tweet in tweets
            for hashtag in tweet['entities']['hashtags']]

# Inspecting the first 10 results
print(json.dumps(texts[0:10], indent=1), "\n")
print(json.dumps(names[0:10], indent=1), "\n")
print(json.dumps(hashtags[0:10], indent=1), "\n")

## Frequency Analysis

Observing the first 10 items of these fields gives us a sense of the data:

* We are talking about a song about loving the Earth.
* A lot of big artists are the forces behind this Twitter wave, especially Lil Dicky.
* Ed Sheeran appears as a cute koala in the song (see the "EdSheeranTheKoala" hashtag)! 🐨

In [None]:
# Importing modules
from collections import Counter

# Counting occurrences (the frequency distribution) of all names and hashtags
for item in [names, hashtags]:
    c = Counter(item)
    # Inspecting the 10 most common items in c
    print(c.most_common(10), "\n")
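`Counter` treats `WeLoveTheEarth` and `welovetheearth` as different keys, even though they are most likely meant as the same tag. One option, sketched below on a hypothetical hashtag list, is to lower-case the tags before counting. Note that this changes how the top items are spelled in the output.

```python
from collections import Counter

# Hypothetical hashtags: the same tag in several capitalizations
hashtags = ["WeLoveTheEarth", "welovetheearth", "EarthMusicVideo",
            "WELOVETHEEARTH", "EdSheeranTheKoala"]

raw_counts = Counter(hashtags)                            # case-sensitive counts
normalized_counts = Counter(tag.lower() for tag in hashtags)  # case-insensitive counts

print(raw_counts.most_common(2))
print(normalized_counts.most_common(2))
```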

Based on these frequency distributions, we can build further on our deductions:

* We can now say with more confidence that this was a music video about the Earth (hashtag 'EarthMusicVideo') by Lil Dicky.
* DiCaprio is not a music artist, but he was involved as well (Leo is an environmentalist, so it is no surprise to see his name pop up here).
* We can also say that the video was released on a Friday, very likely on April 19th.

We can measure a tweet's popularity by analyzing its `retweet_count` and `favorite_count` fields. Let's also extract the number of followers of the tweeter. We have a lot of celebs in the picture, so can we tell whether their advocacy for #WeLoveTheEarth influenced a significant proportion of their followers?
Note: `retweet_count` gives us the total number of times the original tweet was retweeted, not just the retweets in our sample.
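Because `retweet_count` is the total for the original tweet, several retweets of the same original in our sample all repeat that same total, so adding them up across rows can double-count. The sketch below keeps one entry per original tweet. It assumes the embedded `retweeted_status` object carries the original tweet's `id`, and the tweets themselves are hypothetical.

```python
# Hypothetical sample: the first two tweets retweet the same original (id 1)
tweets = [
    {"retweet_count": 40, "retweeted_status": {"id": 1, "favorite_count": 90,
        "user": {"screen_name": "artist", "followers_count": 1000}}},
    {"retweet_count": 40, "retweeted_status": {"id": 1, "favorite_count": 90,
        "user": {"screen_name": "artist", "followers_count": 1000}}},
    {"retweet_count": 7, "retweeted_status": {"id": 2, "favorite_count": 12,
        "user": {"screen_name": "fan", "followers_count": 50}}},
    {"retweet_count": 0},  # an original tweet, not a retweet
]

# One entry per original tweet id; a later duplicate overwrites an earlier one
originals = {}
for tweet in tweets:
    status = tweet.get("retweeted_status")
    if status is None:
        continue  # skip tweets that are not retweets
    originals[status["id"]] = (
        tweet["retweet_count"],
        status["favorite_count"],
        status["user"]["followers_count"],
        status["user"]["screen_name"],
    )

naive_total = sum(t["retweet_count"] for t in tweets if "retweeted_status" in t)
dedup_total = sum(row[0] for row in originals.values())
print(naive_total, dedup_total)
```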

In [None]:
# Extracting useful information from retweets
retweets = [
    (tweet['retweet_count'],
     tweet['retweeted_status']['favorite_count'],
     tweet['retweeted_status']['user']['followers_count'],
     tweet['retweeted_status']['user']['screen_name'],
     tweet['text'])
    for tweet in tweets
    if 'retweeted_status' in tweet
]

## 8. A table that speaks a thousand words

Let's manipulate the data further and visualize it in a better and richer way. Looks matter!

In [None]:
# Importing modules
import matplotlib.pyplot as plt
import pandas as pd

# Visualizing the data in a pretty and insightful format
df = pd.DataFrame(
    retweets, 
    columns=['Retweets','Favorites','Followers','ScreenName','Text']).groupby(
    ['ScreenName','Text','Followers']).sum().sort_values(by=['Followers'], ascending=False)

df.style.background_gradient()

## 9. Tips and Hints 🕵️

* Lil Dicky's followers reacted the most: 42.4% of his followers liked his first tweet. Even though celebrities like Katy Perry and Ellen have a huge Twitter following, their followers hardly reacted. For example, only 0.0098% of Katy's followers liked her tweet.

* While Leo got the most likes and retweets in absolute counts, his first tweet was liked by only 2.19% of his followers.

* The large differences in reactions could be explained by the fact that this was Lil Dicky's music video. Leo probably still got more traction than Katy or Ellen because he played a major role in this initiative.
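The percentages above are favorites divided by followers, times 100. The helper below (a name introduced here) spells out that arithmetic. The inputs are hypothetical numbers chosen only to reproduce the two percentage formats, not the actual counts from the table.

```python
def reaction_rate(favorites, followers):
    """Favorites as a percentage of followers (0.0 when followers is 0)."""
    if followers == 0:
        return 0.0
    return 100.0 * favorites / followers


# Hypothetical inputs, chosen only to illustrate the arithmetic
print(f"{reaction_rate(424, 1000):.1f}%")
print(f"{reaction_rate(98, 1_000_000):.4f}%")
```

On the notebook's `df`, where `Followers` ends up as an index level after the `groupby`, a similar column could be computed with something like `df['Favorites'] / df.index.get_level_values('Followers') * 100`.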

Let's create a frequency distribution for the languages.

In [None]:
# Extracting the language of each tweet
tweets_languages = [tweet['lang'] for tweet in tweets]

# Plotting the distribution of languages
%matplotlib inline
plt.hist(tweets_languages)
plt.xlabel('Language code')
plt.ylabel('Number of tweets')
plt.show()
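`plt.hist` splits the language codes into a fixed number of bins (10 by default). With many languages, several codes can end up in the same bar. A `Counter` gives exact counts already sorted by frequency. The sketch below uses a hypothetical list of language codes in place of `tweets_languages`.

```python
from collections import Counter

# Hypothetical language codes standing in for tweets_languages
tweets_languages = ["en", "en", "pl", "en", "und", "it", "es", "und", "en"]

lang_counts = Counter(tweets_languages)
total = sum(lang_counts.values())

# Languages from most to least frequent, with their share of all tweets
for lang, count in lang_counts.most_common():
    print(f"{lang}: {count} ({count / total:.1%})")
```

To plot these counts in the same order, something like `langs, counts = zip(*lang_counts.most_common())` followed by `plt.bar(langs, counts)` should work, provided the list is not empty.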

## Conclusion

* Most of the tweets were in English.
* Polish, Italian and Spanish were the runners-up.
* A lot of tweets had a language that Twitter could not identify (lang = 'und', i.e., undetermined).

This can help us understand the "category" of people interested in this topic (clustering). We could also analyze the device type used by the Twitteratis, `tweet['source']`, to answer questions like "Does owning an Apple device rather than an Android one influence people's propensity towards this trend?"
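In Twitter's v1.1 tweet objects, `source` is an HTML anchor whose text names the client used to post the tweet. A first step towards the device question could therefore be to pull out that anchor text and count it. The sketch below runs on hypothetical `source` values, and `client_name` is a helper name introduced here. A regular expression is enough for this simple, fixed anchor shape, but it is not a general HTML parser.

```python
import re
from collections import Counter

# Hypothetical 'source' values in the anchor format used by tweet objects
sources = [
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
]

# Captures the text between the opening tag's '>' and the closing '</a>'
ANCHOR_TEXT = re.compile(r">([^<]*)</a>")


def client_name(source):
    """Return the anchor text of a 'source' value, or the raw string if there is no anchor."""
    match = ANCHOR_TEXT.search(source)
    return match.group(1) if match else source


client_counts = Counter(client_name(s) for s in sources)
print(client_counts.most_common())
```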