Load and inspect the data.

Import the json module.
Open the JSON file with the open() function, passing 'datasets/WWTrends.json' as the argument. Call the read() method on the opened file to get its content, pass the resulting JSON string to json.loads() to decode it, and store the decoded output in WW_trends.
Repeat the same steps for 'datasets/USTrends.json' and store the output in US_trends.
Inspect WW_trends and US_trends using the print() function.
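
As a minimal sketch of the decoding step, assuming a small hypothetical JSON string in place of the real dataset files, json.loads() turns JSON text into Python lists and dictionaries, and JSON null becomes None:

```python
import json

# Hypothetical sample mimicking the shape of the trends files (not the real data)
sample_json = '[{"trends": [{"name": "#Example", "tweet_volume": null}]}]'

decoded = json.loads(sample_json)

# The top level is a list holding one dict with a "trends" key
first_trend = decoded[0]["trends"][0]
print(type(decoded).__name__, first_trend)
```

This is why the cells further down index the decoded data with [0]['trends'] before looping over the individual trends.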

In [2]:
import json

# Load and decode the JSON data for WWTrends
with open('datasets/WWTrends.json', 'r') as file:
    WW_trends = json.loads(file.read())

# Load and decode the JSON data for USTrends
with open('datasets/USTrends.json', 'r') as file:
    US_trends = json.loads(file.read())

# Inspect the data
print("WW_trends:", WW_trends)
print("US_trends:", US_trends)

WW_trends: [{'trends': [{'name': '#BeratKandili', 'url': 'http://twitter.com/search?q=%23BeratKandili', 'promoted_content': None, 'query': '%23BeratKandili', 'tweet_volume': 46373}, {'name': '#GoodFriday', 'url': 'http://twitter.com/search?q=%23GoodFriday', 'promoted_content': None, 'query': '%23GoodFriday', 'tweet_volume': 81891}, {'name': '#WeLoveTheEarth', 'url': 'http://twitter.com/search?q=%23WeLoveTheEarth', 'promoted_content': None, 'query': '%23WeLoveTheEarth', 'tweet_volume': 159698}, {'name': '#195TLdenTTVerilir', 'url': 'http://twitter.com/search?q=%23195TLdenTTVerilir', 'promoted_content': None, 'query': '%23195TLdenTTVerilir', 'tweet_volume': None}, {'name': '#AFLNorthDons', 'url': 'http://twitter.com/search?q=%23AFLNorthDons', 'promoted_content': None, 'query': '%23AFLNorthDons', 'tweet_volume': None}, {'name': 'Shiv Sena', 'url': 'http://twitter.com/search?q=%22Shiv+Sena%22', 'promoted_content': None, 'query': '%22Shiv+Sena%22', 'tweet_volume': None}, {'name': 'Lyra McKe

In [3]:
# Pretty-printing the results. First WW and then US trends.
print("WW trends:", json.dumps(WW_trends, indent=1))
print("\n", "US trends:", json.dumps(US_trends, indent=1))


WW trends: [
 {
  "trends": [
   {
    "name": "#BeratKandili",
    "url": "http://twitter.com/search?q=%23BeratKandili",
    "promoted_content": null,
    "query": "%23BeratKandili",
    "tweet_volume": 46373
   },
   {
    "name": "#GoodFriday",
    "url": "http://twitter.com/search?q=%23GoodFriday",
    "promoted_content": null,
    "query": "%23GoodFriday",
    "tweet_volume": 81891
   },
   {
    "name": "#WeLoveTheEarth",
    "url": "http://twitter.com/search?q=%23WeLoveTheEarth",
    "promoted_content": null,
    "query": "%23WeLoveTheEarth",
    "tweet_volume": 159698
   },
   {
    "name": "#195TLdenTTVerilir",
    "url": "http://twitter.com/search?q=%23195TLdenTTVerilir",
    "promoted_content": null,
    "query": "%23195TLdenTTVerilir",
    "tweet_volume": null
   },
   {
    "name": "#AFLNorthDons",
    "url": "http://twitter.com/search?q=%23AFLNorthDons",
    "promoted_content": null,
    "query": "%23AFLNorthDons",
    "tweet_volume": null
   },
   {
    "name": "Shiv Sen

## 3. Finding common trends

🕵️‍♀️ From the pretty-printed results (output of the previous task), we can observe that:

- We have an array of trend objects, each with: the name of the trending topic, the query parameter that can be used to search for the topic on Twitter Search, the search URL, and the volume of tweets for the last 24 hours, if available. (The trends get updated every 5 minutes.)

- At query time #BeratKandili, #GoodFriday and #WeLoveTheEarth were trending WW.

- "tweet_volume" tells us that #WeLoveTheEarth was the most popular of the three.
 
- Results are not sorted by "tweet_volume".

- There are some trends which are unique to the US.
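
Since the results are not sorted by "tweet_volume" and some volumes are missing, here is a minimal sketch of one way to rank trends by volume. It uses a short hypothetical list in the same shape as WW_trends[0]['trends'], and treating a missing volume as 0 is an assumption made only for ordering:

```python
# Hypothetical trend objects in the same shape as WW_trends[0]['trends']
sample_trends = [
    {"name": "#BeratKandili", "tweet_volume": 46373},
    {"name": "#GoodFriday", "tweet_volume": 81891},
    {"name": "#WeLoveTheEarth", "tweet_volume": 159698},
    {"name": "#AFLNorthDons", "tweet_volume": None},
]

# Treat a missing volume as 0 so None is never compared with an int
ranked = sorted(sample_trends, key=lambda t: t["tweet_volume"] or 0, reverse=True)
ranked_names = [t["name"] for t in ranked]
print(ranked_names)
```

Trends without a reported volume end up last, which keeps them visible without letting them break the sort.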

In [4]:
# Extracting all the WW trend names from WW_trends
world_trends = {trend['name'] for trend in WW_trends[0]['trends']}

# Extracting all the US trend names from US_trends
us_trends = {trend['name'] for trend in US_trends[0]['trends']}

# Getting the intersection of the two sets of trends
common_trends = world_trends.intersection(us_trends)

# Inspecting the data
print(world_trends, "\n")
print(us_trends, "\n")
print(len(common_trends), "common trends:", common_trends)

{'#NikahUmurBerapa', '#HanumanJayanti', 'Priyanka Chaturvedi', 'Hemant Karkare', '#Karfreitag', 'Ê±†Ë¢ã„ÅÆ‰∫ãÊïÖ', '#ProtestoEdiyorum', '#Jersey', '#BeratKandili', '„Éó„É™„Ç¶„Çπ', 'Êù±‰∫¨„ÉªÊ±†Ë¢ãË°ùÁ™Å‰∫ãÊïÖ', '#KpuJanganCurang', 'Derrick White', '#HardikPatel', 'Èáç‰Ωì„ÅÆÂ•≥ÊÄß„Å®Â•≥ÂÖê', '#Hayƒ±rlƒ±Kandiller', '#Hayƒ±rlƒ±Cumalar', '#NRLBulldogsSouths', '#AFLNorthDons', 'È´òÈΩ¢ËÄÖ', 'Î∏åÏù¥Ïïå', 'Shiv Sena', '„Ç∞„É¨„Ç¢', '#ViernesSanto', '#ShivSena', '#CHIvLIO', '#BLACKPINKxCorden', '#ConCalmaRemix', '#IndonesianElectionHeroes', 'Ê≠©Ë°åËÄÖ', '#DragRace', 'Berat Kandilimiz', '#DinahJane1', 'ÂÖçË®±ËøîÁ¥ç', '#Ontas', '#DuyguAsena', '#GoodFriday', '#195TLdenTTVerilir', '#ÿßÿ∫ŸÑÿßŸÇ_BBM', '#JunquerasACN', 'Lyra McKee', '√∂rg√ºtdeƒüil arkada≈ügrubu', '#WeLoveTheEarth', 'ÂçÅ‰∫åÂõΩË®ò', '#TheJudasInMyLife', '#ŸäŸàŸÖ_ÿßŸÑÿ¨ŸÖÿπŸá', 'ÂàÄ„Çπ„ÉÜ', '#19aprile', 'Derry', 'Lil Dicky'} 

{'Yvie', 'Game 6', '#WhatStopsYouFromGoingHome', 'Servais', 'Shy Glizzy', 'WE LOVE THE EARTH', 'Mike Anderson', '

## 4. Exploring the hot trend
🕵️‍♀️ From the intersection (last output) we can see that, out of the two sets of trends (each of size 50), we have 11 overlapping topics. In particular, there is one common trend that sounds very interesting: **#WeLoveTheEarth**. It is good to see that the Twitterati are unanimously talking about loving Mother Earth! 💚

**_Note:_** We could have had no overlap or a much higher overlap; when we ran the query to get the trends, people in the US could have been fired up about topics relevant only to them.
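
To put a number on the overlap, a rough sketch with Python set operations might look like the following. The two small sets are hypothetical stand-ins for the real 50-name sets, and overlap_share is an illustrative name, not something from the notebook:

```python
# Hypothetical trend names; the real sets hold 50 names each
world = {"#WeLoveTheEarth", "#GoodFriday", "Lyra McKee", "#BeratKandili"}
us = {"#WeLoveTheEarth", "#GoodFriday", "Game 6", "Yvie"}

common = world & us    # same result as world.intersection(us)
us_only = us - world   # trends that appear only in the US set

# Share of US trends that are also trending worldwide
overlap_share = len(common) / len(us)
print(sorted(common), sorted(us_only), overlap_share)
```

A share near 0 would mean the US conversation is almost entirely local, while a share near 1 would mean it mirrors the worldwide one.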

In [5]:
# Loading the data
with open('datasets/WeLoveTheEarth.json', 'r') as file:
    tweets = json.loads(file.read())

# Inspecting some tweets
tweets[0:2]

[{'created_at': 'Fri Apr 19 08:46:48 +0000 2019',
  'id': 1119160405270523904,
  'id_str': '1119160405270523904',
  'text': 'RT @lildickytweets: 🌎 out now #WeLoveTheEarth https://t.co/L22XsoT5P1',
  'truncated': False,
  'entities': {'hashtags': [{'text': 'WeLoveTheEarth', 'indices': [30, 45]}],
   'symbols': [],
   'user_mentions': [{'screen_name': 'lildickytweets',
     'name': 'LD',
     'id': 1209516660,
     'id_str': '1209516660',
     'indices': [3, 18]}],
   'urls': [{'url': 'https://t.co/L22XsoT5P1',
     'expanded_url': 'https://youtu.be/pvuN_WvF1to',
     'display_url': 'youtu.be/pvuN_WvF1to',
     'indices': [46, 69]}]},
  'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
  'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 

## 5. Digging deeper
🕵️‍♀️ Printing the first two tweet items makes us realize that there's a lot more to a tweet than what we normally think of as a tweet: there is much more than just a short text!

But hey, let's not get overwhelmed by all the information in a tweet object! Let's focus on a few interesting fields and see if we can find any hidden insights there.
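
Pulling fields out of nested tweet objects is typically done with a nested list comprehension. The sketch below, built on three hypothetical and heavily trimmed tweet objects, shows that such a comprehension is equivalent to two nested for loops:

```python
# Three hypothetical tweet objects, trimmed to the one field used here
sample_tweets = [
    {"entities": {"hashtags": [{"text": "One"}, {"text": "Two"}]}},
    {"entities": {"hashtags": []}},  # a tweet without hashtags adds nothing
    {"entities": {"hashtags": [{"text": "Three"}]}},
]

# Nested comprehension: outer loop over tweets, inner loop over each tweet's hashtags
flat = [hashtag["text"] for tweet in sample_tweets
        for hashtag in tweet["entities"]["hashtags"]]

# The same logic written as explicit loops
flat_loop = []
for tweet in sample_tweets:
    for hashtag in tweet["entities"]["hashtags"]:
        flat_loop.append(hashtag["text"])

print(flat)
```

The result is one flat list, so a tweet with several hashtags contributes several entries and a tweet with none contributes nothing.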

In [6]:
# Extracting the text of all the tweets from the tweet object
texts = [tweet['text'] for tweet in tweets]

# Extracting screen names of users mentioned in tweets about #WeLoveTheEarth
names = [user_mention['screen_name'] for tweet in tweets for user_mention in tweet['entities']['user_mentions']]

# Extracting all the hashtags being used when talking about this topic
hashtags = [hashtag['text'] for tweet in tweets for hashtag in tweet['entities']['hashtags']]

# Inspecting the first 10 results
print(json.dumps(texts[0:10], indent=1), "\n")
print(json.dumps(names[0:10], indent=1), "\n")
print(json.dumps(hashtags[0:10], indent=1), "\n")

[
 "RT @lildickytweets: \ud83c\udf0e out now #WeLoveTheEarth https://t.co/L22XsoT5P1",
 "\ud83d\udc9a\ud83c\udf0e\ud83d\udc9a  #WeLoveTheEarth \ud83d\udc47\ud83c\udffc",
 "RT @cabeyoomoon: Ta piosenka to bop,  wpada w ucho  i dochody z niej id\u0105 na dobry cel,  warto s\u0142ucha\u0107 w k\u00f3\u0142ko i w k\u00f3\u0142ko gdziekolwiek si\u0119 ty\u2026",
 "#WeLoveTheEarth \nCzemu ja si\u0119 pop\u0142aka\u0142am",
 "RT @Spotify: This is epic. @lildickytweets got @justinbieber, @arianagrande, @halsey, @sanbenito, @edsheeran, @SnoopDogg, @ShawnMendes, @Kr\u2026",
 "RT @biebercentineo: Justin : are we gonna die? \nLil dicky: you know bieber we might die \n\nBTCH IM CRYING #EARTH #WeLoveTheEarth #WELOVEEART\u2026",
 "RT @dreamsiinflate: #WeLoveTheEarth \u201ci am a fat fucking pig\u201d okay brendon urie https://t.co/FdJmq31xZc",
 "Literally no one:\n\nMe in the past 4 hours:\n\nI'm a koala and I sleep all the time, so what, it's cute \ud83c\udfb6\n\n#WeLoveTheEarth #EdSheeranTheKoala",

## 6. Frequency analysis
🕵️‍♀️ Just from the first few results of the last extraction, we can deduce that:

- We are talking about a song about loving the Earth.
- A lot of big artists are the forces behind this Twitter wave, especially Lil Dicky.
- Ed Sheeran was some cute koala in the song, hence the "EdSheeranTheKoala" hashtag! 🐨
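
To make the idea of a frequency distribution concrete, here is a small sketch using collections.Counter on a hypothetical list of hashtags (made up for illustration, not taken from the dataset):

```python
from collections import Counter

# Hypothetical hashtags; the real list is extracted from the tweets
sample_hashtags = ["WeLoveTheEarth", "EARTH", "WeLoveTheEarth",
                   "EdSheeranTheKoala", "WeLoveTheEarth", "EARTH"]

# Counter maps each distinct item to the number of times it occurs
counts = Counter(sample_hashtags)

# most_common(n) returns the n most frequent (item, count) pairs, highest first
top_two = counts.most_common(2)
print(top_two)
```

Note that Counter is case-sensitive, so "EARTH" and "Earth" would be counted as different hashtags unless they are normalized first.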

In [1]:
# Importing modules
from collections import Counter

# Counting occurrences (getting the frequency distribution) of all names and hashtags
for item in [names, hashtags]:
    c = Counter(item)
    # Inspecting the 10 most common items in c
    print(c.most_common(10), "\n")


## 7. Activity around the trend
🕵️‍♀️ Based on the last frequency distributions, we can build further on our deductions:

- We can more safely say that this was a music video about Earth (hashtag 'EarthMusicVideo') by Lil Dicky.
- DiCaprio is not a music artist, but he was involved as well (Leo is an environmentalist, so it is no surprise to see his name pop up here).
- We can also say that the video was released on a Friday, very likely on April 19th.

We have been able to extract so many insights. Quite powerful, isn't it?!

Let's further analyze the data to find patterns in the activity around the tweets: __did all retweets occur around a particular tweet?__

If a tweet has been retweeted, the 'retweeted_status' field gives many interesting details about the original tweet itself and its author.

We can measure a tweet's popularity by analyzing the __retweet_count__ and __favorite_count__ fields. But let's also extract the number of followers of the tweeter. We have a lot of celebs in the picture, so can we tell if their advocacy for __#WeLoveTheEarth influenced a significant proportion of their followers?__

__Note__: The retweet_count gives us the total number of times the original tweet was retweeted. It should be the same in both the original tweet and all subsequent retweets. Tinkering with some sample tweets and reading the official documentation is the way to get your head around the many fields.
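
As a sketch of how 'retweeted_status' separates retweets from original tweets, the example below uses two hypothetical, heavily trimmed tweet objects with made-up counts. Only tweets that carry the field contribute a row:

```python
# Hypothetical tweets: the first is a retweet, the second is an original tweet
sample_tweets = [
    {"text": "RT @someone: original text",
     "retweeted_status": {"retweet_count": 120,
                          "favorite_count": 300,
                          "user": {"followers_count": 1000,
                                   "screen_name": "someone"}}},
    {"text": "an original tweet, not a retweet"},
]

# Keep only retweets and read the counts of the ORIGINAL tweet and its author
rows = [(t["retweeted_status"]["retweet_count"],
         t["retweeted_status"]["favorite_count"],
         t["retweeted_status"]["user"]["followers_count"],
         t["retweeted_status"]["user"]["screen_name"],
         t["text"])
        for t in sample_tweets if "retweeted_status" in t]
print(rows)
```

The membership check in the comprehension is what prevents a KeyError on tweets that are not retweets.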

In [2]:
# Extracting useful information from retweets
retweets = [(tweet['retweeted_status']['retweet_count'],
             tweet['retweeted_status']['favorite_count'],
             tweet['retweeted_status']['user']['followers_count'],
             tweet['retweeted_status']['user']['screen_name'],
             tweet['text']) for tweet in tweets if 'retweeted_status' in tweet]


## 8. A table that speaks a 1000 words
Let's manipulate the data further and visualize it in a better and richer way, because _"looks matter!"_

In [None]:
# Importing modules
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame and visualize the data in a pretty and insightful format
df = pd.DataFrame(retweets, columns=['Retweets','Favorites', 'Followers', 'ScreenName', 'Text'])
df_grouped = df.groupby(['ScreenName','Text','Followers']).sum()
df_sorted = df_grouped.sort_values(by='Followers', ascending=False)

df_sorted.style.background_gradient()

## 9. Analyzing used languages
🕵️‍♀️ Our table tells us that:

- Lil Dicky's followers reacted the most: 42.4% of his followers liked his first tweet.
- Even though celebrities like Katy Perry and Ellen have a huge Twitter following, their followers hardly reacted, e.g., only 0.0098% of Katy's followers liked her tweet.
- While Leo got the most likes and retweets in terms of counts, his first tweet was only liked by 2.19% of his followers.

The large differences in reactions could be explained by the fact that this was Lil Dicky's music video. Leo still got more traction than Katy or Ellen, likely because he played a major role in this initiative.
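
The percentages above relate likes to audience size. A minimal sketch of that calculation, assuming the rate is simply favorites divided by followers (like_rate is a hypothetical helper and the counts are made up):

```python
def like_rate(favorites, followers):
    """Return likes as a percentage of followers (0.0 if there are no followers)."""
    if followers == 0:
        return 0.0
    return 100 * favorites / followers

# Hypothetical counts, not taken from the dataset
rate = like_rate(favorites=250, followers=10_000)
print(f"{rate:.2f}%")
```

Normalizing by followers is what makes accounts of very different sizes comparable; raw like counts alone would favor the largest accounts.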

Can we find some more interesting patterns in the data? From the text of the tweets, we could spot different languages, so let's create a frequency distribution for the languages.

In [None]:
import matplotlib.pyplot as plt

# Extracting language for each tweet and appending it to the list of languages
tweets_languages = []
for tweet in tweets:
    tweets_languages.append(tweet['lang'])

# Plotting the distribution of languages
%matplotlib inline
plt.hist(tweets_languages)
plt.xlabel('Language')
plt.ylabel('Frequency')
plt.title('Frequency Distribution of Tweet Languages')
plt.show()
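
As an alternative sketch, the same distribution can be computed explicitly with collections.Counter before plotting. The language codes below are hypothetical, and the resulting labels and values could be passed to a bar chart such as plt.bar:

```python
from collections import Counter

# Hypothetical language codes; the real list comes from each tweet's 'lang' field
sample_langs = ["en", "pl", "en", "es", "en", "pl"]

lang_counts = Counter(sample_langs)

# Split the (language, count) pairs, most frequent first, into two parallel tuples
labels, values = zip(*lang_counts.most_common())
print(labels, values)
```

Counting first gives direct access to the exact numbers per language, which a histogram only shows visually.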