### LSE Data Analytics Online Career Accelerator

# DA201: Data Analytics using Python

## Practical activity: Search the Twitter API

**Important**

Please note that you will be working with the Twitter API, which reflects live, current events. Your output will therefore differ from the outputs shown here. For example, the colour of apples may have been trending yesterday, while today the aerodynamics of aeroplanes is trending.

**This is the solution to the activity.**

The story of Bitcoin and other cryptocurrencies has captured investors like few financial stories have, and many finance firms are looking to invest in the crypto market. You are a data analyst at a financial institution, and your manager has tasked you with investigating Bitcoin in more detail, particularly its future growth and its use in the United States.

Previously, you accessed Bitcoin data through the CoinGecko API. Now your manager asks you to turn your attention to Twitter, particularly tweets about Bitcoin and cryptocurrency in general. In particular, she wants you to check whether Bitcoin is trending in `New York, Los Angeles, Sydney, Auckland, and Dubai`.

She also wants to see a DataFrame of topics with over `200,000` tweets for each city. 

Your manager then wants you to cross-check trending topics between the `United States and the UK` to see what people are talking about in both countries and whether Bitcoin forms part of the larger conversation. If Bitcoin is not a shared trending topic, she asks you to search Twitter for `#Bitcoin` and two other cryptocurrency hashtags of your choice, and to analyse the top two tweets returned for each hashtag, particularly in terms of their popularity.

## 1. Prepare your workstation

In [1]:
# Copy the YAML file and your Twitter keys over to this Jupyter Notebook before you start to work.
import yaml
from twitter import Twitter, OAuth

# Import the YAML file - remember to specify the whole path.
# Using 'with' closes the file once the credentials are loaded.
with open('C:\\Users\\lefteris\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Anaconda3 (64-bit)\\twitter.yaml', 'r') as f:
    twitter_creds = yaml.safe_load(f)

# Pass your Twitter credentials.
# OAuth expects: access token, access token secret, API key, API secret key.
twitter_api = Twitter(auth=OAuth(twitter_creds['access_token'],
                                 twitter_creds['access_token_secret'],
                                 twitter_creds['api_key'],
                                 twitter_creds['api_secret_key']))
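The cell above assumes the YAML file contains the four keys it reads. A minimal sketch of a check you could run before building `OAuth`, so that a missing key fails with a clear message instead of a bare `KeyError`; the helper name `check_twitter_creds` is hypothetical, and the values in the usage example are placeholders, not real keys. The `isinstance` check matters because `yaml.safe_load` returns `None` for an empty file.

```python
# Keys that the notebook reads from twitter.yaml.
REQUIRED_KEYS = ("api_key", "api_secret_key", "access_token", "access_token_secret")


def check_twitter_creds(creds):
    """Return creds unchanged if every required key is present and non-empty.

    Hypothetical helper: raises ValueError naming any missing or empty keys.
    """
    if not isinstance(creds, dict):
        raise ValueError("Credentials file did not load as a mapping")
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"Missing Twitter credentials: {', '.join(missing)}")
    return creds


# Usage with placeholder values (not real keys).
example = {
    "api_key": "xxx",
    "api_secret_key": "xxx",
    "access_token": "xxx",
    "access_token_secret": "xxx",
}
check_twitter_creds(example)
```

In the notebook, you would call `check_twitter_creds(twitter_creds)` straight after loading the YAML file.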

In [2]:
# Confirm that the API object was created (this sends no request, so it does not yet test your keys).
print(twitter_api)

<twitter.api.Twitter object at 0x00000225CA40A4C0>


In [3]:
# Run a test with #python.
python_tweets = twitter_api.search.tweets(q="#python")

# View the output.
print(python_tweets)

{'statuses': [{'created_at': 'Sat Jul 02 14:36:48 +0000 2022', 'id': 1543242290851975170, 'id_str': '1543242290851975170', 'text': 'RT @DisfoldDotCom: RT @Khulood_Almani: \U0001faf6 13 #medical advances that are changing lives🙌\n\n#TechForGood #innovation #tech #coding #business #1…', 'truncated': False, 'entities': {'hashtags': [{'text': 'medical', 'indices': [44, 52]}, {'text': 'TechForGood', 'indices': [88, 100]}, {'text': 'innovation', 'indices': [101, 112]}, {'text': 'tech', 'indices': [113, 118]}, {'text': 'coding', 'indices': [119, 126]}, {'text': 'business', 'indices': [127, 136]}], 'symbols': [], 'user_mentions': [{'screen_name': 'DisfoldDotCom', 'name': 'Disfold', 'id': 2371409042, 'id_str': '2371409042', 'indices': [3, 17]}, {'screen_name': 'Khulood_Almani', 'name': 'Dr. Khulood Almani | د.خلود المانع', 'id': 1403861754808049666, 'id_str': '1403861754808049666', 'indices': [22, 37]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent
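The search response is a dictionary whose `statuses` key holds a list of tweets, and each tweet stores its hashtags under `entities` → `hashtags` (without the leading `#`), as the output above shows. A minimal sketch that counts hashtag frequency across the returned tweets; `hashtag_counts` is a hypothetical helper and `sample` is a cut-down, made-up response with the same nesting.

```python
from collections import Counter


def hashtag_counts(response):
    """Count how often each hashtag appears across the returned statuses."""
    counts = Counter()
    for status in response.get("statuses", []):
        for tag in status.get("entities", {}).get("hashtags", []):
            # Casefold so that 'Python' and 'python' are counted together.
            counts[tag["text"].casefold()] += 1
    return counts


# Hypothetical, cut-down response with the same nesting as the real output.
sample = {
    "statuses": [
        {"id_str": "1", "text": "Learning #Python and #coding",
         "entities": {"hashtags": [{"text": "Python"}, {"text": "coding"}]}},
        {"id_str": "2", "text": "RT @someone: more #python",
         "entities": {"hashtags": [{"text": "python"}]}},
    ]
}

print(hashtag_counts(sample).most_common(2))  # [('python', 2), ('coding', 1)]
```

With live data, `hashtag_counts(python_tweets)` gives a quick view of which tags co-occur with `#python`.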

## 2. Identify New York and London

In [4]:
# Retrieve every location for which Twitter provides trend data.
trends_worldwide = twitter_api.trends.available()

# How many locations are available?
print(len(trends_worldwide))

# View the first location record.
trends_worldwide[0]

467


{'name': 'Worldwide',
 'placeType': {'code': 19, 'name': 'Supername'},
 'url': 'http://where.yahooapis.com/v1/place/1',
 'parentid': 0,
 'country': '',
 'woeid': 1,
 'countryCode': None}

## New York

In [5]:
# Find New York.
our_city = 'New York'

# Keep every location record whose name matches.
new_york = [loc for loc in trends_worldwide if loc['name'] == our_city]

# Check that exactly one location matched.
print(len(new_york))

# Use the first match to get New York's Where On Earth ID (WOEID).
new_york[0]['woeid']

1


2459115
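Filtering `trends_worldwide` once per city works, but the brief lists five cities (New York, Los Angeles, Sydney, Auckland and Dubai). A minimal sketch that indexes the location records by name once and reports any city that is missing or matches more than one record; `index_by_name` and `woeids_for` are hypothetical helpers, and `sample_locations` is a cut-down stand-in that reuses only WOEIDs shown in this notebook (Dubai is missing from it purely because the sample is short).

```python
from collections import defaultdict


def index_by_name(locations):
    """Map each location name to every WOEID that carries that name."""
    index = defaultdict(list)
    for loc in locations:
        index[loc["name"]].append(loc["woeid"])
    return index


def woeids_for(cities, locations):
    """Return {city: woeid} for unambiguous matches, plus (city, match_count) problems."""
    index = index_by_name(locations)
    found, problems = {}, []
    for city in cities:
        matches = index.get(city, [])
        if len(matches) == 1:
            found[city] = matches[0]
        else:
            problems.append((city, len(matches)))
    return found, problems


# Cut-down stand-in for trends_worldwide, using WOEIDs from the outputs above.
sample_locations = [
    {"name": "Worldwide", "woeid": 1},
    {"name": "New York", "woeid": 2459115},
    {"name": "London", "woeid": 44418},
]

found, problems = woeids_for(["New York", "Dubai"], sample_locations)
print(found)     # {'New York': 2459115}
print(problems)  # [('Dubai', 0)]
```

In the notebook, `woeids_for(['New York', 'Los Angeles', 'Sydney', 'Auckland', 'Dubai'], trends_worldwide)` would cover the whole brief in one pass.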

## London

In [6]:
# Find London.
our_city_2 = 'London'

# Keep every location record whose name matches.
london = [loc for loc in trends_worldwide if loc['name'] == our_city_2]

# Check that exactly one location matched.
print(len(london))

# Use the first match to get London's Where On Earth ID (WOEID).
london[0]['woeid']

1


44418
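The brief asks for a comparison between the United States and the UK, and this notebook uses New York and London as proxies. The location records returned by `trends.available()` also carry `placeType` and `countryCode` fields (see the `Worldwide` record above), so country-level entries can be selected directly. A minimal sketch, assuming country entries are marked with `placeType['name'] == 'Country'`; `country_woeid` is a hypothetical helper, and the WOEIDs and place-type codes in the sample are placeholders, not real values.

```python
def country_woeid(locations, country_code):
    """Return the WOEID of the single country-level entry for a country code.

    Assumes country entries carry placeType name 'Country'.
    """
    matches = [
        loc["woeid"]
        for loc in locations
        if loc["placeType"]["name"] == "Country" and loc["countryCode"] == country_code
    ]
    if len(matches) != 1:
        raise LookupError(
            f"Expected one country entry for {country_code}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical records: field names match the output above; WOEIDs and codes are placeholders.
sample_locations = [
    {"name": "Worldwide", "placeType": {"code": 19, "name": "Supername"},
     "countryCode": None, "woeid": 1},
    {"name": "United States", "placeType": {"code": 12, "name": "Country"},
     "countryCode": "US", "woeid": 101},
    {"name": "New York", "placeType": {"code": 7, "name": "Town"},
     "countryCode": "US", "woeid": 2459115},
    {"name": "United Kingdom", "placeType": {"code": 12, "name": "Country"},
     "countryCode": "GB", "woeid": 102},
]

print(country_woeid(sample_locations, "US"))  # 101
print(country_woeid(sample_locations, "GB"))  # 102
```

Passing the resulting WOEIDs to `twitter_api.trends.place(_id=...)` would then give country-level trends instead of city-level ones.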

## 3. Common trends

## New York

In [7]:
# Look at trends in New York.
new_york_trends = twitter_api.trends.place(_id=new_york[0]['woeid'])

# View the output.
new_york_trends

[{'trends': [{'name': '#Caturday',
    'url': 'http://twitter.com/search?q=%23Caturday',
    'promoted_content': None,
    'query': '%23Caturday',
    'tweet_volume': 11035},
   {'name': '#SaturdayMorning',
    'url': 'http://twitter.com/search?q=%23SaturdayMorning',
    'promoted_content': None,
    'query': '%23SaturdayMorning',
    'tweet_volume': 10316},
   {'name': '#BritishGP',
    'url': 'http://twitter.com/search?q=%23BritishGP',
    'promoted_content': None,
    'query': '%23BritishGP',
    'tweet_volume': 72613},
   {'name': '#SaturdayVibes',
    'url': 'http://twitter.com/search?q=%23SaturdayVibes',
    'promoted_content': None,
    'query': '%23SaturdayVibes',
    'tweet_volume': 10795},
   {'name': 'Latifi',
    'url': 'http://twitter.com/search?q=Latifi',
    'promoted_content': None,
    'query': 'Latifi',
    'tweet_volume': 10491},
   {'name': 'SNKRS',
    'url': 'http://twitter.com/search?q=SNKRS',
    'promoted_content': None,
    'query': 'SNKRS',
    'tweet_volume'

In [8]:
# Look at the output as a DataFrame.
# Import Pandas.
import pandas as pd

# Create a DataFrame.
new_york_trends_pd = pd.DataFrame(new_york_trends[0]['trends'])

# View the DataFrame.
new_york_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #Caturday | http://twitter.com/search?q=%23Caturday | | %23Caturday | 11035.0 |
| 1 | #SaturdayMorning | http://twitter.com/search?q=%23SaturdayMorning | | %23SaturdayMorning | 10316.0 |
| 2 | #BritishGP | http://twitter.com/search?q=%23BritishGP | | %23BritishGP | 72613.0 |
| 3 | #SaturdayVibes | http://twitter.com/search?q=%23SaturdayVibes | | %23SaturdayVibes | 10795.0 |
| 4 | Latifi | http://twitter.com/search?q=Latifi | | Latifi | 10491.0 |
| 5 | SNKRS | http://twitter.com/search?q=SNKRS | | SNKRS | |
| 6 | She's 10 | http://twitter.com/search?q=%22She%27s+10%22 | | %22She%27s+10%22 | 86558.0 |
| 7 | Good Saturday | http://twitter.com/search?q=%22Good+Saturday%22 | | %22Good+Saturday%22 | 25436.0 |
| 8 | Daily Quordle 159 | http://twitter.com/search?q=%22Daily+Quordle+1... | | %22Daily+Quordle+159%22 | |
| 9 | #Colin | http://twitter.com/search?q=%23Colin | | %23Colin | |
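Several trends come back with `tweet_volume` set to `None` (for example `SNKRS` and `#Colin` above), which pandas stores as `NaN`. Because any comparison with `NaN` evaluates to `False`, the `> 50000` filter in the following cell silently drops these rows rather than raising an error. A minimal sketch that makes the number of unknown-volume trends visible before filtering; the small frame below is hypothetical but mirrors the columns above.

```python
import pandas as pd

# Hypothetical trends shaped like new_york_trends_pd; None marks an unknown volume.
trends = pd.DataFrame(
    {
        "name": ["#BritishGP", "SNKRS", "She's 10", "#Colin"],
        "tweet_volume": [72613, None, 86558, None],
    }
)

# pandas converts None to NaN in a numeric column.
unknown = trends["tweet_volume"].isna().sum()
print(f"{unknown} of {len(trends)} trends have no tweet volume")

# NaN > 50000 evaluates to False, so unknown-volume rows are excluded here.
over_50k = trends[trends["tweet_volume"] > 50000]
print(over_50k["name"].tolist())  # ['#BritishGP', "She's 10"]
```

Reporting the unknown count alongside the filtered table makes it clear that "no topic above the threshold" may partly mean "volume not reported".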


In [9]:
# Keep only topics with more than 50,000 tweets, largest first.
new_york_trends_over50k_pd = (
    new_york_trends_pd[new_york_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(new_york_trends_over50k_pd.shape)
new_york_trends_over50k_pd

(11, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 42 | Hoyoverse | http://twitter.com/search?q=Hoyoverse | | Hoyoverse | 150321.0 |
| 37 | The Message | http://twitter.com/search?q=%22The+Message%22 | | %22The+Message%22 | 138553.0 |
| 29 | taehyun | http://twitter.com/search?q=taehyun | | taehyun | 91835.0 |
| 47 | Adele | http://twitter.com/search?q=Adele | | Adele | 88162.0 |
| 6 | She's 10 | http://twitter.com/search?q=%22She%27s+10%22 | | %22She%27s+10%22 | 86558.0 |
| 31 | Kazuha | http://twitter.com/search?q=Kazuha | | Kazuha | 83421.0 |
| 24 | Diluc | http://twitter.com/search?q=Diluc | | Diluc | 73146.0 |
| 2 | #BritishGP | http://twitter.com/search?q=%23BritishGP | | %23BritishGP | 72613.0 |
| 27 | Sumeru | http://twitter.com/search?q=Sumeru | | Sumeru | 66160.0 |
| 20 | Wyoming | http://twitter.com/search?q=Wyoming | | Wyoming | 65187.0 |


In [10]:
# Save output as a CSV file.
new_york_trends_over50k_pd.to_csv('new_york_trends_over50k.csv', index=False)
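The brief asks for topics with over 200,000 tweets for each city, whereas the cells here use a 50,000 cut-off; in the sample New York output the largest volume is 150,321 (`Hoyoverse`), so a 200,000 filter would return nothing for that snapshot. A minimal sketch that applies one configurable threshold to several cities and stacks the results into a single DataFrame with a `city` column. The helper `topics_over` and the sample data are hypothetical; the input simply mirrors the nesting of the `trends.place()` responses above.

```python
import pandas as pd


def topics_over(trends_by_city, threshold=200_000):
    """Stack each city's trends and keep topics above `threshold` tweets."""
    frames = []
    for city, response in trends_by_city.items():
        # Same shape as twitter_api.trends.place(...):
        # a one-element list whose 'trends' key holds the topic records.
        df = pd.DataFrame(response[0]["trends"])
        df["city"] = city
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    # Coerce to numbers so missing volumes (None) become NaN and drop out below.
    combined["tweet_volume"] = pd.to_numeric(combined["tweet_volume"], errors="coerce")
    over = combined[combined["tweet_volume"] > threshold]
    return over.sort_values(["city", "tweet_volume"], ascending=[True, False])


# Hypothetical responses with the same nesting as the outputs above.
trends_by_city = {
    "New York": [{"trends": [{"name": "Adele", "tweet_volume": 88162},
                             {"name": "Topic A", "tweet_volume": 250000}]}],
    "London": [{"trends": [{"name": "Martinez", "tweet_volume": 116768},
                           {"name": "Topic B", "tweet_volume": None}]}],
}

print(topics_over(trends_by_city)[["city", "name", "tweet_volume"]])
```

In the notebook you could pass `{'New York': new_york_trends, 'London': london_trends}` and lower `threshold` if no live topic clears 200,000. Note that `pd.concat` raises a `ValueError` if the dictionary is empty.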

## London

In [11]:
# Look at trends in London.
# Store the result in a new variable so the 'london' location list is not overwritten.
london_trends = twitter_api.trends.place(_id=london[0]['woeid'])

# View the output.
london_trends

[{'trends': [{'name': '#RLCS',
    'url': 'http://twitter.com/search?q=%23RLCS',
    'promoted_content': None,
    'query': '%23RLCS',
    'tweet_volume': None},
   {'name': '#CoralEclipse',
    'url': 'http://twitter.com/search?q=%23CoralEclipse',
    'promoted_content': None,
    'query': '%23CoralEclipse',
    'tweet_volume': None},
   {'name': '#FormulaOne',
    'url': 'http://twitter.com/search?q=%23FormulaOne',
    'promoted_content': None,
    'query': '%23FormulaOne',
    'tweet_volume': None},
   {'name': 'Hamilton',
    'url': 'http://twitter.com/search?q=Hamilton',
    'promoted_content': None,
    'query': 'Hamilton',
    'tweet_volume': 36857},
   {'name': '#preseason',
    'url': 'http://twitter.com/search?q=%23preseason',
    'promoted_content': None,
    'query': '%23preseason',
    'tweet_volume': None},
   {'name': 'Martinez',
    'url': 'http://twitter.com/search?q=Martinez',
    'promoted_content': None,
    'query': 'Martinez',
    'tweet_volume': 116768},
   {'nam

In [12]:
# Look at the output as a DataFrame.

# Create a DataFrame.
london_trends_pd = pd.DataFrame(london_trends[0]['trends'])

# View the DataFrame.
london_trends_pd

| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 0 | #RLCS | http://twitter.com/search?q=%23RLCS | | %23RLCS | |
| 1 | #CoralEclipse | http://twitter.com/search?q=%23CoralEclipse | | %23CoralEclipse | |
| 2 | #FormulaOne | http://twitter.com/search?q=%23FormulaOne | | %23FormulaOne | |
| 3 | Hamilton | http://twitter.com/search?q=Hamilton | | Hamilton | 36857.0 |
| 4 | #preseason | http://twitter.com/search?q=%23preseason | | %23preseason | |
| 5 | Martinez | http://twitter.com/search?q=Martinez | | Martinez | 116768.0 |
| 6 | Ginola | http://twitter.com/search?q=Ginola | | Ginola | |
| 7 | #ASongOrMovieForLocalProduce | http://twitter.com/search?q=%23ASongOrMovieFor... | | %23ASongOrMovieForLocalProduce | |
| 8 | Deano | http://twitter.com/search?q=Deano | | Deano | |
| 9 | Boulter | http://twitter.com/search?q=Boulter | | Boulter | |


In [13]:
# Keep only topics with more than 50,000 tweets, largest first.
london_trends_over50k_pd = (
    london_trends_pd[london_trends_pd['tweet_volume'] > 50000]
    .sort_values('tweet_volume', ascending=False)
)

# View the output.
print(london_trends_over50k_pd.shape)
london_trends_over50k_pd

(3, 5)


| | name | url | promoted_content | query | tweet_volume |
|---|---|---|---|---|---|
| 5 | Martinez | http://twitter.com/search?q=Martinez | | Martinez | 116768.0 |
| 45 | Hill | http://twitter.com/search?q=Hill | | Hill | 102690.0 |
| 12 | Diaz | http://twitter.com/search?q=Diaz | | Diaz | 54310.0 |


In [14]:
# Save output as a CSV file.
london_trends_over50k_pd.to_csv('london_trends_over50k.csv', index=False)

## Compare cities

In [15]:
# Find New York.
our_city = 'New York'

# Keep every location record whose name matches.
new_york = [loc for loc in trends_worldwide if loc['name'] == our_city]

# View New York's WOEID.
new_york[0]['woeid']

2459115

In [None]:
# Find London.
our_city_2 = 'London'

# Keep every location record whose name matches.
london = [loc for loc in trends_worldwide if loc['name'] == our_city_2]

# View London's WOEID.
london[0]['woeid']

In [16]:
# Retrieve trends for each city.
# Import JSON.
import json

# Retrieve trends for New York.
new_york_trends = twitter_api.trends.place(_id=2459115)

# View JSON output.
print(json.dumps(new_york_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "#Caturday",
                "url": "http://twitter.com/search?q=%23Caturday",
                "promoted_content": null,
                "query": "%23Caturday",
                "tweet_volume": 11235
            },
            {
                "name": "Latifi",
                "url": "http://twitter.com/search?q=Latifi",
                "promoted_content": null,
                "query": "Latifi",
                "tweet_volume": 16312
            },
            {
                "name": "#SaturdayMorning",
                "url": "http://twitter.com/search?q=%23SaturdayMorning",
                "promoted_content": null,
                "query": "%23SaturdayMorning",
                "tweet_volume": 10406
            },
            {
                "name": "#BritishGP",
                "url": "http://twitter.com/search?q=%23BritishGP",
                "promoted_content": null,
                "query": "%23BritishGP",
      

In [17]:
# Retrieve trends for London.
london_trends = twitter_api.trends.place(_id=44418)

# View JSON output.
print(json.dumps(london_trends, indent=4))

[
    {
        "trends": [
            {
                "name": "Mishriff",
                "url": "http://twitter.com/search?q=Mishriff",
                "promoted_content": null,
                "query": "Mishriff",
                "tweet_volume": null
            },
            {
                "name": "#CoralEclipse",
                "url": "http://twitter.com/search?q=%23CoralEclipse",
                "promoted_content": null,
                "query": "%23CoralEclipse",
                "tweet_volume": null
            },
            {
                "name": "#RLCS",
                "url": "http://twitter.com/search?q=%23RLCS",
                "promoted_content": null,
                "query": "%23RLCS",
                "tweet_volume": null
            },
            {
                "name": "Zinchenko",
                "url": "http://twitter.com/search?q=Zinchenko",
                "promoted_content": null,
                "query": "Zinchenko",
                "tweet_volume":
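Trends change between requests (the `#Caturday` volume reported here, 11,235, differs from the 11,035 returned a few cells earlier), so saving each raw response with a timestamp lets you rerun the analysis on exactly the data you saw. A minimal sketch using only the standard library; `save_snapshot`, `load_snapshot` and the `<label>_<timestamp>.json` naming scheme are hypothetical choices.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(response, label, folder="snapshots"):
    """Write a raw API response to <folder>/<label>_<UTC timestamp>.json."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(folder) / f"{label}_{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response, indent=4), encoding="utf-8")
    return path


def load_snapshot(path):
    """Read a saved response back into Python lists and dictionaries."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage in the notebook (not run here):
# save_snapshot(new_york_trends, "new_york_trends")
# save_snapshot(london_trends, "london_trends")
```

`json.dumps` already works on these responses, as the cells above show, so the same call can write them to disk.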

In [18]:
# Extract the names of New York's trends.
new_york_trends_list = [trend['name'] for trend in new_york_trends[0]['trends']]

# View output.
print(new_york_trends_list)

['#Caturday', 'Latifi', '#SaturdayMorning', '#BritishGP', '#SaturdayVibes', 'SNKRS', "She's 10", 'Daily Quordle 159', 'good saturday', '#Colin', 'Clover', 'Opetaia', 'John McCain', 'Happy 4th', 'Albon', 'Anisimova', 'Texas Supreme Court', 'Aston Martin', 'Wyoming', 'Mary Ann', 'Happy Birthday Tom', 'Jim Breuer', 'Haas', 'Gauff', 'Diluc', 'Heizou', 'Thom', 'Sumeru', 'Marilyn Manson', 'taehyun', 'Kazuha', 'Chick-fil-A', 'Arias', 'Richard Petty', 'Cobbler', 'Carolinas', 'Alec Baldwin', 'Klee', 'The Message', 'rapinoe', 'Happy Fourth of July', 'Andujar', 'Evan Rachel Wood', 'Deebo', 'Hoyoverse', 'Idiocracy', 'John Adams', 'Adele', 'Volk', 'Gamma']


In [19]:
# Extract the names of London's trends.
london_trends_list = [trend['name'] for trend in london_trends[0]['trends']]

# View output.
print(london_trends_list)

['Mishriff', '#CoralEclipse', '#RLCS', 'Zinchenko', '#FormulaOne', 'Hamilton', 'De Gea', '#preseason', 'Martinez', 'Ginola', '#ASongOrMovieForLocalProduce', 'Deano', 'Boulter', 'Diaz', 'Jordan Peterson', 'Varane', 'Bumble', 'Daily Quordle 159', 'Tierney', 'Halal', 'Moyes', 'Robbo', 'Latifi', 'Andy Goram', 'Free Wind', 'Albon', 'Aston Martin', 'Bowyer', 'Eddie Jones', 'Crowley', 'Trialist', 'Ricciardo', 'Rangers', 'Jamie Chadwick', 'Hickey', 'get shirty', 'Equilateral', 'Goatifi', 'Sancho', 'Zhou', 'Broady', 'Arundell', 'Haas', 'Shelvey', 'Rashford', 'Anne Diamond', 'James Tarkowski', 'Bay Bridge', 'alex de minaur', 'Okolie']


In [20]:
# Find trends shared by both cities.
new_york_trends_set = set(new_york_trends_list)
london_trends_set = set(london_trends_list)

# Intersect the two sets.
common_trends = new_york_trends_set.intersection(london_trends_set)

# View output.
print(common_trends)

{'Daily Quordle 159', 'Aston Martin', 'Albon', 'Haas', 'Latifi'}
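Exact set intersection is case-sensitive: New York's list above contains `good saturday`, while the earlier DataFrame for the same city shows `Good Saturday`, and a hashtag such as `#Latifi` would not match a plain `Latifi`. A minimal sketch that normalises names before intersecting and then applies the brief's decision rule (is Bitcoin part of the shared conversation?); `normalise` and `shared_trends` are hypothetical helpers, and the two short lists are excerpts plus made-up variants.

```python
def normalise(name):
    """Casefold a trend name and drop a leading '#' for comparison."""
    return name.casefold().lstrip("#").strip()


def shared_trends(list_a, list_b):
    """Return normalised trend names that appear in both lists."""
    return {normalise(n) for n in list_a} & {normalise(n) for n in list_b}


# Short excerpts in the style of the lists above, with hypothetical case/hashtag variants.
ny = ["Latifi", "Good Saturday", "#BritishGP", "Haas"]
ldn = ["#latifi", "good saturday", "Haas", "Hamilton"]

common = shared_trends(ny, ldn)
print(sorted(common))  # ['good saturday', 'haas', 'latifi']

# Decision rule from the brief: search crypto hashtags only if Bitcoin is not shared.
bitcoin_shared = any("bitcoin" in name for name in common)
print(bitcoin_shared)  # False
```

In the notebook, `shared_trends(new_york_trends_list, london_trends_list)` replaces the exact intersection.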


## Search for #Bitcoin

In [21]:
# Search for tweets containing #Bitcoin.
bitcoin_tweets = twitter_api.search.tweets(q="#Bitcoin")

# View JSON output.
print(json.dumps(bitcoin_tweets, indent=4))

{
    "statuses": [
        {
            "created_at": "Sat Jul 02 14:43:41 +0000 2022",
            "id": 1543244023627915264,
            "id_str": "1543244023627915264",
            "text": "RT @MatthewHyland_: #Bitcoin has an opportunity to create Bullish Divergence on the daily time frame but it will require a bounce here: htt\u2026",
            "truncated": false,
            "entities": {
                "hashtags": [
                    {
                        "text": "Bitcoin",
                        "indices": [
                            20,
                            28
                        ]
                    }
                ],
                "symbols": [],
                "user_mentions": [
                    {
                        "screen_name": "MatthewHyland_",
                        "name": "Matthew Hyland",
                        "id": 889273682,
                        "id_str": "889273682",
                        "indices": [
                 