# Exercise Notebook on Accessing Twitter

## Exercise 1: Twitter API Access

You should already have Twitter access set up from the lecture. If you do not, please revisit the lecture and make sure your Twitter credentials are saved:

In [1]:
import pickle
import os

Make sure to select the relative path to the `secret_twitter_credentials.pkl` file:

Remember that `.pkl` files store Python data structures serialised to disk so that they can be loaded again at a later date. In this case, the saved data structure is a Python dictionary containing the authorisation details needed to connect to Twitter's API.

We must open the file in `'rb'` (read binary) mode, because pickle serialises objects to bytes and `pickle.load` deserialises them from a binary file.

In [10]:
# Use a context manager so the file handle is closed after loading
with open('../Week-8-NLP-Databases/secret_twitter_credentials.pkl', 'rb') as f:
    twitter_auth = pickle.load(f)
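
To see why `'rb'` is needed, the following minimal sketch performs a full pickle round trip on a dictionary shaped like the credentials file. The values are placeholders and the file name `example_credentials.pkl` is an assumption made for this illustration; it is written to a temporary directory so the real credentials file is untouched.

```python
import os
import pickle
import tempfile

# Placeholder values only; these are not real credentials.
example_credentials = {
    'Access Token': 'xxxx',
    'Access Token Secret': 'xxxx',
    'Consumer Key': 'xxxx',
    'Consumer Secret': 'xxxx',
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this example.
    example_path = os.path.join(tmp_dir, 'example_credentials.pkl')

    # 'wb' = write bytes: pickle.dump serialises the dictionary to disk.
    with open(example_path, 'wb') as f:
        pickle.dump(example_credentials, f)

    # 'rb' = read bytes: pickle.load rebuilds the dictionary from the file.
    with open(example_path, 'rb') as f:
        loaded_credentials = pickle.load(f)

# The loaded object is an equal copy of the original dictionary.
print(loaded_credentials == example_credentials)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.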

We connect to the Twitter API by first creating a `twitter.oauth.OAuth` object containing our credentials. We then call the `twitter.Twitter` constructor, setting the `auth` kwarg to the `OAuth` object we've just created, and assign the result to a variable. This variable is a Twitter API object which we can now use to start accessing Twitter's data.

In [13]:
import twitter

auth = twitter.oauth.OAuth(twitter_auth['Access Token'],
                           twitter_auth['Access Token Secret'],
                           twitter_auth['Consumer Key'],
                           twitter_auth['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)
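
If the connection fails, a common cause is a credentials dictionary with a missing or empty entry. The sketch below checks for the four keys used above before any `OAuth` object is built. The helper name `missing_credential_keys` and the sample dictionary are hypothetical and only illustrate the idea.

```python
# The four dictionary keys used when building the OAuth object above.
REQUIRED_KEYS = [
    'Access Token',
    'Access Token Secret',
    'Consumer Key',
    'Consumer Secret',
]


def missing_credential_keys(credentials):
    """Return the required keys that are absent or empty in `credentials`."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]


# Hypothetical, deliberately incomplete credentials with placeholder values.
incomplete_credentials = {'Access Token': 'xxxx', 'Consumer Key': 'xxxx'}

missing = missing_credential_keys(incomplete_credentials)
print(missing)
```

An empty list means every required key is present and non-empty.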

## Exercise 2: Get the WOE ID for a place of interest

Find the Yahoo! Where On Earth ID for a place you are interested in at:

http://woeid.rosselliot.co.nz/

Set `LOCAL_WOE_ID` to this number below (the solution stores it as a string):

In [18]:
LOCAL_WOE_ID = None
### BEGIN SOLUTION
LOCAL_WOE_ID = '23424916'
### END SOLUTION

A Python `assert` statement checks whether an expression is truthy. `None` is falsy (as are `0` and the empty string), so if `LOCAL_WOE_ID` is still `None` the statement raises an `AssertionError` with the given message.

In [22]:
assert LOCAL_WOE_ID, "Remember to set LOCAL_WOE_ID to a location identifier"
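
The following sketch shows which values would pass this kind of check. The helper `passes_assert` is hypothetical and exists only to report the outcome instead of stopping the notebook.

```python
def passes_assert(value):
    """Return True if `assert value` succeeds, False if it raises."""
    try:
        assert value
    except AssertionError:
        return False
    return True


# None, the empty string and 0 are falsy; a non-empty string
# or a non-zero integer is truthy.
for candidate in [None, '', 0, '23424916', 23424916]:
    print(repr(candidate), passes_assert(candidate))
```

Because the check is based on truthiness, a WOE ID stored as a non-empty string passes just as an integer would.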

## Exercise 3: Retrieve and print local trends

Let's use the Twitter API to retrieve trends. We do this using the `.trends.place()` method, passing our WOE ID to the `_id` kwarg. The call returns a `TwitterListResponse`.

In [30]:
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)

`local_trends` is a highly nested data structure made up of lists and dictionaries. Explore it with `type()`, `len()` and indexing like `[0]`, then print out a list of all the trends:

In [44]:
print(len(local_trends[0]['trends']))
local_trends[0]['trends']

50


[{'name': '#DemDebate',
  'url': 'http://twitter.com/search?q=%23DemDebate',
  'promoted_content': None,
  'query': '%23DemDebate',
  'tweet_volume': 642984},
 {'name': 'nick smith',
  'url': 'http://twitter.com/search?q=%22nick+smith%22',
  'promoted_content': None,
  'query': '%22nick+smith%22',
  'tweet_volume': None},
 {'name': '#NZvENG',
  'url': 'http://twitter.com/search?q=%23NZvENG',
  'promoted_content': None,
  'query': '%23NZvENG',
  'tweet_volume': None},
 {'name': '#HiNZ2019',
  'url': 'http://twitter.com/search?q=%23HiNZ2019',
  'promoted_content': None,
  'query': '%23HiNZ2019',
  'tweet_volume': None},
 {'name': 'grammy',
  'url': 'http://twitter.com/search?q=grammy',
  'promoted_content': None,
  'query': 'grammy',
  'tweet_volume': 1086141},
 {'name': '#22OpeningDay',
  'url': 'http://twitter.com/search?q=%2322OpeningDay',
  'promoted_content': None,
  'query': '%2322OpeningDay',
  'tweet_volume': None},
 {'name': '#FordTrophy',
  'url': 'http://twitter.com/search?q=%
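
The exploration steps can be tried offline on a small stand-in. The sketch below uses `sample_trends`, a hand-made and heavily reduced structure that only mimics the nesting shown above (a list containing one dictionary with a `'trends'` list); it is an assumption for illustration, not a real API response.

```python
# Reduced stand-in for the nested response; values copied from the output above.
sample_trends = [
    {
        'trends': [
            {'name': '#DemDebate', 'query': '%23DemDebate', 'tweet_volume': 642984},
            {'name': 'nick smith', 'query': '%22nick+smith%22', 'tweet_volume': None},
            {'name': 'grammy', 'query': 'grammy', 'tweet_volume': 1086141},
        ],
    }
]

# Work inwards one level at a time.
print(type(sample_trends))        # the outer container
print(len(sample_trends))         # how many items it holds
print(type(sample_trends[0]))     # what the first item is
print(sample_trends[0].keys())    # which keys that item has

# Once the layout is known, index straight to a single trend.
first_trend = sample_trends[0]['trends'][0]
print(first_trend['name'])
```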

Using a list comprehension we can easily extract the trend names into a list.

In [50]:
list_of_trends = None
### BEGIN SOLUTION
list_of_trends = [x['name'] for x in local_trends[0]['trends']]
### END SOLUTION
list_of_trends

['#DemDebate',
 'nick smith',
 '#NZvENG',
 '#HiNZ2019',
 'grammy',
 '#22OpeningDay',
 '#FordTrophy',
 'Grace Millane',
 'Test',
 'Spurs',
 'Winston',
 'Jose',
 'dave rennie',
 'Minister',
 'Peters',
 'Poch',
 'Stephen Colbert',
 'tottenham',
 'sondland',
 'nunes',
 'mike hosking',
 'NZ First',
 'wallabies',
 'LinkedIn',
 'winnie',
 'Disney',
 'deputy pm',
 'Tauranga',
 'Flat Bush',
 'Timaru',
 'kiwibuild',
 'Wrong',
 'united',
 'coldplay',
 'goty',
 'Dems',
 'andrew little',
 '#impeachmenthearing',
 '#Paedsoc2019',
 '#futureofwork',
 '#thisisbts',
 '#nzqt',
 '#wellingtonparanormal',
 '#nzmadeday',
 '#ndfnz',
 '#ReadyToRun',
 '#xeroroadshow',
 '#BB13',
 '#DF19',
 '#christchurch']

`isinstance(obj, class_or_tuple)` is a built-in Python function that returns whether an object is an instance of a class or of a subclass thereof.

In [51]:
assert isinstance(list_of_trends, list), "list_of_trends should be a list"
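
The subclass behaviour is easiest to see with a small example. `TrendList` below is a hypothetical class defined only for this illustration.

```python
class TrendList(list):
    """Hypothetical list subclass, used only for this illustration."""


names = TrendList(['#DemDebate', 'grammy'])

# isinstance accepts subclasses, so a TrendList counts as a list.
is_list = isinstance(names, list)

# An exact type comparison does not accept subclasses.
is_exactly_list = type(names) is list

# A tuple of classes checks against several types at once.
is_list_or_tuple = isinstance('grammy', (list, tuple))

print(is_list, is_exactly_list, is_list_or_tuple)
```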

## Exercise 4: Collecting search results

Now let's execute a search on Twitter for the most popular trend and repeat the filtering step performed during the lectures to remove duplicate results.

Set the `q` variable to the most popular trend in the list we retrieved above:

In [59]:
q = None
### BEGIN SOLUTION
q='#wellingtonparanormal'
### END SOLUTION
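
If "most popular" is taken to mean the largest `tweet_volume`, the trend could also be picked programmatically. This is a sketch under that assumption, using a few entries copied from the output in Exercise 3; the helper `volume` and the name `sample_trend_entries` are hypothetical. Note that many trends report `tweet_volume` as `None`, so the key function treats a missing volume as 0.

```python
# A few trend entries copied from the Exercise 3 output.
sample_trend_entries = [
    {'name': '#DemDebate', 'tweet_volume': 642984},
    {'name': 'nick smith', 'tweet_volume': None},
    {'name': '#NZvENG', 'tweet_volume': None},
    {'name': 'grammy', 'tweet_volume': 1086141},
]


def volume(trend):
    """Return the tweet volume, treating a missing (None) volume as 0."""
    return trend['tweet_volume'] or 0


# max() with a key function returns the entry with the largest volume.
most_popular = max(sample_trend_entries, key=volume)
print(most_popular['name'])
```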

Then let's execute the Twitter search. We do this using the `.search.tweets()` method, passing the hashtag we're searching for to the `q` kwarg and setting the `count` kwarg to the maximum number of tweets we'd like returned.

In [77]:
# DO NOT MODIFY
count = 100

search_results = twitter_api.search.tweets(q=q, count=count)

statuses = search_results['statuses']
print(len(statuses))
statuses

100


[{'created_at': 'Thu Nov 21 05:24:13 +0000 2019',
  'id': 1197385202781302784,
  'id_str': '1197385202781302784',
  'text': 'I watched S02E06 of Wellington Paran...! #wellingtonparanormal  #tvtime https://t.co/EbRJpP0GYS https://t.co/3S3uJXKXs3',
  'truncated': False,
  'entities': {'hashtags': [{'text': 'wellingtonparanormal',
     'indices': [41, 62]},
    {'text': 'tvtime', 'indices': [64, 71]}],
   'symbols': [],
   'user_mentions': [],
   'urls': [{'url': 'https://t.co/EbRJpP0GYS',
     'expanded_url': 'https://tvtime.com/r/1dNKN',
     'display_url': 'tvtime.com/r/1dNKN',
     'indices': [72, 95]}],
   'media': [{'id': 1197385200591880194,
     'id_str': '1197385200591880194',
     'indices': [96, 119],
     'media_url': 'http://pbs.twimg.com/media/EJ33fvSWwAIpHA2.jpg',
     'media_url_https': 'https://pbs.twimg.com/media/EJ33fvSWwAIpHA2.jpg',
     'url': 'https://t.co/3S3uJXKXs3',
     'display_url': 'pic.twitter.com/3S3uJXKXs3',
     'expanded_url': 'https://twitter.com/JoshAll
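
Each status is a nested dictionary, so individual fields can be pulled out with ordinary indexing. The sketch below uses `sample_status`, a reduced copy of the first status shown above that keeps only the `'text'` and hashtag entries; everything else is omitted for brevity.

```python
# Reduced copy of the first status in the output above.
sample_status = {
    'text': 'I watched S02E06 of Wellington Paran...! #wellingtonparanormal  #tvtime https://t.co/EbRJpP0GYS https://t.co/3S3uJXKXs3',
    'entities': {
        'hashtags': [
            {'text': 'wellingtonparanormal', 'indices': [41, 62]},
            {'text': 'tvtime', 'indices': [64, 71]},
        ],
    },
}

# Collect the hashtag text from the nested 'entities' dictionary.
hashtags = [tag['text'] for tag in sample_status['entities']['hashtags']]
print(hashtags)

# 'indices' gives the start and end offsets of the hashtag within 'text'.
start, end = sample_status['entities']['hashtags'][0]['indices']
first_hashtag_in_text = sample_status['text'][start:end]
print(first_hashtag_in_text)
```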

The code below scans through all of the tweets retrieved by our hashtag search and stores each tweet and its text in two separate lists. Before adding a tweet, it checks whether the tweet text has already been seen; if it has, the tweet is not added again. This avoids accumulating duplicates.

In [95]:
# DO NOT MODIFY

all_text = []
filtered_statuses = []
for s in statuses:
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
        
print(len(filtered_statuses))
filtered_statuses[0].keys()

87


dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'extended_entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])
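
Checking membership in a list scans the list each time, which is fine for 100 tweets but slows down on larger collections. A possible alternative, sketched here on made-up `sample_statuses`, tracks the seen text in a `set` while keeping the first occurrence of each tweet in its original order.

```python
# Made-up statuses for illustration; the first and third share the same text.
sample_statuses = [
    {'id': 1, 'text': 'RT @a: hello'},
    {'id': 2, 'text': 'something else'},
    {'id': 3, 'text': 'RT @a: hello'},
]

seen_text = set()       # set membership checks take constant time on average
unique_statuses = []    # keeps the first occurrence of each text, in order

for status in sample_statuses:
    if status['text'] not in seen_text:
        seen_text.add(status['text'])
        unique_statuses.append(status)

print([status['id'] for status in unique_statuses])
```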

## Exercise 5: Create a list of retweet count and status tuples

We want to sort the tweets by retweet count. The first step is to create a list of **tuples** where the first element is the retweet count and the second is the tweet text. We can then use the `sorted` function to perform the sorting operation.

In [91]:
retweets = None
### BEGIN SOLUTION
retweets = [(x['retweet_count'], x['text']) for x in filtered_statuses]
### END SOLUTION

In [92]:
assert len(retweets) == len(filtered_statuses), "Make sure you are using filtered_statuses and not statuses"
assert len(retweets[0]) == 2, "Each tuple should only have 2 elements, retweet count and the tweet text"

## Exercise 6: Sort a list of tweets

Use the built-in Python `sorted` function to sort `retweets` and get the 10 most popular tweets (based on number of retweets). We'd like to have the most popular tweet first.

In [101]:
popular_tweets = None
### BEGIN SOLUTION
popular_tweets = sorted(retweets, reverse=True)[:10]
### END SOLUTION

popular_tweets

[(1919,
  'RT @AJemaineClement: #WellingtonParanormal is on the case. https://t.co/3fqt2QrXKj'),
 (21,
  'RT @OfficerOLearyNZ: It’s me and me Mum #wellingtonparanormal #lyndatopp #legend #duckhunter https://t.co/M3cD7mlDYm'),
 (9,
  'RT @AJemaineClement: Love watching you guys live tweeting about #WellingtonParanormal\n\nThank you all for watching. \n\nThere will be new epis…'),
 (9,
  'Love watching you guys live tweeting about #WellingtonParanormal\n\nThank you all for watching. \n\nThere will be new e… https://t.co/JCDCvpf9mS'),
 (7,
  'RT @MrMikeMinogue: Tonight marks the end of season 2 of #wellingtonparanormal and we’re going out on a high. It’s been an absolute joy. See…'),
 (1,
  'Thank you team #WellingtonParanormal for another brilliant season over waaaaay too soon.'),
 (1,
  'RT @nickijingles: Thank you team #WellingtonParanormal for another brilliant season over waaaaay too soon.'),
 (0,
  'กุไม่น่าไปส่องแทกใน NZ เจออันนี้ขึ้นมาเลยเพิ่งรู้เนี่ยว่ามีซีรี่นี้ ฮืออออ อีบ้าอีบอ

In [102]:
assert len(popular_tweets) == 10, "Find the 10 most popular"
assert popular_tweets[0][0] >= popular_tweets[-1][0], "Make sure you are sorting in descending order"
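
Sorting tuples compares the first elements and falls back to the second only on a tie, so tweets with equal retweet counts end up ordered by their text. The sketch below, using made-up `sample_retweets`, contrasts this with passing a `key` function that sorts on the count alone; because Python's sort is stable, tied tweets then keep their original relative order.

```python
# Made-up (retweet count, text) tuples; two of them tie on a count of 9.
sample_retweets = [
    (9, 'a tweet'),
    (1919, 'top tweet'),
    (9, 'b tweet'),
    (0, 'quiet tweet'),
]

# Whole-tuple comparison: ties on the count are broken by the text,
# and reverse=True puts the larger text first.
by_tuple = sorted(sample_retweets, reverse=True)

# Count-only comparison: ties keep their original relative order.
by_count = sorted(sample_retweets, key=lambda pair: pair[0], reverse=True)

print(by_tuple)
print(by_count)
```

Both orderings satisfy the assertions above; they differ only in how tied retweet counts are arranged.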