# Twitter Example   
## ACE Cluster
### School of Psychology, Massey University

**Twitter API Setup:**
To use the Twitter API you need to register as an app developer, which only requires a Twitter account. When you register an app you are given four credentials in two pairs, each pair made up of a public key and a private secret. The consumer pair identifies your app to the Twitter server; the OAuth token pair lets someone using your app give it access to their data without having to share their private credentials with the app.

In [8]:
import twitter

consumer_key, consumer_secret = twitter.read_token_file("consumer.txt")
oauth_token, oauth_secret = twitter.read_token_file("oauth.txt") 
auth = twitter.oauth.OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret)
twitter_api = twitter.Twitter(auth=auth)
print(twitter_api)

<twitter.api.Twitter object at 0x03AE5CF0>


The **twitter_api** object exists which means we are good to go.
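
The two token files read above are not shown in this notebook. A minimal sketch of the layout they are assumed to have (the key on the first line, the secret on the second) follows. The helper **read_two_line_token_file** is hypothetical and only illustrates that layout without importing the twitter package; the values are placeholders, not real credentials.

```python
import os
import tempfile


def read_two_line_token_file(path):
    """Return (key, secret) from a file holding one value per line.

    Hypothetical helper: it assumes the key is on line 1 and the
    secret on line 2, and strips surrounding whitespace from both.
    """
    with open(path) as f:
        key = f.readline().strip()
        secret = f.readline().strip()
    return key, secret


# Write a throwaway file with placeholder values, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "consumer.txt")
    with open(demo_path, "w") as f:
        f.write("EXAMPLE_KEY\nEXAMPLE_SECRET\n")
    demo_key, demo_secret = read_two_line_token_file(demo_path)

print(demo_key, demo_secret)
```

Keeping credentials in separate files like this means the secrets never appear in the notebook itself.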

Let's find out what we know about @MasseyUni:

In [104]:
massey_info = twitter_api.users.show(screen_name = "MasseyUni")
print(massey_info.keys())

dict_keys(['id', 'id_str', 'name', 'screen_name', 'location', 'profile_location', 'description', 'url', 'entities', 'protected', 'followers_count', 'friends_count', 'listed_count', 'created_at', 'favourites_count', 'utc_offset', 'time_zone', 'geo_enabled', 'verified', 'statuses_count', 'lang', 'status', 'contributors_enabled', 'is_translator', 'is_translation_enabled', 'profile_background_color', 'profile_background_image_url', 'profile_background_image_url_https', 'profile_background_tile', 'profile_image_url', 'profile_image_url_https', 'profile_banner_url', 'profile_link_color', 'profile_sidebar_border_color', 'profile_sidebar_fill_color', 'profile_text_color', 'profile_use_background_image', 'has_extended_profile', 'default_profile', 'default_profile_image', 'following', 'follow_request_sent', 'notifications', 'translator_type'])


Lots of goodies. Let's check out the profile image:

In [105]:
print(massey_info['profile_image_url'])

http://pbs.twimg.com/profile_images/2039681940/massey-profile-pic_normal.jpg


How many followers does @MasseyUni have?

In [106]:
print(massey_info["followers_count"])

11844


How many Tweets (including retweets) has @MasseyUni issued?

In [107]:
print(massey_info["statuses_count"])

12908


Let's grab some Tweets that mention @MasseyUni (Twitter also calls Tweets **statuses** or **status updates**).

In [108]:
q = "@MasseyUni" # The query string - the string we are going to search Twitter with
count = 5
results = twitter_api.search.tweets(q=q, count=count)

Like most web services, Twitter returns data in **JSON** format (JavaScript Object Notation), which is structured much like a Python dictionary containing nested dictionaries and lists. To print JSON in a readable manner I will use the json module's dumps (dump string) function, aliased to dump.

A tweet is only 140 characters, but the information Twitter provides for each tweet is around 5 kB.
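
To make the link between JSON and Python structures concrete, here is a small sketch using only the standard json module. The string **raw** is made-up sample data shaped loosely like a search response, not real Twitter output.

```python
import json

# Made-up JSON text, shaped loosely like a search response.
raw = (
    '{"statuses": [{"id": 1, "text": "hello", '
    '"entities": {"hashtags": []}}], '
    '"search_metadata": {"count": 1}}'
)

# json.loads turns JSON text into nested dicts and lists.
parsed = json.loads(raw)
first_text = parsed["statuses"][0]["text"]
print(first_text)

# json.dumps goes the other way; indent=2 gives readable output.
pretty = json.dumps(parsed, indent=2)
print(pretty)
```

Once the data is parsed, ordinary dictionary and list indexing is all that is needed to reach any field.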

In [109]:
from json import dumps as dump

print(dump(results, indent=2))

{
  "statuses": [
    {
      "created_at": "Thu Mar 09 19:33:37 +0000 2017",
      "id": 839922137536172032,
      "id_str": "839922137536172032",
      "text": "RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson says a co leadership wouldn't work for @nzlabour - \"it's p\u2026",
      "truncated": false,
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [
          {
            "screen_name": "TheAMShowNZ",
            "name": "The AM Show",
            "id": 2163648486,
            "id_str": "2163648486",
            "indices": [
              3,
              15
            ]
          },
          {
            "screen_name": "MasseyUni",
            "name": "Massey University",
            "id": 27494710,
            "id_str": "27494710",
            "indices": [
              17,
              27
            ]
          },
          {
            "screen_name": "nzlabour",
            "name": "New Zealand L

That is just 5 tweets!

The data is returned as a dictionary at the topmost level. The first item is **statuses** which are the actual tweets. The second is **search_metadata**.

In [110]:
print(results.keys())

dict_keys(['statuses', 'search_metadata'])


Let's take a look at what is inside the first tweet (aka statuses[0])

In [111]:
print(results["statuses"][0].keys())

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'retweeted_status', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'lang'])
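
The **created_at** field arrives as a string such as "Thu Mar 09 19:33:37 +0000 2017". A minimal sketch of turning it into a timezone-aware datetime is shown below. The format string is inferred from the sample output above, the helper name **parse_created_at** is my own, and the parse assumes English day and month names (the default C locale).

```python
from datetime import datetime, timezone

# Format inferred from the sample: weekday, month, day, time, offset, year.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


posted = parse_created_at("Thu Mar 09 19:33:37 +0000 2017")
print(posted.isoformat())
```

With real datetime objects, tweets can be sorted or grouped by time instead of compared as strings.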


**"text"** is the actual text of the tweet. Let's print the text of each tweet:

In [112]:
for tweet in results["statuses"]:
    print(tweet["text"])

RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson says a co leadership wouldn't work for @nzlabour - "it's p…
RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson says the Adern brand for @nzlabour is good - "she reflects…
RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson - "@jacindaardern will attract younger voters"
@TheAMShowNZ @MasseyUni If trump can win USA Presidency, anything is possible #changethegovt
@MasseyUni Political Marketing Expert Professor Claire Robinson says a co leadership wouldn't work for @nzlabour - "it's problematic"


Let's see what is inside a single tweet.

**"entities"** are things like hashtags, users and urls mentioned in a tweet. Let's check out the entities for the first tweet: 

In [113]:
t1 = results["statuses"][0] # t1 = tweet 1, saves writing results["statuses"][0] all the time

print(dump(t1["entities"], indent=1))

{
 "hashtags": [],
 "symbols": [],
 "user_mentions": [
  {
   "screen_name": "TheAMShowNZ",
   "name": "The AM Show",
   "id": 2163648486,
   "id_str": "2163648486",
   "indices": [
    3,
    15
   ]
  },
  {
   "screen_name": "MasseyUni",
   "name": "Massey University",
   "id": 27494710,
   "id_str": "27494710",
   "indices": [
    17,
    27
   ]
  },
  {
   "screen_name": "nzlabour",
   "name": "New Zealand Labour",
   "id": 15466126,
   "id_str": "15466126",
   "indices": [
    120,
    129
   ]
  }
 ],
 "urls": []
}
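
The **indices** in each entity are character offsets into the tweet text. The sketch below uses a shortened, hand-copied version of the first tweet to show how screen names and their positions can be pulled out. The helper names **mentioned_users** and **mention_spans** are illustrative, not part of any library.

```python
# Shortened copy of the first tweet, keeping two of its mentions.
sample_tweet = {
    "text": "RT @TheAMShowNZ: @MasseyUni Political Marketing Expert",
    "entities": {
        "hashtags": [],
        "user_mentions": [
            {"screen_name": "TheAMShowNZ", "indices": [3, 15]},
            {"screen_name": "MasseyUni", "indices": [17, 27]},
        ],
    },
}


def mentioned_users(tweet):
    """Return the screen names mentioned in a tweet."""
    return [m["screen_name"] for m in tweet["entities"]["user_mentions"]]


def mention_spans(tweet):
    """Return the slices of the text that each mention's indices cover."""
    spans = []
    for mention in tweet["entities"]["user_mentions"]:
        start, end = mention["indices"]
        spans.append(tweet["text"][start:end])
    return spans


print(mentioned_users(sample_tweet))
print(mention_spans(sample_tweet))
```

The slices include the leading @, which is why each span is one character longer than the screen name.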


**"user"** contains information about the original tweeter:

In [114]:
print(dump(t1["user"], indent=2))

{
  "id": 176731709,
  "id_str": "176731709",
  "name": "Claire Robinson",
  "screen_name": "Spinprofessor",
  "location": "Wellington",
  "description": "Professor, political commentator, Pro Vice-Chancellor College of Creative Arts, Massey University, member Stakeholder Advisory Group Callaghan Innovation",
  "url": null,
  "entities": {
    "description": {
      "urls": []
    }
  },
  "protected": false,
  "followers_count": 1805,
  "friends_count": 542,
  "listed_count": 60,
  "created_at": "Tue Aug 10 08:32:12 +0000 2010",
  "favourites_count": 237,
  "utc_offset": 46800,
  "time_zone": "Wellington",
  "geo_enabled": false,
  "verified": false,
  "statuses_count": 3840,
  "lang": "en",
  "contributors_enabled": false,
  "is_translator": false,
  "is_translation_enabled": false,
  "profile_background_color": "533109",
  "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/392939837/popdoodle3__bottom_.jpg",
  "profile_background_image_url_https": "https
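
One caveat, offered as background on the tweet format rather than something demonstrated in this notebook: when a status is a retweet, **user** describes the account that retweeted it, and the tweet being retweeted sits under the **retweeted_status** key listed earlier. A minimal sketch with made-up, cut-down tweets follows; the helper **original_author** is hypothetical.

```python
def original_author(tweet):
    """Return the screen name of whoever wrote the underlying tweet.

    For a retweet, look inside retweeted_status; otherwise the
    poster is the author.
    """
    source = tweet.get("retweeted_status", tweet)
    return source["user"]["screen_name"]


# Cut-down illustrative data, not a full API response.
retweet = {
    "text": "RT @TheAMShowNZ: ...",
    "user": {"screen_name": "Spinprofessor"},
    "retweeted_status": {
        "text": "...",
        "user": {"screen_name": "TheAMShowNZ"},
    },
}
plain_tweet = {"text": "hello", "user": {"screen_name": "Spinprofessor"}}

print(original_author(retweet))
print(original_author(plain_tweet))
```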

Find out what is being tweeted about the original tweeter (searching on a screen name returns tweets that contain it, not only tweets by that user):

In [125]:
original_tweeter = t1["user"]["screen_name"]
print("Original Tweeter: ", original_tweeter)
user_tweet_results = twitter_api.search.tweets(q=original_tweeter, count=10)
user_tweets = user_tweet_results["statuses"]
for tweet in user_tweets:
    print(tweet["text"])

Original Tweeter:  Spinprofessor
RT @TheAMShowNZ: Professor Claire Robinson says @Labournz won't win this year's election but will hands down win the next one in 2020.
RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson says a co leadership wouldn't work for @nzlabour - "it's p…
RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson says the Adern brand for @nzlabour is good - "she reflects…
RT @TheAMShowNZ: @MasseyUni Political Marketing Expert Professor Claire Robinson - "@jacindaardern will attract younger voters"
@fractaldesign @cocamassey @Spinprofessor o don't worry about that… i checked as soon as i saw it!
@cocamassey @Spinprofessor @tristamsparks zoom and enhance on that email 😬🔥
RT @cocamassey: Stage set for Tim Brennan's inaugural Professorial - From Nomad to Monad @Spinprofessor @tristamsparks https://t.co/Te3Igww…
Stage set for Tim Brennan's inaugural Professorial - From Nomad to Monad @Spinprofessor @tristamsparks htt

How many users is @MasseyUni following (i.e. @MasseyUni's friends)?

In [116]:
print(massey_info["friends_count"])

3060


Show the latest 20 users @MasseyUni follows:

In [117]:
friends = twitter_api.friends.list(screen_name="MasseyUni", count = 20)
for friend in friends["users"]:
    print(friend["screen_name"])

offgrid17
AwhinaEnglish
TracieMafileo
socialpolicy_py
MasseyUSocialWk
rthazou
RNDrive
HelReynolds
NewsroomNZ
ProfJanThomas
wcwtp
komikanya
BA_Yildirim
RT_Erdogan
mesutcemil
glsnayldz
sebomubu
BritInWgtn
DefsecNZ
profjanemills


Show the latest 20 followers of @MasseyUni:

In [118]:
followers = twitter_api.followers.list(screen_name="MasseyUni", count = 20)
for follower in followers["users"]:
    print(follower["screen_name"])

LNNR07
norma_joanne
wanieaimi
herdsa_conf
richardram
ujez689
dirkli1992
kiravdheijden
SeyonceS
mrloudon2
DeallaSmith
Phiric_NZ
thewavenz
TOlaaiga
S_A_PARTNERS
amardeepmoga
ashleyruthxx
NZGovEcon
josojm
JasonLRobinson


**Note:** Twitter limits how many friends and followers can be downloaded at once. When requesting full user data, up to 200 users can be returned per request (and no more than 15 requests are allowed in 15 minutes). When friends/followers are requested by ID number only, up to 5000 user IDs can be returned per request.
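
As a rough worked example of what these limits mean for the 11844 followers counted earlier, the arithmetic can be sketched as below. The function names are my own, and the 15-requests-per-window figure is applied only to the full-user-data case, since that is the only case the note gives it for.

```python
import math


def requests_needed(total_users, per_request):
    """Number of requests needed to page through total_users."""
    return math.ceil(total_users / per_request)


def windows_needed(n_requests, requests_per_window=15):
    """Number of 15-minute rate-limit windows those requests span."""
    return math.ceil(n_requests / requests_per_window)


followers = 11844  # followers_count printed earlier in the notebook

full_requests = requests_needed(followers, 200)   # full user data
id_requests = requests_needed(followers, 5000)    # user IDs only
full_windows = windows_needed(full_requests)

print(full_requests, "requests for full data, spread over", full_windows, "windows")
print(id_requests, "requests for IDs only")
```

The gap between the two request counts is why fetching IDs first is usually the cheaper route when only the follower graph is needed.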

---

How about what is trending in NZ? First we need to find the Yahoo! WOEID (Where On Earth ID) for NZ. There is a simple lookup page at [http://woeid.rosselliot.co.nz/](http://woeid.rosselliot.co.nz/)

It turns out the WOEID for NZ is 23424916.

In [119]:
WOE_NZ = 23424916
nz_trends = twitter_api.trends.place(_id=WOE_NZ) 
# the underscore on _id is needed because of a quirk in the python twitter API ("id" is reserved for another purpose)
# It turns out trends is a list of dictionaries with only one element. Not much of a list really :)
print(nz_trends[0].keys())

dict_keys(['trends', 'as_of', 'created_at', 'locations'])


The interesting stuff is in the 'trends' key. Let's take a look at the first 10:

In [120]:
for trend in nz_trends[0]["trends"][0:10]:
    print(dump(trend, indent=2))

{
  "name": "#NRLRoostersBulldogs",
  "url": "http://twitter.com/search?q=%23NRLRoostersBulldogs",
  "promoted_content": null,
  "query": "%23NRLRoostersBulldogs",
  "tweet_volume": null
}
{
  "name": "#internationalwomensday",
  "url": "http://twitter.com/search?q=%23internationalwomensday",
  "promoted_content": null,
  "query": "%23internationalwomensday",
  "tweet_volume": 1726818
}
{
  "name": "#IWD2017",
  "url": "http://twitter.com/search?q=%23IWD2017",
  "promoted_content": null,
  "query": "%23IWD2017",
  "tweet_volume": 272926
}
{
  "name": "#nzjscon",
  "url": "http://twitter.com/search?q=%23nzjscon",
  "promoted_content": null,
  "query": "%23nzjscon",
  "tweet_volume": null
}
{
  "name": "#edchatnz",
  "url": "http://twitter.com/search?q=%23edchatnz",
  "promoted_content": null,
  "query": "%23edchatnz",
  "tweet_volume": null
}
{
  "name": "Ross Taylor",
  "url": "http://twitter.com/search?q=%22Ross+Taylor%22",
  "promoted_content": null,
  "query": "%22Ross+Taylor%22",
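
Several trends above report a **tweet_volume** of null, which becomes None in Python and cannot be compared with integers. A small sketch of ranking trends by volume while treating missing volumes as zero is given below. The sample list is copied by hand from the output above, and **by_volume** is an illustrative helper.

```python
# Names and volumes copied from the first four trends shown above.
sample_trends = [
    {"name": "#NRLRoostersBulldogs", "tweet_volume": None},
    {"name": "#internationalwomensday", "tweet_volume": 1726818},
    {"name": "#IWD2017", "tweet_volume": 272926},
    {"name": "#nzjscon", "tweet_volume": None},
]


def by_volume(trends):
    """Sort trends from highest to lowest volume; None counts as 0."""
    return sorted(trends, key=lambda t: t["tweet_volume"] or 0, reverse=True)


ranked_names = [t["name"] for t in by_volume(sample_trends)]
print(ranked_names)
```

Treating None as zero is a convenience for sorting only; a missing volume means Twitter did not report a figure, not that nobody tweeted.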
 

Now I will try something more complicated. Let's compare the **lexical diversity** of Massey tweets with Victoria tweets. Lexical diversity will be crudely defined as **the number of unique words divided by the total number of words** in a list of tweets, N = 100 say.
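
The definition above can be sketched as a small function. The name **lexical_diversity** and the guard for an empty list are my own additions; the sample words are a few taken from the tweets shown earlier.

```python
def lexical_diversity(words):
    """Unique words divided by total words; 0.0 for an empty list."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


sample_words = ["RT", "RT", "Political", "Marketing",
                "Expert", "RT", "Political", "says"]

# 5 unique words out of 8 in total.
print(lexical_diversity(sample_words))
```

A value near 1 means almost every word is different; heavy retweeting of the same text pushes the value down.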

First get 100 tweets from Massey and Victoria:

In [126]:
count = 100

q = "@MasseyUni"
massey_tweets = twitter_api.search.tweets(q=q, count=count)["statuses"]

q = "@VicUniWgtn"
victoria_tweets = twitter_api.search.tweets(q=q, count=count)["statuses"]

Now extract the text of each tweet for both Massey and Victoria (using a Python *list comprehension*):

In [None]:
massey_texts = [tweet["text"] for tweet in massey_tweets]
victoria_texts = [tweet["text"] for tweet in victoria_tweets]

Now we need to break each text into individual words and add the words to a list. I will exclude 'words' that contain non-alphabetic characters (hashtags, screen names, anything with attached punctuation) by means of Python's isalpha() string method. We need to iterate over each text and then over each word in the text:

In [143]:
massey_words = [word 
                    for text in massey_texts
                        for word in text.split() if word.isalpha()]
victoria_words = [word 
                    for text in victoria_texts
                        for word in text.split() if word.isalpha()]

print("50 Massey words:\n\n", massey_words[0:50])
print("\n50 Victoria words:\n\n", victoria_words[0:50])

Massey words:

 ['RT', 'RT', 'RT', 'Political', 'Marketing', 'Expert', 'Professor', 'Claire', 'Robinson', 'will', 'attract', 'younger', 'RT', 'Political', 'Marketing', 'Expert', 'Professor', 'Claire', 'Robinson', 'says', 'the', 'Adern', 'brand', 'for', 'is', 'good', 'RT', 'Political', 'Marketing', 'Expert', 'Professor', 'Claire', 'Robinson', 'says', 'a', 'co', 'leadership', 'work', 'for', 'Massey', 'University', 'celebrates', 'world', 'rankings', 'RT', 'Political', 'Marketing', 'Expert', 'Professor', 'Claire']

Victoria words:

 ['Congratulations', 'to', 'being', 'awarded', 'Centres', 'of', 'Asia', 'Pacific', 'and', 'to', 'host', 'new', 'Centres', 'of', 'Excellence', 'HT', 'Goooood', 'morning', 'Another', 'excited', 'bunch', 'of', 'newbie', 'computer', 'scientists', 'are', 'about', 'to', 'hear', 'RT', 'Next', 'week', 'we', 'are', 'celebrating', 'indelible', 'poets', 'on', 'compostable', 'launching', 'six', 'coffee', 'cups', 'with', 'some', 'of', 'most', 'really', 'help', 'lift']
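
To see exactly what the isalpha() filter keeps and drops, here is a short sketch run on one of the tweets printed earlier. Nothing here is new to the method; it only splits the same comprehension into a kept list and a dropped list.

```python
text = ("@TheAMShowNZ @MasseyUni If trump can win USA Presidency, "
        "anything is possible #changethegovt")

tokens = text.split()

# Same test as in the notebook: keep purely alphabetic tokens.
kept = [word for word in tokens if word.isalpha()]
dropped = [word for word in tokens if not word.isalpha()]

print("kept:   ", kept)
print("dropped:", dropped)
```

Note that "Presidency," is dropped because of its trailing comma, so this crude filter discards some genuine words along with the screen names and hashtags.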


Notice that the words are not unique. We have just split up all the texts and pulled out strings of alphabetic characters. It's easy to convert a list of words into a *set* of unique words by using Python's built-in set() function:

In [160]:
unique_massey_words = set(massey_words)
unique_victoria_words = set(victoria_words)

Python makes it that easy! Now we have everything we need to compare the lexical diversity.

In [161]:
print("Massey:", len(unique_massey_words), "unique words out of", len(massey_words), "=",
      len(unique_massey_words) / len(massey_words),"\n")
print(unique_massey_words, "\n\n")
print("\nVictoria:", len(unique_victoria_words), "unique words out of", len(victoria_words), "=",
      len(unique_victoria_words) / len(victoria_words),"\n")
print(unique_victoria_words)

Massey: 385 unique words out of 1046 = 0.36806883365200765 

{'Tech', 'too', 'date', 'lovely', 'International', 'language', 'no', 'adaptations', 'Scientists', 'a', 'discover', 'way', 'working', 'organiser', 'day', 'results', 'something', 'Venturia', 'Really', 'health', 'Grove', 'rankings', 'killed', 'new', 'Congrats', 'carbon', 'looking', 'to', 'younger', 'nursing', 'outstanding', 'economic', 'Supper', 'fourth', 'climate', 'final', 'system', 'emerging', 'lab', 'Mansfield', 'setting', 'her', 'cost', 'memorial', 'STEM', 'Join', 'Early', 'today', 'brand', 'launch', 'Remind', 'year', 'boot', 'peaceful', 'school', 'Shout', 'UCOL', 'will', 'happen', 'hey', 'Pregnant', 'reflects', 'here', 'viruses', 'us', 'camp', 'Chancellor', 'Charlotte', 'Sport', 'with', 'not', 'Listen', 'Watch', 'Veronica', 'hosts', 'Political', 'Championship', 'much', 'Cron', 'breast', 'Ed', 'co', 'many', 'good', 'Expert', 'be', 'top', 'celebrates', 'doing', 'Women', 'PhD', 'Dr', 'cafes', 'keynote', 'Assoc', 'lads', 'incl