# Twitter Example   
## ACE Cluster
### School of Psychology, Massey University

**Twitter API Setup:**
To use the Twitter API you need to register as an app developer, which only requires a Twitter account. When you register an app you are given four credentials in two key/secret pairs: a consumer key and secret, and an access (OAuth) token and secret. The consumer pair identifies your app to the Twitter server. The access pair lets someone using your app give it access to their data without having to share their private credentials with the app.
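As a rough, offline sketch of keeping credentials out of the notebook itself, the snippet below writes and reads a small token file. It assumes a two-line layout (key on the first line, secret on the second). The helper name **read_two_line_token_file** and the placeholder values are illustrative only and are not part of the **twitter** package.

```python
import os
import tempfile

def read_two_line_token_file(path):
    """Hypothetical helper: return (key, secret) from a two-line text file."""
    with open(path) as f:
        key = f.readline().strip()
        secret = f.readline().strip()
    return key, secret

# Write a throwaway file with placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "consumer.txt")
    with open(path, "w") as f:
        f.write("PLACEHOLDER_KEY\nPLACEHOLDER_SECRET\n")
    demo_key, demo_secret = read_two_line_token_file(path)

print(demo_key, demo_secret)
```

Keeping the real files outside version control avoids publishing the secrets along with the notebook.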

In [1]:
import twitter

consumer_key, consumer_secret = twitter.read_token_file("consumer.txt")
oauth_token, oauth_secret = twitter.read_token_file("oauth.txt") 
auth = twitter.oauth.OAuth(oauth_token, oauth_secret, consumer_key, consumer_secret)
twitter_api = twitter.Twitter(auth=auth)
print(twitter_api)

<twitter.api.Twitter object at 0x032EBFD0>


The **twitter_api** object exists which means we are good to go.

Let's find out what we know about @MasseyUni:

In [2]:
massey_info = twitter_api.users.show(screen_name = "MasseyUni")
print(massey_info.keys())

dict_keys(['id', 'id_str', 'name', 'screen_name', 'location', 'profile_location', 'description', 'url', 'entities', 'protected', 'followers_count', 'friends_count', 'listed_count', 'created_at', 'favourites_count', 'utc_offset', 'time_zone', 'geo_enabled', 'verified', 'statuses_count', 'lang', 'status', 'contributors_enabled', 'is_translator', 'is_translation_enabled', 'profile_background_color', 'profile_background_image_url', 'profile_background_image_url_https', 'profile_background_tile', 'profile_image_url', 'profile_image_url_https', 'profile_banner_url', 'profile_link_color', 'profile_sidebar_border_color', 'profile_sidebar_fill_color', 'profile_text_color', 'profile_use_background_image', 'has_extended_profile', 'default_profile', 'default_profile_image', 'following', 'follow_request_sent', 'notifications', 'translator_type'])


Lots of goodies. Let's check out the profile image:

In [3]:
print(massey_info['profile_image_url'])

http://pbs.twimg.com/profile_images/2039681940/massey-profile-pic_normal.jpg


How many followers does @MasseyUni have?

In [4]:
print(massey_info["followers_count"])

11857


How many Tweets (including retweets) has @MasseyUni issued?

In [5]:
print(massey_info["statuses_count"])

12914


Let's grab some Tweets from @MasseyUni (which Twitter also calls **statuses** or **status updates**).

In [6]:
screen_name = "MasseyUni"  # the account whose timeline we want to fetch
count = 5                  # number of tweets to request
results = twitter_api.statuses.user_timeline(screen_name=screen_name, count=count)

Like most web services, Twitter returns data in **JSON** format (JSON = JavaScript Object Notation), which is very similar to a Python dictionary containing other nested dictionaries and lists. To print JSON in a readable manner I will use the **json.dumps** ("dump string") function, aliased to **dump**.

A tweet is only 140 characters, but the information Twitter provides for each tweet is around 5 kB.
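To make the relationship between JSON text and Python objects concrete, here is a minimal offline sketch. The **tweet_like** dictionary is made up for illustration and only loosely mimics the shape of a tweet.

```python
import json

# A small hand-made structure shaped like part of a tweet (values are made up).
tweet_like = {
    "text": "hello",
    "entities": {"hashtags": [{"text": "demo", "indices": [0, 5]}]},
}

as_string = json.dumps(tweet_like, indent=2)  # Python dict -> JSON text
round_trip = json.loads(as_string)            # JSON text -> Python dict

print(as_string)
print(round_trip == tweet_like)
```

The **indent** argument only affects how the text is laid out, not the data it represents.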

In [7]:
from json import dumps as dump

print(dump(results, indent=2))

[
  {
    "created_at": "Mon Mar 13 06:45:01 +0000 2017",
    "id": 841178264039231489,
    "id_str": "841178264039231489",
    "text": "#MasseyUni project looks at empowering iwi &amp; hap\u016b to be partners in the co-management of healthy estuaries\u2026 https://t.co/del8USycqi",
    "truncated": true,
    "entities": {
      "hashtags": [
        {
          "text": "MasseyUni",
          "indices": [
            0,
            10
          ]
        }
      ],
      "symbols": [],
      "user_mentions": [],
      "urls": [
        {
          "url": "https://t.co/del8USycqi",
          "expanded_url": "https://twitter.com/i/web/status/841178264039231489",
          "display_url": "twitter.com/i/web/status/8\u2026",
          "indices": [
            112,
            135
          ]
        }
      ]
    },
    "source": "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">TweetDeck</a>",
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": nul

That is just 5 tweets!

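Because we called **statuses.user_timeline**, the data is returned as a list of tweets at the topmost level, so its length is simply the number of tweets. (A search with **search.tweets** instead returns a dictionary whose first item, **statuses**, holds the actual tweets and whose second item is **search_metadata**.)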
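Since a timeline request yields a list while a search yields a dictionary with a **statuses** item, a small helper can hide the difference. This is only a sketch: **extract_tweets** is a hypothetical name, and the sample responses are hand-made stand-ins rather than real API output.

```python
def extract_tweets(response):
    """Hypothetical helper: return a list of tweets from either response shape."""
    if isinstance(response, dict) and "statuses" in response:
        return response["statuses"]  # search-style response
    return list(response)            # timeline-style response

timeline_like = [{"text": "a"}, {"text": "b"}]
search_like = {"statuses": [{"text": "c"}], "search_metadata": {"count": 1}}

print(len(extract_tweets(timeline_like)), len(extract_tweets(search_like)))
```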

In [8]:
print(len(results))

5


Let's take a look at what is inside the first tweet (aka results[0])

In [9]:
print(results[0].keys())

dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])


**"text"** is the actual text of a tweet. Let's print it for each tweet:

In [10]:
for tweet in results:  # iterate over what was returned, which may be fewer than count
    print(tweet["text"])

#MasseyUni project looks at empowering iwi &amp; hapū to be partners in the co-management of healthy estuaries… https://t.co/del8USycqi
#VIBES student blogger Elliot says Classical Studies is "full-throttle Game of Thrones madness" in today's post https://t.co/p3XRvl7JK2
Massey researcher reckons Kiwi start-ups could be missing out on investment with the wrong style of approach. https://t.co/5g3ZkKH9tG
New research: troubled NZ youth transfer gang behaviour to Samoa - Dr Gisa Moses Faleolo #MasseyUni… https://t.co/tsB28drzea
Looking for short term, on-campus accommodation in the Manawatu? Spaces are still available, but get in quick https://t.co/Yfkfp8HffY


In [11]:
for tweet in results:  # "source" is the client app used to post the tweet
    print(tweet["source"])

<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>
<a href="http://sproutsocial.com" rel="nofollow">Sprout Social</a>
<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>
<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>
<a href="http://sproutsocial.com" rel="nofollow">Sprout Social</a>
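The **source** values are small HTML links, so the client name has to be pulled out before it can be counted. The sketch below does that with a regular expression on the five strings printed above. The helper **client_name** is a hypothetical name, and the pattern assumes the simple one-link form seen here.

```python
import re
from collections import Counter

# The five "source" strings printed above.
tweetdeck = '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>'
sprout = '<a href="http://sproutsocial.com" rel="nofollow">Sprout Social</a>'
sources = [tweetdeck, sprout, tweetdeck, tweetdeck, sprout]

def client_name(source_html):
    """Return the link text between <a ...> and </a>, or the input unchanged."""
    match = re.search(r">([^<]*)</a>", source_html)
    return match.group(1) if match else source_html

client_counts = Counter(client_name(s) for s in sources)
print(client_counts)
```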


Let's see what is inside a single tweet.

**"entities"** are things like hashtags, users and urls mentioned in a tweet. Let's check out the entities for the first tweet: 

In [13]:
t1 = results[0] # t1 = tweet 1, saves writing results[0] all the time

print(dump(t1["entities"], indent=1))

{
 "hashtags": [
  {
   "text": "MasseyUni",
   "indices": [
    0,
    10
   ]
  }
 ],
 "symbols": [],
 "user_mentions": [],
 "urls": [
  {
   "url": "https://t.co/del8USycqi",
   "expanded_url": "https://twitter.com/i/web/status/841178264039231489",
   "display_url": "twitter.com/i/web/status/8\u2026",
   "indices": [
    112,
    135
   ]
  }
 ]
}
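Given the structure printed above, the individual entities can be pulled out with list comprehensions. The sketch below copies the entities from that output. The **screen_name** field used for mentions is an assumption here, since this tweet has no mentions to confirm it against.

```python
# Entities copied from the output above.
entities = {
    "hashtags": [{"text": "MasseyUni", "indices": [0, 10]}],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "url": "https://t.co/del8USycqi",
        "expanded_url": "https://twitter.com/i/web/status/841178264039231489",
        "display_url": "twitter.com/i/web/status/8\u2026",
        "indices": [112, 135],
    }],
}

hashtags = ["#" + h["text"] for h in entities["hashtags"]]
links = [u["expanded_url"] for u in entities["urls"]]
mentions = ["@" + m["screen_name"] for m in entities["user_mentions"]]  # assumed field name

# "indices" give the start and end positions of the entity in the tweet text.
text_start = "#MasseyUni project looks at empowering iwi"
start, end = entities["hashtags"][0]["indices"]
hashtag_in_text = text_start[start:end]

print(hashtags, links, mentions, hashtag_in_text)
```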


**"user"** contains information about the original tweeter:

In [17]:
print(t1["user"]["screen_name"])

MasseyUni


In this case the original tweeter is MasseyUni, but if it were a retweet we could get the original tweeter's user profile, followers, friends and tweets.
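A minimal sketch of telling the two cases apart, assuming that a retweet carries the original tweet under a **retweeted_status** key. The helper name **original_author** and both sample tweets are made up for illustration.

```python
def original_author(tweet):
    """Return the screen name of the original author of a tweet.

    Assumes retweets carry the original tweet under 'retweeted_status'.
    """
    original = tweet.get("retweeted_status", tweet)
    return original["user"]["screen_name"]

# Hand-made examples, not real API output.
plain = {"text": "hi", "user": {"screen_name": "MasseyUni"}}
retweet = {
    "text": "RT @someone: hi",
    "user": {"screen_name": "MasseyUni"},
    "retweeted_status": {"text": "hi", "user": {"screen_name": "someone"}},
}

print(original_author(plain), original_author(retweet))
```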

How many users is @MasseyUni following (i.e. @MasseyUni's friends)?

In [18]:
print(massey_info["friends_count"])

3057


Show the latest 20 users @MasseyUni follows:

In [19]:
friends = twitter_api.friends.list(screen_name="MasseyUni", count = 20)
for friend in friends["users"]:
    print(friend["screen_name"])

offgrid17
AwhinaEnglish
TracieMafileo
socialpolicy_py
MasseyUSocialWk
rthazou
RNDrive
HelReynolds
NewsroomNZ
ProfJanThomas
wcwtp
komikanya
BA_Yildirim
RT_Erdogan
mesutcemil
glsnayldz
sebomubu
BritInWgtn
DefsecNZ
profjanemills


Show the latest 20 followers of @MasseyUni:

In [20]:
followers = twitter_api.followers.list(screen_name="MasseyUni", count = 20)
for follower in followers["users"]:
    print(follower["screen_name"])

TimGree91086091
katepmora
customkiwi
chicks_pt
AgathaRobles3
est_intrepide
hannahrubyjen
Ashleyjanehowar
nzfgw
LachiePhilipson
BrnJames
AnsariIsrajul
jomub
PecsSummerscho
Jaime_Nielsen
GeorgiaKendal10
IMatainaho
nicolegesmundo
GarmyGarms
AnnaJoyDixon


**Note:** Twitter places limits on how many friends and followers can be downloaded at once. If obtaining full user data, 200 can be returned in one request (no more than 15 requests in 15 minutes is allowed). If friends/followers are obtained by ID number only, Twitter allows 5000 user IDs to be returned in one request.
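Collecting more than one request's worth of IDs means following a cursor from page to page. The sketch below shows the loop against an offline stand-in. It assumes Twitter's cursor convention (the first request uses cursor -1 and a **next_cursor** of 0 marks the last page). **collect_ids**, **fetch_page** and **fake_pages** are hypothetical names, and **max_pages** is just a crude guard to stay under the request limit.

```python
def collect_ids(fetch_page, max_pages=15):
    """Follow next_cursor until it is 0 or max_pages requests have been made.

    fetch_page(cursor) must return a dict with 'ids' and 'next_cursor'.
    """
    ids, cursor, pages = [], -1, 0
    while cursor != 0 and pages < max_pages:
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
        pages += 1
    return ids

# Offline stand-in for the API: two pages of made-up IDs.
fake_pages = {
    -1: {"ids": [1, 2, 3], "next_cursor": 42},
    42: {"ids": [4, 5], "next_cursor": 0},
}
all_ids = collect_ids(lambda cursor: fake_pages[cursor])
print(all_ids)
```

With the real client, **fetch_page** could presumably wrap a call along the lines of twitter_api.followers.ids(screen_name="MasseyUni", cursor=cursor). That call is not run here and is untested.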

---

How about what is trending in NZ? First we need to find the Yahoo! WOE (Where On Earth) code for NZ. There is a simple lookup page at [http://woeid.rosselliot.co.nz/](http://woeid.rosselliot.co.nz/) 

It turns out NZ is 23424916

In [21]:
WOE_NZ = 23424916
nz_trends = twitter_api.trends.place(_id=WOE_NZ) 
# the underscore on _id is needed because of a quirk in the python twitter API ("id" is reserved for another purpose)
# It turns out trends is a list of dictionaries with only one element. Not much of a list really :)
print(nz_trends[0].keys())

dict_keys(['trends', 'as_of', 'created_at', 'locations'])


The interesting stuff is in the 'trends' key. Let's take a look at the first 10:

In [22]:
for trend in nz_trends[0]["trends"][0:10]:
    print(dump(trend, indent=2))

{
  "name": "#nzqt",
  "url": "http://twitter.com/search?q=%23nzqt",
  "promoted_content": null,
  "query": "%23nzqt",
  "tweet_volume": null
}
{
  "name": "#PlunketShield",
  "url": "http://twitter.com/search?q=%23PlunketShield",
  "promoted_content": null,
  "query": "%23PlunketShield",
  "tweet_volume": null
}
{
  "name": "#BBDSummit",
  "url": "http://twitter.com/search?q=%23BBDSummit",
  "promoted_content": null,
  "query": "%23BBDSummit",
  "tweet_volume": null
}
{
  "name": "Parliament",
  "url": "http://twitter.com/search?q=Parliament",
  "promoted_content": null,
  "query": "Parliament",
  "tweet_volume": 69484
}
{
  "name": "Murray Ball",
  "url": "http://twitter.com/search?q=%22Murray+Ball%22",
  "promoted_content": null,
  "query": "%22Murray+Ball%22",
  "tweet_volume": null
}
{
  "name": "#KievMajor",
  "url": "http://twitter.com/search?q=%23KievMajor",
  "promoted_content": null,
  "query": "%23KievMajor",
  "tweet_volume": null
}
{
  "name": "#BFC630NZ",
  "url": "http:/

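Most of the trends above report **tweet_volume** as null, which becomes **None** in Python and cannot be compared with a number. A small sketch of ranking only the trends that do have a volume, using a few entries copied from the output above:

```python
# A few entries copied from the output above (tweet_volume may be None).
sample_trends = [
    {"name": "#nzqt", "tweet_volume": None},
    {"name": "#PlunketShield", "tweet_volume": None},
    {"name": "Parliament", "tweet_volume": 69484},
    {"name": "Murray Ball", "tweet_volume": None},
]

# Drop trends without a volume, then sort the rest from largest to smallest.
with_volume = [t for t in sample_trends if t["tweet_volume"] is not None]
ranked = sorted(with_volume, key=lambda t: t["tweet_volume"], reverse=True)

for trend in ranked:
    print(trend["name"], trend["tweet_volume"])
```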
Now I will try something more complicated. Let's compare the **lexical diversity** of Massey tweets with Victoria tweets. Lexical diversity will be crudely defined as **the number of unique words divided by the total number of words** in a list of tweets, N = 100 say.
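The definition fits in a one-line function. This is only a sketch of the crude measure described above, with a hypothetical function name and a toy sentence instead of tweets.

```python
def lexical_diversity(words):
    """Number of unique words divided by total words (0.0 for an empty list)."""
    return len(set(words)) / len(words) if words else 0.0

toy_words = "the cat sat on the mat".split()  # 6 words, 5 of them unique
print(lexical_diversity(toy_words))
```

Note that the measure is case sensitive as written, so "The" and "the" count as two different words.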

First get 100 tweets from Massey and Victoria:

In [23]:
count = 100

q = "@MasseyUni"
#massey_tweets = twitter_api.search.tweets(q=q, count=count)["statuses"]
massey_tweets = twitter_api.statuses.user_timeline(screen_name="@MasseyUni", count=count)
q = "@VicUniWgtn"
#victoria_tweets = twitter_api.search.tweets(q=q, count=count)["statuses"]
victoria_tweets = twitter_api.statuses.user_timeline(screen_name="@VicUniWgtn", count=count)

Now extract the text of each tweet for both Massey and Victoria (using a Python *list comprehension*):

In [24]:
massey_texts = [tweet["text"] for tweet in massey_tweets]
victoria_texts = [tweet["text"] for tweet in victoria_tweets]

Now we need to break down each text into individual words and add the words to a list. I will exclude 'words' that include punctuation (like hashtags, screen names etc.) by means of Python's isalpha() method. We need to iterate over each text and then over each word in the text:

In [25]:
massey_words = [word 
                    for text in massey_texts
                        for word in text.split() if word.isalpha()]
victoria_words = [word 
                    for text in victoria_texts
                        for word in text.split() if word.isalpha()]

print("50 Massey words:\n\n", massey_words[0:50])
print("\n50 Victoria words:\n\n", victoria_words[0:50])

50 Massey words:

 ['project', 'looks', 'at', 'empowering', 'iwi', 'hapū', 'to', 'be', 'partners', 'in', 'the', 'of', 'healthy', 'student', 'blogger', 'Elliot', 'says', 'Classical', 'Studies', 'is', 'Game', 'of', 'Thrones', 'in', 'post', 'Massey', 'researcher', 'reckons', 'Kiwi', 'could', 'be', 'missing', 'out', 'on', 'investment', 'with', 'the', 'wrong', 'style', 'of', 'New', 'troubled', 'NZ', 'youth', 'transfer', 'gang', 'behaviour', 'to', 'Samoa', 'Dr']

50 Victoria words:

 ['talks', 'exotic', 'species', 'and', 'messages', 'on', 'and', 'research', 'show', 'the', 'susceptibility', 'of', 'vital', 'groundwater', 'systems', 'to', 'is', 'the', 'coldest', 'how', 'warm', 'can', 'it', 'sets', 'the', 'record', 'is', 'Hear', 'from', 'leading', 'innovators', 'using', 'VR', 'in', 'Friday', 'names', 'alumna', 'former', 'staff', 'member', 'Prof', 'Linda', 'Trenberth', 'as', 'the', 'new', 'and', 'The', 'slippery', 'slope']
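To see exactly what the isalpha() filter keeps and drops, here is the same split applied to the first Massey tweet printed earlier. The variable names are illustrative.

```python
sample = ("#MasseyUni project looks at empowering iwi &amp; hapū to be partners "
          "in the co-management of healthy estuaries… https://t.co/del8USycqi")

tokens = sample.split()
kept = [w for w in tokens if w.isalpha()]         # letters only
dropped = [w for w in tokens if not w.isalpha()]  # anything with punctuation or digits

print(kept)
print(dropped)
```

Hyphenated words and words with trailing punctuation are dropped along with hashtags and URLs, while letters with macrons such as the ū in "hapū" still count as alphabetic.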


Notice that the words are not unique. We have just split up all the texts and pulled out strings of alpha characters. It's easy to convert a list of words into a *set* of unique words by using Python's built-in set() function:

In [26]:
unique_massey_words = set(massey_words)
unique_victoria_words = set(victoria_words)

Python makes it that easy! Now we have everything we need to compare the lexical diversity.

In [27]:
print("Massey:", len(unique_massey_words), "unique words out of", len(massey_words), "=",
      len(unique_massey_words) / len(massey_words),"\n")
print(unique_massey_words, "\n\n")
print("\nVictoria:", len(unique_victoria_words), "unique words out of", len(victoria_words), "=",
      len(unique_victoria_words) / len(victoria_words),"\n")
print(unique_victoria_words)

Massey: 582 unique words out of 1232 = 0.4724025974025974 

{'Need', 'Taranaki', 'empowering', 'earning', 'Tamariki', 'costs', 'work', 'Faleolo', 'Spaces', 'consumers', 'Prof', 'RadioLive', 'hero', 'Jon', 'at', 'iwi', 'expanding', 'support', 'available', 'women', 'Cup', 'gang', 'Details', 'spaces', 'partners', 'blockbuster', 'Regional', 'training', 'awesome', 'Kiwi', 'deal', 'has', 'Low', 'fire', 'us', 'regional', 'developed', 'lecturer', 'food', 'Club', 'now', 'Yule', 'New', 'minute', 'grow', 'snr', 'campus', 'journos', 'but', 'Aucklands', 'Jenny', 'country', 'tribute', 'Rising', 'she', 'after', 'talks', 'focus', 'revolutionises', 'Road', 'World', 'fuel', 'ideas', 'assist', 'three', 'transport', 'Kenshin', 'style', 'Brock', 'its', 'Robinson', 'under', 'via', 'peaceful', 'date', 'any', 'everyone', 'March', 'accommodation', 'wait', 'Facebook', 'we', 'introduces', 'when', 'growing', 'out', 'outlook', 'Miles', 'decriminalising', 'working', 'Norris', 'Finance', 'tests', 'definitely', 'mana

Get the intersection of Massey and Victoria words:

In [29]:
print(unique_victoria_words & unique_massey_words)

{'and', 'A', 'success', 'proud', 'meeting', 'host', 'part', 'his', 'work', 'back', 'by', 'with', 'School', 'Prof', 'better', 'student', 'discusses', 'at', 'how', 'PhD', 'employment', 'available', 'use', 'school', 'issue', 'up', 'international', 'RT', 'local', 'win', 'view', 'or', 'deal', 'has', 'so', 'won', 'developed', 'about', 'lecturer', 'now', 'reading', 'can', 'New', 'great', 'future', 'week', 'your', 'campus', 'but', 'highest', 'should', 'quality', 'leading', 'talks', 'children', 'just', 'helping', 'Road', 'explains', 'Congratulations', 'live', 'shows', 'who', 'of', 'event', 'its', 'deep', 'please', 'under', 'year', 'talk', 'Watch', 'any', 'are', 'first', 'annual', 'we', 'today', 'growing', 'our', 'out', 'media', 'for', 'Studies', 'last', 'as', 'students', 'could', 'not', 'world', 'on', 'to', 'Can', 'this', 'is', 'find', 'housing', 'The', 'launch', 'some', 'Summer', 'into', 'project', 'in', 'says', 'will', 'party', 'a', 'Dr', 'have', 'study', 'that', 'NZ', 'new', 'high', 'economy

The difference:

In [28]:
print(unique_victoria_words - unique_massey_words)

{'extra', 'ice', 'Marten', 'migration', 'inquiry', 'Master', 'Vercauteren', 'recorded', 'Volunteer', 'feel', 'Vanessa', 'irony', 'Vincent', 'professional', 'What', 'pics', 'firm', 'Friday', 'play', 'fauna', 'life', 'Steady', 'One', 'ranked', 'Travelling', 'publishes', 'really', 'Excellence', 'systems', 'academic', 'before', 'they', 'emissions', 'behind', 'sold', 'programme', 'which', 'well', 'able', 'attack', 'leader', 'anomaly', 'tackled', 'Sunday', 'genius', 'primary', 'make', 'John', 'prize', 'awful', 'foundation', 'volunteer', 'opinion', 'schools', 'setting', 'RSVP', 'Mackintosh', 'That', 'around', 'spent', 'friendship', 'From', 'symposium', 'finalist', 'police', 'temperatures', 'Tiriti', 'bringing', 'mystery', 'stamps', 'beautiful', 'towards', 'perspective', 'Manly', 'times', 'Get', 'Malone', 'groundwater', 'deals', 'lasting', 'US', 'link', 'balance', 'membership', 'used', 'design', 'strength', 'policy', 'he', 'news', 'safer', 'was', 'instruments', 'modelling', 'mineral', 'itself'

In [30]:
print(unique_massey_words - unique_victoria_words)

{'Need', 'Taranaki', 'empowering', 'earning', 'Tamariki', 'costs', 'Faleolo', 'Spaces', 'consumers', 'RadioLive', 'hero', 'Jon', 'expanding', 'iwi', 'support', 'women', 'Cup', 'gang', 'Details', 'spaces', 'partners', 'blockbuster', 'Regional', 'training', 'awesome', 'Kiwi', 'Low', 'fire', 'us', 'regional', 'food', 'Club', 'Yule', 'minute', 'grow', 'snr', 'journos', 'Aucklands', 'country', 'Jenny', 'tribute', 'Rising', 'she', 'after', 'focus', 'revolutionises', 'World', 'fuel', 'ideas', 'assist', 'three', 'transport', 'Kenshin', 'style', 'Brock', 'Robinson', 'via', 'peaceful', 'date', 'everyone', 'March', 'accommodation', 'wait', 'Facebook', 'introduces', 'when', 'outlook', 'Miles', 'decriminalising', 'working', 'Norris', 'Finance', 'tests', 'definitely', 'management', 'Had', 'cancer', 'featured', 'privilege', 'Studying', 'Check', 'BA', 'overstated', 'Professor', 'underway', 'Grafton', 'Whangarei', 'Glover', 'Coming', 'Come', 'olds', 'sustainable', 'reckons', 'mouldy', 'stop', 'discusse

Get the words that are in one or the other but not in both:

In [31]:
print(unique_massey_words ^ unique_victoria_words)

{'extra', 'Taranaki', 'ice', 'empowering', 'migration', 'Tamariki', 'inquiry', 'recorded', 'Faleolo', 'Spaces', 'RadioLive', 'feel', 'iwi', 'expanding', 'professional', 'What', 'firm', 'women', 'play', 'life', 'Steady', 'Cup', 'ranked', 'Travelling', 'publishes', 'Details', 'spaces', 'partners', 'training', 'academic', 'awesome', 'Kiwi', 'sold', 'Low', 'fire', 'us', 'which', 'regional', 'Club', 'Yule', 'attack', 'anomaly', 'Sunday', 'minute', 'grow', 'snr', 'journos', 'Aucklands', 'make', 'Jenny', 'tribute', 'after', 'awful', 'foundation', 'focus', 'volunteer', 'revolutionises', 'schools', 'setting', 'three', 'RSVP', 'Mackintosh', 'around', 'spent', 'style', 'Brock', 'friendship', 'symposium', 'Robinson', 'temperatures', 'via', 'peaceful', 'date', 'Tiriti', 'bringing', 'wait', 'Facebook', 'mystery', 'when', 'stamps', 'towards', 'working', 'Manly', 'times', 'groundwater', 'deals', 'Had', 'lasting', 'cancer', 'privilege', 'membership', 'Check', 'used', 'design', 'policy', 'Professor', 'h
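Because the real outputs are long, the three set operators used above are easier to check on a toy example. The two small sets below are made up from words that appear in the outputs.

```python
a = {"iwi", "campus", "student"}          # stand-in for the Massey words
b = {"campus", "student", "groundwater"}  # stand-in for the Victoria words

both = a & b     # intersection: words in both sets
only_a = a - b   # difference: words in a but not in b
only_b = b - a   # difference: words in b but not in a
either = a ^ b   # symmetric difference: words in exactly one of the sets

print(both, only_a, only_b, either)
```

The symmetric difference is the union of the two differences, which is why the last output in this notebook mixes words from both earlier difference outputs.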