# Exercise 4: APIs and Functions II 

## 4.1 Using the Twitter API to collect data

**4.1.1** Find the Twitter account of the University of Copenhagen's Faculty of Social Science _by hand_ and get their Twitter account information using `tweepy` functionality. Remember that you just started a new Jupyter Notebook, so you will have to load the necessary modules and set up your authentication with the Twitter API. 

In [1]:
#Importing tweepy and setting up auth
import tweepy
from AppCred import CONSUMER_KEY, CONSUMER_SECRET
from AppCred import ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

ku_user = api.get_user('uni_copenhagen')

**4.1.2** When was this account created? Try to use the `str` and `print` commands to respond with a complete sentence.

In [2]:
print("The account " + ku_user.name + " was created at " + str(ku_user.created_at))

The account UniversityCopenhagen was created at 2011-05-12 09:31:02


**4.1.3** Can you find out 1) where this account is located, 2) how many people are following the account, and 3) how many accounts the account is following?

In [3]:
ku_user.location
# Uncomment one of the lines below to see the follower and following counts:
#ku_user.followers_count
#ku_user.friends_count

'Copenhagen, Denmark'
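A notebook cell only displays its last expression, so one way to answer all three parts at once is to build a single sentence from the attributes. The sketch below is illustrative only: `describe_account` is a hypothetical helper, and `demo_user` is a stand-in object with invented numbers rather than real account data. It assumes the user object exposes `name`, `location`, `followers_count` and `friends_count`, as `ku_user` does above.

```python
from types import SimpleNamespace


def describe_account(user):
    """Summarize location, followers and followed accounts in one sentence."""
    return (
        user.name + " is located in " + user.location
        + ", has " + str(user.followers_count) + " followers"
        + " and follows " + str(user.friends_count) + " accounts."
    )


# Hypothetical stand-in for a tweepy User object; the counts are invented.
demo_user = SimpleNamespace(
    name="Example",
    location="Copenhagen, Denmark",
    followers_count=120,
    friends_count=45,
)

print(describe_account(demo_user))
```

In the notebook, `print(describe_account(ku_user))` would presumably give the same kind of sentence for the real account.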

**4.1.4** Next, get the timeline for this user "mfroman". What happens? Can you explain why?

In [4]:
api.user_timeline('mfroman')
# Returns "Not authorized": the account is private (its tweets are protected), so the timeline is not publicly available.

TweepError: Not authorized.
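If the rest of the notebook should keep running when an account is protected, the call can be wrapped in a `try`/`except`. This is a minimal sketch, not part of the exercise: `safe_timeline` is a hypothetical helper, and the exception class is passed in as a parameter because its name depends on the tweepy version (the error shown above is `TweepError`).

```python
def safe_timeline(api, screen_name, error_type=Exception):
    """Return the user's timeline, or None if the API refuses the request."""
    try:
        return api.user_timeline(screen_name=screen_name)
    except error_type as err:
        # Report the problem instead of stopping the notebook
        print("Could not fetch timeline for " + screen_name + ": " + str(err))
        return None


# Usage in the notebook (assumes `api` and `tweepy` are set up as above):
# timeline = safe_timeline(api, 'mfroman', tweepy.TweepError)
```

Returning `None` keeps the failure visible to later cells, which can check for it before processing any tweets.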

**4.1.5** Now, get the timeline for our example account "vicariousveblen". Some of the tweets were posted automatically, i.e. using a Python script. Can you tell from the metadata which? How?

In [5]:
example_timeline = api.user_timeline('vicariousveblen')
#print(example_timeline)

timetweet = []

for tweet in example_timeline:
    timetweet.append((tweet.text, str(tweet.created_at)))

print(timetweet)
    
# The three most recent tweets look like they were posted by a script: they appear almost exactly
# two minutes apart, which suggests a script that waits two minutes between posts.
# The `source` field of each tweet, which names the posting client, would confirm this.

[('For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3', '2020-02-18 16:49:23'), ('As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.', '2020-02-18 16:47:23'), ('High-bred manners and ways of living are items of conformity to the norm of conspicuous leisure and conspicuous consumption.', '2020-02-18 16:45:22'), ('Hello World!', '2020-02-18 16:16:46')]
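Regular spacing between tweets is suggestive, but each tweet's metadata also records which client posted it in the `source` field, which is a more direct signal. The sketch below groups tweets by that field. `group_by_source` is a hypothetical helper, and `demo_timeline` uses stand-in objects with invented source labels, not the real values for this account.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_source(timeline):
    """Map each posting client (tweet.source) to the texts posted with it."""
    groups = defaultdict(list)
    for tweet in timeline:
        groups[tweet.source].append(tweet.text)
    return dict(groups)


# Hypothetical stand-ins for tweepy Status objects; the source labels are invented.
demo_timeline = [
    SimpleNamespace(text="Hello World!", source="Twitter Web App"),
    SimpleNamespace(text="Quote one", source="my-python-bot"),
    SimpleNamespace(text="Quote two", source="my-python-bot"),
]

for source, texts in group_by_source(demo_timeline).items():
    print(source, len(texts))
```

Calling `group_by_source(example_timeline)` in the notebook should separate tweets posted through a custom app from those posted through an official client, assuming the script used its own app name.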


## 4.2 Writing and using our own functions to process the Twitter data

**4.2.1** Collect the timeline for this account "CPH_SODAS".

In [6]:
cph_sodas = api.user_timeline("CPH_SODAS")

**4.2.2** Write a function that you can use to summarize the tweets in the timeline. Feel free to look at the code examples we used earlier today.

In [9]:
# Find the words the account uses most often
# to get an idea of the general nature of its tweets.

# Defining function
def user_gist(timeline):
    # Set up the containers to use
    word_freq = {}   # word -> number of occurrences
    word_list = []   # every word from every tweet

    # Loop through each tweet in the timeline
    for tweet in timeline:
        tweet_words = tweet.text.split()  # split the tweet on whitespace into separate words
        word_list.extend(tweet_words)     # add the words to word_list

    # Count how often each word occurs in word_list
    for w in word_list:
        word_freq[w] = word_freq.get(w, 0) + 1

    # Build (count, word) pairs and sort them from most to least frequent
    gist = sorted(((count, word) for word, count in word_freq.items()), reverse=True)

    return gist

user_gist(cph_sodas)

[(10, 'on'),
 (9, 'the'),
 (9, 'RT'),
 (8, 'and'),
 (6, 'of'),
 (5, '@distractdenmark:'),
 (5, '#machineanthropology'),
 (4, 'workshop'),
 (4, 'to'),
 (4, 'a'),
 (3, 'this'),
 (3, 'social'),
 (3, 'series'),
 (3, 'out'),
 (3, 'in'),
 (3, 'how'),
 (3, 'for'),
 (3, 'about'),
 (3, 'Morten'),
 (3, 'Axel'),
 (3, '@suneman'),
 (3, '&amp;'),
 (2, 'with'),
 (2, 'the…'),
 (2, 'talk'),
 (2, 'stage'),
 (2, 'science'),
 (2, 'people'),
 (2, 'our'),
 (2, 'new'),
 (2, 'media'),
 (2, 'is'),
 (2, 'have'),
 (2, 'from'),
 (2, 'first'),
 (2, 'commenting'),
 (2, 'been'),
 (2, 'an'),
 (2, 'What'),
 (2, 'The'),
 (2, 'Pedersen'),
 (2, 'Machine'),
 (2, 'MSc'),
 (2, 'Join'),
 (2, 'In'),
 (2, '@andbjn:'),
 (2, '@RebAdlerNissen'),
 (2, '@CPH_SODAS'),
 (1, "📲'Politikere"),
 (1, 'you'),
 (1, 'will'),
 (1, 'widely'),
 (1, 'which'),
 (1, 'where'),
 (1, 'well-off'),
 (1, 'we'),
 (1, 'want'),
 (1, 'visited'),
 (1, 'very'),
 (1, 'verdens'),
 (1, 'various'),
 (1, 'using'),
 (1, 'use'),
 (1, 'us'),
 (1, 'two'),
 (1, 'trans

**4.2.3** Apply the function to the timeline data you collected. Without looking it up, what would you say this account tweets about?

In [None]:
# My guess is that it tweets about workshops and talks presented by SODAS.
# The most common word is 'on', which could come from phrases like "X will give a talk on Y".
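Since the top of the list is dominated by function words such as 'on' and 'the', a variant that lowercases each word and skips a small stop-word list may make the topic easier to read off. This is a sketch under stated assumptions: `content_words` and `STOP_WORDS` are hypothetical names, the stop-word list is a tiny hand-picked sample, and `demo` holds invented stand-in tweets rather than real timeline data.

```python
from collections import Counter
from types import SimpleNamespace

# A small, hand-picked stop-word list (an assumption; extend as needed).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "rt", "the", "to"}


def content_words(timeline, stop_words=STOP_WORDS, top=10):
    """Count lowercased words across tweets, ignoring stop words."""
    counts = Counter()
    for tweet in timeline:
        for word in tweet.text.lower().split():
            if word not in stop_words:
                counts[word] += 1
    # most_common returns (word, count) pairs, most frequent first
    return counts.most_common(top)


# Hypothetical stand-ins for tweepy Status objects.
demo = [
    SimpleNamespace(text="RT a talk on the workshop"),
    SimpleNamespace(text="Join the workshop on social data"),
]

print(content_words(demo, top=3))
```

Note that this returns `(word, count)` pairs, the reverse of the `(count, word)` order used by `user_gist`. Lowercasing also merges variants such as 'The' and 'the', which the original counts separately.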

## 4.3 Follow Your Interests

**4.3.1** Identify three Twitter accounts _or_ keywords of interest to you. Use the functionality we learned today to explore them: look at the history of the accounts, who tweets about your keywords, what your accounts tweet about, etc.

In [7]:
# @EsbjergfB + #hongkong + @Astralisgg

hk_tweets = api.search("#hongkong")

for tweet in hk_tweets:
    print(tweet.user.name)

loveourhome
Kwok Ka Ki 郭家麒
Nul n'Est une Ile 😷🇭🇰 ❄
Sunandy
Raymond😷
Edward Yu
kaiyanchen
nami
bluesky
boman
Ken Siu
K 😷 醫護同行
Sibalagon
BlessHK3😷
为民主中国网站


In [10]:
#Using earlier function to see common words used in #hongkong
user_gist(hk_tweets)

[(13, 'RT'),
 (11, 'to'),
 (11, 'in'),
 (6, 'jail'),
 (6, 'a'),
 (6, '@benedictrogers:'),
 (5, 'the'),
 (5, '#HongKong'),
 (4, 'of'),
 (4, 'is'),
 (4, 'after'),
 (3, 'years'),
 (3, 'widely'),
 (3, 'refused'),
 (3, 'kidnappe…'),
 (3, 'help'),
 (3, 'being'),
 (3, 'because'),
 (3, 'and'),
 (3, 'Sentenced'),
 (3, 'Please'),
 (3, 'Minhai'),
 (3, 'Gui'),
 (3, 'Chinese'),
 (3, '10'),
 (3, '#StandwithHK'),
 (3, '#DemocracyforHK'),
 (3, '#CarrieLam'),
 (2, 'until'),
 (2, 'share'),
 (2, 'schools'),
 (2, 'out'),
 (2, 'op-ed,'),
 (2, 'news'),
 (2, 'new'),
 (2, 'just'),
 (2, 'for'),
 (2, 'closed'),
 (2, 'My'),
 (2, 'Kong'),
 (2, 'Hong'),
 (2, '#…'),
 (2, '#StandwithUyghurs'),
 (2, '#FreeChina'),
 (2, '#CCP'),
 (1, '究竟憑咩著上全身保護衣？'),
 (1, '當基本保護裝備都唔足夠時，#醫護'),
 (1, '柳廣成'),
 (1, '企埋一邊嘅'),
 (1, '仍然被硬推上前線救人；'),
 (1, 'youngsters…'),
 (1, 'written'),
 (1, 'will'),
 (1, 'which'),
 (1, 'very'),
 (1, 'travelled'),
 (1, 'tomorrow.'),
 (1, 'today.'),
 (1, 'this'),
 (1, 'their'),
 (1, 'that'),
 (1, 'stay'),
 (1, 

In [11]:
# Using the function to check the words most commonly used by EsbjergfB
efb = api.user_timeline('EsbjergfB')
user_gist(efb)

[(9, 'i'),
 (6, 'på'),
 (6, '#sldk'),
 (6, '#lbkefb'),
 (5, 'til'),
 (5, 'for'),
 (4, '🔵⚪️'),
 (4, 'at'),
 (4, '@MartinBraith'),
 (3, 'og'),
 (3, 'mod'),
 (3, 'er'),
 (3, '@FCBarcelona'),
 (2, '🙏'),
 (2, 'vi'),
 (2, 'the'),
 (2, 'om'),
 (2, 'fans'),
 (2, 'den'),
 (2, 'Vi'),
 (2, 'Lyngby'),
 (2, '@LyngbyBoldklub'),
 (2, '-'),
 (1, '🧤🔵⚪️'),
 (1, '🙏👊'),
 (1, '🔵⚪️…'),
 (1, '🔥'),
 (1, '🔜'),
 (1, '📲…'),
 (1, '👏'),
 (1, '👊'),
 (1, '🎶'),
 (1, '🎥'),
 (1, '⚽️'),
 (1, '☘️'),
 (1, 'weekendens'),
 (1, 'week'),
 (1, 'vores'),
 (1, 'venter'),
 (1, 'vejen'),
 (1, 'undvære'),
 (1, 'travelling'),
 (1, 'to'),
 (1, 'tilbage,'),
 (1, 'tager'),
 (1, 'sætter'),
 (1, 'støtten'),
 (1, 'startopstilling'),
 (1, 'spillere'),
 (1, 'slår'),
 (1, 'sin'),
 (1, 'siger'),
 (1, 'samtidig'),
 (1, 'resultat'),
 (1, 'quizzen'),
 (1, 'på,'),
 (1, 'premieren,'),
 (1, 'plads:'),
 (1, 'periode:'),
 (1, 'ord'),
 (1, 'opvarmningen'),
 (1, 'opgave'),
 (1, 'op'),
 (1, 'on'),
 (1, 'når'),
 (1, 'nu'),
 (1, 'new'),
 (1, 'må'),
 (1, '

In [49]:
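# How many tweets has Esbjerg fB made?
len(efb)
# Gives only 20: user_timeline returns just the 20 most recent tweets by default,
# so this is not the account's total (the user object's statuses_count attribute holds that).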

20
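To collect more than one page of a timeline, the requests can be repeated while moving backwards with `max_id`. The following is a minimal sketch, assuming the `api` object accepts `screen_name`, `count` and `max_id` keyword arguments on `user_timeline`, as tweepy's does. `collect_timeline` is a hypothetical helper, and the default `limit` and `page_size` values are illustrative choices.

```python
def collect_timeline(api, screen_name, limit=1000, page_size=200):
    """Page backwards through a timeline until `limit` tweets or no more pages."""
    tweets = []
    max_id = None
    while len(tweets) < limit:
        kwargs = {"screen_name": screen_name, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # Next request: only tweets strictly older than the oldest one seen so far
        max_id = min(tweet.id for tweet in page) - 1
    return tweets[:limit]


# Usage in the notebook (assumes `api` is authenticated as above):
# efb_all = collect_timeline(api, 'EsbjergfB', limit=500)
# len(efb_all)
```

tweepy also ships a `Cursor` helper that handles this paging; the explicit loop is used here only to show what happens at each step. Each page is a separate API request, so large limits count against the rate limit.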

In [51]:
#efb's location
efb_user=api.get_user('EsbjergfB')
efb_user.location

'Esbjerg, Denmark'