# Demo 4: APIs and Functions II 

## 4.1 Twitter API


**4.1.1 Installing and importing new modules:** Before we can interact with the Twitter API, we need to install the `tweepy` module. We would usually install modules outside our Jupyter Notebooks using the command line. However, we can actually also interact with the command line from within our Notebooks using the `!` operator. Now, uncomment the cell below and run it.

In [16]:
# # BEFORE WE CAN USE THE TWEEPY LIBRARY, WE NEED TO INSTALL IT
# # THAT IS, UNCOMMENT AND EXECUTE THIS CELL ONCE
# # need to use sys.prefix to install from within jupyter notebook
# # following: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
import sys
! conda install --yes --prefix {sys.prefix} -c conda-forge tweepy

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



After you have run the above cell, make sure everything worked as expected and that `tweepy` was installed successfully: import the module and check its version using the commands in the cell below.

In [17]:
import tweepy
print(tweepy.__version__)

3.8.0


**4.1.2 Loading credentials and authenticating to the API**: Now that we have installed and imported the `tweepy` module, we can use it to authenticate ourselves to the Twitter API. To do this, we first need to access our credentials from the file _AppCred.py_ we set up earlier in class. Running the cell below will load your Twitter developer credentials and make them available in this session of your Jupyter Notebook.

In [23]:
# credentials as defined in AppCred.py; replace the placeholders with your own keys
# and never share or publish a notebook that contains real keys
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"
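Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As an alternative to the _AppCred.py_ file, a minimal sketch could read the four credentials from environment variables instead. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are hypothetical choices for illustration, not something `tweepy` requires.

```python
import os

# Hypothetical environment variable names; set them in your shell before starting Jupyter
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a tuple, failing loudly if any is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

With the variables set, `CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET = load_twitter_credentials()` would replace the cell above, so the keys never appear in the notebook itself.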

Now we can start the authentication process to access the Twitter API by passing your consumer details to the `OAuthHandler` function from the `tweepy` module.

In [24]:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)

Next, we add our access details to the `auth` variable we just created.

In [25]:
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

Finally, we pass our `auth` variable to the `API` class provided by the `tweepy` module to create a variable that lets us interact with the Twitter API.

In [26]:
api = tweepy.API(auth)

**4.1.3 Interacting with the Twitter API:** Now that we have authenticated ourselves to the Twitter API, we can use it to post and delete tweets from our own account, favorite and retweet tweets from other accounts, and collect information from other public Twitter accounts.

**4.1.3.1 Tweeting:** Let's try posting a tweet with our well-known example using the `update_status` function.

In [27]:
api.update_status("Hello World!")

TweepError: [{'code': 187, 'message': 'Status is a duplicate.'}]

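If the cell ran without an error, you just posted your first tweet using Python, how exciting! (Running the cell a second time fails with error code 187, 'Status is a duplicate', as in the output above, because Twitter rejects a tweet identical to one you just posted.) Now, go to your Twitter profile at _twitter.com/YOUR_USERNAME_ and see if you can find the tweet.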

**4.1.3.2 Deleting:** In addition to posting on Twitter, we can also delete our own tweets. To do that, we need to find the _tweet id_ of our post. See if you can find your tweet's id, then pass it to the function `destroy_status` below and see what happens when you execute the cell and return to your Twitter profile.

In [8]:
# the tweet id is the number at the end of the tweet's URL
api.destroy_status("1232270222033375232")

TweepError: [{'code': 144, 'message': 'No status found with that ID.'}]
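Since the tweet id is the number at the end of a tweet's URL, a small helper can pull it out instead of copying it by hand. This is a sketch assuming URLs of the form `https://twitter.com/<username>/status/<id>`; the function name `tweet_id_from_url` is hypothetical, and the example URL simply combines our class account with the id used above. (The object returned by `update_status` should also carry the id in its `id` attribute, which avoids the lookup altogether.)

```python
import re

# Tweet URLs look like https://twitter.com/<username>/status/<tweet id>
STATUS_URL = re.compile(r"/status/(\d+)")


def tweet_id_from_url(url):
    """Return the numeric tweet id from a tweet URL, as a string."""
    match = STATUS_URL.search(url)
    if match is None:
        raise ValueError("Not a tweet URL: " + url)
    return match.group(1)


print(tweet_id_from_url("https://twitter.com/vicariousveblen/status/1232270222033375232"))
# prints 1232270222033375232
```

The id is returned as a string because that is how we passed it to `destroy_status` above.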

**4.1.3.3 Reading:** For many research purposes, you might be more interested in collecting information such as tweets from Twitter than in posting your own. We can also do this in Python using the Twitter API. Let's start with a simple example: accessing the most recent tweets (by default, up to 20) on the timeline of an account I created for our class.

In [28]:
example_timeline = api.user_timeline("vicariousveblen")

This creates a variable of type `tweepy.models.ResultSet` which basically behaves like a list of tweets with the text and a lot of metadata. Knowing that it behaves like a list, how can we see how many tweets we collected?

In [29]:
# code to look up how many tweets we collected

len(example_timeline)

4

We have collected a set of tweets and now want to look at their content, that is, their text. Remembering that you can work with the `example_timeline` variable like a list and that each element has an attribute called `text` holding the content of the tweet, how would you access the text of the first tweet in `example_timeline`?

In [30]:
# code to access the content of the first tweet in the timeline

print(example_timeline[0].text)

For the end of vicarious consumption is to enhance, not the fullness of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3


Look at this output and compare it to the original tweet [here](https://t.co/x2O7ALCem3). What do you notice? What does that mean for working with the Twitter API in practice?

By default, the Twitter API truncates the `text` of tweets longer than 140 characters, but it records which tweets were cut in an attribute called `truncated`. Can you write a loop to check which of the tweets we collected were cut short?

In [32]:
# code to loop through timeline and tell if tweets were cut short

for tweet in example_timeline:
    print(tweet.truncated)

True
False
False
False


If the API did not provide this attribute, we could derive the same information with the tools we have already learned. How would you write a loop that does this? _Hint:_ Look at what distinguishes the `text` of truncated tweets from the `text` of untruncated tweets.

In [35]:
# alternative code to loop through timeline 
# and tell if tweets were cut short

# truncated tweets end with a shortened 'https://t.co/' link to the full tweet
# (caveat: an untruncated tweet that contains a link would also print True)

for tweet in example_timeline:
    print('https://t.co/' in tweet.text)

True
False
False
False
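Checking for `'https://t.co/'` anywhere in the text also flags untruncated tweets that merely contain a link. A slightly stricter heuristic, sketched below, looks for an ellipsis followed by a t.co link at the very end of the text, which is how the truncated tweet above ends. This rule is an assumption drawn from that one example, not an API guarantee, and the stand-in tweets (built with `SimpleNamespace`, including the made-up second text) are hypothetical. In practice, passing `tweet_mode="extended"` to calls like `user_timeline` should make the API return the complete text in a `full_text` attribute, which sidesteps truncation altogether.

```python
import re
from types import SimpleNamespace

# Heuristic: truncated texts end with "…" followed by a shortened https://t.co/ link
TRUNCATION_TAIL = re.compile(r"…\s+https://t\.co/\w+$")


def looks_truncated(text):
    """Guess whether a tweet's text was cut short by the API."""
    return TRUNCATION_TAIL.search(text) is not None


# Hypothetical stand-ins for tweepy Status objects (only `text` and `truncated`)
tweets = [
    SimpleNamespace(
        text="For the end of vicarious consumption is to enhance, not the fullness "
             "of life of the consumer, but the pecuniary rep… https://t.co/x2O7ALCem3",
        truncated=True,
    ),
    SimpleNamespace(text="Reading Veblen today https://t.co/abc123XYZ", truncated=False),
    SimpleNamespace(text="Hello World!", truncated=False),
]

for tweet in tweets:
    # API flag, stricter heuristic, naive substring check
    print(tweet.truncated, looks_truncated(tweet.text), "https://t.co/" in tweet.text)
```

For the second stand-in, the substring check prints `True` even though the tweet is complete, while the stricter heuristic agrees with `truncated`.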


In addition to the tweet content, the API provides a host of valuable metadata about each tweet, such as when it was posted and how often it was retweeted and favorited. Looking just at the second tweet using `example_timeline[1]`, can you find the right attributes to identify 1) when the tweet was posted, 2) how often it has been retweeted, and 3) how often it has been favorited?

In [36]:
# look at the second element in example_timeline

print(example_timeline[1].text)

As has already been indicated, the distinction between exploit and drudgery is an invidious distinction between employments.


In [39]:
# 1) code to access time of posting

str(example_timeline[1].created_at)

'2020-02-18 16:47:23'

In [43]:
# 2) code to access number of retweets

example_timeline[1].retweet_count

0

In [44]:
# 3) code to access number of favorites

example_timeline[1].favorite_count

2
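Once we know these attributes, we can collect them for every tweet in one step with a small function. In the sketch below, `tweet_summary` is a hypothetical helper, and the stand-in object only mirrors the values shown above for the second tweet. Judging from the `str` output above, `created_at` is a `datetime` object, so `strftime` can format it however we like.

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_summary(tweet):
    """Collect the posting time, retweet count and favorite count of one tweet into a dict."""
    return {
        "posted": tweet.created_at.strftime("%Y-%m-%d %H:%M"),
        "retweets": tweet.retweet_count,
        "favorites": tweet.favorite_count,
    }


# Hypothetical stand-in with the values of example_timeline[1] shown above
second_tweet = SimpleNamespace(
    created_at=datetime(2020, 2, 18, 16, 47, 23),
    retweet_count=0,
    favorite_count=2,
)
print(tweet_summary(second_tweet))
# prints {'posted': '2020-02-18 16:47', 'retweets': 0, 'favorites': 2}
```

Applied to the real data, `[tweet_summary(tweet) for tweet in example_timeline]` would give one dictionary per collected tweet.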

Besides letting us collect the tweets posted by public Twitter accounts, the Twitter API also lets us access information about the accounts themselves. The function to do this in `tweepy` is called `get_user`.

In [45]:
example_user = api.get_user("vicariousveblen")

Once we have collected the user profile, we can look at things like the account's location, its description (the "about me" section), how often it has posted, who it follows and who follows it. The object returned by `get_user` is slightly easier to navigate than a timeline, since this information is not nested inside tweets.

In [46]:
# where does our example account live?
example_user.location

'Cato, Wisconsin'

In [47]:
# what does the description say
example_user.description

'Living my best Veblen life, vicariously.'

In [48]:
# does the account follow anyone or have any friends?
print("The account " + str(example_user.name) + " has " + str(example_user.followers_count) + " accounts following it.")
print("The account " + str(example_user.name) + " is following " + str(example_user.friends_count) + " accounts.")

The account vicariousveblen has 3 accounts following it.
The account vicariousveblen is following 0 accounts.
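The two `print` statements above convert every value with `str()` and glue the pieces together with `+`. Python's f-strings (available since Python 3.6) do the conversion automatically and are easier to read. The helper `describe_user` below is a hypothetical example, and the stand-in object only carries the values shown above.

```python
from types import SimpleNamespace


def describe_user(user):
    """Build a one-line summary of a user profile using f-strings."""
    return (f"The account {user.name} has {user.followers_count} followers "
            f"and follows {user.friends_count} accounts.")


# Hypothetical stand-in with the values shown above for vicariousveblen
example = SimpleNamespace(name="vicariousveblen", followers_count=3, friends_count=0)
print(describe_user(example))
# prints The account vicariousveblen has 3 followers and follows 0 accounts.
```

With the real data, `describe_user(example_user)` should produce the same sentence.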


Finally, the Twitter API lets us search recent public tweets for certain keywords (the standard search endpoint only covers roughly the past seven days, not all of Twitter). We can access this using `tweepy`'s `search` function. For example, to look for tweets using the hashtag '#DigitalMethods', we could search like this.

In [49]:
digimeth_tweets = api.search("#DigitalMethods")

Then, we could look at who tweets about '#DigitalMethods' by parsing the returned data like this.

In [50]:
for tweet in digimeth_tweets:
    print(tweet._json['user']['name'])

FDM Hildesheim
Andreⓐs Ferus 🌈 🦄 🐶
Marie v. Lüneburg
K. White
eveline wandl-vogt
Iian Neill (The Codex)
Elisabeth Militz
ZfdG
Julia Poerting
Agiati Benardou
mon Rodriguez-Amat
mon Rodriguez-Amat
Nicolo' Dell'Unto
mLab Geography Bern
DOS Research Group


**4.1.3.4 Rate limiting:** As long as we only work with a few tweets or a handful of accounts, we will not run into any problems. However, it is good practice to always keep an eye on the rate limits the Twitter API imposes on us. The `tweepy` module provides the function `rate_limit_status` to do so.

In [51]:
# check our current rate limit status
current_limits = api.rate_limit_status()

The value returned by `rate_limit_status` is a nested dictionary, so it is straightforward to index once we know the keys. The keys most relevant to us are those for the user and status (timeline) endpoints below, and, for searches, `current_limits['resources']['search']['/search/tweets']`.

In [52]:
# rate limit for the `/users/lookup` endpoint (tweepy's `lookup_users`) within a 15 minute window;
# `get_user` draws on a separate endpoint, '/users/show/:id'
current_limits['resources']['users']['/users/lookup']

{'limit': 900, 'remaining': 900, 'reset': 1582735161}

In [53]:
# rate limit on the number of times we can call `user_timeline` within a 15 minute window
current_limits['resources']['statuses']['/statuses/user_timeline']

{'limit': 900, 'remaining': 899, 'reset': 1582734640}
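The `reset` value is a Unix timestamp (seconds since 1 January 1970, UTC), which is hard to read directly. A sketch of a hypothetical helper, `describe_limit`, converts it into a readable UTC time and the number of seconds left until the window resets; the example reuses the timeline entry shown above with a made-up "now" ten minutes before the reset. If you would rather not manage this yourself, `tweepy.API(auth, wait_on_rate_limit=True)` makes `tweepy` pause automatically when a limit is reached.

```python
import time
from datetime import datetime, timezone


def describe_limit(limit_info, now=None):
    """Return remaining calls, the reset time as a UTC datetime, and seconds to wait."""
    if now is None:
        now = time.time()
    reset_at = datetime.fromtimestamp(limit_info["reset"], tz=timezone.utc)
    wait_seconds = max(0, int(limit_info["reset"] - now))
    return limit_info["remaining"], reset_at, wait_seconds


# Entry from the '/statuses/user_timeline' output above
timeline_limit = {"limit": 900, "remaining": 899, "reset": 1582734640}
remaining, reset_at, wait = describe_limit(timeline_limit, now=1582734040)
print(remaining, reset_at.isoformat(), wait)
# prints 899 2020-02-26T16:30:40+00:00 600
```

Called without `now`, the helper uses the current time, so the waiting time drops to 0 once the window has already reset.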

## 4.2 Using functions to process data from the Twitter API 
Now that we know about some of the information that we can gather from Twitter and the structure in which it is returned to us, we can see even more use for defining our own functions. For example, we can combine all the tweets we retrieved for our example account and see what the account is tweeting about most.

In [54]:
# define the function `user_gist` taking one argument
def user_gist(user_timeline):
    
    # set up empty containers we will need throughout the loop
    word_freq = {}
    word_list = []
    gist = []
    
    # FIRST, loop through tweets in the timeline
    for tweet in user_timeline:
        # split up tweets into lists of words
        tweet_words = tweet.text.split()
        # and combine into one big list using `extend` command
        word_list.extend(tweet_words)
    
    # SECOND, loop through list of words in tweets
    for w in word_list:
        # add each unique word and its `count` to the dictionary `word_freq`
        if w not in word_freq:
            word_freq[w] = word_list.count(w)

    # loop through the dictionary and add each (count, word) pair to the list
    for key in word_freq:
        gist.append((word_freq[key], key))

    # sort the list
    gist.sort()
    # reverse the sort to go from largest to smallest
    gist.reverse()

    # return the list
    return gist
        

In [55]:
user_gist(example_timeline)

[(6, 'the'),
 (6, 'of'),
 (3, 'and'),
 (2, 'to'),
 (2, 'is'),
 (2, 'distinction'),
 (2, 'conspicuous'),
 (2, 'between'),
 (1, 'ways'),
 (1, 'vicarious'),
 (1, 'rep…'),
 (1, 'pecuniary'),
 (1, 'not'),
 (1, 'norm'),
 (1, 'manners'),
 (1, 'living'),
 (1, 'life'),
 (1, 'leisure'),
 (1, 'items'),
 (1, 'invidious'),
 (1, 'indicated,'),
 (1, 'https://t.co/x2O7ALCem3'),
 (1, 'has'),
 (1, 'fullness'),
 (1, 'exploit'),
 (1, 'enhance,'),
 (1, 'end'),
 (1, 'employments.'),
 (1, 'drudgery'),
 (1, 'consumption.'),
 (1, 'consumption'),
 (1, 'consumer,'),
 (1, 'conformity'),
 (1, 'but'),
 (1, 'been'),
 (1, 'are'),
 (1, 'an'),
 (1, 'already'),
 (1, 'World!'),
 (1, 'High-bred'),
 (1, 'Hello'),
 (1, 'For'),
 (1, 'As')]
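The result is dominated by function words such as 'the' and 'of', and it counts variants like 'consumption' and 'consumption.' separately, along with the link 'https://t.co/x2O7ALCem3'. A possible refinement, sketched below as an alternative rather than a replacement for `user_gist`, lower-cases words, strips punctuation, skips links and ignores a stopword list; `collections.Counter` also avoids rescanning the whole word list with `count` for every word. The function name `word_counts` and the small stopword set are hypothetical choices. Note that ties are ordered by first appearance here, not reverse-alphabetically as in `user_gist`.

```python
import string
from collections import Counter


def word_counts(texts, stopwords=frozenset()):
    """Count lower-cased words across texts, ignoring links, punctuation and stopwords."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            if word.startswith("http"):
                continue  # skip links such as https://t.co/...
            word = word.strip(string.punctuation + "…")
            if word and word not in stopwords:
                counts[word] += 1
    return counts.most_common()


texts = [
    "Hello World!",
    "As has already been indicated, the distinction between exploit and drudgery "
    "is an invidious distinction between employments.",
]
print(word_counts(texts, stopwords={"the", "and", "is", "an", "as", "has", "been"})[:2])
# prints [('distinction', 2), ('between', 2)]
```

On the real timeline, this could be called as `word_counts((tweet.text for tweet in example_timeline), stopwords=...)`.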