<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

In [None]:
import time
import tweepy
import pandas as pd
import json

## Twitter API Example

- Interact with the [Twitter API](https://developer.twitter.com/en/docs.html). The main endpoints return tweets, users, and followers.
- An API key is required to access the endpoints.
- Sign in to [Twitter](https://twitter.com), then follow the steps at https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html to generate the keys and access tokens.

API keys and tokens are needed to initialize the Python Twitter wrapper ([tweepy](https://tweepy.readthedocs.io/en/3.7.0/api.html)). It is best practice to store the keys in a separate **configuration file** that is kept secret and never shared (e.g., never committed to GitHub).
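
As a minimal sketch of the configuration-file approach, the snippet below reads the four credentials from a JSON file and checks that none is missing. The file name `twitter_credentials.json` and the helper `load_credentials` are hypothetical names chosen for illustration; the demo writes a throwaway file with placeholder values so that it runs without real keys.

```python
import json
import os
import tempfile

# The four keys used later in this notebook
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")

def load_credentials(path):
    """Read the credentials from a JSON file and check that every key is present.

    Hypothetical helper, not part of tweepy.
    """
    with open(path) as fh:
        cred = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in cred]
    if missing:
        raise KeyError("missing credential keys: {}".format(", ".join(missing)))
    return cred

# Demo: write a throwaway file with placeholder values, then load it back
demo_path = os.path.join(tempfile.mkdtemp(), "twitter_credentials.json")
with open(demo_path, "w") as fh:
    json.dump({key: "<placeholder>" for key in REQUIRED_KEYS}, fh)

demo_cred = load_credentials(demo_path)
```

In a real project the file would live outside version control (for example, listed in `.gitignore`), and the returned dictionary could take the place of a hand-filled `cred`.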

In [None]:
cred = { "consumer_key" : "",
         "consumer_secret" : "",
         "access_token" : "",
         "access_token_secret" : ""
        }

In [None]:
consumer_key = cred['consumer_key']
consumer_secret = cred['consumer_secret']
access_token = cred['access_token']
access_token_secret = cred['access_token_secret']

In [None]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitter = tweepy.API(auth)

1) Get the last N_MAX tweets of a specific user and store them in a pandas DataFrame.

In [None]:
def save_tweet(tweet):
    """Flatten a tweepy Status object into a dict with one entry per DataFrame column."""
    tw = {}
    tw['id_post'] = tweet.id
    tw['username'] = tweet.user.screen_name

    # coordinates is a GeoJSON point, i.e. [longitude, latitude];
    # when it is missing, 'lat' and 'long' are left out of the dict
    if tweet.coordinates is not None:
        coor = tweet.coordinates['coordinates']
        tw['lat'] = coor[1]
        tw['long'] = coor[0]

    # same for 'place': only set when the tweet is tagged with a place
    if tweet.place is not None:
        tw['place'] = tweet.place.name

    tw['text'] = tweet.full_text
    tw['timestamp'] = tweet.created_at
    tw['retweets'] = tweet.retweet_count
    tw['likes'] = tweet.favorite_count
    tw['lang'] = tweet.lang

    return tw
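
The coordinate handling is easy to get backwards, because the point is stored in GeoJSON order (longitude first, latitude second). The following sketch isolates that step; `split_coordinates` is a hypothetical helper and the sample point uses made-up values.

```python
def split_coordinates(point):
    """Return (lat, lng) from a GeoJSON-style point dict, or (None, None) if absent."""
    if point is None:
        return None, None
    # GeoJSON order: [longitude, latitude]
    lng, lat = point["coordinates"]
    return lat, lng

# Made-up sample in the same shape as the `coordinates` field of a tweet
sample_point = {"type": "Point", "coordinates": [9.2277, 45.4781]}
lat, lng = split_coordinates(sample_point)
```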

In [None]:
N_MAX = 100
username = 'polimi'

columns = ['id_post','username','lat', 'long', 'place','text','timestamp','retweets','likes','lang']

# collect one dict per tweet, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0 and was slow inside a loop)
tweet_rows = []
for tweet in tweepy.Cursor(twitter.user_timeline, screen_name=username, tweet_mode='extended').items(N_MAX):
    tweet_rows.append(save_tweet(tweet))

tweets_df = pd.DataFrame(tweet_rows, columns=columns)

2) Retrieve user account information

In [None]:
u = twitter.get_user(screen_name = username)

print(u._json)

3) Save the **follow** relationships.

This is the most expensive operation, since the number of followers can be extremely large. For this reason, we need a function that handles **API rate limits**: once the number of requests exceeds a threshold that depends on the resource, the API rejects further requests until the current **15-minute** window resets (more details [here](https://developer.twitter.com/en/docs/basics/rate-limits)).
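
To get a feel for the cost, here is a rough back-of-the-envelope estimate. It assumes the limits documented for the v1.1 follower-IDs endpoint (up to 5000 IDs per request and 15 requests per 15-minute window); these figures are assumptions to check against the rate-limit page, and `estimate_wait_minutes` is a hypothetical helper.

```python
import math

# Assumed limits for the follower-IDs endpoint (verify against the current docs)
IDS_PER_REQUEST = 5000
REQUESTS_PER_WINDOW = 15
WINDOW_MINUTES = 15

def estimate_wait_minutes(n_followers):
    """Estimate the minutes spent waiting for rate-limit resets."""
    requests_needed = math.ceil(n_followers / IDS_PER_REQUEST)
    windows_needed = math.ceil(requests_needed / REQUESTS_PER_WINDOW)
    # the first window needs no waiting; every further window costs one full reset
    return max(windows_needed - 1, 0) * WINDOW_MINUTES

estimated = estimate_wait_minutes(1_000_000)  # 200 requests, 14 windows, 13 waits
```

Under these assumptions, an account with one million followers needs 200 requests and about 195 minutes of waiting, while an account with up to 75,000 followers fits in a single window.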

In [None]:
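def limit_handled(cursor):
    """Yield items from a tweepy cursor, pausing whenever the rate limit is hit."""
    while True:
        try:
            yield next(cursor)
        # tweepy 3.x exception; tweepy 4 renamed it to tweepy.TooManyRequests
        except tweepy.RateLimitError:
            print('API Rate Limit exceeded. Waiting...')
            # wait for 15 minutes to reset the API timeout
            time.sleep(15 * 60)
        except StopIteration:
            # cursor exhausted: end the generator cleanly (required on Python 3.7+)
            return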

In [None]:
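id_user = u.id

# collect one dict per follower, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0 and was slow inside a loop)
follow_rows = []
for follower in limit_handled(tweepy.Cursor(twitter.followers_ids, screen_name=username).items()):
    follow_rows.append({'id_following': follower, 'id_followed': id_user})

follow = pd.DataFrame(follow_rows, columns=['id_following', 'id_followed'])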

In [None]:
follow.head()

In [None]:
# Twitter IDs need 64-bit integers (the Python 2 `long` type no longer exists)
follow['id_following'] = follow['id_following'].astype('int64')
follow['id_followed'] = follow['id_followed'].astype('int64')
follow.head()

In [None]:
# save data as a table; assumes `spark` is the SparkSession provided by the notebook environment
spark_df = spark.createDataFrame(follow)
spark_df.write.mode("overwrite").saveAsTable("default.{}_followers".format(username))

##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.