# **Tweepy for Twitter API**

Twitter is one of the most widely used social networks in the world, and most companies have an account. Through it, they can track how people engage with their products and services and gauge how people feel about them. With that information, companies can adjust their strategies to improve what they deliver.

[Tweepy](http://docs.tweepy.org/en/latest/) is an open-source Python package that offers an easy way to connect to the Twitter API to collect information, perform analyses, and automate tasks.

The Twitter API limits how often each endpoint can be called within a 15-minute window. Once the limit is exceeded, we have to wait until the window resets (up to 15 minutes) before using the API again.
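As a rough illustration of that wait, the sketch below computes how long to sleep until a rate-limit window resets. It assumes the reset time is known as a Unix timestamp (Twitter reports one in the `x-rate-limit-reset` response header). The function name `seconds_until_reset` and the `padding` margin are hypothetical and not part of Tweepy.

```python
import time


def seconds_until_reset(reset_epoch, now=None, padding=5):
    """Return how many seconds to sleep until the rate-limit window resets.

    reset_epoch: Unix timestamp at which the window resets (assumed input).
    now: current Unix timestamp; defaults to time.time().
    padding: extra seconds added as a safety margin (arbitrary choice).
    """
    if now is None:
        now = time.time()
    remaining = reset_epoch - now
    # If the window has already reset, there is nothing to wait for.
    return remaining + padding if remaining > 0 else 0


# A window that resets 15 minutes (900 s) after "now"
print(seconds_until_reset(reset_epoch=1_000_900, now=1_000_000))
```

In practice Tweepy can do this waiting for us, so a helper like this is only useful for understanding what happens behind the scenes.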

*Note: we're going to use Public Mode in our procedures.*

### **What is the objective here?**

* Collect tweets and retweets, keeping for each one:
    * timestamp
    * location of user
    * (re)tweet text
    * retweet count
    * hashtags

## Packages

To install the tweepy package:

```
pip install tweepy
```

Alternatively, install directly from the GitHub repository:

```
pip install git+https://github.com/tweepy/tweepy.git
```


In [1]:
import tweepy as tw
import pandas as pd
import json

**AUTHENTICATION**

``Private Mode`` - It needs a *consumer key*, *consumer secret key*, *access token* and *access token secret*. It's used when, for example, you want to do in code almost everything you can do on the website. If you want to tweet and retweet something, you can. If you want a bot account, you can. And so on...


``Public Mode`` - It needs only a *consumer key* and *consumer secret key*. The user can only access public information.

In [2]:
## Keys reading

# We saved the keys/tokens in a plain text file to keep them out of the code.
# HERE WE'RE NOT GOING TO USE THE TOKEN/TOKEN SECRET

with open('twtk.txt', 'r') as file:
    CONSUMER_KEY = file.readline().strip('\n')
    CONSUMER_SECRET_KEY = file.readline().strip('\n')

## Connect the consumer key

auth = tw.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET_KEY)  # Used in both Private and Public Modes
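
The key-reading step can also be wrapped in a small function that fails early when the file is incomplete. This is only a sketch: the helper name `read_keys` is hypothetical, the file layout (key on the first line, secret on the second) follows the cell above, and the values written to the temporary file are placeholders, not real credentials.

```python
import os
import tempfile


def read_keys(path):
    """Read the consumer key and consumer secret from the first two lines of a file."""
    with open(path, 'r') as file:
        # strip() removes the trailing newline as well as stray spaces
        lines = [line.strip() for line in file.readlines()[:2]]
    if len(lines) < 2 or not all(lines):
        raise ValueError('expected two non-empty lines: key, then secret')
    return lines[0], lines[1]


# Demonstration with a throwaway file and placeholder values
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('my-consumer-key \nmy-consumer-secret\n')

consumer_key, consumer_secret = read_keys(tmp.name)
os.remove(tmp.name)
print(consumer_key, consumer_secret)
```

Raising an error here is a design choice: an empty key would otherwise only surface later as an authentication failure.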


**CONNECTION TO TWITTER API**

We use ``auth`` to connect to the API. Here are some of the parameters worth checking:

* ``wait_on_rate_limit``: if 'True', the connection is kept when we exceed the rate limit, and Tweepy waits until the API allows requests again. If 'False', the call fails with a rate-limit error.
* ``wait_on_rate_limit_notify``: prints a notification when the limit is exceeded and the API is waiting for the rate limit to replenish (Tweepy 3.x only; this parameter was removed in Tweepy 4.0).
* ``timeout``: the maximum amount of time (in seconds) to wait for a response from Twitter.

In [3]:
# Access API user
api = tw.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, timeout=60)

**COLLECT TWEETS**

- `api.search`: returns a collection of relevant tweets matching a specified query (renamed `api.search_tweets` in Tweepy 4.0);
- `q`: the search query, a string with the word or words we want to look for (operators such as `OR` are allowed);
- `lang`: language, given by an ISO 639-1 code;
- `result_type`:

    - *mixed*: include both popular and real-time results in the response;
    - *recent*: return only the most recent results in the response;
    - *popular*: return only the most popular results in the response.

- `tweet_mode`: in the default compatibility mode, the text is truncated to 140 characters. If 'extended', the full text (over 140 characters) is returned in `full_text`.

You can use these Cursor methods:
- ``.items(x)`` returns at most 'x' tweets;
- ``.pages(x)`` returns at most 'x' pages of results (each page usually holds a few dozen tweets).
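
To make the `q` parameter more concrete, here is a minimal sketch that builds the query string from a list of terms. The helper `build_query` is hypothetical. `-filter:retweets` is a standard Twitter search operator, but wrapping the `OR` group in parentheses is an assumption about how the search endpoint parses queries and should be checked against the search operator documentation.

```python
def build_query(terms, exclude_retweets=False):
    """Join search terms with OR into a single query string."""
    query = ' OR '.join(terms)
    if exclude_retweets:
        # Group the OR terms so the filter is meant to apply to all of them
        query = f'({query}) -filter:retweets'
    return query


print(build_query(['datascience', 'machinelearning', 'neuralnetwork']))
print(build_query(['datascience', 'machinelearning'], exclude_retweets=True))
```

The first call reproduces the kind of query used in this notebook; the second shows how retweets could be dropped on the server side instead of in Python.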



In [4]:
## Define parameters

# Twitter
QUERY1 = 'datascience OR machinelearning OR neuralnetwork'  # a single query string
ITEMS = 1000

# lists

RETWEETS = []
TWEETS = []

# The loop below collects tweets and retweets, up to ITEMS statuses in
# total. Note that the number of tweets and the number of retweets
# will each be at most ITEMS.

# We don't filter retweets in the query itself because the if statement
# inside the loop already separates tweets from retweets.

for tweet in tw.Cursor(api.search,
                       q=QUERY1,
                       lang='pt',
                       result_type='recent',
                       tweet_mode='extended'  # collect the full text (over 140 characters)
                       ).items(ITEMS):

    # One row per status: timestamp, user location, text, retweet count, hashtags
    row = [tweet.created_at,
           tweet.user.location,
           tweet.full_text.replace('\n', ' '),
           tweet.retweet_count,
           [e['text'] for e in tweet._json['entities']['hashtags']]]

    # A retweet carries a `retweeted_status` attribute and its text starts with 'RT @'
    if hasattr(tweet, 'retweeted_status') or tweet.full_text.startswith('RT @'):
        RETWEETS.append(row)
    else:
        TWEETS.append(row)
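
The separation logic can be tried without calling the API by running it on plain dictionaries shaped like `tweet._json`. The sketch below is an assumption-laden illustration: the helpers `to_row` and `split_statuses` are hypothetical, the two sample statuses are invented, and the retweet test (a `retweeted_status` key, or text starting with 'RT @') is slightly stricter than a plain substring check.

```python
def to_row(status):
    """Flatten one status dict into [timestamp, location, text, retweet count, hashtags]."""
    return [
        status['created_at'],
        status['user']['location'],
        status['full_text'].replace('\n', ' '),
        status['retweet_count'],
        [tag['text'] for tag in status['entities']['hashtags']],
    ]


def split_statuses(statuses):
    """Return (tweets, retweets) as two lists of rows."""
    tweets, retweets = [], []
    for status in statuses:
        is_retweet = ('retweeted_status' in status
                      or status['full_text'].startswith('RT @'))
        (retweets if is_retweet else tweets).append(to_row(status))
    return tweets, retweets


# Two hand-made statuses (invented sample data, not real API output)
sample = [
    {'created_at': '2020-06-11 16:52:34',
     'user': {'location': 'São Paulo'},
     'full_text': 'Original\ntweet #machinelearning',
     'retweet_count': 2,
     'entities': {'hashtags': [{'text': 'machinelearning'}]}},
    {'created_at': '2020-06-11 17:14:00',
     'user': {'location': ''},
     'full_text': 'RT @someone: Original tweet #machinelearning',
     'retweet_count': 2,
     'retweeted_status': {},
     'entities': {'hashtags': [{'text': 'machinelearning'}]}},
]

tweets, retweets = split_statuses(sample)
print(len(tweets), len(retweets))
```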

In [5]:
# Putting the tweets in a DataFrame for easier viewing

columns = ['created_at', 'location', 'tweet_text', 'retweet_count', 'hashtags']
df_tweets = pd.DataFrame(data=TWEETS, columns=columns)
df_retweets = pd.DataFrame(data=RETWEETS, columns=columns)

## Saving to .csv files (without the index column)
df_tweets.to_csv('tweets.csv', index=False)
df_retweets.to_csv('retweets.csv', index=False)
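
One thing to keep in mind when loading these files again: a list written to CSV is stored as its string representation, so the `hashtags` column comes back as text. The sketch below shows the effect and one way to undo it. It uses the standard `csv` module and an in-memory buffer instead of pandas and a real file so that it runs on its own, and the sample row is invented.

```python
import ast
import csv
import io

# A tiny CSV in the same layout as tweets.csv (invented sample row)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['created_at', 'location', 'tweet_text', 'retweet_count', 'hashtags'])
writer.writerow(['2020-06-11 16:52:34', 'São Paulo', 'Example text', 2,
                 ['machinelearning', 'AI']])

buffer.seek(0)
rows = list(csv.DictReader(buffer))

# After reading, every field is a string, including the list of hashtags
raw = rows[0]['hashtags']

# ast.literal_eval safely turns the string back into a Python list
hashtags = ast.literal_eval(raw)
print(type(raw).__name__, hashtags)
```

With pandas, the same conversion could be applied to the whole column after reading the file.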

In [6]:
df_tweets.head(10)

| | created_at | location | tweet_text | retweet_count | hashtags |
|---|---|---|---|---|---|
| 0 | 2020-06-11 20:05:40 | Brasil | A busca pela vantagem competitiva, novos conhe... | 0 | [Deloitte, DataScience] |
| 1 | 2020-06-11 20:00:01 | Brasil | A busca pela vantagem competitiva, por conheci... | 0 | [Deloitte, DataScience] |
| 2 | 2020-06-11 17:52:13 | São Paulo-SP | #vemprawayon #vagasti #cientistadedados #R #Py... | 0 | [vemprawayon, vagasti, cientistadedados, R, Py... |
| 3 | 2020-06-11 16:52:34 | São Paulo | Tudo menos isso. #machinelearning https://t.co... | 2 | [machinelearning] |
| 4 | 2020-06-11 16:09:06 | | Estamos ficando velho! 2017 - primeira conferê... | 0 | [tbt, botsbrasil, bots, machinelearning] |
| 5 | 2020-06-11 16:03:59 | Rio de Janeiro, Brasil | Pense na molecada preta voando em #Pandas, em ... | 0 | [Pandas, NumPy, MachineLearning, AI, Matplotli... |
| 6 | 2020-06-11 15:34:43 | | Facebook AI fez um modelo não supervisionado p... | 1 | [deeplearning, machinelearning] |
| 7 | 2020-06-11 15:34:16 | BH, MG, Brasil | As 10 tendências em dados e analytics para sup... | 1 | [COVID19, BigData, Analytics, AI, DataScience,... |
| 8 | 2020-06-11 15:29:49 | Belém-PA | Pouco espaço mas a gente da um jeito. Seguindo... | 1 | [DataScience, PythonProgramming, DataAnalytics... |
| 9 | 2020-06-11 15:15:23 | Somewhere over the Mid-Pacific | Existem oportunidades para estrangeiros trabal... | 0 | [datascience, Brasil] |


In [7]:
df_retweets.head(10)

| | created_at | location | tweet_text | retweet_count | hashtags |
|---|---|---|---|---|---|
| 0 | 2020-06-11 19:26:32 | | RT @ofelipeb: Pense na molecada preta voando e... | 0 | [Pandas, NumPy] |
| 1 | 2020-06-11 18:53:01 | São Paulo, Brasil | RT @DMSS_Software: De pesquisa de mercado até ... | 2 | [] |
| 2 | 2020-06-11 17:26:06 | Paris, France 🇫🇷 | RT @SaudenoBR: Tudo menos isso. #machinelearni... | 2 | [machinelearning] |
| 3 | 2020-06-11 17:14:00 | | RT @SaudenoBR: Tudo menos isso. #machinelearni... | 2 | [machinelearning] |
| 4 | 2020-06-11 16:36:50 | Benito Juárez | RT @marcusbhz: As 10 tendências em dados e ana... | 1 | [] |
| 5 | 2020-06-11 16:25:11 | Benito Juárez | RT @ric_med: Pouco espaço mas a gente da um je... | 1 | [DataScience] |
| 6 | 2020-06-11 15:54:45 | São Paulo | RT @IPTSP: O encontro acontece no dia 12 de ju... | 1 | [] |
| 7 | 2020-06-11 13:56:47 | | RT @marcusborba: Join me at #SAPPHIRENOW Regi... | 12 | [SAPPHIRENOW] |
| 8 | 2020-06-11 11:50:35 | Bedford | RT @marcusborba: Join me at #SAPPHIRENOW Regi... | 12 | [SAPPHIRENOW] |
| 9 | 2020-06-11 11:49:34 | | RT @marcusborba: Join me at #SAPPHIRENOW Regi... | 12 | [SAPPHIRENOW] |


That was a simple way to collect some information about (re)tweets.

Hope you enjoyed it.

**Timão Legal** :)
