# **Tweepy for Twitter API**

Twitter is one of the most widely used social networks in the world, and companies of almost every kind have an account. Through it they can measure how people engage with their products and services and gauge how people feel about them. With that information, companies can adjust their strategies to improve what they deliver.

[Tweepy](http://docs.tweepy.org/en/latest/) is an open-source Python package that offers an easy way to connect to the Twitter API to collect information, run analyses and automate tasks.

The Twitter API limits how often it can be called within a 15-minute window. Once that limit is exceeded, we have to wait for the window to reset (up to 15 minutes) before using the API again.

*Note: we're going to use Public Mode in our procedures.*

### **What is the objective here?**

* Collect tweets and retweets.
    * timestamp
    * location of user
    * (re)tweet text
    * retweet count
    * hashtags

## Packages

To install the Tweepy package:

```
pip install tweepy
```

Alternatively, install directly from the GitHub repository:

```
pip install git+https://github.com/tweepy/tweepy.git
```


In [2]:
import tweepy as tw
import pandas as pd

**AUTHENTICATION**

``Private Mode`` - It requires the *consumer key*, *consumer secret key*, *access token* and *access token secret*. Use it when, for example, you want to do from code almost everything you can do on the website: if you want to tweet or retweet something, you can; if you want to run a bot account, you can; and so on.


``Public Mode`` - It requires only the *consumer key* and *consumer secret key*. The user can access only public information.

In [6]:
## Keys reading

# The keys/tokens are stored in a plain-text file to keep them out of the notebook.
# HERE WE'RE NOT GOING TO USE THE ACCESS TOKEN / TOKEN SECRET

with open('.kt/twtk.txt', 'r') as file:
    CONSUMER_KEY = file.readline().strip('\n')
    CONSUMER_SECRET_KEY = file.readline().strip('\n')

## Connect the consumer key

auth = tw.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET_KEY)  # Used in both Private and Public Modes
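
The key-reading step above can also be wrapped in a small helper that fails early when a line is missing. This is only a sketch: the function name `read_consumer_keys` and the placeholder file contents are assumptions made for illustration, and only the two-line file layout comes from the cell above.

```python
import os
import tempfile


def read_consumer_keys(path):
    """Return (consumer_key, consumer_secret_key) from a two-line text file."""
    with open(path, "r") as file:
        lines = [line.strip() for line in file.readlines()[:2]]
    # Refuse to continue if either line is absent or blank.
    if len(lines) < 2 or not all(lines):
        raise ValueError(f"expected two non-empty lines in {path}")
    return lines[0], lines[1]


# Demo with a throwaway file holding placeholder values (not real keys).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("my-consumer-key\nmy-consumer-secret\n")
    demo_path = tmp.name

key, secret = read_consumer_keys(demo_path)
os.remove(demo_path)
```

In the notebook, the same helper would be called with `'.kt/twtk.txt'` and its result passed to `tw.OAuthHandler`.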

**CONNECTION TO TWITTER API**

We pass ``auth`` to connect to the API. Here are some parameters, among others, worth checking:

* ``wait_on_rate_limit``: if `True`, the connection is kept when the rate limit is exceeded and Tweepy waits until the API accepts requests again. If `False`, the call fails with a rate-limit error instead of waiting.
* ``wait_on_rate_limit_notify``: if `True`, prints a notice when the limit is exceeded and Tweepy is waiting for the rate limit to replenish. This parameter exists in Tweepy 3.x and was removed in Tweepy 4.0.
* ``timeout``: the maximum amount of time (in seconds) to wait for a response from Twitter.

In [7]:
# Create the API object
api = tw.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, timeout=60)
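
To make the waiting behaviour more concrete, here is a minimal sketch of the arithmetic behind it. It is not Tweepy's actual implementation. It assumes the reset time arrives as a Unix timestamp, which is how Twitter reports it in the `x-rate-limit-reset` response header; the function name `seconds_until_reset` and the sample timestamp are hypothetical.

```python
import time


def seconds_until_reset(reset_epoch, now=None):
    """Seconds to sleep before the rate-limit window reopens (never negative)."""
    if now is None:
        now = time.time()
    return max(0, reset_epoch - now)


# A window that reopens 15 minutes after a hypothetical "now".
now = 1_600_000_000
wait = seconds_until_reset(now + 15 * 60, now=now)

# A reset time already in the past means no waiting at all.
no_wait = seconds_until_reset(now - 10, now=now)
```

With `wait_on_rate_limit=True`, Tweepy handles this sleep for us, so the sketch is only meant to show where the "up to 15 minutes" comes from.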

**COLLECT TWEETS**

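- `api.search`: returns a collection of relevant tweets matching a specified query (in Tweepy 4.0 this method was renamed `api.search_tweets`);
- `q`: the search query string, made of one or more words and, optionally, search operators;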
- `lang`: language given by an ISO 639-1 code;
- `result_type`:

    - *mixed*: include both popular and real time results in the response;
    - *recent*: return only the most recent results in the response;
    - *popular*: return only the most popular results in the response.
    
- `tweet_mode`: with 'compatibility', the text is truncated to 140 characters; with 'extended', the full text is returned, even when it is longer than 140 characters.

You can use the following Cursor methods:
- ``.items(x)`` returns up to 'x' tweets;
- ``.pages(x)`` returns up to 'x' pages (each page usually holds up to a few dozen items).
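
The difference between pages and items can be sketched offline with plain generators. This is a stand-in for illustration, not Tweepy code: `fetch_pages` and `iter_items` are hypothetical names, the "tweets" are just integers, and the page size of 15 is an arbitrary example value.

```python
from itertools import islice


def fetch_pages(total_items, page_size):
    """Yield lists of fake tweet ids, one list per page."""
    for start in range(0, total_items, page_size):
        yield list(range(start, min(start + page_size, total_items)))


def iter_items(pages):
    """Flatten a stream of pages into a stream of single items."""
    for page in pages:
        yield from page


# Analogous to .pages(2): two batches, each a list of items.
first_two_pages = list(islice(fetch_pages(50, 15), 2))

# Analogous to .items(20): twenty single items, crossing a page boundary.
first_twenty_items = list(islice(iter_items(fetch_pages(50, 15)), 20))
```

Asking for items hides the page boundaries, which is why the collection loop in this notebook only needs to set the total number of tweets.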



In [43]:
## Define parameters

# Twitter
QUERY1 = 'covid'  # append ' -filter:retweets' to the string to leave retweets out
ITEMS = 500

# lists

TWEETS = []

# The loop below collects tweets and retweets, up to the number
# defined in ITEMS.

for tweet in tw.Cursor(api.search,
                       q=QUERY1,
                       lang='pt',
                       result_type='recent',
                       tweet_mode='extended'  # collect the full text (over 140 characters)
                       ).items(ITEMS):
    # One row per tweet: id, timestamp, user location, text, retweet count, hashtags
    TWEETS.append([tweet.id,
                   tweet.created_at,
                   tweet.user.location,
                   tweet.full_text.replace('\n', ' '),
                   tweet.retweet_count,
                   [e['text'] for e in tweet.entities['hashtags']]])
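
One detail worth knowing when retweets are collected: even in extended mode, the `full_text` of a retweet starts with `RT @user:` and may be cut short, while the complete original text sits under `retweeted_status`. The helper below is a sketch under that assumption. `get_full_text` is a hypothetical name, and `SimpleNamespace` objects stand in for real Tweepy statuses so the example runs offline.

```python
from types import SimpleNamespace


def get_full_text(status):
    """Return the complete text of a tweet or of the tweet it retweets."""
    original = getattr(status, "retweeted_status", None)
    text = original.full_text if original is not None else status.full_text
    # Same clean-up as in the collection loop: keep each tweet on one line.
    return text.replace("\n", " ")


# Stand-ins for Tweepy statuses (made-up text, for illustration only).
plain = SimpleNamespace(full_text="first line\nsecond line")
retweet = SimpleNamespace(
    full_text="RT @someone: a long tweet that gets cut...",
    retweeted_status=SimpleNamespace(full_text="a long tweet that gets cut off here"),
)

plain_text = get_full_text(plain)
retweet_text = get_full_text(retweet)
```

In the loop above, `get_full_text(tweet)` could replace `tweet.full_text.replace('\n', ' ')` if the untruncated retweet text is wanted.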

In [35]:
# Put the tweets in a DataFrame for easier viewing

df_tweets = pd.DataFrame(data=TWEETS, columns=['id', 'created_at', 'location', 'tweet_text', 'retweet_count', 'hashtags'])

## Save to a .csv file
# df_tweets.to_csv('tweets.csv', index=False)

In [45]:
df_tweets

| | id | created_at | location | tweet_text | retweet_count | hashtags |
|---|---|---|---|---|---|---|
| 0 | 1275769428182020097 | 2020-06-24 12:35:08 | | RT @secomvc: O Brasil já chegou a mais de 613 ... | 85 | [] |
| 1 | 1275769426965602304 | 2020-06-24 12:35:08 | Governador Valadares, Brasil | Minha tia é burra ! Confirmado que tá com covi... | 0 | [] |
| 2 | 1275769425057234947 | 2020-06-24 12:35:08 | Curitiba | Imagina 3 clientes desmarcando, pois as mesmas... | 0 | [] |
| 3 | 1275769424914657280 | 2020-06-24 12:35:08 | Chapecó | ✅ Número de casos de Covid-19, em Chapecó, seg... | 0 | [] |
| 4 | 1275769420950929408 | 2020-06-24 12:35:07 | | RT @michellebaessoo: 2Crônicas 7:13-14 Faz mui... | 4425 | [apocalipse, COVID__19] |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | 1275767287136628736 | 2020-06-24 12:26:38 | | RT @majorolimpio: Como não se emocionar junto?... | 149 | [] |
| 996 | 1275767284309676033 | 2020-06-24 12:26:37 | Rio de Janeiro, Brasil | O #COVID__19 tem vários aliados no Brasil. A c... | 0 | [COVID__19] |
| 997 | 1275767283563069440 | 2020-06-24 12:26:37 | | RT @michellebaessoo: 2Crônicas 7:13-14 Faz mui... | 4432 | [apocalipse, COVID__19] |
| 998 | 1275767282288001026 | 2020-06-24 12:26:37 | Santo Antônio do Descoberto, B | RT @JoseMedeirosMT: Segundo a Revista Ceará, e... | 119 | [] |
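
Once the rows are collected, the hashtag lists can be tallied. The sketch below counts hashtags with the standard library. `count_hashtags` is a hypothetical helper, and the three sample rows are made-up placeholders that only mimic the layout of the `TWEETS` rows (hashtag list in the last position).

```python
from collections import Counter


def count_hashtags(rows):
    """Count hashtags across rows whose last element is a list of hashtags."""
    counts = Counter()
    for row in rows:
        # Lower-case so that '#Covid' and '#covid' are counted together.
        counts.update(tag.lower() for tag in row[-1])
    return counts


# Placeholder rows: id, created_at, location, tweet_text, retweet_count, hashtags
sample_rows = [
    [1, "2020-06-24 12:35:08", "", "text a", 85, []],
    [2, "2020-06-24 12:35:07", "", "text b", 4425, ["apocalipse", "COVID__19"]],
    [3, "2020-06-24 12:26:37", "Rio de Janeiro, Brasil", "text c", 0, ["COVID__19"]],
]

hashtag_counts = count_hashtags(sample_rows)
top_hashtag = hashtag_counts.most_common(1)[0]
```

Calling `count_hashtags(TWEETS)` on the real list would give the same kind of tally for the collected tweets.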


That was a simple way to collect some information about tweets and retweets.

Hope you enjoyed it.

**Timão Legal** :)
