# Searching Tweets using Twitter's official v2 API in Python 
Copyleft 2021 Forrest Sheng Bao 

There are tons of libraries that let you search Twitter, like DocNow's Twarc. But why bother with them when Twitter has an official one for Python? 

Of course, you could also reinvent this library easily. Just query Twitter's RESTful API directly with `requests` or `urllib`, and you get a JSON string back as the response. 
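For comparison, here is a minimal sketch of the do-it-yourself route using only the standard library's `urllib`. It assumes the v2 recent-search endpoint `https://api.twitter.com/2/tweets/search/recent` (the one documented in the Twitter API v2 reference linked further down) and bearer-token authentication. `build_search_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Twitter API v2 "recent search" endpoint (Tweets from the last 7 days)
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(bearer_token: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request for the v2 recent-search endpoint.

    build_search_request is a hypothetical helper, not part of searchtweets.
    """
    # Query parameters such as "query" and "tweet.fields" are URL-encoded
    url = RECENT_SEARCH_URL + "?" + urllib.parse.urlencode(params)
    # App-only authentication: the bearer token goes in the Authorization header
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_search_request("YOUR_BEARER_TOKEN",
                           {"query": "emnlp", "tweet.fields": "lang,created_at"})
print(req.full_url)
# Sending it with urllib.request.urlopen(req) would return a JSON body
```

Everything beyond building the URL and header (paging, rate limits, retries) is what the official library handles for you.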

## Installation

Install from the `v2` branch:

```shell
pip3 install git+https://github.com/twitterdev/search-tweets-python.git@v2
```


## First, load the module

```python
import searchtweets  # Twitter's official search client
```

## Then, load the search arguments, including tokens from a YAML file 

The YAML file should be in this format (note the indentation):

```yaml
search_tweets_v2:
    consumer_key: "a string"
    consumer_secret: "a string"
    bearer_token: "a string"
```

The top-level key `search_tweets_v2` lets you keep several sets of credentials in one file and easily switch between different endpoints, e.g., free vs. premium vs. academic. 
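Conceptually, the YAML file parses into a nested dictionary, and `yaml_key` picks one top-level section of it. The sketch below mimics that selection with a plain dict so that it runs without PyYAML or searchtweets. The second section name `search_tweets_v2_academic` and the helper `pick_credentials` are hypothetical, and this is not how `load_credentials` is actually implemented.

```python
# What a credentials file with two sections might look like once parsed
# (placeholder values; "search_tweets_v2_academic" is a hypothetical key)
parsed_yaml = {
    "search_tweets_v2": {
        "consumer_key": "a string",
        "consumer_secret": "a string",
        "bearer_token": "a string",
    },
    "search_tweets_v2_academic": {
        "consumer_key": "another string",
        "consumer_secret": "another string",
        "bearer_token": "another string",
    },
}


def pick_credentials(parsed: dict, yaml_key: str) -> dict:
    """Return the credential section named by yaml_key (hypothetical helper)."""
    try:
        return parsed[yaml_key]
    except KeyError:
        raise KeyError(f"no section {yaml_key!r} in credentials file") from None


creds = pick_credentials(parsed_yaml, "search_tweets_v2_academic")
print(creds["bearer_token"])  # another string
```

With the real library the idea is the same: pass a different `yaml_key` to `load_credentials`.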

```python
# env_overwrite=False: keep the YAML values even if matching environment variables are set
search_args = searchtweets.load_credentials(filename="credentials.yaml",
                                            yaml_key="search_tweets_v2",
                                            env_overwrite=False)
```

## Finally, the real game: search

There are two ways to search. Here I show only the simpler one, which returns the parsed JSON response as nested Python lists and dicts, rather than the fancier way using the `ResultStream` class. 

The simpler way is done using [the `collect_results` function, which has only three arguments](https://github.com/twitterdev/search-tweets-python/blob/8883da7f4bea281c7d0df2dc58c1ec5c0fe28e04/searchtweets/result_stream.py#L439): 
1. the query, a dictionary whose keys are defined [in the Twitter API v2 doc here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent) and whose values have the corresponding types;
2. the maximum number of Tweets to return (`max_tweets`);
3. the search arguments (`result_stream_args`), which were just loaded above from the YAML file.

For the `query` argument, the key `query` is mandatory. In the example below, we also set the optional key `tweet.fields` to additionally ask for the language and creation time of each Tweet. 
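Because `tweet.fields` takes a single comma-separated string, it can be convenient to keep the field names in a list and join them. Here is a minimal sketch with a hypothetical helper `build_query`. The operators `lang:en` and `-is:retweet` in the second call are standard v2 search operators, included only to show that the `query` string itself can narrow the search.

```python
from typing import Dict, List


def build_query(terms: str, fields: List[str]) -> Dict[str, str]:
    """Build the query dict passed to collect_results (hypothetical helper)."""
    return {
        "query": terms,                    # mandatory: the search expression
        "tweet.fields": ",".join(fields),  # optional: extra fields, comma-separated
    }


query = build_query("emnlp", ["lang", "created_at"])
print(query)  # {'query': 'emnlp', 'tweet.fields': 'lang,created_at'}

# The search expression can also use v2 operators, e.g. English-only, no retweets
query_en = build_query("emnlp lang:en -is:retweet", ["lang", "created_at"])
```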

```python
query = {"query": "emnlp",                    # search emnlp on Twitter
         "tweet.fields": "lang,created_at"}  # comma-separated

tweets = searchtweets.collect_results(query,
                                      max_tweets=10,
                                      result_stream_args=search_args)

tweets[0]["data"][0]
```

Output:

```
{'created_at': '2021-12-06T20:15:56.000Z',
 'id': '1467950959187968005',
 'text': 'Congratulations to Prof. Siva Reddy (@sivareddyg) and his postdoc Edoardo Ponti(@PontiEdoardo) who won the best paper award at EMNLP 2021, for their paper Visually Grounded Reasoning across Languages and Cultures: https://t.co/JDjutIpUS7',
 'lang': 'en'}
```

Now, let's look at the result. 
In this example, `collect_results` returns a list with a single element: a dict with two keys, `'data'` and `'meta'`, the first of which holds the Tweets matching your query. 
`tweets[0]["data"]` is a list of dictionaries, each of which looks like the one printed above.
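Since the Tweets sit one level down, under `'data'` in each element of the returned list, a small loop can flatten them into a single list. This is a minimal sketch: `flatten_tweets` is a hypothetical helper, and `sample` is a hand-built stand-in for the `tweets` variable above. It reuses the one Tweet shown (text shortened), and its `'meta'` contents are placeholders. The `.get("data", [])` guards against an element with no `'data'` key, which we assume could occur when nothing matches.

```python
from typing import Dict, List

# Hand-built stand-in for the list returned by collect_results:
# one element, a dict with 'data' (Tweet dicts) and 'meta' (placeholder)
sample = [
    {
        "data": [
            {
                "created_at": "2021-12-06T20:15:56.000Z",
                "id": "1467950959187968005",
                "text": "Congratulations to Prof. Siva Reddy ...",  # shortened
                "lang": "en",
            }
        ],
        "meta": {"result_count": 1},
    }
]


def flatten_tweets(pages: List[Dict]) -> List[Dict]:
    """Collect the Tweet dicts from every element's 'data' list (hypothetical helper)."""
    flat = []
    for page in pages:
        # .get so that an element without 'data' contributes nothing
        flat.extend(page.get("data", []))
    return flat


all_tweets = flatten_tweets(sample)
for t in all_tweets:
    print(t["id"], t["text"][:40])
```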

By default, each Tweet in the response has two fields, `'id'` and `'text'`. The two additional fields, `lang` and `created_at`, are there because `"tweet.fields":"lang,created_at"` in our query asks for the language and creation time of Tweets. Request more fields in the query and you get correspondingly more fields in the result. 
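Because `lang` and `created_at` only appear when requested, code that consumes the results should not assume they are present. The minimal sketch below uses hypothetical helpers `parse_created_at` and `to_row`. It reads the optional fields with `.get` and turns the `created_at` string into a timezone-aware `datetime`. The format string assumes the `YYYY-MM-DDTHH:MM:SS.fffZ` shape seen in the output above.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def parse_created_at(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp like '2021-12-06T20:15:56.000Z' into a UTC datetime."""
    if value is None:
        return None
    # strptime matches the trailing 'Z' literally, so attach the UTC tzinfo ourselves
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def to_row(tweet: Dict) -> Dict:
    """Flatten one Tweet dict, tolerating missing optional fields (hypothetical helper)."""
    return {
        "id": tweet["id"],                                        # always present
        "text": tweet["text"],                                    # always present
        "lang": tweet.get("lang"),                                # only if requested
        "created_at": parse_created_at(tweet.get("created_at")),  # only if requested
    }


row = to_row({"created_at": "2021-12-06T20:15:56.000Z",
              "id": "1467950959187968005",
              "text": "Congratulations to Prof. Siva Reddy ...",
              "lang": "en"})
print(row["created_at"].isoformat())  # 2021-12-06T20:15:56+00:00
```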


## The end. It's that simple. 