<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Enterprise-setup" data-toc-modified-id="Enterprise-setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Enterprise setup</a></span></li><li><span><a href="#Premium-Setup" data-toc-modified-id="Premium-Setup-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Premium Setup</a></span></li><li><span><a href="#Fast-Way" data-toc-modified-id="Fast-Way-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Fast Way</a></span></li><li><span><a href="#Working-with-the-ResultStream" data-toc-modified-id="Working-with-the-ResultStream-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Working with the ResultStream</a></span></li><li><span><a href="#Counts-Endpoint" data-toc-modified-id="Counts-Endpoint-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Counts Endpoint</a></span></li><li><span><a href="#Dated-searches-/-Full-Archive-Search" data-toc-modified-id="Dated-searches-/-Full-Archive-Search-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Dated searches / Full Archive Search</a></span></li></ul></div>

Working with the API from a Python program is straightforward for both Premium and Enterprise clients.

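We'll assume that credentials are stored in a YAML file. The library's default location is `~/.twitter_keys.yaml`, but the cells below pass explicit paths under `../dependencies/` instead.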
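A wrong path only shows up later as a YAML parsing or account-type error, so it can help to check for the file first. Below is a minimal sketch using a hypothetical helper, `find_credentials_file` (not part of searchtweets). It returns the first readable candidate path, falls back to the default `~/.twitter_keys.yaml`, and returns `None` when nothing is found.

```python
import os
from typing import Iterable, Optional

# Default credentials location named in the text ("~" expands to the home directory).
DEFAULT_CREDENTIALS = os.path.expanduser("~/.twitter_keys.yaml")


def find_credentials_file(candidates: Iterable[str]) -> Optional[str]:
    """Return the first existing, readable path among candidates, then the default.

    Hypothetical helper for illustration; searchtweets does not ship it.
    Returns None when no file is found.
    """
    for path in [*candidates, DEFAULT_CREDENTIALS]:
        expanded = os.path.expanduser(path)
        if os.path.isfile(expanded) and os.access(expanded, os.R_OK):
            return expanded
    return None


# Example: check the enterprise path used in the next section before loading it.
print(find_credentials_file(["../dependencies/twitter_keys.yaml"]))
```

If it returns `None`, fix the path before calling `load_credentials`. Otherwise the library falls back to searching environment variables and may fail later with a less direct message about the account type.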

In [1]:
from searchtweets import ResultStream, gen_rule_payload, load_credentials

## Enterprise setup

In [2]:
enterprise_search_args = load_credentials("../dependencies/twitter_keys.yaml",
                                          yaml_key="search_tweets_enterprise",
                                          env_overwrite=False)


## Premium Setup


In [None]:
premium_search_args = load_credentials("../dependencies/search_tweets_creds.yaml",
                                       yaml_key="search_tweets_30_day_dev",
                                       env_overwrite=True)

The `gen_rule_payload` function formats search API rules into valid JSON queries. It has sensible defaults. It pulls more Tweets per call than the API default of 100; note that a sandbox environment allows at most 100 per call, so check this value if you get errors. It also omits dates, and it defaults to hourly buckets when using the Counts API. The finer points of writing search rules are out of scope for these examples. I encourage you to read the docs to learn the nuances, but for now let's see what a rule looks like.
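To show what kind of JSON a rule becomes, here is a minimal sketch of a payload builder. It is a hypothetical stand-in, not the library's code: `sketch_rule_payload` is not part of searchtweets. It assumes the premium/enterprise request fields `query`, `maxResults`, `fromDate`, `toDate`, and `bucket`, and it enforces the 100-per-call sandbox cap only when `sandbox=True`.

```python
import json
from typing import Optional

SANDBOX_MAX_RESULTS_PER_CALL = 100  # sandbox cap described in the text


def sketch_rule_payload(rule: str,
                        results_per_call: Optional[int] = None,
                        from_date: Optional[str] = None,
                        to_date: Optional[str] = None,
                        count_bucket: Optional[str] = None,
                        sandbox: bool = False) -> str:
    """Hypothetical stand-in for gen_rule_payload (illustration only).

    Assumes the request fields query, maxResults, fromDate, toDate and bucket,
    and that dates are already in YYYYmmDDHHMM form.
    """
    # Collapse newlines and repeated spaces so multi-line rules stay valid.
    payload = {"query": " ".join(rule.split())}
    if count_bucket is not None:
        if count_bucket not in ("minute", "hour", "day"):
            raise ValueError(f"invalid count bucket: {count_bucket!r}")
        # Assumption: counts requests send a bucket instead of maxResults.
        payload["bucket"] = count_bucket
    elif results_per_call is not None:
        if sandbox and results_per_call > SANDBOX_MAX_RESULTS_PER_CALL:
            raise ValueError("sandbox environments allow at most 100 results per call")
        payload["maxResults"] = results_per_call
    if from_date:
        payload["fromDate"] = from_date
    if to_date:
        payload["toDate"] = to_date
    return json.dumps(payload)


print(sketch_rule_payload("#womenintech OR\n  #womaninstem",
                          results_per_call=100, to_date="202008180919"))
# {"query": "#womenintech OR #womaninstem", "maxResults": 100, "toDate": "202008180919"}
```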

In [None]:
input_rule = input("Enter Boolean search ")
rule = gen_rule_payload(input_rule, results_per_call=100) # testing with a sandbox account
print(rule)

From here, there are two ways to interact with the API. The first is a quick method that collects smaller amounts of Tweets into memory and requires less thought and knowledge. The second is the `ResultStream` object, which is introduced later.


## Fast Way

We'll use the search args loaded above (`premium_search_args` or `enterprise_search_args`) to configure access to the API. Collection also takes a valid PowerTrack rule and has options to cut off the search once limits on the number of Tweets or API calls are reached.

We'll be using the `collect_results` function, which has three parameters.

- `rule`: a valid PowerTrack rule payload, as produced by `gen_rule_payload` earlier
- `max_results`: the library handles pagination for us and stops collecting once it reaches this number of Tweets
- `result_stream_args`: the configuration args we've already loaded
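The interaction between `max_results` and the per-call page size determines how many API calls a collection uses. A small hypothetical helper, `estimate_requests` (not part of searchtweets), makes the arithmetic explicit. It assumes every page comes back full. Partial pages can require more calls, and a query that runs out of matches needs fewer.

```python
import math
from typing import Optional


def estimate_requests(max_results: int, results_per_call: int,
                      max_pages: Optional[int] = None) -> int:
    """API calls needed to reach max_results, assuming every page is full.

    Hypothetical helper; max_pages, when given, caps the result.
    """
    if max_results <= 0 or results_per_call <= 0:
        raise ValueError("max_results and results_per_call must be positive")
    pages = math.ceil(max_results / results_per_call)
    return pages if max_pages is None else min(pages, max_pages)


# 2,000 Tweets at 100 per call (the sandbox settings used below) -> 20 requests
print(estimate_requests(2000, 100))  # 20
```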


For the remaining examples, please change the args to either premium or enterprise depending on your usage.

Let's see how it goes:

In [None]:
from searchtweets import collect_results

In [None]:
rule = gen_rule_payload("#womaninstem OR #womenindatascience OR #womenintech",
                        results_per_call=100,   # sandbox maximum
                        to_date="202008180919")  # YYYYmmDDHHMM, UTC
print(rule)

In [None]:
tweets = collect_results(rule,
                         max_results=2000,
                         result_stream_args=premium_search_args) # change this if you need to

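By default, Tweet payloads are lazily parsed into a `Tweet` [object](https://twitterdev.github.io/tweet_parser/), which makes an overwhelming number of Tweet attributes available directly (for example `tweet.all_text`, used further below). First, we keep a deep copy of this batch and save it to a JSON file: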

In [None]:
import copy
tweets_first_2000 = copy.deepcopy(tweets)

In [None]:
import json

# save the collected Tweets; the filename matches the to_date used in the rule
with open('202008180919.json', 'w') as outfile:
    json.dump(tweets, outfile)
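Reading the saved file back only takes a plain `json.load`. Below is a minimal sketch in which two made-up Tweet-like dicts and a temporary directory stand in for the real data and filename. The items come back as ordinary dicts, so the lazily parsed Tweet attributes are not available on them.

```python
import json
import os
import tempfile

# Illustrative stand-in data; real files hold full Tweet payloads.
sample = [{"id_str": "1", "text": "first"}, {"id_str": "2", "text": "second"}]

path = os.path.join(tempfile.mkdtemp(), "tweets.json")
with open(path, "w") as outfile:
    json.dump(sample, outfile)

# Reload: json.load returns a list of plain dicts.
with open(path) as infile:
    reloaded = json.load(infile)

print(len(reloaded), type(reloaded[0]).__name__)  # 2 dict
```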

Voilà, we have some Tweets. For interactive environments, and other cases where you don't mind collecting your data in a single load or don't need to operate on the stream of Tweets or counts directly, I recommend this convenience function.


## Working with the ResultStream

The `ResultStream` object is configured with the search args. It takes the rule payload and other parameters, including a hard stop on the number of pages (`max_pages`) to limit your API call usage.

In [None]:
rs = ResultStream(rule_payload=rule,
                  max_results=500,
                  max_pages=1,
                  **premium_search_args)

print(rs)

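The `.stream` method seamlessly handles requests and pagination for a given query. It returns a generator, so we wrap it in `list` to collect the Tweets matching our rule. Note that `max_pages` caps the number of requests, so with 100 results per call this can return far fewer Tweets than the 500 set by `max_results`.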

In [None]:
tweets = list(rs.stream())
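The behaviour of `.stream` (a lazy generator that paginates and respects both the `max_results` and `max_pages` caps) can be sketched with the standard library. This is not searchtweets' `ResultStream` code. `stream_results` and `fake_fetch` are hypothetical, and the fake fetcher serves three pages of 100 made-up items linked by "next" tokens instead of calling the API.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher: takes the previous "next" token (None on the first
# request) and returns (items, next_token); next_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def stream_results(fetch_page: FetchPage, max_results: int,
                   max_pages: int) -> Iterator[dict]:
    """Yield items lazily, stopping at max_results items or max_pages requests."""
    yielded, pages, token = 0, 0, None
    while pages < max_pages:
        items, token = fetch_page(token)
        pages += 1
        for item in items:
            if yielded >= max_results:
                return
            yield item
            yielded += 1
        # Stop without another request once the cap is hit or pages run out.
        if yielded >= max_results or token is None:
            return


def fake_fetch(token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Serve three pages of 100 made-up items, chained by string tokens."""
    page = 0 if token is None else int(token)
    items = [{"id": page * 100 + i} for i in range(100)]
    next_token = str(page + 1) if page < 2 else None
    return items, next_token


print(len(list(stream_results(fake_fetch, max_results=500, max_pages=1))))  # 100
print(len(list(stream_results(fake_fetch, max_results=150, max_pages=5))))  # 150
```

Because it is a generator, a caller that stops iterating early also stops further requests. Wrapping it in `list`, as the cell above does with `rs.stream()`, drains it up to the caps.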

Tweets are lazily parsed using Twitter's [Tweet Parser](https://twitterdev.github.io/tweet_parser/), so Tweet data is easy to extract.

In [None]:
# print the full text of the first 10 Tweets
for tweet in tweets[:10]:
    print(tweet.all_text)
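To make "lazily parsed" concrete, here is a minimal sketch of the idea. It is not the tweet_parser implementation. A hypothetical `LazyTweet` dict subclass computes an attribute from the raw payload on first access and caches it. The sketch assumes the v1.1 payload fields `text` and `extended_tweet.full_text`. Because the object is still a dict, `json.dumps` serializes only its raw payload. This mirrors why the earlier save cell works, assuming (as in tweet_parser) that `Tweet` subclasses `dict`.

```python
import json
from functools import cached_property


class LazyTweet(dict):
    """Hypothetical dict subclass illustrating lazy, cached attribute parsing."""

    @cached_property
    def text(self) -> str:
        # Computed on first access, then stored on the instance (not in the dict).
        # Prefers the untruncated text when the payload carries one.
        extended = self.get("extended_tweet", {})
        return extended.get("full_text", self.get("text", ""))


tweet = LazyTweet({"id_str": "1", "text": "short...",
                   "extended_tweet": {"full_text": "the full text"}})
print(tweet.text)         # the full text
print(json.dumps(tweet))  # only the raw payload; the cached value is not included
```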

## Counts Endpoint

We can also use the Search API Counts endpoint to get counts of Tweets that match our rule. Each request will return up to *30* results, and each count request can be done on a minutely, hourly, or daily basis. The underlying `ResultStream` object will handle converting your endpoint to the count endpoint, and you have to specify the `count_bucket` argument when making a rule to use it.

The process is very similar to grabbing Tweets. The main differences are that the rule is built with `count_bucket` and that the results are counts rather than Tweets.


_Caveat - premium sandbox environments do NOT have access to the Search API counts endpoint._
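The bucket size only decides how timestamps are grouped. Below is a local sketch of that grouping, which the Counts endpoint performs on the server. It does not call the API: `bucket_counts` is a hypothetical helper that assumes timestamps are already parsed as UTC `datetime` objects, and it labels periods in a `YYYYmmDDHHMM`-style format.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Truncation format per bucket; the literal zeros blank out finer fields.
BUCKET_FORMATS = {
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H00",
    "day": "%Y%m%d0000",
}


def bucket_counts(timestamps: Iterable[datetime], bucket: str) -> Dict[str, int]:
    """Count timestamps per minute, hour, or day (hypothetical local helper)."""
    if bucket not in BUCKET_FORMATS:
        raise ValueError(f"bucket must be one of {sorted(BUCKET_FORMATS)}")
    fmt = BUCKET_FORMATS[bucket]
    return dict(Counter(ts.strftime(fmt) for ts in timestamps))


times = [datetime(2017, 9, 1, 12, 5), datetime(2017, 9, 1, 12, 40),
         datetime(2017, 9, 2, 8, 0)]
print(bucket_counts(times, "day"))  # {'201709010000': 2, '201709020000': 1}
```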

In [None]:
count_rule = gen_rule_payload("beyonce", count_bucket="day")

counts = collect_results(count_rule, result_stream_args=enterprise_search_args)

The results are a list of per-period records, each with a `timePeriod` and a `count`, so they can be used right away.

In [None]:
counts
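Since each record pairs a period label with a count, summarizing them takes only a few lines. Below is a minimal sketch over made-up records. It assumes the `timePeriod`/`count` keys and the `YYYYmmDDHHMM` period format. `summarize_counts` is a hypothetical helper, and the sample is named `sample_counts` so it does not overwrite the real `counts` in the notebook.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Made-up records shaped like counts results (assumed keys: timePeriod, count).
sample_counts = [
    {"timePeriod": "201709030000", "count": 12},
    {"timePeriod": "201709010000", "count": 30},
    {"timePeriod": "201709020000", "count": 7},
]


def summarize_counts(records: List[Dict]) -> Tuple[int, datetime]:
    """Return the total count and the UTC start of the busiest period.

    Raises ValueError on an empty list (max() of an empty sequence).
    """
    total = sum(r["count"] for r in records)
    busiest = max(records, key=lambda r: r["count"])
    start = datetime.strptime(busiest["timePeriod"], "%Y%m%d%H%M")
    return total, start.replace(tzinfo=timezone.utc)


total, peak = summarize_counts(sample_counts)
print(total, peak.isoformat())  # 49 2017-09-01T00:00:00+00:00
```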

## Dated searches / Full Archive Search

**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details.

Let's make a new rule and pass it dates this time.

`gen_rule_payload` takes timestamps of the following forms:


- `YYYYmmDDHHMM`
- `YYYY-mm-DD` (which converts to midnight UTC, 00:00)
- `YYYY-mm-DD HH:MM`
- `YYYY-mm-DDTHH:MM`

Note: all Tweet timestamps are in UTC.
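As a rough picture of how those four forms can be reduced to one, here is a sketch that normalizes each to `YYYYmmDDHHMM`. `to_api_timestamp` is hypothetical and not the library's own conversion code. It treats every input as UTC and rejects anything outside the listed forms.

```python
from datetime import datetime

# The four accepted input forms, tried in order.
ACCEPTED_FORMATS = ("%Y%m%d%H%M", "%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M")


def to_api_timestamp(value: str) -> str:
    """Normalize a date string to YYYYmmDDHHMM (hypothetical; times taken as UTC)."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this form; try the next one
        return parsed.strftime("%Y%m%d%H%M")
    raise ValueError(f"unrecognized timestamp: {value!r}")


print(to_api_timestamp("2017-09-01"))        # 201709010000 (midnight)
print(to_api_timestamp("2017-10-30T12:30"))  # 201710301230
```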

In [None]:
rule = gen_rule_payload("from:jack",
                        from_date="2017-09-01",  # UTC 2017-09-01 00:00
                        to_date="2017-10-30",  # UTC 2017-10-30 00:00
                        results_per_call=500)
print(rule)

In [None]:
tweets = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

In [None]:
for tweet in tweets[:10]:
    print(tweet.all_text)