# Twitter API Quotas: V1 vs V2

This notebook is a scratch space to explore how different the Twitter v1 and v2 quota limits are, specifically for the Search and Filter Stream API endpoints.

* v1: https://web.archive.org/web/20210326105531/https://developer.twitter.com/en/docs/twitter-api/v1/rate-limits
* v2: https://web.archive.org/web/20210326110654/https://developer.twitter.com/en/docs/twitter-api/rate-limits

An important difference between these two versions is how the API implements and enforces rate limits with respect to the *Application* fetching the data and the *User* who is fetching the data. These are concepts from the [OAuth](https://en.wikipedia.org/wiki/OAuth) standard which

> enables a third-party application to obtain limited access to an HTTP service, 
> either on behalf of a resource owner by orchestrating an approval interaction
> between the resource owner and the HTTP service, or by allowing the
> third-party application to obtain access on its own behalf.
>
> [RFC 6749](https://tools.ietf.org/html/rfc6749)

The distinction here is subtle but important. Twitter uses its [developer portal](https://developer.twitter.com/en) to distribute keys that identify third-party applications that can access the API, as well as users of those applications. The API limits how many requests can be made to its search and filter stream endpoints in a given time period, for both applications and users.

## Stream

Twitter's v1 streaming endpoint allows tweets to be collected in real time by doing an HTTP POST to a URL and getting back a stream of tweets that match the requested query.

With v1 of the Twitter API a user is allowed to open two filter streams at a time. This is true for each of the multiple users of a single application. These streams may be rate limited when the volume of tweets gets too high. Anecdotally, rate limiting happens when the number of tweets being streamed exceeds 1% of the total number of tweets being created. But that legend started some years ago, and it's not exactly clear what the limit is. The maximum at the moment seems to be around 70 tweets per second.

In Twitter's v2 API only applications are able to collect tweets from the filtered stream. Users are not able to connect through an application as themselves. Furthermore, only one stream can be collected at a time. That stream is managed through a few other endpoints that let you add and remove queries. The stream is also constrained by a per-application monthly quota, which is 500,000 tweets for standard applications and 10,000,000 tweets for applications that have been designated as part of the Academic Research track. Enterprise users who pay Twitter money have these quotas written into their contract, presumably based on how much money they are paying.
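
One way to get a feel for these quotas is to ask how long a busy stream could run before using them up. The following is a rough sketch, not a statement of how Twitter meters the stream: it assumes a sustained 70 tweets per second (the anecdotal ceiling mentioned above), and the names `MONTHLY_QUOTAS` and `hours_until_quota` are made up for illustration.

```python
# Assumption: 70 tweets/second is the anecdotal v1 ceiling from the text,
# reused here as a worst-case delivery rate for a v2 stream.
TWEETS_PER_SECOND = 70

# Monthly per-application quotas quoted in the text.
MONTHLY_QUOTAS = {"Standard": 500_000, "Academic": 10_000_000}


def hours_until_quota(quota, tweets_per_second=TWEETS_PER_SECOND):
    """Hours of continuous streaming before the monthly quota is used up."""
    seconds = quota / tweets_per_second
    return seconds / 3600


for tier, quota in MONTHLY_QUOTAS.items():
    print(f"{tier}: {hours_until_quota(quota):.1f} hours")
```

Under those assumptions a Standard application would exhaust its month in about two hours, and an Academic one in under two days.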

So we can do some basic math to show what these differences look like. Assuming a sustained 70 tweets per second on a single stream over a 31-day month, a user can collect this many tweets from the v1 filter stream:

In [1]:
tweets_per_second = 70
tweets_per_month = tweets_per_second * 60 * 60 * 24 * 31
print(tweets_per_month)

187488000


And for the v2 API we are explicitly bound by the type of access: 500,000 for Standard or 10,000,000 for Academic. If you are lucky enough to get an academic account, you can get this fraction of what you could with v1:

In [2]:
10_000_000 / 187_488_000

0.053336746885133984
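
The same comparison can be repeated for both access tiers. This is only a sketch built on the numbers above; `fraction_of_v1` and the constant names are illustrative, and the 70 tweets per second figure remains an anecdotal assumption.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 31

# Assumed v1 ceiling: 70 tweets/second sustained for a 31-day month.
V1_STREAM_MONTHLY = 70 * SECONDS_PER_MONTH


def fraction_of_v1(quota, v1_total=V1_STREAM_MONTHLY):
    """Share of the v1 monthly ceiling that a v2 monthly quota allows."""
    return quota / v1_total


for tier, quota in {"Standard": 500_000, "Academic": 10_000_000}.items():
    # The .2% format multiplies by 100 and appends a percent sign.
    print(f"{tier}: {fraction_of_v1(quota):.2%}")
```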

## Search

The v1 search API allows applications to submit 450 requests every 15 minutes to get tweets from the last 7 days that match a particular query. Users are able to do 180 requests. Each request can return 100 tweets. So doing the math we can figure out how many tweets can be retrieved from the search API in a month for an app:


In [3]:
450 * 100 * 4 * 24 * 31

133920000

and a user:

In [4]:
180 * 100 * 4 * 24 * 31

53568000

The situation is a little bit more complicated since a given application can request that many tweets on behalf of a theoretically unlimited number of users. But in practice there may be limits on the number of users an app can download data for before Twitter starts to take notice.
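
To illustrate how the theoretical ceiling grows with the number of users, here is a small sketch. It assumes each user's rate limit is tracked independently and that every request returns a full 100 tweets; `v1_user_search_ceiling` is a hypothetical helper, and the result is an upper bound rather than something Twitter guarantees.

```python
# Per-user v1 search ceiling: 180 requests x 100 tweets, 4 windows/hour,
# 24 hours/day, 31 days.
USER_MONTHLY = 180 * 100 * 4 * 24 * 31


def v1_user_search_ceiling(n_users):
    """Theoretical monthly tweets an app could fetch via n_users user tokens."""
    return n_users * USER_MONTHLY


for n in (1, 3, 10):
    print(n, v1_user_search_ceiling(n))
```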

The v2 search API also lets applications do 450 requests every 15 minutes, and users 180. So v1 and v2 are equivalent in terms of how quickly data can be retrieved. But as with the filter stream, there is an application-wide quota of 500,000 for Standard users and 10,000,000 for Academic users. This means that an application using the v2 API that is blessed as Academic can only download this fraction of the total it could from the v1 API:

In [5]:
10_000_000 / 133_920_000

0.07467144563918758

or about 7%. For users it's a little better:

In [6]:
10_000_000 / 53_568_000

0.18667861409796893

or about 19%.
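
Another way to read these numbers is to ask how soon the Academic quota would run out if search were used at full speed. The sketch below assumes every request returns a full 100 tweets and that requests are spread over whole 15-minute windows; `time_to_exhaust` is an illustrative helper, not part of any Twitter library.

```python
import math


def time_to_exhaust(quota, requests_per_window, tweets_per_request=100,
                    window_minutes=15):
    """Return (windows, hours) needed to reach the quota at full speed.

    The window count is rounded up, so the hours are an upper bound on
    when the quota is hit.
    """
    tweets_per_window = requests_per_window * tweets_per_request
    windows = math.ceil(quota / tweets_per_window)
    return windows, windows * window_minutes / 60


for label, rate in [("application", 450), ("user", 180)]:
    windows, hours = time_to_exhaust(10_000_000, rate)
    print(f"{label}: {windows} windows, {hours} hours")
```

Under these assumptions an application would hit the Academic cap within about 56 hours, and a single user within about 139 hours, or a little under six days.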

## Visualization

We can create a simple visualization of what these differences look like.

* 100 tweets with each request
* 450 requests every 15 minutes
* 4 15-minute windows in an hour
* 24 hours in a day
* 31 days in a month


In [11]:
import pandas
import itertools

pandas.options.plotting.backend = "plotly"

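# Cumulative tweets after each 15-minute window: 450 requests x 100 tweets each.
v1 = list(itertools.accumulate(100 * 450 for _ in range(31 * 24 * 4)))
# v2 follows the same curve until the 10,000,000 Academic monthly cap is reached.
v2 = [min(n, 10_000_000) for n in v1]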

search = pandas.DataFrame(
    {"V2": v2, "V1": v1},
    index=pandas.date_range('2021-03-01', '2021-03-31 23:45:00', freq='15min')
)

search.plot.area(
    title="Monthly Data Collection from Application Twitter Search API",
    labels={"index": "Time", "value": "Tweets", "variable": "API Version"}
)

In [13]:
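# Cumulative tweets after each 15-minute window: 180 requests x 100 tweets each.
v1 = list(itertools.accumulate(100 * 180 for _ in range(31 * 24 * 4)))
# v2 follows the same curve until the 10,000,000 Academic monthly cap is reached.
v2 = [min(n, 10_000_000) for n in v1]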

search = pandas.DataFrame(
    {"V2": v2, "V1": v1},
    index=pandas.date_range('2021-03-01', '2021-03-31 23:45:00', freq='15min')
)

search.plot.area(
    title="Monthly Data Collection from User Twitter Search API",
    labels={"index": "Time", "value": "Tweets", "variable": "API Version"}
)

And here are similar graphs for the Filter Stream:

* 70 tweets per second
* 60 seconds in a minute
* max of 2 active streams at once
* 31 days
* 24 hours in a day
* 60 minutes in an hour

In [4]:
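# Cumulative tweets after each minute: 70 tweets/second on each of 2 streams.
v1 = list(itertools.accumulate(70 * 60 * 2 for _ in range(31 * 24 * 60)))
# v2 is capped at the 10,000,000 Academic monthly quota.
v2 = [min(n, 10_000_000) for n in v1]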

stream = pandas.DataFrame(
    {"V1": v1, "V2": v2},
    index=pandas.date_range('2021-03-01', '2021-03-31 23:59:00', freq='1min')
)

stream.plot(
    title="Monthly Data Collection from Filter Stream API",
    labels={"index": "Time", "value": "Tweets", "variable": "API Version"}
)