# TPS Estimator

Estimating Transactions Per Second (TPS) is a common requirement when designing API integrations. The TPS generated for a given API depends on the number of users, the traffic pattern, and the caching time used by the client.

My main challenge with estimating TPS is accurately extrapolating the present TPS to a different number of users or different caching times. This notebook simulates a variety of traffic patterns to estimate TPS when one of these parameters changes.

In [7]:
import numpy as np
import pandas as pd

## Setup

First, we want to build a realistic model of the existing API traffic. Broadly, there are two types of API requests: user-generated and background traffic. For the purpose of TPS estimation I am mainly interested in user-generated traffic. When simulating user-generated traffic, we look at these parameters:

- number of users
- number of traffic peaks per day
- width of the peaks (how steeply the number of requests rises and falls)

In [12]:
# Create dummy data generated by X users with peaks / peak width. Users are not unique here

# Build the time index first, then size the data to match it.
# One row per second for one day; both endpoints are included, so len(ts) == 86401.
ts = pd.date_range(start='2022-01-01', end='2022-01-02', freq='s')
rows = len(ts)

# Placeholder values in [0, 1) standing in for user IDs
data = np.random.rand(rows, 1)
df = pd.DataFrame(data, columns=['userid'], index=ts)
print(df.head())
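The three parameters listed above (number of users, number of peaks, peak width) can be turned into a small traffic generator. The following is a minimal sketch, not the final model: the helper names `simulate_requests` and `tps`, the default peak hours, and the choice of a normal distribution around each peak are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def simulate_requests(n_users=1000, n_requests=50_000, peak_hours=(9, 18),
                      peak_width_hours=1.5, seed=0):
    """Hypothetical helper: one row per API request over a single day."""
    rng = np.random.default_rng(seed)
    # Assign each request to one of the daily peaks, then scatter it around
    # that peak. Assumption: request times are normally distributed per peak.
    centers = rng.choice(np.asarray(peak_hours, dtype=float), size=n_requests)
    hours = rng.normal(loc=centers, scale=peak_width_hours)
    seconds = np.sort((hours * 3600) % 86400)  # wrap around midnight
    ts = pd.Timestamp("2022-01-01") + pd.to_timedelta(seconds, unit="s")
    # User IDs are drawn with replacement, so they are not unique.
    userid = rng.integers(0, n_users, size=n_requests)
    return pd.DataFrame({"userid": userid}, index=pd.DatetimeIndex(ts, name="ts"))

def tps(requests):
    """Number of requests in each one-second bucket."""
    return requests["userid"].resample("1s").count()

requests = simulate_requests()
per_second = tps(requests)
print("peak TPS:", per_second.max(), "mean TPS:", round(per_second.mean(), 2))
```

A smaller `peak_width_hours` concentrates the same number of requests into fewer seconds, which raises the peak TPS without changing the daily total.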

## Introducing caching

We now assume that users calling the API are not unique and that caching can substantially reduce API load. Let's see what happens to the TPS as we increase and decrease caching times.

In [2]:
# TPS plots for different caching times
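One way to model client-side caching is to let a request reach the API only if the same user has not reached it within the last `ttl_seconds`. The sketch below follows that assumption; `apply_cache` is a hypothetical helper, and the traffic is uniform random over one hour purely to keep the example short.

```python
import numpy as np
import pandas as pd

def apply_cache(requests, ttl_seconds):
    """Keep only requests that miss a per-user cache with the given TTL.

    Assumes `requests` is sorted by time and has a 'userid' column.
    """
    last_hit = {}  # userid -> time (s) of the last call that reached the API
    keep = np.zeros(len(requests), dtype=bool)
    times = (requests.index - requests.index[0]).total_seconds()
    users = requests["userid"].to_numpy()
    for i, (t, user) in enumerate(zip(times, users)):
        if user not in last_hit or t - last_hit[user] >= ttl_seconds:
            last_hit[user] = t
            keep[i] = True
    return requests[keep]

# Dummy traffic: 20,000 requests from 200 users spread over one hour.
rng = np.random.default_rng(0)
seconds = np.sort(rng.uniform(0, 3600, size=20_000))
requests = pd.DataFrame(
    {"userid": rng.integers(0, 200, size=20_000)},
    index=pd.Timestamp("2022-01-01") + pd.to_timedelta(seconds, unit="s"),
)

for ttl in (0, 10, 60, 300):
    reaching_api = apply_cache(requests, ttl)
    peak = reaching_api["userid"].resample("1s").count().max()
    print(f"ttl={ttl:>3}s  requests={len(reaching_api):>6}  peak TPS={peak}")
```

With `ttl_seconds=0` every request passes through, so that row doubles as the uncached baseline.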

Now let's see what happens when a percentage of API calls uses a different caching time.

In [4]:
# TPS plots with % of caching times
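A mixed setup can be sketched by giving each user their own caching time. This is an assumption about how the split is applied: here a share of users (rather than a share of individual calls) uses a short TTL, and everyone else uses a long one. The helper `apply_cache_per_user` and the TTL values are hypothetical.

```python
import numpy as np
import pandas as pd

def apply_cache_per_user(requests, ttl_by_user):
    """Like a single-TTL cache, but each user has their own TTL (seconds)."""
    last_hit = {}  # userid -> time (s) of the last call that reached the API
    keep = np.zeros(len(requests), dtype=bool)
    times = (requests.index - requests.index[0]).total_seconds()
    users = requests["userid"].to_numpy()
    for i, (t, user) in enumerate(zip(times, users)):
        if user not in last_hit or t - last_hit[user] >= ttl_by_user[user]:
            last_hit[user] = t
            keep[i] = True
    return requests[keep]

# Dummy traffic: 20,000 requests from 200 users spread over one hour.
n_users, n_requests, duration = 200, 20_000, 3600
rng = np.random.default_rng(0)
seconds = np.sort(rng.uniform(0, duration, size=n_requests))
requests = pd.DataFrame(
    {"userid": rng.integers(0, n_users, size=n_requests)},
    index=pd.Timestamp("2022-01-01") + pd.to_timedelta(seconds, unit="s"),
)

short_ttl, long_ttl = 10, 300      # assumed caching times, in seconds
u = rng.random(n_users)            # fixed draw so the shares are nested
results = {}
for share in (0.0, 0.25, 0.5, 1.0):
    # Users whose draw falls below `share` get the short TTL.
    ttl_by_user = np.where(u < share, short_ttl, long_ttl)
    kept = apply_cache_per_user(requests, ttl_by_user)
    results[share] = len(kept) / duration  # mean TPS over the hour
    print(f"short-TTL share={share:.2f}  mean TPS={results[share]:.2f}")
```

Reusing the same draw `u` for every share means the short-TTL group only ever grows, which makes the rows directly comparable.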

## Change the number of users

Now let's have more users call the API and see what happens to the TPS for various caching times.

In [5]:
# TPS plots vs number of users vs caching time
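The two dimensions can be explored together by sweeping a grid of user counts and caching times. The sketch below assumes each user makes a fixed number of requests per hour, uniformly spread; `count_api_calls`, `mean_tps`, and all parameter values are hypothetical. It reports mean TPS, since that is easier to compare across rows than a single peak.

```python
import numpy as np
import pandas as pd

def count_api_calls(times, users, ttl_seconds):
    """Count requests that miss a per-user cache; `times` must be sorted."""
    last_hit = {}  # userid -> time (s) of the last call that reached the API
    calls = 0
    for t, user in zip(times.tolist(), users.tolist()):
        if user not in last_hit or t - last_hit[user] >= ttl_seconds:
            last_hit[user] = t
            calls += 1
    return calls

def mean_tps(n_users, ttl_seconds, requests_per_user=50, duration=3600, seed=0):
    """Mean TPS reaching the API for one (users, TTL) combination."""
    rng = np.random.default_rng(seed)  # same seed: same traffic for every TTL
    n_requests = n_users * requests_per_user
    times = np.sort(rng.uniform(0, duration, size=n_requests))
    users = rng.integers(0, n_users, size=n_requests)
    return count_api_calls(times, users, ttl_seconds) / duration

user_counts = (100, 500, 1000)
ttls = (0, 60, 300)
table = pd.DataFrame(
    [[mean_tps(n, ttl) for ttl in ttls] for n in user_counts],
    index=pd.Index(user_counts, name="users"),
    columns=pd.Index(ttls, name="ttl_seconds"),
)
print(table.round(2))
```

Each row of `table` can be plotted as one line (TPS against caching time) to compare how the curves shift as the user count grows.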

## Change the pattern of user behavior

Now let's change the actual distribution of users within a dataset. Let's have some users call the API very frequently and others less so.

In [6]:
# TPS plots for different user behaviors
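Uneven user behavior can be sketched by drawing user IDs with unequal probabilities. Below, the same request times are attributed once to uniformly chosen users and once to users weighted by 1/rank; the 1/rank weighting is an assumption picked for illustration, not a measured distribution, and `count_api_calls` is a hypothetical helper.

```python
import numpy as np

def count_api_calls(times, users, ttl_seconds):
    """Count requests that miss a per-user cache; `times` must be sorted."""
    last_hit = {}  # userid -> time (s) of the last call that reached the API
    calls = 0
    for t, user in zip(times.tolist(), users.tolist()):
        if user not in last_hit or t - last_hit[user] >= ttl_seconds:
            last_hit[user] = t
            calls += 1
    return calls

rng = np.random.default_rng(0)
n_users, n_requests, duration, ttl = 200, 20_000, 3600, 60
times = np.sort(rng.uniform(0, duration, size=n_requests))

# Every user is equally likely to make each request.
uniform_users = rng.integers(0, n_users, size=n_requests)

# A few heavy users make most requests: probability proportional to 1/rank.
weights = 1.0 / np.arange(1, n_users + 1)
weights /= weights.sum()
skewed_users = rng.choice(n_users, size=n_requests, p=weights)

calls = {
    "uniform": count_api_calls(times, uniform_users, ttl),
    "skewed": count_api_calls(times, skewed_users, ttl),
}
for name, n in calls.items():
    print(f"{name:>8}: {n} API calls, mean TPS={n / duration:.2f}")
```

Because heavy users repeat their calls within the caching window more often, a skewed population tends to benefit more from the same caching time than a uniform one.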