# Remote Endpoint Times/QPS/Latency

The goal of this experiment is to measure the latency of endpoints issuing requests to peer endpoints.

## Instructions

### Setup

1. Clone and install https://github.com/proxystore/proxystore-benchmarks
   ```bash
   $ git clone git@github.com:proxystore/proxystore-benchmarks.git
   $ cd proxystore-benchmarks
   $ virtualenv venv
   $ . venv/bin/activate
   $ pip install -e .
   ```
2. Configure a ProxyStore endpoint on the local and remote systems.
   ```bash
   $ proxystore-endpoint configure psbench
   $ proxystore-endpoint start psbench &> /dev/null &
   ```
   Note: endpoint logs are still written to `~/.proxystore`, even though the command's output is redirected to `/dev/null`.
   
### Run

```bash
$ python -m psbench.benchmarks.remote_ops \
      REDIS \
      --redis-host thetalogin5 \
      --redis-port 59465 \
      --ops GET SET EXISTS EVICT \
      --payload-sizes 1000 10000 100000 1000000 10000000 \
      --repeat 5 \
      --csv-file results/thetalogin4-thetalogin5-remote-ops.csv
      
$ python -m psbench.benchmarks.remote_ops \
      ENDPOINT \
      --endpoint {ENDPOINT UUID} \
      --server {SIGNALING SERVER URL} \
      --ops GET SET EXISTS EVICT \
      --payload-sizes 1000 10000 100000 1000000 10000000 \
      --repeat 5 \
      --csv-file results/thetalogin4-thetalogin5-remote-ops.csv
```

### Notes:
- N/A

In [33]:
%matplotlib inline

import math
from typing import Any

import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.markers as markers
import pandas
import numpy
import seaborn

# to change default colormap
plt.rcParams["image.cmap"] = "tab10"
# to change default color cycle
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=plt.cm.tab10.colors)

In [34]:
BACKEND_COLUMN = 'backend'
OP_COLUMN = 'op'
PAYLOAD_COLUMN = 'payload_size_bytes'
AVG_TIME_COLUMN = 'avg_time_ms'
MIN_TIME_COLUMN = 'min_time_ms'
MAX_TIME_COLUMN = 'max_time_ms'

def load(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)
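
The column constants above only work if the CSV actually contains those headers. A minimal sketch of a stricter loader follows; `load_checked` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration and are not part of `psbench`. The sample row reuses values from the Theta results file.

```python
import io

import pandas

# Headers that the analysis cells in this notebook rely on.
EXPECTED_COLUMNS = {
    'backend', 'op', 'payload_size_bytes',
    'avg_time_ms', 'min_time_ms', 'max_time_ms',
}


def load_checked(source) -> pandas.DataFrame:
    """Hypothetical variant of load() that fails early on missing columns."""
    data = pandas.read_csv(source)
    missing = EXPECTED_COLUMNS - set(data.columns)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    return data


# A tiny in-memory CSV standing in for a real results file.
sample = io.StringIO(
    'backend,op,payload_size_bytes,avg_time_ms,min_time_ms,max_time_ms\n'
    'REDIS,GET,1000.0,0.226588,0.203647,0.24244\n'
)
checked = load_checked(sample)
print(checked.shape)
```

Failing at load time gives a clearer error than a `KeyError` raised later from inside the table-printing loop.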

In [69]:
theta_data = load('data/endpoint-times-remote/thetalogin4-thetalogin5-remote-ops.csv')
midway_data = load('data/endpoint-times-remote/midway2-thetalogin5-remote-ops.csv')
frontera_data = load('data/endpoint-times-remote/frontera-thetalogin5-remote-ops.csv')
theta_data.head()

| Unnamed: 0 | backend | op | payload_size_bytes | repeat | total_time_ms | avg_time_ms | min_time_ms | max_time_ms | avg_bandwidth_mbps |
|---|---|---|---|---|---|---|---|---|---|
| 0 | REDIS | GET | 1000.0 | 5 | 0.679763 | 0.226588 | 0.203647 | 0.24244 | 4.413303 |
| 1 | REDIS | GET | 10000.0 | 5 | 0.46639 | 0.155463 | 0.129955 | 0.190086 | 64.323849 |
| 2 | REDIS | GET | 100000.0 | 5 | 0.70754 | 0.235847 | 0.191786 | 0.260782 | 424.004297 |
| 3 | REDIS | GET | 1000000.0 | 5 | 2.408085 | 0.802695 | 0.730678 | 0.910145 | 1245.8032 |
| 4 | REDIS | GET | 10000000.0 | 5 | 27.453748 | 9.151249 | 8.295218 | 9.902027 | 1092.746972 |
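
The `mbps` suffix in `avg_bandwidth_mbps` could be read as megabits or megabytes per second. The five rows above appear consistent with megabytes per second (10^6 bytes) computed from `payload_size_bytes` and `avg_time_ms`. This is an inference from the printed values, not a documented definition; the helper name `megabytes_per_second` is introduced here only for the check.

```python
# (payload_size_bytes, avg_time_ms, avg_bandwidth_mbps) copied from the rows above.
ROWS = [
    (1000.0, 0.226588, 4.413303),
    (10000.0, 0.155463, 64.323849),
    (100000.0, 0.235847, 424.004297),
    (1000000.0, 0.802695, 1245.8032),
    (10000000.0, 9.151249, 1092.746972),
]


def megabytes_per_second(payload_bytes: float, avg_time_ms: float) -> float:
    """Convert a payload size and an average time into 10^6 bytes per second."""
    seconds = avg_time_ms / 1000
    return payload_bytes / seconds / 1e6


relative_errors = []
for payload, avg_ms, reported in ROWS:
    computed = megabytes_per_second(payload, avg_ms)
    relative_errors.append(abs(computed - reported) / reported)
    print(f'{int(payload):>9} B  computed={computed:10.3f}  reported={reported:10.3f}')
```

The small remaining differences are what one would expect from `avg_time_ms` being rounded to six decimal places in the printed output.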


In [44]:
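def rows_matching_column_value(data: pandas.DataFrame, column: str, value: Any) -> pandas.DataFrame:
    # Keep only the rows where `column` equals `value`.
    return data.loc[data[column] == value]

def get_value_by_other_column(data: pandas.DataFrame, query_column: str, query_value: Any, target_column: str) -> Any:
    # Return `target_column` from the single row where `query_column` equals `query_value`.
    rows = rows_matching_column_value(data, query_column, query_value)
    assert len(rows) == 1, f'expected exactly one row, found {len(rows)}'
    return rows[target_column].values[0]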
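
As a quick illustration of how these two helpers behave, the sketch below applies them to a toy DataFrame. The timing values are made up for the example and are not benchmark results. It also shows why the data must be narrowed to one backend first: the lookup asserts that exactly one row matches.

```python
from typing import Any

import pandas


def rows_matching_column_value(data: pandas.DataFrame, column: str, value: Any) -> pandas.DataFrame:
    return data.loc[data[column] == value]


def get_value_by_other_column(data: pandas.DataFrame, query_column: str, query_value: Any, target_column: str) -> Any:
    rows = data.loc[data[query_column] == query_value]
    assert len(rows) == 1
    return rows[target_column].values[0]


# Toy data with illustrative (not measured) timings.
toy = pandas.DataFrame({
    'backend': ['ENDPOINT', 'REDIS', 'ENDPOINT', 'REDIS'],
    'op': ['GET', 'GET', 'EXISTS', 'EXISTS'],
    'avg_time_ms': [1.5, 0.2, 1.4, 0.1],
})

# Narrow to one backend, then look up a single op.
endpoint_rows = rows_matching_column_value(toy, 'backend', 'ENDPOINT')
endpoint_get = get_value_by_other_column(endpoint_rows, 'op', 'GET', 'avg_time_ms')
print(endpoint_get)

# Without narrowing, two rows match 'GET' and the assertion fires.
try:
    get_value_by_other_column(toy, 'op', 'GET', 'avg_time_ms')
    ambiguous_raised = False
except AssertionError:
    ambiguous_raised = True
print(ambiguous_raised)
```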

In [68]:
DATA = {
    'Theta → Theta': theta_data,
    'Campus Cluster → Theta': midway_data,
    'Frontera → Theta': frontera_data,
}
F_STR = '{system:22} | {op:6} | {size:<9} | {avg_endpoint:8.3f} | {avg_redis:8.3f}'

print('                                |  Payload  |  Avg Op Time (ms)')
print('Systems                | Op     |   Bytes   | Endpoint |  Redis')

for systems, data in DATA.items():
    print('-----------------------|--------|-----------|----------|---------')
    for op in ['GET', 'SET', 'EVICT', 'EXISTS']:
        op_data = rows_matching_column_value(data, OP_COLUMN, op)
        endpoint_data = rows_matching_column_value(op_data, BACKEND_COLUMN, 'ENDPOINT')
        redis_data = rows_matching_column_value(op_data, BACKEND_COLUMN, 'REDIS')
        if op in ['GET', 'SET']:
            for payload_size in list(op_data[PAYLOAD_COLUMN].unique()):
                avg_endpoint = get_value_by_other_column(endpoint_data, PAYLOAD_COLUMN, payload_size, AVG_TIME_COLUMN)
                avg_redis = get_value_by_other_column(redis_data, PAYLOAD_COLUMN, payload_size, AVG_TIME_COLUMN)
                print(F_STR.format(system=systems, op=op, size=int(payload_size), avg_endpoint=avg_endpoint, avg_redis=avg_redis))
        else:
            avg_endpoint = get_value_by_other_column(endpoint_data, OP_COLUMN, op, AVG_TIME_COLUMN)
            avg_redis = get_value_by_other_column(redis_data, OP_COLUMN, op, AVG_TIME_COLUMN)
            print(F_STR.format(system=systems, op=op, size='N/A', avg_endpoint=avg_endpoint, avg_redis=avg_redis))

                                |  Payload  |  Avg Op Time (ms)
Systems                | Op     |   Bytes   | Endpoint |  Redis
-----------------------|--------|-----------|----------|---------
Theta → Theta          | GET    | 1000      |    1.605 |    0.227
Theta → Theta          | GET    | 10000     |    1.987 |    0.155
Theta → Theta          | GET    | 100000    |    7.639 |    0.236
Theta → Theta          | GET    | 1000000   |   68.930 |    0.803
Theta → Theta          | GET    | 10000000  |  608.133 |    9.151
Theta → Theta          | SET    | 1000      |    1.620 |    0.155
Theta → Theta          | SET    | 10000     |    2.113 |    0.167
Theta → Theta          | SET    | 100000    |    7.925 |    0.182
Theta → Theta          | SET    | 1000000   |   72.135 |    0.744
Theta → Theta          | SET    | 10000000  |  611.600 |    6.503
Theta → Theta          | EVICT  | N/A       |    1.381 |    0.106
Theta → Theta          | EXISTS | N/A       |    1.434 |    0.158
--------------