# Use Case: Real-Time Defect Analysis

Here we use ProxyStore to speed up a real science workflow: real-time defect analysis of transmission electron microscopy (TEM) images.

## Instructions

### Setup

#### Clone and install https://github.com/proxystore/real-time-defect-analysis
```bash
$ git clone git@github.com:proxystore/real-time-defect-analysis.git
$ cd real-time-defect-analysis
$ git checkout proxystore
$ conda env create --file environment.yml -p $(pwd)/env
$ conda activate $(pwd)/env
```
This will install FuncX and ProxyStore. A few notes:
- FuncX has since been renamed Globus Compute, and the latest version at the time of writing is 2.0.
  FuncX 1.0.13 may stop working at some point.
- Additional install notes are provided in the repository.

#### Configure a FuncX endpoint on your machine of choice (ideally a GPU machine).
```bash
$ funcx-endpoint configure rtdefects
$ funcx-endpoint start rtdefects
```
Endpoint config reference: https://funcx.readthedocs.io/en/latest/endpoints.html.
The endpoint configuration will differ from machine to machine based on the system, allocations, etc.
   
### Run

Follow the instructions in the repo: https://github.com/proxystore/real-time-defect-analysis.

1. Create a directory to put images in:
   ```bash
   $ mkdir data-dir
   $ cp tests/test-image.tif data-dir/
   ```
2. Register the functions/compute:
   ```bash
   $ rtdefects register
   $ rtdefects config --funcx-endpoint <funcx-endpoint-uuid>
   ```
3. Run the baseline:
   ```bash
   $ rtdefects start data-dir --timeout 600 --redo-existing --no-server
   ```
4. Run with ProxyStore:
   Using the `FileConnector`, which passes data via a shared file system directory (`./proxystore-dump`):
   ```bash
   $ rtdefects start data-dir --timeout 600 --redo-existing --no-server --ps-file-dir ./proxystore-dump
   ```
   Using ProxyStore endpoints:
   when running across sites, an endpoint must be configured on both the client and the server.
   When running locally, a single endpoint UUID can be provided.
   ```bash
   $ proxystore-endpoint configure rtdefects [options]
   $ proxystore-endpoint start rtdefects
   $ rtdefects start data-dir --timeout 600 --redo-existing --no-server --ps-endpoints <local-endpoint-uuid> <remote-endpoint-uuid>
   ```
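
The `--ps-file-dir` option above passes task data by reference through a shared directory instead of sending it through the FuncX service. The sketch below illustrates that idea using only the Python standard library. It is not ProxyStore's implementation or API: the helpers `put`, `get`, and `process` are hypothetical names, and a temporary directory stands in for the shared file system.

```python
import pickle
import tempfile
import uuid
from pathlib import Path


def put(obj, store_dir: Path) -> tuple[str, str]:
    """Serialize obj into the shared directory and return a small reference."""
    store_dir.mkdir(parents=True, exist_ok=True)
    key = uuid.uuid4().hex
    (store_dir / key).write_bytes(pickle.dumps(obj))
    return (str(store_dir), key)


def get(ref: tuple[str, str]):
    """Load the object that a reference points to."""
    store_dir, key = ref
    return pickle.loads((Path(store_dir) / key).read_bytes())


def process(ref: tuple[str, str]) -> int:
    """Stand-in for the remote task: resolve the reference, then do work."""
    image = get(ref)
    return len(image)


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / 'proxystore-dump'
    image = bytes(1_000_000)  # placeholder for a ~1 MB input image

    ref = put(image, shared)
    # Only the reference would travel through the task service.
    payload = pickle.dumps(ref)
    payload_size = len(payload)

    result = process(pickle.loads(payload))
    print(f'payload sent: {payload_size} bytes, bytes processed: {result}')
```

The point of the design is that the message sent to the compute service stays small regardless of the image size, while the bulk data moves through the file system that both sides can reach.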

## Collected Data

Each value in these lists is the round-trip time, in seconds, of a single image-processing task.
The times were manually recorded from the `stdout` of individual runs of the app.

Notes:
- Input images are 1 MB.
- Outputs are 1.1 MB.
- The FuncX endpoint runs on a Polaris login node, and tasks are executed on compute nodes.
- For the FuncX baseline and `FileStore`, the client is on a Theta login node.
- For the `EndpointStore`, the client is on Midway. A ProxyStore endpoint is configured on Midway and on a Polaris login node.

```python
DATA = {
    'funcx_baseline': [3.64, 3.51, 3.67, 3.74, 2.77, 3.64, 3.5, 2.82],
    'file_store_inputs_only': [2.23, 2.42, 2.54, 2.21, 2.44, 2.22, 2.28, 2.2],
    'file_store_inputs_only_async': [2.25, 2.33, 2.44, 2.4, 2.42, 2.47, 2.24, 2.28],
    'file_store_inputs_outputs': [2.16, 2.23, 2.13, 2.15, 2.11, 2.13, 2.23, 2.14],
    'endpoint_store_inputs_only': [2.44, 2.44, 2.46, 2.31, 2.25, 2.29, 2.3, 2.51],
    'endpoint_store_inputs_outputs': [2.27, 2.39, 2.19, 2.33, 2.47, 2.21, 2.2, 2.18],
    'endpoint_store_inputs_only_no_peering': [2.33, 2.52, 2.49, 2.36, 2.28, 2.29, 2.23, 2.5],
}
```

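```python
import statistics

# Print a summary table of the mean and sample standard deviation per configuration.
print(f'{"Config":38} | {"Avg RTT (ms)":12} | {"Stdev (ms)":10}')
print('-' * (38 + 12 + 10 + (2 * 3)))

for config, rtts_s in DATA.items():
    # Convert the recorded times from seconds to milliseconds.
    rtts_ms = [1000 * x for x in rtts_s]
    mean = statistics.mean(rtts_ms)
    stdev = statistics.stdev(rtts_ms)
    print(f'{config:38} | {mean:12.2f} | {stdev:10.2f}')
```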

```text
Config                                 | Avg RTT (ms) | Stdev (ms)
------------------------------------------------------------------
funcx_baseline                         |      3411.25 |     388.79
file_store_inputs_only                 |      2317.50 |     130.36
file_store_inputs_only_async           |      2353.75 |      90.39
file_store_inputs_outputs              |      2160.00 |      45.67
endpoint_store_inputs_only             |      2375.00 |      97.54
endpoint_store_inputs_outputs          |      2280.00 |     107.04
endpoint_store_inputs_only_no_peering  |      2375.00 |     113.01
```
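
To compare the configurations more directly, the mean round-trip times can be expressed as a percentage reduction relative to the FuncX baseline. The following is a minimal sketch with the mean values copied from the table above; the helper name `reduction_vs_baseline` is hypothetical and not part of the original notebook.

```python
# Mean round-trip times in milliseconds, copied from the summary table.
MEAN_RTT_MS = {
    'funcx_baseline': 3411.25,
    'file_store_inputs_only': 2317.50,
    'file_store_inputs_only_async': 2353.75,
    'file_store_inputs_outputs': 2160.00,
    'endpoint_store_inputs_only': 2375.00,
    'endpoint_store_inputs_outputs': 2280.00,
    'endpoint_store_inputs_only_no_peering': 2375.00,
}


def reduction_vs_baseline(means, baseline='funcx_baseline'):
    """Return the percent reduction in mean RTT of each config vs. the baseline."""
    base = means[baseline]
    return {
        name: 100 * (base - value) / base
        for name, value in means.items()
        if name != baseline
    }


reductions = reduction_vs_baseline(MEAN_RTT_MS)

# List configurations from largest to smallest reduction.
for name, pct in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name:38} | {pct:5.1f}% lower than baseline')
```

With these numbers, every ProxyStore configuration lowers the mean round-trip time by roughly 30% to 37%, and `file_store_inputs_outputs` shows the largest reduction. Each configuration has only eight samples, and the client location differs between the file-based and endpoint-based runs, so small differences between the ProxyStore configurations should be read with caution.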
