# Horizontal autoscaling experiment
This experiment shows how horizontal autoscaling can maintain the performance of an inference service during user traffic spikes.

The same sklearn red wine model was used in this experiment. We first disabled horizontal autoscaling for the inference service, so only one pod was running regardless of user demand. We then updated the inference service to enable horizontal autoscaling:
```yaml
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 5
    scaleTarget: 1000
    scaleMetric: concurrency
```
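
As a rough illustration of what this configuration implies for the 4500 concurrent requests used in this experiment, the sketch below estimates the replica count. It assumes the autoscaler aims for about `scaleTarget` concurrent requests per pod and clamps the result between `minReplicas` and `maxReplicas`. The function `estimate_replicas` is hypothetical, and the real autoscaler averages the metric over a time window and may apply a target utilization factor, so this is only a back-of-the-envelope estimate.

```bash
# Hypothetical helper: simplified replica estimate, NOT the autoscaler's actual algorithm.
# Arguments: observed concurrency, scaleTarget, minReplicas, maxReplicas
estimate_replicas() {
  local concurrency="$1" target="$2" min="$3" max="$4"
  # Ceiling division: pods needed so that each handles at most "target" requests
  local desired=$(( (concurrency + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

estimate_replicas 4500 1000 1 5   # prints 5
```

Under these assumptions, the load is expected to push the service to its `maxReplicas` limit of 5 pods.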
We sent 4500 concurrent POST requests to the sklearn red wine inference service for 30 seconds. The results show that horizontal autoscaling shortens the average and slowest response times during a traffic spike, while the fastest response time is essentially unchanged:

|                       | With autoscaling | Without autoscaling |
|-----------------------|------------------|---------------------|
| Fastest response time | 0.0248s          | 0.0225s             |
| Slowest response time | 15.3414s         | 45.4707s            |
| Average response time | 7.7644s          | 17.3992s            |
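
To put the table in relative terms, the ratios between the two runs can be computed from the reported values. The helper name `speedup` is introduced here only for illustration.

```bash
# Hypothetical helper: ratio of the response time without autoscaling to the one with autoscaling.
speedup() {
  awk -v base="$1" -v scaled="$2" 'BEGIN { printf "%.2f\n", base / scaled }'
}

speedup 17.3992 7.7644    # average response time: prints 2.24
speedup 45.4707 15.3414   # slowest response time: prints 2.96
```

In this run, autoscaling made the average response roughly 2.2 times faster and the slowest response roughly 3 times faster.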


### Details of experiment setup
The YAML files used in this experiment can be found in the `manifests` directory, located in the same directory as this Markdown file.

### 1. Deploy an sklearn red wine inference service with horizontal autoscaling disabled
```bash
kubectl apply -f manifests/exp-no-scale.yaml
```
Load-test the inference service using `hey`:
```bash
model_name=redwine-exp
input_path=redwine-input.json
host=${model_name}.kserve-inference.example.com
url=http://kserve-gateway.local:30200/v1/models/${model_name}:predict

# Simulate 4500 concurrent POST requests for 30 seconds
# "-t 0" means infinite timeout for each request
hey -t 0 -z 30s -c 4500 -m POST -host "${host}" -D "${input_path}" "${url}"
```
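
To keep the numbers for the comparison table, the `hey` report can be saved to a file and the relevant summary lines extracted afterwards. The sketch below assumes the report contains lines labelled `Slowest:`, `Fastest:` and `Average:`. The helper `summarize_hey` and the file name `no-scale.txt` are hypothetical.

```bash
# Hypothetical helper: print the Slowest/Fastest/Average lines of a saved hey report.
summarize_hey() {
  awk '/^[ \t]*(Slowest|Fastest|Average):/ { printf "%s %ss\n", $1, $2 }' "$1"
}

# Usage (the file name is an assumption):
#   hey -t 0 -z 30s -c 4500 -m POST -host "${host}" -D "${input_path}" "${url}" > no-scale.txt
#   summarize_hey no-scale.txt
```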

### 2. Update the inference service by enabling horizontal autoscaling
```bash
kubectl apply -f manifests/exp-scale.yaml
```
Then load-test the updated inference service using the same `hey` command as before.
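
To confirm that additional pods are actually created during the load test, the pods of the inference service can be watched from a second terminal. This is a sketch under assumptions: the namespace `kserve-inference` is inferred from the host name used above, the pods are assumed to carry the `serving.kserve.io/inferenceservice` label, and the helper `watch_isvc_pods` is hypothetical.

```bash
# Hypothetical helper: watch the pods belonging to an inference service.
# Arguments: inference service name, namespace (both are assumptions here)
watch_isvc_pods() {
  local name="$1" namespace="$2"
  kubectl get pods -n "${namespace}" \
    -l "serving.kserve.io/inferenceservice=${name}" --watch
}

# Usage (run in a second terminal while hey is sending requests):
#   watch_isvc_pods redwine-exp kserve-inference
```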


