# VASIM Autoscaling Simulator Toolkit Example

### Goals

1. **Applicability:** Integration with various algorithms and parameter customization.

2. **Simulation:** Realistic workload modeling, achieved within minutes.

3. **Parameter Tuning:** Fine-tuning for optimal performance and cost savings.

4. **Cost Analysis:** Demonstrating potential cost savings.


### Overview: Autoscaling Components


<img src="../docs/demo_pics/Autoscaling_infra.png" alt="Autoscaling Components" title="Autoscaling Components" width="300" >

- Application: The running software, such as Postgres or SQL Server.
- Controller: Manages tasks and publishes metrics.
- Metrics Server: Stores and provides metrics.
- Recommender Algorithm: Makes resource allocation decisions.
- Scaler Entity: Enacts decisions, adjusting resource allocation.

These components work together for effective autoscaling.


### Components

VASIM is a standalone Python package that internally replicates these components, maintaining and processing scaling states. It also includes a cluster state simulator that tracks current utilization and limits, and a post-processing analyzer for performance assessment. The input is a dataset of timestamps and utilization values. The output is a set of metrics and a trace of scaling decisions.

VASIM lets you try different recommender algorithms against different workloads. Workloads can be bursty, monotonic, or cyclical, and business needs may vary in complexity, cost, and performance.

<img src="../docs/demo_pics/VASim_infra.png" alt="VASIM Overview" title="VASIM Overview" width="300">




# Getting started


### Prepare data
Get your test data. For this experiment we'll be working with the [Alibaba dataset](https://github.com/alibaba/clusterdata), trace c_12104.

There are some important things to note about the data for VASIM.

1. You must currently give your CSV files names ending in `perf_event_log.csv` (for example, `c_12104.csv_perf_event_log.csv`). This prevents the ingester from accidentally reading other files in the directory, such as output files.
2. You must format your data in two columns (`TIMESTAMP`,`CPU_USAGE_ACTUAL`), as shown in the sample below. (TODO: We [plan](https://github.com/microsoft/vasim/issues/34) to use Open Telemetry format in the future.)
3. You may have multiple CSVs; just put them all in the same folder.

! head data/c_12104.csv_perf_event_log.csv

TIMESTAMP,CPU_USAGE_ACTUAL
2023.04.02-00:09:00:000,7.2
2023.04.02-00:10:00:000,7.04
2023.04.02-00:11:00:000,6.88
2023.04.02-00:12:00:000,6.72
2023.04.02-00:13:00:000,6.48
2023.04.02-00:14:00:000,6.501818181818182
2023.04.02-00:15:00:000,6.523636363636364
2023.04.02-00:16:00:000,6.545454545454546
2023.04.02-00:17:00:000,6.567272727272727
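
As a rough illustration of the format above, the following sketch writes a small trace in the two-column layout and reads it back. The helper names (`format_timestamp`, `write_trace`, `read_trace`) are hypothetical and not part of VASIM, and the timestamp layout (`year.month.day-hour:minute:second:milliseconds`) is inferred from the sample rows rather than taken from the ingester's code.

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path

# Assumed layout, inferred from rows such as 2023.04.02-00:09:00:000
TIMESTAMP_FORMAT = "%Y.%m.%d-%H:%M:%S:%f"
HEADER = ["TIMESTAMP", "CPU_USAGE_ACTUAL"]


def format_timestamp(ts: datetime) -> str:
    """Render a datetime with three-digit milliseconds, as in the sample."""
    return ts.strftime("%Y.%m.%d-%H:%M:%S:") + f"{ts.microsecond // 1000:03d}"


def write_trace(path: Path, start: datetime, cpu_values, step_minutes: int = 1) -> None:
    """Write evenly spaced CPU readings to a two-column CSV."""
    if not path.name.endswith("perf_event_log.csv"):
        raise ValueError("file name must end with 'perf_event_log.csv'")
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(HEADER)
        for i, cpu in enumerate(cpu_values):
            ts = start + timedelta(minutes=i * step_minutes)
            writer.writerow([format_timestamp(ts), cpu])


def read_trace(path: Path):
    """Read the CSV back as (datetime, float) pairs, checking the header."""
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        if header != HEADER:
            raise ValueError(f"unexpected header: {header}")
        return [(datetime.strptime(ts, TIMESTAMP_FORMAT), float(cpu)) for ts, cpu in reader]
```

Calling `write_trace(Path("data/demo.csv_perf_event_log.csv"), datetime(2023, 4, 2, 0, 9), [7.2, 7.04])` would produce a file whose first data row matches the first row of the sample above, provided the `data` folder exists. The file name here is only an example.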


### Prepare algorithm

You also need to implement your Recommender algorithm. There are examples in the [recommender](../recommender) folder.

To implement a recommender, create a file for your algorithm, add any parameters you need to the `__init__` function, and then implement the core logic of your recommender in the `run()` function.

```python
class SimpleAdditiveRecommender(Recommender):
    def __init__(self, cluster_state_provider, save_metadata=True):
        # Copy the code at the top of this function as-is.

        # Put your parameters here hard-coded, or pass them in to your
        # `metadata.json` file in the `algo_specific_config` section.
        self.my_param = self.algo_params.get("myparam", 2)

    def run(self, recorded_data):
        """
        Run the recommender algorithm and return the new number of
        cores to scale to (the new limit).

        Inputs:
            recorded_data (pd.DataFrame): The recorded metrics data for the current time window to simulate
        Returns:
            new_limit (float): The new number of cores to scale to.
        """

        # Your logic goes here! Look at the data in the `recorded_data` dataframe,
        #   do a calculation, and return the number of cores to scale to.

        return new_limit
```

For this example, we'll be working with the [DummyAdditiveRecommender](../recommender/DummyAdditiveRecommender.py), which takes a moving average of the CPU values and adds a fixed buffer on top.
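
The idea of "moving average plus a fixed buffer" can be sketched independently of VASIM as follows. This is not the code in `DummyAdditiveRecommender.py`: the function names, the `span` parameter, and the choice to take the peak of the smoothed values and round up to whole cores are all assumptions made for illustration.

```python
import math


def moving_average(values, span):
    """Trailing moving average; returns one value per full window of `span` points."""
    if span < 1 or span > len(values):
        raise ValueError("span must be between 1 and len(values)")
    return [sum(values[i - span:i]) / span for i in range(span, len(values) + 1)]


def additive_limit(cpu_values, addend=2, span=3):
    """Smooth the CPU readings, take the peak, and add a fixed buffer.

    Assumption: the peak of the smoothed series is used and the result is
    rounded up to a whole number of cores. The real recommender may differ.
    """
    smoothed = moving_average(cpu_values, span)
    return math.ceil(max(smoothed) + addend)
```

For example, `additive_limit([6, 8, 10, 8], addend=2, span=2)` smooths the readings to 7, 9, 9 and returns 11.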

### Preparing metadata.json
We also need to prepare a file that provides the default set of configuration parameters. You can see some examples of this file in [the test folder](../tests/test_data/alibaba_control_c_29247_denom_1_mini). The default name of this file is `metadata.json`, but you can give it any name and pass it in as a parameter.

There are three sets of parameters:
* **`algo_specific_config`**: The parameters you will use in your algorithm's `run` function.
  * Example: `addend: 2` for the `DummyAdditiveRecommender` used in this example.
* **`general_config`**: The parameters that control how the CSV trace data is passed in, plus the simulation safeguards.
  * `window` (int): The amount of data passed to the algorithm in the `recorded_data` parameter of the `run` function. _See "original window" below._
  * `lag` (float): The number of minutes to wait after making a decision.
  * `min_cpu_limit` (int): The minimum number of cores to recommend. This is used as a [safety guard](https://github.com/microsoft/vasim/blob/198a06062a91f6455b87710b0e59834530b6ea29/simulator/SimulatedInfraScaler.py#L56) in the simulated infra.
  * `max_cpu_limit` (int): The maximum number of cores to recommend. It acts as a safety guard in the same way as `min_cpu_limit`.
* **`prediction_config`**: The parameters for the window of [predicted](https://github.com/microsoft/vasim/blob/198a06062a91f6455b87710b0e59834530b6ea29/recommender/cluster_state_provider/PredictiveFileClusterStateProvider.py#L29) data that is fed into the algorithm. VASIM uses a time series to forecast what the data might be in the future, which helps the algorithm scale proactively.
  * `waiting_before_predict` (int, minutes): The amount of data to consume before making a prediction. It is usually set to `1440` (60 min * 24 hours = 1 day).
  * `frequency_minutes` (int): The interval between timestamps in the CSVs you provide. This *MUST* match your CSVs (for example, 1 in the case above). TODO: automate.
  * `forecasting_models` (string): Only "naive" is supported for now, so this parameter is not currently used.
  * `minutes_to_predict` (int): The “forecasting horizon”, that is, how far to look forward. Increasing this means you’ll be MORE proactive in adjusting based on history. This is a good one to tune.
  * `total_predictive_window` (int): By [default](https://github.com/microsoft/vasim/blob/198a06062a91f6455b87710b0e59834530b6ea29/recommender/cluster_state_provider/PredictiveFileClusterStateProvider.py#L39), this parameter is `minutes_to_predict`/`frequency_minutes` + `window`. If you would like to change this, you can do it here. _This is the "new window" in the diagram below_: essentially, the total amount of data (recorded plus predicted) passed to the algorithm.
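
To make the three sections concrete, here is a minimal sketch that builds such a configuration in Python and writes it out as JSON. The nesting (three top-level sections) follows the description above, but the exact file schema should be checked against the examples in the test folder. Apart from `addend: 2`, `1440`, a frequency of 1, and `"naive"`, every value is an illustrative placeholder, and `write_metadata` and `clamp_limit` are hypothetical helpers, not VASIM functions.

```python
import json
from pathlib import Path

# Illustrative values only; tune them for your own trace.
metadata = {
    "algo_specific_config": {"addend": 2},
    "general_config": {
        "window": 60,          # placeholder
        "lag": 10,             # placeholder, minutes to wait after a decision
        "min_cpu_limit": 1,    # placeholder
        "max_cpu_limit": 32,   # placeholder
    },
    "prediction_config": {
        "waiting_before_predict": 1440,  # one day of minutes
        "frequency_minutes": 1,          # must match the CSV timestamps
        "forecasting_models": "naive",
        "minutes_to_predict": 10,        # placeholder
    },
}


def write_metadata(path: Path, config: dict) -> None:
    """Serialize the configuration as indented JSON."""
    path.write_text(json.dumps(config, indent=2))


def clamp_limit(new_limit: float, general_config: dict) -> float:
    """Sketch of a min/max safety guard: keep a recommendation within bounds."""
    low = general_config["min_cpu_limit"]
    high = general_config["max_cpu_limit"]
    return max(low, min(high, new_limit))
```

With these placeholder bounds, `clamp_limit(100, metadata["general_config"])` returns 32 and `clamp_limit(0, metadata["general_config"])` returns 1.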


Here is a picture that explains the `window` and `prediction_config`:

<img src="../docs/demo_pics/predictive_window.png" alt="Data Windows" title="Data Windows" >
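
The default relationship between the windows can be worked through with a small helper. This is a sketch of the formula `minutes_to_predict`/`frequency_minutes` + `window` as stated in the parameter list, not a copy of VASIM's code. The function name is hypothetical, and the use of integer division is an assumption.

```python
def default_total_predictive_window(window: int, minutes_to_predict: int, frequency_minutes: int) -> int:
    """Compute the default "new window" from the "original window" and the forecast horizon.

    Assumption: the division is an integer division.
    """
    if frequency_minutes <= 0:
        raise ValueError("frequency_minutes must be positive")
    return minutes_to_predict // frequency_minutes + window
```

For example, with `window=60`, `minutes_to_predict=10`, and `frequency_minutes=1`, the default is 70. With `frequency_minutes=5` it is 62.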

