diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md
@@ -1,78 +1,175 @@
 ---
 title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
 ---
 
 > [!primary]
->
 > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->
 
 ## Objective
 
-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, help you choose between them, show how to set them during app creation, and explain how to modify the scaling strategy once an app has been created.
 
 ## Requirements
 
-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).
 
 ## Scaling principles
 
-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:
 
-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).
 
-### Static scaling
+## Static Scaling
 
-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.
 
 > [!warning]
 >
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
 
-**When to choose static scaling?**
+### When to choose Static Scaling?
 
-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.
 
-### Autoscaling
+### Setting Static Scaling (UI and CLI)
 
-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --replicas 2 \
+>> -- <command>
+>> ```
+>>
 
-#### Minimum and maximum number of replicas
+## Autoscaling
 
-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?
 
-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters
 
-#### Trigger threshold
+Using this strategy, it is possible to choose:
 
-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter | Description |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas** | Lowest number of running replicas. |
+| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
 > [!primary]
 >
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
 >
 
-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want costs to **scale with actual usage**.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**.
+>> Toggle the button to switch to **Autoscaling**. Then, configure the minimum/maximum number of replicas, the monitored metric, and the trigger threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --auto-min-replicas 1 \
+>> --auto-max-replicas 5 \
+>> --auto-resource-type CPU \
+>> --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific `` placeholder can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the `valueLocation` entry in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value of the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |
 
-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+  --auto-custom-api-url http://:6000/metrics \
+  --auto-custom-value-location foo.bar \
+  --auto-custom-target-value 42 \
+  --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas of a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app_id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app_id> \
+  --auto-min-replicas <min> \
+  --auto-max-replicas <max> \
+  --auto-resource-type <CPU|RAM> \
+  --auto-resource-usage-threshold <threshold>
+```
 
 ## Scaling example
 
-We will use the following example:
+We will use the following example: an app based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), which means that each replica of the application will be entitled to **2 vCores** and **8GiB RAM**.
 
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all
 
 > [!primary]
 >
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but the actual cost can **increase** as the app scales up.
 
-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
-> +## Conclusion + +Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads. ## Feedback diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md index db40c1e89eb..632038e7195 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md @@ -1,74 +1,171 @@ --- title: AI Deploy - Scaling strategies -excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy -updated: 2023-04-04 +excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them +updated: 2025-10-08 --- > [!primary] -> > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**. -> ## Objective -This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment. +This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created. ## Requirements -- a **Public Cloud** project -- access to the [OVHcloud Control Panel](/links/manager) -- start deploying an app and get to **Step 3**: `Resources` +- An active **Public Cloud** project. +- Access to the [OVHcloud Control Panel](/links/manager). +- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli). ## Scaling principles -In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment. +When creating an application via the [OVHcloud Control Panel*** (UI) or the `ovhai` CLI, you can choose one of two scaling strategies: -This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**. +- **[Static Scaling](#static-scaling)**: Fixed number of running replicas. +- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics). -### Static scaling +## Static Scaling -The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed. +### What is Static Scaling? + +Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**. > [!warning] > -> It is recommended to deploy on a **minimum of 2 replicas** to have high availability! -> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. 
-**When to choose static scaling?** +### When to choose Static Scaling? -- Static scaling can be used if you want to have fixed costs. -- This scaling strategy is also useful when your consumption or inference load are fixed. +- You have **predictable, consistent workloads**. +- You prefer **fixed, predictable costs** with no unexpected resource usage spikes. +- Your use case requires **minimal latency**, as replicas are always active. -### Autoscaling +### Setting Static Scaling (UI and CLI) -With the autoscaling strategy, you can play on several parameters. +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run. +>> +>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment: +>> +>> ```bash +>> ovhai app run /: \ +>> --replicas 2 \ +>> -- +>> ``` +>> -#### Minimum and maximum number of replicas +## Autoscaling -With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**. +### What is Autoscaling? -#### Monitored metric +Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. -It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`. +### Autoscaling Key Configuration Parameters -#### Trigger threshold +Using this strategy, it is possible to choose: -The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%. - -The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas. +| Parameter | Description | +|----------------------------|-----------------------------------------------------------------------------------------------| +| **Minimum Replicas** | Lowest number of running replicas. | +| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | > [!primary] > -> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. -> -> Conversely, it will remove instances when this average resource utilisation falls below the threshold. +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. > -**When to choose autoscaling?** +### When to Choose Autoscaling? + +- Your app has **irregular or fluctuating** inference/load patterns. +- You want to **scale cost-effectively** with actual usage. +- You are managing a **high-throughput application** with sudden demand spikes. 
+ +### Setting Autoscaling (UI and CLI) + +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling** Then, configure minimum/maximum replicas, metric, and threshold. +>> +>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the following autoscaling parameters: +>> +>> ```bash +>> ovhai app run /: \ +>> --auto-min-replicas 1 \ +>> --auto-max-replicas 5 \ +>> --auto-resource-type CPU \ +>> --auto-resource-usage-threshold 75 +>> ``` +>> + +## Advanced: Custom Metrics for Autoscaling + +**Custom metrics are recommended for workloads such as GPU based inference where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.** + +For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint to fetch metrics from. + +### Required Parameter + +- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific `` placeholder can be given whenever metrics API is served by the deployed app itself. + +### Optional Parameters + +| Parameter | Description | +|--------------------------------|-----------------------------------------------------------------------------| +| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the valueLocation from the parameters list in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. | +| `--auto-custom-target-value` | Target value for metric to scale on. | +| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. | -- You can use autoscaling if you have irregular or sawtooth inference loads. +**Example**: + +Scaling based on a custom metric from an internal endpoint: + +```bash +ovhai app run /: \ + --auto-custom-api-url http://:6000/metrics \ + --auto-custom-value-location foo.bar \ + --auto-custom-target-value 42 \ + --auto-custom-metric-format JSON +``` + +## Modifying Scaling Strategies Post-Deployment + +You can also modify the scaling strategy after the app has been created using the `ovhai app scale` CLI command. This feature is not available on the UI. + +### Updating Static Scaling + +To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter: + +```bash +ovhai app scale --replicas +``` + +### Updating Autoscaling + +To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters: + +```bash +ovhai app scale \ + --auto-min-replicas \ + --auto-max-replicas \ + --auto-resource-type \ + --auto-resource-usage-threshold \ + +``` ## Scaling example @@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all > [!primary] > -> The **total deployment price** will be calculated using the minimum number of replicas. -> +> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, **but** costs can **increase** during scaling. -> [!warning] -> -> The cost may increase as `Autoscaling` increases. 
-> +## Conclusion + +Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads. ## Feedback diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md index db40c1e89eb..632038e7195 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md @@ -1,74 +1,171 @@ --- title: AI Deploy - Scaling strategies -excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy -updated: 2023-04-04 +excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them +updated: 2025-10-08 --- > [!primary] -> > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**. -> ## Objective -This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment. +This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created. ## Requirements -- a **Public Cloud** project -- access to the [OVHcloud Control Panel](/links/manager) -- start deploying an app and get to **Step 3**: `Resources` +- An active **Public Cloud** project. +- Access to the [OVHcloud Control Panel](/links/manager). +- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli). ## Scaling principles -In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment. +When creating an application via the [OVHcloud Control Panel*** (UI) or the `ovhai` CLI, you can choose one of two scaling strategies: -This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**. +- **[Static Scaling](#static-scaling)**: Fixed number of running replicas. +- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics). -### Static scaling +## Static Scaling -The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed. +### What is Static Scaling? + +Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**. > [!warning] > -> It is recommended to deploy on a **minimum of 2 replicas** to have high availability! -> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. 
-**When to choose static scaling?** +### When to choose Static Scaling? -- Static scaling can be used if you want to have fixed costs. -- This scaling strategy is also useful when your consumption or inference load are fixed. +- You have **predictable, consistent workloads**. +- You prefer **fixed, predictable costs** with no unexpected resource usage spikes. +- Your use case requires **minimal latency**, as replicas are always active. -### Autoscaling +### Setting Static Scaling (UI and CLI) -With the autoscaling strategy, you can play on several parameters. +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run. +>> +>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment: +>> +>> ```bash +>> ovhai app run /: \ +>> --replicas 2 \ +>> -- +>> ``` +>> -#### Minimum and maximum number of replicas +## Autoscaling -With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**. +### What is Autoscaling? -#### Monitored metric +Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. -It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`. +### Autoscaling Key Configuration Parameters -#### Trigger threshold +Using this strategy, it is possible to choose: -The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%. - -The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas. +| Parameter | Description | +|----------------------------|-----------------------------------------------------------------------------------------------| +| **Minimum Replicas** | Lowest number of running replicas. | +| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | > [!primary] > -> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. -> -> Conversely, it will remove instances when this average resource utilisation falls below the threshold. +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. > -**When to choose autoscaling?** +### When to Choose Autoscaling? + +- Your app has **irregular or fluctuating** inference/load patterns. +- You want to **scale cost-effectively** with actual usage. +- You are managing a **high-throughput application** with sudden demand spikes. 
+ +### Setting Autoscaling (UI and CLI) + +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling** Then, configure minimum/maximum replicas, metric, and threshold. +>> +>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the following autoscaling parameters: +>> +>> ```bash +>> ovhai app run /: \ +>> --auto-min-replicas 1 \ +>> --auto-max-replicas 5 \ +>> --auto-resource-type CPU \ +>> --auto-resource-usage-threshold 75 +>> ``` +>> + +## Advanced: Custom Metrics for Autoscaling + +**Custom metrics are recommended for workloads such as GPU based inference where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.** + +For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint to fetch metrics from. + +### Required Parameter + +- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific `` placeholder can be given whenever metrics API is served by the deployed app itself. + +### Optional Parameters + +| Parameter | Description | +|--------------------------------|-----------------------------------------------------------------------------| +| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the valueLocation from the parameters list in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. | +| `--auto-custom-target-value` | Target value for metric to scale on. | +| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. | -- You can use autoscaling if you have irregular or sawtooth inference loads. +**Example**: + +Scaling based on a custom metric from an internal endpoint: + +```bash +ovhai app run /: \ + --auto-custom-api-url http://:6000/metrics \ + --auto-custom-value-location foo.bar \ + --auto-custom-target-value 42 \ + --auto-custom-metric-format JSON +``` + +## Modifying Scaling Strategies Post-Deployment + +You can also modify the scaling strategy after the app has been created using the `ovhai app scale` CLI command. This feature is not available on the UI. + +### Updating Static Scaling + +To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter: + +```bash +ovhai app scale --replicas +``` + +### Updating Autoscaling + +To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters: + +```bash +ovhai app scale \ + --auto-min-replicas \ + --auto-max-replicas \ + --auto-resource-type \ + --auto-resource-usage-threshold \ + +``` ## Scaling example @@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all > [!primary] > -> The **total deployment price** will be calculated using the minimum number of replicas. -> +> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, **but** costs can **increase** during scaling. -> [!warning] -> -> The cost may increase as `Autoscaling` increases. 
-> +## Conclusion + +Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads. ## Feedback diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md index db40c1e89eb..632038e7195 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md @@ -1,74 +1,171 @@ --- title: AI Deploy - Scaling strategies -excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy -updated: 2023-04-04 +excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them +updated: 2025-10-08 --- > [!primary] -> > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**. -> ## Objective -This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment. +This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created. ## Requirements -- a **Public Cloud** project -- access to the [OVHcloud Control Panel](/links/manager) -- start deploying an app and get to **Step 3**: `Resources` +- An active **Public Cloud** project. +- Access to the [OVHcloud Control Panel](/links/manager). +- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli). ## Scaling principles -In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment. +When creating an application via the [OVHcloud Control Panel*** (UI) or the `ovhai` CLI, you can choose one of two scaling strategies: -This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**. +- **[Static Scaling](#static-scaling)**: Fixed number of running replicas. +- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics). -### Static scaling +## Static Scaling -The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed. +### What is Static Scaling? + +Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**. > [!warning] > -> It is recommended to deploy on a **minimum of 2 replicas** to have high availability! -> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. 
-**When to choose static scaling?** +### When to choose Static Scaling? -- Static scaling can be used if you want to have fixed costs. -- This scaling strategy is also useful when your consumption or inference load are fixed. +- You have **predictable, consistent workloads**. +- You prefer **fixed, predictable costs** with no unexpected resource usage spikes. +- Your use case requires **minimal latency**, as replicas are always active. -### Autoscaling +### Setting Static Scaling (UI and CLI) -With the autoscaling strategy, you can play on several parameters. +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run. +>> +>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment: +>> +>> ```bash +>> ovhai app run /: \ +>> --replicas 2 \ +>> -- +>> ``` +>> -#### Minimum and maximum number of replicas +## Autoscaling -With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**. +### What is Autoscaling? -#### Monitored metric +Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. -It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`. +### Autoscaling Key Configuration Parameters -#### Trigger threshold +Using this strategy, it is possible to choose: -The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%. - -The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas. +| Parameter | Description | +|----------------------------|-----------------------------------------------------------------------------------------------| +| **Minimum Replicas** | Lowest number of running replicas. | +| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | > [!primary] > -> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. -> -> Conversely, it will remove instances when this average resource utilisation falls below the threshold. +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. > -**When to choose autoscaling?** +### When to Choose Autoscaling? + +- Your app has **irregular or fluctuating** inference/load patterns. +- You want to **scale cost-effectively** with actual usage. +- You are managing a **high-throughput application** with sudden demand spikes. 
+ +### Setting Autoscaling (UI and CLI) + +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling** Then, configure minimum/maximum replicas, metric, and threshold. +>> +>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the following autoscaling parameters: +>> +>> ```bash +>> ovhai app run /: \ +>> --auto-min-replicas 1 \ +>> --auto-max-replicas 5 \ +>> --auto-resource-type CPU \ +>> --auto-resource-usage-threshold 75 +>> ``` +>> + +## Advanced: Custom Metrics for Autoscaling + +**Custom metrics are recommended for workloads such as GPU based inference where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.** + +For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint to fetch metrics from. + +### Required Parameter + +- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific `` placeholder can be given whenever metrics API is served by the deployed app itself. + +### Optional Parameters + +| Parameter | Description | +|--------------------------------|-----------------------------------------------------------------------------| +| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the valueLocation from the parameters list in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. | +| `--auto-custom-target-value` | Target value for metric to scale on. | +| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. | -- You can use autoscaling if you have irregular or sawtooth inference loads. +**Example**: + +Scaling based on a custom metric from an internal endpoint: + +```bash +ovhai app run /: \ + --auto-custom-api-url http://:6000/metrics \ + --auto-custom-value-location foo.bar \ + --auto-custom-target-value 42 \ + --auto-custom-metric-format JSON +``` + +## Modifying Scaling Strategies Post-Deployment + +You can also modify the scaling strategy after the app has been created using the `ovhai app scale` CLI command. This feature is not available on the UI. + +### Updating Static Scaling + +To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter: + +```bash +ovhai app scale --replicas +``` + +### Updating Autoscaling + +To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters: + +```bash +ovhai app scale \ + --auto-min-replicas \ + --auto-max-replicas \ + --auto-resource-type \ + --auto-resource-usage-threshold \ + +``` ## Scaling example @@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all > [!primary] > -> The **total deployment price** will be calculated using the minimum number of replicas. -> +> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, **but** costs can **increase** during scaling. -> [!warning] -> -> The cost may increase as `Autoscaling` increases. 
-> +## Conclusion + +Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads. ## Feedback diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md index 0e9245d3044..632038e7195 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md @@ -1,78 +1,175 @@ --- title: AI Deploy - Scaling strategies -excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy -updated: 2023-04-04 +excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them +updated: 2025-10-08 --- > [!primary] -> > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**. -> ## Objective -This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment. +This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created. ## Requirements -- a **Public Cloud** project -- access to the [OVHcloud Control Panel](/links/manager) -- start deploying an app and get to **Step 3**: `Resources` +- An active **Public Cloud** project. +- Access to the [OVHcloud Control Panel](/links/manager). +- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli). ## Scaling principles -In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment. +When creating an application via the [OVHcloud Control Panel*** (UI) or the `ovhai` CLI, you can choose one of two scaling strategies: -This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**. +- **[Static Scaling](#static-scaling)**: Fixed number of running replicas. +- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics). -### Static scaling +## Static Scaling -The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed. +### What is Static Scaling? + +Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**. > [!warning] > -> It is recommended to deploy on a **minimum of 2 replicas** to have high availability! -> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. 
-**When to choose static scaling?** +### When to choose Static Scaling? -- Static scaling can be used if you want to have fixed costs. -- This scaling strategy is also useful when your consumption or inference load are fixed. +- You have **predictable, consistent workloads**. +- You prefer **fixed, predictable costs** with no unexpected resource usage spikes. +- Your use case requires **minimal latency**, as replicas are always active. -### Autoscaling +### Setting Static Scaling (UI and CLI) -With the autoscaling strategy, you can play on several parameters. +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run. +>> +>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment: +>> +>> ```bash +>> ovhai app run /: \ +>> --replicas 2 \ +>> -- +>> ``` +>> -#### Minimum and maximum number of replicas +## Autoscaling -With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**. +### What is Autoscaling? -#### Monitored metric +Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. -It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`. +### Autoscaling Key Configuration Parameters -#### Trigger threshold +Using this strategy, it is possible to choose: -The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%. - -The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas. +| Parameter | Description | +|----------------------------|-----------------------------------------------------------------------------------------------| +| **Minimum Replicas** | Lowest number of running replicas. | +| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | > [!primary] > -> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. -> -> Conversely, it will remove instances when this average resource utilisation falls below the threshold. +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. > -**When to choose autoscaling?** +### When to Choose Autoscaling? + +- Your app has **irregular or fluctuating** inference/load patterns. +- You want to **scale cost-effectively** with actual usage. +- You are managing a **high-throughput application** with sudden demand spikes. 
+ +### Setting Autoscaling (UI and CLI) + +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling** Then, configure minimum/maximum replicas, metric, and threshold. +>> +>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the following autoscaling parameters: +>> +>> ```bash +>> ovhai app run /: \ +>> --auto-min-replicas 1 \ +>> --auto-max-replicas 5 \ +>> --auto-resource-type CPU \ +>> --auto-resource-usage-threshold 75 +>> ``` +>> + +## Advanced: Custom Metrics for Autoscaling + +**Custom metrics are recommended for workloads such as GPU based inference where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.** + +For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint to fetch metrics from. + +### Required Parameter + +- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific `` placeholder can be given whenever metrics API is served by the deployed app itself. + +### Optional Parameters + +| Parameter | Description | +|--------------------------------|-----------------------------------------------------------------------------| +| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the valueLocation from the parameters list in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. | +| `--auto-custom-target-value` | Target value for metric to scale on. | +| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. | -- You can use autoscaling if you have irregular or sawtooth inference loads. +**Example**: + +Scaling based on a custom metric from an internal endpoint: + +```bash +ovhai app run /: \ + --auto-custom-api-url http://:6000/metrics \ + --auto-custom-value-location foo.bar \ + --auto-custom-target-value 42 \ + --auto-custom-metric-format JSON +``` + +## Modifying Scaling Strategies Post-Deployment + +You can also modify the scaling strategy after the app has been created using the `ovhai app scale` CLI command. This feature is not available on the UI. + +### Updating Static Scaling + +To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter: + +```bash +ovhai app scale --replicas +``` + +### Updating Autoscaling + +To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters: + +```bash +ovhai app scale \ + --auto-min-replicas \ + --auto-max-replicas \ + --auto-resource-type \ + --auto-resource-usage-threshold \ + +``` ## Scaling example -We will use the following example: +We will use the following example: In case an app is based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), this means that each replica of the application will be entitled to **2 vCores** and **8GiB RAM**. 
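+
+As a purely illustrative calculation (the numbers here are hypothetical, not part of the example above): with a RAM trigger threshold of 75% and three active replicas reporting 80%, 70% and 60% usage, the monitored average is (80 + 70 + 60) / 3 = 70%, which is below the threshold, so no additional replica is started.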
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all > [!primary] > -> The **total deployment price** will be calculated using the minimum number of replicas. -> +> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, **but** costs can **increase** during scaling. -> [!warning] -> -> The cost may increase as `Autoscaling` increases. -> +## Conclusion + +Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads. ## Feedback diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md index db40c1e89eb..632038e7195 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md @@ -1,74 +1,171 @@ --- title: AI Deploy - Scaling strategies -excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy -updated: 2023-04-04 +excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them +updated: 2025-10-08 --- > [!primary] -> > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**. -> ## Objective -This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment. +This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created. ## Requirements -- a **Public Cloud** project -- access to the [OVHcloud Control Panel](/links/manager) -- start deploying an app and get to **Step 3**: `Resources` +- An active **Public Cloud** project. +- Access to the [OVHcloud Control Panel](/links/manager). +- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli). ## Scaling principles -In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment. +When creating an application via the [OVHcloud Control Panel*** (UI) or the `ovhai` CLI, you can choose one of two scaling strategies: -This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**. +- **[Static Scaling](#static-scaling)**: Fixed number of running replicas. +- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics). -### Static scaling +## Static Scaling -The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed. +### What is Static Scaling? 
+ +Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**. > [!warning] > -> It is recommended to deploy on a **minimum of 2 replicas** to have high availability! -> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. -**When to choose static scaling?** +### When to choose Static Scaling? -- Static scaling can be used if you want to have fixed costs. -- This scaling strategy is also useful when your consumption or inference load are fixed. +- You have **predictable, consistent workloads**. +- You prefer **fixed, predictable costs** with no unexpected resource usage spikes. +- Your use case requires **minimal latency**, as replicas are always active. -### Autoscaling +### Setting Static Scaling (UI and CLI) -With the autoscaling strategy, you can play on several parameters. +> [!tabs] +> **Using the Control Panel (UI)** +>> +>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run. +>> +>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail} +>> +> **Using ovhai CLI** +>> +>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment: +>> +>> ```bash +>> ovhai app run /: \ +>> --replicas 2 \ +>> -- +>> ``` +>> -#### Minimum and maximum number of replicas +## Autoscaling -With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**. +### What is Autoscaling? -#### Monitored metric +Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. -It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`. +### Autoscaling Key Configuration Parameters -#### Trigger threshold +Using this strategy, it is possible to choose: -The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%. - -The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas. +| Parameter | Description | +|----------------------------|-----------------------------------------------------------------------------------------------| +| **Minimum Replicas** | Lowest number of running replicas. | +| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | > [!primary] > -> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. -> -> Conversely, it will remove instances when this average resource utilisation falls below the threshold. 
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
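+
+Before updating an app, you can review its current scaling configuration. As a minimal sketch (assuming the `ovhai app get` command is available in your CLI version and `<app-id>` is your application's ID):
+
+```bash
+# Show the app's details, including its current scaling configuration
+ovhai app get <app-id>
+```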
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.

## Feedback

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
index db40c1e89eb..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
@@ -1,74 +1,171 @@
---
title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
---

> [!primary]
->
> AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->

## Objective

-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, show how to set them during app creation, and explain how to modify scaling strategies once apps are created.

## Requirements

-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).

## Scaling principles

-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:

-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).

-### Static scaling
+## Static Scaling

-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.

> [!warning]
>
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.

-**When to choose static scaling?**
+### When to Choose Static Scaling?

-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.

-### Autoscaling
+### Setting Static Scaling (UI and CLI)

-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --replicas 2 \
+>>     -- <command>
+>> ```
+>>

-#### Minimum and maximum number of replicas
+## Autoscaling

-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?

-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.

-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters

-#### Trigger threshold
+Using this strategy, it is possible to choose:

-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter                  | Description                                                                                     |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas**       | Lowest number of running replicas.                                                             |
+| **Maximum Replicas**       | Upper bound for replica count (define based on usage expectations).                            |
+| **Monitored Metric**       | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions.  |
+| **Trigger Threshold (%)**  | Average usage percentage used to trigger scaling up or down. Range: 1–100%.                    |

> [!primary]
>
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
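+
+With the default `JSON` format, a value location such as `foo.bar` points at a nested field of the response payload. As a minimal sketch (the endpoint, field names, and value are illustrative), the metrics endpoint would return something like:
+
+```bash
+# Query the custom metrics endpoint (illustrative placeholders):
+curl http://<host>:6000/metrics
+# Expected response shape for --auto-custom-value-location foo.bar:
+# {"foo": {"bar": 42}}
+```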
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.

## Feedback

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
index db40c1e89eb..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
@@ -1,74 +1,171 @@
---
title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
---

> [!primary]
->
> AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->

## Objective

-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, show how to set them during app creation, and explain how to modify scaling strategies once apps are created.

## Requirements

-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).

## Scaling principles

-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:

-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).

-### Static scaling
+## Static Scaling

-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.

> [!warning]
>
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.

-**When to choose static scaling?**
+### When to Choose Static Scaling?

-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.

-### Autoscaling
+### Setting Static Scaling (UI and CLI)

-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --replicas 2 \
+>>     -- <command>
+>> ```
+>>

-#### Minimum and maximum number of replicas
+## Autoscaling

-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?

-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.

-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters

-#### Trigger threshold
+Using this strategy, it is possible to choose:

-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter                  | Description                                                                                     |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas**       | Lowest number of running replicas.                                                             |
+| **Maximum Replicas**       | Upper bound for replica count (define based on usage expectations).                            |
+| **Monitored Metric**       | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions.  |
+| **Trigger Threshold (%)**  | Average usage percentage used to trigger scaling up or down. Range: 1–100%.                    |

> [!primary]
>
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
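+
+With the default `JSON` format, a value location such as `foo.bar` points at a nested field of the response payload. As a minimal sketch (the endpoint, field names, and value are illustrative), the metrics endpoint would return something like:
+
+```bash
+# Query the custom metrics endpoint (illustrative placeholders):
+curl http://<host>:6000/metrics
+# Expected response shape for --auto-custom-value-location foo.bar:
+# {"foo": {"bar": 42}}
+```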
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.

## Feedback

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
@@ -1,78 +1,175 @@
---
title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
---

> [!primary]
->
> AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->

## Objective

-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, show how to set them during app creation, and explain how to modify scaling strategies once apps are created.

## Requirements

-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).

## Scaling principles

-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:

-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).

-### Static scaling
+## Static Scaling

-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.

> [!warning]
>
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.

-**When to choose static scaling?**
+### When to Choose Static Scaling?

-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.

-### Autoscaling
+### Setting Static Scaling (UI and CLI)

-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --replicas 2 \
+>>     -- <command>
+>> ```
+>>

-#### Minimum and maximum number of replicas
+## Autoscaling

-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?

-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.

-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters

-#### Trigger threshold
+Using this strategy, it is possible to choose:

-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter                  | Description                                                                                     |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas**       | Lowest number of running replicas.                                                             |
+| **Maximum Replicas**       | Upper bound for replica count (define based on usage expectations).                            |
+| **Monitored Metric**       | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions.  |
+| **Trigger Threshold (%)**  | Average usage percentage used to trigger scaling up or down. Range: 1–100%.                    |

> [!primary]
>
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
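+
+With the default `JSON` format, a value location such as `foo.bar` points at a nested field of the response payload. As a minimal sketch (the endpoint, field names, and value are illustrative), the metrics endpoint would return something like:
+
+```bash
+# Query the custom metrics endpoint (illustrative placeholders):
+curl http://<host>:6000/metrics
+# Expected response shape for --auto-custom-value-location foo.bar:
+# {"foo": {"bar": 42}}
+```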
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

-We will use the following example:
+We will use the following example:

In case an app is based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), this means that each replica of the application will be entitled to **2 vCores** and **8GiB RAM**.

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.

## Feedback

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
@@ -1,78 +1,175 @@
---
title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
---

> [!primary]
->
> AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->

## Objective

-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, show how to set them during app creation, and explain how to modify scaling strategies once apps are created.

## Requirements

-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).

## Scaling principles

-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:

-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).

-### Static scaling
+## Static Scaling

-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.

> [!warning]
>
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.

-**When to choose static scaling?**
+### When to Choose Static Scaling?

-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.

-### Autoscaling
+### Setting Static Scaling (UI and CLI)

-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --replicas 2 \
+>>     -- <command>
+>> ```
+>>

-#### Minimum and maximum number of replicas
+## Autoscaling

-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?

-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.

-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters

-#### Trigger threshold
+Using this strategy, it is possible to choose:

-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter                  | Description                                                                                     |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas**       | Lowest number of running replicas.                                                             |
+| **Maximum Replicas**       | Upper bound for replica count (define based on usage expectations).                            |
+| **Monitored Metric**       | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions.  |
+| **Trigger Threshold (%)**  | Average usage percentage used to trigger scaling up or down. Range: 1–100%.                    |

> [!primary]
>
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

-We will use the following example:
+We will use the following example:

In case an app is based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), this means that each replica of the application will be entitled to **2 vCores** and **8GiB RAM**.

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.

## Feedback

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
index 71ceaae08f8..fb76bef3ec6 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
@@ -1,78 +1,175 @@
---
title: "AI Deploy - Stratégies de mise à l'échelle (EN)"
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
---

> [!primary]
->
> AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->

## Objective

-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, show how to set them during app creation, and explain how to modify scaling strategies once apps are created.

## Requirements

-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).

## Scaling principles

-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:

-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).

-### Static scaling
+## Static Scaling

-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.

> [!warning]
>
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.

-**When to choose static scaling?**
+### When to Choose Static Scaling?

-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.

-### Autoscaling
+### Setting Static Scaling (UI and CLI)

-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --replicas 2 \
+>>     -- <command>
+>> ```
+>>

-#### Minimum and maximum number of replicas
+## Autoscaling

-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?

-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.

-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters

-#### Trigger threshold
+Using this strategy, it is possible to choose:

-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter                  | Description                                                                                     |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas**       | Lowest number of running replicas.                                                             |
+| **Maximum Replicas**       | Upper bound for replica count (define based on usage expectations).                            |
+| **Monitored Metric**       | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions.  |
+| **Trigger Threshold (%)**  | Average usage percentage used to trigger scaling up or down. Range: 1–100%.                    |

> [!primary]
>
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
>

-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure minimum/maximum replicas, metric, and threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>>     --auto-min-replicas 1 \
+>>     --auto-max-replicas 5 \
+>>     --auto-resource-type CPU \
+>>     --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder for the host can be given whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See `valueLocation` in the parameters list of the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |

-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+    --auto-custom-api-url http://<host>:6000/metrics \
+    --auto-custom-value-location foo.bar \
+    --auto-custom-target-value 42 \
+    --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+    --auto-min-replicas <min-replicas> \
+    --auto-max-replicas <max-replicas> \
+    --auto-resource-type <CPU|RAM> \
+    --auto-resource-usage-threshold <threshold>
+```

## Scaling example

-We will use the following example:
+We will use the following example:

In case an app is based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), this means that each replica of the application will be entitled to **2 vCores** and **8GiB RAM**.

@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all

> [!primary]
>
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs increase when additional replicas are started during scaling.

-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.
 
 ## Feedback
 
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md
index 71ceaae08f8..fb76bef3ec6 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md
@@ -1,78 +1,175 @@
 ---
 title: "AI Deploy - Stratégies de mise à l'échelle (EN)"
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
 ---
 
 > [!primary]
->
 > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->
 
 ## Objective
 
-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created.
 
 ## Requirements
 
-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).
 
 ## Scaling principles
 
-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:
 
-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).
 
-### Static scaling
+## Static Scaling
 
-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.
 
 > [!warning]
 >
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
-**When to choose static scaling?**
+### When to choose Static Scaling?
 
-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.
 
-### Autoscaling
+### Setting Static Scaling (UI and CLI)
 
-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --replicas 2 \
+>> -- <command>
+>> ```
+>>
 
-#### Minimum and maximum number of replicas
+## Autoscaling
 
-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?
 
-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters
 
-#### Trigger threshold
+Using this strategy, it is possible to choose:
 
-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter | Description |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas** | Lowest number of running replicas. |
+| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
 > [!primary]
 >
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
 >
 
-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure the minimum/maximum replicas, the metric, and the threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --auto-min-replicas 1 \
+>> --auto-max-replicas 5 \
+>> --auto-resource-type CPU \
+>> --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder can be given in the URL whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the `valueLocation` parameter in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |
 
-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+  --auto-custom-api-url http://<host>:6000/metrics \
+  --auto-custom-value-location foo.bar \
+  --auto-custom-target-value 42 \
+  --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <number-of-replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+  --auto-min-replicas <min-replicas> \
+  --auto-max-replicas <max-replicas> \
+  --auto-resource-type <CPU|RAM> \
+  --auto-resource-usage-threshold <threshold>
+```
 
 ## Scaling example
 
-We will use the following example:
+We will use the following example: an app based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), which means that each replica of the application is entitled to **2 vCores** and **8GiB RAM**.
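+
+As an illustration, such a sizing can be requested at deployment time with the CLI's `--cpu` parameter. This is a minimal sketch, where `<registry>/<image>:<tag>` is a generic placeholder for your own image:
+
+```bash
+# Request 2 CPUs per replica: each replica is then entitled to 2 vCores and 8GiB RAM
+ovhai app run <registry>/<image>:<tag> \
+  --cpu 2 \
+  --replicas 2
+```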
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all
 
 > [!primary]
 >
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs can **increase** as additional replicas are started during scaling.
 
-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.
 
 ## Feedback
 
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md
@@ -1,78 +1,175 @@
 ---
 title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
 ---
 
 > [!primary]
->
 > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->
 
 ## Objective
 
-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created.
 
 ## Requirements
 
-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).
 
 ## Scaling principles
 
-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:
 
-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).
 
-### Static scaling
+## Static Scaling
 
-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.
 
 > [!warning]
 >
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
 
-**When to choose static scaling?**
+### When to choose Static Scaling?
 
-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.
 
-### Autoscaling
+### Setting Static Scaling (UI and CLI)
 
-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --replicas 2 \
+>> -- <command>
+>> ```
+>>
 
-#### Minimum and maximum number of replicas
+## Autoscaling
 
-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?
 
-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters
 
-#### Trigger threshold
+Using this strategy, it is possible to choose:
 
-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter | Description |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas** | Lowest number of running replicas. |
+| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
 > [!primary]
 >
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
 >
 
-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure the minimum/maximum replicas, the metric, and the threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --auto-min-replicas 1 \
+>> --auto-max-replicas 5 \
+>> --auto-resource-type CPU \
+>> --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder can be given in the URL whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the `valueLocation` parameter in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |
 
-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+  --auto-custom-api-url http://<host>:6000/metrics \
+  --auto-custom-value-location foo.bar \
+  --auto-custom-target-value 42 \
+  --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
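+
+Before rescaling, you can check an app's current scaling configuration. A minimal sketch, assuming `<app-id>` stands for your application's ID:
+
+```bash
+# Display the app's details, including its current scaling strategy
+ovhai app get <app-id>
+```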
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <number-of-replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+  --auto-min-replicas <min-replicas> \
+  --auto-max-replicas <max-replicas> \
+  --auto-resource-type <CPU|RAM> \
+  --auto-resource-usage-threshold <threshold>
+```
 
 ## Scaling example
 
-We will use the following example:
+We will use the following example: an app based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), which means that each replica of the application is entitled to **2 vCores** and **8GiB RAM**.
 
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all
 
 > [!primary]
 >
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs can **increase** as additional replicas are started during scaling.
 
-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.
 
 ## Feedback
 
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md
@@ -1,78 +1,175 @@
 ---
 title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
 ---
 
 > [!primary]
->
 > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->
 
 ## Objective
 
-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy. The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created.
 
 ## Requirements
 
-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).
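+
+Once the CLI is installed, authenticate it before running the commands in this guide. A minimal sketch:
+
+```bash
+# Log in to the OVHcloud AI tools (you will be prompted for credentials)
+ovhai login
+```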
 ## Scaling principles
 
-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:
 
-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).
 
-### Static scaling
+## Static Scaling
 
-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.
 
 > [!warning]
 >
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
 
-**When to choose static scaling?**
+### When to choose Static Scaling?
 
-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.
 
-### Autoscaling
+### Setting Static Scaling (UI and CLI)
 
-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --replicas 2 \
+>> -- <command>
+>> ```
+>>
 
-#### Minimum and maximum number of replicas
+## Autoscaling
 
-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?
 
-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters
 
-#### Trigger threshold
+Using this strategy, it is possible to choose:
 
-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter | Description |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas** | Lowest number of running replicas. |
+| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
 > [!primary]
 >
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
 >
 
-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure the minimum/maximum replicas, the metric, and the threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --auto-min-replicas 1 \
+>> --auto-max-replicas 5 \
+>> --auto-resource-type CPU \
+>> --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder can be given in the URL whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific. See the `valueLocation` parameter in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |
 
-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+  --auto-custom-api-url http://<host>:6000/metrics \
+  --auto-custom-value-location foo.bar \
+  --auto-custom-target-value 42 \
+  --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <number-of-replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+  --auto-min-replicas <min-replicas> \
+  --auto-max-replicas <max-replicas> \
+  --auto-resource-type <CPU|RAM> \
+  --auto-resource-usage-threshold <threshold>
+```
 
 ## Scaling example
 
-We will use the following example:
+We will use the following example: an app based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), which means that each replica of the application is entitled to **2 vCores** and **8GiB RAM**.
 
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all
 
 > [!primary]
 >
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs can **increase** as additional replicas are started during scaling.
 
-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.
 
 ## Feedback
 
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md
index 0e9245d3044..632038e7195 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md
@@ -1,78 +1,175 @@
 ---
 title: AI Deploy - Scaling strategies
-excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy
-updated: 2023-04-04
+excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
+updated: 2025-10-08
 ---
 
 > [!primary]
->
 > AI Deploy is covered by **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
->
 
 ## Objective
 
-This guide covers the use of the different scaling strategies for AI Deploy. The objective is to explain the difference between **static scaling** and **autoscaling** so that you can choose the best solution depending on the use case and type of deployment.
+This guide provides a comprehensive understanding of the different scaling strategies for AI Deploy.
The objective is to explain the differences between **static scaling** and **autoscaling**, guide users on how to choose between them, set them during app creation, and explain how to modify scaling strategies once apps are created.
 
 ## Requirements
 
-- a **Public Cloud** project
-- access to the [OVHcloud Control Panel](/links/manager)
-- start deploying an app and get to **Step 3**: `Resources`
+- An active **Public Cloud** project.
+- Access to the [OVHcloud Control Panel](/links/manager).
+- The **OVHcloud AI CLI** (`ovhai`) installed. For installation instructions, see [how to install ovhai](/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli).
 
 ## Scaling principles
 
-In the [OVHcloud Control Panel](/links/manager), it is possible to select the **resources** in `Step 3` of the app deployment.
+When creating an application via the [OVHcloud Control Panel](/links/manager) (UI) or the `ovhai` CLI, you can choose one of two scaling strategies:
 
-This step allows you to choose between two scaling strategies: **static scaling** and **autoscaling**.
+- **[Static Scaling](#static-scaling)**: Fixed number of running replicas.
+- **[Autoscaling](#autoscaling)**: Dynamic replicas based on usage metrics (CPU/RAM or custom metrics).
 
-### Static scaling
+## Static Scaling
 
-The **static scaling** strategy allows you to choose the number of replicas on which the app will be deployed.
+### What is Static Scaling?
+
+Static scaling allows you to configure a **fixed number of replicas** (identical instances of your application) running at all times. This is the **default strategy** if not specified. The minimum number of replicas is **1** and the maximum is **10**.
 
 > [!warning]
 >
-> It is recommended to deploy on a **minimum of 2 replicas** to have high availability!
->
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
 
-**When to choose static scaling?**
+### When to choose Static Scaling?
 
-- Static scaling can be used if you want to have fixed costs.
-- This scaling strategy is also useful when your consumption or inference load are fixed.
+- You have **predictable, consistent workloads**.
+- You prefer **fixed, predictable costs** with no unexpected resource usage spikes.
+- Your use case requires **minimal latency**, as replicas are always active.
 
-### Autoscaling
+### Setting Static Scaling (UI and CLI)
 
-With the autoscaling strategy, you can play on several parameters.
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. To use this strategy, make sure that automatic scaling is not enabled. Then, you will be asked to choose the number of replicas on which your application will run.
+>>
+>> ![Set static scaling on AI Deploy via UI](images/set-static-scaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the `--replicas` parameter to set the number of replicas at deployment:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --replicas 2 \
+>> -- <command>
+>> ```
+>>
 
-#### Minimum and maximum number of replicas
+## Autoscaling
 
-With the autoscaling strategy, it is possible to choose both the **minimum number of replicas** (1 by default) and the **maximum number of replicas**.
+### What is Autoscaling?
 
-#### Monitored metric
+Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage.
This is optimized for **workloads with varying demand**.
 
-It is also possible to choose the metric to be monitored. This will act as a trigger for **autoscaling**. There are two metrics to choose from: `CPU` or `RAM`.
+### Autoscaling Key Configuration Parameters
 
-#### Trigger threshold
+Using this strategy, it is possible to choose:
 
-The threshold for the percentage of average use can also be chosen. It is an integer between 1 and 100%.
-
-The threshold of the average usage percentage will trigger the scaling (up or down) of the app replicas.
+| Parameter | Description |
+|----------------------------|-----------------------------------------------------------------------------------------------|
+| **Minimum Replicas** | Lowest number of running replicas. |
+| **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
 > [!primary]
 >
-> **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold.
->
-> Conversely, it will remove instances when this average resource utilisation falls below the threshold.
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
 >
 
-**When to choose autoscaling?**
+### When to Choose Autoscaling?
+
+- Your app has **irregular or fluctuating** inference/load patterns.
+- You want to **scale cost-effectively** with actual usage.
+- You are managing a **high-throughput application** with sudden demand spikes.
+
+### Setting Autoscaling (UI and CLI)
+
+> [!tabs]
+> **Using the Control Panel (UI)**
+>>
+>> When creating your application, you will have the opportunity to choose your **scaling strategy**. By default, the strategy is set to **static scaling**. Toggle the button to switch to **Autoscaling**. Then, configure the minimum/maximum replicas, the metric, and the threshold.
+>>
+>> ![Set autoscaling on AI Deploy via UI](images/set-autoscaling.png){.thumbnail}
+>>
+> **Using ovhai CLI**
+>>
+>> Use the `ovhai app run` command with the following autoscaling parameters:
+>>
+>> ```bash
+>> ovhai app run <registry>/<image>:<tag> \
+>> --auto-min-replicas 1 \
+>> --auto-max-replicas 5 \
+>> --auto-resource-type CPU \
+>> --auto-resource-usage-threshold 75
+>> ```
+>>
+
+## Advanced: Custom Metrics for Autoscaling
+
+**Custom metrics are recommended for workloads such as GPU-based inference, where CPU and RAM usage provide an incomplete picture of the system’s performance or request load.**
+
+For advanced scenarios, you can define **custom metrics** to drive autoscaling decisions. This requires an API endpoint from which the metric value can be fetched.
+
+### Required Parameter
+
+- `--auto-custom-api-url`: URL of the API operation to call to get the metric value. A specific placeholder can be given in the URL whenever the metrics API is served by the deployed app itself.
+
+### Optional Parameters
+
+| Parameter | Description |
+|--------------------------------|-----------------------------------------------------------------------------|
+| `--auto-custom-value-location` | Specifies where the metric value is located in the API response payload. This value is format-specific.
See the `valueLocation` parameter in the [Trigger Specification documentation](https://keda.sh/docs/2.16/scalers/metrics-api/#trigger-specification) for details. |
+| `--auto-custom-target-value` | Target value for the metric to scale on. |
+| `--auto-custom-metric-format` | Format of the metric to scale on (`JSON`, `XML`, `YAML`, `PROMETHEUS`). Default is `JSON`. |
 
-- You can use autoscaling if you have irregular or sawtooth inference loads.
+**Example**:
+
+Scaling based on a custom metric from an internal endpoint:
+
+```bash
+ovhai app run <registry>/<image>:<tag> \
+  --auto-custom-api-url http://<host>:6000/metrics \
+  --auto-custom-value-location foo.bar \
+  --auto-custom-target-value 42 \
+  --auto-custom-metric-format JSON
+```
+
+## Modifying Scaling Strategies Post-Deployment
+
+You can also modify the scaling strategy after the app has been created, using the `ovhai app scale` CLI command. This feature is not available in the UI.
+
+### Updating Static Scaling
+
+To change the number of replicas for a static scaling strategy, use the `ovhai app scale` command with the `--replicas` parameter:
+
+```bash
+ovhai app scale <app-id> --replicas <number-of-replicas>
+```
+
+### Updating Autoscaling
+
+To change the autoscaling parameters, use the `ovhai app scale` command with the appropriate autoscaling parameters:
+
+```bash
+ovhai app scale <app-id> \
+  --auto-min-replicas <min-replicas> \
+  --auto-max-replicas <max-replicas> \
+  --auto-resource-type <CPU|RAM> \
+  --auto-resource-usage-threshold <threshold>
+```
 
 ## Scaling example
 
-We will use the following example:
+We will use the following example: an app based on the `AI1-1-CPU` flavor with a resource size of 2 (i.e. **2 CPUs**), which means that each replica of the application is entitled to **2 vCores** and **8GiB RAM**.
 
@@ -94,13 +191,11 @@ In this example, the app will be scaled up when the average RAM usage across all
 
 > [!primary]
 >
-> The **total deployment price** will be calculated using the minimum number of replicas.
->
+> The total deployment price for **autoscaling apps** is calculated based on the **minimum number of replicas**, but costs can **increase** as additional replicas are started during scaling.
 
-> [!warning]
->
-> The cost may increase as `Autoscaling` increases.
->
+## Conclusion
+
+Choosing the right scaling strategy is critical for balancing cost, performance, and reliability in your AI Deploy applications. Static scaling offers stability and predictability, while autoscaling provides flexibility for dynamic workloads.
 
 ## Feedback
 
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png
new file mode 100644
index 00000000000..c1b314575c4
Binary files /dev/null and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png differ
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-static-scaling.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-static-scaling.png
new file mode 100644
index 00000000000..fdc821c4982
Binary files /dev/null and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-static-scaling.png differ