From e27a94f4c4e4a2a5641d3b17645c2332cfac36b4 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Thu, 13 Mar 2025 13:09:35 +0100 Subject: [PATCH 1/5] feat(gpu): add h100-sxm --- menu/navigation.json | 4 ++ .../gpu/how-to/use-nvidia-mig-technology.mdx | 2 +- .../use-scratch-storage-h100-instances.mdx | 10 ++++- .../choosing-gpu-instance-type.mdx | 22 +++++++++- .../gpu-instances-bandwidth-overview.mdx | 4 ++ .../understanding-nvidia-fp8.mdx | 4 +- .../understanding-nvidia-nvlink.mdx | 42 +++++++++++++++++++ pages/instances/faq.mdx | 4 ++ .../organization-quotas.mdx | 4 ++ 9 files changed, 89 insertions(+), 7 deletions(-) create mode 100644 pages/gpu/reference-content/understanding-nvidia-nvlink.mdx diff --git a/menu/navigation.json b/menu/navigation.json index e347a21d35..ae25de806b 100644 --- a/menu/navigation.json +++ b/menu/navigation.json @@ -1817,6 +1817,10 @@ { "label": "Understanding NVIDIA FP8 format", "slug": "understanding-nvidia-fp8" + }, + { + "label": "Understanding NVIDIA NVLink", + "slug": "understanding-nvidia-nvlink" } ], "label": "Additional Content", diff --git a/pages/gpu/how-to/use-nvidia-mig-technology.mdx b/pages/gpu/how-to/use-nvidia-mig-technology.mdx index 0e942d64d8..1194c7241c 100644 --- a/pages/gpu/how-to/use-nvidia-mig-technology.mdx +++ b/pages/gpu/how-to/use-nvidia-mig-technology.mdx @@ -79,7 +79,7 @@ By default, the MIG feature of NVIDIA GPUs is disabled. To use it with your GPU MIG is now enabled for the GPU Instance. ## How to list MIG Profiles -The NVIDIA driver provides several predefined profiles you can choose from while setting up the MIG (Multi-Instance GPU) feature on the H100. +The NVIDIA driver provides several predefined profiles you can choose from while setting up the MIG (Multi-Instance GPU) feature on the H100 and H100-SXM. These profiles determine the sizes and functionalities available of the MIG partitions that users can generate. Additionally, the driver supplies details regarding placements, which specify the types and quantities of Instances that can be established. diff --git a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx index 9331a36838..74743e9918 100644 --- a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx +++ b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx @@ -13,7 +13,7 @@ categories: - compute --- -Scaleway H100 and L40S GPU Instances are equipped with additional scratch storage. This form of temporary Local Storage operates differently from our regular Local Storage. +Scaleway H100, H100-SXM and L40S GPU Instances are equipped with additional scratch storage. This form of temporary Local Storage operates differently from our regular Local Storage. Scratch storage temporarily accommodates data during computational or data processing tasks. It is commonly used for storing intermediate results, processing input data, or holding output data before that data is moved to more permanent storage. @@ -41,10 +41,16 @@ This enhancement allows us to provide the GPU with a substantial amount of scrat * for L40S-8-48G Instances: 12.8 TB * for H100-1-80G Instances: 3 TB * for H100-2-80G Instances: 6 TB + * for H100-SXM-1-80G Instances: ~1.5 TB + * for H100-SXM-2-80G Instances: ~3 TB + * for H100-SXM-4-80G Instances: ~6 TB + * for H100-SXM-1-80G Instances: ~12 TB + + ## How can I add scratch storage to my GPU Instance using the Scaleway CLI or console? -Scratch storage is automatically added when creating H100 and L40S Instances. 
+Scratch storage is automatically added when creating H100, H100-SXM and L40S Instances.

## How can I add scratch storage to my GPU Instance when using the API?
You need to add an extra volume, for example:
diff --git a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx
index dc317df561..0c716220ed 100644
--- a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx
+++ b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx
@@ -34,9 +34,9 @@ Below, you will find a guide to help you make an informed decision:
* **Scaling:** Consider the scalability requirements of your workload. The most efficient way to scale up your workload is by using:
    * Bigger GPU
    * Up to 2 PCIe GPU with [H100 Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) or 8 PCIe GPU with [L4](https://www.scaleway.com/en/l4-gpu-instance/) or [L4OS](https://www.scaleway.com/en/contact-l40s/) Instances.
-    * An HGX-based server setup with 8x NVlink GPUs
+    * An HGX-based server setup with up to 8x NVlink GPUs with [H100-SXM Instances]()
    * A [supercomputer architecture](https://www.scaleway.com/en/ai-supercomputers/) for a larger setup for workload-intensive tasks
-  * Another way to scale your workload is to use [Kubernetes and MIG](/gpu/how-to/use-nvidia-mig-technology/): You can divide a single H100 GPU into as many as 7 MIG partitions. This means that instead of employing seven P100 GPUs to set up seven K8S pods, you could opt for a single H100 GPU with MIG to effectively deploy all seven K8S pods.
+  * Another way to scale your workload is to use [Kubernetes and MIG](/gpu/how-to/use-nvidia-mig-technology/): You can divide a single H100 or H100-SXM GPU into as many as 7 MIG partitions. This means that instead of employing seven P100 GPUs to set up seven K8S pods, you could opt for a single H100 GPU with MIG to effectively deploy all seven K8S pods.
* **Online resources:** Check for online resources, forums, and community discussions related to the specific GPU type you are considering. This can provide insights into common issues, best practices, and optimizations.

Remember that there is no one-size-fits-all answer, and the right GPU Instance type will depend on your workload’s unique requirements and budget. It is important that you regularly reassess your choice as your workload evolves. Depending on which type best fits your evolving tasks, you can easily migrate from one GPU Instance type to another.
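The Kubernetes and MIG route described above can be checked from inside a GPU Instance before you build on it. The following is a minimal sketch using NVIDIA's NVML Python bindings; the `nvidia-ml-py` package is an assumption for this example and is not installed by default:

```python
# Minimal sketch: report whether MIG is enabled on the first GPU of the Instance.
# Assumes the NVML Python bindings are available (`pip install nvidia-ml-py`).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    current, pending = pynvml.nvmlDeviceGetMigMode(handle)
    print("MIG currently enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)
    print("MIG enabled after next reset:", pending == pynvml.NVML_DEVICE_MIG_ENABLE)
except pynvml.NVMLError:
    # GPUs without MIG support (for example RENDER or L4 Instances) raise here.
    print("This GPU does not support MIG")
pynvml.nvmlShutdown()
```

If MIG is reported as disabled, it can be turned on by following the MIG how-to linked above before creating partitions.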
@@ -62,6 +62,24 @@ Remember that there is no one-size-fits-all answer, and the right GPU Instance t | Better used for | Image / Video encoding (4K) | 7B LLM Fine-Tuning / Inference | 70B LLM Fine-Tuning / Inference | | What they are not made for | Large models (especially LLM) | Graphic or video encoding use cases | Graphic or video encoding use cases | +| | **[H100-SXM-1-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-2-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-4-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-80G](https://www.scaleway.com/en/TBD/)** | +|--------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------| +| GPU Type | 1x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 2x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 4x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 8x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | +| NVIDIA architecture | Hopper 2022 | Hopper 2022 | Hopper 2022 | Hopper 2022 | +| Tensor Cores | Yes | Yes | Yes | Yes | +| Performance (training in FP16 Tensor Cores) | 1x 1513 TFLOPS | 2x 1513 TFLOPS | 4x 1513 TFLOPS | 8x 1513 TFLOPS | +| VRAM | 1x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 2x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 4x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 8x 80 GB HBM2E (Memory bandwidth: 2TB/s) | +| CPU Type | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | +| vCPUs | 16 | 32 | 64 | 128 | +| RAM | 120 GB DDR5 | 240 GB DDR5 | 480 GB DDR5 | 960 GB DDR5 | +| Storage | Boot on Block 5K | Boot on Block 5K | Boot on Block 5K | Boot on Block 5K | +| [Scratch Storage](/gpu/how-to/use-scratch-storage-h100-instances/) | Yes (~1.5 TB) | Yes (~3 TB) | Yes (~6 TB) | Yes (~12 TB) | +| [MIG compatibility](/gpu/how-to/use-nvidia-mig-technology/) | Yes | Yes | Yes | Yes | +| Bandwidth | 10 Gbps | 20 Gbps | 20 Gbps | 20 Gbps | +| Network technology | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | +| Better used for | *To be defined* | *To be defined* | *To be defined* | *To be defined* | +| What they are not made for | *To be defined* | *To be defined* | *To be defined* | *To be defined* | + | | **[L4-1-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-2-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-4-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-8-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | |---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | GPU Type | 1x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 2x 
[L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 4x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 8x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 |
diff --git a/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx b/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx
index f1ca1d8402..787037415e 100644
--- a/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx
+++ b/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx
@@ -35,6 +35,10 @@ GPU workloads often involve processing large datasets, requiring high-bandwidth
| Instance Type | Internet Bandwidth | Block Bandwidth |
|-------------------|-------------------------|---------------------|
+| H100-SXM-1-80G | 10 Gbit/s | 5 GiB/s |
+| H100-SXM-2-80G | 20 Gbit/s | 5 GiB/s |
+| H100-SXM-4-80G | 20 Gbit/s | 5 GiB/s |
+| H100-SXM-8-80G | 20 Gbit/s | 5 GiB/s |
| H100-1-80G | 10 Gbit/s | 2 GiB/s |
| H100-2-80G | 20 Gbit/s | 4 GiB/s |
| L40S-1-48G | 2.5 Gbit/s | 1 GiB/s |
diff --git a/pages/gpu/reference-content/understanding-nvidia-fp8.mdx b/pages/gpu/reference-content/understanding-nvidia-fp8.mdx
index a1f209822d..8d88cba22f 100644
--- a/pages/gpu/reference-content/understanding-nvidia-fp8.mdx
+++ b/pages/gpu/reference-content/understanding-nvidia-fp8.mdx
@@ -7,13 +7,13 @@ content:
  paragraph: This section provides information about NVIDIA FP8 (8-bit floating point) format
tags: NVIDIA FP8 GPU cloud
dates:
-  validation: 2024-10-14
+  validation: 2025-03-13
  posted: 2023-10-23
categories:
  - compute
---

-Scaleway offers GPU Instances featuring [L4, L40S and H100 GPUs](https://www.scaleway.com/en/h100-pcie-try-it-now/) that support FP8 (8-bit floating point), a revolutionary datatype introduced by NVIDIA. It enables higher throughput of matrix multipliers and convolutions.
+Scaleway offers GPU Instances featuring [L4, L40S, H100 and H100-SXM GPUs](/gpu/reference-content/choosing-gpu-instance-type/) that support FP8 (8-bit floating point), a revolutionary datatype introduced by NVIDIA. It enables higher throughput of matrix multipliers and convolutions.

FP8 is an 8-bit floating point standard which was jointly developed by NVIDIA, ARM, and Intel to speed up AI development by improving memory efficiency during AI training and inference processes.
diff --git a/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx b/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
new file mode 100644
index 0000000000..225d5d741e
--- /dev/null
+++ b/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
@@ -0,0 +1,42 @@
+---
+meta:
+  title: Understanding NVIDIA NVLink
+  description: This section provides information about NVIDIA NVLink
+content:
+  h1: Understanding NVIDIA NVLink
+  paragraph: This section provides information about NVIDIA NVLink
+tags: NVIDIA NVLink
+dates:
+  validation: 2025-03-13
+  posted: 2025-03-13
+categories:
+  - compute
+---
+
+NVLink is NVIDIA's high-bandwidth, low-latency GPU-to-GPU interconnect with built-in resiliency features, available on Scaleway's [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table). It was designed to significantly improve the performance and efficiency when connecting GPUs, CPUs, and other components within the same node.
+It provides much higher bandwidth (up to 900 GB/s total GPU-to-GPU bandwidth in an 8-GPU configuration) and lower latency compared to traditional PCIe Gen 4 (up to 32 GB/s per link).
+This allows more data to be transferred between GPUs in less time while also reducing latency.
+
+The high bandwidth and low latency make NVLink ideal for applications that require real-time data synchronization and processing, such as AI and HPC use-case scenarios.
+NVLink provides up to 900 GB/s total bandwidth for multi-GPU I/O and shared memory accesses, which is 7x the bandwidth of PCIe Gen 5.
+NVLink allows direct GPU-to-GPU interconnection, improving data transfer efficiency and reducing the need for CPU intervention, which can introduce bottlenecks.
+
+NVLink supports the connection of multiple GPUs, enabling the creation of powerful multi-GPU systems capable of handling more complex and demanding workloads.
+With Unified Memory Access, NVLink enables direct memory access between GPUs without CPU mediation, enhancing efficiency in large-scale AI and HPC workloads.
+
+### Comparison: NVLink vs. PCIe
+NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is specifically designed to address the bandwidth and latency bottlenecks of PCIe in multi-GPU setups.
+
+| Feature | NVLink 4.0 (H100-SXM) | PCIe 5.0 |
+|-------------------|---------------------------|------------------------------------|
+| **Use case** | High-performance computing, deep learning | General-purpose computing, graphics |
+| **Bandwidth** | Up to 900 GB/s (aggregate, multi-GPU) | 128 GB/s (x16 bidirectional) |
+| **Latency** | Lower than PCIe (sub-microsecond) | Higher compared to NVLink |
+| **Communication** | Direct GPU-to-GPU | Through CPU or PCIe switch |
+| **Memory sharing** | Unified memory space across GPUs | Requires CPU intervention (higher overhead) |
+| **Scalability** | Multi-GPU direct connection via NVSwitch | Limited by PCIe lanes |
+| **Efficiency** | Optimized for GPU workloads | More general-purpose |
+
+**Unified Memory Access** allows GPUs to access each other's memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads.
+
+In summary, NVLink, available on [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table), is **superior** for **multi-GPU AI and HPC** workloads due to its **higher bandwidth, lower latency, and memory-sharing capabilities**, while **PCIe remains essential** for broader system connectivity and general computing.
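To confirm the direct GPU-to-GPU path described on this page from inside a multi-GPU H100-SXM Instance, a minimal PyTorch sketch such as the one below can be used. PyTorch with CUDA support is assumed to be installed; it is not part of this page:

```python
# Minimal sketch: check which GPU pairs on the Instance can address each other
# directly (peer-to-peer), which is the access pattern NVLink accelerates.
import torch

n = torch.cuda.device_count()
print(f"{n} visible GPUs")
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'possible' if ok else 'not possible'}")
```

On NVLink-connected GPUs, each pair should normally report peer access as possible; when it is not, GPU-to-GPU transfers typically have to be staged through host memory instead.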
diff --git a/pages/instances/faq.mdx b/pages/instances/faq.mdx index 98b358b05f..beb7415ebe 100644 --- a/pages/instances/faq.mdx +++ b/pages/instances/faq.mdx @@ -151,6 +151,10 @@ You can change the storage type and flexible IP after the Instance creation, whi | Range | Available in | Price | |-------------------|------------------------|-------------------| +| H100-SXM-1-80G | TBD | €X.XX/hour¹ | +| H100-SXM-2-80G | TBD | €X.XX/hour¹ | +| H100-SXM-4-80G | TBD | €X.XX/hour¹ | +| H100-SXM-8-80G | TBD | €X.XX/hour¹ | | H100-1-80G | PAR2, WAW2 | €2.52/hour¹ | | H100-2-80G | PAR2, WAW2 | €5.04/hour¹ | | L40S-1-48G | PAR2 | €1.40/hour¹ | diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx index f61a701229..1c722e9392 100644 --- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx +++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx @@ -138,6 +138,10 @@ At Scaleway, quotas are applicable per [Organization](/iam/concepts/#organizatio | GPU 3070 - S| To use this product, you must [validate your identity](/account/how-to/verify-identity/). | 1 | | H100-1-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | | H100-2-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | +| H100-SXM-1-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | +| H100-SXM-2-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | +| H100-SXM-4-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | +| H100-SXM-8-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | | L4-1-24G | To use this product, you must [validate your identity](/account/how-to/verify-identity/). | 1 | | L4-2-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | | L4-4-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). 
|
From d715295f31ba1ebe29101c2dbbb88c8ab760e713 Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Thu, 13 Mar 2025 13:32:39 +0100
Subject: [PATCH 2/5] feat(gpu): update wording

---
 pages/account/reference-content/products-availability.mdx | 1 +
 .../gpu/reference-content/understanding-nvidia-nvlink.mdx | 6 ++----
 pages/instances/faq.mdx | 8 ++++----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/pages/account/reference-content/products-availability.mdx b/pages/account/reference-content/products-availability.mdx
index 10fd4e844c..f29436c8e4 100644
--- a/pages/account/reference-content/products-availability.mdx
+++ b/pages/account/reference-content/products-availability.mdx
@@ -24,6 +24,7 @@ Scaleway products are available in multiple regions and locations worldwide. Thi
| Product Category | Product | Paris region | Amsterdam region | Warsaw region |
|---------------------------|---------------------------------------|------------------------|-------------------------|------------------------|
| **Compute** | Instances | PAR1, PAR2, PAR3 | AMS1, AMS2, AMS3 | WAW1, WAW2, WAW3 |
+| | GPU H100-SXM-X-80G | PAR2 | Not available yet | Not available yet |
| | GPU H100-X-80G | PAR2 | Not available yet | WAW2 |
| | GPU L40S-X-48G | PAR2 | Not available yet | WAW2 |
| | GPU L4-X-24G | PAR2 | Not available yet | WAW2 |
diff --git a/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx b/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
index 225d5d741e..358d333768 100644
--- a/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
+++ b/pages/gpu/reference-content/understanding-nvidia-nvlink.mdx
@@ -22,7 +22,7 @@ NVLink provides up to 900 GB/s total bandwidth for multi-GPU I/O and shared memo
NVLink allows direct GPU-to-GPU interconnection, improving data transfer efficiency and reducing the need for CPU intervention, which can introduce bottlenecks.

NVLink supports the connection of multiple GPUs, enabling the creation of powerful multi-GPU systems capable of handling more complex and demanding workloads.
-With Unified Memory Access, NVLink enables direct memory access between GPUs without CPU mediation, enhancing efficiency in large-scale AI and HPC workloads.
+Unified Memory Access allows GPUs to access each other's memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads.

### Comparison: NVLink vs. PCIe
NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is specifically designed to address the bandwidth and latency bottlenecks of PCIe in multi-GPU setups.
@@ -37,6 +37,4 @@ NVLink and PCI Express (PCIe) are both used for GPU communication, but NVLink is
| **Scalability** | Multi-GPU direct connection via NVSwitch | Limited by PCIe lanes |
| **Efficiency** | Optimized for GPU workloads | More general-purpose |

-**Unified Memory Access** allows GPUs to access each other's memory directly without CPU mediation, which is particularly beneficial for large-scale AI and HPC workloads.
-
-In summary, NVLink, available on [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table), is **superior** for **multi-GPU AI and HPC** workloads due to its **higher bandwidth, lower latency, and memory-sharing capabilities**, while **PCIe remains essential** for broader system connectivity and general computing.
+In summary, NVLink, available on [H100-SXM Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table), is **superior** for **multi-GPU AI and HPC** workloads due to its **higher bandwidth, lower latency, and memory-sharing capabilities**, while PCIe remains essential for broader system connectivity and general computing.
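A rough way to observe the effect of this direct memory path is to time a device-to-device copy between two GPUs on the same Instance. The sketch below assumes PyTorch with CUDA and at least two visible GPUs; the figures it prints are illustrative only and depend on the Instance, driver, and copy size:

```python
# Rough sketch: time a direct GPU-to-GPU copy of ~1 GiB of data.
# The first copy also pays one-off initialization costs, so treat the result
# as an order-of-magnitude indication rather than a benchmark.
import time
import torch

src = torch.randn(256, 1024, 1024, device="cuda:0")  # ~1 GiB of FP32 data
torch.cuda.synchronize("cuda:0")

start = time.time()
dst = src.to("cuda:1")                                # device-to-device copy
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
elapsed = time.time() - start

gib = src.numel() * src.element_size() / 2**30
print(f"Copied {gib:.1f} GiB in {elapsed:.3f} s ({gib / elapsed:.1f} GiB/s)")
```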
+In summary, NVLink, available on [H100-SGX Instances](/gpu/reference-content/choosing-gpu-instance-type/#gpu-instances-and-ai-supercomputer-comparison-table), is **superior** for **multi-GPU AI and HPC** workloads due to its **higher bandwidth, lower latency, and memory-sharing capabilities**, while PCIe remains essential for broader system connectivity and general computing. diff --git a/pages/instances/faq.mdx b/pages/instances/faq.mdx index beb7415ebe..c9f267f510 100644 --- a/pages/instances/faq.mdx +++ b/pages/instances/faq.mdx @@ -151,10 +151,10 @@ You can change the storage type and flexible IP after the Instance creation, whi | Range | Available in | Price | |-------------------|------------------------|-------------------| -| H100-SXM-1-80G | TBD | €X.XX/hour¹ | -| H100-SXM-2-80G | TBD | €X.XX/hour¹ | -| H100-SXM-4-80G | TBD | €X.XX/hour¹ | -| H100-SXM-8-80G | TBD | €X.XX/hour¹ | +| H100-SXM-1-80G | PAR2 | €X.XX/hour¹ | +| H100-SXM-2-80G | PAR2 | €X.XX/hour¹ | +| H100-SXM-4-80G | PAR2 | €X.XX/hour¹ | +| H100-SXM-8-80G | PAR2 | €X.XX/hour¹ | | H100-1-80G | PAR2, WAW2 | €2.52/hour¹ | | H100-2-80G | PAR2, WAW2 | €5.04/hour¹ | | L40S-1-48G | PAR2 | €1.40/hour¹ | From ee27dda4addbe15d738543f790e095f101a12dcd Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Fri, 18 Apr 2025 17:39:03 +0200 Subject: [PATCH 3/5] feat(gpu): update docs --- .../use-scratch-storage-h100-instances.mdx | 1 - .../choosing-gpu-instance-type.mdx | 38 +++++++++---------- .../gpu-instances-bandwidth-overview.mdx | 1 - pages/instances/faq.mdx | 7 ++-- 4 files changed, 22 insertions(+), 25 deletions(-) diff --git a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx index 74743e9918..7cfccbcdef 100644 --- a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx +++ b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx @@ -41,7 +41,6 @@ This enhancement allows us to provide the GPU with a substantial amount of scrat * for L40S-8-48G Instances: 12.8 TB * for H100-1-80G Instances: 3 TB * for H100-2-80G Instances: 6 TB - * for H100-SXM-1-80G Instances: ~1.5 TB * for H100-SXM-2-80G Instances: ~3 TB * for H100-SXM-4-80G Instances: ~6 TB * for H100-SXM-1-80G Instances: ~12 TB diff --git a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx index 0c716220ed..a270f0e496 100644 --- a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx +++ b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx @@ -22,7 +22,7 @@ It empowers European AI startups, giving them the tools (without the need for a ## How to choose the right GPU Instance type -Scaleway provides a range of GPU Instance offers, from [GPU RENDER Instances](https://www.scaleway.com/en/gpu-render-instances/) and [H100 PCIe Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) to [custom build clusters](https://www.scaleway.com/en/ai-supercomputers/). There are several factors to consider when choosing the right GPU Instance type to ensure that it meets your performance, budget, and scalability requirements. +Scaleway provides a range of GPU Instance offers, from [GPU RENDER Instances](https://www.scaleway.com/en/gpu-render-instances/) and [H100 SXM Instances](https://www.scaleway.com/en/gpu-instances/) to [custom build clusters](https://www.scaleway.com/en/ai-supercomputers/). 
There are several factors to consider when choosing the right GPU Instance type to ensure that it meets your performance, budget, and scalability requirements. Below, you will find a guide to help you make an informed decision: * **Workload requirements:** Identify the nature of your workload. Are you running machine learning, deep learning, high-performance computing (HPC), data analytics, or graphics-intensive applications? Different Instance types are optimized for different types of workloads. For example, the H100 is not designed for graphics rendering. However, other models are. As [stated by Tim Dettmers](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/), “Tensor Cores are most important, followed by the memory bandwidth of a GPU, the cache hierarchy, and only then FLOPS of a GPU.”. For more information, refer to the [NVIDIA GPU portfolio](https://docs.nvidia.com/data-center-gpu/line-card.pdf). @@ -34,7 +34,7 @@ Below, you will find a guide to help you make an informed decision: * **Scaling:** Consider the scalability requirements of your workload. The most efficient way to scale up your workload is by using: * Bigger GPU * Up to 2 PCIe GPU with [H100 Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) or 8 PCIe GPU with [L4](https://www.scaleway.com/en/l4-gpu-instance/) or [L4OS](https://www.scaleway.com/en/contact-l40s/) Instances. - * An HGX-based server setup with up to 8x NVlink GPUs with [H100-SXM Instances]() + * Or better, an HGX-based server setup with up to 8x NVlink GPUs with [H100-SXM Instances](https://www.scaleway.com/en/gpu-instances/) * A [supercomputer architecture](https://www.scaleway.com/en/ai-supercomputers/) for a larger setup for workload-intensive tasks * Another way to scale your workload is to use [Kubernetes and MIG](/gpu/how-to/use-nvidia-mig-technology/): You can divide a single H100 or H100-SXM GPU into as many as 7 MIG partitions. This means that instead of employing seven P100 GPUs to set up seven K8S pods, you could opt for a single H100 GPU with MIG to effectively deploy all seven K8S pods. * **Online resources:** Check for online resources, forums, and community discussions related to the specific GPU type you are considering. This can provide insights into common issues, best practices, and optimizations. 
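When in doubt about what a given Instance type actually exposes, it can help to list the GPUs and their memory from inside the Instance and compare the output with the tables that follow. A minimal sketch, assuming PyTorch with CUDA is installed:

```python
# Minimal sketch: enumerate the GPUs visible on the Instance.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB VRAM, "
          f"compute capability {props.major}.{props.minor}")
```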
@@ -62,23 +62,23 @@ Remember that there is no one-size-fits-all answer, and the right GPU Instance t | Better used for | Image / Video encoding (4K) | 7B LLM Fine-Tuning / Inference | 70B LLM Fine-Tuning / Inference | | What they are not made for | Large models (especially LLM) | Graphic or video encoding use cases | Graphic or video encoding use cases | -| | **[H100-SXM-1-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-2-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-4-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-80G](https://www.scaleway.com/en/TBD/)** | -|--------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------| -| GPU Type | 1x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 2x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 4x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 8x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | -| NVIDIA architecture | Hopper 2022 | Hopper 2022 | Hopper 2022 | Hopper 2022 | -| Tensor Cores | Yes | Yes | Yes | Yes | -| Performance (training in FP16 Tensor Cores) | 1x 1513 TFLOPS | 2x 1513 TFLOPS | 4x 1513 TFLOPS | 8x 1513 TFLOPS | -| VRAM | 1x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 2x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 4x 80 GB HBM2E (Memory bandwidth: 2TB/s) | 8x 80 GB HBM2E (Memory bandwidth: 2TB/s) | -| CPU Type | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | -| vCPUs | 16 | 32 | 64 | 128 | -| RAM | 120 GB DDR5 | 240 GB DDR5 | 480 GB DDR5 | 960 GB DDR5 | -| Storage | Boot on Block 5K | Boot on Block 5K | Boot on Block 5K | Boot on Block 5K | -| [Scratch Storage](/gpu/how-to/use-scratch-storage-h100-instances/) | Yes (~1.5 TB) | Yes (~3 TB) | Yes (~6 TB) | Yes (~12 TB) | -| [MIG compatibility](/gpu/how-to/use-nvidia-mig-technology/) | Yes | Yes | Yes | Yes | -| Bandwidth | 10 Gbps | 20 Gbps | 20 Gbps | 20 Gbps | -| Network technology | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | -| Better used for | *To be defined* | *To be defined* | *To be defined* | *To be defined* | -| What they are not made for | *To be defined* | *To be defined* | *To be defined* | *To be defined* | +| | **[H100-SXM-2-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-4-80G](https://www.scaleway.com/en/TBD/)** | **[H100-SXM-80G](https://www.scaleway.com/en/TBD/)** | +|--------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------| +| GPU Type | 2x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 4x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | 8x [H100-SXM](https://www.nvidia.com/en-us/data-center/h100/) SXM | +| NVIDIA architecture | Hopper 2022 | Hopper 2022 | Hopper 2022 | +| Tensor Cores | Yes | Yes | Yes | +| Performance (training in FP16 Tensor Cores) | 2x 1979 TFLOPS 
| 4x 1979 TFLOPS | 8x 1979 TFLOPS | +| VRAM | 2x 80 GB HBM3 (Memory bandwidth: 3.35TB/s) | 4x 80 GB HBM3 (Memory bandwidth: 3.35TB/s) | 8x 80 GB HBM3 (Memory bandwidth: 3.35TB/s) | +| CPU Type | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | Xeon Platinum 8452Y (2.0 GHz) | +| vCPUs | 32 | 64 | 128 | +| RAM | 240 GB DDR5 | 480 GB DDR5 | 960 GB DDR5 | +| Storage | Boot on Block 5K | Boot on Block 5K | Boot on Block 5K | +| [Scratch Storage](/gpu/how-to/use-scratch-storage-h100-instances/) | Yes (~3 TB) | Yes (~6 TB) | Yes (~12 TB) | +| [MIG compatibility](/gpu/how-to/use-nvidia-mig-technology/) | Yes | Yes | Yes | +| Bandwidth | 20 Gbps | 20 Gbps | 20 Gbps | +| Network technology | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | +| Better used for | LLM fine-tuning, LLM inference with lower quantization and/or larger parameter counts, fast computer vision training model training | LLM fine-tuning, LLM inference with lower quantization and/or larger parameter counts, fast computer vision training model training | Llama 4 or Deepseek R1 inference | +| What they are not made for | Training of LLM (single node), Graphic or video encoding use cases | Training of LLM (single node), Graphic or video encoding use cases | Training of LLM (single node), Graphic or video encoding use cases | | | **[L4-1-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-2-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-4-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-8-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | |---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| diff --git a/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx b/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx index 787037415e..e6e7a4774c 100644 --- a/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx +++ b/pages/gpu/reference-content/gpu-instances-bandwidth-overview.mdx @@ -35,7 +35,6 @@ GPU workloads often involve processing large datasets, requiring high-bandwidth | Instance Type | Internet Bandwidth | Block Bandwidth | |-------------------|-------------------------|---------------------| -| H100-SXM-1-80G | 10 Gbit/s | 5 GiB/s | | H100-SXM-2-80G | 20 Gbit/s | 5 GiB/s | | H100-SXM-4-80G | 20 Gbit/s | 5 GiB/s | | H100-SXM-8-80G | 20 Gbit/s | 5 GiB/s | diff --git a/pages/instances/faq.mdx b/pages/instances/faq.mdx index c9f267f510..af5a5c4a42 100644 --- a/pages/instances/faq.mdx +++ b/pages/instances/faq.mdx @@ -151,10 +151,9 @@ You can change the storage type and flexible IP after the Instance creation, whi | Range | Available in | Price | |-------------------|------------------------|-------------------| -| H100-SXM-1-80G | PAR2 | €X.XX/hour¹ | -| H100-SXM-2-80G | PAR2 | €X.XX/hour¹ | -| H100-SXM-4-80G | PAR2 | €X.XX/hour¹ | -| H100-SXM-8-80G | PAR2 | €X.XX/hour¹ | +| H100-SXM-2-80G | PAR2 | €6.018/hour¹ | +| H100-SXM-4-80G | PAR2 | €11.61/hour¹ | +| 
H100-SXM-8-80G | PAR2 | €23.028/hour¹ |
| H100-1-80G | PAR2, WAW2 | €2.52/hour¹ |
| H100-2-80G | PAR2, WAW2 | €5.04/hour¹ |
| L40S-1-48G | PAR2 | €1.40/hour¹ |

From a2173b9270673cdfe5666bd51d9d8ec7ccb36947 Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Wed, 23 Apr 2025 17:12:30 +0200
Subject: [PATCH 4/5] Apply suggestions from code review

Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com>
---
 pages/gpu/how-to/use-scratch-storage-h100-instances.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx
index 7cfccbcdef..a029d6b00a 100644
--- a/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx
+++ b/pages/gpu/how-to/use-scratch-storage-h100-instances.mdx
@@ -13,7 +13,7 @@ categories:
  - compute
---

-Scaleway H100, H100-SXM and L40S GPU Instances are equipped with additional scratch storage. This form of temporary Local Storage operates differently from our regular Local Storage.
+Scaleway H100, H100-SXM, and L40S GPU Instances are equipped with additional scratch storage. This form of temporary Local Storage operates differently from our regular Local Storage.

Scratch storage temporarily accommodates data during computational or data processing tasks. It is commonly used for storing intermediate results, processing input data, or holding output data before that data is moved to more permanent storage.

@@ -49,7 +49,7 @@ This enhancement allows us to provide the GPU with a substantial amount of scrat

## How can I add scratch storage to my GPU Instance using the Scaleway CLI or console?

-Scratch storage is automatically added when creating H100, H100-SXM and L40S Instances.
+Scratch storage is automatically added when creating H100, H100-SXM, and L40S Instances.

## How can I add scratch storage to my GPU Instance when using the API?
You need to add an extra volume, for example: From 849db9738087e4063111ac4c911ac056dd1bb4a0 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Wed, 23 Apr 2025 17:15:44 +0200 Subject: [PATCH 5/5] Apply suggestions from code review Co-authored-by: Jessica <113192637+jcirinosclwy@users.noreply.github.com> --- pages/gpu/reference-content/choosing-gpu-instance-type.mdx | 2 +- pages/gpu/reference-content/understanding-nvidia-fp8.mdx | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx index a270f0e496..f1c604f2bf 100644 --- a/pages/gpu/reference-content/choosing-gpu-instance-type.mdx +++ b/pages/gpu/reference-content/choosing-gpu-instance-type.mdx @@ -78,7 +78,7 @@ Remember that there is no one-size-fits-all answer, and the right GPU Instance t | Bandwidth | 20 Gbps | 20 Gbps | 20 Gbps | | Network technology | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | [NVLink](/gpu/reference-content/understanding-nvidia-nvlink/) | | Better used for | LLM fine-tuning, LLM inference with lower quantization and/or larger parameter counts, fast computer vision training model training | LLM fine-tuning, LLM inference with lower quantization and/or larger parameter counts, fast computer vision training model training | Llama 4 or Deepseek R1 inference | -| What they are not made for | Training of LLM (single node), Graphic or video encoding use cases | Training of LLM (single node), Graphic or video encoding use cases | Training of LLM (single node), Graphic or video encoding use cases | +| What they are not made for | Training of LLM (single node), graphic or video encoding use cases | Training of LLM (single node), Graphic or video encoding use cases | Training of LLM (single node), graphic or video encoding use cases | | | **[L4-1-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-2-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-4-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | **[L4-8-24G](https://www.scaleway.com/en/l4-gpu-instance/)** | |---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| diff --git a/pages/gpu/reference-content/understanding-nvidia-fp8.mdx b/pages/gpu/reference-content/understanding-nvidia-fp8.mdx index 8d88cba22f..5eefd32736 100644 --- a/pages/gpu/reference-content/understanding-nvidia-fp8.mdx +++ b/pages/gpu/reference-content/understanding-nvidia-fp8.mdx @@ -13,7 +13,7 @@ categories: - compute --- -Scaleway offers GPU Instances featuring [L4, L40S, H100 and H100-SXM GPUs](/gpu/reference-content/choosing-gpu-instance-type/) that support FP8 (8-bit floating point), a revolutionary datatype introduced by NVIDIA. It enables higher throughput of matrix multipliers and convolutions. +Scaleway offers GPU Instances featuring [L4, L40S, H100, and H100-SXM GPUs](/gpu/reference-content/choosing-gpu-instance-type/) that support FP8 (8-bit floating point), a revolutionary datatype introduced by NVIDIA. 
It enables higher throughput of matrix multipliers and convolutions. FP8 is an 8-bit floating point standard which was jointly developed by NVIDIA, ARM, and Intel to speed up AI development by improving memory efficiency during AI training and inference processes.
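As an illustration of how FP8 is typically used in practice, the sketch below runs a single linear layer under FP8 autocast with NVIDIA's Transformer Engine. The `transformer-engine` package, the layer sizes, and the scaling recipe are assumptions made for the example, not recommendations:

```python
# Sketch: one forward/backward pass of a linear layer in FP8 on an FP8-capable GPU.
# Assumes PyTorch and the transformer-engine package are installed.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)

loss = out.float().sum()
loss.backward()
```

Outside the `fp8_autocast` context the same module runs in the usual higher-precision mode, which makes it straightforward to compare FP8 and non-FP8 behavior on the same model.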