From 37142820985a544f97a72534696e5c96e37070b6 Mon Sep 17 00:00:00 2001
From: Eric Harper <complex451@gmail.com>
Date: Wed, 12 Jun 2024 17:01:33 -0600
Subject: [PATCH] Update readme with mlperf news (#9457)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* remove link to image

Signed-off-by: eharper <eharper@nvidia.com>

* remove link to image

Signed-off-by: eharper <eharper@nvidia.com>

* fix formatting

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
---
 README.rst | 122 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 88 insertions(+), 34 deletions(-)
diff --git a/README.rst b/README.rst
index c4cbf759d975..ab3a4b6b06c9 100644
--- a/README.rst
+++ b/README.rst
@@ -45,58 +45,112 @@ Latest News
 
   <details open>
     <summary><b>Large Language Models and Multimodal</b></summary>
-        <details>
-          <summary><a href="https://cloud.google.com/blog/products/compute/gke-and-nvidia-nemo-framework-to-train-generative-ai-models">Accelerate your generative AI journey with NVIDIA NeMo Framework on GKE</a> (2024/03/16) </summary>
+      <details>
+        <summary>
+          <a href="https://developer.nvidia.com/blog/nvidia-sets-new-generative-ai-performance-and-scale-records-in-mlperf-training-v4-0/">
+            NVIDIA sets new generative AI performance and scale records in MLPerf Training v4.0
+          </a> (2024/06/12)
+        </summary>
+
+        Using NVIDIA NeMo Framework and NVIDIA Hopper GPUs, NVIDIA was able to scale to 11,616 H100 GPUs and achieve near-linear performance scaling on LLM pretraining.
+        NVIDIA also achieved the highest LLM fine-tuning performance and raised the bar for text-to-image training.
+        <br><br>
+      </details>
 
-          An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.
+      <details>
+          <summary>
+            <a href="https://cloud.google.com/blog/products/compute/gke-and-nvidia-nemo-framework-to-train-generative-ai-models">
+              Accelerate your generative AI journey with NVIDIA NeMo Framework on GKE
+            </a> (2024/03/16)
+          </summary>
+
+          An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. 
+          The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.
           <br><br>
         </details>
 
       <details>
-        <summary><a href="https://blogs.nvidia.com/blog/bria-builds-responsible-generative-ai-using-nemo-picasso/">Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso</a> (2024/03/06) </summary>
-
-        Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises now leverages the NVIDIA NeMo Framework. The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.
+        <summary>
+          <a href="https://blogs.nvidia.com/blog/bria-builds-responsible-generative-ai-using-nemo-picasso/">
+            Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso
+          </a> (2024/03/06)
+        </summary>
+
+        Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework.
+        The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. 
+        Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.
         <br><br>
-    </details>
-
-    <details>
-      <summary><a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/">New NVIDIA NeMo Framework Features and NVIDIA H200</a> (2023/12/06) </summary>
+      </details>
 
-      NVIDIA NeMo Framework now includes several optimizations and enhancements, including: 1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 2) Mix of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale, 3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
-      <br><br>
-      <a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility"><img src="https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png" alt="H200-NeMo-performance" style="width: 600px;"></a>
-      <br><br>
-    </details>
-
-    <details>
-      <summary><a href="https://blogs.nvidia.com/blog/nemo-amazon-titan/">NVIDIA now powers training for Amazon Titan Foundation models</a> (2023/11/28) </summary>
+      <details>
+        <summary>
+          <a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/">
+            New NVIDIA NeMo Framework Features and NVIDIA H200
+          </a> (2023/12/06)
+        </summary>
+
+        NVIDIA NeMo Framework now includes several optimizations and enhancements,
+        including:
+        1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models,
+        2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale,
+        3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference-stage acceleration, and
+        4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
+        <br><br>
+        <a href="https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility">
+        <img src="https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png" alt="H200-NeMo-performance" style="width: 600px;"></a>
+        <br><br>
+      </details>
 
-      NVIDIA NeMo Framework now empowers the Amazon Titan foundation models (FM) with efficient training of large language models (LLMs). The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.
-      <br><br>
-    </details>
+      <details>
+        <summary>
+          <a href="https://blogs.nvidia.com/blog/nemo-amazon-titan/">
+            NVIDIA now powers training for Amazon Titan Foundation models
+          </a> (2023/11/28)
+        </summary>
+
+        NVIDIA NeMo Framework now empowers the Amazon Titan foundation models (FMs) with efficient training of large language models (LLMs).
+        The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock. 
+        The NeMo Framework provides a versatile framework for building, customizing, and running LLMs.
+        <br><br>
+      </details>
 
   </details>
 
   <details open>
     <summary><b>Speech Recognition</b></summary>
-        <details>
-          <summary><a href="https://developer.nvidia.com/blog/new-standard-for-speech-recognition-and-translation-from-the-nvidia-nemo-canary-model/">New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model</a> (2024/04/18) </summary>
-
-          The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bi-directional translation, between English and the three other supported languages.
-          <br><br>
-        </details>
-
       <details>
-        <summary><a href="https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/">Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models</a> (2024/04/18) </summary>
+        <summary>
+          <a href="https://developer.nvidia.com/blog/new-standard-for-speech-recognition-and-translation-from-the-nvidia-nemo-canary-model/">
+            New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model
+          </a> (2024/04/18)
+        </summary>
+
+        The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. 
+        Canary also provides bidirectional translation between English and the three other supported languages.
+        <br><br>
+      </details>
 
-        NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy.
+      <details>
+        <summary>
+          <a href="https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/">
+            Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models
+          </a> (2024/04/18)
+        </summary>
+
+        NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet family of automatic speech recognition (ASR) models. 
+        These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy.
         <br><br>
-    </details>
+      </details>
 
     <details>
-      <summary><a href="https://developer.nvidia.com/blog/turbocharge-asr-accuracy-and-speed-with-nvidia-nemo-parakeet-tdt/">Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT</a> (2024/04/18) </summary>
-
-      NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released Parakeet-TDT. This new addition to the  NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed over the previously best model, Parakeet-RNNT-1.1B.
+      <summary>
+        <a href="https://developer.nvidia.com/blog/turbocharge-asr-accuracy-and-speed-with-nvidia-nemo-parakeet-tdt/">
+          Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT
+        </a> (2024/04/18)
+      </summary>
+
+      NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released Parakeet-TDT. 
+      This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed than the previous best model, Parakeet-RNNT-1.1B.
       <br><br>
     </details>