diff --git a/chapters/introduction/index.mdx b/chapters/introduction/index.mdx index e73570f..afd48fc 100644 --- a/chapters/introduction/index.mdx +++ b/chapters/introduction/index.mdx @@ -10,11 +10,13 @@ import IntegrationsCards from '/snippets/integrations-cards.mdx'; className="block dark:hidden rounded-lg" noZoom src="/assets/dark-banner.png" + alt="Gladia documentation hero banner in the light theme." /> Gladia documentation hero banner in the dark theme. @@ -61,4 +63,4 @@ in both Real-time and asynchronous ways, with audio intelligence tools to extrac # Our integration partners ------ \ No newline at end of file +----- diff --git a/chapters/pre-recorded-stt/benchmarking.mdx b/chapters/pre-recorded-stt/benchmarking.mdx new file mode 100644 index 0000000..fe42614 --- /dev/null +++ b/chapters/pre-recorded-stt/benchmarking.mdx @@ -0,0 +1,166 @@ +--- +title: Benchmarking +description: A practical guide to benchmarking speech-to-text accuracy — from defining goals to choosing datasets, normalizing transcripts, computing WER, and interpreting results. +--- + +Benchmarking speech-to-text systems is easy to get wrong. +Small methodology changes can produce large swings in reported quality, which makes comparisons misleading. + +## Benchmarking at a glance + + + + Decide what "good" means for your product before comparing systems. + + + Benchmark on audio that matches your real traffic and target users. + + + Normalize both references and hypothesis outputs before computing WER. + + + Measure substitutions, deletions, and insertions on normalized text. + + + Look beyond one average score and inspect meaningful slices. + + + +## 1. Define your evaluation goal + +Before comparing providers and models, the first step is to define which aspects of performance matter most for your use case. 
+
+Below are **examples of performance aspects that carry more weight in specific domain applications of speech-to-text**:
+
+- Accuracy on noisy backgrounds: for contact centers, telephony, and field recordings.
+- Speaker diarization quality: for meeting assistants and multi-speaker calls.
+- Named entity accuracy: for workflows that extract people, organizations, phone numbers, or addresses.
+- Domain-specific vocabulary handling: for medical, legal, or financial transcription.
+- Timestamp accuracy: for media workflows that need readable, well-timed captions.
+- Filler-word handling: for agentic workflows.
+
+Those choices shape every downstream decision: which dataset to use, which normalization rules to apply, and which metrics to report.
+
+
+If your benchmark does not reflect your real traffic, the result will not tell you much about production performance.
+
+## 2. Choose the right dataset
+
+The right dataset depends on the use case and traffic shape you want to measure.
+You wouldn't want to benchmark a call-center product with clean podcast recordings, for example.
+
+So pick audio that matches your real traffic along these axes:
+
+- Language: target language(s), accents, code-switching frequency.
+- Audio quality: noisy field recordings, telephony, studio, or browser microphone.
+- Topics and domain: medical, financial, operational, legal, etc.
+- Typical words that matter: numbers, proper nouns, acronyms, domain-specific terms.
+- Interaction pattern: single-speaker dictation, dialogue, multi-speaker meetings, or long-form recordings.
+
+Use transcripts that are strong enough to serve as ground truth, and prefer a mix of:
+ - public datasets (for comparability and immediate availability)
+ - private in-domain datasets, when available, to ensure no data is "spoiled" by speech-to-text providers training their models on the very datasets you're benchmarking.
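One lightweight way to keep these axes explicit during evaluation is a tagged benchmark manifest. A minimal sketch in Python — all file paths and tag values below are hypothetical, not part of any Gladia API:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkSample:
    """One audio file in the benchmark, tagged along the axes above."""
    audio_path: str       # path to the audio file
    reference_path: str   # path to the ground-truth transcript
    language: str         # e.g. "en", "fr"
    audio_condition: str  # e.g. "telephony", "studio", "field"
    domain: str           # e.g. "medical", "finance", "general"
    interaction: str      # e.g. "dictation", "dialogue", "meeting"

# A small manifest mixing public and private in-domain data (hypothetical files).
manifest = [
    BenchmarkSample("audio/call_001.wav", "refs/call_001.txt",
                    "en", "telephony", "finance", "dialogue"),
    BenchmarkSample("audio/visit_017.wav", "refs/visit_017.txt",
                    "en", "field", "medical", "dictation"),
]

# The tags make it easy to slice results by condition during interpretation.
telephony = [s for s in manifest if s.audio_condition == "telephony"]
```

Keeping these tags alongside each file is what later lets you break results down by language, domain, or audio condition instead of reporting a single average.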
+
+
+ Your favorite LLM with internet access can be very effective at finding public datasets that match your use case.
+
+
+## 3. Normalize transcripts before computing WER
+
+Normalization removes surface-form differences (casing, abbreviations, numeric rendering) so you compare apples to apples when judging transcription output.
+
+| Reference | Prediction | Why raw WER is wrong |
+|-----------|------------|----------------------|
+| `It's $50` | `it is fifty dollars` | Contraction and currency formatting differ, but the semantic content is the same. |
+| `Meet at Point 14` | `meet at point fourteen` | The normalization should preserve the numbered entity instead of collapsing it into an unrelated form. |
+| `Mr. Smith joined at 3:00 PM` | `mister smith joined at 3 pm` | Honorific and timestamp formatting differ, but the transcript content is equivalent. |
+
+A common but limited choice is "Whisper-style normalization" (OpenAI, 2022), implemented in packages like [`whisper-normalizer`](https://pypi.org/project/whisper-normalizer/): it does not affect numbers, and it applies aggressive lowercasing and punctuation stripping.
+
+Gladia's recommended approach is [`gladia-normalization`](https://github.com/gladiaio/normalization), our open-source library designed for transcript evaluation:
+
+- `It's $50` -> `it is 50 dollars`
+- `Meet at Point 14` -> `meet at point 14`
+- `Mr. Smith joined at 3:00 PM` -> `mister smith joined at 3 pm`
+
+
+ Open-source transcript normalization library used before WER computation.
+
+
+```python
+from normalization import load_pipeline
+
+pipeline = load_pipeline("gladia-3", language="en")
+
+reference = "Meet at Point 14. It's $50 at 3:00 PM."
+prediction = "meet at point fourteen it is fifty dollars at 3 pm" + +normalized_reference = pipeline.normalize(reference) +normalized_prediction = pipeline.normalize(prediction) +``` + + + Always apply the same normalization pipeline to **both** the reference transcript **and** every hypothesis output you compare. Changing the normalization rules between vendors — or forgetting to normalize one side — invalidates the benchmark. + + +## 4. Compute WER correctly + +Word Error Rate measures the edit distance between a reference transcript and a predicted transcript at the word level. + +The standard formula is: + +```text +WER = (S + D + I) / N +``` + +Where: + +- `S` = substitutions +- `D` = deletions +- `I` = insertions +- `N` = number of words in the reference transcript + +Lower is better. In practice: + +1. Prepare a reference transcript for each audio sample. +2. Run each provider on the exact same audio. +3. Normalize both the reference and each prediction with the same pipeline. +4. Compute WER on the normalized outputs. +5. Aggregate results across the full dataset. + + + Do not compute WER on raw transcripts if providers format numbers, punctuation, abbreviations, or casing differently. That mostly measures formatting conventions, not recognition quality. + + + + Inspect your reference transcripts carefully before computing WER. If a reference contains text that is not actually present in the audio, for example an intro such as "this audio is a recording of...", it can make WER look much worse across all providers. + + +## 5. Interpret results carefully + +Do not stop at a single WER number. Review: + +- overall average WER +- median WER and spread across files +- breakdowns by language, domain, or audio condition +- failure modes on proper nouns, acronyms, and numbers +- whether differences are consistent or concentrated in a few hard samples + +Two systems can post similar average WER while failing on different error classes. 
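The formula and the per-file aggregation above can be sketched with a small pure-Python WER helper — the file names and transcripts below are illustrative, and normalization is assumed to have already been applied to both sides:

```python
from statistics import mean, median

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# Per-file scores on already-normalized text (values are illustrative).
scores = {
    "call_001.wav": wer("meet at point 14", "meet at point 14"),
    "call_002.wav": wer("it is 50 dollars", "it is 15 dollars"),
    "call_003.wav": wer("mister smith joined at 3 pm",
                        "mister smith joined 3 pm"),
}

print(f"average WER: {mean(scores.values()):.3f}")
print(f"median WER:  {median(scores.values()):.3f}")
```

Reporting both the mean and the median (and ideally the spread) makes it visible when a few hard files dominate the average.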
Separate statistically meaningful gaps from noise introduced by dataset composition or normalization choices. + +If two systems are close, inspect actual transcript examples before drawing strong conclusions. + +## Common pitfalls + +- Comparing providers on different datasets +- Using low-quality or inconsistent ground truth +- Treating punctuation and formatting differences as recognition errors +- Drawing conclusions from too few samples +- Reporting one average score without any slice analysis +- Not inspecting the reference transcript: if it contains text not present in the audio, for example an intro like "this audio is a recording of...", it will inflate WER across all providers +- Not experimenting with provider configurations: for example, using Gladia's [custom vocabulary](/chapters/audio-intelligence/custom-vocabulary) to improve proper noun accuracy, then comparing against the ground truth diff --git a/docs.json b/docs.json index ca37881..84e2c39 100644 --- a/docs.json +++ b/docs.json @@ -44,6 +44,7 @@ "chapters/pre-recorded-stt/features/sentences" ] }, + "chapters/pre-recorded-stt/benchmarking", { "group": "Live Transcription", "expanded": false, diff --git a/snippets/get-transcription-result.mdx b/snippets/get-transcription-result.mdx index 5cfc609..53f360f 100644 --- a/snippets/get-transcription-result.mdx +++ b/snippets/get-transcription-result.mdx @@ -12,7 +12,7 @@ You can get your transcription results in **3 different ways**: You can configure webhooks at https://app.gladia.io/webhooks to be notified when your transcriptions are done. - + Gladia dashboard webhook settings page for configuring transcription notifications. Once a transcription is done, a `POST` request will be made to the endpoint you configured. The request body is a JSON object containing the transcription `id` that you can use to retrieve your result with [our API](/api-reference/v2/pre-recorded/get). 
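A minimal sketch of such a webhook receiver in Python — the payload shape (a JSON object carrying the transcription `id`) comes from the description above, while the port and handler structure are illustrative assumptions:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_transcription_id(body: bytes) -> str:
    """Pull the transcription `id` out of the webhook's JSON payload."""
    payload = json.loads(body)
    return payload["id"]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Gladia POSTs a JSON body to the configured endpoint when done.
        length = int(self.headers.get("Content-Length", 0))
        transcription_id = extract_transcription_id(self.rfile.read(length))
        # Use the id with the pre-recorded GET endpoint to fetch the result.
        print(f"transcription ready: {transcription_id}")
        self.send_response(200)
        self.end_headers()

# To run locally (blocking):
# HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```
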
For the full body definition, check [our API definition](/api-reference/v2/pre-recorded/webhook/success). diff --git a/snippets/getting-started-playground.mdx b/snippets/getting-started-playground.mdx index 6483a30..01263e9 100644 --- a/snippets/getting-started-playground.mdx +++ b/snippets/getting-started-playground.mdx @@ -8,7 +8,7 @@ audio transcription. Choose your audio source (stream from your microphone, or upload a local file) - + Gladia playground step showing audio source selection options. Then proceed to the next step. @@ -25,7 +25,7 @@ audio transcription. - + Gladia playground feature selection screen with transcription options enabled. @@ -35,7 +35,7 @@ audio transcription. Text in italic in the transcription represents [partial transcripts](/chapters/live-stt/features#partial-transcripts). - + Gladia playground live transcription screen after starting capture. @@ -44,7 +44,7 @@ audio transcription. the result in JSON format (the one you'd get with an API call). - + Gladia playground transcription results view with formatted transcript and JSON output. diff --git a/snippets/setup-account.mdx b/snippets/setup-account.mdx index 438842d..62055fc 100644 --- a/snippets/setup-account.mdx +++ b/snippets/setup-account.mdx @@ -10,9 +10,9 @@ Now that you've signed up, log in to app.gladia.io and go to the [API keys section]( created a default key for you. You can use this one or create your own. - + Gladia dashboard API keys page showing the location of the default key. Gladia offers 10 hours of free audio transcription per month if you want to test the service! -With your API key, you're now ready to use Gladia APIs. \ No newline at end of file +With your API key, you're now ready to use Gladia APIs. 
diff --git a/style.css b/style.css index f3d4c05..90c4cdb 100644 --- a/style.css +++ b/style.css @@ -28,4 +28,18 @@ .prose pre { max-width: 100%; overflow: auto; -} \ No newline at end of file +} + +.benchmark-status-pill { + display: inline-flex; + align-items: center; + padding: 0.2rem 0.5rem; + border-radius: 999px; + border: 1px solid rgba(46, 52, 62, 0.18); + background: rgba(46, 52, 62, 0.08); + color: #2e343e; + font-size: 0.85rem; + font-weight: 600; + line-height: 1.2; + white-space: nowrap; +}
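The new `.benchmark-status-pill` class can be applied to any inline element; a hypothetical usage (the label text is illustrative):

```html
<span class="benchmark-status-pill">Benchmark: in progress</span>
```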