# 15 – Case Study: Sugar Reduction (Biomarker Underutilised)

**Learning Objectives:**
- Understand that a biomarker for sugar intake EXISTS but has not been widely deployed
- Compare evaluation approaches for the Soft Drinks Industry Levy (SDIL)
- Critically assess the evidence hierarchy: purchase data, dietary surveys, modelled estimates
- Appreciate the role of uncertainty in policy evaluation
- Contrast the sugar and salt reduction programmes

---

## 1. Introduction: The Sugar Challenge

### A Common Misconception

It is often stated that sugar intake cannot be measured with a biomarker. **This is incorrect.**

A validated biomarker for total sugars intake exists:
- **24-hour urinary sucrose + fructose** (Tasevska et al., 2005)
- Validated in controlled feeding studies
- Successfully used in EPIC-Norfolk and the Health Survey for England (HSE)

### So Why Hasn't It Been Used?

Unlike the UK salt reduction programme, which incorporated regular biomarker monitoring, sugar reduction policies have **not** routinely used the urinary sugars biomarker. This is largely due to:

1. **Logistical challenges**: 24-hour urine collection is burdensome
2. **Cost**: Laboratory analysis at population scale is expensive
3. **Historical timing**: The biomarker was validated in 2005, but sugar reduction only became a policy priority about a decade later (from 2015)
4. **Policy design**: SDIL focused on reformulation, not population intake monitoring

This represents a **missed opportunity** for objective evaluation.

## 2. Setup

In [None]:
# ============================================================
# Bootstrap cell
# ============================================================

import os
import sys
import pathlib
import subprocess

REPO_URL = "https://github.com/ggkuhnle/phn-epi.git"
REPO_DIR = "phn-epi"

cwd = pathlib.Path.cwd()

if (cwd / "scripts" / "epi_utils.py").is_file():
    repo_root = cwd
elif (cwd.parent / "scripts" / "epi_utils.py").is_file():
    repo_root = cwd.parent
else:
    repo_root = cwd / REPO_DIR
    if not repo_root.is_dir():
        print(f"Cloning repository from {REPO_URL} ...")
        subprocess.run(["git", "clone", REPO_URL, str(repo_root)], check=True)
    else:
        print(f"Using existing repository at {repo_root}")
    os.chdir(repo_root)
    repo_root = pathlib.Path.cwd()

scripts_dir = repo_root / "scripts"
if str(scripts_dir) not in sys.path:
    sys.path.insert(0, str(scripts_dir))

print(f"Repository root: {repo_root}")
print("Bootstrap completed successfully.")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

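# The seaborn styles were renamed in matplotlib 3.6; fall back to the old name on older versions
style_name = 'seaborn-v0_8-whitegrid'
if style_name not in plt.style.available:
    style_name = 'seaborn-whitegrid'
plt.style.use(style_name)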
plt.rcParams['figure.figsize'] = [10, 6]
np.random.seed(42)

print("Libraries loaded successfully.")

## 3. The Urinary Sugars Biomarker

### How It Works

- Small amounts of dietary sucrose and fructose are excreted intact in urine
- The sum of 24-hour urinary sucrose and fructose correlates with total sugars intake
- A calibration equation converts biomarker values to estimated intake
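
As a rough illustration of the calibration step, the sketch below fits a log-log calibration line to simulated feeding-study data and uses it to convert a biomarker value into an estimated intake. The data, the coefficients used to simulate them, and the function name `estimate_intake` are all hypothetical; this is not the published calibration equation from Tasevska et al.

```python
import numpy as np

# Hypothetical feeding-study data (simulated, NOT taken from Tasevska et al.):
# known total sugars intake (g/day) and 24-h urinary sucrose + fructose (mg/day).
rng = np.random.default_rng(0)
true_intake = rng.uniform(50, 250, size=40)
urinary_sugars = 0.6 * true_intake ** 0.9 * rng.lognormal(0.0, 0.2, size=40)

# Fit log(intake) = a + b * log(biomarker) by ordinary least squares.
b, a = np.polyfit(np.log(urinary_sugars), np.log(true_intake), deg=1)


def estimate_intake(biomarker_mg_per_day):
    """Convert a urinary sugars value (mg/day) to estimated intake (g/day)."""
    return np.exp(a + b * np.log(biomarker_mg_per_day))


example = estimate_intake(60.0)
print(f"Fitted calibration: log(intake) = {a:.2f} + {b:.2f} * log(biomarker)")
print(f"Estimated intake at 60 mg/day urinary sugars: {example:.0f} g/day")
```

In a real application the coefficients would come from a controlled feeding study, and the uncertainty of the calibration would need to be carried through to the intake estimates.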

### Key Research

| Study | Finding |
|-------|--------|
| Tasevska et al. (2005) | Validated biomarker in controlled feeding study |
| Joosen et al. (2008) | Confirmed validity in obese individuals |
| Kuhnle et al. (2015) | Association with overweight/obesity in EPIC-Norfolk |
| Campbell et al. (2017) | Association with obesity in nationally-representative HSE sample |

In [None]:
# Data from Campbell et al. (2017) - HSE 2005
# Source: Campbell R et al. (2017). PLoS ONE 12(7): e0179508
# NDNS = National Diet and Nutrition Survey; HSE = Health Survey for England

hse_biomarker_data = pd.DataFrame({
    'Measure': ['Estimated sugar intake (self-report, NDNS)', 
                'Estimated sugar intake (biomarker, HSE)'],
    'Women (g/day)': [78, 117],
    'Men (g/day)': [107, 162]
})

print("Sugar Intake: Self-Report vs Biomarker Estimates")
print("=" * 60)
display(hse_biomarker_data)
print("\nBiomarker estimates are ~50% higher than self-reported intakes.")
print("This suggests substantial underreporting of sugar intake.")
print("\nSource: Campbell et al. (2017). PLoS ONE.")
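
The "~50% higher" statement can be checked directly from the four values in the table. The sketch below is a minimal recalculation; the column names are chosen here for illustration. Because the two estimates come from different surveys, the gap is only an indication of underreporting, not a within-person comparison.

```python
import pandas as pd

# Values copied from the table above (g/day)
intake = pd.DataFrame(
    {"self_report": [78, 107], "biomarker": [117, 162]},
    index=["Women", "Men"],
)

# Absolute gap, biomarker excess relative to self-report,
# and the share of biomarker-estimated intake that is self-reported
intake["difference_g"] = intake["biomarker"] - intake["self_report"]
intake["biomarker_excess_pct"] = 100 * intake["difference_g"] / intake["self_report"]
intake["reported_share_pct"] = 100 * intake["self_report"] / intake["biomarker"]

print(intake.round(1))
```

Read the other way round, self-reported intake corresponds to roughly two thirds of the biomarker-based estimate in both sexes.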

In [None]:
# Association between biomarker-estimated sugar intake and obesity
# Source: Campbell et al. (2017) Table 2

obesity_or = pd.DataFrame({
    'Obesity measure': ['BMI >= 30', 'Waist circumference', 'Waist-to-hip ratio'],
    'OR per 10g sugar': [1.02, 1.03, 1.04],
    '95% CI lower': [1.00, 1.01, 1.02],
    '95% CI upper': [1.04, 1.05, 1.06],
    'p-value': ['<0.05', '<0.01', '<0.001']
})

print("Association: Biomarker Sugar Intake and Obesity Risk")
print("=" * 60)
display(obesity_or)
print("\nInterpretation: Each 10g/day increase in sugar intake is associated")
print("with 2-4% higher odds of obesity, depending on the obesity measure used.")
print("\nSource: Campbell et al. (2017). PLoS ONE.")
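
An odds ratio per 10 g/day can look small, so it may help to rescale it to a larger contrast. Assuming the odds ratios come from a model that is linear in sugar intake on the log-odds scale, the odds ratio for an increment of k × 10 g is the per-10 g value raised to the power k. The 50 g/day increment below is a hypothetical choice for illustration, and the results inherit the rounding of the tabulated values.

```python
import pandas as pd

# Odds ratios per 10 g/day as listed in the table above
per_10g = pd.DataFrame({
    "measure": ["BMI >= 30", "Waist circumference", "Waist-to-hip ratio"],
    "or": [1.02, 1.03, 1.04],
    "ci_lower": [1.00, 1.01, 1.02],
    "ci_upper": [1.04, 1.05, 1.06],
})


def rescale_or(or_per_10g, increment_g):
    """Rescale an odds ratio per 10 g to another increment (log-odds assumed linear)."""
    return or_per_10g ** (increment_g / 10)


increment_g = 50  # hypothetical contrast chosen for illustration
per_50g = per_10g.copy()
for col in ["or", "ci_lower", "ci_upper"]:
    per_50g[col] = rescale_or(per_10g[col], increment_g).round(2)

print(f"Odds ratios per {increment_g} g/day:")
print(per_50g)
```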

## 4. The Soft Drinks Industry Levy (SDIL)

The SDIL uses a **tiered structure**:

| Sugar content | Levy rate |
|---------------|----------|
| < 5g per 100ml | £0.00 |
| 5g to < 8g per 100ml | £0.18 per litre |
| ≥ 8g per 100ml | £0.24 per litre |

*Source: HMRC (2018). Soft Drinks Industry Levy.*
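
The tiered structure can be written as a small function. This is a minimal sketch using the rates in the table; it assumes the higher rate applies from 8 g per 100 ml upward, and the function names and the example drink are hypothetical.

```python
def sdil_rate_per_litre(sugar_g_per_100ml: float) -> float:
    """Return the levy rate (GBP per litre) for a given sugar content."""
    if sugar_g_per_100ml < 0:
        raise ValueError("Sugar content cannot be negative")
    if sugar_g_per_100ml >= 8:   # assumed boundary: 8 g or more is the high band
        return 0.24
    if sugar_g_per_100ml >= 5:
        return 0.18
    return 0.0


def sdil_levy(sugar_g_per_100ml: float, volume_litres: float) -> float:
    """Return the levy (GBP) due on a given volume of drink."""
    return sdil_rate_per_litre(sugar_g_per_100ml) * volume_litres


# Hypothetical 330 ml can, before and after reformulation
for sugar in (10.6, 4.9):
    levy = sdil_levy(sugar, 0.33)
    print(f"{sugar} g/100ml: levy on a 330 ml can = £{levy:.3f}")
```

The step at 5 g per 100 ml is what creates the incentive to reformulate: a drink moved just below the threshold attracts no levy at all.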

In [None]:
# Product reformulation data
# Source: Public Health England (2020). Sugar reduction: progress report 2015-2019

reformulation_data = pd.DataFrame({
    'Period': ['Pre-announcement (2015)', 'Post-announcement (2017)', 'Post-implementation (2019)'],
    'Mean sugar (g/100ml)': [4.4, 3.5, 2.9],
    'Products in high levy band (%)': [52, 38, 15],
    'Products in zero levy band (%)': [39, 49, 73]
})

print("Soft Drink Reformulation")
print("=" * 60)
display(reformulation_data)
print("\nSource: PHE (2020). Sugar reduction progress report.")

In [None]:
# Visualise reformulation
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

ax = axes[0]
colors = ['#e74c3c', '#f39c12', '#27ae60']
bars = ax.bar(range(3), reformulation_data['Mean sugar (g/100ml)'], color=colors)
ax.set_xticks(range(3))
ax.set_xticklabels(['2015', '2017', '2019'])
ax.set_ylabel('Mean sugar (g/100ml)')
ax.set_title('Average Sugar Content in Soft Drinks')
ax.set_ylim(0, 5)
for i, v in enumerate(reformulation_data['Mean sugar (g/100ml)']):
    ax.text(i, v + 0.1, f'{v}g', ha='center', fontweight='bold')

ax = axes[1]
width = 0.35
x = np.arange(3)
ax.bar(x - width/2, reformulation_data['Products in high levy band (%)'], width,
       label='High levy (≥8g per 100ml)', color='#e74c3c')
ax.bar(x + width/2, reformulation_data['Products in zero levy band (%)'], width,
       label='Zero levy (<5g per 100ml)', color='#27ae60')
ax.set_xticks(x)
ax.set_xticklabels(['2015', '2017', '2019'])
ax.set_ylabel('Percentage of products')
ax.set_title('Distribution Across Levy Bands')
ax.legend()

plt.tight_layout()
plt.show()

print("Key finding: Substantial reformulation occurred, and it began after the")
print("announcement, well before the levy took effect in April 2018.")
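
To see how much of the change had already happened by 2017, the reformulation figures can be split into the part achieved by 2017 and the part achieved afterwards. The sketch below uses only the values from the reformulation table; the helper name `share_by_2017` is made up for this example.

```python
import pandas as pd

# Values copied from the reformulation table above
mean_sugar = pd.Series([4.4, 3.5, 2.9], index=[2015, 2017, 2019])
high_band = pd.Series([52, 38, 15], index=[2015, 2017, 2019])


def share_by_2017(series):
    """Share (%) of the total 2015-2019 change already achieved by 2017."""
    total_change = series.loc[2015] - series.loc[2019]
    return 100 * (series.loc[2015] - series.loc[2017]) / total_change


relative_drop = 100 * (mean_sugar.loc[2015] - mean_sugar.loc[2019]) / mean_sugar.loc[2015]
sugar_share = share_by_2017(mean_sugar)
band_share = share_by_2017(high_band)

print(f"Relative fall in mean sugar content, 2015-2019: {relative_drop:.0f}%")
print(f"Share of that fall achieved by 2017: {sugar_share:.0f}%")
print(f"Share of the fall in high-band products achieved by 2017: {band_share:.0f}%")
```

The two indicators give different pictures of timing, so the conclusion depends on which one is used.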

## 5. Comparing Salt and Sugar Programme Evaluation

### Why Was Salt Easier to Evaluate?

| Factor | Salt Programme | Sugar Programme |
|--------|---------------|----------------|
| **Biomarker used?** | Yes (24-h urinary Na) | No (despite availability) |
| **Regular monitoring** | FSA/PHE surveys 2000-2019 | No systematic biomarker surveys |
| **Intake data quality** | High (objective) | Low (self-report, purchase data) |
| **Policy focus** | Population intake | Product reformulation |

In [None]:
# Evidence confidence by outcome level
# Scores (0-1) are illustrative judgements, not measured quantities
evidence_layers = [
    ('Product reformulation', 0.9, 'Direct measurement of products'),
    ('Household purchases', 0.7, 'Kantar panel data'),
    ('Individual intake (self-report)', 0.4, 'NDNS dietary surveys'),
    ('Individual intake (biomarker)', 0.85, 'AVAILABLE but not deployed'),
    ('Weight/BMI change', 0.2, 'Confounding, secular trends'),
    ('Disease outcomes', 0.1, 'Decades away')
]

fig, ax = plt.subplots(figsize=(10, 5))

labels = [e[0] for e in evidence_layers]
confidence = [e[1] for e in evidence_layers]
notes = [e[2] for e in evidence_layers]

colors = plt.cm.RdYlGn(np.array(confidence))
bars = ax.barh(labels, confidence, color=colors)

# Highlight the biomarker bar
bars[3].set_edgecolor('blue')
bars[3].set_linewidth(3)

ax.set_xlim(0, 1)
ax.set_xlabel('Confidence in evidence')
ax.set_title('SDIL Evaluation: Evidence Confidence by Outcome Level')

plt.tight_layout()
plt.show()

print("Note: The urinary sugars biomarker (blue outline) could provide high-confidence")
print("evidence of intake change, but has not been deployed for routine monitoring.")

## 6. The Counterfactual Problem

Without biomarker data, we must infer intake from indirect sources. This introduces the **counterfactual problem**: What would have happened without the SDIL?

In [None]:
# Different counterfactual scenarios
# Based on: Pell D et al. (2021). BMJ 372:n254

years = np.arange(2014, 2022)
observed = np.array([155, 150, 140, 125, 105, 100, 98, 95])

counterfactuals = {
    'Flat trend': np.array([155, 155, 155, 155, 155, 155, 155, 155]),
    'Slow decline (-2%/yr)': 155 * (0.98 ** np.arange(8)),
    'Moderate decline (-3%/yr)': 155 * (0.97 ** np.arange(8)),
}

fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(years, observed, 'ko-', linewidth=3, markersize=10, label='Observed')

colors = ['#3498db', '#e74c3c', '#2ecc71']
for i, (name, cf) in enumerate(counterfactuals.items()):
    ax.plot(years, cf, '--', linewidth=2, color=colors[i], label=f'Counterfactual: {name}')

ax.axvline(x=2018.25, color='red', linestyle=':', linewidth=2)
ax.text(2018.4, 160, 'SDIL', fontsize=10, color='red')
ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Sugar from soft drinks (g per household per week)', fontsize=12)
ax.set_title('The Counterfactual Problem', fontsize=13)
ax.legend(loc='upper right')

plt.tight_layout()
plt.show()

print("Estimated effect in 2021 depends on the counterfactual assumption:")
for name, cf in counterfactuals.items():
    # Effect = counterfactual minus observed purchases in the final year (2021)
    effect = cf[-1] - observed[-1]
    print(f"  {name}: {effect:.0f} g per household per week reduction")
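
Instead of assuming a fixed rate of decline, a counterfactual can be estimated from the pre-intervention trend, which is the basic idea behind interrupted time series analysis. The sketch below fits a straight line to the early years of the illustrative series used above and extrapolates it to 2021. Which years count as "pre-intervention" is an assumption (two options are shown), and with so few data points the result is only a demonstration of how sensitive the estimate is.

```python
import numpy as np

# Illustrative series from the cell above (g per household per week)
years = np.arange(2014, 2022)
observed = np.array([155, 150, 140, 125, 105, 100, 98, 95])


def pretrend_effect(last_pre_year):
    """Extrapolate a linear pre-period trend and return the implied 2021 effect."""
    pre = years <= last_pre_year
    # Centre years on 2014 to keep the fit numerically well behaved
    slope, intercept = np.polyfit(years[pre] - 2014, observed[pre], deg=1)
    counterfactual = intercept + slope * (years - 2014)
    return counterfactual[-1] - observed[-1]


# Assumed pre-periods: up to 2015, or up to 2016
for last_pre_year in (2015, 2016):
    effect = pretrend_effect(last_pre_year)
    print(f"Pre-period 2014-{last_pre_year}: "
          f"{effect:.1f} g per household per week reduction in 2021")
```

Moving the end of the pre-period by a single year changes the estimated effect substantially, which is why published evaluations use longer series and control groups.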

## 7. Discussion Questions

1. **Missed opportunity**: Why do you think the urinary sugars biomarker was not incorporated into SDIL evaluation? What would it have taken to do so?

2. **Reformulation paradox**: If manufacturers reformulated before implementation, the levy raised less revenue than it otherwise would have. Is this a success or a failure?

3. **Future policy**: If you were designing a new sugar reduction policy, would you incorporate biomarker monitoring? What would be the costs and benefits?

4. **Evidence standards**: Should we require biomarker evidence before concluding that a dietary policy has worked?

## 8. Exercises

### Exercise 1: Design a Monitoring Programme

Design a biomarker-based monitoring programme for sugar intake:
- What sample size would you need?
- How often would you survey?
- What would be the approximate cost?
- What are the barriers to implementation?
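
As a possible starting point for the sample size question, the sketch below applies the standard formula for comparing mean intake between two independent survey rounds. The standard deviation (60 g/day) and the smallest difference worth detecting (10 g/day) are hypothetical inputs that you should replace with justified values, and the function name is chosen here for illustration.

```python
import math
from statistics import NormalDist


def n_per_round(sd, min_difference, alpha=0.05, power=0.80):
    """Participants needed per survey round to detect a difference in means.

    Uses n = 2 * ((z_alpha/2 + z_beta) * sd / difference)^2 for a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / min_difference) ** 2)


# Hypothetical inputs: SD of biomarker-estimated intake and target difference (g/day)
n = n_per_round(sd=60, min_difference=10)
print(f"Participants with complete 24-h urine collections per round: {n}")
```

Halving the difference to be detected roughly quadruples the required sample, and incomplete urine collections would push recruitment targets higher still.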

In [None]:
# YOUR DESIGN HERE



---

## Summary

- A **validated biomarker** for sugar intake exists (urinary sucrose + fructose)
- This biomarker has **not** been routinely used for policy evaluation
- SDIL evaluation relies on product data and purchase data, not intake biomarkers
- The counterfactual problem introduces substantial uncertainty
- This represents a **missed opportunity** for objective policy evaluation

---

## References

- Tasevska N et al. (2005). Urinary sucrose and fructose as biomarkers for sugar consumption. *Cancer Epidemiol Biomarkers Prev.*
- Joosen AMCP et al. (2008). Urinary sucrose and fructose as biomarkers: comparison of normal weight and obese volunteers. *Int J Obes.*
- Kuhnle GGC et al. (2015). Association between sucrose intake and risk of overweight and obesity in EPIC-Norfolk. *Public Health Nutr.*
- Campbell R et al. (2017). Association between urinary biomarkers of total sugars intake and measures of obesity. *PLoS ONE.*
- Pell D et al. (2021). Changes in soft drinks purchased by British households associated with the SDIL. *BMJ.*
- Public Health England (2020). Sugar reduction: progress report 2015-2019.
- HMRC (2018). Soft Drinks Industry Levy.