# Hands-on Session: Benchmarking the Energy Consumption of Image Generation

---

**Make sure you switch the notebook runtime to GPU!**

The tutorial will work with the default GPU runtime with NVIDIA T4.

---

We'll be using Hugging Face Diffusers to run a small SD Turbo diffusion model for text-to-image.

In [None]:
import time
import logging
logging.basicConfig(level=logging.INFO, format="[%(asctime)s] [%(name)s:%(lineno)d] %(message)s")
logging.getLogger("zeus").setLevel(logging.INFO)

import torch
import matplotlib.pyplot as plt
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16").to("cuda")

## Installing Zeus

- GitHub: https://github.com/ml-energy/zeus
- Documentation: https://ml.energy/zeus

In [None]:
%%capture
!pip install zeus

## Setting Up Measurement

Zeus provides a few convenient classes for measurement.

### `ZeusMonitor`

The main `ZeusMonitor` class lets you measure the energy consumption of arbitrary *ranges* or *windows* of code. You let the monitor know the range of code by beginning and ending a **measurement window** under a consistent name.

```python
monitor.begin_window("much compute")
# Much
# Compute
measurement = monitor.end_window("much compute")
```

These windows can be **nested** arbitrarily. Because window names are plain `str`s, you can pass the `ZeusMonitor` object around and measure any range of execution as needed.
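Since a window is only closed correctly when the same name is passed to both calls, one way to avoid mismatched names is to wrap the pair in a context manager. The sketch below is not part of Zeus: `window` is a hypothetical helper, and `FakeMonitor` is a stand-in that only mimics the `begin_window`/`end_window` call shape so the example runs without a GPU. With a real `ZeusMonitor`, `end_window` would return a measurement object instead of the window name.

```python
from contextlib import contextmanager


class FakeMonitor:
    """Hypothetical stand-in for a monitor; records calls instead of reading GPU counters."""

    def __init__(self):
        self.open_windows = []
        self.closed = []

    def begin_window(self, name):
        self.open_windows.append(name)

    def end_window(self, name):
        if name not in self.open_windows:
            raise ValueError(f"window {name!r} was never begun")
        self.open_windows.remove(name)
        self.closed.append(name)
        return name  # a real monitor would return a measurement here


@contextmanager
def window(monitor, name):
    """Begin and end a measurement window under one name (hypothetical helper)."""
    monitor.begin_window(name)
    result = {}
    try:
        yield result
    finally:
        # The same `name` is reused, so begin/end can never disagree.
        result["measurement"] = monitor.end_window(name)


fake = FakeMonitor()
with window(fake, "outer") as outer:
    with window(fake, "inner") as inner:
        pass  # the code being measured would go here

print(fake.closed)  # inner window closes before the outer one
```

The nested `with` blocks mirror nested windows: the inner window always closes before the outer one.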

### `PowerMonitor`

This is another convenience class that measures the power draw of the GPU over time in the background. Just instantiating the class will automatically start power monitoring, and you can later query power draw over any range of time.

```python
# GPU index -> List of (timestamp, power)
timeline: dict[int, list[tuple[float, float]]] = power_monitor.get_power_timeline("device_instant")
```
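To see how a power timeline relates to energy, the sketch below integrates `(timestamp, power)` samples with the trapezoidal rule, since energy in joules is power in watts integrated over seconds. The helper `energy_from_timeline` and the sample values are made up for illustration; only the `GPU index -> list of (timestamp, power)` shape is taken from the snippet above.

```python
def energy_from_timeline(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples; returns joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: average power times elapsed time.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Synthetic timeline in the same shape: GPU index -> list of (timestamp, power)
synthetic_timeline = {0: [(0.0, 30.0), (1.0, 70.0), (2.0, 70.0), (3.0, 30.0)]}

energy_j = {gpu: energy_from_timeline(s) for gpu, s in synthetic_timeline.items()}
print(energy_j)
```

This is only an approximation whose accuracy depends on the sampling rate. For actual energy numbers, the measurement returned by a `ZeusMonitor` window is the more direct route.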

In [None]:
from zeus.monitor import ZeusMonitor, PowerMonitor

# Collects device power consumption in the background
power_monitor = PowerMonitor()

# Measures time and energy within "windows" of execution
monitor = ZeusMonitor()

## Exercise: Measuring Energy Consumption

Our goal is to measure the **energy consumption of generating one image**.

An important parameter that affects this is the inference *batch size*: the number of images generated at the same time. Because a single measurement window can cover many generated images, we need to normalize the measured energy correctly.

In [None]:
prompt = "A cinematic scene of Zeus throwing lightning"
num_repeats = 5
batch_sizes = [1, 2, 4, 8]
images = []
latency_measurements = []
energy_measurements = []
start_time = time.time()
monitor.reset_windows()

monitor.begin_window("whole benchmark")

# Warm up
for _ in range(5):
    _ = pipe(prompt=[prompt] * 4, num_inference_steps=1, guidance_scale=0.0)

# Measurement
for batch_size in batch_sizes:

    monitor.begin_window("image generation")

    # Run `num_repeats` repetitions for stable measurement
    for _ in range(num_repeats):
        output = pipe(prompt=[prompt] * batch_size, num_inference_steps=4, guidance_scale=0.0)
        images.extend(output.images)

    ###################### FIXME #######################
    # The monitor has to know from where to where we're
    # trying to measure the energy of.
    measurement = monitor.end_window("totally correct window name")
    ####################################################

    latency = measurement.time / num_repeats

    ################################ FIXME ###############################
    # We want to know how much energy was consumed per generated image.
    energy_per_image = measurement.total_energy
    ######################################################################

    latency_measurements.append(latency)
    energy_measurements.append(energy_per_image)
    print(f"batch size: {batch_size}, generation time: {latency} s, energy per generation: {energy_per_image:.2f} J")

measurement = monitor.end_window("whole benchmark")
print(f"The whole benchmark consumed an aggregate of {measurement.total_energy} Joules.")
power_timeline = power_monitor.get_all_power_timelines(start_time=start_time - 5.0)

# Draw images
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(5, 5), tight_layout=True)
axes = axes.flatten()
for ax, img in zip(axes, images):
    ax.imshow(img)
    ax.axis("off")

# Validation
if energy_measurements[0] < energy_measurements[-1]:
    print("\nOops, wanna double check the second FIXME?\nThe monitor measures the energy consumption of `batch_size` many image generations.\n")
    raise AssertionError
elif not all(energy < 100.0 for energy in energy_measurements):
    print("\nOops, wanna double check the second FIXME?\nWe measured running the same image generation batch `num_repeats` times.\n")
    raise AssertionError
else:
    print("\nLooks good! 🎉\n")

## Understanding the Measurements

Let's see what our measurements look like. First run the cell below.

### Plot 1: Changing Batch Size

How does changing the batch size impact time and energy?

**Interpretations:**
- With a larger batch size, latency increases because the amount of computation increases.
- Energy consumption **per image** decreases because the GPU's fixed costs (e.g., idle/static power, scheduling overhead) are amortized better over more parallel image generations.
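The amortization argument can be illustrated with a toy model in which each batch pays a fixed energy cost plus a cost per image. This is a simplification for intuition only: the function `toy_energy_per_image` and the numbers `fixed_j=8.0` and `per_image_j=4.0` are invented, not measured, and real GPUs will not follow this formula exactly.

```python
def toy_energy_per_image(batch_size, fixed_j, per_image_j):
    """Toy model: one fixed cost per batch, shared by all images in the batch."""
    batch_energy = fixed_j + per_image_j * batch_size
    return batch_energy / batch_size


# Made-up costs, chosen only to show the trend.
toy_results = [toy_energy_per_image(b, fixed_j=8.0, per_image_j=4.0) for b in [1, 2, 4, 8]]
print(toy_results)  # per-image energy shrinks toward per_image_j as batch size grows
```

In this model the per-image energy approaches the per-image cost as the batch grows, so the gains from larger batches diminish.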

### Plot 2: Power Over Time

We can visualize the `PowerMonitor`'s power draw timeline to better understand how GPU power usage behaves during image generation. Power readings tend to fluctuate significantly over time, especially for instantaneous measurements. Overall, the power draw pattern reflects how actively the GPU is being utilized during computation.

**Interpretations:**
- During image generation, the power draw stays close to **70 W**, which matches the **Thermal Design Power (TDP)** of the NVIDIA T4 GPU provided by Colab.
- Even with a batch size of 1, the GPU is already well utilized (though not fully saturated) because the T4 is a small GPU, so power draw is high even for the smallest batch.
- Instantaneous power draw can **exceed 70 W** for short bursts. This is normal: **TDP is not a hard cap** but a specification indicating that *the windowed average power draw* will not exceed 70 W.
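The difference between instantaneous and windowed average power can be sketched with a simple moving average over `(timestamp, power)` samples. The helper `windowed_average`, the 2-second window, and the sample values below are all assumptions for illustration. The actual averaging window used by the hardware is not stated in this tutorial.

```python
def windowed_average(samples, window_s):
    """Mean power over the samples in (t - window_s, t] for each sample time t.

    Assumes samples are sorted by time and roughly evenly spaced.
    """
    averaged = []
    start = 0
    running_sum = 0.0
    for i, (t, p) in enumerate(samples):
        running_sum += p
        # Drop samples that have fallen out of the trailing window.
        while samples[start][0] <= t - window_s:
            running_sum -= samples[start][1]
            start += 1
        averaged.append((t, running_sum / (i - start + 1)))
    return averaged


# Synthetic readings that alternate between 60 W and 80 W once per second.
synthetic_samples = [(0.0, 60.0), (1.0, 80.0), (2.0, 60.0), (3.0, 80.0), (4.0, 60.0), (5.0, 80.0)]
smoothed = windowed_average(synthetic_samples, window_s=2.0)

peak_instant = max(p for _, p in synthetic_samples)
peak_windowed = max(p for _, p in smoothed)
print(peak_instant, peak_windowed)
```

In this synthetic trace, individual readings go above 70 W while the windowed average never does, which is the pattern described in the last bullet.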

In [None]:
# Plotting
fig, ax1 = plt.subplots(figsize=(5, 4), tight_layout=True)
ax2 = ax1.twinx()
ax1.plot(batch_sizes, energy_measurements, marker="o")
ax2.plot(batch_sizes, latency_measurements, marker="o", color="black")
ax1.set_xlabel("Batch size")
ax1.set_ylabel("Energy per generation (J)", color="C0")
ax2.set_ylabel("Latency (s)")

fig, ax = plt.subplots(figsize=(12, 4), tight_layout=True)
ax.axhline(70, color="red")
timeline = power_timeline["device_instant"][0]
ax.plot([entry[0] for entry in timeline], [entry[1] for entry in timeline])
ax.set_ylim(0)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Device instant power (W)")