# Hyperwave Inverse Design Demo

Gradient-based topology optimization of a silicon photonic grating coupler
using the Hyperwave Community SDK. Heavy computation (FDTD, gradients) runs
on cloud GPUs; lightweight operations (arrays, mode solving, plotting) run
locally.

**Local steps (free):** Parameters, theta, layer stack, absorbers, E-only mode solve, monitors

**Cloud GPU steps (credits):** Source generation, mode conversion, forward sim, optimization, verification sim

> Fabrication constraints (density filtering, binarization, minimum feature
> size) are in development and will be added in a future release.

**Full tutorial with parameter explanations:** [Inverse Design Workflow](https://hyperwave-community.readthedocs.io/en/latest/workflows/inverse_design.html)

## Install

In [None]:
%pip install "hyperwave-community[gds] @ git+https://github.com/spinsphotonics/hyperwave-community.git" -q

## Imports

In [None]:
import os
import hyperwave_community as hwc
import numpy as np
import jax.numpy as jnp
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import pickle
import time

## API Key Setup

This notebook uses Colab Secrets to securely store your API key.
To set up your key:

1. Click the key icon in the left sidebar of this notebook
2. Add a new secret named `HYPERWAVE_API_KEY`
3. Paste your API key as the value
4. Toggle "Notebook access" to ON

If you don't have an API key, [sign up](https://spinsphotonics.com/signup) to get one for free.

In [None]:
from google.colab import userdata
hwc.configure_api(
    api_key=userdata.get('HYPERWAVE_API_KEY'),
)
hwc.get_account_info()

---
## Step 1: Physical Parameters

220nm SOI at 1550nm, partial-etch grating coupler.
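
The parameter cell below rounds every layer thickness to a whole number of 35 nm pixels, so the simulated stack differs slightly from the nominal one. A minimal standalone sketch of that arithmetic, using the thickness values from this notebook (the dictionary names are illustrative and not part of the SDK):

```python
# Sketch: snapping nominal layer thicknesses onto a 35 nm grid.
dx = 0.035  # um, grid pitch used in this notebook

layers_um = {
    "pad": 3.0, "air": 1.0, "clad": 0.78,
    "etch": 0.110, "slab": 0.110, "box": 2.0, "sub": 0.8,
}

# Whole-pixel thickness of each layer
layers_px = {name: int(round(t / dx)) for name, t in layers_um.items()}

# Difference between snapped and nominal thickness, in nm
snap_error_nm = {
    name: (layers_px[name] * dx - t) * 1000 for name, t in layers_um.items()
}

# Total z extent: the pad appears twice (top and bottom absorber)
Lz = sum(layers_px.values()) + layers_px["pad"]

for name, px in layers_px.items():
    print(f"{name:5s} {px:3d} px  snapped by {snap_error_nm[name]:+6.1f} nm")
print("Lz =", Lz)
```

The 110 nm etch snaps to 3 px (105 nm), so the simulated etch depth is about 5 nm shallower than nominal at this resolution.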

In [None]:
import math

# Materials
n_si = 3.48
n_sio2 = 1.44
n_clad = 1.44
n_air = 1.0

# Wavelength
wavelength_um = 1.55

# Layer thicknesses (um)
h_dev = 0.220        # total silicon device layer
etch_depth = 0.110   # partial etch depth
h_box = 2.0          # buried oxide
h_clad = 0.78        # SiO2 cladding
h_sub = 0.8          # silicon substrate
h_air = 1.0          # air above cladding
pad = 3.0            # absorber padding (top and bottom)

# Grid resolution. 35nm is the reference GC resolution
dx = 0.035           # 35nm structure grid
pixel_size = dx / 2  # 17.5nm theta grid (2x for subpixel averaging)
domain = 20.0        # um total domain

# Waveguide
wg_width = 0.5       # um
wg_length = 2.5      # um

# Fiber
beam_waist = 5.2     # um (SMF-28 mode field radius at 1550nm)
fiber_angle = 14.5   # degrees from vertical

# Structure grid dimensions
Lx = int(domain / dx)
Ly = Lx

# Theta grid dimensions (2x structure)
theta_Lx = 2 * Lx
theta_Ly = 2 * Ly

# Layer thicknesses in pixels (float for mode solve structure)
h_p_f = pad / dx
h0_f = h_air / dx
h1_f = h_clad / dx
h2_f = etch_depth / dx
h3_f = (h_dev - etch_depth) / dx
h4_f = h_box / dx
h5_f = h_sub / dx

# Integer pixel thicknesses for simulation structure
h_p = int(round(h_p_f))
h0 = int(round(h0_f))
h1 = int(round(h1_f))
h2 = int(round(h2_f))
h3 = int(round(h3_f))
h4 = int(round(h4_f))
h5 = int(round(h5_f))
Lz = h_p + h0 + h1 + h2 + h3 + h4 + h5 + h_p

# Key Z positions
z_etch = h_p + h0 + h1
z_slab = z_etch + h2
z_box = z_slab + h3

# Frequency
wl_px = wavelength_um / dx
freq = 2 * np.pi / wl_px
freq_band = (freq, freq, 1)

# Permittivity
eps_si = n_si**2
eps_sio2 = n_sio2**2
eps_clad = n_clad**2
eps_air = n_air**2

print(f"Structure grid: {Lx} x {Ly} x {Lz} ({dx * 1000:.0f} nm)")
print(f"Theta grid: {theta_Lx} x {theta_Ly} ({pixel_size * 1000:.1f} nm)")
print(f"Layers (px): pad={h_p} air={h0} clad={h1} etch={h2} slab={h3} BOX={h4} sub={h5} pad={h_p}")

## Step 2: Design Variables (Theta)

`theta` controls the etch layer (top 110nm of 220nm Si):
`theta=1` = unetched silicon, `theta=0` = etched to cladding.

Theta is at 2x structure resolution (17.5nm vs 35nm). Fixed waveguide
on the left, design region initialized to 0.5.
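
To illustrate why theta is kept at twice the structure resolution, the sketch below reduces a fine-grid pattern to the coarse grid with a plain 2x2 block mean. This is an assumption made for illustration; the SDK's actual subpixel averaging may differ, and `downsample_2x` is a hypothetical helper.

```python
import numpy as np

def downsample_2x(theta):
    """Average non-overlapping 2x2 blocks of a (2*Nx, 2*Ny) array."""
    nx, ny = theta.shape[0] // 2, theta.shape[1] // 2
    return theta[: 2 * nx, : 2 * ny].reshape(nx, 2, ny, 2).mean(axis=(1, 3))

# Toy 8x8 fine grid: a silicon edge (theta=1) that ends in the middle
# of a coarse pixel (fine rows 0-2 are silicon).
theta_fine = np.zeros((8, 8), dtype=np.float32)
theta_fine[:3, :] = 1.0

theta_coarse = downsample_2x(theta_fine)
print(theta_coarse[:, 0])
```

The coarse pixel that straddles the edge takes an intermediate value instead of snapping to 0 or 1, which is how a 17.5 nm edge shift can still register on a 35 nm grid.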

In [None]:
# Waveguide in structure pixels (35nm grid)
wg_len = int(round(wg_length / dx))
wg_hw = int(round(wg_width / 2 / dx))

# Waveguide in theta pixels (17.5nm grid, 2x structure)
wg_len_theta = int(round(wg_length / pixel_size))
wg_hw_theta = int(round(wg_width / 2 / pixel_size))

# Design region with a uniform 2.5 um margin (one waveguide length) on all sides
design_region = {
    'x_start': wg_len_theta,
    'x_end': theta_Lx - wg_len_theta,
    'y_start': wg_len_theta,
    'y_end': theta_Ly - wg_len_theta,
}

# Build theta: zeros -> fill design region with 0.5 -> stamp waveguide as 1.0
dr = design_region
theta_init = np.zeros((theta_Lx, theta_Ly), dtype=np.float32)
theta_init[dr['x_start']:dr['x_end'], dr['y_start']:dr['y_end']] = 0.5
theta_init[:wg_len_theta, theta_Ly // 2 - wg_hw_theta : theta_Ly // 2 + wg_hw_theta] = 1.0

# Plot
extent = [0, theta_Lx * pixel_size, 0, theta_Ly * pixel_size]
plt.figure(figsize=(6, 6))
plt.imshow(theta_init.T, origin='upper', cmap='gray', vmin=0, vmax=1, extent=extent)
plt.colorbar(label='theta')
plt.xlabel('x (um)')
plt.ylabel('y (um)')
plt.title('Initial Theta')
plt.tight_layout()
plt.show()

print(f"Theta shape: {theta_init.shape}")
print(f"Design region: {(dr['x_end'] - dr['x_start']) * pixel_size:.1f} x "
      f"{(dr['y_end'] - dr['y_start']) * pixel_size:.1f} um")

## Step 3: Layer Stack

Build the 3D structure from stacked layers, same as the local workflow.
Only the etch layer (index 3) is optimized; all others have fixed permittivity.

| Layer | Material | Thickness |
|-------|----------|-----------|
| pad (absorber) | air | 3.0 um |
| air | air | 1.0 um |
| cladding | SiO2 | 0.78 um |
| **etch (design)** | **SiO2/Si** | **0.11 um** |
| slab | SiO2/Si (device footprint) | 0.11 um |
| BOX | SiO2 | 2.0 um |
| substrate | Si | 0.8 um |
| pad (absorber) | Si | 3.0 um |
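
For the two patterned layers, each density value selects a permittivity between the two entries of `permittivity_values`. A minimal sketch, assuming simple linear interpolation (the SDK may use a different mapping; `density_to_eps` is a hypothetical helper):

```python
import numpy as np

n_si, n_clad = 3.48, 1.44
eps_si, eps_clad = n_si**2, n_clad**2

def density_to_eps(theta, eps_lo, eps_hi):
    """Assumed linear map: theta=0 -> eps_lo, theta=1 -> eps_hi."""
    theta = np.clip(theta, 0.0, 1.0)
    return eps_lo + theta * (eps_hi - eps_lo)

theta = np.array([0.0, 0.5, 1.0])
eps = density_to_eps(theta, eps_clad, eps_si)
print(eps)
```

Under this assumption the initial design value of 0.5 corresponds to a permittivity halfway between cladding and silicon.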

In [None]:
# Uniform zero pattern for the homogeneous (non-patterned) layers
slab = jnp.zeros(theta_init.shape)

# Slab pattern for the unetched bottom half of the device layer:
# Silicon under the design region + waveguide, cladding elsewhere.
# This models a finite device where silicon is fully etched away outside
# the device footprint (half-etch inside, full-etch outside).
slab_device = np.zeros((theta_Lx, theta_Ly), dtype=np.float32)
slab_device[dr['x_start']:dr['x_end'], dr['y_start']:dr['y_end']] = 1.0
slab_device[:wg_len_theta, theta_Ly // 2 - wg_hw_theta : theta_Ly // 2 + wg_hw_theta] = 1.0

# 8-layer SOI stack
design_layers = [
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_air,            layer_thickness=h_p),
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_air,            layer_thickness=h0),
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_clad,           layer_thickness=h1),
    hwc.Layer(density_pattern=jnp.array(theta_init),      permittivity_values=(eps_clad, eps_si), layer_thickness=h2),
    hwc.Layer(density_pattern=jnp.array(slab_device),     permittivity_values=(eps_clad, eps_si), layer_thickness=h3),
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_sio2,           layer_thickness=h4),
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_si,             layer_thickness=h5),
    hwc.Layer(density_pattern=slab,                       permittivity_values=eps_si,             layer_thickness=h_p),
]

# Build 3D structure locally (same as local_workflow)
structure = hwc.create_structure(layers=design_layers, vertical_radius=0)

_, Lx_s, Ly_s, Lz_s = structure.permittivity.shape
z_dev = z_etch + int(h2 // 2)

print(f"Structure: {structure.permittivity.shape}")
print(f"Etch z={z_etch} ({z_etch * dx:.3f} um), slab z={z_slab} ({z_slab * dx:.3f} um)")

# Visualize cross-sections
structure.view(show_permittivity=True, show_conductivity=False, axis="z", position=z_dev)
structure.view(show_permittivity=True, show_conductivity=False, axis="x", position=Lx // 2)

## Step 4: Absorbing Boundaries

Adiabatic absorbers prevent reflections at the simulation edges.
`hwc.absorber_params()` returns auto-tuned absorber widths and an absorption
coefficient, derived from power-law fits to Bayesian-optimized parameters.
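
As a rough picture of what an adiabatic absorber looks like, the sketch below builds a conductivity mask that is zero in the interior and ramps up smoothly toward each boundary. The quadratic profile and the max-combination across axes are assumptions for illustration; `hwc.create_absorption_mask` may use a different profile.

```python
import numpy as np

def ramp_1d(n, width, coeff, order=2):
    """Conductivity along one axis: 0 in the interior, coeff at both ends."""
    idx = np.arange(n)
    # Depth (in pixels) of each cell inside the low or high absorber layer
    depth = np.maximum(width - idx, 0) + np.maximum(idx - (n - 1 - width), 0)
    return coeff * (depth / width) ** order

def absorption_mask(shape, widths, coeff):
    """Combine the three per-axis ramps by taking the maximum at each cell."""
    rx, ry, rz = (ramp_1d(n, w, coeff) for n, w in zip(shape, widths))
    xy = np.maximum(rx[:, None, None], ry[None, :, None])
    return np.maximum(xy, rz[None, None, :])

# Small toy grid (not the notebook's domain size)
mask = absorption_mask(shape=(40, 40, 30), widths=(8, 8, 6), coeff=0.01)
print(mask.shape, mask[20, 20, 15], mask[4, 20, 15], mask[0, 20, 15])
```

The gradual ramp is what makes the absorber adiabatic: an abrupt jump in conductivity would itself reflect light back into the domain.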

In [None]:
ap = hwc.absorber_params(wavelength_um, dx, structure_dimensions=(Lx, Ly, Lz))
abs_widths = ap["absorption_widths"]
abs_coeff = ap["abs_coeff"]

print(f"Absorber widths (x,y,z): {abs_widths} px = "
      f"({abs_widths[0]*dx:.2f}, {abs_widths[1]*dx:.2f}, {abs_widths[2]*dx:.2f}) um")
print(f"Absorber coefficient: {abs_coeff:.6f}")

# Create absorption mask and add to structure (same as local_workflow)
absorber = hwc.create_absorption_mask(
    grid_shape=(Lx, Ly, Lz),
    absorption_widths=abs_widths,
    absorption_coeff=abs_coeff,
)
structure.conductivity = jnp.zeros_like(structure.conductivity) + absorber

# Visualize absorber regions
structure.view(show_permittivity=False, show_conductivity=True, axis="z", position=Lz // 2)

---
## Step 5: Source

Gaussian beam generated on cloud GPU via the wave equation error method.
Produces a clean downward-propagating beam at the fiber angle. Negative
`theta` tilts the beam toward the waveguide (-x direction).
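
The target profile at the source plane can be pictured as a Gaussian envelope multiplied by a linear phase ramp that encodes the tilt. The sketch below is a paraxial approximation at the beam waist in vacuum. It assumes that the waist is the 1/e field-amplitude radius and that a phase factor of the form exp(i·kx·x) is used; the SDK's conventions may differ.

```python
import numpy as np

dx = 0.035                                # um per pixel
wavelength_um = 1.55
k0 = 2 * np.pi / (wavelength_um / dx)     # vacuum wavenumber, rad per pixel
waist_px = 5.2 / dx                       # beam waist in pixels
tilt_deg = -14.5                          # negative: in-plane wavevector along -x

nx = ny = 571
x = np.arange(nx) - nx // 2               # pixel coordinates centered on the beam
y = np.arange(ny) - ny // 2
X, Y = np.meshgrid(x, y, indexing="ij")

# In-plane wavevector component set by the tilt angle
kx = k0 * np.sin(np.deg2rad(tilt_deg))

# Gaussian envelope times tilt phase (y-polarized field at the waist)
Ey = np.exp(-(X**2 + Y**2) / waist_px**2) * np.exp(1j * kx * X)

print(f"kx = {kx:.5f} rad/px, phase period = {2 * np.pi / abs(kx) * dx:.2f} um along x")
```

The phase advances by `kx` radians per pixel along x, and its sign determines whether the beam leans toward +x or -x.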

In [None]:
# Source position: in the air gap, 50nm above cladding surface
source_above_surface_um = 0.05
source_z = int(round((pad + h_air - source_above_surface_um) / dx))

# Grating center in structure pixels
grating_x = int(round((dr['x_start'] + dr['x_end']) / 2 * pixel_size / dx))
grating_y = Ly // 2
waist_px = beam_waist / dx

# Estimate cost
est = hwc.estimate_cost(
    structure_shape=(3, Lx, Ly, Lz),
    max_steps=5000,
    gpu_type="B200",
    simulation_type="fdtd_simulation",
)
if est:
    print(f"Source gen estimate: {est['estimated_seconds']:.0f}s, "
          f"{est['estimated_credits']:.4f} credits")

# Generate Gaussian source on cloud GPU (wave equation error method)
t0 = time.time()
source_field, input_power = hwc.generate_gaussian_source(
    sim_shape=(Lx, Ly, Lz),
    frequencies=np.array([freq]),
    source_pos=(grating_x, grating_y, source_z),  # center of grating
    waist_radius=waist_px,       # beam waist in pixels
    theta=-fiber_angle,          # negative = tilt toward waveguide (-x)
    phi=0.0,                     # tilt in XZ plane
    polarization='y',            # TE polarization
    max_steps=5000,              # FDTD convergence limit
    wavelength_um=wavelength_um, # for absorber_params() auto-tuning
    dx_um=dx,                    # for absorber_params() auto-tuning
    gpu_type="B200",
)
print(f"Source generated in {time.time() - t0:.1f}s")

source_offset = (0, 0, source_z)  # inject at source z-plane
input_power = float(np.mean(input_power))  # scalar for normalization

# Plot source
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, idx, name in [(axes[0], 1, '|Ey|'), (axes[1], 3, '|Hx|')]:
    ax.imshow(np.abs(np.array(source_field[0, idx, :, :, 0])).T,
              origin='upper', cmap='hot', extent=[0, Lx * dx, 0, Ly * dx])
    ax.set_xlabel('x (um)')
    ax.set_ylabel('y (um)')
    ax.set_title(f'Source {name}')
plt.tight_layout()
plt.show()

print(f"Source shape: {source_field.shape}")
print(f"Source offset: {source_offset}")
print(f"Input power: {input_power:.6f}")

## Step 6: Waveguide Mode

Compute the fundamental TE0 waveguide mode for the loss function. Solves
the E-only eigenmode locally, then converts to full E+H fields via a short
FDTD propagation on cloud GPU.
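
The cell below negates the H components to turn a forward mode into a backward (-x) one. A small sketch with synthetic fields (a Gaussian profile, not a solved mode) shows why: flipping the sign of H while keeping E flips the sign of the x-directed power. `power_x` is a hypothetical helper.

```python
import numpy as np

def power_x(E, H):
    """x-directed power through a YZ plane: sum of Re(E x H*)_x."""
    Sx = E[1] * np.conj(H[2]) - E[2] * np.conj(H[1])
    return float(np.real(np.sum(Sx)))

# Synthetic TE-like profile on a small YZ grid
y = np.linspace(-1, 1, 41)[:, None]
z = np.linspace(-1, 1, 21)[None, :]
g = np.exp(-(y**2 + z**2) / 0.2)

E = np.zeros((3, 41, 21), dtype=np.complex64)
H = np.zeros((3, 41, 21), dtype=np.complex64)
E[1] = g   # Ey
H[2] = g   # Hz in phase with Ey: power flows toward +x

P_forward = power_x(E, H)
P_backward = power_x(E, -H)   # same E, negated H: power flows toward -x
print(P_forward, P_backward)
```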

In [None]:
# Build small waveguide structure for mode solving (narrow x, full y)
small_x_theta = 40  # only need a few pixels in propagation direction
theta_mode = np.zeros((small_x_theta, theta_Ly), dtype=np.float32)
theta_mode[:, theta_Ly // 2 - wg_hw_theta : theta_Ly // 2 + wg_hw_theta] = 1.0

theta_mode_bot = np.where(theta_mode > 0, 1.0, 0.0).astype(np.float32)  # slab under WG
d_mode_slab = jnp.zeros((small_x_theta, theta_Ly))

wg_structure = hwc.create_structure(layers=[
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_air,            layer_thickness=h_p),
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_air,            layer_thickness=h0),
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_clad,           layer_thickness=h1),
    hwc.Layer(density_pattern=jnp.array(theta_mode),     permittivity_values=(eps_clad, eps_si), layer_thickness=h2),  # etch
    hwc.Layer(density_pattern=jnp.array(theta_mode_bot), permittivity_values=(eps_clad, eps_si), layer_thickness=h3),  # slab
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_sio2,           layer_thickness=h4),
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_si,             layer_thickness=h5),
    hwc.Layer(density_pattern=d_mode_slab,               permittivity_values=eps_si,             layer_thickness=h_p),
], vertical_radius=0)

eps_wg = np.array(wg_structure.permittivity)
Lz_wg = eps_wg.shape[3]
Ly_perm = eps_wg.shape[2]
print(f"WG structure: {eps_wg.shape}")

# Crop YZ cross-section around waveguide core for eigenmode solve
eps_yz = eps_wg[0, eps_wg.shape[1] // 2, :, :]
crop_y = min(50, Ly_perm // 4)
crop_z = min(30, Lz_wg // 4)
y_c = Ly_perm // 2
y0, y1 = y_c - crop_y, y_c + crop_y
z0 = max(0, z_etch - crop_z)
z1 = min(Lz_wg, z_box + crop_z)
eps_crop = eps_yz[y0:y1, z0:z1]
print(f"Cropped eps: {eps_crop.shape}")

# Solve E-only eigenmode locally (no GPU needed)
from hyperwave_community.mode_solver import mode as hwc_mode
eps_4d = jnp.stack([jnp.array(eps_crop)] * 3, axis=0)[:, jnp.newaxis, :, :]  # (3, 1, y, z)
mode_E, beta_arr, _ = hwc_mode(freq_band=freq_band, permittivity=eps_4d, axis=0, mode_num=0)
n_eff = float(beta_arr[0]) / (2 * np.pi / wl_px)
print(f"n_eff = {n_eff:.4f}")
assert 2.0 < n_eff < 3.0, f"n_eff={n_eff:.4f} out of range"

# Convert E-only -> full E+H via short FDTD propagation on cloud GPU
mode_EH = hwc.mode_convert(
    mode_E_field=mode_E[0:1, 0:3, :, :, :],  # (1, 3, 1, y_crop, z_crop)
    freq_band=freq_band,
    permittivity_slice=np.array(eps_crop),
    propagation_axis='x',
    propagation_length=500,  # pixels of waveguide to propagate through
    gpu_type="B200",
)

# Negate H for backward (-x) propagation (coupled light travels from the grating toward the waveguide output at low x)
mode_EH = np.array(mode_EH, copy=True)
mode_EH[:, 3:6, ...] *= -1

# P_mode_cross: mode self-overlap integral for normalization
mode_e = np.array(mode_EH[0, 0:3, 0, :, :])  # (3, y_crop, z_crop)
mode_h = np.array(mode_EH[0, 3:6, 0, :, :])
cross = np.cross(mode_e, np.conj(mode_h), axis=0)
P_mode_cross = float(np.abs(np.real(np.sum(cross[0, :, :]))))  # x-component
print(f"P_mode_cross = {P_mode_cross:.6f}")

# Pad mode to full YZ domain (zeros outside waveguide region)
mode_field = np.zeros((1, 6, 1, Ly_perm, Lz_wg), dtype=np.complex64)
mode_field[:, :, :, y0:y1, z0:z1] = np.array(mode_EH)

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
ext_crop = [y0 * dx, y1 * dx, z0 * dx, z1 * dx]

axes[0].imshow(eps_crop.T, origin='lower', cmap='viridis', extent=ext_crop)
axes[0].set_title('Cropped permittivity')

axes[1].imshow(np.abs(mode_e[1]).T, origin='lower', cmap='hot', extent=ext_crop)
axes[1].set_title(f'Mode |Ey| (n_eff={n_eff:.4f})')

E_mag = np.sqrt(np.sum(np.abs(mode_e)**2, axis=0))
axes[2].imshow(E_mag.T, origin='lower', cmap='hot', extent=ext_crop)
axes[2].set_title(f'Mode |E| (P_cross={P_mode_cross:.4f})')

for ax in axes:
    ax.set_xlabel('y (um)')
    ax.set_ylabel('z (um)')
plt.tight_layout()
plt.show()

## Step 7: Monitors

Field monitors for visualization and optimization. Four types:

- **Visualization monitors**: full-plane slices for plotting field intensity
- **Waveguide output**: YZ cross-section at the waveguide for coupling measurement
- **Loss monitor**: where the optimizer measures mode overlap (same as WG output)
- **Design monitor**: volume where gradients are computed (must match design region)
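
The design monitor lives on the 35 nm structure grid, while the design region is defined on the 17.5 nm theta grid. A standalone sketch of the conversion, using this notebook's numbers and the same integer division by 2 as the cell below:

```python
dx = 0.035
pixel_size = dx / 2
Lx = int(20.0 / dx)                            # structure pixels along x
theta_Lx = 2 * Lx                              # theta pixels along x
margin_theta = int(round(2.5 / pixel_size))    # one waveguide length, in theta px

# Design region bounds on the theta grid
x_start, x_end = margin_theta, theta_Lx - margin_theta

# Integer-divide by 2 to move from the theta grid to the structure grid
sx0, sx1 = x_start // 2, x_end // 2
width_px = sx1 - sx0
print(sx0, sx1, width_px, round(width_px * dx, 2))
```

Because both theta bounds (143 and 999) are odd, the floor division places the monitor one theta pixel (17.5 nm) to the left of the design region at each end. At this grid size that offset is half a structure pixel.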

In [None]:
# --- Visualization monitors (full-plane slices) ---
monitors = hwc.MonitorSet()

# XY at device layer, XZ at y=center, YZ at x=center
monitors.add(hwc.Monitor(shape=(Lx, Ly, 1), offset=(0, 0, z_dev)), name='Output_xy_device')
monitors.add(hwc.Monitor(shape=(Lx, 1, Lz), offset=(0, Ly // 2, 0)), name='Output_xz_center')
monitors.add(hwc.Monitor(shape=(1, Ly, Lz), offset=(Lx // 2, 0, 0)), name='Output_yz_center')

# --- Waveguide output monitor (for coupling measurement) ---
output_x = abs_widths[0] + 10
monitors.add(hwc.Monitor(shape=(1, Ly, Lz), offset=(output_x, 0, 0)), name='Output_wg_output')

monitors.list_monitors()

# Visualize monitor positions overlaid on structure (same as local_workflow)
monitors.view(
    structure=structure,
    axis="z",
    position=z_dev,
    absorber_boundary=absorber,
)

# --- Loss monitor (where optimizer evaluates mode overlap) ---
loss_monitor_shape = (1, Ly, Lz)
loss_monitor_offset = (output_x, 0, 0)

# --- Design monitor (gradient computation volume = design region only) ---
dr_x0 = dr['x_start'] // 2  # theta to structure coordinates
dr_x1 = dr['x_end'] // 2
dr_y0 = dr['y_start'] // 2
dr_y1 = dr['y_end'] // 2
design_monitor_shape = (dr_x1 - dr_x0, dr_y1 - dr_y0, int(round(h2)))
design_monitor_offset = (dr_x0, dr_y0, z_etch)

print(f"\nLoss monitor at x={output_x} ({output_x * dx:.1f} um)")
print(f"Design monitor: {design_monitor_shape} at offset {design_monitor_offset}")
print(f"  Design region: {(dr_x1 - dr_x0) * dx:.1f} x {(dr_y1 - dr_y0) * dx:.1f} um")

---
## Step 8: Forward Simulation

Verify setup before optimization: source illuminates design region,
light propagates toward waveguide, monitors placed correctly.

In [None]:
# Extract recipe from local structure (same as local_workflow)
structure_recipe = structure.extract_recipe()

# Estimate cost for forward simulation
est = hwc.estimate_cost(
    structure_shape=(3, Lx, Ly, Lz),
    max_steps=10000,
    gpu_type="B200",
    simulation_type="fdtd_simulation",
)
if est:
    print(f"Forward sim estimate: {est['estimated_seconds']:.0f}s, "
          f"{est['estimated_credits']:.4f} credits (${est['estimated_cost_usd']:.2f})")

t0 = time.time()
fwd_results = hwc.simulate(
    structure_recipe=structure_recipe,
    source_field=source_field,
    source_offset=source_offset,
    freq_band=freq_band,
    monitors_recipe=monitors.recipe,
    absorption_widths=abs_widths,
    absorption_coeff=abs_coeff,
    gpu_type="B200",
    simulation_steps=10000,
)

print(f"Forward sim complete: {fwd_results['sim_time']:.1f}s GPU, {time.time() - t0:.0f}s total")

In [None]:
# Quick visualization of all monitors (same as local_workflow)
hwc.quick_view_monitors(fwd_results, component='all')

# Check power at waveguide output
wg_field = np.array(fwd_results['monitor_data']['Output_wg_output'])
S = hwc.S_from_slice(jnp.mean(jnp.array(wg_field), axis=2))
power = float(jnp.abs(jnp.sum(S[0, 0, :, :])))
print(f"Waveguide output power: {power:.6f}")
print(f"Coupling (approx): {power / input_power * 100:.1f}%")

---
## Step 9: Optimization

Adjoint-method inverse design on cloud GPU. Each step runs forward + adjoint
FDTD, then updates theta via the Adam optimizer to maximize the coupling
efficiency (reported as `loss`).
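
To make the update rule concrete, here is a minimal NumPy sketch of gradient ascent with Adam, global-norm gradient clipping, and a cosine learning-rate decay. It is not the SDK's implementation: the Adam betas are the usual defaults, the schedule follows the common cosine-decay definition, and a toy quadratic objective stands in for the forward + adjoint FDTD pair.

```python
import numpy as np

def cosine_lr(step, total, base_lr, alpha=0.1):
    """Cosine decay from base_lr down to alpha * base_lr over `total` steps."""
    cos = 0.5 * (1.0 + np.cos(np.pi * min(step, total) / total))
    return base_lr * ((1.0 - alpha) * cos + alpha)

def adam_ascent(grad_fn, theta, num_steps, lr=0.1, clip_norm=1.0,
                b1=0.9, b2=0.999, eps=1e-8):
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, num_steps + 1):
        g = grad_fn(theta)
        norm = np.linalg.norm(g)
        if norm > clip_norm:                      # global-norm clipping
            g = g * (clip_norm / norm)
        m = b1 * m + (1 - b1) * g                 # first-moment estimate
        v = b2 * v + (1 - b2) * g**2              # second-moment estimate
        m_hat = m / (1 - b1**t)                   # bias correction
        v_hat = v / (1 - b2**t)
        step = cosine_lr(t - 1, num_steps, lr) * m_hat / (np.sqrt(v_hat) + eps)
        theta = np.clip(theta + step, 0.0, 1.0)   # ascent; keep theta in [0, 1]
    return theta

# Toy objective with a known maximum at theta = target
target = np.array([0.2, 0.9, 0.5])
objective = lambda th: -np.sum((th - target) ** 2)
grad = lambda th: -2.0 * (th - target)

theta0 = np.full(3, 0.5)
theta_opt = adam_ascent(grad, theta0, num_steps=50)
print(objective(theta0), objective(theta_opt))
```

In the real workflow each gradient comes from one forward and one adjoint simulation, which is why the cost scales with two FDTD runs per step.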

### Loss Function: Mode Coupling Efficiency

The default loss measures how efficiently the scattered field couples into
the target waveguide mode (computed in Step 6). It uses the standard
bidirectional overlap integral from coupled mode theory:

$$\eta = \frac{|\operatorname{Re}(I_1 \cdot I_2)|}{2 \, P_\text{in} \, P_\text{mode}}$$

where the overlap integrals are cross-products over the loss monitor plane $A$:

$$I_1 = \int_A (\mathbf{E}_\text{mode} \times \mathbf{H}_\text{sim}^*) \cdot \hat{n} \, dA, \qquad I_2 = \int_A (\mathbf{E}_\text{sim} \times \mathbf{H}_\text{mode}^*) \cdot \hat{n} \, dA$$

and the normalization terms are:
- $P_\text{in}$: input source power (from Step 5)
- $P_\text{mode} = \int_A \operatorname{Re}(\mathbf{E}_\text{mode} \times \mathbf{H}_\text{mode}^*) \cdot \hat{n} \, dA$: mode self-overlap (`P_mode_cross` from Step 6)

With the monitor normal along $x$ ($\hat{n} = \hat{x}$), each cross product reduces to its $x$-component, e.g. $E_y H_z^* - E_z H_y^*$ for $\mathbf{E} \times \mathbf{H}^*$.

This is exactly the formula used in Step 11 (verification) to independently check the result.
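
A quick sanity check of the formula, using synthetic fields rather than simulation output: if the simulated field is the mode itself scaled by a complex amplitude $a$, and $P_\text{in}$ is taken as the time-averaged power of the unit-amplitude mode (an assumption of this sketch), the efficiency should come out as $|a|^2$. The helper names are hypothetical.

```python
import numpy as np

def overlap_x(E_a, H_b):
    """Sum over the plane of (E_a x H_b*)_x = Ey*conj(Hz) - Ez*conj(Hy)."""
    return np.sum(E_a[1] * np.conj(H_b[2]) - E_a[2] * np.conj(H_b[1]))

def mode_coupling(E_sim, H_sim, E_mode, H_mode, P_in):
    I1 = overlap_x(E_mode, H_sim)
    I2 = overlap_x(E_sim, H_mode)
    P_mode = abs(np.real(overlap_x(E_mode, H_mode)))
    return abs(np.real(I1 * I2)) / (2.0 * P_in * P_mode)

# Synthetic mode profile on a YZ plane (illustrative, not a solved mode)
y = np.linspace(-1, 1, 41)[:, None]
z = np.linspace(-1, 1, 21)[None, :]
g = np.exp(-(y**2 + z**2) / 0.2)
E_mode = np.zeros((3, 41, 21), dtype=complex)
H_mode = np.zeros((3, 41, 21), dtype=complex)
E_mode[1] = g   # Ey
H_mode[2] = g   # Hz

# Simulated field = the mode scaled by a complex amplitude a
a = 0.6 * np.exp(0.3j)
E_sim, H_sim = a * E_mode, a * H_mode

# Assumed input power: time-averaged power of the unit-amplitude mode
P_in = 0.5 * abs(np.real(overlap_x(E_mode, H_mode)))

eta = mode_coupling(E_sim, H_sim, E_mode, H_mode, P_in)
print(f"eta = {eta:.4f}, |a|^2 = {abs(a)**2:.4f}")
```

The phase of $a$ drops out because $I_1$ carries $a^*$ and $I_2$ carries $a$, so the product depends only on $|a|^2$.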

### Loss Types

`run_optimization` supports 3 built-in loss types (in priority order):

| Loss | Parameter | Use Case |
|------|-----------|----------|
| **Mode coupling** | `mode_field=...` | Waveguide coupling (this notebook) |
| **Poynting power** | `power_axis=0` | Maximize directional power flow |
| **Intensity** | `intensity_component='Ey'` | Maximize one field component's intensity at the loss monitor |

See the cell below for parameter details on each type.

**Note:** Each step takes ~1-2 min at 35nm/20um, so the 5-step optimization
cell runs for ~5-10 min. Make sure your runtime won't disconnect before starting.

In [None]:
# The 3 built-in loss types cover common inverse design objectives:
#
# 1. MODE COUPLING (used in this notebook):
#    Maximizes overlap between simulated field and a target waveguide mode.
#    Best for waveguide couplers, mode converters, and similar devices.
#
#      mode_field=mode_field,
#      input_power=input_power,
#      mode_cross_power=P_mode_cross,
#      mode_axis=0,
#
# 2. POYNTING POWER:
#    Maximizes total power flow through the loss monitor plane along a
#    specified axis. Good for splitters, routers, and directional devices.
#
#      power_axis=0,          # 0=x, 1=y, 2=z
#      power_maximize=True,
#
# 3. INTENSITY:
#    Maximizes |E|^2 of a single field component at the loss monitor.
#    Useful for focusing, resonators, and field enhancement.
#
#      intensity_component='Ey',  # 'Ex', 'Ey', or 'Ez'
#      intensity_maximize=True,

print("This notebook uses mode coupling (option 1). See run_optimization() below.")

In [None]:
# Structure spec: layer stack template for the GPU to rebuild permittivity
# from theta at each optimization step
structure_spec = {
    'layers_info': [{
        'permittivity_values': [float(v) for v in l.permittivity_values] if isinstance(l.permittivity_values, tuple) else float(l.permittivity_values),
        'layer_thickness': float(l.layer_thickness),
        'density_radius': 0,
        'density_alpha': 0,
    } for l in design_layers],
    'construction_params': {'vertical_radius': 0},
}

# Waveguide mask: forces theta=1 in waveguide region (not optimized)
waveguide_mask = np.zeros((theta_Lx, theta_Ly), dtype=np.float32)
waveguide_mask[:wg_len_theta, theta_Ly // 2 - wg_hw_theta : theta_Ly // 2 + wg_hw_theta] = 1.0

# Optimizer settings. 5 steps for demo, increase to 50+ for production
NUM_STEPS = 5
LR = 0.1
GRAD_CLIP = 1.0

# Estimate cost for optimization (2 FDTD sims per step: forward + adjoint)
est = hwc.estimate_cost(
    structure_shape=(3, Lx, Ly, Lz),
    max_steps=10000 * NUM_STEPS * 2,
    gpu_type="B200",
    simulation_type="fdtd_simulation",
)
if est:
    print(f"Optimization estimate ({NUM_STEPS} steps): {est['estimated_seconds']:.0f}s, "
          f"{est['estimated_credits']:.4f} credits (${est['estimated_cost_usd']:.2f})")

print(f"Optimizer: Adam, LR={LR}, grad_clip={GRAD_CLIP}, {NUM_STEPS} steps")

In [None]:
# Run optimization. Each step streams back as it completes.
# Press stop/interrupt to cancel early. Completed steps are kept.
results = []
t_opt_start = time.time()

try:
    for step_result in hwc.run_optimization(
        theta=theta_init,
        source_field=source_field,
        source_offset=source_offset,
        freq_band=freq_band,
        structure_spec=structure_spec,
        loss_monitor_shape=loss_monitor_shape,
        loss_monitor_offset=loss_monitor_offset,
        design_monitor_shape=design_monitor_shape,
        design_monitor_offset=design_monitor_offset,
        mode_field=mode_field,
        input_power=input_power,
        mode_cross_power=P_mode_cross,
        mode_axis=0,
        waveguide_mask=waveguide_mask,
        num_steps=NUM_STEPS,
        learning_rate=LR,  # Adam + cosine decay (alpha=0.1), clip_norm=1.0
        grad_clip_norm=GRAD_CLIP,
        absorption_widths=abs_widths,
        absorption_coeff=abs_coeff,
        gpu_type="B200",
    ):
        results.append(step_result)
        eff = abs(step_result['loss']) * 100
        print(f"Step {step_result['step']:3d}/{NUM_STEPS}:  eff = {eff:.2f}%  "
              f"|grad|_max = {step_result['grad_max']:.3e}  ({step_result['step_time']:.1f}s)",
              flush=True)
except KeyboardInterrupt:
    elapsed = time.time() - t_opt_start
    print(f"\nCancelled after {len(results)} steps ({elapsed:.0f}s). "
          f"Completed steps are kept.", flush=True)

# Summary
if results:
    efficiencies = [abs(r['loss']) * 100 for r in results]
    best_idx = int(np.argmax(efficiencies))
    best_eff = efficiencies[best_idx]
    loss_dB = -10 * np.log10(max(best_eff / 100, 1e-10))
    print(f"\nBest: {best_eff:.2f}% ({loss_dB:.2f} dB) at step {best_idx + 1}")
    theta_final = results[-1]['theta']
else:
    print("No optimization steps completed.")
    theta_final = theta_init

## Step 10: Results

In [None]:
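# Guard: Step 9 may have been cancelled before any step completed
if not results:
    raise RuntimeError("No optimization steps completed; rerun Step 9 first.")

efficiencies = [abs(r['loss']) * 100 for r in results]
best_idx = int(np.argmax(efficiencies))
best_theta = results[best_idx]['theta']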

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Efficiency curve
axes[0].plot(range(1, len(efficiencies) + 1), efficiencies, 'b-o', markersize=3)
axes[0].set_xlabel('Step')
axes[0].set_ylabel('Efficiency (%)')
axes[0].set_title('Mode Coupling Efficiency')
axes[0].grid(True, alpha=0.3)

# Initial vs best theta
extent = [0, theta_Lx * pixel_size, 0, theta_Ly * pixel_size]
axes[1].imshow(theta_init.T, origin='upper', cmap='gray', vmin=0, vmax=1, extent=extent)
axes[1].set_title('Initial')

axes[2].imshow(best_theta.T, origin='upper', cmap='gray', vmin=0, vmax=1, extent=extent)
axes[2].set_title(f'Best (step {best_idx + 1})')

for ax in axes[1:]:
    ax.set_xlabel('x (um)')
    ax.set_ylabel('y (um)')

plt.tight_layout()
plt.show()

## Step 11: Verification

Forward simulation with the best theta to verify coupling.

**Note:** The optimizer reports efficiency *before* each step's gradient update,
so every value printed in Step 9 belongs to the theta *before* that step's update.
`best_theta` is the theta returned *after* the best step's update, so the
verification efficiency will typically be slightly higher than the reported value.
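
The off-by-one pairing can be seen in a toy loop that records the objective before each update and the parameter after it. This only mimics the record layout described in the note; the `loss` and `theta` keys and the quadratic objective are illustrative assumptions.

```python
def efficiency(theta):
    """Toy objective with its maximum (1.0) at theta = 0.8."""
    return 1.0 - (theta - 0.8) ** 2

theta = 0.2
records = []
for step in range(1, 4):
    loss_before = efficiency(theta)           # evaluated BEFORE the update
    grad = -2.0 * (theta - 0.8)
    theta = theta + 0.25 * grad               # plain gradient-ascent update
    records.append({"step": step, "loss": loss_before, "theta": theta})

last = records[-1]
print(f"reported at last step: {last['loss']:.4f}")
print(f"re-evaluated on returned theta: {efficiency(last['theta']):.4f}")
```

Re-evaluating the returned theta gives a higher value than the one reported for the same step, which is the gap the verification run is expected to show.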

In [None]:
# Rebuild structure with optimized theta (same pattern as initial build)
opt_layers = list(design_layers)
opt_layers[3] = hwc.Layer(
    density_pattern=jnp.array(best_theta),
    permittivity_values=(eps_clad, eps_si),
    layer_thickness=h2,
)
opt_structure = hwc.create_structure(layers=opt_layers, vertical_radius=0)
opt_structure.conductivity = jnp.zeros_like(opt_structure.conductivity) + absorber

# Extract recipe for cloud simulation
opt_recipe = opt_structure.extract_recipe()

est = hwc.estimate_cost(
    structure_shape=(3, Lx, Ly, Lz),
    max_steps=10000,
    gpu_type="B200",
    simulation_type="fdtd_simulation",
)
if est:
    print(f"Verification sim estimate: {est['estimated_seconds']:.0f}s, "
          f"{est['estimated_credits']:.4f} credits")

t0 = time.time()
opt_results = hwc.simulate(
    structure_recipe=opt_recipe,
    source_field=source_field,
    source_offset=source_offset,
    freq_band=freq_band,
    monitors_recipe=monitors.recipe,
    absorption_widths=abs_widths,
    absorption_coeff=abs_coeff,
    gpu_type="B200",
    simulation_steps=10000,
    # convergence="full" ensures this verification sim uses the same FDTD solver
    # (mem_efficient_multi_freq) as the optimizer in Step 9. Without this, simulate()
    # defaults to convergence="default" which routes to early_stopping_solve, a
    # different solver that can produce slightly different results (~0.5% gap).
    convergence="full",
)
print(f"Verification sim complete: {opt_results['sim_time']:.1f}s GPU, {time.time() - t0:.0f}s total")

# Quick visualization of all monitors
hwc.quick_view_monitors(opt_results, component='all')

# Mode overlap coupling: |Re(I1 * I2)| / (2 * P_in * P_mode_cross)
# Same formula the optimizer uses internally
wg_field = np.array(opt_results['monitor_data']['Output_wg_output'])
field_avg = np.mean(wg_field, axis=2)
E_out = field_avg[0, 0:3, :, :]
H_out = field_avg[0, 3:6, :, :]
E_m = mode_field[0, 0:3, 0, :, :]
H_m = mode_field[0, 3:6, 0, :, :]
I1 = np.sum(E_m[1] * np.conj(H_out[2]) - E_m[2] * np.conj(H_out[1]))
I2 = np.sum(E_out[1] * np.conj(H_m[2]) - E_out[2] * np.conj(H_m[1]))
mode_eff = abs(np.real(I1 * I2)) / (2.0 * input_power * P_mode_cross) * 100

# Poynting power (quick sanity check)
S = hwc.S_from_slice(jnp.mean(jnp.array(wg_field), axis=2))
power_eff = float(jnp.abs(jnp.sum(S[0, 0, :, :]))) / input_power * 100

print(f"Mode coupling:   {mode_eff:.2f}% ({-10*np.log10(max(mode_eff/100, 1e-10)):.2f} dB)")
print(f"Power coupling:  {power_eff:.2f}% ({-10*np.log10(max(power_eff/100, 1e-10)):.2f} dB)")

---
## Summary

| Step | Function | Runs On | Cost |
|------|----------|---------|------|
| 1-4 | Parameters, theta, layers, absorbers | Local | Free |
| 5 | `generate_gaussian_source()` | Cloud GPU | Credits |
| 6 | `create_structure()`, `mode()`, `mode_convert()` | Local + Cloud | Credits |
| 7 | `MonitorSet()`, `Monitor()` | Local | Free |
| 8 | `simulate()` (forward) | Cloud GPU | Credits |
| 9 | `run_optimization()` (adjoint) | Cloud GPU | Credits |
| 10-11 | Analysis, verification | Local + Cloud | Credits |

### Next Steps

- **More steps:** Increase `NUM_STEPS` to 50-100 for convergence
- **Different starting points:** Try several theta initializations and keep the best
- **Finer grid:** Decrease `dx` (e.g., 25nm) for more accurate results
- **Fabrication constraints:** Density filtering, binarization, minimum feature size (coming soon)