# Updated Realistic Synthetic Waveguide Dataset

This synthetic dataset is designed for training physics‐inspired neural networks in the domain of optical waveguide characterization. The dataset consists of 50,000 samples. Each sample comprises 15 input features that capture the physical, material, and geometrical attributes of a waveguide and 14 output targets that describe its optical performance (losses, mode characteristics, effective index, and polarization components).

The dataset is especially useful for developing data‐driven models that can predict waveguide performance from basic design parameters, enabling applications in integrated photonics for both glass and silicon-based devices.

## Input Parameter List (15 Features)


### core_index

Description: Complex refractive index of the waveguide core.

Range: Real part between 1.48 and 1.52 (single-mode) or between 1.50 and 1.52 (multimode), with a very small negative imaginary part (loss) between −1×10⁻⁸ and −1×10⁻⁷.

Format: Stored as a string in the form “x+yj” (e.g., “1.488000-3.200000e-08j”) without any extra brackets.
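Python's built-in `complex()` accepts exactly this “x+yj” form, so no custom parser is needed for core_index or clad_index. A minimal sketch, assuming the values are read as plain strings (the helper name `parse_index` is hypothetical):

```python
def parse_index(s: str) -> tuple[float, float]:
    """Split an index string such as "1.488000-3.200000e-08j" into (real, imag)."""
    n = complex(s.strip())  # complex() parses "x+yj" with no brackets or inner spaces
    return n.real, n.imag


n_real, n_imag = parse_index("1.488000-3.200000e-08j")
print(n_real, n_imag)
```

Only the real part enters the V-parameter and effective-index formulas, so most downstream code needs just `n.real`.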

### clad_index

Description: Complex refractive index of the cladding.

Range: Real part between ~1.44 and just below the core index (close to the core index for single-mode, a larger index difference for multimode), with a similarly small negative imaginary part.

Format: Same string format as the core index.

### core_radius_m

Description: Core radius (a) in meters.

Range: For single-mode: 0.5–2 µm (diameter 1–4 µm); for multimode: 2–10 µm (diameter 4–20 µm).

### clad_radius_m

Description: Cladding radius (b) in meters.

Range: 20 µm to 50 µm.

### length_m

Description: Waveguide length (L) in meters.

Range: 1 mm to 50 cm (0.001 to 0.5 m).

### wavelength_m

Description: Operating wavelength (λ) in meters.

Range: 500 nm to 1600 nm (5×10⁻⁷ to 1.6×10⁻⁶ m).

### polarization

Description: Input polarization as a unitless number, where 0 represents pure TE and 1 represents pure TM.

Range: 0 to 1.

### alpha_core

Description: Intrinsic loss coefficient for the core (α₁) in inverse meters (m⁻¹).

Range: 1×10⁻⁴ to 1×10⁻³ m⁻¹.

### alpha_clad

Description: Intrinsic loss coefficient for the cladding (α₂) in m⁻¹.

Range: 1×10⁻⁴ to 1×10⁻³ m⁻¹.

### photoelastic_coeff

Description: Photoelastic coefficient (p) of the core material.

Range: 0.20 to 0.25.

### delta_rho_over_rho

Description: Density variation ratio (Δρ/ρ) representing the fractional density fluctuation.

Range: 1×10⁻¹² to 1×10⁻¹¹.

### sigma_rms_m

Description: RMS surface roughness (σ) at the core–cladding interface (in meters).

Range: 1 to 10 nm (1×10⁻⁹ to 1×10⁻⁸ m).

### roughness_corr_length_m

Description: Correlation length (L_corr) of the interface roughness (in meters).

Range: 100 nm to 1 µm (1×10⁻⁷ to 1×10⁻⁶ m).

### w_in_m

Description: Input beam waist (w_in) in meters.

Range: 1 µm to 5 µm (1×10⁻⁶ to 5×10⁻⁶ m).

### input_power

Description: Input optical power (P_in) in watts.

Range: 1 mW to 10 mW (1×10⁻³ to 1×10⁻² W).

## Output Parameter List (14 Targets)


### propagation_loss_dB

Description: Propagation loss (in dB) computed from the exponential decay of optical power along the waveguide.

Equation:

```math
P_{out} = P_{in}\exp(-\alpha_{total}L),
\quad
Propagation\ Loss\ (dB) = 10\log_{10}\left(\frac{P_{in}}{P_{out}}\right)
```
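Because the decay is exponential, the dB form reduces to a constant times αL: 10·log10(e^{αL}) = (10/ln 10)·αL ≈ 4.343·α·L, the same factor used in the scattering-loss formula. A minimal Python sketch with illustrative input values (function names are hypothetical):

```python
import math


def output_power(p_in: float, alpha_total: float, length: float) -> float:
    """P_out = P_in * exp(-alpha_total * L), with alpha in 1/m and L in m."""
    return p_in * math.exp(-alpha_total * length)


def propagation_loss_db(p_in: float, p_out: float) -> float:
    """Loss in dB from the input/output power ratio."""
    return 10.0 * math.log10(p_in / p_out)


# Illustrative values: 5 mW input, alpha_total = 8e-4 1/m, 50 cm length
p_in, alpha, L = 5e-3, 8e-4, 0.5
p_out = output_power(p_in, alpha, L)
loss = propagation_loss_db(p_in, p_out)
# The dB loss equals (10 / ln 10) * alpha * L, i.e. about 4.343 * alpha * L
print(loss, 10.0 / math.log(10.0) * alpha * L)
```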

### insertion_loss_dB

Description: Insertion (or coupling) loss (in dB) computed from the mode mismatch between the input beam and the guided mode.

### coupling_loss_dB

Description: Coupling loss (in dB). In our model, this is identical to the insertion loss.

### mode_field_diameter_m

Description: Mode field diameter (MFD) in meters, computed with Marcuse's empirical formula.

Equation:

```math
w = a\left(0.65 + \frac{1.619}{V^{1.5}} + \frac{2.879}{V^6}\right),
MFD = 2w
```
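The Marcuse fit is a closed-form function of the core radius and V. A minimal sketch (the function name is hypothetical); note that the fit was derived for single-mode guides, so values for strongly multimode samples (V well above 2.405) are extrapolations:

```python
def mode_field_diameter(a: float, v: float) -> float:
    """Marcuse approximation: w = a(0.65 + 1.619/V^1.5 + 2.879/V^6), MFD = 2w.

    a is the core radius in meters; the result is in meters.
    """
    if v <= 0.0:
        raise ValueError("V must be positive")
    w = a * (0.65 + 1.619 / v**1.5 + 2.879 / v**6)  # mode-field radius
    return 2.0 * w


print(mode_field_diameter(1e-6, 2.0))  # 1 um core radius at V = 2
```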

### mode_confinement_factor

Description: Fraction of the optical power confined in the core.

Equation:

```math
\Gamma = \frac{u^2}{V^2},
u = \begin{cases}0.9V, & V<2.405, \\ V-0.5, & V\ge2.405\end{cases}
```
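A direct transcription of the piecewise approximation, as a sketch (names are hypothetical). One consequence worth knowing when training on this target: Γ jumps at the cutoff, from 0.81 just below V = 2.405 to (1.905/2.405)² ≈ 0.63 at V = 2.405.

```python
V_CUTOFF = 2.405  # single-mode cutoff used throughout this dataset


def confinement_factor(v: float) -> float:
    """Gamma = u^2 / V^2 with the dataset's piecewise approximation for u."""
    u = 0.9 * v if v < V_CUTOFF else v - 0.5
    return (u / v) ** 2


print(confinement_factor(2.0), confinement_factor(3.0))
```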

### single_mode

Description: A flag set to “Y” when the waveguide is single-mode (V < 2.405).

### multi_mode

Description: A flag set to “Y” for multimode operation (V ≥ 2.405); complementary to the single_mode flag.

### scattering_loss_dB

Description: Scattering loss (in dB) computed from the scattering loss coefficient.

Equation:

```math
Scattering\ Loss\ (dB) = 4.343\,\alpha_{scatt,total}\,L,
\quad
\alpha_{scatt,total} = \alpha_{scatt,bulk} + \alpha_{scatt,surface}
```

### effective_index

Description: Effective refractive index (n_eff) of the guided mode, computed from an approximate eigenvalue solution.

Equation:

```math
n_{eff} = \sqrt{n_{2,real}^2 + \frac{u^2}{V^2}(n_{1,real}^2 - n_{2,real}^2)}
```
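Since u²/V² is the same quantity as the confinement factor Γ, the formula reads n_eff² = n₂² + Γ(n₁² − n₂²), which keeps n_eff between the cladding and core indices whenever 0 ≤ Γ ≤ 1. A minimal sketch using the real parts of the indices (function name hypothetical):

```python
import math


def effective_index(n1: float, n2: float, v: float) -> float:
    """n_eff from the real core index n1, real cladding index n2, and V."""
    u = 0.9 * v if v < 2.405 else v - 0.5  # same piecewise u as the confinement factor
    return math.sqrt(n2**2 + (u / v) ** 2 * (n1**2 - n2**2))


print(effective_index(1.50, 1.45, 2.0))
```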

### cross_coupling

Description: An approximate measure of mode cross coupling.

Equation:

```math
Cross\ Coupling = \begin{cases}0, & V<2.405, \\ 0.5\,(V-2.405)/V, & V\ge2.405\end{cases}
```
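The cross-coupling term is zero in the single-mode regime and grows toward 0.5 as V increases. A one-function sketch (name hypothetical):

```python
def cross_coupling(v: float) -> float:
    """0 below the single-mode cutoff, 0.5 * (V - 2.405) / V at or above it."""
    if v < 2.405:
        return 0.0
    return 0.5 * (v - 2.405) / v


print(cross_coupling(2.0), cross_coupling(4.81))
```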

### TE_percent

Description: Percentage of the transverse electric (TE) component in the mode field. For single-mode waveguides it is computed as (1 − polarization) × 100; for multimode waveguides it is further adjusted by the cross coupling.

### TM_percent

Description: Percentage of the transverse magnetic (TM) component (complementary to TE_percent).
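For single-mode samples the TE/TM split follows directly from the polarization input. The multimode adjustment by cross coupling is not specified here, so this sketch deliberately covers only the single-mode case (function name hypothetical):

```python
def te_tm_percent_single_mode(polarization: float) -> tuple[float, float]:
    """TE and TM percentages for a single-mode sample (0 = pure TE, 1 = pure TM)."""
    if not 0.0 <= polarization <= 1.0:
        raise ValueError("polarization must lie in [0, 1]")
    te = (1.0 - polarization) * 100.0
    return te, 100.0 - te  # TM_percent is complementary to TE_percent


print(te_tm_percent_single_mode(0.25))
```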

### V_parameter

Description: The normalized frequency of the waveguide, computed as

```math
V = \frac{2\pi\,a}{\lambda} \sqrt{n_{1,real}^2 - n_{2,real}^2}
```
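V depends only on the core radius, the wavelength, and the real parts of the two indices, and it alone decides the single_mode/multi_mode flags. A sketch returning booleans (the dataset stores the flags as “Y” strings; function names are hypothetical):

```python
import math


def v_parameter(a: float, wavelength: float, n1: float, n2: float) -> float:
    """V = (2*pi*a / lambda) * sqrt(n1^2 - n2^2), using real parts of the indices."""
    return 2.0 * math.pi * a / wavelength * math.sqrt(n1**2 - n2**2)


def mode_flags(v: float) -> tuple[bool, bool]:
    """(single_mode, multi_mode): single-mode below 2.405, multimode at or above it."""
    single = v < 2.405
    return single, not single


v = v_parameter(2e-6, 1.55e-6, 1.500, 1.495)  # 2 um radius at 1550 nm
print(v, mode_flags(v))
```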

### output_power

Description: The optical power (in watts) at the output of the waveguide, computed from the exponential decay of the input power.

Equation:

```math
P_{out} = P_{in}\exp(-\alpha_{total}L)
```

## Equations and Methodology

**Key Equations:**

**Normalized Frequency (V):**

```math
V = \frac{2\pi\, a}{\lambda} \sqrt{n_{1,real}^2 - n_{2,real}^2}
```

**Mode Field Diameter (MFD):**

```math
w = a\left(0.65 + \frac{1.619}{V^{1.5}} + \frac{2.879}{V^6}\right),\quad MFD = 2w
```

**Mode Confinement Factor:**

```math
\Gamma = \frac{u^2}{V^2},\quad u = \begin{cases}0.9V, & V<2.405, \\ V-0.5, & V\ge2.405\end{cases}
```

**Effective Attenuation:**

```math
\alpha_{eff} = \alpha_{core}\Gamma + \alpha_{clad}(1-\Gamma)
```

**Scattering Loss Coefficient:**

```math
\alpha_{scatt,bulk} = \frac{8\pi^3}{3\lambda^4}p^2\left(\frac{\Delta\rho}{\rho}\right)^2\Gamma,\quad \alpha_{scatt,surface} = \frac{4\pi^3}{\lambda^2}\sigma_{rms}^2 L_{corr}
```
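The two scattering coefficients and the resulting scattering_loss_dB can be transcribed as written. A minimal sketch (function names hypothetical); the checks illustrate the λ⁻⁴ scaling of the bulk term and the λ⁻² scaling of the surface term:

```python
import math


def bulk_scattering_coeff(wavelength: float, p: float, drho_over_rho: float, gamma: float) -> float:
    """alpha_scatt,bulk = 8 pi^3 / (3 lambda^4) * p^2 * (drho/rho)^2 * Gamma."""
    return 8.0 * math.pi**3 / (3.0 * wavelength**4) * p**2 * drho_over_rho**2 * gamma


def surface_scattering_coeff(wavelength: float, sigma_rms: float, l_corr: float) -> float:
    """alpha_scatt,surface = 4 pi^3 / lambda^2 * sigma_rms^2 * L_corr."""
    return 4.0 * math.pi**3 / wavelength**2 * sigma_rms**2 * l_corr


def scattering_loss_db(alpha_bulk: float, alpha_surface: float, length: float) -> float:
    """Scattering loss in dB: 4.343 * (alpha_bulk + alpha_surface) * L."""
    return 4.343 * (alpha_bulk + alpha_surface) * length


a_bulk = bulk_scattering_coeff(1.55e-6, 0.22, 5e-12, 0.81)
a_surf = surface_scattering_coeff(1.55e-6, 5e-9, 5e-7)
print(scattering_loss_db(a_bulk, a_surf, 0.1))
```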

**Total Attenuation:**

```math
\alpha_{total} = \alpha_{eff} + \alpha_{scatt,bulk} + \alpha_{scatt,surface}
```

**Output Power and Propagation Loss:**

```math
P_{out} = P_{in}\exp(-\alpha_{total}L),\quad Propagation\ Loss\ (dB) = 10\log_{10}\left(\frac{P_{in}}{P_{out}}\right)
```
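The key equations chain into a single forward computation per sample. The sketch below is an illustrative Python reconstruction, not the generator used to build the dataset; `forward_losses` and its argument names are hypothetical. It computes the propagation loss as (10/ln 10)·α_total·L, which equals 10·log10(P_in/P_out) exactly but avoids a division by zero when α_total·L is large enough for P_out to underflow to 0.0 in double precision.

```python
import math


def forward_losses(core_index: str, clad_index: str, a: float, length: float,
                   wavelength: float, alpha_core: float, alpha_clad: float,
                   p: float, drho_over_rho: float, sigma_rms: float,
                   l_corr: float, p_in: float) -> dict:
    """Chain V -> Gamma -> alpha_total -> P_out -> loss for one sample."""
    n1 = complex(core_index).real  # only real parts enter V
    n2 = complex(clad_index).real
    v = 2.0 * math.pi * a / wavelength * math.sqrt(n1**2 - n2**2)
    u = 0.9 * v if v < 2.405 else v - 0.5
    gamma = (u / v) ** 2
    alpha_eff = alpha_core * gamma + alpha_clad * (1.0 - gamma)
    alpha_bulk = 8.0 * math.pi**3 / (3.0 * wavelength**4) * p**2 * drho_over_rho**2 * gamma
    alpha_surf = 4.0 * math.pi**3 / wavelength**2 * sigma_rms**2 * l_corr
    alpha_total = alpha_eff + alpha_bulk + alpha_surf
    return {
        "V_parameter": v,
        "mode_confinement_factor": gamma,
        "alpha_total": alpha_total,
        "output_power": p_in * math.exp(-alpha_total * length),
        # Identical to 10*log10(P_in/P_out), but safe if P_out underflows to 0.0
        "propagation_loss_dB": 10.0 / math.log(10.0) * alpha_total * length,
    }


sample = forward_losses("1.500000-5.000000e-08j", "1.495000-5.000000e-08j",
                        2e-6, 0.1, 1.55e-6, 5e-4, 2e-4, 0.22, 5e-12, 5e-9, 5e-7, 5e-3)
print(sample)
```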

**Insertion Loss (Coupling Loss) via Gaussian Overlap:**

```math
T_{nom} = \frac{2 w_{in} w}{w_{in}^2 + w^2}\exp\left(-\frac{\Delta x^2}{w_{in}^2 + w^2}\right),\quad IL(dB) = -20 \log_{10}(T_{nom})
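```

where w_in is the input beam waist, w is the mode-field radius from the MFD formula above, and Δx is the lateral offset between the input beam and the mode center (Δx is not one of the listed input features).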
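T_nom is the amplitude overlap of two Gaussians, so −20·log10(T_nom) is the same as −10·log10 of the power coupling efficiency. A minimal sketch (function name hypothetical); because Δx is not an input feature, the offset defaults to zero here, which is an assumption:

```python
import math


def insertion_loss_db(w_in: float, w: float, dx: float = 0.0) -> float:
    """IL(dB) = -20 log10(T_nom) for Gaussian overlap with lateral offset dx (meters)."""
    s = w_in**2 + w**2
    t_nom = 2.0 * w_in * w / s * math.exp(-dx**2 / s)  # amplitude overlap, <= 1
    return -20.0 * math.log10(t_nom)


print(insertion_loss_db(2e-6, 1e-6))  # beam waist twice the mode-field radius
```

Matched waists with zero offset give 0 dB; any waist mismatch or offset increases the loss.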

**Methodology:**
- Random Sampling: Each input parameter is sampled uniformly from its range.
- Mode Balancing: Half of the samples are forced to be single-mode and half multimode.
- Physics-Based Computation: All outputs are computed from the equations above.
- Complex Indices: Stored as strings “x+yj” for ease of parsing.
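The methodology does not say how the single-/multimode split is enforced. One plausible approach, shown here purely as an assumption, is rejection sampling: draw the inputs that set V uniformly and redraw until V falls on the requested side of 2.405. The sketch samples only n1, n2, a, and λ; the `RANGES` table, the uniform draw of n2 between 1.44 and n1, and the function names are simplifications chosen for illustration.

```python
import math
import random

V_CUTOFF = 2.405

# Assumed ranges, taken from the input parameter list (only inputs that set V)
RANGES = {
    "single": {"n1": (1.48, 1.52), "a": (0.5e-6, 2e-6)},
    "multi": {"n1": (1.50, 1.52), "a": (2e-6, 10e-6)},
}
WAVELENGTH = (500e-9, 1.6e-6)


def sample_v_inputs(regime: str, rng: random.Random, max_tries: int = 10_000) -> dict:
    """Draw (n1, n2, a, wavelength) uniformly; retry until V is in the requested regime."""
    r = RANGES[regime]
    for _ in range(max_tries):
        n1 = rng.uniform(*r["n1"])
        n2 = rng.uniform(1.44, n1)  # cladding index between ~1.44 and the core index
        a = rng.uniform(*r["a"])
        lam = rng.uniform(*WAVELENGTH)
        v = 2.0 * math.pi * a / lam * math.sqrt(n1**2 - n2**2)
        if (v < V_CUTOFF) == (regime == "single"):
            return {"n1": n1, "n2": n2, "a": a, "wavelength": lam, "V": v}
    raise RuntimeError(f"no {regime} sample found in {max_tries} tries")


def balanced_samples(n: int, seed: int = 0) -> list[dict]:
    """Half single-mode, half multimode, as described in the methodology."""
    rng = random.Random(seed)
    half = n // 2
    singles = [sample_v_inputs("single", rng) for _ in range(half)]
    multis = [sample_v_inputs("multi", rng) for _ in range(n - half)]
    return singles + multis


print(len(balanced_samples(100, seed=42)))
```

Rejection sampling keeps each accepted draw uniform within its regime, at the cost of discarded draws; the remaining inputs (lengths, loss coefficients, roughness, and so on) could then be drawn independently from their listed ranges.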