# Updated Realistic Synthetic Waveguide Dataset
This repository contains a realistic synthetic dataset generator for optical waveguide characterization. It builds on a physics-inspired core model, injects controlled noise, and is tuned to match statistics from published experimental data. The final output is:

- **50,000 samples**  
- **15 input features** (geometrical, material, and physical parameters)  
- **14 output targets** (losses, mode properties, effective index, polarization, etc.)  

Use this dataset to train data-driven models that predict waveguide performance from basic design parameters, with added realism from the experimental-data correction step.

## 📂 Input Parameters (15 Features)

| Name | Description | Units / Format |
|---|---|---|
| `core_index` | Complex refractive index of the core, stored as a string `n_real±n_imagj` (e.g. `1.488000-3.200000e-08j`). | — |
| `clad_index` | Complex refractive index of the cladding, same format. | — |
| `core_radius_m` | Core radius $a$. | meters (0.5–10 µm) |
| `clad_radius_m` | Cladding radius $b$. | meters (20–50 µm) |
| `length_m` | Waveguide length $L$. | meters (1 mm–0.5 m) |
| `wavelength_m` | Operating wavelength $\lambda$. | meters (500 nm–1600 nm) |
| `polarization` | Input polarization (0 = TE, 1 = TM). | unitless (0–1) |
| `alpha_core` | Core intrinsic loss $\alpha_{\rm core}$. | m⁻¹ (1e-4–1e-3) |
| `alpha_clad` | Cladding intrinsic loss $\alpha_{\rm clad}$. | m⁻¹ (1e-4–1e-3) |
| `photoelastic_coeff` | Photoelastic coefficient $p$. | unitless (0.20–0.25) |
| `delta_rho_over_rho` | Density variation ratio $\Delta\rho/\rho$. | unitless (1e-12–1e-11) |
| `sigma_rms_m` | RMS surface roughness $\sigma$. | meters (1–10 nm) |
| `roughness_corr_length_m` | Roughness correlation length $L_{\rm corr}$. | meters (100 nm–1 µm) |
| `w_in_m` | Input beam waist $w_{\rm in}$. | meters (1–5 µm) |
| `input_power` | Input optical power $P_{\rm in}$. | Watts (1–10 mW) |
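
The string form of `core_index` and `clad_index` can be parsed with Python's built-in `complex()`. The minimal sketch below round-trips that format and uniformly samples a few of the inputs over the ranges listed above. `parse_index`, `format_index`, `RANGES`, and `sample_inputs` are hypothetical helpers for illustration, not names taken from the generator script.

```python
import numpy as np

def parse_index(s: str) -> complex:
    # Python's complex() accepts the "a+bj" / "a-bj" string form directly.
    return complex(s)

def format_index(n: complex) -> str:
    # Real part with 6 decimals, imaginary part as a signed exponent, then "j".
    return f"{n.real:.6f}{n.imag:+.6e}j"

# Hypothetical subset of the input ranges from the table, in SI units.
RANGES = {
    "core_radius_m": (0.5e-6, 10e-6),
    "clad_radius_m": (20e-6, 50e-6),
    "length_m": (1e-3, 0.5),
    "wavelength_m": (500e-9, 1600e-9),
    "input_power": (1e-3, 10e-3),
}

def sample_inputs(n: int, seed: int = 0) -> dict[str, np.ndarray]:
    # Uniform sampling over each range (assumed; the script may sample differently).
    rng = np.random.default_rng(seed)
    return {name: rng.uniform(lo, hi, size=n) for name, (lo, hi) in RANGES.items()}

print(parse_index("1.488000-3.200000e-08j"))
print(format_index(complex(1.488, -3.2e-8)))
```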

## 🌟 Output Targets (14 Features)

| Name | Description |
|---|---|
| `propagation_loss_dB` | Propagation loss (dB) |
| `insertion_loss_dB` | Insertion (coupling) loss (dB) |
| `coupling_loss_dB` | Same as insertion loss |
| `mode_field_diameter_m` | Mode field diameter $2w$ |
| `mode_confinement_factor` | Fraction of power confined in the core $\Gamma$ |
| `single_mode` | `Y` if single-mode ($V<2.405$), else `N` |
| `multi_mode` | Complement of `single_mode` |
| `scattering_loss_dB` | Scattering loss (dB) |
| `effective_index` | Effective refractive index $n_{\rm eff}$ |
| `cross_coupling` | Cross-coupling metric $C$ (0 for single-mode guides) |
| `TE_percent`, `TM_percent` | Mode polarization percentages |
| `V_parameter` | Normalized frequency $V$ |
| `output_power` | Output power $P_{\rm out}$ |

## 🧮 Key Equations

**Normalized Frequency**  
$$V = \frac{2\pi\,a}{\lambda}\sqrt{n_{\mathrm{core,real}}^2 - n_{\mathrm{clad,real}}^2}$$
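
A minimal sketch of the $V$ computation and the single-mode test used for `single_mode`, using only the real parts of the indices as the formula states. The function names are illustrative assumptions.

```python
import math

def v_parameter(core_radius_m: float, wavelength_m: float,
                n_core_real: float, n_clad_real: float) -> float:
    # V = (2*pi*a / lambda) * sqrt(n_core^2 - n_clad^2), real parts only.
    numerical_aperture = math.sqrt(n_core_real**2 - n_clad_real**2)
    return 2 * math.pi * core_radius_m / wavelength_m * numerical_aperture

def is_single_mode(v: float) -> bool:
    # Single-mode cutoff of the step-index fiber: V < 2.405.
    return v < 2.405

# Example: 4 µm core radius at 1550 nm with a small index contrast.
v = v_parameter(4e-6, 1.55e-6, 1.4504, 1.4447)
print(v, is_single_mode(v))
```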

**Mode Field Diameter**  
$$w = a\Bigl(0.65 + 1.619\,V^{-1.5} + 2.879\,V^{-6}\Bigr)\quad\mathrm{MFD}=2w$$
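
This is Marcuse's Gaussian approximation for the mode-field radius, which is usually quoted as accurate only near the single-mode range (roughly $1.2 < V < 2.4$). A one-function sketch, with a hypothetical name:

```python
def mode_field_diameter(core_radius_m: float, v: float) -> float:
    # Marcuse approximation: w / a = 0.65 + 1.619 V^-1.5 + 2.879 V^-6, MFD = 2w.
    w = core_radius_m * (0.65 + 1.619 * v**-1.5 + 2.879 * v**-6)
    return 2 * w

print(mode_field_diameter(4e-6, 2.08))
```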

**Mode Confinement Factor**  
$$
u = \begin{cases}
0.9\,V, & V < 2.405,\\
V - 0.5, & V \ge 2.405,
\end{cases}
\quad \Gamma = \frac{u^2}{V^2}
$$
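
A vectorized sketch of the piecewise $u$ heuristic and $\Gamma = u^2/V^2$ (function name assumed). Note that $\Gamma$ is discontinuous at the cutoff: it equals 0.81 just below $V = 2.405$ and about 0.63 just above it.

```python
import numpy as np

def confinement_factor(v):
    # Piecewise heuristic for u, then Gamma = u^2 / V^2 (works on scalars or arrays).
    v = np.asarray(v, dtype=float)
    u = np.where(v < 2.405, 0.9 * v, v - 0.5)
    return u**2 / v**2

print(confinement_factor([2.0, 2.405, 3.0]))
```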

**Intrinsic Attenuation**  
$$\alpha_{\rm eff} = \alpha_{\rm core}\,\Gamma + \alpha_{\rm clad}\,(1-\Gamma)$$

**Scattering Loss Coefficients**  
$$
\alpha_{\rm scatt,bulk}
= \frac{8\pi^3}{3\,\lambda^4}\,p^2\Bigl(\frac{\Delta\rho}{\rho}\Bigr)^2\,\Gamma,
\quad
\alpha_{\rm scatt,surf}
= \frac{4\pi^3}{\lambda^2}\,\sigma_{\rm rms}^2\,L_{\rm corr}
$$

**Total Attenuation**  
$$\alpha_{\rm total} = \alpha_{\rm eff} + \alpha_{\rm scatt,bulk} + \alpha_{\rm scatt,surf}$$
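
The intrinsic, scattering, and total attenuation formulas above combine into one function. This sketch returns the individual terms so each contribution can be inspected; the function and argument names are assumptions.

```python
import math

def total_attenuation(alpha_core, alpha_clad, gamma, wavelength_m,
                      p, delta_rho_over_rho, sigma_rms_m, l_corr_m):
    # Intrinsic loss weighted by how much power sits in core vs. cladding.
    alpha_eff = alpha_core * gamma + alpha_clad * (1 - gamma)
    # Bulk (density-fluctuation) scattering term, scaled by Gamma.
    alpha_bulk = (8 * math.pi**3 / (3 * wavelength_m**4)
                  * p**2 * delta_rho_over_rho**2 * gamma)
    # Surface-roughness scattering term.
    alpha_surf = 4 * math.pi**3 / wavelength_m**2 * sigma_rms_m**2 * l_corr_m
    return alpha_eff, alpha_bulk, alpha_surf, alpha_eff + alpha_bulk + alpha_surf

print(total_attenuation(5e-4, 2e-4, 0.8, 1.55e-6, 0.22, 5e-12, 5e-9, 500e-9))
```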

**Output Power**  
$$P_{\rm out} = P_{\rm in}\,\exp\bigl(-\alpha_{\rm total}L\bigr)$$

**Propagation Loss (dB)**  
$$\mathcal{L}_{\rm prop} = 10\,\log_{10}\!\Bigl(\frac{P_{\rm in}}{P_{\rm out}}\Bigr)$$
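
Substituting $P_{\rm out}$ into the dB formula gives $\mathcal{L}_{\rm prop} = 10\,\alpha_{\rm total}L\,\log_{10}e = (10/\ln 10)\,\alpha_{\rm total}L \approx 4.343\,\alpha_{\rm total}L$, so the dB loss is linear in length. A short sketch (hypothetical function names) that checks this identity:

```python
import math

def output_power(p_in: float, alpha_total: float, length_m: float) -> float:
    # Exponential power decay along the guide.
    return p_in * math.exp(-alpha_total * length_m)

def propagation_loss_db(p_in: float, p_out: float) -> float:
    # Total (not per-cm) propagation loss in dB.
    return 10 * math.log10(p_in / p_out)

p_out = output_power(5e-3, 1.0, 0.1)
print(propagation_loss_db(5e-3, p_out), 10 / math.log(10) * 1.0 * 0.1)
```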

**Gaussian Overlap**  
$$
T_{\rm nom}
= \frac{2\,w_{\rm in}\,w}{w_{\rm in}^2 + w^2}
  \exp\!\Bigl(-\frac{\Delta x^2}{w_{\rm in}^2 + w^2}\Bigr),
\quad
\Delta x \sim \mathcal{U}(0,2w)
$$

**Insertion/Coupling Loss**  
$$\mathrm{IL}_{\rm dB} = -20\,\log_{10}(T_{\rm nom}),\quad \mathrm{CL}_{\rm dB} = \mathrm{IL}_{\rm dB}$$
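
$T_{\rm nom}$ is a field (amplitude) overlap, so its square is the power coupling and the dB conversion uses a factor of 20. A sketch that separates the deterministic overlap from the random draw of $\Delta x \sim \mathcal{U}(0, 2w)$; the names are assumptions.

```python
import numpy as np

def gaussian_overlap(w_in: float, w: float, dx: float) -> float:
    # Field overlap of two Gaussians with waists w_in, w and lateral offset dx.
    s = w_in**2 + w**2
    return (2 * w_in * w / s) * np.exp(-dx**2 / s)

def sample_insertion_loss_db(w_in: float, w: float, rng: np.random.Generator) -> float:
    # Misalignment drawn uniformly from [0, 2w), then IL = -20 log10(T_nom).
    dx = rng.uniform(0.0, 2 * w)
    return float(-20 * np.log10(gaussian_overlap(w_in, w, dx)))

rng = np.random.default_rng(0)
print(sample_insertion_loss_db(3e-6, 5e-6, rng))
```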

**Effective Index**  
$$n_{\rm eff} = \sqrt{n_{\rm clad,real}^2 + \frac{u^2}{V^2}(n_{\rm core,real}^2 - n_{\rm clad,real}^2)}$$
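
Because this model sets $u^2/V^2 = \Gamma$, $n_{\rm eff}^2$ interpolates linearly between $n_{\rm clad,real}^2$ ($\Gamma = 0$) and $n_{\rm core,real}^2$ ($\Gamma = 1$). A minimal sketch with an assumed function name:

```python
import math

def effective_index(n_core_real: float, n_clad_real: float, gamma: float) -> float:
    # gamma = u^2 / V^2 from the confinement-factor step.
    return math.sqrt(n_clad_real**2 + gamma * (n_core_real**2 - n_clad_real**2))

print(effective_index(1.4504, 1.4447, 0.81))
```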

**Cross-Coupling**  
$$
\mathrm{CrossCoupling} =
\begin{cases}
0, & V<2.405,\\
\frac{1}{2}\,\frac{V-2.405}{V}, & V\ge2.405
\end{cases}
$$
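
The cross-coupling metric is zero below cutoff and grows toward 0.5 as $V \to \infty$, since $(V - 2.405)/V \to 1$. A one-line sketch (hypothetical name):

```python
def cross_coupling(v: float) -> float:
    # Zero for single-mode guides, rising toward 0.5 for strongly multimode ones.
    return 0.0 if v < 2.405 else 0.5 * (v - 2.405) / v

print(cross_coupling(2.0), cross_coupling(4.81), cross_coupling(1e6))
```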

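**Polarization Percentages**  
With $p_{\rm pol}$ the input `polarization` value, for $V < 2.405$:
$$
\mathrm{TE}\% = (1 - p_{\rm pol})\times100,\quad
\mathrm{TM}\% = p_{\rm pol}\times100
$$
For $V \ge 2.405$, the cross-coupling metric $C$ mixes the two polarizations: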
$$
\mathrm{TE}\% = \bigl((1-p_{\rm pol})(1-C) + 0.5\,C\bigr)\times100,\quad
\mathrm{TM}\% = \bigl(p_{\rm pol}(1-C) + 0.5\,C\bigr)\times100
$$
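
The multimode formula reduces to the single-mode one when $C = 0$, and in both cases $\mathrm{TE}\% + \mathrm{TM}\% = 100$. A sketch of the general case (function name assumed):

```python
def polarization_percentages(p_pol: float, c: float) -> tuple[float, float]:
    # c = 0 gives the single-mode split; c > 0 pulls both values toward 50 %.
    te = ((1 - p_pol) * (1 - c) + 0.5 * c) * 100
    tm = (p_pol * (1 - c) + 0.5 * c) * 100
    return te, tm

print(polarization_percentages(0.2, 0.0), polarization_percentages(0.2, 0.3))
```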

**Noise Injection**  
$$\eta \sim \mathcal{N}\bigl(0,(\frac{\text{noise}\%}{100})^2\bigr),\quad x' = x\,(1 + \eta)$$
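
A sketch of the multiplicative noise model with NumPy (function name and defaults assumed). With 5% noise, $\eta < -1$ is vanishingly rare, but multiplicative noise can still push small values below physical limits, which the clamping step then handles.

```python
import numpy as np

def inject_noise(x, noise_percent: float = 5.0, rng=None) -> np.ndarray:
    # x' = x * (1 + eta), eta ~ N(0, (noise_percent / 100)^2), drawn per element.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    eta = rng.normal(0.0, noise_percent / 100.0, size=x.shape)
    return x * (1.0 + eta)

noisy = inject_noise(np.ones(100_000), 5.0, np.random.default_rng(1))
print(noisy.mean(), noisy.std())
```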

**Clamping / Experimental Correction**  
$$
\mathcal{L}_{\rm prop} \ge 0.1,\;
\mathrm{IL}_{\rm dB},\;\mathrm{CL}_{\rm dB},\;\mathrm{scattering\_loss\_dB}\ge0,\;
P_{\rm out} \ge 10^{-20}\,\mathrm{W}
$$
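
The clamping bounds can be applied with `np.maximum`, which works on scalars or whole columns. A sketch assuming outputs are held in a dict keyed by the output-table names:

```python
import numpy as np

def clamp_outputs(out: dict) -> dict:
    # Lower bounds from the clamping rule; other keys pass through unchanged.
    clamped = dict(out)
    clamped["propagation_loss_dB"] = np.maximum(out["propagation_loss_dB"], 0.1)
    for key in ("insertion_loss_dB", "coupling_loss_dB", "scattering_loss_dB"):
        clamped[key] = np.maximum(out[key], 0.0)
    clamped["output_power"] = np.maximum(out["output_power"], 1e-20)
    return clamped

row = {"propagation_loss_dB": 0.02, "insertion_loss_dB": -0.3,
       "coupling_loss_dB": -0.3, "scattering_loss_dB": 0.5, "output_power": 0.0}
print(clamp_outputs(row))
```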

## ⚙️ Data Generation Procedure

1. **Load experimental data**  
   Read propagation losses and MFDs from the literature; compute mean & std.  
2. **Sample inputs & compute physics**  
   Uniformly sample inputs; compute parameters $V$, $w$, $\Gamma$, loss coefficients, $n_{\rm eff}$, polarization, etc.  
3. **Inject noise**  
   Multiply each computed output by $(1+\eta)$ with $\eta \sim \mathcal{N}(0, 0.05^2)$, i.e. 5% Gaussian noise.  
4. **Experimental correction**  
   Adjust loss & MFD distributions to match experimental statistics.  
5. **Persist to CSV**  
   Batch-write 1,000 samples at a time into `final_realistic_synthetic_dataset.csv`.
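
Step 5 can be sketched with the standard-library `csv` module: rows are streamed from a generator and written in chunks of 1,000, so memory stays flat for 50,000 samples. `write_in_batches` and the demo columns are hypothetical; the real script's writer may differ.

```python
import csv
import os
import tempfile
from typing import Iterable, Mapping

def write_in_batches(rows: Iterable[Mapping], path: str, fieldnames: list[str],
                     batch_size: int = 1000) -> int:
    """Write rows to CSV in chunks of `batch_size`; returns the number of batches."""
    n_batches = 0
    batch: list[Mapping] = []
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                writer.writerows(batch)
                f.flush()  # push each completed batch to disk
                n_batches += 1
                batch.clear()
        if batch:  # final partial batch
            writer.writerows(batch)
            n_batches += 1
    return n_batches

# Demo on 2,500 dummy rows written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "final_realistic_synthetic_dataset.csv")
    rows = ({"sample_id": i, "V_parameter": 1.0 + i / 1000} for i in range(2500))
    n_batches = write_in_batches(rows, path, ["sample_id", "V_parameter"])
    with open(path, newline="") as f:
        n_rows = sum(1 for _ in csv.DictReader(f))
print(n_batches, n_rows)
```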

**Usage example** (shell; in a Jupyter cell, prefix the commands with `!` and use `%cd` instead of `cd`):

```bash
git clone https://github.com/yourusername/waveguide-dataset.git
cd waveguide-dataset
python generate_waveguide_dataset.py
# Output: final_realistic_synthetic_dataset.csv
```

## 🔧 Parameter Definitions in Equations

Below is a brief description of each symbol used in the key equations:

- **$a$**: core radius (m)  
- **$\lambda$**: operating wavelength (m)  
- **$n_{\rm core,real}$, $n_{\rm clad,real}$**: real parts of the core and cladding refractive indices  
- **$V$**: normalized frequency (unitless)  
- **$w$**: Gaussian mode-field radius (m)  
- **MFD**: mode field diameter, $2w$ (m)  
- **$u$**: eigenvalue parameter (unitless)  
- **$\Gamma$**: mode confinement factor (unitless)  
- **$\alpha_{\rm core}$, $\alpha_{\rm clad}$**: intrinsic loss coefficients of core and cladding (m⁻¹)  
- **$\alpha_{\rm scatt,bulk}$, $\alpha_{\rm scatt,surf}$**: bulk and surface scattering coefficients (m⁻¹)  
- **$\alpha_{\rm eff}$**: effective intrinsic attenuation (m⁻¹)  
- **$\alpha_{\rm total}$**: total attenuation (m⁻¹)  
- **$P_{\rm in}$, $P_{\rm out}$**: input and output optical power (W)  
- **$L$**: waveguide length (m)  
- **$L_{\rm corr}$**: roughness correlation length (m)  
- **$p$**: photoelastic coefficient (unitless)  
- **$\Delta\rho/\rho$**: density variation ratio (unitless)  
- **$\sigma_{\rm rms}$**: RMS surface roughness (m)  
- **$w_{\rm in}$**: input beam waist (m)  
- **$\Delta x$**: lateral misalignment (m)  
- **$T_{\rm nom}$**: nominal field (amplitude) overlap between input beam and guided mode (unitless); its square is the power coupling, hence the factor 20 in $\mathrm{IL}_{\rm dB}$  
- **$p_{\rm pol}$**: input polarization value (0 = TE, 1 = TM)  
- **$C$**: cross-coupling metric (unitless)  
- **$\eta$**: relative Gaussian noise factor (unitless)  
- **$x'$**: noisy version of a computed output  


## 📊 Experimental Data Reference

| Glass_type | clad_index | index_contrast | core_radius_um | clad_radius_um | depth_um | length_cm | wavelength_nm | Fresnel_dB | propagation_loss_dB/cm | insertion_loss_dB | mode_mismatch_loss_dB | scattering_loss_dB | coupling_loss_dB | mode_field_diameter_um | single_mode | Paper |
|:--|--:|:--|--:|--:|--:|--:|--:|--:|:--|--:|--:|--:|--:|--:|:--|:--|
| fused silica glass | | | | | 75 | | 633 | 0.02 | 1.1 | 2 | 1.84 | 0.07 | | 7 | y | Single- and multi-scan femtosecond laser writing for selective chemical etching of cross section patternable glass micro-channels (Applied Physics A) |
| SF57 | | 1e-4 | | | | | 633 | | 1 | | | | | 6 | y | Femtosecond laser writing of optical waveguides with controllable core size in high refractive index glass (Applied Physics A) |
| Foturan glass | | ~1.7e-3 | | | 150 | 6.5 | 632.8 | 0.36 | 0.8 | | | | 3.1 | 3.1 | y | Optical waveguide writing inside Foturan glass with femtosecond laser pulses (Applied Physics A) |
| silica glass | | ~2.47e-3 | | | 2100 | 0.5 | 632.8 | 0.36 | 0.56 | | | | 1.56 | 1.56 | | Influence of focusing depth on the microfabrication of waveguides inside silica glass by femtosecond laser direct writing (Applied Physics A) |
| fused silica | 1.46 | 1.0e-2 | 1 | | 150 | 5 | 1550 | | ~0.9 | | | | | 10 | y | Waveguide writing in fused silica with a femtosecond fiber laser at 522 nm and 1 MHz repetition rate |
| chalcogenide glass | | 4.5e-3 | | | | | 633 | | 1.47 | | | | | | y | Fabrication and characterization of femtosecond laser written waveguides in chalcogenide glass (Applied Physics Letters, AIP Publishing) |
| Corning 2947B | | 7e-4 | 1.51 | | | 0.5 | 633 | | 0.7 | | | | | | y | Low-repetition rate femtosecond laser writing of optical waveguides in water-white glass slides |
| silica glass | 1.499 | 1.99e-3 | 4 | | | 1.5 | 800 | | | | | | | | y | Photowritten optical waveguides in various glasses with ultrashort pulse laser (Applied Physics Letters, AIP Publishing) |
| ULE® glass | | | | | 100 | 2.1 | 1550 | | 4.2 | | | | 0.2 | | y | Study on fs-laser machining of optical waveguides and cavities in ULE® glass (IOPscience) |
| fused silica | | 1.0e-2 | | | 150 | | 1550 | | 1 | | | | 1.4 | 10 | y | Waveguide writing in fused silica with a femtosecond fiber laser at 522 nm and 1 MHz repetition rate |
| Ge-doped silica | | 3.5e-2 | 16 | | | | | | 0.1 | | | | | | y | Writing waveguides and gratings in silica and related materials by a femtosecond laser (ScienceDirect) |
| La-rich phosphate glass | | 1.5e-2 | | | 100 | | 1620 | | 0.2 | 0.1 | | | | 10 | y | Ion migration assisted inscription of high refractive index contrast waveguides by femtosecond laser pulses in phosphate glass |
| Ag-doped zinc phosphate | 1.59 | 2.5e-3 | | | | 0.7 | 630 | 0.43 | 1.2 | 5.22 | | | | | y | Direct laser writing of a new type of waveguides in silver containing glasses (Scientific Reports) |
| fused silica | 1.4095 | 7e-3 | 7.5 | | | | 3390 | | 1.3 | | | | | | y | 3D laser-written silica glass step-index high-contrast waveguides for the 3.5 μm mid-infrared range |
| La-doped borate glasses | | 1.0e-2 | | | | | 1534 | | 1.6 | | | | | | y | Femtosecond laser writing of photonic devices in borate glasses compositionally designed to be laser writable |
| silica glass | | | | | | 1 | 1550 | | 0.07 | 1.07 | | | | | y | Effectively writing low propagation and bend loss waveguides in the silica glass by using a femtosecond laser |
| silica glass | | 2.3e-2 | 7.5 | | | | 1030 | 2.52 | 3.45 | | | | | 12 | y | High slope efficiency and high refractive index change in direct-written Yb-doped waveguide lasers with depressed claddings |
| Corning 2947B | 1.51 | | | | 50 | | 633 | | 1.6 | | | | | | y | Low-repetition rate femtosecond laser writing of optical waveguides in water-white glass slides |
| Ag NPs in tungsten lead-pyrophosphate glass | | | | | | 0.2 | 632.8 | | 1.4 | | | | 9 | | | Metallic nanoparticles grown in the core of femtosecond laser micromachined waveguides (Journal of Applied Physics, AIP Publishing) |

*Data sourced from the literature review spreadsheet. Blank cells mark values missing from the spreadsheet.*
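
Step 1 of the procedure computes the mean and standard deviation of these experimental losses. The correction method in step 4 is not specified here, so the sketch below assumes simple moment matching (z-score rescaling, population std) and that synthetic losses are first converted to dB/cm to match the table's units. The values are copied from the `propagation_loss_dB/cm` column ("~0.9" taken as 0.9; the row without a value omitted); `EXP_LOSS_DB_PER_CM` and `match_moments` are hypothetical names. Because the spread is large relative to the mean, rescaled values can go negative, so clamping must still follow.

```python
import numpy as np

# propagation_loss_dB/cm values from the table above (18 reported values).
EXP_LOSS_DB_PER_CM = np.array([1.1, 1, 0.8, 0.56, 0.9, 1.47, 0.7, 4.2, 1,
                               0.1, 0.2, 1.2, 1.3, 1.6, 0.07, 3.45, 1.6, 1.4])

def match_moments(x, target_mean: float, target_std: float) -> np.ndarray:
    # Standardize x, then rescale so its mean and std equal the targets.
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return target_mean + target_std * z

# Demo with placeholder synthetic per-cm losses (assumed lognormal, not real output).
rng = np.random.default_rng(0)
synthetic_db_per_cm = rng.lognormal(mean=0.0, sigma=0.5, size=50_000)
corrected = match_moments(synthetic_db_per_cm,
                          EXP_LOSS_DB_PER_CM.mean(), EXP_LOSS_DB_PER_CM.std())
print(EXP_LOSS_DB_PER_CM.mean(), EXP_LOSS_DB_PER_CM.std())
```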