Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]


### Fixed
- Minor gradient direction and normalization fixes for polyslab, field monitors, and diffraction monitors in autograd.

## [2.7.5] - 2024-10-16

### Added
Expand Down
8 changes: 4 additions & 4 deletions tidy3d/components/data/monitor_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -1037,7 +1037,7 @@ def to_adjoint_point_sources(self, fwidth: float) -> List[PointDipole]:

for freq0 in field_component.coords["f"]:
omega0 = 2 * np.pi * freq0
scaling_factor = 1 / (MU_0 * omega0)
scaling_factor = 33 / (MU_0 * omega0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that seems pretty arbitrary :D why 33?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the honest answer is because I tested the PointDipole sources against every structure combination and the gradients were consistently 33x smaller than numerical ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little worrying then. Have you tested different grid sizes at the monitor location? And the monitor being in different materials?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to test everything, I already had like a 10 x 4 grid of different combinations of things to test, but no I did not test grid size and medium. The grid size should be handled by using interp in the source, the medium could have an effect but I think it would be some scaling by the refractive index, the overall constant might still be the same. I can look into it some more if it would be useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm just trying to think about things that this "constant" might actually depend on, like the mesh step around the dipole for some reason, or the material epsilon.

Copy link
Collaborator Author

@tylerflex tylerflex Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's very possible the gradients depend on these things in some way. This PR resolve the gradient direction and corrects the overall normalization so that the un-normalized gradients match pretty well (typically under 10% unnormalized and around 1% normalized), but under my test conditions. So There's another dimension to explore which is how these norms change under different background permittivity and grid resolution.. which gets pretty complicated. I think for sake of practical optimization what we have here is a good improvement, but I'll do a few more tests today to see if I can find some low hanging fruit to improve it

Copy link
Collaborator Author

@tylerflex tylerflex Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested a few different Simulation.medium and also a few different grid sizes. The grad norms were constant across each, maybe a tiny bit of variation with dl, but didnt seem linear with dl and could be due to many things. Ultimately feel confident that these constants are good to give adj ~ numerical grads without normalization for a range of situations, and if not, we can revisit later. So I'll go ahead and merge this


forward_amp = self.get_amplitude(field_component.sel(f=freq0))

Expand Down Expand Up @@ -1076,7 +1076,7 @@ def to_adjoint_field_sources(self, fwidth: float) -> List[CustomCurrentSource]:
for name, field_component in self.field_components.items():
# get the VJP values at frequency and apply adjoint phase
field_component = field_component.sel(f=freq0)
values = -1j * field_component.values
values = 2 * -1j * field_component.values

# make source go backwards
if "H" in name:
Expand Down Expand Up @@ -2974,7 +2974,7 @@ def adjoint_source_amp(self, amp: DataArray, fwidth: float) -> PlaneWave:
theta_data, phi_data = self.angles
angle_sel_kwargs = dict(orders_x=int(order_x), orders_y=int(order_y), f=float(freq0))
angle_theta = float(theta_data.sel(**angle_sel_kwargs))
angle_phi = np.pi + float(phi_data.sel(**angle_sel_kwargs))
angle_phi = float(phi_data.sel(**angle_sel_kwargs))

# if the angle is nan, this amplitude is set to 0 in the fwd pass, so should skip adj
if np.isnan(angle_theta):
Expand All @@ -3000,7 +3000,7 @@ def adjoint_source_amp(self, amp: DataArray, fwidth: float) -> PlaneWave:
center=self.monitor.center,
source_time=GaussianPulse(
amplitude=abs(src_amp),
phase=np.pi + np.angle(src_amp),
phase=np.angle(src_amp),
freq0=freq0,
fwidth=fwidth,
),
Expand Down
2 changes: 1 addition & 1 deletion tidy3d/components/geometry/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -2433,7 +2433,7 @@ def derivative_face(
eps_xyz = [derivative_info.eps_data[f"eps_{dim}{dim}"] for dim in "xyz"]

# number of cells from the edge of data to register "inside" (index = num_cells_in - 1)
num_cells_in = 3
num_cells_in = 4

# if not enough data, just use best guess using eps in medium and simulation
needs_eps_approx = any(len(eps.coords[dim_normal]) <= num_cells_in for eps in eps_xyz)
Expand Down
12 changes: 5 additions & 7 deletions tidy3d/components/geometry/polyslab.py
Original file line number Diff line number Diff line change
Expand Up @@ -1425,7 +1425,7 @@ def compute_derivative_vertices(self, derivative_info: DerivativeInfo) -> Traced
vjps_edges = 0.0

# perform D-normal integral
contrib_D = delta_eps_inv * D_der_norm
contrib_D = -delta_eps_inv * D_der_norm
vjps_edges += contrib_D

# perform E-perpendicular integrals
Expand All @@ -1438,12 +1438,9 @@ def compute_derivative_vertices(self, derivative_info: DerivativeInfo) -> Traced
edge_areas = edge_lengths

# correction to edge area based on sidewall distance along slab axis
dim_axis = "xyz"[self.axis]
field_coords_axis = derivative_info.E_der_map[f"E{dim_axis}"].coords[dim_axis]
if len(field_coords_axis) > 1:
slab_height = abs(float(np.squeeze(np.diff(self.slab_bounds))))
if not np.isinf(slab_height):
edge_areas *= slab_height
slab_height = abs(float(np.squeeze(np.diff(self.slab_bounds))))
if not np.isinf(slab_height):
edge_areas *= slab_height

vjps_edges *= edge_areas

Expand All @@ -1452,6 +1449,7 @@ def compute_derivative_vertices(self, derivative_info: DerivativeInfo) -> Traced
vjps_edges_in_plane = vjps_edges.values.reshape((num_vertices, 1)) * normal_vectors_in_plane

vjps_vertices = vjps_edges_in_plane + np.roll(vjps_edges_in_plane, axis=0, shift=-1)
vjps_vertices /= 2.0 # each vertex is effected only 1/2 by each edge

# sign change if counter clockwise, because normal direction is flipped
if self.is_ccw:
Expand Down
Loading