Enforcing limits on the internal memory needed by monitors #1273

Merged: 2 commits into pre/2.5 on Nov 28, 2023

Conversation

momchil-flex
Collaborator

Currently, monitors that produce a small amount of data but may need a large amount of memory internally are not really validated to prevent OOM on our servers. I was thinking of imposing limits like a max number of freqs, or the product num_freqs x num_points x num_modes for mode monitors, etc., but I realized that what probably makes the most sense is restricting the data size of the associated field monitor, which is always needed internally.
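The idea can be sketched as follows. This is a minimal illustration, not the actual tidy3d implementation: the function names, the size formula (6 field components, 8 bytes per value), and the limit value are all assumptions.

```python
MAX_INTERNAL_GB = 10.0  # illustrative limit, not the actual server value


def field_monitor_size_bytes(num_cells: int, num_freqs: int,
                             num_components: int = 6,
                             bytes_per_value: int = 8) -> int:
    """Rough size of the frequency-domain field data stored internally."""
    return num_cells * num_freqs * num_components * bytes_per_value


def validate_internal_storage(num_cells: int, num_freqs: int) -> float:
    """Raise if the implied field-monitor storage exceeds the limit."""
    size_gb = field_monitor_size_bytes(num_cells, num_freqs) / 2**30
    if size_gb > MAX_INTERNAL_GB:
        raise ValueError(
            f"monitor needs ~{size_gb:.1f} GB internally, above the "
            f"{MAX_INTERNAL_GB} GB limit"
        )
    return size_gb
```

The point is that even a monitor returning a few numbers (e.g. a flux) implies full field storage internally, so the validator checks the implied field size rather than the returned data size.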

Collaborator

@tylerflex tylerflex left a comment


A couple of minor comments and suggestions, but overall looks good.


return data_size

def _monitor_data_size(self, monitor):

Can you add type annotations?

# internal storage also does not exceed the limit.
check_monitor_types = [AbstractModeMonitor, SurfaceIntegrationMonitor, DiffractionMonitor]
for monitor in self.monitors:
if not isinstance(monitor, tuple(check_monitor_types)):

Not necessary to convert to tuple here, I think.

if not isinstance(monitor, tuple(check_monitor_types)):
continue
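As context for the suggestion above: in standard Python, `isinstance` accepts a single type or a tuple of types (a plain list raises `TypeError`), so one way to drop the conversion inside the loop is to define the collection as a tuple up front. A minimal sketch with placeholder classes:

```python
# Placeholder classes standing in for the actual monitor types.
class AbstractModeMonitor: ...
class SurfaceIntegrationMonitor: ...
class DiffractionMonitor: ...
class FieldMonitor: ...

# A tuple can be passed to isinstance directly, no per-iteration conversion.
CHECK_MONITOR_TYPES = (
    AbstractModeMonitor,
    SurfaceIntegrationMonitor,
    DiffractionMonitor,
)

monitors = [DiffractionMonitor(), FieldMonitor()]
checked = [m for m in monitors if isinstance(m, CHECK_MONITOR_TYPES)]
```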

mnt_kwargs = {

I think a cleaner way to do this could be to add a property for monitors that returns the equivalent of the backend field monitor that would be used for that type of monitor (before processing) and then compute the storage size of that monitor here in the validator. What do you think? This might also be useful on the backend potentially or for debugging purposes

@momchil-flex
Collaborator Author

I think a cleaner way to do this could be to add a property for monitors that returns the equivalent of the backend field monitor that would be used for that type of monitor (before processing) and then compute the storage size of that monitor here in the validator. What do you think? This might also be useful on the backend potentially or for debugging purposes

So I was thinking about doing it like this initially, but it doesn't actually make that much sense; it doesn't work exactly like that on the backend. For example, surface integration monitors do not actually add a FieldMonitor that is then post-processed, while ModeMonitor-s do. Also, even disregarding that, it wasn't easy for me to come up with a nicely unified way to add this that would not require the property to be overwritten in multiple classes.

I've modified things to just introduce a server storage method. What do you think? I also added a limit on the maximum data that a mode solver call can produce, because more than 20 GB is likely to error currently. Finally, I realized the previous implementation was not right for TimeMonitor-s (essentially FluxTimeMonitor is the only one affected), because it was creating a FieldTimeMonitor with the same number of time steps; in reality, only the fields at a single time step are stored internally at any given time (which also goes to the above point that there's no direct correspondence to a FieldMonitor or FieldTimeMonitor).
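The per-monitor storage method described above might be sketched like this. The class and constant names are hypothetical, the 6-component/8-byte formula is an assumption, and the real tidy3d `_storage_size_solver` signature may differ; the point is only that frequency-domain monitors scale with the number of frequencies, while a time-domain flux monitor holds just one time step internally regardless of `tmesh` length.

```python
BYTES_PER_COMPLEX = 8  # assuming single-precision complex field values


class FreqMonitorSketch:
    """Frequency-domain monitor: fields accumulated for every frequency."""

    def __init__(self, num_freqs: int):
        self.num_freqs = num_freqs

    def _storage_size_solver(self, num_cells: int, tmesh: list) -> int:
        # Internal storage scales with frequencies, not with time steps.
        return 6 * num_cells * self.num_freqs * BYTES_PER_COMPLEX


class FluxTimeMonitorSketch:
    """Time-domain flux monitor: only one time step held internally."""

    def _storage_size_solver(self, num_cells: int, tmesh: list) -> int:
        # Independent of len(tmesh): only the current step's fields exist.
        return 6 * num_cells * BYTES_PER_COMPLEX
```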

@tomflexcompute
Contributor

Can the current limit ensure OOM never occurs?

@momchil-flex
Collaborator Author

Can the current limit ensure OOM never occurs?

Unfortunately, no. There are too many ways in which it can happen, and we need more work on server-side resource allocation too. But this will prevent some cases that would otherwise almost certainly have gone OOM.

Collaborator

@tylerflex tylerflex left a comment


One minor comment, but otherwise looks good.

for monitor in self.monitors:
num_cells = self._monitor_num_cells(monitor)
# intermediate storage needed
internal_data_gb = monitor._storage_size_solver(num_cells=num_cells, tmesh=self.tmesh)

It would be a bit easier to understand if we named the variables like:

internal_data_bytes = monitor._storage_size_solver(num_cells=num_cells, tmesh=self.tmesh)
internal_data_gb = internal_data_bytes / 2**30

Also in the other instance where the 2 ** 30 is divided out.

@momchil-flex momchil-flex merged commit 99248a8 into pre/2.5 Nov 28, 2023
14 checks passed
@momchil-flex momchil-flex deleted the momchil/monitor_limits branch November 28, 2023 22:59