Unstructured grid datasets #1251
Conversation
Thanks @dbochkov-flexcompute, looks good to me. A few minor comments, but mostly I'm not sure about adding `vtk` as a basic requirement vs. an optional one. Might want to check with MC about the size of the requirement. Also slightly worried there could be some system-dependent installation instructions, so let's just make sure that's not the case before adding it.
requirements/basic.txt (outdated):

```diff
@@ -12,3 +12,4 @@ pydantic==2.*
 PyYAML
 dask
 toml
+vtk
```
How large is `vtk`? And is it relatively easy to install? You might need to check with MC to make sure that it's small enough, because I think we were trying to make the `basic` requirements as minimal as possible (at least this was true a few months ago, maybe things have changed).

If it's too large, or difficult to install on some systems, we might need to treat it as an optional requirement like `trimesh`.
The installation package for vtk is about 100 MB. It doesn't seem to have any installation issues (just `pip install vtk`) on different systems, which is probably not surprising since it's a very mature package (especially its C++ version). But yeah, I think it probably makes sense to treat it the same way as `trimesh`, especially since it would be required only when working with the heat solver, or in rare cases when the user wants to import unstructured data from other solvers.
The idea of the `basic` requirements is that you should be able to load and validate simulations, even if you can't do anything else. Generally, I do think it would be best if we can encapsulate vtk imports in a `VTK_AVAILABLE` kind of handling, similar to trimesh. Do you think it would be possible to load simulations with unstructured data if vtk is not installed? Maybe some validators would have to be skipped if not `VTK_AVAILABLE`?
Yeah, that should be possible. VTK is really needed only for performing operations like slicing/clipping/interpolating.
Switched `vtk` to be an optional package. Similar to `trimesh`, functions that require `vtk` will raise an error if it's not available. And, similar to `trimesh`, it is imported (if available) in `types.py`, so the rest of the code uses `from .types import vtk` and checks `if vtk is None:`. Also managed to add a test that checks that if `vtk` is not available, instances can still be created and validated but most of the functionality is disabled. I did it in the following way:

- `test_datasets.py` contains tests that check behavior both when `vtk` is and is not available, via `if vtk is None:`. However, running `pytest test_datasets.py` will check only one side of that, depending on whether `vtk` is installed in the Python environment.
- So, there is a separate file `_test_datasets_no_vtk.py` that imports the same tests but prepends them with a monkeypatch fixture that mocks `vtk` being unavailable. Running `pytest _test_datasets_no_vtk.py` then checks that everything works as expected when `vtk` is not installed. It has to be run as a separate pytest session, otherwise `vtk` gets imported by other tests and remains loaded (hence the changes to `test_local.py` and `tox.ini`).
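For reference, a minimal sketch of the optional-import pattern and test fixture described above. The module path `tidy3d.components.types` and the fixture/function names are illustrative assumptions, not the exact tidy3d code:

```python
# types.py: attempt the import once, fall back to None
try:
    import vtk
except ImportError:
    vtk = None


# elsewhere in the code (after `from .types import vtk`):
def require_vtk():
    """Raise a helpful error when a vtk-only operation is requested."""
    if vtk is None:
        raise ImportError("This operation requires 'vtk'. Install it with 'pip install vtk'.")


# _test_datasets_no_vtk.py: pretend vtk is unavailable for the whole session
import pytest
import tidy3d.components.types as types  # assumed module path


@pytest.fixture(autouse=True)
def mock_no_vtk(monkeypatch):
    # note: modules that ran `from .types import vtk` at import time keep their own
    # reference, which is why this has to run in a separate pytest session
    monkeypatch.setattr(types, "vtk", None)
```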
```python
_dims = ("index", "axis")


class CellDataArray(DataArray):
```
Can you explain this a bit? So `CellDataArray[i, j]` is the `j`-th vertex of cell `i`? And what does it store: an integer index into another cell, or into the point data array? When I worked on unstructured meshes I would usually have a 3D array with dims `cell_index, vertex_index, axis`. It seems that maybe this kind of data structure could be constructed out of this cell data array + the point data array?
Yes, exactly, it stores the indices of the points that constitute the cell. And the representation that you mentioned can definitely be constructed from `PointDataArray` + `CellDataArray`.
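(As a quick illustration, a sketch of that construction with plain numpy, assuming `points` has dims `(index, axis)` and `cells` has dims `(cell_index, vertex_index)` as in this PR:)

```python
import numpy as np

# point coordinates, shape (n_points, n_axes)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# cell connectivity, shape (n_cells, n_vertices); entries index into `points`
cells = np.array([[0, 1, 2], [1, 2, 3]])

# numpy fancy indexing yields the (cell_index, vertex_index, axis) representation
cell_vertices = points[cells]
assert cell_vertices.shape == (2, 3, 2)
```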
I might be misunderstanding the intended usage, but in the notebook you attached, in the first cell, there is

```python
tri_grid_cells = td.CellDataArray(
    [[0, 1, 2], [1, 2, 3]],
    coords=dict(cell_index=np.arange(2), vertex_index=np.arange(3)),
)
```

but under what circumstance will the `vertex_index` not be `np.arange(3)` for a triangle? I guess it would be different only if the cell represents a triangle vs. a tetrahedron?
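(For illustration, a hedged sketch of that triangle-vs-tetrahedron difference, following the same coords convention:)

```python
import numpy as np
import tidy3d as td

# triangles: vertex_index is always np.arange(3)
tri_cells = td.CellDataArray(
    [[0, 1, 2], [1, 2, 3]],
    coords=dict(cell_index=np.arange(2), vertex_index=np.arange(3)),
)

# tetrahedra: the only structural change is four vertices per cell
tet_cells = td.CellDataArray(
    [[0, 1, 2, 3]],
    coords=dict(cell_index=np.arange(1), vertex_index=np.arange(4)),
)
```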
Another open-ended discussion that I wanted to bring up: currently I implemented these datasets in such a way that they store only one data array, similarly to
so as you could see both
but then the issue is how to pass them to

Yeah, no good solution comes to mind, at least none that would not involve restructuring of the
and do some validations that those are provided when

Would it make sense to store it the way you suggest (i.e.

The problem, I believe, is that you then want to e.g. pass this CustomMedium to a simulation, and send that to our servers, for example. So it really needs to be a modification to the schema.
(force-pushed from cc821fb to 144c9d1)
I guess another workaround for eliminating repeated grid storage is to change our workflow for sampling material with perturbation models. Specifically, instead of explicitly converting all

It would also help with lowering storage from overlapping structures. For example, in this notebook https://docs.flexcompute.com/projects/tidy3d/en/v2.5.0rc2/notebooks/ThermallyTunedRingResonator.html the custom medium associated with the ring structure carries values in the entire bounding box of the ring, while in the interior of the ring only values of the background medium are relevant.
Looks like this will be super useful! It's really neat that you've added those slicing features, I think that'll be a gift that keeps on giving in the future.

A couple of general comments here, and some minor ones below:

- If we're using vtk, do we still need Gmsh at all? I think vtk also has some basic mesh generation abilities, and the mesh doesn't need to be super fancy for heat, if I understood correctly? What's more, one could then just use vtk to generate surface meshes for the rotations etc. that Lucas was working on, rather than the homemade mesher which he wanted to replace anyway. Might be worth checking if vtk can be used for all of these, and then not worry about license stuff with Gmsh? This is a "think about in the future" comment, not directly related to this PR.
- Are there a lot of situations where we expect someone to use an `UnstructuredGridDataset` without VTK? I'm asking because it seems to me a bit square-peg-in-round-hole-ish to "force" the unstructured format stored internally in VTK into an xarray format which isn't really designed for this. Why not just make the `UnstructuredGridDataset` a wrapper around VTK that directly stores and accesses data in VTK's native internal representation? For the custom medium discussion, I guess the methods to and from `SpatialDataArray` are still needed, so maybe xarray is the way to go for our specific needs, in which case feel free to ignore this comment. But thinking from the perspective of "what's the best way to store unstructured data in general", it seems strange to create our own xarray representation and then go back and forth from VTK representations, when VTK is already designed to store unstructured data.
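(To make the "back and forth" concrete, here is a rough sketch of what converting this PR's array-based representation into VTK's native `vtkUnstructuredGrid` involves. The VTK calls are from the public VTK Python API; the array layout follows this PR's convention, and the function name is illustrative:)

```python
import numpy as np
import vtk
from vtk.util.numpy_support import numpy_to_vtk


def to_vtk_grid(points_xyz: np.ndarray, cells: np.ndarray) -> vtk.vtkUnstructuredGrid:
    """Build a vtkUnstructuredGrid from (n_points, 3) coordinates
    and (n_cells, 4) tetrahedral connectivity."""
    pts = vtk.vtkPoints()
    pts.SetData(numpy_to_vtk(np.asarray(points_xyz, dtype=float), deep=True))
    grid = vtk.vtkUnstructuredGrid()
    grid.SetPoints(pts)
    for cell in cells:
        grid.InsertNextCell(vtk.VTK_TETRA, 4, [int(i) for i in cell])
    return grid
```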
```diff
@@ -571,5 +614,8 @@ class ChargeDataArray(DataArray):
     TriangleMeshDataArray,
     HeatDataArray,
     ChargeDataArray,
+    PointDataArray,
```
One general concern I had was how to make sure that the definitions of one data array are consistent with its parent, e.g. the `CellDataArray` shouldn't have indices outside the associated `PointDataArray`. But I see that you have several validators in the dataset class which should take care of this. Still, is it possible that someone messes with the `coords` etc. of an existing `TriangularGridDataset` to yield a new one, but skips the validation? I guess even if this is possible, we don't need to worry about it, because if someone does try this, they're playing with fire anyway?
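(Something along those lines — a hypothetical sketch of such a consistency validator, not the actual code in this PR; the pydantic import path and field names are assumptions:)

```python
from typing import Any

import pydantic.v1 as pd  # assumed import path; tidy3d 2.x uses the pydantic v1 API


class TriangularGridDatasetSketch(pd.BaseModel):  # illustrative, not the real class
    points: Any  # PointDataArray with dims ("index", "axis")
    cells: Any   # CellDataArray with dims ("cell_index", "vertex_index")

    @pd.validator("cells", always=True)
    def _cell_indices_in_bounds(cls, val, values):
        """Reject cell connectivity that refers to nonexistent points."""
        points = values.get("points")
        if points is not None and int(val.values.max()) >= points.sizes["index"]:
            raise ValueError("'cells' contains indices outside the associated 'points' array.")
        return val
```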
tidy3d/components/data/dataset.py (outdated):

```python
vtk_ind_size = vtkIdTypeArray().GetDataTypeSize()
# using val.astype(np.int32/64) directly causes issues when dataarray are later checked ==
if vtk_ind_size == 4:
    return CellDataArray(val.data.astype(np.int32, copy=False), coords=val.coords)
```
Not sure I understood this part: what would happen if there are so many points that the indices exceed what can be represented by int32? I guess a related question is: what decides the type used by vtk?
I assumed that vtk uses int64 for indices, but our GitHub tests showed that on Windows, for some reason, int32 is used. Perhaps it's related to hardware and system configuration? I wonder if you can control that when compiling vtk from source, but I'm not sure there is any control with `pip install vtk`.
tidy3d/components/data/dataset.py (outdated):

```python
if vtk is None:
    return offsets

vtk_ind_size = vtkIdTypeArray().GetDataTypeSize()
```
Looks like this is known at import time, and the check is repeated a couple of times. Does it make sense instead to define a `VTK_INT` or something globally as int32 or int64?
Good point, will add that.
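(For instance, something like this at module import time — a sketch reusing the same `vtkIdTypeArray` call already in the PR; the constant name is illustrative:)

```python
import numpy as np
from vtk import vtkIdTypeArray

# vtkIdType is 64-bit in most builds, but 32-bit in some (as seen on the Windows CI),
# so resolve the matching numpy dtype once, at import time
VTK_ID_DTYPE = np.int32 if vtkIdTypeArray().GetDataTypeSize() == 4 else np.int64
```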
Hm, I don't think vtk really has any meshing capabilities. Maybe only surface discretization of primitives and boolean operations, but no volumetric mesh generation.

Yeah, I thought about it too, but still chose to go the xarray route, since we already have robust HDF5 reading/writing for that.

Oh ok, that's surprising!

Ok, yeah, those points make sense given our specific needs.
(force-pushed from 38920d6 to 91a670d)
This is ready for final review. Changes since initial review:

Note that custom mediums do not yet support unstructured datasets; that's for another PR. So, if only unstructured data for temperature is available, one needs to first interpolate it into a `SpatialDataArray`.

Usage is demonstrated in these updated notebooks:
Looks good to me! Just a few minor comments.

One general comment: do you think it would be helpful to have a comment or docstring somewhere, or somehow include in the docs, exactly what functionality we still have without vtk? I think having basically a list of things that make use of vtk will help a user decide beforehand whether they need it or not. Could be in the FAQ section of the docs, or maybe in the docstring of one of the unstructured mesh base classes?
tidy3d/components/data/data_array.py (outdated):

```python
coords = list(self.coords.values())
data = np.array(self.data)

if center == coords[axis].data[0]:
```
Do we need to do this check with some tolerance for floating-point issues? Or is that not necessary here?
It's not critical here (we can always have two points very close to each other), but you're right, it's better to have some tolerance here. Fixed.
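(E.g., a minimal sketch of the tolerance-based check with numpy; the tolerance value and helper name are illustrative assumptions:)

```python
import numpy as np

FP_EPS = 1e-10  # illustrative; the actual tolerance constant in tidy3d may differ


def at_first_coord(center: float, first_coord: float) -> bool:
    """Floating-point-tolerant replacement for `center == coords[axis].data[0]`."""
    return bool(np.isclose(center, first_coord, rtol=FP_EPS, atol=FP_EPS))
```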
```python
shape = np.array(np.shape(data))
old_len = shape[axis]
shape[axis] = 2 * old_len - num_duplicates
```
What would happen if `center` is set in such a way that it falls "inside" the region where the original data is defined, i.e. there's an overlap between the original region and the reflected region? I guess realistically, in the intended usage of this method, this would never happen. But since this method is externally visible, I wonder if it's worth having a check to either error or somehow handle this?
Fixed. Also added frontend tests checking the `reflect` method.
```diff
@@ -95,7 +97,7 @@ def plot_field(
     Name of :class:`.TemperatureMonitorData` to plot.
 val : Literal['real', 'abs', 'abs^2'] = 'real'
     Which part of the field to plot.
-scale : Literal['lin', 'dB']
+scale : Literal['lin', 'log']
     Plot in linear or logarithmic (dB) scale.
```
Could you clarify whether it's dB or log, so that the type string is consistent with the explanation? I think saying `logarithmic (dB)` could be confusing, because dB is logarithmic with some factor applied. Might be good to just write out the expression to avoid any confusion, e.g. `10 * log(power)`?
Yeah, just a leftover, thanks for catching it.
```python
    field_name = "|T|², K²"
else:
    field_name = "T, K"

if scale == "log":
    field_data = np.log10(np.abs(field_data))
```
Oh, it's pure log, so I guess the string above can be updated to remove the "(dB)".
tidy3d/components/heat/monitor.py (outdated):

```python
conformal: bool = pd.Field(
    False,
    title="Conformal Monitor Meshing",
    description="Mesh monitor domain exactly. Note that this option affects the computational "
```
Could you elaborate, for the user's sake, on what it means to mesh the domain exactly? So if we have `unstructured=True` and `conformal=False`, I guess it would take the mesh based on the heat domain's mesh, and if `conformal=True` then the heat mesh is adjusted based on the monitor? But if `unstructured=False`, does `conformal` have any effect?
Changed the description to this:

```python
description="If ``True`` the heat simulation mesh will conform to the monitor's geometry. "
"While this can be set for both Cartesian and unstructured monitors, it bears higher "
"significance for the latter ones. Effectively, setting ``conformal = True`` for "
"unstructured monitors (``unstructured = True``) ensures that returned temperature values "
"will not be obtained by interpolation during postprocessing but rather directly "
"transferred from the computational grid."
```
(force-pushed from 91a670d to fdc9838)
Added this to the unstructured datasets' docstrings:
(force-pushed from fdc9838 to 7216050)
This PR introduces two new datasets, `TriangularGridDataset` and `TetrahedralGridDataset`, for 2d and 3d unstructured data. I think the easiest way to understand the basic capabilities and purposes is to go through this demonstration notebook: https://github.com/flexcompute-readthedocs/tidy3d-docs/blob/daniil/unstructured/docs/source/notebooks/UnstructuredData.ipynb. Overall I tried to follow the functionality of xarray data arrays, but obviously I could do that so far only on a basic level. Some specific points I wanted to discuss:

- `TriangleMeshDataArray` and `TriangleMeshDataset`, which are used for STL import: I guess if we strictly follow the convention that `Mesh` in a class name means surface mesh, while `Grid` means discretization grid, then it should be ok.
- `PointDataArray`, `CellDataArray`, and `IndexedDataArray` is somewhat clunky. They are basically 2d/1d indexed arrays, that is, coords are typically specified using `np.arange(n)`. I could use our numpy-based arrays, but I believe those are stored in a json string?

Next steps would be (most likely as a separate PR):
To review those next steps I will probably involve @weiliangjin2021.

Tagging @momchil-flex to keep in the loop.
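For reference, a minimal end-to-end sketch of how the indexed arrays discussed above fit together, based on the notebook snippet quoted earlier (the exact `TriangularGridDataset` field names, such as `normal_axis` and `normal_pos`, are assumptions):

```python
import numpy as np
import tidy3d as td

# vertices of two triangles forming a unit square, dims (index, axis)
points = td.PointDataArray(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    coords=dict(index=np.arange(4), axis=np.arange(2)),
)
# connectivity, dims (cell_index, vertex_index); entries index into `points`
cells = td.CellDataArray(
    [[0, 1, 2], [1, 2, 3]],
    coords=dict(cell_index=np.arange(2), vertex_index=np.arange(3)),
)
# one scalar value per point
values = td.IndexedDataArray([0.0, 1.0, 2.0, 3.0], coords=dict(index=np.arange(4)))

tri_grid = td.TriangularGridDataset(
    normal_axis=2,   # the 2d grid lies in a plane normal to z (assumed field name)
    normal_pos=0.0,  # position of that plane along the normal (assumed field name)
    points=points,
    cells=cells,
    values=values,
)
```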