Basic folder and data structures and pyvista plotting #12
So is there a reason for this to be a DataFrame rather than a GeoDataFrame if it is tracking coordinates?
GeoDataFrames from GeoPandas only support 2D data, AFAIK. Is there 3D support?
Ah, yes, right!
Thanks for the work @banesullivan, I really like it. The only doubt I have at the moment is how many dataframes we really need.
As I see it we have:
- points df: 0D properties including XYZ
- cell data df: Anything but 0D
- df_tri: connects the points df elements to create higher dimensional objects that have cell data
Am I missing some construct here?
I think if we are able to construct the classes only with those - well defined - three frames, it would make the design really clean since any higher level object would be some combination of them!
I am getting very pumped with this project :D
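A minimal sketch of the three-frame idea described above, assuming a small triangulated surface. The column names `a`, `b`, `c`, `porosity` and `lithology` are hypothetical and only serve the illustration; they are not taken from the PR.

```python
import pandas as pd

# Points frame: one row per vertex, with XYZ plus any per-point (0D) properties.
points = pd.DataFrame(
    {
        "X": [0.0, 1.0, 0.0, 1.0],
        "Y": [0.0, 0.0, 1.0, 1.0],
        "Z": [0.0, 0.0, 0.0, 0.5],
        "porosity": [0.10, 0.12, 0.08, 0.15],  # hypothetical point property
    }
)

# Connectivity frame (df_tri): each row lists the point indices of one triangle.
df_tri = pd.DataFrame({"a": [0, 1], "b": [1, 3], "c": [2, 2]})

# Cell data frame: one row per triangle, aligned with df_tri by index.
cell_data = pd.DataFrame({"lithology": ["sand", "shale"]})  # hypothetical

# Combining the frames: look up vertex coordinates per triangle and average them.
xyz = points[["X", "Y", "Z"]].to_numpy()
centroids = xyz[df_tri.to_numpy()].mean(axis=1)  # shape: (n_triangles, 3)
```

Any higher-level object (a surface, a well path, a volume) would then be some combination of these frames, which is the point being made in the comment.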
subsurface/geometry/mesh.py
"""
self._df_points = pd.DataFrame(columns=['X', 'Y', 'Z'])
self._df_point_data = pd.DataFrame()
Is it worth having two dataframes, or should we just make one and expose views of it:
self._df = pd.Data...
self._df_points is self._df[['X', 'Y', 'Z']]
self._df_point_data is self._df[the rest]
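One way the single-frame suggestion could look, as a sketch only. The class name `PointSet` and the `gamma` column are hypothetical. Note that pandas column selection generally returns a new object rather than a guaranteed view, so the properties below are read-only projections of the one underlying frame, not writable views.

```python
import pandas as pd


class PointSet:
    """Hypothetical container keeping coordinates and point data in one frame."""

    COORDS = ["X", "Y", "Z"]

    def __init__(self, df):
        missing = [c for c in self.COORDS if c not in df.columns]
        if missing:
            raise ValueError(f"missing coordinate columns: {missing}")
        self._df = df

    @property
    def points(self):
        # Only the coordinate columns.
        return self._df[self.COORDS]

    @property
    def point_data(self):
        # Everything except the coordinate columns ("the rest").
        return self._df.drop(columns=self.COORDS)


ps = PointSet(
    pd.DataFrame(
        {"X": [0.0, 1.0], "Y": [0.0, 2.0], "Z": [5.0, 6.0], "gamma": [80.0, 95.0]}
    )
)
```

With this layout, coordinates and attributes cannot drift out of sync, since both are derived from `self._df` on access.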
Hi @banesullivan, thanks for kicking off the data structure API by bringing in your valuable pv-experience. I agree that xarrays are predestined for structured grids and that a collection of pandas data frames (at least one for the nodes and one for the cells) is a good and flexible representation of general cell-based data sets. With regard to the latter: Do we need different classes depending on the cell type? I agree that common functionality could be shared within something like pyGIMLi for example can handle various cell types (hex, tets, triangular prisms, quadratic pyramids, etc.) but more importantly a mixture of those (as the shape functions for FEM calculations are automatically generated based on the cell type). Are there reasons against a more general object where the number of columns in the cell data frame represents the maximum number of nodes of a cell (let's say 6), and a tetrahedron would only use 4 of them, with the unused two being NaNs? Just thinking out loud here ;-) Cheers
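The NaN-padding idea above can be sketched as follows, assuming a mesh that mixes one tetrahedron and one triangular prism. The column names `n0` to `n5` are hypothetical. One side effect worth noting: NaN padding forces the node-index columns to a float dtype unless a nullable integer dtype is used.

```python
import numpy as np
import pandas as pd

# Node indices per cell; cells of different types have different lengths.
cells_raw = [
    [0, 1, 2, 3],        # tetrahedron: 4 nodes
    [1, 2, 3, 4, 5, 6],  # triangular prism: 6 nodes
]

# The widest cell type decides how many node columns the frame needs.
max_nodes = max(len(c) for c in cells_raw)
node_cols = [f"n{i}" for i in range(max_nodes)]  # hypothetical column names

# Pad shorter cells with NaN so every row has the same number of columns.
cells = pd.DataFrame(
    [c + [np.nan] * (max_nodes - len(c)) for c in cells_raw],
    columns=node_cols,
)

# The number of non-NaN entries per row recovers the node count of each cell.
n_nodes = cells[node_cols].notna().sum(axis=1)
```

The per-row node count is enough to tell tets from prisms here, but it would not distinguish cell types that share a node count, so a real implementation would probably need an explicit cell-type column as well.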
I like that idea @florian-wagner. I have been thinking since Saturday about how we can represent all geometries with the minimum number of (in this case) DataFrames, but with as many columns as we need, and I reached a similar idea:
Am I missing something, Florian? Hopefully I will get a week or two to work on this in July.
No, that's exactly what I had in mind. This would allow arbitrary geometries with point- and cell-based values in two compact pandas data frames. The first 8 columns in the cell dataframe would be reserved for node indices (not six as I wrote earlier...) and could be NaN if not used, or columns Int5-Int8 could even be dropped when a mesh only contains tets, for example. When all libraries could read and write such a format + VTK I/O + pyvista viz, I would already be a happy user of
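A sketch of the "reserve eight node columns, drop the unused ones" variant, for a mesh containing only tetrahedra. The names `Int1` to `Int8` extrapolate from the `Int5-Int8` mentioned above and are an assumption, as is the `resistivity` cell property. The pandas nullable `Int64` dtype is used so that missing node indices do not turn the columns into floats.

```python
import pandas as pd

# Eight reserved node-index columns (naming assumed from "Int5-Int8" above).
node_cols = [f"Int{i}" for i in range(1, 9)]

# A tets-only mesh fills just the first four node columns.
tets = pd.DataFrame([[0, 1, 2, 3], [1, 2, 3, 4]], columns=node_cols[:4])

# Expand to the full reserved layout; unused columns become missing values.
cells = tets.reindex(columns=node_cols).astype("Int64")
cells["resistivity"] = [10.0, 250.0]  # hypothetical cell-based value

# Drop reserved node columns that no cell in this mesh actually uses.
unused = [c for c in node_cols if cells[c].isna().all()]
compact = cells.drop(columns=unused)
```

Reading such a compact frame back into the full layout is then just another `reindex` over the eight reserved column names.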
- [CLN] Refactor modules to new terminology
- [CLN] Split requirements
- [ENH] Welly is optional
- [ENH] structured_elements.py and unstructured_elements.py have been updated
- [ENH] subpackage for visualization. pyvista code moved there
- [DOC] Improved definition of the interfaces subpackage
- [ENH] Pyvista plotting
I think it is time to merge this. Some of the ideas behind these changes: https://github.com/softwareunderground/subsurface/blob/mig_dev/sdd.md This is the summary of features:
Just a few comments. I guess we will discuss more on Monday.
- [DOC] added geoh5py to the sdd.md
Just opening this PR to show the diff
In summary, there is a new geometry module that defines the spatial data structures of subsurface data. These data structures draw much inspiration from PyVista, yet there is a fundamental difference we are implementing here: the idea of "meshes" and "grids".
There are two submodules for mesh and grid data structures. Mesh data structures will track all of their data/spatial references in pandas DataFrames, while grid data structures will track all of their data/spatial references in an xarray Dataset.
Meshes are any subsurface data where all of the points of that data need to be explicitly defined. Examples of data that would be in the mesh form include: point clouds, line sets (wells), triangulated surfaces, structured grids (e.g. discretize.CurviMesh or pyvista.StructuredGrid), and tetrahedralized volumes.
Grids are any subsurface data with implicitly defined points. Examples of these kinds of data include: rasters, rectilinear grids (think seismic volumes), etc. The idea here is that the entire dataset can be defined by a few parameters like origin, U/V direction (orientation), and cell spacings. These kinds of data make sense to track in xarray.
It's important to note that I am classifying structured grids as meshes here because all of the points of those types of data structures must be known and cannot be implicitly defined.
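To illustrate the mesh/grid distinction, here is a rough sketch under stated assumptions. The helper `grid_points` and all numeric values are hypothetical, the grid is axis-aligned (no U/V orientation), and plain NumPy stands in for xarray to keep the example dependency-light. A grid is reconstructed from origin, spacing and shape alone; once a single node is moved, the points can no longer be derived from those parameters and have to be stored explicitly, which is the mesh case.

```python
import numpy as np
import pandas as pd

# Grid: fully described by a few parameters (hypothetical values).
origin = np.array([100.0, 200.0, -50.0])
spacing = np.array([10.0, 10.0, 5.0])
shape = (3, 2, 2)  # number of points along X, Y, Z


def grid_points(origin, spacing, shape):
    """Generate the implicit point coordinates of an axis-aligned grid."""
    axes = [origin[i] + spacing[i] * np.arange(shape[i]) for i in range(3)]
    xx, yy, zz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])


implicit = grid_points(origin, spacing, shape)  # computed on demand, never stored

# Mesh: every point is stored explicitly, one row per point.
mesh_points = pd.DataFrame(implicit.copy(), columns=["X", "Y", "Z"])

# Warping one node (as in a structured/curvilinear grid) breaks the implicit
# description, so this dataset must be tracked as a mesh.
mesh_points.loc[0, "Z"] += 1.7
```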