Basic folder and data structures and pyvista plotting #12

Merged
merged 32 commits into from
Sep 28, 2020

Member

@banesullivan banesullivan commented Jun 13, 2020

Just opening this PR to show the diff

In summary, there is a new geometry module that defines the spatial data structures of subsurface data. These data structures draw much inspiration from PyVista yet there is a fundamental difference we are implementing here: the idea of "meshes" and "grids".

There are two submodules for mesh and grid data structures. Mesh data structures will track all of their data/spatial references in pandas DataFrames, while grid data structures will track all of their data/spatial references in an xarray Dataset.

Meshes are any subsurface data where all of the points of that data need to be explicitly defined. Examples of data that would be in the mesh form include: point clouds, line sets (wells), triangulated surfaces, structured grids (e.g. discretize.CurviMesh or pyvista.StructuredGrid), tetrahedralized volumes.
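
A minimal sketch of the mesh idea, assuming a small triangulated surface: every point is written out explicitly in one DataFrame, and a second DataFrame holds the connectivity. The column names `a`, `b`, `c` and the `porosity` attribute are hypothetical and are not taken from this PR.

```python
import pandas as pd

# Every vertex of a mesh is stored explicitly, one row per point.
points = pd.DataFrame(
    [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 0.5]],
    columns=["X", "Y", "Z"],
)

# Connectivity: each row lists the point indices of one triangle
# (hypothetical column names).
triangles = pd.DataFrame([[0, 1, 2], [1, 3, 2]], columns=["a", "b", "c"])

# A point-based attribute, aligned with `points` by row index
# (hypothetical values).
point_data = pd.DataFrame({"porosity": [0.10, 0.12, 0.08, 0.15]})

# Look up the coordinates of each triangle's vertices.
# Resulting shape: (n_triangles, 3 vertices, 3 coordinates).
tri_xyz = points.to_numpy()[triangles.to_numpy()]
```

Nothing about the geometry can be reconstructed without the full `points` table, which is what distinguishes a mesh from a grid in this terminology.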

Grids are any subsurface data with implicitly defined points. Examples of these kinds of data include rasters and rectilinear grids (think seismic volumes). The idea here is that the entire dataset can be defined by a few parameters like origin, U/V direction (orientation), and cell spacings. These kinds of data make sense to track in xarray.
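
To illustrate what "implicitly defined" means, here is a rough sketch, assuming an axis-aligned grid (no U/V rotation) with uniform spacing. The values of `origin`, `spacing`, and `shape` are made up. The three 1-D axes are the kind of arrays that could serve as coordinates of an xarray object, but this sketch only uses NumPy.

```python
import numpy as np

# A handful of parameters fully describes the grid (hypothetical values).
origin = np.array([100.0, 200.0, -50.0])   # position of the first node
spacing = np.array([10.0, 10.0, 5.0])      # node spacing along x, y, z
shape = (4, 3, 2)                          # number of nodes along x, y, z

# One 1-D coordinate axis per dimension, derived from the parameters.
axes = [origin[i] + spacing[i] * np.arange(shape[i]) for i in range(3)]

# The explicit point list is only materialized on demand.
xx, yy, zz = np.meshgrid(*axes, indexing="ij")
all_points = np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])
```

Here 24 points are recovered from nine numbers, whereas a structured (curvilinear) grid would need all of its points stored, which is why it is classified as a mesh.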

It's important to note that I am classifying structured grids as meshes here because all of the points of those types of data structures must be known and cannot be implicitly defined.

@bluetyson

So is there a reason for this to be a DataFrame rather than a GeoDataFrame if it is tracking coordinates?

@banesullivan
Member Author

GeoDataFrames from GeoPandas only support 2D data, AFAIK. Is there 3D support?

@bluetyson

Ah, yes, right!

Collaborator

@Leguark Leguark left a comment

Thanks for the work, @banesullivan, I really like it. The only doubt I have at the moment is how many DataFrames we really need.

As I see it we have:

  • points df: 0D properties including XYZ
  • cell data df: Anything but 0D
  • df_tri: connects the points df elements to create higher dimensional objects that have cell data

Am I missing some construct here?

I think if we are able to construct the classes only with those - well defined - three frames, it would make the design really clean since any higher level object would be some combination of them!

I am getting very pumped with this project :D


"""
self._df_points = pd.DataFrame(columns=['X', 'Y', 'Z'])
self._df_point_data = pd.DataFrame()
Collaborator

Is it worth having two DataFrames, or should we just make one and expose views of it:

self._df = pd.Data...
self._df_points is self._df[['X', 'Y', 'Z']]
self._df_point_data is self._df[the rest]
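
One way this suggestion could look, sketched under assumptions: the class name `PointSet`, the property names, and the `density` column are hypothetical and are not the PR's actual code. Note that pandas column selection generally returns a new object rather than a guaranteed view, so these properties are best treated as read-only selections.

```python
import pandas as pd

COORDS = ["X", "Y", "Z"]


class PointSet:
    """Hypothetical container backed by one DataFrame."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    @property
    def points(self) -> pd.DataFrame:
        # Only the coordinate columns.
        return self._df[COORDS]

    @property
    def point_data(self) -> pd.DataFrame:
        # Everything that is not a coordinate column.
        return self._df.drop(columns=COORDS)


ps = PointSet(pd.DataFrame({
    "X": [0.0, 1.0],
    "Y": [0.0, 0.0],
    "Z": [0.0, 2.0],
    "density": [2.6, 2.7],   # hypothetical point attribute
}))
```

Keeping a single frame guarantees that coordinates and attributes stay aligned row by row, at the cost of reserving the `X`, `Y`, `Z` column names.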

@florian-wagner

Hi @banesullivan,

thanks for kicking off the data structure API by bringing in your valuable PyVista experience. I agree that xarray is well suited to structured grids and that a collection of pandas data frames (at least one for the nodes and one for the cells) is a good and flexible representation of general cell-based data sets.

With regard to the latter: do we need different classes depending on the cell type? I agree that common functionality could be shared within something like _CellDataMixin to avoid redundancy, but is that necessary, or could we directly aim for a more general UnstructuredMesh object (similar to the one in pyvista/VTK)?

pyGIMLi, for example, can handle various cell types (hex, tets, triangular prisms, quadratic pyramids, etc.) but, more importantly, a mixture of those (as the shape functions for FEM calculations are automatically generated based on the cell type). Are there reasons against a more general object where the number of columns in the cell data frame represents the maximum number of nodes of a cell (let's say 6), and a tetrahedron would only use 4 of them, with the unused two being NaNs?
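
A small sketch of such a padded cell table, assuming six node-index columns with hypothetical names `n0` to `n5`. Plain NaN forces the index columns to float, so this sketch converts them to pandas' nullable `Int64` dtype; that conversion is my choice here and was not proposed in the thread.

```python
import numpy as np
import pandas as pd

node_cols = [f"n{i}" for i in range(6)]   # hypothetical column names

cells = pd.DataFrame(
    [
        [0, 1, 2, 3, np.nan, np.nan],   # tetrahedron: 4 of 6 slots used
        [1, 2, 3, 4, 5, 6],             # triangular prism: all 6 slots used
    ],
    columns=node_cols,
).astype("Int64")                        # nullable integers keep indices integral

# Number of nodes actually used by each cell.
nodes_per_cell = cells.notna().sum(axis=1)
```

With this layout a mixed mesh lives in one table, and the per-row node count is available without a separate cell-type column.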

Just thinking out loud here ;-)

Cheers
Florian

@Leguark
Collaborator

Leguark commented Jun 16, 2020

I like that idea, @florian-wagner. Since Saturday I have been thinking about how we can represent all geometries with the minimum number of (in this case) DataFrames (but as many columns as we need), and I arrived at a similar idea:

  • Point data Dataframe [X, Y, Z] and properties
  • Element data Dataframe [Int1, Int2, Int3, Int4, ..] and properties
    • Depending on the number of points per element (how many ints, or whatever the official name is) we would have lines, triangles, ...
    • No matter the dimensionality, the property will always be located at the center and will represent the whole volume
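
The two-frame layout above can be sketched roughly as follows. The `ELEMENT_NAMES` mapping and the `element_center` helper are hypothetical; in particular, a point count alone is ambiguous (four points could be a quad or a tetrahedron), so a real design would probably need an explicit cell-type column.

```python
import numpy as np
import pandas as pd

# Point data frame: coordinates (properties could be added as extra columns).
points = pd.DataFrame(
    [[0, 0, 0], [2, 0, 0], [0, 2, 0], [0, 0, 2]],
    columns=["X", "Y", "Z"],
    dtype=float,
)

# Element data frame: point indices, NaN where a slot is unused.
elements = pd.DataFrame(
    [
        [0, 1, np.nan, np.nan],   # 2 points
        [0, 1, 2, np.nan],        # 3 points
        [0, 1, 2, 3],             # 4 points
    ],
    columns=["Int1", "Int2", "Int3", "Int4"],
)

# Hypothetical mapping from point count to element type.
ELEMENT_NAMES = {2: "line", 3: "triangle", 4: "tetrahedron"}

n_points = elements.notna().sum(axis=1)
element_kind = n_points.map(ELEMENT_NAMES)

xyz = points.to_numpy()


def element_center(row: pd.Series) -> np.ndarray:
    """Mean of the element's vertices, where an element property would sit."""
    idx = row.dropna().astype(int).to_numpy()
    return xyz[idx].mean(axis=0)


centers = np.vstack([element_center(row) for _, row in elements.iterrows()])
```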

Am I missing something Florian?

Hopefully I will get a week or two to work on this in July

@florian-wagner

florian-wagner commented Jun 16, 2020

No, that's exactly what I had in mind. This would allow arbitrary geometries with point- and cell-based values in two compact pandas data frames. The first 8 columns in the cell dataframe would be reserved for node indices (not six as I wrote earlier...) and could be NaN if not used, or columns Int5-Int8 could even be dropped when a mesh only contains tets, for example.

If all libraries could read and write such a format + VTK I/O + pyvista viz, I would already be a happy user of subsurface.
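
Dropping the unused index columns for a tet-only mesh might look like this sketch. The `Int1` to `Int8` names follow the thread; the `resistivity` cell property and its values are made up.

```python
import numpy as np
import pandas as pd

index_cols = [f"Int{i}" for i in range(1, 9)]   # 8 reserved node-index columns

# A mesh that only contains tetrahedra: Int5-Int8 are never used.
tets = pd.DataFrame(
    [
        [0, 1, 2, 3] + [np.nan] * 4,
        [1, 2, 3, 4] + [np.nan] * 4,
    ],
    columns=index_cols,
)
tets["resistivity"] = [120.0, 85.0]             # hypothetical cell property

# Drop only those index columns that are empty for every cell.
unused = [c for c in index_cols if tets[c].isna().all()]
compact = tets.drop(columns=unused)
```

Restricting the check to `index_cols` avoids accidentally removing a property column that happens to be all NaN.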

hackmd-deploy and others added 7 commits June 16, 2020 09:00
- [CLN] Refactor modules to new terminology
- [CLN] Split requiremtes
- [ENH] Welly is optional
- [ENH] structured_elements.py and unstructured_elements.py have been updated
- [ENH] subpackage for visualiztion. pyvista code moved there
- [DOC] Improved definition of the interfaces subpackage
@Leguark Leguark marked this pull request as ready for review September 24, 2020 10:36
@Leguark
Collaborator

Leguark commented Sep 24, 2020

I think it is time to merge this. Some of the ideas behind these changes: https://github.com/softwareunderground/subsurface/blob/mig_dev/sdd.md

This is the summary of features:

  • Created the full folder structure.
  • Added base_structures (StructuredData and UnstructuredData)
  • Adapted elements (points, lines, trisurf etc)
  • Pyvista plotting for all the elements

@Leguark Leguark requested a review from prisae September 24, 2020 10:40
@Leguark Leguark changed the title from "T20 progress" to "Basic folder and data structures and pyvista plotting" Sep 24, 2020
Member

@prisae prisae left a comment

Just a few comments. I guess we will discuss more on Monday.

requirements.txt (outdated, resolved)
sdd.md (resolved)
- [DOC] added geoh5py to the sdd.md
@Leguark Leguark merged commit 6a8916a into master Sep 28, 2020
6 participants