Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Vectorized" access to mesh properties and data (e.g. in context of PyQIS)? #278

Closed
Huite opened this issue Jul 11, 2020 · 6 comments
Closed
Assignees
Labels
question Further information is requested

Comments

@Huite
Copy link

Huite commented Jul 11, 2020

(I'm not entirely sure if this is exactly the right place to ask these questions, but it seems the most direct way to ask the right people.)

The reason I'm asking is because I'm thinking about constructing a vtk plugin (via pyvista) for visualization and some filters, and I need to access the mesh topology as well as the data.

Let's say I'm writing a Python QGIS plugin, and I want to access properties of a layer. I can do this in two ways for raster layer: either I ask the layer directly, or I pass on the name to GDAL so I can get numpy array back:

layer = QgsProject.instance().mapLayersByName(layername)
gdal_raster = gdal.Open(layer[0].source(), gdal.GA_ReadOnly)

The benefit of this second approach is that we can mostly stick to compiled code (gdal and numpy), avoiding the overhead of the Python interpreter.

Now let's say I want to do the same for a mesh layer. I've been looking at e.g. the Crayfish plugin to see how things work:

def get_mesh_geometry(self, mesh, index):
    face = mesh.face(index)
    points = [mesh.vertex(v) for v in face]
    polygon = QgsPolygon()
    polygon.setExteriorRing(QgsLineString(points))
    return QgsGeometry(polygon)

I would expect quite a bit of overhead from Python in this case, as we're collecting individual points. So if I understand correctly, I suppose the only way around this is offering a C interface just like GDAL does -- but I image that might not be the number one priority at this stage (although the API is still nice and short at this stage!).

So if I think about some options on the short term:

  • Write a C++ --> Cython --> Python shared lib, in which the C++ codes queries MDAL in the QGIS libs, and Cython returns a bunch of numpy arrays. I guess I'd have to look in qgis_core.dll.
  • Compile a separate MDAL dll, probably use pybind11 (or something) to basically make a poor man's interface for my immediate needs.
  • Forgo MDAL entirely, just write some Python functions for the format(s) I need that directly gives me a bunch of numpy arrays.

The third option is by far the least hassle (for one or two formats), but it basically comes down to writing a reimplementation of MDAL in Python (if I want to support multiple formats). So this is a rather open-ended question, but: what are your opinions on this as MDAL devs?

Custom formats via Python

That brings me to another question: am I correct to believe that the only way to construct mesh layers is via MDAL? I mention this because the MDAL QEP states:

Would it be possible to implement support for any custom format?

Yes, by implementation of subclass of QgsMeshDataProvider and registering it in the list of providers available. This could be done in Python plugins.

Apart from the fact that the QGIS docs clearly mention that the API is experimental, does this approach currently work? I have a similar use case for a custom raster format, but I could not find any examples showing a custom QgsRasterDataProvider either.

(Ideally of course, I'd have GDAL and MDAL support for these formats, but that's a much larger responsibility + has greater cost + is far less easy to experiment with.)

@PeterPetrik PeterPetrik added the question Further information is requested label Jul 15, 2020
@PeterPetrik
Copy link
Contributor

PeterPetrik commented Jul 15, 2020

HI @Huite , let me answer some of your questions.

At the moment, we do not have Python API for MDAL library similar to GDAL. Only reason is that we haven't got a need to implement it (and maintain) it so far. But if there is a need for it, it should not be that difficult to create one, since the API is in C-language and it is quite simple as you mentioned.

As for data access, best is to start reading here
For the Mesh, if you access data directly from MDAL, you would need to handle CRS transformations and interpolation yourself. Opposite to GDAL, MDAL is really only IO layer. Within QGIS, data values could be retrieved in blocks as raw data, or interpolated via other functions (similarly what identify tools do or plots in crayfish do)

For the mesh frame, you need to access it one-by-one from Python API (the vectors are not exported), but the whole mesh is in-memory in QGIS, so at least there is no overhead with the IO operations.

For the second question about QGIS, you can write your mesh data provider. Currently we have MDAL and Memory which are registered to registry and used.

in general, best approach depends on your needs/format. If the formats you want to support are quite generic and used, we are happy to help to find the way how to get them into MDAL. if your formats are quite in-house, writing a custom data provider or plugin is probably better way. I am open to disscussion about MDAL python API too.

Kind regards,
Peter

@PeterPetrik PeterPetrik self-assigned this Jul 15, 2020
@Huite
Copy link
Author

Huite commented Jul 16, 2020

Hello Peter,

(I apologize for the somewhat meandering style of this reply, still thinking things out.)

Thanks for the answer. Your comment about the mesh being in memory made me run a quick benchmark, I was probably a little too pessimistic ("premature optimization being the root of all evil" and such...). Collecting the faces and vertices of a 100 000 face mesh into two arrays takes around 0.5-1.0 second on my laptop. Rendering on a laptop cpu, vtk won't smoothly present much more elements anyway. With larger datasets, you'd have to aggregate or interpolate anyway which is what you have to do in 2D as well (or build pyramids) -- and I guess the QGIS functions can help with that (e.g. via blocks as you mention). I think an MDAL Python API would be very useful nonetheless, but it probably shouldn't be my primary concern.

To expand on the file formats, I'm specifically talking about modflow6 mesh data. Groundwater modelling has a big GIS component, so having some support would be pretty useful. It features three types of meshes:

  • Structured (rectilinear but possibly non-equidistant)
  • Unstructured layered ("discretization by vertices")
  • Fully unstructured (but only with prismatic cells, i.e. horizontal and vertical faces at 90 degree angles)

In terms of data model, the first and second are supported within MDAL (as I understand it), provided the mesh consists of triangles or quads in the second case. I would argue that's plenty to be useful already. However, there are a few challenges for visualizing model input especially:

  • The grid definition can exist in plain text form (just an numerated list of vertices and faces), or a binary file which is written after completion of the simulation.
  • The grid definition is stored fully separately from the data, which is stored in tables with cell number -> value records.
  • These tables come in a number of varieties, depending on the type of boundary condition. The data type and the number of column varies. The tables can be stored both in plain text, or binary.
  • It's possible for multiple values to exist for the same cell, which raises the question which one you'd show during visualization.

Enumerating it like this makes me realize there's a quite a bit of very specific logic to implement -- which we're handling in a different Python package anyway; I'm really not sure it belongs in MDAL. Pragmatically, I think converting to e.g. a UGRID netCDF beforehand is likely the most effective approach. I actually question now whether it's smart to write a custom provider at all: having the data in plain text would likely hinder visualization speed, in which case you might want to convert to something else anyway...

Looking at it from that direction, we'd probably benefit most from MDAL developments that are already on-going:

  • Layered mesh support for UGRID (or did this already get released? I haven't gotten the chance to test 3.14 yet to be honest)
  • Efficient netCDF IO
  • Overviews a la GDAL / pyramids / downsampling for very large meshes (overviews for every time step in case of time dependent data?)

Thanks,
Huite

@PeterPetrik
Copy link
Contributor

Hi,

modflow6 outputs are definitely general enough to become new MDAL driver. You can see how there are implemented here, but the documentation is lacking (a bit).

MDAL can read formats that have plaintext file definitions of mesh (see https://github.com/lutraconsulting/MDAL/blob/master/mdal/frmts/mdal_2dm.hpp with example file for example https://github.com/lutraconsulting/MDAL/blob/master/tests/data/2dm/quad_and_line.2dm) and the data definition in other file (e.g. binary https://github.com/lutraconsulting/MDAL/blob/master/mdal/frmts/mdal_binary_dat.cpp or ascii or whatever).
Or the data can be defined in multiple files. Or everything can be defined in one file or whatever...

If you have multiple values to display, that is basically concept of dataset groups. One dataset group defines one type of variable for each face/vertex/edge for each timestep.

Only thing that is not supported is Fully Unstructured meshes. We support 2d, 1d (edges) and layered meshes. But fully unstructured meshes is very huge project in range of few man-months of work in MDAL and QGIS.

P.

@PeterPetrik
Copy link
Contributor

ah, QGIS already supports "pyramids", it is called "overviews" ,see QGIS 3.14 changelog :)

@PeterPetrik
Copy link
Contributor

if you want, we can continue discussion in email peter.petrik@lutraconsulting.co.uk

@Huite
Copy link
Author

Huite commented Jul 16, 2020

Ah, that's very encouraging!

Yeah I'm aware about the complexity of fully unstructured stuff, my intention is to stick to layered unstructured meshes for the short to mid term to avoid the complexity.

Thanks, this has been very informative, I'll be sure to send an e-mail if I've got more to report / ask about this subject.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants