No copy init for StructuredGrid and RectilinearGrid #2698
Codecov Report
@@ Coverage Diff @@
## patch/coerce_pointslike_arg_copy #2698 +/- ##
====================================================================
- Coverage 93.70% 93.70% -0.01%
====================================================================
Files 75 75
Lines 16178 16192 +14
====================================================================
+ Hits 15160 15172 +12
- Misses 1018 1020 +2 |
I agree with this. It makes no sense to allow non-unique values. What happens if a user IS using data with non-unique values in this PR? Does it silently continue or noisily complain in vtk? If it silently continues, it would be nice to put in a check to warn or raise an error. Users would then be able to quickly pinpoint the change here, and the check is also useful in general.
A quick check revealed that non-unique values are silently handled:
import numpy as np
import pyvista
xrng = np.arange(-10, 10, 2, dtype=float)
yrng = np.arange(-10, 10, 5, dtype=float)
zrng = np.arange(-10, 10, 1, dtype=float)
# non-unique xrng
xrng_non_unique = np.array([-10., -10., -8., -6., -4., -2., 0., 2., 4., 6., 8.])
mesh_non_unique = pyvista.RectilinearGrid(xrng_non_unique, yrng, zrng)
mesh = pyvista.RectilinearGrid(xrng, yrng, zrng)
assert mesh == mesh_non_unique
The only way around this is to check if an array is unique, and it appears the fastest way to do so is:
s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]
From https://stackoverflow.com/questions/11528078/
Implementing...
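The sort-based check quoted above can be wrapped in a small function. This is only a minimal sketch: the name `find_duplicates` is hypothetical and is not taken from the PR. It relies on the fact that equal values end up adjacent after sorting.

```python
import numpy as np


def find_duplicates(arr):
    """Return the values that occur more than once in ``arr`` (hypothetical helper)."""
    s = np.sort(arr, axis=None)  # flattened, sorted copy of the input
    # After sorting, equal values are adjacent, so compare each element
    # with its right-hand neighbor and keep the ones that match.
    return s[:-1][s[1:] == s[:-1]]


xrng_non_unique = np.array([-10.0, -10.0, -8.0, -6.0])
duplicates = find_duplicates(xrng_non_unique)
no_duplicates = find_duplicates(np.arange(-10, 10, 2, dtype=float))
```

A value that appears three times would be reported twice by this approach, which is fine when the result is only used to decide whether to raise.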
I went for raising as Calling this good to go. |
pyvista/core/grid.py
Outdated
@@ -161,16 +162,23 @@ def _from_arrays(self, x: np.ndarray, y: np.ndarray, z: np.ndarray):
            Coordinates of the points in z direction.

        """

        def raise_not_unique(arr, name):
Some minor comments:
- This is OK here, but it could be a generally useful method.
- Do we need the name argument here? Won't the stack trace be enough? Also the variable name is x in this private method, but the user supplied variable could be anything.
Both resolved in 3b85909.
@akaszynski that's a bit surprising. On a quick timing with a 1000-length random array (no duplicate values, most likely use case) it is indeed slightly faster than
Edit: OK, I see you actually need the duplicate values for some reason. I thought we only wanted to check for lack of duplicates. |
It potentially makes the traceback a bit more helpful by giving the duplicate values. Since we're getting those for free, I figure it's worth it. |
using a typical array of 100-1000 values, I'm seeing a noticeable improvement using sort rather than unique. With 1000 values on my machine:
On second thought, I think it makes more sense, especially when considering comment.
Take this test:
import numpy as np
import pyvista as pv
from memory_profiler import profile


@profile
def test_rectilinear():
    xrng = np.arange(-100000, 100000, .2, dtype=float)
    yrng = np.arange(-100000, 100000, .5, dtype=float)
    zrng = np.arange(-100000, 100000, .1, dtype=float)
    mesh = pv.RectilinearGrid(xrng, yrng, zrng)
    x = mesh.x
    x /= 2


if __name__ == '__main__':
    test_rectilinear()
Results without using
Should we have a keyword argument |
For context, this work and the work in #2697 are for a xarray-pyvista DataArray accessor I am creating where we can easily deal with much larger data sizes and this sort of memory consumption is important to minimize |
I mean, this memory usage isn't that significant when realizing that test is actually pretty ridiculous... That mesh has 6,789,684,830,523,280,511 cells (yes, 6 quintillion cells) and is totally not plotting on my laptop anytime soon |
I'm pretty sure we have to choose between memory and performance here. At the worst case we need an |
Let's do that. Didn't realize the scope of the arrays being used here. This is also VTK's default behavior. |
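The trade-off discussed above, skipping the uniqueness check by default and enabling it on request, could look roughly like the sketch below. The helper name `coerce_axis` and the error message are hypothetical; only the `check_duplicates` option name comes from this PR's commit list, and the actual pyvista implementation may differ.

```python
import numpy as np


def coerce_axis(arr, name, check_duplicates=False):
    """Hypothetical helper: return ``arr`` as a float array without copying when possible."""
    # np.asarray returns the input itself when it is already a float64 ndarray.
    arr = np.asarray(arr, dtype=float)
    if check_duplicates:
        # The check needs a sorted copy, so it costs extra memory and time.
        s = np.sort(arr, axis=None)
        dup = s[:-1][s[1:] == s[:-1]]
        if dup.size:
            raise ValueError(f"Array {name} contains duplicate values: {dup}")
    return arr


x = np.array([0.0, 1.0, 2.0])
same = coerce_axis(x, "x")  # default: no check, no copy
```

Leaving the check off by default keeps the fast, low-memory path for large arrays, while callers who want the safety net can opt in.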
As an added bonus, 23b801b includes updates to our documentation as our class documentation for |
I think this is good to go.
Recommending merge by Wednesday 1 June 2022 given:
#2245 (comment)
I'll merge this into #2697 and do a few final checks, then I think that one will be good to go as well |
* _coerce_pointslike_arg copy=False default
* Simplify test
* Add create no copy tests
* Fix PointSet pytestmark
* No copy init for StructuredGrid and RectilinearGrid (#2698)
* Support no copy with StructuredGrid
* Support no copy with RectilinearGrid
* Improve convert_array to handle lists
* raise on duplication
* refactor
* add check_duplicates option
Co-authored-by: Alex Kaszynski <akascap@gmail.com>
* Set dimensions of StructuredGrid in test
* Fix up RectilinearGrid docstrings a bit
* Remove comments
* Use kwargs on RectilinearGrid init
* Use may_share_memory
* Use array_equal
* Fix coerce test with array_equal and may_share_memory
* Update pyvista/utilities/misc.py
Co-authored-by: Andras Deak <adeak@users.noreply.github.com>
* Add duplicate y and z array cases to RectilinearGrid test
* move non-unique test to test_grid
Co-authored-by: Alex Kaszynski <akascap@gmail.com>
Co-authored-by: Andras Deak <deak.andris@gmail.com>
Co-authored-by: Andras Deak <adeak@users.noreply.github.com>
Some changes to include in #2697 - note the base branch
This improves the inits of the StructuredGrid and RectilinearGrid classes such that they can be initialized without copying the source array data.
There is a slight potential for the changes to RectilinearGrid to not be backward compatible, as it would previously take the np.unique values of the input array (producing a copy). IMO, and from the documentation on that class, the arrays should be provided as unique values, so the internal call to np.unique is unneeded and causes undesired data copying. While this is not backward compatible in some corner cases, I would argue that those corner cases should not be supported and that the original implementation was the library version of syntactic sugar.