
Faster multi ray trace retry #1323

Merged
merged 12 commits into pyvista:master from Keou0007:faster_multi_ray_trace_retry on May 22, 2021
Conversation

Keou0007 (Contributor)

Overview

The retry kwarg in the multi_ray_trace function triggers a retry of ray traces that returned no intersection points. Currently this is done by looping over all of the rays and checking whether each one returned a result, which can be intolerably slow when the number of rays is very large and defeats the purpose of using embree for speedup in the first place. The new method uses np.setdiff1d to determine which rays need to be retried and then loops over only those, significantly improving the performance of the function.
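A minimal sketch of the selection step (illustrative names and toy data, not the actual PR diff): rather than checking every ray for a hit, diff the full index range against the rays embree already resolved and loop only over the misses.

    import numpy as np

    index_ray = np.array([0, 2, 3])  # rays that returned an intersection point
    all_rays = np.arange(6)          # indices of every ray that was cast
    retry_rays = np.setdiff1d(all_rays, index_ray)
    print(retry_rays)                # -> [1 4 5]; only these get re-traced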

codecov bot commented May 18, 2021

Codecov Report

Merging #1323 (32f9c32) into master (f9bbebc) will decrease coverage by 0.03%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #1323      +/-   ##
==========================================
- Coverage   90.05%   90.01%   -0.04%     
==========================================
  Files          56       56              
  Lines       12183    12188       +5     
==========================================
  Hits        10971    10971              
- Misses       1212     1217       +5     

adeak self-requested a review May 18, 2021 09:18
adeak (Member) commented May 18, 2021

At first glance this looks good, thanks. Unfortunately I'm not using conda, so installing pyembree is a bit of a pain. I'll try to get around to doing that, because I have a strong suspicion that we could make the function even more efficient.

In the meantime, do you have a small complete example that we can use as a benchmark to compare runtimes?

akaszynski (Member)

> In the meantime, do you have a small complete example that we can use as a benchmark to compare runtimes?

That would be great as well. My feeling is that this fix looks good, but I don't usually use conda and can't test this locally without some setup.

adeak (Member) commented May 18, 2021

> I don't usually use conda and can't test this locally without some setup.

Same here. I tried installing pyembree without it, but only embree 3 is available in the Debian testing repos and pyembree only supports embree 2 (its last commit was in 2019, which makes me a bit nervous about it as a soft dependency).

I've just finished setting up conda for this so I think I'll be able to play around with it.

adeak (Member) commented May 18, 2021

I'm all set. We'll want to fix https://github.com/pyvista/pyvista/blob/master/tests/test_polydata.py#L39-L42:

try:
    CONDA_ENV = os.environ['CONDA_ALWAYS_YES'] == "1"
except KeyError:
    CONDA_ENV = False

Despite using conda, I had to run the test with

CONDA_ALWAYS_YES=1 pytest -v -k 'test_polydata'

The only place where this is used is the multi_ray_trace test, which checks whether rtree, pyembree and trimesh are installed. The conda check seems to be needed for pyembree.

So how about removing the whole conda filter and just letting pytest.importorskip do its thing?
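For reference, a sketch of what that could look like (hypothetical test body, not the actual diff):

    import pytest

    def test_multi_ray_trace():
        # skip cleanly when an optional dependency is missing,
        # instead of gating on a conda environment variable
        pytest.importorskip('rtree')
        pytest.importorskip('pyembree')
        pytest.importorskip('trimesh')
        # ... the existing ray trace assertions would follow here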

akaszynski (Member)

> So how about removing the whole conda filter and just letting pytest.importorskip do its thing?

That's a better approach. Want to make another PR?

adeak (Member) commented May 18, 2021

I think we can put it in this one along with other potential improvements.

@Keou0007 let me know if you feel like removing this yourself, otherwise I believe we can push to your branch to include this ourselves :)

Keou0007 (Contributor, author)

I think I've done what you want with removing the CONDA_ENV stuff. Please check that.

Here's a simple test for the performance improvement based on the distance between two surfaces example, which I modified to use the multi_ray_trace function in the PR. I've increased the resolution of the hill so there are a million rays being traced. In this particular example, embree will fail to find an intersection for about 50k of them. The idea is that the retry option will redo all of those misses using the OBBTree ray trace function, which is more reliable but much slower.

Code
import time
import pyvista as pv
import numpy as np

# A helper to make a random surface
def hill(seed):
    mesh = pv.ParametricRandomHills(randomseed=seed, u_res=1000, v_res=1000,
                                    hillamplitude=0.5)
    mesh.rotate_y(-10)  # give the surfaces some tilt

    return mesh

h0 = hill(1).elevation()
h1 = hill(10)
# Shift one surface
h1.points[:, -1] += 5
h1 = h1.elevation()

h0["distances"] = np.empty(h0.n_points)
h0["distances"][:] = np.nan
vec = [[0, 0, 1]] * h0.n_points

start = time.time()
points, id_r, id_t = h1.multi_ray_trace(h0.points, vec, first_point=True)
print(f"multi_ray_trace method takes {time.time()-start:.3f} s")

h0["distances"][id_r] = np.sqrt(np.sum((points - h0.points[id_r])**2, axis=-1))

p = pv.Plotter()
p.add_mesh(h0, scalars="distances", smooth_shading=True)
p.add_mesh(h1, color=True, opacity=0.75, smooth_shading=True)
p.show()


start = time.time()
points, id_r, id_t = h1.multi_ray_trace(h0.points, vec, first_point=True, retry=True)
print(f"multi_ray_trace method with retry takes {time.time()-start:.3f} s")

h0["distances"][id_r] = np.sqrt(np.sum((points - h0.points[id_r])**2, axis=-1))

p = pv.Plotter()
p.add_mesh(h0, scalars="distances", smooth_shading=True)
p.add_mesh(h1, color=True, opacity=0.75, smooth_shading=True)
p.show()

Running this code with the existing method, the function has to loop over all 1 million rays and check whether each returned an intersection before retrying. This is slooooooow.

multi_ray_trace method takes 9.883 s
multi_ray_trace method with retry takes 992.013 s

Using the new method from this PR, retrying the 50k misses one at a time in a loop still takes a few minutes, but we're no longer wasting time checking every ray.

multi_ray_trace method takes 9.616 s
multi_ray_trace method with retry takes 282.953 s

This disparity in performance only gets worse as the number of rays increases.

adeak (Member) commented May 20, 2021

Yes, your change is what I had in mind, thank you.

Also thanks for the example, that's perfect for me to start playing with this.

akaszynski (Member) left a comment

Based on the speedup, I'm more than happy with the changes. Tests pass:

tests/test_polydata.py::test_multi_ray_trace PASSED                      [ 73%]

adeak (Member) commented May 21, 2021

I'm still working on this; I'm just a bit under the weather. Unfortunately, due to memory and runtime constraints I can only time a 500x500 problem. @Keou0007 can you time your test with the following version, and check that the results are the same?

    def multi_ray_trace(poly_data, origins, directions, first_point=False, retry=False):
        # docstring snipped for github comment
        if not poly_data.is_all_triangles():
            raise NotAllTrianglesError

        try:
            import trimesh, rtree, pyembree
        except (ModuleNotFoundError, ImportError):
            raise ImportError(
                "To use multi_ray_trace please install trimesh, rtree and pyembree with:\n"
                "\tconda install trimesh rtree pyembree"
            )

        origins = np.asarray(origins)
        directions = np.asarray(directions)

        faces_as_array = poly_data.faces.reshape((poly_data.n_faces, 4))[:, 1:]
        tmesh = trimesh.Trimesh(poly_data.points, faces_as_array)
        locations, index_ray, index_tri = tmesh.ray.intersects_location(
            origins, directions, multiple_hits=not first_point
        )
        if retry:
            ray_tuples = [(id_r, l, id_t) for id_r, l, id_t in zip(index_ray, locations, index_tri)]
            all_ray_indices = np.arange(len(origins))
            retry_ray_indices = np.setdiff1d(all_ray_indices, index_ray, assume_unique=True)
            for id_r in retry_ray_indices:
                origin = origins[id_r]
                vector = directions[id_r]
                unit_vector = vector / np.sqrt(np.sum(np.power(vector, 2)))
                second_point = origin + (unit_vector * poly_data.length)
                locs, indexes = poly_data.ray_trace(origin, second_point, first_point=first_point)
                if locs.any():
                    if first_point:
                        locs = locs.reshape([1, 3])
                    for loc, id_t in zip(locs, indexes):
                        ray_tuples.append((id_r, loc, id_t))
            sorted_results = sorted(ray_tuples, key=lambda x: x[0])
            locations = np.array([loc for id_r, loc, id_t in sorted_results])
            index_ray = np.array([id_r for id_r, loc, id_t in sorted_results])
            index_tri = np.array([id_t for id_r, loc, id_t in sorted_results])
        return locations, index_ray, index_tri

The two changes so far are:

  1. assume_unique in setdiff1d,
  2. coercing the inputs to arrays once at the start.

The latter especially leads to a really significant speedup.

I still want to refactor the native Python tuple sorting and redundant list comprehensions, and check the inside of the loop. These might also decrease the runtime a bit.
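As a rough illustration of why the up-front conversion matters (an illustrative micro-benchmark, not code from the PR): per-ray calls like np.linalg.norm re-convert a Python list to an array on every iteration, while np.asarray pays that cost once and allows vectorized math.

    import timeit
    import numpy as np

    directions = [[0.0, 0.0, 1.0]] * 100_000  # list of lists, as a caller might pass
    directions_arr = np.asarray(directions)   # one up-front conversion

    # per-row norms on the raw list rebuild a small array on every call
    t_list = timeit.timeit(lambda: [np.linalg.norm(v) for v in directions], number=1)
    # a single vectorized norm on the pre-converted array
    t_arr = timeit.timeit(lambda: np.linalg.norm(directions_arr, axis=1), number=1)
    print(f"per-row on list: {t_list:.3f} s, vectorized on array: {t_arr:.3f} s")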

adeak (Member) commented May 21, 2021

Here's the best I've got:

    def multi_ray_trace(poly_data, origins, directions, first_point=False, retry=False):
        # docstring snipped for github comment
        if not poly_data.is_all_triangles():
            raise NotAllTrianglesError

        try:
            import trimesh, rtree, pyembree
        except (ModuleNotFoundError, ImportError):
            raise ImportError(
                "To use multi_ray_trace please install trimesh, rtree and pyembree with:\n"
                "\tconda install trimesh rtree pyembree"
            )

        origins = np.asarray(origins)
        directions = np.asarray(directions)

        faces_as_array = poly_data.faces.reshape((poly_data.n_faces, 4))[:, 1:]
        tmesh = trimesh.Trimesh(poly_data.points, faces_as_array)
        locations, index_ray, index_tri = tmesh.ray.intersects_location(
            origins, directions, multiple_hits=not first_point
        )
        if retry:
            # gather intersecting rays in lists
            loc_lst, ray_lst, tri_lst = [arr.tolist() for arr in [locations, index_ray, index_tri]]

            # find indices that trimesh failed on
            all_ray_indices = np.arange(len(origins))
            retry_ray_indices = np.setdiff1d(all_ray_indices, index_ray, assume_unique=True)

            # compute ray points for all failed rays at once
            origins_retry = origins[retry_ray_indices, :]  # shape (n_retry, 3)
            directions_retry = directions[retry_ray_indices, :]
            unit_directions = directions_retry / np.linalg.norm(directions_retry,
                                                                axis=1, keepdims=True)
            second_points = origins_retry + unit_directions * poly_data.length  # shape (n_retry, 3)

            for id_r, origin, second_point in zip(retry_ray_indices, origins_retry, second_points):
                locs, indices = poly_data.ray_trace(origin, second_point, first_point=first_point)
                if locs.any():
                    if first_point:
                        locs = locs.reshape([1, 3])
                    ray_lst.extend([id_r] * indices.size)
                    tri_lst.extend(indices)
                    loc_lst.extend(locs)
            # sort result arrays by ray index
            index_ray = np.array(ray_lst)
            sorting_inds = index_ray.argsort()
            index_ray = index_ray[sorting_inds]
            index_tri = np.array(tri_lst)[sorting_inds]
            locations = np.array(loc_lst)[sorting_inds]

        return locations, index_ray, index_tri

With this, the 500x500 case goes down from 19 seconds to 16.3 seconds on my laptop. I'm curious how it performs on the large benchmark.

Edit: I realized memory is not that much of an issue after all, so I could run the 1000x1000 case. The improvement is pretty meh: I get 205 seconds (PR version) -> 196 seconds (above version). But converting the inputs to arrays helps the non-retry case as well.

So yeah, the main speedup comes from not looping over rays we've already found.

Keou0007 (Contributor, author)

Yes, there are definitely diminishing returns there, and most of the time is now taken up by the OBBTree trace. I did try to make that faster a while back, but it seemed the OBBTree is not thread-safe, so I got some weird results.

I've pushed your changes, thanks for doing that. It looks pretty nice now, to my eyes at least.

adeak (Member) left a comment

Thanks. I've pushed two small whitespace fixes, but this should be good to go if CI is green. Last night I checked on your example that we get the same result as before, for both values of first_point.

adeak merged commit 977f3ef into pyvista:master on May 22, 2021
Keou0007 deleted the faster_multi_ray_trace_retry branch on May 24, 2021