This repository was archived by the owner on Oct 14, 2025. It is now read-only.

Speedup semantic rasterizer#140

Closed
pestipeti wants to merge 5 commits into woven-by-toyota:master from pestipeti:speedup_semantic_rasterizer

Conversation

@pestipeti

I achieved a ~7-8% speedup with these modifications.

Contributor

@lucabergamini left a comment

Amazing work! A few minor comments; you can either pick them up yourself or I can do them, just let me know :)

```diff
 )

-return {"xyz": xyz}
+xyz[:, -1] = 1.0
```
Contributor

Why is this required? We ignore the z coordinates in the following, so this shouldn't make any difference, right?

Author

In the current implementation, you first cut the z coordinate (`lane_coords["xyz_right"][:, :2]`) and later, in the transform method, you stack a homogeneous row back on (`np.vstack((points[:num_dims, :], np.ones(points.shape[1])))`). With this change we save the cut, the `np.ones` creation, and the stack.
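The trade discussed above can be sketched in a few lines. This is an illustrative sketch, not the library code: the shapes and the `world_to_image_space` matrix are assumptions, but it shows that pre-setting the homogeneous column at cache time produces the same XY output as the cut-then-restack path.

```python
import numpy as np

# Illustrative shapes and matrix (assumptions, not l5kit's actual values).
rng = np.random.default_rng(0)
points = rng.random((5, 3))                  # N x 3 lane coordinates (x, y, z)
world_to_image_space = np.eye(3)
world_to_image_space[:2, 2] = [10.0, -4.0]   # arbitrary 2D translation

# Current path: cut z, then re-stack a row of ones inside the transform helper.
xy = points[:, :2]
homog = np.vstack((xy.T, np.ones(xy.shape[0])))    # 3 x N homogeneous points
out_a = world_to_image_space.dot(homog).T[:, :2]

# PR path: overwrite z with 1 once (at cache time) and transform directly.
xyz = points.copy()
xyz[:, -1] = 1.0
out_b = world_to_image_space.dot(xyz.T).T[:, :2]

assert np.allclose(out_a, out_b)
```

The per-call `np.ones` allocation and `np.vstack` copy are what the PR amortizes into the cache-building step.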

Contributor

Another way of doing this would be to use only the first 2x2 of the matrix (only XY) in the semantic rasterizer. My issue with setting 1 here is that we're removing information in a very hidden part of the code, which may make debugging problematic in the future (also considering there is a cache system in between).
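The XY-only alternative suggested here can be sketched as follows. All names are illustrative assumptions; the point is that applying the 2x2 rotation/scale block plus the translation column leaves z untouched while matching the full homogeneous transform.

```python
import numpy as np

# Illustrative 3x3 transform (assumption): 90-degree rotation plus translation.
rng = np.random.default_rng(1)
points = rng.random((6, 3))                               # N x 3, z kept intact
world_to_image_space = np.eye(3)
world_to_image_space[:2, :2] = [[0.0, -1.0], [1.0, 0.0]]  # rotation block
world_to_image_space[:2, 2] = [100.0, 50.0]               # translation column

# XY-only transform: 2x2 block times xy, plus the translation column.
xy = points[:, :2] @ world_to_image_space[:2, :2].T + world_to_image_space[:2, 2]

# Matches the full homogeneous transform with w = 1, without touching z.
homog = np.hstack((points[:, :2], np.ones((len(points), 1))))
expected = (world_to_image_space @ homog.T).T[:, :2]
assert np.allclose(xy, expected)
```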

Author

I see. You could move it into the semantic rasterizer method, just before the dot-product calls.

Contributor

I like this idea also :) Today was quite busy, but I should be able to work on this tomorrow (hopefully)


```diff
-return {"xyz": xyz}
+xyz[:, -1] = 1.0
+return {"xyz": xyz.T}
```
Contributor

I can see the point of caching this to avoid the stack at runtime, but we may still want to include either the two lanes separately or at least the length of the first, so that we can always split the two apart (I'm thinking about centre-line support right now).

Author

The cheapest/easiest solution is to add the length of the first line. I think map_api should handle the centerline calculation as well (cacheable).
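A minimal sketch of this cache layout, under the assumption that the cache stays a dict keyed like the PR's (`"xyz"` is from the diff above; the `"xyz_left_len"` key and both helper names are hypothetical): store both boundaries stacked and homogeneous, plus the left boundary's length, so the two can be unpacked later.

```python
import numpy as np

def pack_lanes(xyz_left: np.ndarray, xyz_right: np.ndarray) -> dict:
    """Stack both lane boundaries, set the homogeneous column, store transposed."""
    xyz = np.vstack((xyz_left, xyz_right)).astype(np.float64)
    xyz[:, -1] = 1.0                       # homogeneous coordinate, as in this PR
    return {"xyz": xyz.T, "xyz_left_len": len(xyz_left)}

def unpack_lanes(cached: dict):
    """Split the cached array back into left and right boundaries."""
    n = cached["xyz_left_len"]
    xyz = cached["xyz"].T
    return xyz[:n], xyz[n:]

left = np.arange(12, dtype=np.float64).reshape(4, 3)
right = np.arange(9, dtype=np.float64).reshape(3, 3)
cached = pack_lanes(left, right)
l2, r2 = unpack_lanes(cached)
assert l2.shape == (4, 3) and r2.shape == (3, 3)
assert np.allclose(l2[:, :2], left[:, :2])  # xy preserved; z was overwritten with 1
```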

Contributor

Yeah, I agree on that, and I'm fine with having the length included.

Author

One note: I changed the returned `xyz` format to `xyz.T`.

```diff
-xy_left = cv2_subpixel(transform_points(lane_coords["xyz_left"][:, :2], world_to_image_space))
-xy_right = cv2_subpixel(transform_points(lane_coords["xyz_right"][:, :2], world_to_image_space))
-lanes_area = np.vstack((xy_left, np.flip(xy_right, 0)))  # start->end left then end->start right
+lanes_xy = cv2_subpixel(world_to_image_space.dot(lane_coords["xyz"]).T[:, :2])
```
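A standalone sketch of the single-dot-product path in this hunk: the cached `"xyz"` is already 3 x N and homogeneous, so one matrix product replaces the two `transform_points` calls. `cv2_subpixel` is stubbed here as an assumption (a fixed-point shift for cv2's subpixel drawing), as is the example matrix.

```python
import numpy as np

CV2_SHIFT = 8  # assumed fixed-point shift used for cv2 subpixel drawing

def cv2_subpixel(coords: np.ndarray) -> np.ndarray:
    """Scale float pixel coordinates to fixed-point integers (illustrative stub)."""
    return (coords * 2 ** CV2_SHIFT).astype(np.int64)

world_to_image_space = np.eye(3)
world_to_image_space[:2, 2] = [128.0, 64.0]          # example translation

# Cached homogeneous lane points: rows are x, y, w; columns are points.
lane_coords = {"xyz": np.array([[0.0, 1.0, 2.0],
                                [0.0, 0.5, 1.0],
                                [1.0, 1.0, 1.0]])}

# One dot product, then slice off the homogeneous row: N x 2 pixel coordinates.
lanes_xy = cv2_subpixel(world_to_image_space.dot(lane_coords["xyz"]).T[:, :2])
assert lanes_xy.shape == (3, 2)
```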
Contributor

Why not use the `transform` or `transform_transpose` functions here?

Author

It seemed a bit complicated: transposes, `vstack`, `np.ones`, cutting the z-axis, etc. I did not want to break other parts of the code by modifying those methods.

(Feel free to correct anything you want.)

Contributor

Yeah, you're perfectly right about that. Let me see if we can also handle this case there (same len for matrix and points).

Author

I made these modifications only to save as many CPU cycles as I possibly can. I did not consider long-term usability; I only care about speed for the competition, so it is fine if you drop all or part of this PR.

@pascal-pfeiffer
Contributor

`render_semantic_map` is the bottleneck when using a low `history_frames` count.
This PR speeds up the `render_semantic_map` function by ~200%.
Even for a higher `history_frames` count the change is still significant, and I suggest accepting it.

@lucabergamini
Contributor

> This PR speeds up the `render_semantic_map` function by ~200%.

did you compare this against #196?

@pascal-pfeiffer
Contributor

pascal-pfeiffer commented Jan 6, 2021

> did you compare this against #196?

Thanks for pointing to that PR. It does even better once the caching is done:

```
52 ms ± 4.77 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)    no changes
24.8 ms ± 777 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   this PR #140
32.9 ms ± 17.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  PR #196
```

(The slowest run took 4.23 times longer than the fastest. This could mean that an intermediate result is being cached.)

After caching:

```
17.5 ms ± 880 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  PR #196
```

Timings are for the full dataset loading (448x224 raster, 0 history frames), so the speedups are normalized to include all other operations, such as box_rasterizer, as well.
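For readers who want to reproduce this kind of comparison on just the transform step (not the full-dataset numbers above), a minimal `timeit` micro-benchmark might look like the following. Everything here is an illustrative assumption: synthetic points, an identity matrix, and simplified versions of the old and new code paths.

```python
import timeit
import numpy as np

rng = np.random.default_rng(2)
points = rng.random((10_000, 3))   # synthetic N x 3 lane coordinates
matrix = np.eye(3)                 # stand-in for world_to_image_space

def old_path():
    # Cut z, rebuild a homogeneous row, then transform (pre-PR style).
    xy = points[:, :2]
    homog = np.vstack((xy.T, np.ones(xy.shape[0])))
    return matrix.dot(homog).T[:, :2]

# Cache-time work in the PR: set w = 1 once and store transposed.
cached = points.copy()
cached[:, -1] = 1.0
cached = cached.T

def new_path():
    # Runtime work in the PR: a single dot product.
    return matrix.dot(cached).T[:, :2]

t_old = timeit.timeit(old_path, number=200)
t_new = timeit.timeit(new_path, number=200)
print(f"old: {t_old:.4f}s  new: {t_new:.4f}s")
assert np.allclose(old_path(), new_path())
```

The two paths produce identical XY output; the difference is the per-call `np.ones` allocation and `np.vstack` copy that the cached variant avoids.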

@ossama-othman ossama-othman deleted the branch woven-by-toyota:master October 14, 2025 01:32