Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Awful performance with many non-physics entities #356

Open
inodentry opened this issue Mar 30, 2024 · 5 comments
Open

Awful performance with many non-physics entities #356

inodentry opened this issue Mar 30, 2024 · 5 comments
Labels
bug Something isn't working performance Improvements or questions related to performance

Comments

@inodentry
Copy link

inodentry commented Mar 30, 2024

We had discussed this on Discord previously, but it seems like the issue still persists in the latest version.

We have a 2D game where we use LDTK (via bevy_ecs_ldtk, which is based on bevy_ecs_tilemap). When the level is spawned, that creates a lot of ECS entities in Bevy (>60k for a moderately sized level). None of those entities have any physics components, they are
just to represent the visuals of the map.

We have very few physics entities (<100 colliders for walls in the level and a few kinematic bodies, no dynamic bodies yet).

The game runs awfully slowly . I ran a trace to determine what is going on, and it looks like bevy_xpbd_2d's propagate_collider_transforms system is the culprit. As well as Bevy's normal transform propagation being called multiple times, before and after physics.

That system takes around 8ms every sub-tick. I am currently setting SubtickCount to 1, because otherwise this is completely unusable.

It seems like it is iterating through all of those non-physics entities and wasting CPU and time.

In conclusion: bevy_xpbd scales very poorly for applications with few physics entities and lots of non-physics entities. A lot of work is wasted on non-physics entities to process their transforms (even though they do not have colliders or any other physics components).

@Jondolf Jondolf added bug Something isn't working performance Improvements or questions related to performance labels Mar 30, 2024
@DaAwesomeP
Copy link

DaAwesomeP commented Apr 2, 2024

Hi, I am also seeing poor performance with colliders. In my case I do have many many colliders at a time, but when I scale down I do not see the expected performance improvement.

Is it possible to further parallelize collect_collisions? When I run a trace, I see the initialization run in a task pool but collect_collisions always appears in the main thread (I suppose Bevy/Rayon could be deciding this is best). I see it is using par_splat_map but for some reason it is not chopping up chunks into threads for me--it does it all at once in the main thread.
I was not tracing properly.

@DaAwesomeP
Copy link

Another thing I found: If I spawn a bunch of colliders, check some collisions, and remove layer filters until the only remaining filters are ones without any collider memberships, then I don't get any performance back. The performance indicates that it calculates the same collisions regardless of filters.

@DaAwesomeP
Copy link

Another idea (from Rapier): there should be an option to skip computing contact points (just collisions) for some colliders (i.e. sensors). It would also be a nice optimization to skip recomputing collisions if at least one of the objects is a sensor and has not moved since the last computation.

@Jondolf
Copy link
Owner

Jondolf commented Jun 19, 2024

Hi! Sorry I took so long, but I think I finally came up with a reasonable fix to the issue. #377 has my fix, along with some refactoring.

TLDR: When a collider is added, just mark all of its ancestors with the ColliderAncestor marker component. The ColliderTransform propagation can simply skip traversing trees with no ColliderAncestor entities. This gets rid of the majority of the unnecessary work, and brought performance from ~22 fps to ~200 fps in a test scene with one root entity and 100,000 child entities.

In the upcoming Avian release (see #346), this should be even less of an issue, as the propagation doesn't need to be done in the substepping loop.

This still doesn't fix the fact that Bevy's normal transform propagation is run before and after physics, but it does make the situation significantly better. We can look into further reducing the overhead in a follow-up.

Jondolf added a commit that referenced this issue Jun 21, 2024
…hy logic into `ColliderHierarchyPlugin` (#377)

# Objective

**Note:** This is for the `bevy-0.14` branch, because it uses new observer functionality (although it's not strictly necessary). It will be merged to `main` for the upcoming release.

Fixes #356.

To handle transforms in collider hierarchies properly, the engine must currently propagate a `ColliderTransform` component down the hierarchy several times per frame. This is made even worse by the fact that this propagation is done at every *substep*, not just once or twice per frame.

This wouldn't be too bad if it only affected physics entities, but the current system traverses through *all* entities that have a `Transform`. As reported in #356, this leads to absolutely awful performance for scenes with a large number of non-physics entities. This means that the engine scales very poorly with entity count, which is unacceptable.

## Solution

### Speed up propagation with `ColliderAncestor`

When colliders are added as children, automatically mark all ancestors with the `ColliderAncestor` component. Similarly, clean up the markers when a collider is removed.

`ColliderTransform` is *only* propagated down trees with `ColliderAncestor` entities. This way, we can essentially skip all unnecessary propagation.

In a test scene with one root entity, 100,000 child entities, and 12 substeps, I was previously getting **~22 fps** even though the entities only had a `SpatialBundle` and no physics components. With the changes in this PR, I now get **~200 fps**. Adding colliders to individual entities doesn't impact performance in any meaningful way, as the propagation is only done for those entities.

Note that this propagation will be even less of an issue in the upcoming Avian release, because there it doesn't even need to be run in the substepping loop.

The `ColliderAncestor` logic has a pretty comprehensive unit test for adding and removing colliders in a hierarchy, but I haven't verified yet if simply changing the `Parent` without moving the `Collider` works as expected. Adding more tests and potentially fixing these more niche cases can be done in a follow-up.

### Add `ColliderHierarchyPlugin`

The `ColliderBackendPlugin` was getting *big*, so I decided to extract the logic related to hierarchies and `ColliderTransform` into a new `ColliderHierarchyPlugin`.

Ideally, you could use the plugins fully independently and for example completely abandon hierarchies if you wanted to, but currently, they are still lightly coupled. Eliminating this coupling is a task for a follow-up.

---

## Migration Guide

Hierarchy and transform logic for colliders has been extracted from the `ColliderBackendPlugin` into a new `ColliderHierarchyPlugin`, which by default is included in the `PhysicsPlugins` plugin group.

If you are adding plugins manually, make sure you have both if you want that functionality.
@Jondolf
Copy link
Owner

Jondolf commented Jun 21, 2024

Follow-up! #380 uses a similar idea as #377, but for the rest of the transform propagation added by bevy_xpbd, along with refactoring the logic further. This brings the performance in the earlier test scene from 200 FPS (after #377) all the way to over 490 FPS. It should eliminate almost all of the remaining transform propagation overhead for non-physics entities :)

Another follow-up idea could be to try and unify the ColliderTransform propagation systems and the "normal" transform propagation systems added by bevy_xpbd, and somehow propagate ColliderTransform and Transform in the same pass in some places. Not sure how viable this is. Additionally, we should verify if all of the transform propagation system copies are really needed, or if we could reduce them.

Jondolf added a commit that referenced this issue Jun 21, 2024
…380)

# Objective

#377 refactored collider hierarchy logic and extracted it into a `ColliderHierarchyPlugin<C: ScalableCollider>`, along with significantly speeding up `ColliderTransform` propagation using a `ColliderAncestor` marker component to ignore trees that don't have colliders.

However, a similar marker component approach can *also* be used to speed up the many copies of transform propagation that the engine has. By limiting the propagation runs to physics entities, we could drastically reduce the overhead for scenes with lots of non-physics entities, further addressing #356.

## Solution

- Extract the ancestor marker logic from `ColliderHierarchyPlugin` into an `AncestorMarkerPlugin<C: Component>` that adds an `AncestorMarker<C>` component for all ancestors of entities with the given component `C`. The logic was also refactored to be more robust and fix some edge cases and fully support hierarchy changes, along with adding more tests.
- Use this plugin in `SyncPlugin` to mark rigid body ancestors.
- Define custom transform propagation systems that *only* traverse trees that have physics entities (rigid bodies or colliders). See the comments in `sync.rs` above the propagation systems to see how this works.
- To support custom collision backends without having to add generics to all the propagation systems, and to avoid needing to unnecessarily duplicate systems, the `ColliderBackendPlugin` now adds a general, automatically managed `ColliderMarker` component for all colliders, which allows filtering collider entities without knowing the collider type. This lets us get rid of the generics on `ColliderHierarchyPlugin`!
- I also changed some scheduling and system sets in the `PrepareSet` to account for some changes and to be more logical.

## Results

Test scene: 12 substeps, 1 root entity, 100,000 child entities. All entities have just a `SpatialBundle`, and one child entity also has a `Collider`.

- Before [#377](#377): ~22 FPS
- After [#377](#377): ~200 FPS
- After this PR: ~490 FPS

Of course, this scene is not very representative of an actual game scene, but it does show just how much of an impact the transform propagation was having. A *lot* of games (probably most of them) do have many more non-physics entities than physics entities, and the overhead added by the marker components and new checks should be very minimal in comparison.

---

## Migration Guide

Some `PrepareSet` system sets have changed order.

Before:

1. `PreInit`
2. `PropagateTransforms`
3. `InitRigidBodies`
4. `InitMassProperties`
5. `InitColliders`
6. `InitTransforms`
7. `Finalize`

After:

1. `PreInit`
2. `InitRigidBodies`
3. `InitColliders`
4. `PropagateTransforms`
5. `InitMassProperties`
6. `InitTransforms`
7. `Finalize`

Additionally, the `ColliderHierarchyPlugin` added in #377 no longer needs generics. The plugin is new anyway however, so this isn't really a breaking change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance Improvements or questions related to performance
Projects
None yet
Development

No branches or pull requests

3 participants