Unexpectedly slow CCD #39

Closed
Andlon opened this issue Mar 24, 2023 · 13 comments · Fixed by #43
@Andlon

Andlon commented Mar 24, 2023

Hi, it's me again 👋

I noticed that CCD sometimes takes a very long time relative to other components of my simulator. For example, for a small impact problem with a block dropping (with some initial velocity) on top of another, CCD can take ~20 seconds or more, whereas factorization takes milliseconds. I can reproduce similar behavior with a very simple example (see this Gist here), where a single tetrahedron is squeezed so that the top vertex gets within dhat of the lower face. Here I'm getting timings like this:

Potential duration: 6.1011e-05 seconds.
CCD duration: 2.25247 seconds.

Here's the most crucial part of the code (see the above gist for the full setup):

    // Run this several times just to demonstrate that it's not due to some initialization or similar
    for (int i = 0; i < 5; ++i) {
        const auto potential_begin = std::chrono::steady_clock::now();
        const double potential = constraints.compute_potential(mesh, deformed_vertices, dhat);
        const Eigen::VectorXd grad = constraints.compute_potential_gradient(mesh, deformed_vertices, dhat);
        const Eigen::SparseMatrix<double> hessian = constraints.compute_potential_hessian(mesh, deformed_vertices, dhat, project_spd);
        const auto potential_end = std::chrono::steady_clock::now();
        const auto potential_duration = std::chrono::duration<double>(potential_end - potential_begin).count();

        const auto ccd_begin = std::chrono::steady_clock::now();
        const double alpha = ipc::compute_collision_free_stepsize(mesh, rest_vertices, deformed_vertices);
        const auto ccd_end = std::chrono::steady_clock::now();
        const auto ccd_duration = std::chrono::duration<double>(ccd_end - ccd_begin).count();

        std::cout << "Potential duration: " << potential_duration << " seconds." << std::endl;
        std::cout << "CCD duration: " << ccd_duration << " seconds." << std::endl;
    }

Build type is Release. I don't have all that much experience with CCD, but this seems slower than I was expecting. Is there something I should configure that I haven't? I've basically just followed the docs/tutorial here. Any hints as to what I can do here, if anything, would be much appreciated :-)

@Andlon Andlon added the bug Something isn't working label Mar 24, 2023
@Andlon
Author

Andlon commented Mar 24, 2023

(I had to choose, so I labeled this as a bug, although I'm not sure it is one)

@Andlon
Author

Andlon commented Mar 24, 2023

(Also, for the record, I'm aware that the Hessian computed here would have to be passed through to_full_dof in order to be consistent with the call to build_from_full_mesh.)
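For readers following along, that mapping is a one-liner. The sketch below is purely illustrative (it reuses the `mesh`, `grad`, and `hessian` names from the snippet above) and assumes `mesh` was built with ipc::CollisionMesh::build_from_full_mesh:

    // Illustrative only: map the collision-mesh gradient/Hessian back to the
    // full mesh's DOF ordering, as mentioned in the comment above.
    const Eigen::VectorXd full_grad = mesh.to_full_dof(grad);
    const Eigen::SparseMatrix<double> full_hessian = mesh.to_full_dof(hessian);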

@zfergus
Member

zfergus commented Mar 24, 2023

It seems like you have set everything up correctly, but I did notice one thing. Here

const double alpha = ipc::compute_collision_free_stepsize(mesh, rest_vertices, deformed_vertices);

you are doing CCD starting at the rest positions and going to the deformed positions. This might be what you intended, but normally we perform CCD between successive deformed positions in the Newton optimization for example.
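As a concrete, purely illustrative example of that usage, here is a minimal sketch of calling compute_collision_free_stepsize between successive deformed states inside a backtracking line search; `V`, `dV`, and `energy` are made-up names, not part of ipc-toolkit:

    #include <Eigen/Dense>
    #include <ipc/ipc.hpp>

    // Sketch only: take a Newton step dV from the current deformed positions V,
    // limited by CCD between the successive deformed states V and V + dV.
    template <typename Energy>
    void take_ccd_limited_step(
        const ipc::CollisionMesh& mesh,
        Eigen::MatrixXd& V,        // current deformed vertex positions (#V x 3)
        const Eigen::MatrixXd& dV, // proposed Newton displacement (#V x 3)
        const Energy& energy)      // illustrative objective callback
    {
        const Eigen::MatrixXd V_target = V + dV;

        // Largest step in [0, 1] along dV that CCD certifies as intersection-free.
        const double alpha_max =
            ipc::compute_collision_free_stepsize(mesh, V, V_target);

        // Plain backtracking line search, starting from the CCD bound.
        double alpha = alpha_max;
        const double E0 = energy(V);
        while (alpha > 1e-10) {
            const Eigen::MatrixXd V_alpha = V + alpha * dV;
            if (energy(V_alpha) <= E0) {
                V = V_alpha;
                return;
            }
            alpha *= 0.5;
        }
    }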

What usually affects CCD performance is having tiny initial distances and/or large trajectories. The former causes our algorithm to refine further to avoid producing a conservative estimate of 0 for the TOI. The latter can also cause longer running times, as the algorithm has to refine further to match our length-based convergence criteria.

When I have some free time, I will try to set up a unit test from your gist and see if there are any obvious bottlenecks that can be improved.

@zfergus
Member

zfergus commented Mar 24, 2023

Looking at your gist, it doesn't seem to be either of those cases, so it should have been able to give you an answer quickly. I will definitely look into it more later.

@Andlon
Author

Andlon commented Mar 24, 2023

you are doing CCD starting at the rest positions and going to the deformed positions. This might be what you intended, but normally we perform CCD between successive deformed positions in the Newton optimization for example.

Yeah, that was just out of convenience for this example. In my simulator I'm indeed incorporating CCD into the line search.

@Andlon
Author

Andlon commented Mar 29, 2023

Unfortunately this issue is causing me a bit of a headache. When running a somewhat larger-scale example (500k tets with about 40k triangles in total), the simulation got killed after a few hours due to OOM during ipc::compute_collision_free_stepsize. This could be a coincidence, but I suspect it is related. I am working towards obtaining a reproducible example (although I guess reproducing OOM depends on your hardware configuration), but in the meantime, I might need a way to circumvent this.

Based on my very limited understanding of how this works, there are two likely candidates for the runtime & memory usage:

  • my candidate step is very long, causing an excessive number of pairs in the broadphase (imagine, for example, that the step tries to move the entire mesh diagonally, which would lead to every primitive's space-time AABB colliding with every other); a possible mitigation is sketched after this comment
  • the CCD is frequently reaching max iterations. From the Tight Inclusion paper, I understand that this may cause excessive memory usage due to storing candidate intervals.

Now, I'll do some more testing to be sure, but I don't think the first case is what's happening, since I have not noticed any excessively long steps... However, I am currently looking at an impact scene where a block is falling on top of another block with relatively high velocity and a comparably large time step ($10\,\mathrm{m/s}$ downward velocity for the top block, $\Delta t = 0.01\,\mathrm{s}$). On the faces aligned with the upward z-axis, there will indeed be a large number of collision candidates to prune, since these triangles move a large distance in a single time step, even though they move with the same velocity and therefore do not actually intersect.

Nevertheless, this does not explain the small reproducible example that I posted at the beginning of the issue, where we have a single tetrahedron. However, it might be a contributing factor to runtime and/or memory usage. In fact, I do observe long CCD runtimes for the first time step, where there are no collisions, so this could perhaps explain that particular issue.

For the second potential problem (high CCD iteration counts), I can think of two ways to maybe circumvent it:

  • reduce the maximum number of iterations. It's unclear to me what the consequences of this are, and what a reasonable number would be.
  • fall back to floating-point CCD like in the original IPC. I'm also not so clear on what the trade-offs are here. I know I will lose the correctness guarantees, but I'm not sure how considerable this loss is in practice.

Of course, it might be that the problems I'm experiencing are unrelated to the two candidate problems I presented.

I'd like to ask your advice on what you think is the most reasonable way for me to go forward for the time being. I do not need 100% correctness guarantees, but I would like to be able to set up some fairly challenging scenes and hopefully be able to run them without intersections.
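An aside on the first candidate above (a very long trial step blowing up the broad phase): one possible mitigation, sketched below purely for illustration (this is not code from this thread or the toolkit), is to cap the longest per-vertex displacement before handing the step to CCD; `max_displacement` is a made-up, scene-dependent bound, e.g. a fraction of the bounding-box diagonal.

    #include <Eigen/Dense>
    #include <ipc/ipc.hpp>

    // Sketch only: shrink absurdly long trial steps before CCD so that the
    // space-time AABBs stay small and the broad phase cannot blow up.
    double clamped_collision_free_stepsize(
        const ipc::CollisionMesh& mesh,
        const Eigen::MatrixXd& V0,     // positions at the start of the step
        const Eigen::MatrixXd& V1,     // proposed end-of-step positions
        const double max_displacement) // made-up, scene-dependent cap
    {
        // Longest single-vertex trajectory in the proposed step.
        const double max_step = (V1 - V0).rowwise().norm().maxCoeff();

        // If the step is too long, run CCD on a proportionally shortened step.
        const double prescale =
            max_step > max_displacement ? max_displacement / max_step : 1.0;
        const Eigen::MatrixXd V1_clamped = V0 + prescale * (V1 - V0);

        // Return the step size expressed relative to the ORIGINAL step V0 -> V1.
        return prescale
            * ipc::compute_collision_free_stepsize(mesh, V0, V1_clamped);
    }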

@Andlon
Author

Andlon commented Mar 29, 2023

Update: It looks like I unfortunately cannot reproduce the OOM error, presumably for the same reason that some of my simulations are not deterministic when running with multiple threads (see #36).

@zfergus
Member

zfergus commented Mar 29, 2023

Can you try using this branch: https://github.com/ipc-sim/ipc-toolkit/tree/faster-ccd

It changes the CCD strategy by clamping the minimum distance used to 1e-4. We know that TI CCD is slower the larger the minimum separation distance, so instead of directly using 0.2 * initial_distance it uses std::min(0.2 * initial_distance, 1e-4). Let me know if you run into any problems and how the performance changes.
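Spelled out, the change amounts to something like the following (illustrative only; these are not the toolkit's actual variable names):

    #include <algorithm>

    // Minimum separation distance handed to Tight Inclusion CCD.
    // Before: a fixed fraction of the current distance, which can be large.
    // faster-ccd branch: the same fraction, clamped to at most 1e-4.
    double ccd_min_distance(const double initial_distance)
    {
        return std::min(0.2 * initial_distance, 1e-4);
    }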

@Andlon
Author

Andlon commented Mar 30, 2023

Thanks for looking into this on such short notice @zfergus, really appreciate it!

Indeed, this appears to completely solve my performance concerns in the few examples I've tested. I'm working on setting up more elaborate scenes, but the preliminary results look great.

Unfortunately, I also noticed a couple of segmentation faults that seemed to stem from the CCD code (at least the last thing in my own logs was that it started to run the CCD). I haven't been able to reproduce these... I actually had this happen once before (not using this branch). So I suspect there's some undefined behavior somewhere related to the CCD, but I have no idea how to debug it as long as I cannot reproduce it. It might require running through Valgrind or a sanitizer.

I'll use this branch for now and continue setting up more experiments and report back with my experience. Thanks again!

@Andlon
Author

Andlon commented Mar 30, 2023

Update: Unfortunately, right after posting my previous comment, my currently running simulation got killed by the OOM killer. I checked my logs and it happened during CCD again. Unfortunately, I was not dumping the CCD input to file at the time; I'll see if I can provoke it again somehow. Perhaps this is a separate issue altogether though...

@Andlon
Author

Andlon commented Mar 30, 2023

Good news: I managed to reproduce it and dump the mesh data. The OOM was caused by my optimizer (not standard Projected Newton) trying to take a ridiculously long/bad step, which caused the broadphase to blow up, presumably due to too many collision candidates (as I described above). I'm not sure if the segmentation fault was due to the same problem or something else, but at least this is something I can solve on my end :-) With some luck, the segmentation faults are also gone; otherwise I'll report back here.

@Andlon
Author

Andlon commented Mar 30, 2023

Update: Overall I think the performance is definitely better, but I'm still running into the occasional severe slowdown, where CCD takes more than an order of magnitude longer than the factorization.

@Andlon
Author

Andlon commented Mar 31, 2023

I discovered a bug in my code when calling ipc-toolkit (I basically mixed up some integers for data sizes in my C wrapper), which seems to have been responsible for my segfaults, and after implementing mitigation of long/bad steps, I haven't had any issues since. The CCD also appears to work very well in some non-trivial scenes; it just very occasionally slows down a lot during an impact, for example. So, I'm happy to report that things are working very well on my end with your experimental branch 👏
