TRACE: mid-timestep adding particles by tigerchenlu98 · Pull Request #821 · hannorein/rebound

tigerchenlu98 · 2025-03-10T15:11:58Z

Trying to address #810 -- this is definitely far from merge-ready, but wanted to get this going in case either of you have immediate thoughts @hannorein @dmhernan

Just to recap -- this is just so we can add particles mid-timestep, for example if the user implements a collision routine that can generate fragments. I'd thought that all we'd need to do is to add a flag such that we auto-accept the step. But the reshuffling of the current_Ks array has proven a little trickier than I anticipated.
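For readers following the `current_Ks` issue, here is a minimal sketch of why adding or removing a particle is more than a plain reallocation. It assumes the encounter flags are stored as a row-major N x N array indexed as `i*N + j` (an assumption about the layout, not something stated in this thread). The function names are hypothetical, and the sketch is written in Python for brevity rather than taken from the C implementation. When N changes, the row stride changes, so every retained entry has to move.

```python
def resize_pair_flags(flags, n_old, n_new):
    """Copy a row-major n_old x n_old flag matrix into an n_new x n_new one.

    Entries for particles that exist in both layouts keep their (i, j)
    meaning; rows and columns of newly added particles start at 0.
    """
    resized = [0] * (n_new * n_new)
    n_keep = min(n_old, n_new)
    for i in range(n_keep):
        for j in range(n_keep):
            # The stride differs between the two layouts.
            resized[i * n_new + j] = flags[i * n_old + j]
    return resized


def remove_particle_flags(flags, n, k):
    """Drop row k and column k from a row-major n x n flag matrix."""
    return [flags[i * n + j]
            for i in range(n) if i != k
            for j in range(n) if j != k]


# Three particles, pair (0, 1) flagged as a close encounter.
flags = [0, 1, 0,
         1, 0, 0,
         0, 0, 0]
grown = resize_pair_flags(flags, 3, 4)   # a fragment is appended as index 3
```

Appending a particle at the end keeps existing indices stable, which is why only the stride needs fixing; removing a particle in the middle shifts every later index down by one.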

hannorein · 2025-03-13T21:12:18Z

Thanks for looking into this. This mostly makes sense to me but I haven't looked in great detail at the logic yet (this is complicated!).

tigerchenlu98 · 2025-03-14T16:45:09Z

Yep, the logic ended up being a lot more irritating than I thought it would be... but it seems to be working now!! I'm sure there's a better/more efficient way to do it than my implementation though, so I'd be happy to chat over zoom about what I did if you'd like @hannorein

I uploaded a C example with a simple collision prescription I've used to test things. This should really be a unit test, but I couldn't figure out how to scale out the energy offset here without accessing some back-end fields that were easier to do in C... that's on the to-do. But we do very well, keeping the error to around 1e-5 -- better than MERCURIUS!

I'm pretty excited about this -- a lot of people have been requesting this since allowing for fragmentation seems to be a big deal for early planet formation simulations, which is now a great use case for TRACE. I'll certainly do some much more extensive testing, but after that I think this may be worth writing up in a quick research note in RNAAS to let people know of the new capabilities.

hannorein · 2025-03-20T14:42:36Z

It looks like the new temp_Ks array never gets freed.

tigerchenlu98 · 2025-03-20T16:18:00Z

Fixed, good catch!

One note here: I was pretty disappointed in TRACE's speed for this problem (we're faster than MERCURIUS, but only by a factor of ~2) so I ran some profiling. It turns out that ~60% of the runtime was being taken up by the pre/post timestep checks, which scale as $O(N^2)$! Precomputing the switching radii shaves off ~10%, but I wonder if we can still do better. There are certainly no shortcuts if we want to do it fully robustly -- we can't get around computing pairwise distances every timestep, and in terms of absolute time we spend as much time in pre/post_ts_check as MERCURIUS does in encounter_predict, so there are probably no speedups on offer there. But I wonder if there's a way to prune particles unlikely to be in close encounters, such that we don't compute switching radii at all for some... food for thought.

dmhernan · 2025-03-20T16:28:39Z

Is it 60% of the time being used for the close encounter prediction alone?

tigerchenlu98 · 2025-03-20T16:31:32Z

> Is it 60% of the time being used for the close encounter prediction alone?

yes! Nothing to do with collisions at all, this inefficiency was always present -- we just never tested TRACE with a system with large enough N for this to be noticeable

dmhernan · 2025-03-20T16:36:34Z

So the encounter prediction is done pre and post step, right? I think the first obvious thing is that the post step prediction could get recycled for use in the next step most of the time without further work, which would reduce compute burden (I guess at best to ~40%).

tigerchenlu98 · 2025-03-20T17:01:02Z

> So the encounter prediction is done pre and post step, right? I think the first obvious thing is that the post step prediction could get recycled for use in the next step most of the time without further work, which would reduce compute burden to 30%.

Good idea! I think that'll mean we need to save a lot more data for SimulationArchive, but hopefully it'll be worth it if the speedup is big enough.

Ahh @dmhernan but it's actually a bit trickier, because TRACE should support all the REBOUNDx effects too, which would change particle positions between timesteps... maybe some sort of safe mode can be implemented again
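A minimal sketch of the recycling idea discussed above, with the invalidation that the REBOUNDx concern would require. Everything here is hypothetical (the class, its method names, and the `detect` callback); it only illustrates the bookkeeping and is not the TRACE implementation. The post-step result is kept and reused by the next pre-step check unless something changed the particles in between.

```python
class EncounterCache:
    """Reuse the post-step encounter check as the next pre-step check."""

    def __init__(self, detect):
        self.detect = detect        # callable: state -> collection of pairs
        self.cached = None          # result of the last post-step check
        self.n_detect_calls = 0     # counts the expensive O(N^2) passes

    def _run(self, state):
        self.n_detect_calls += 1
        return self.detect(state)

    def pre_step(self, state):
        # Recycle the previous post-step result when it is still valid.
        if self.cached is not None:
            return self.cached
        return self._run(state)

    def post_step(self, state):
        self.cached = self._run(state)
        return self.cached

    def invalidate(self):
        """Call when positions or N change between steps (e.g. extra forces
        applied between timesteps, or a particle added by the user)."""
        self.cached = None


def count_checks(n_steps, invalidate_after=None):
    cache = EncounterCache(lambda state: frozenset())
    for step in range(n_steps):
        cache.pre_step(None)
        cache.post_step(None)
        if step == invalidate_after:
            cache.invalidate()
    return cache.n_detect_calls
```

Without recycling, 10 steps cost 20 detection passes; with it, the sketch does 11, plus one more for each invalidation.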

hannorein · 2025-03-20T17:18:48Z

It's very scenario dependent. The $O(N^2)$ could be made $O(N \log N)$ with either a tree or the line search algorithm. But that's a ton of extra complexity.
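To illustrate how a pairwise search can avoid testing every pair, here is a small sort-and-sweep sketch along one axis. This is a swapped-in textbook technique, not REBOUND's tree code or its LINE algorithm, and the function names are hypothetical. Any pair that is close in 3D must also be close in x, so the sweep returns a superset of the true close pairs after an $O(N \log N)$ sort; the worst case is still $O(N^2)$ when all particles overlap in x.

```python
def sweep_candidates(xs, reach):
    """Index pairs (i, j), i < j, whose x-coordinates differ by less than reach."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])   # O(N log N)
    pairs = set()
    for a, i in enumerate(order):
        b = a + 1
        # Walk right only while neighbours are still within reach in x.
        while b < len(order) and xs[order[b]] - xs[i] < reach:
            j = order[b]
            pairs.add((min(i, j), max(i, j)))
            b += 1
    return pairs


def brute_candidates(xs, reach):
    """Reference O(N^2) version used to check the sweep."""
    n = len(xs)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(xs[i] - xs[j]) < reach}


xs = [0.0, 5.0, 0.3, 5.2, 10.0]
candidates = sweep_candidates(xs, 0.5)
```

The candidates would still need the full 3D distance test against the switching radius; the sweep only prunes pairs that cannot possibly pass it.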

dmhernan · 2025-03-20T19:14:15Z

Ah, something like a tree method sounds promising. I would start with trying to recycle work when possible since the default case is probably recalculating many pairwise distances.

hannorein · 2025-03-22T22:22:03Z

Sorry for being slow on this one. I'm having a hard time following all the logic! Comments in random order:

- I don't think it makes sense thinking about optimization without a specific science case on hand.
- In the example collision resolve function, you're removing particles and adding particles. Previously, the idea was to indicate which particles get removed by returning either 0, 1, 2, or 3 so that the particles can be removed later. If you're doing this in the collision resolve function, I think we'll end up with undefined behaviour whenever there is more than one collision because it's no longer clear which particles have been checked for a collision and which ones had a collision.
- I don't understand the purpose of the r->collisions[1].p1 = -1; statement in the example.
- You might want to rename the example from add_particle to something involving the name TRACE.
- What happens to the new dcrit6 array when a particle gets added or removed? It doesn't look like it's getting updated? I suspect the post timestep check might be wrong.
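The return-code convention mentioned above can be sketched as follows. The codes follow REBOUND's documented convention (0 keeps both particles, 1 removes p1, 2 removes p2, 3 removes both); the driver loop, its name, and the policy of skipping collisions that involve an already-doomed particle are assumptions made for illustration, not the library's actual loop. The point is that the resolver only reports an outcome, and removal happens once all collisions of the step have been looked at, so indices stay valid throughout.

```python
KEEP_BOTH, REMOVE_P1, REMOVE_P2, REMOVE_BOTH = 0, 1, 2, 3


def apply_collisions(particles, collisions, resolve):
    """Resolve every (p1, p2) index pair, then remove particles in one pass."""
    doomed = set()
    for p1, p2 in collisions:
        if p1 in doomed or p2 in doomed:
            continue  # partner already marked for removal; skip (sketch policy)
        outcome = resolve(particles, p1, p2)
        if outcome & REMOVE_P1:
            doomed.add(p1)
        if outcome & REMOVE_P2:
            doomed.add(p2)
    # Indices were never shifted while the collision list was in use.
    return [p for i, p in enumerate(particles) if i not in doomed]


def merge_into_p1(particles, p1, p2):
    """Toy resolver: p1 absorbs p2, so only p2 is reported for removal."""
    return REMOVE_P2


survivors = apply_collisions(["star", "a", "b", "c"], [(1, 2), (2, 3)], merge_into_p1)
```

If the resolver removed `b` itself during the first collision, the second entry `(2, 3)` would silently point at different particles, which is the ambiguity raised in the comment above.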

hannorein · 2025-03-23T14:00:54Z

@tigerchenlu98, I see you're implementing more things, I assume to address some of the issues. But before you go much further, note that there are already 31 (!) variables and arrays defined just for internal use by TRACE. I'm not quite comfortable with this level of complexity. Someone wanting to use TRACE will not be able to find the right options. I think we should take a step back and think about how to simplify things. For example, having to define an extra array for a 10% speedup is not worth it in my opinion.

tigerchenlu98 · 2025-03-23T16:00:50Z

Hi @hannorein -- totally fair, and my apologies, I may well be getting overzealous! To address your questions first:

> I don't think it makes sense thinking about optimization without a specific science case on hand.

The specific science case I'm thinking of is described in the research note -- essentially it's a protoplanetary disk of ~150 planetesimals with realistic collisions. The code itself was written by Haniyeh Tajer, and I've just added it to the commit.

> In the example collision resolve function, you're removing particles and adding particles. Previously, the idea was to indicate which particles get removed by returning either 0, 1, 2, or 3 so that the particles can be removed later. If you're doing this in the collision resolve function, I think we'll end up with undefined behaviour whenever there is more than one collision because it's no longer clear which particles have been checked for a collision and which ones had a collision.

> I don't understand the purpose of the r->collisions[1].p1 = -1; statement in the example.

> You might want to rename the example from add_particle to something involving the name TRACE.

Oh I don't plan on including that example in the final release -- we should definitely remove it, probably in favor of the example in the note. Everything you're asking about was just my extremely hack-y way to scale out the energy of the collision to assess if TRACE was handling the post-collision dynamics well.

> What happens to the new dcrit6 array when a particle gets added or removed? It doesn't look like it's getting updated? I suspect the post timestep check might be wrong.

Ah, but it doesn't need to be updated -- anytime a particle gets added or removed, we force accept the step anyway and post_ts_check is never called!

To make sure we're on the same page, I'm mostly motivated by these large N disk simulations. I think this could be a great application for TRACE, and I'm hoping we can deliver something closer to the 10x improvement we promise over MERCURIUS. The improvements I have/had in mind:

- Precalculating the switching radius every timestep, which gives a ~10% speedup.
- Safe mode, i.e. carrying over close encounters from the post-timestep check to the next pre-timestep check. This gives a ~15% speedup.
- Tree search for close encounter detection. I haven't implemented this yet.

But also note that the numbers I reported are all for the N~150 case, and they'll represent even larger chunks of the runtime as N increases. If complexity on the user side is a concern, I'd be happy to write up a help page à la Advanced WHFAST Settings. But of course I'll defer to your judgement on what you think is worth it in the first place -- let me know!

hannorein · 2025-03-23T18:20:02Z

Thanks for the clarifications. So there are two separate issues:

1. I'm not sure if it is a good idea to include all of this into the upstream version of REBOUND.
   - Again, I'm just really worried about the complexity. There are so many options and edge cases. Very few of them are checked for in unit tests. And there is a lot of opportunity for users to do something wrong without ever noticing.
   - It might be better to just keep this as a separate fork for this specific problem. We could link to it from the main REBOUND page. But I guess you feel strongly about including it into the main branch because others might find it useful?
   - The example you added is very detailed which is great. But it's also over 700 lines of code long! I don't understand everything. I think a much shorter example would be more useful. (Unrelated: you might want to get rid of the global variables.)
2. I don't understand everything that the algorithm is doing (yet). Clearly you have thought about this for much longer.
   - Isn't dcrit6 dependent on the position and thus different before and after the timestep? If you are caching the values then don't you break time symmetry? It might not matter, but then the algorithm is really no longer time symmetric whether there are close encounters or not.
   - I don't see why caching the dcrit6 values would lead to a much larger speedup for larger N. Isn't this pre/post check always going to be $O(N^2)$? So in the extreme limit where the pre/post checks dominate everything, you would get a 50% speedup. I don't think the added complexity is worth optimizing for, even in that case. And there is also always a $O(N^2)$ interaction term, right? So even using a tree code for the pre/post timestep/collision part will not change the algorithm's $O(N^2)$ scaling. But I might be missing something here.
   - For the science case, what is the number of particles that you are aiming for? Is it as many as possible, or do you have an idea what you really need to be converged? Again, I would be careful to avoid premature optimization.
   - Regarding the safe mode. If a user adds a particle in-between timesteps, doesn't that lead to an array overflow when reading previous_Ks?

hannorein · 2025-03-23T18:20:35Z

(Also happy to connect via zoom if that's easier to address the bigger questions)

tigerchenlu98 · 2025-03-23T18:36:23Z

Hey @hannorein, probably easiest to go over this on zoom. What's your availability looking like? I'm free after 1 PM tomorrow or 2 PM on Tuesday. I'll throw out 2 PM tomorrow, but let me know if that's not convenient for you.

I'd imagine N ~ 1000 is the most we'll want to try to support. That's how many particles we use in Section 5.4 of the original TRACE paper, and it's only computationally tractable initially because so many of the particles immediately collide and merge. I haven't run the diagnostics for that problem, but I'd imagine if the overhead is already so bad for the N ~ 100 case we're probably very close to the regime where the entire runtime is indeed dominated by pre/post timestep checks.

dmhernan · 2025-03-23T18:42:56Z

Curious what the switching radius speedup is about? By the way, if a paper/note comes out of this, I have no problem at all not being an author if I'm not contributing (which is currently the case!)

tigerchenlu98 · 2025-03-23T20:35:24Z

@dmhernan In the original TRACE implementation, the switching radius is not saved anywhere -- rather, it's recalculated for every pre- and post- timestep check. Since the switching radius depends on both particles, we do this by looping over every particle pair -- the complexity for the combined pre- and post- timestep checks thus scales as $O(N^2)$. I'm doing two things:

1. Calculate all the Hill radius criteria at the beginning of the timestep. This avoids redundant calculations of particle pairs where one of the particles has already been looped over.
2. Use the Hill radius criteria calculated in the pre-ts check for the post-ts check as well. I'm not sure if this breaks time-reversibility... curious to hear your take. But in practice, the error performance is fine.

Taken altogether, this makes the complexity of the switching radius calculation scale as $O(N)$. Of course, the computation of the pairwise separation is still $O(N^2)$.
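The split described above can be sketched as follows: one $O(N)$ pass computes a per-particle switching radius, and the $O(N^2)$ pair loop only compares separations against the larger of the two precomputed values. The sketch assumes a Hill-type radius proportional to the distance from the central body times $(m / 3 m_0)^{1/3}$, stored as its sixth power so that no roots are needed in the pair loop. The function names and the `r_crit_hill=3.0` value are illustrative assumptions, not taken from the PR.

```python
def dcrit6_per_particle(masses, positions, r_crit_hill=3.0):
    """Sixth power of each particle's switching radius, one O(N) pass.

    Index 0 is the central body and gets a radius of 0.
    """
    m0 = masses[0]
    x0, y0, z0 = positions[0]
    out = [0.0]
    for m, (x, y, z) in zip(masses[1:], positions[1:]):
        d2 = (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        mr = m / (3.0 * m0)
        # (r_crit_hill * d * cbrt(mr))**6 written without any roots.
        out.append(r_crit_hill ** 6 * d2 ** 3 * mr * mr)
    return out


def close_pairs(positions, dcrit6):
    """O(N^2) pair loop that reuses the precomputed per-particle radii."""
    pairs = []
    n = len(positions)
    for i in range(1, n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d2 ** 3 < max(dcrit6[i], dcrit6[j]):
                pairs.append((i, j))
    return pairs


masses = [1.0, 1e-3, 1e-3, 1e-3]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.05, 0.0, 0.0), (0.0, 5.0, 0.0)]
dcrit6 = dcrit6_per_particle(masses, positions)
pairs = close_pairs(positions, dcrit6)
```

Whether the values computed before the step may also be used after it is a separate question, since the radii depend on positions that change during the step.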

dmhernan · 2025-03-23T21:47:55Z

> Calculate all the Hill radius criteria at the beginning of the timestep. This avoids redundant calculations of particle pairs where one of the particles has already been looped over.

OK, if I understand this correctly, only a loop of O(N) is needed because we calculate Hill radius per particle, and then the Hill radius of a pair is the maximum of either particle in the pair. This makes sense.

> Use the Hill radius criteria calculated in the pre-ts check for the post-ts check as well. I'm not sure if this breaks time-reversibility... curious to hear your take. But in practice, the error performance is fine.

This should not work. You could check in the simple restricted problem how well this works, but I wouldn't be optimistic. However, you COULD do the inverse and recycle post-ts Hill radius calculations for use in pre-ts for many/most of the timesteps when the step was not repeated.

dmhernan · 2025-03-23T21:52:41Z

Edit: note that all of this discussion involves O(N) algorithms, so probably not a bottleneck anyway based on my understanding. So it's not worth breaking reversibility.

hannorein · 2025-03-25T01:17:17Z

src/rebound.h


    int* current_Ks; // Tracking K_ij for the entire timestep
+    int* temp_Ks;    // temporary K array for adding/removing particles 
+    int* previous_Ks; // K_ij for last timestep, for safe mode


this one can be removed

hannorein · 2025-03-25T01:18:00Z

src/rebound.h

    struct reb_vec3d com_vel;

    int* current_Ks; // Tracking K_ij for the entire timestep
+    int* temp_Ks;    // temporary K array for adding/removing particles 


This one still needs to be removed (by using an array temporarily allocated then freed in particle.c until we can wrap our head around how to modify the array in place)

hannorein · 2025-03-25T01:18:18Z

src/rebound.h

+    int* previous_Ks; // K_ij for last timestep, for safe mode
+//    double* pairwise_v2; // Keep track of pairwise velocities between particles, for safe mode
+//    double* pairwise_qvs; // Keep track of pairwise qdotv between particles, for safe mode
+    double* dcrit6;    // temporary switching radius array for adding/removing particles 


can be removed

…on.c

tigerchenlu98 · 2025-03-25T03:49:47Z

OK 😮‍💨 I think that's everything we discussed today! @hannorein one thing that would be helpful is if you could take a look at output.c to double-check if everything we need for Simulationarchive is getting saved... I think I got it right, but good to have another set of eyes on it.

To-do list:

- Tutorial Page (Advanced TRACE settings)
- Unit Tests
- LINE Support

dmhernan · 2025-03-25T03:59:40Z

Isn't LINE what we're already doing?

tigerchenlu98 · 2025-03-25T15:52:17Z

> Isn't LINE what we're already doing?

This would be the LINE collision detection algorithm, not for close encounters. Hypothetically, this should just be a quick copy-paste.

In other news, consider me extremely confused... at David's suggestion I took a closer look at the time profiling. It turns out that the dominant chunk of time in the integration itself is the interaction step -- in a 1 minute integration, we spend 30 seconds (!!!) in the interaction step! This is a) not the case for MERCURIUS, b) not the case for the accretion example in the original TRACE paper, and c) nothing to do with the collision prescription, I did some tests with collisions completely turned off. So surely, something is funky with the setup of the problem file. I'm taking a closer look, but would welcome thoughts if anything jumps out.

hannorein · 2025-03-25T17:41:46Z

The interaction step is $O(N^2)$. So I would have thought this is going to be slow for large $N$. (The example is super long, so I don't follow everything)

tigerchenlu98 · 2025-03-25T18:42:41Z

Ah OK, I think I've figured it out -- TRACE is literally just catching fewer close encounters than MERCURIUS, not because of collision detection, but because the Hill radii of the particles in the example are tiny and MERCURIUS also uses a current velocity switching condition that always supersedes the Hill radius criterion. So we're integrating very little with BS with TRACE.
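A rough numerical illustration of why a velocity-based condition can dominate for tiny masses. The exact form of the MERCURIUS condition is not given in this thread, so the sketch assumes a radius proportional to speed times timestep; `velocity_factor=0.4`, `r_crit_hill=3.0`, the planetesimal mass of 1e-8, and the timestep are all illustrative assumptions. Units are G = 1 with a solar-mass central body, so the circular speed at a = 1 is 1.

```python
import math


def hill_switch_radius(m, m_star, a, r_crit_hill=3.0):
    """Switching radius as a multiple of the Hill radius."""
    return r_crit_hill * a * (m / (3.0 * m_star)) ** (1.0 / 3.0)


def velocity_switch_radius(speed, dt, velocity_factor=0.4):
    """Assumed velocity condition: a fraction of the distance moved in one step."""
    return velocity_factor * dt * speed


dt = 6.0 / 365.0 * 2.0 * math.pi                        # a six-day step, illustrative
r_hill_small = hill_switch_radius(1e-8, 1.0, 1.0)       # tiny planetesimal
r_hill_giant = hill_switch_radius(1e-3, 1.0, 1.0)       # Jupiter-like mass
r_velocity = velocity_switch_radius(1.0, dt)

# Taking the larger of the two conditions for the planetesimal.
combined_small = max(r_hill_small, r_velocity)
```

Under these assumptions the velocity-based radius is several times larger than the Hill-based one for the planetesimal, while the Hill-based radius wins for the Jupiter-like mass, which is consistent with a Hill-only criterion flagging far fewer encounters in a disk of small bodies.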

Long-winded way of saying I think everything is indeed working as intended...

dmhernan · 2025-03-25T19:50:46Z

Hmm, sounds like it might be good to catch more close encounters with some other criterion. (This has the consequence of ejecting more particles and speeding up the simulation as well; edit: actually, I'm not certain about this, but it seems resolving encounters better might produce more scattering and ejection).

tigerchenlu98 · 2025-03-25T20:58:23Z

@dmhernan, I agree that'd be good to think about a more robust switching condition at some point -- but I think that's beyond the scope of the planned Note and this PR for now. Definitely a lot of further improvements that could be made on TRACE!

tigerchenlu98 · 2025-03-26T14:13:38Z

rebound/tests/test_trace.py

    yDot[0] = y[1]
    yDot[1] = -k/m*y[0]

+def collision_add_particle(sim_pointer, collision):


This test works in C, but it seems like the Python collision resolve function cannot modify the simulation? I've commented out the unit test for now, if the collision resolve function is able to add particles this should pass fine.

This should work. What's the error? (Note that you're using orbital elements - in Jacobi coordinates to be precise - when adding the new particles. This is a bit ambiguous)

Ah it's a TRACE issue -- passes fine with IAS15. The error:

    .python(90395,0x2036f1200) malloc: Incorrect checksum for freed object 0x7fb54580fd78: probably modified after being freed.
    Corrupt value: 0x3fd34b0ce4c51c1e
    python(90395,0x2036f1200) malloc: *** set a breakpoint in malloc_error_break to debug
    Abort trap: 6

So clearly a memory issue. One thing you can do is run it with valgrind to check for this kind of stuff (I was going to do that at some point myself before merging it). It's much easier if you have a pure C version that shows the same issue.

hannorein · 2025-03-26T14:20:04Z

src/collision.c

                        double dy = gb.y - p2.y; 
                        double dz = gb.z - p2.z; 
-                        double sr = p1.r + p2.r; 
+                        double sr = p1.r + p2.r;


Can you clean up these white space changes so that I can squash merge everything in one clean commit?

hannorein · 2025-03-26T23:40:17Z

So, I've setup a simple example where two particles collide and generate a fragment.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "rebound.h"


int ncol = 0;

int reb_collision_resolve_fragment(struct reb_simulation* const r, struct reb_collision c){
    // Allow a maximum of 1 collision for this test.
    if (ncol>0) return 0;
    ncol++;
    
    printf("Collision occurred at t= %.3f\n", r->t);

    // Add fragment
    struct reb_particle f = reb_particle_com_of_pair(r->particles[c.p1], r->particles[c.p2]);
    f.m = 1e-2;
    f.r = 0.0001;
    reb_simulation_add(r, f);

    return 0;
}


int main(int argc, char* argv[]){
    struct reb_simulation* r = reb_simulation_create();
    r->dt = 6./365.*2.0*M_PI;

    r->rand_seed = 1;
 
    r->integrator = REB_INTEGRATOR_TRACE;
    r->collision = REB_COLLISION_DIRECT;
    r->collision_resolve = reb_collision_resolve_fragment;

    reb_simulation_add_fmt(r, "m", 1.0);
    reb_simulation_add_fmt(r, "m a e r", 1.0e-3, 1.0, 0.0, 0.004778945);
    reb_simulation_add_fmt(r, "m a e r", 1.0e-3, 1.1, 0.0, 0.004778945);
    reb_simulation_move_to_com(r);

    for (int i=0; i<20; i++){
        reb_simulation_step(r);
        printf("step %-2d  t=%.3f  N=%d\n",i, r->t, r->N);
    }
    reb_simulation_free(r);
}

Let me know if I set up anything incorrectly. But I'm getting a lot of memory issues including a crash when I free the simulation. Run it with Valgrind to see all of the out of bounds errors. The errors are reproducible in very short time, so hopefully this is easily debuggable.

tigerchenlu98 · 2025-03-27T10:56:27Z

I believe I (or rather, my amazing undergrad student Nina) have diagnosed the memory leak! We weren't reallocating the backup particles arrays.

And, for what it's worth in my opinion we shouldn't actually include the C example here in the merge -- it may be too complicated. I think what I wrote up in AdvTRACE should suffice. I can upload the fragmentation example to my personal github and just link it in the Note.

hannorein · 2025-03-27T12:13:30Z

Can confirm that this fixes the leak! Thanks!

hannorein · 2025-04-01T20:37:35Z

closing this for now - but obviously: don't delete your branch!

Tiger Lu added 3 commits March 10, 2025 10:58

adding particles mid-timestep

c78b988

this is PROFOUNDLY inelegant, but I think it works. Fixed current_Ks …

4ab884a

…updating for both removing and adding particles mid-timestep for TRACE

removed unneeded variables in TRACE

bad175f

Tiger Lu added 3 commits March 13, 2025 17:16

allocate memory for temp_Ks

7858757

added example

b5b5a83

clean up problem file

1377ed6

Check for collisions with central star, regardless of peri mode

1ac0b8e

free temp_Ks, and precompute switching radius each timestep

f95eede

Tiger Lu added 2 commits March 23, 2025 09:43

added TRACE safe mode, and main fragmentation example

acd855b

fix python side

fbaffad

hannorein reviewed Mar 25, 2025

Tiger Lu added 3 commits March 24, 2025 21:57

removed more fields, fixed typos

23139da

removed temp_Ks array. That was...surprisingly painless

d11d805

added simulationarchive fields and removed force_accepts from collisi…

d90e243

…on.c

Added Advanced TRACE example -- maybe this is better just in the defa…

42aed2c

…ult TRACE example?

add unit tests

55eac77

tigerchenlu98 commented Mar 26, 2025

hannorein reviewed Mar 26, 2025

Tiger Lu added 2 commits March 26, 2025 10:43

LINE support plus minor cleanup

cbd9f73

updated problem file and inputs

bdd777f

Tiger Lu added 2 commits March 27, 2025 06:25

fixed memory leak

789d461

whitespace fixes plus minor bookkeeping

929ab41

Tiger Lu added 2 commits March 27, 2025 07:41

docs update

1524b5d

whitespace

375eb96

Tiger Lu and others added 2 commits March 28, 2025 12:31

328

f57b5c4

added back sprintf statements

e966a8b

hannorein closed this Apr 1, 2025
