TRACE: mid-timestep adding particles#821
TRACE: mid-timestep adding particles#821tigerchenlu98 wants to merge 25 commits intohannorein:mainfrom
Conversation
…updating for both removing and adding particles mid-timestep for TRACE
|
Thanks for looking into this. This mostly makes sense to me but I haven't looked in great detail at the logic yet (this is complicated!). |
|
Yep, the logic ended up being a lot more irritating than I thought it would be... but it seems to be working now!! I'm sure there's a better/more efficient way to do it than my implementation though, so I'd be happy to chat over zoom about what I did if you'd like @hannorein I uploaded a C example with a simple collision prescription I've used to test things. This should really be a unit test, but I couldn't figure out how to scale out the energy offset here without accessing some back-end fields that were easier to do in C... that's on the to-do. But we do very well, keeping the error to around 1e-5 -- better than MERCURIUS! I'm pretty excited about this -- a lot of people have been requesting this since allowing for fragmentation seems to be a big deal for early planet formation simulations, which is now a great use case for TRACE. I'll certainly do some much more extensive testing, but after that I think this may be worth writing up in a quick research note in RNAAS to let people know of the new capabilities. |
|
It looks like the new |
|
Fixed, good catch! One note here: I was pretty disappointed in TRACE's speed for this problem (we're faster than MERCURIUS, but only by about a factor of ~2) so I ran some profiling. It turns out that around ~60% of the runtime was being taken up by the pre/post timestep checks, which scale as O(N^2)! Precomputing the switching radii shaves off about ~10%, but I wonder if we can still do better. There's certainly no shortcuts if we want to do it fully robustly -- we can't get around computing pairwise distances every timestep, and in terms of absolute time we spend as much time in pre/post_ts_check as MERCURIUS does in encounter_predict, so probably no speedups on offer there. But I wonder if there's a way to prune particles unlikely to be in close encounters, such that we don't compute switching radii at all for some... food for thought. |
|
Is it 60% of the time being used for the close encounter prediction alone? |
yes! Nothing to do with collisions at all, this inefficiency was always present -- we just never tested TRACE with a system with large enough N for this to be noticeable |
|
So the encounter prediction is done pre and post step, right? I think the first obvious thing is that the post step prediction could get recycled for use in the next step most of the time without further work, which would reduce compute burden (I guess at best to ~40%). |
Good idea! I think that'll mean we need to save a lot more data for SimulationArchive, but hopefully it'll be worth it if the speedup is big enough. Ahh @dmhernan but it's actually a bit trickier, because TRACE should support all the REBOUNDx effects too, which would change particle positions between timesteps... maybe some sort of safe mode can be implemented again |
|
It's very scenario dependent. The |
|
Ah, something like a tree method sounds promising. I would start with trying to recycle work when possible since the default case is probably recalculating many pairwise distances. |
|
Sorry for being slow on this one. I'm having a hard time following all the logic! Comments in random order:
|
|
@tigerchenlu98, I see you're implementing more things, I assume to address some of the issues. But before you go much further, note that there are already 31 (!) variables and arrays defined just for internal use by TRACE. I'm not quite comfortable with this level of complexity. Someone wanting to use trace will not be able to find the right options. I think we should take a step back and think about how to simplify things. For example, having to define an extra array for a 10% speedup is not worth it in my opinion. |
|
Hi @hannorein -- totally fair, and my apologies, I well may be getting overzealous! To address your questions first:
The specific science case I'm thinking of is described in the research note -- essentially its a protoplanetary disk of ~150 planetesimals with realistic collisions. The code itself was written by Haniyeh Tajer, and I've just added it to the commit.
Oh I don't plan on including that example in the final release -- we should definitely remove it, probably in favor of the example in the note. Everything you're asking about was just my extremely hack-y way to scale out the energy of the collision to assess if TRACE was handling the post-collision dynamics well.
Ah, but it doesn't need to be updated -- anytime a particle gets added or removed, we force accept the step anyway and post_ts_check is never called! To make sure we're on the same page, I'm mostly motivated by these large N disk simulations. I think this could be a great application for TRACE, and I'm hoping we can deliver something closer to the 10x improvement we promise over MERCURIUS. The improvements I have/had in mind:
But also note that the numbers I reported are all for the N~150 case, and they'll represent even larger chunks of the runtime a N increases. If complexity on the user side is a concern, I'd be happy to write up a help page a.la Advanced WHFAST Settings. But of course I'll defer to your judgement for what you think is worth it in the first place -- let me know! |
|
Thanks for the clarifications. So there are two separate issues:
|
|
(Also happy to connect via zoom if that's easier to address to bigger questions) |
|
Hey @hannorein, probably easiest to go over this on zoom. What's your availability looking like? I'm free after 1 PM tomorrow or 2 PM on Tuesday. I'll throw out 2 PM tomorrow, but let me know if that's not convenient for you. I'd imagine N ~ 1000 is the most we'll want to try to support. That's how many particles we use in Section 5.4 of the original TRACE paper, and its only computationally tractable initially because so many of the particles immediately collide and merge. I haven't run the diagnostics for that problem, but I'd imagine if the overhead is already so bad for the N ~ 100 case we're probably are very close to the regime where the entire runtime is indeed dominated by pre/post timestep checks. |
|
Curious what the switching radius speedup is about? By the way, if a paper/note comes out of this, I have no problem at all not being an author if I'm not contributing (which is currently the case!) |
|
@dmhernan In the original TRACE implementation, the switching radius is not saved anywhere -- rather, it's recalculated for every pre- and post- timestep check. Since the switching radius depends on both particles, we do this by looping over every particle pair -- the complexity for the combined pre- and post- timestep checks thus scales as
Taken altogether, this makes the complexity of the switching radius calculation scale as |
OK, if I understand this correctly, only a loop of O(N) is needed because we calculate Hill radius per particle, and then the Hill radius of a pair is the maximum of either particle in the pair. This makes sense.
This should not work. You could check in the simple restricted problem how well this works, but I wouldn't be optimistic. However, you COULD do the inverse and recycle post-ts Hill radius calculations for use in pre-ts for many/most of the timesteps when the step was not repeated. |
|
Edit: note that all of this discussion involves O(N) algorithms, so probably not a bottleneck anyway based on my understanding. So it's not worth breaking reversibility. |
src/rebound.h
Outdated
|
|
||
| int* current_Ks; // Tracking K_ij for the entire timestep | ||
| int* temp_Ks; // temporary K array for adding/removing particles | ||
| int* previous_Ks; // K_ij for last timestep, for safe mode |
src/rebound.h
Outdated
| struct reb_vec3d com_vel; | ||
|
|
||
| int* current_Ks; // Tracking K_ij for the entire timestep | ||
| int* temp_Ks; // temporary K array for adding/removing particles |
There was a problem hiding this comment.
This one still needs to be removed (by using an array temporarily allocated then freed in particle.c until we can wrap our head around how to modify the array in place)
src/rebound.h
Outdated
| int* previous_Ks; // K_ij for last timestep, for safe mode | ||
| // double* pairwise_v2; // Keep track of pairwise velocities between particles, for safe mode | ||
| // double* pairwise_qvs; // Keep track of pairwise qdotv between particles, for safe mode | ||
| double* dcrit6; // temporary switching radius array for adding/removing particles |
|
OK 😮💨 I think that's everything we discussed today! @hannorein one thing that would be helpful is if you could take a look at output.c to double-check if everything we need for Simulationarchive is getting saved... I think I got it right, but good to have another set of eyes on it. To-do list:
|
|
Isn't LINE what we're already doing? |
This would be the LINE collision detection algorithm, not for close encounters. Hypothetically, this should just be a quick copy-paste. In other news, consider me extremely confused... at David's suggestion I took a closer look at the time profiling. It turns out that the dominant chunk of time in the integration itself is the interaction step -- in a 1 minute integration, we spend 30 seconds (!!!) in the interaction step! This is a) not the case for MERCURIUS, b) not the case for the accretion example in the original TRACE paper, and c) nothing to do with the collision prescription, I did some tests with collisions completely turned off. So surely, something is funky with the setup of the problem file. I'm taking a closer look, but would welcome thoughts if anything jumps out. |
|
The interaction step is |
|
Ah OK, I think I've figured it out -- TRACE is literally just catching less close encounters than MERCURIUS, not because of collision detection, but because the Hill radii of the particles in the example are tiny and MERCURIUS also uses a current velocity switching condition that always supersedes the Hill radius criteria. So we're integrating very little with BS with TRACE. Long winded way of saying I think everything is indeed working as intended... |
|
Hmm, sounds like it might be good to catch more close encounters with some other criterion. (This has the consequence of ejecting more particles and speeding up the simulation as well; edit: actually, I'm not certain about this, but it seems resolving encounters better might produce more scattering and ejection). |
…ult TRACE example?
|
@dmhernan, I agree that'd be good to think about a more robust switching condition at some point -- but I think that's beyond the scope of the planned Note and this PR for now. Definitely a lot of further improvements that could be made on TRACE! |
| yDot[0] = y[1] | ||
| yDot[1] = -k/m*y[0] | ||
|
|
||
| def collision_add_particle(sim_pointer, collision): |
There was a problem hiding this comment.
This test works in C, but is seems like the Python collision resolve function cannot modify the simulation? I've commented out the unit test for now, if the collision resolve function is able to add particles this should pass fine.
There was a problem hiding this comment.
This should work. What's the error? (Note that you're using orbital elements - in Jacobi coordinates to be precise - when adding the new particles. This is a bit ambiguous)
There was a problem hiding this comment.
Ah it's a TRACE issue -- passes fine with IAS15. The error:
.python(90395,0x2036f1200) malloc: Incorrect checksum for freed object 0x7fb54580fd78: probably modified after being freed. Corrupt value: 0x3fd34b0ce4c51c1e python(90395,0x2036f1200) malloc: *** set a breakpoint in malloc_error_break to debug Abort trap: 6
There was a problem hiding this comment.
So clearly a memory issue. One thing you can do is run it with valgrind to check for this kind of stuff (I was going to do that at some point myself before merging it). It's much easier if you have a pure C version that shows the same issue.
src/collision.c
Outdated
| double dy = gb.y - p2.y; | ||
| double dz = gb.z - p2.z; | ||
| double sr = p1.r + p2.r; | ||
| double sr = p1.r + p2.r; |
There was a problem hiding this comment.
Can you clean up these white space changes so that we I can squash merge everything in one clean commit?
|
So, I've setup a simple example where two particles collide and generate a fragment. #include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "rebound.h"
int ncol = 0;
int reb_collision_resolve_fragment(struct reb_simulation* const r, struct reb_collision c){
// Allow a maximum of 1 collision for this test.
if (ncol>0) return 0;
ncol++;
printf("Collision occured at t= %.3f\n", r->t);
// Add fragment
struct reb_particle f = reb_particle_com_of_pair(r->particles[c.p1], r->particles[c.p2]);
f.m = 1e-2;
f.r = 0.0001;
reb_simulation_add(r, f);
return 0;
}
int main(int argc, char* argv[]){
struct reb_simulation* r = reb_simulation_create();
r->dt = 6./365.*2.0*M_PI;
r->rand_seed = 1;
r->integrator = REB_INTEGRATOR_TRACE;
r->collision = REB_COLLISION_DIRECT;
r->collision_resolve = reb_collision_resolve_fragment;
reb_simulation_add_fmt(r, "m", 1.0);
reb_simulation_add_fmt(r, "m a e r", 1.0e-3, 1.0, 0.0, 0.004778945);
reb_simulation_add_fmt(r, "m a e r", 1.0e-3, 1.1, 0.0, 0.004778945);
reb_simulation_move_to_com(r);
for (int i=0; i<20; i++){
reb_simulation_step(r);
printf("step %-2d t=%.3f N=%d\n",i, r->t, r->N);
}
reb_simulation_free(r);
}Let me know if I set up anything incorrectly. But I'm getting a lot of memory issues including a crash when I free the simulation. Run it with Valgrind to see all of the out of bounds errors. The errors are reproducible in very short time, so hopefully this is easily debugable. |
|
I believe I (or rather, my amazing undergrad student Nina) have diagnosed the memory leak! We weren't reallocating the backup particles arrays. And, for what it's worth in my opinion we shouldn't actually include the C example here in the merge -- it may be too complicated. I think what I wrote up in AdvTRACE should suffice. I can upload the fragmentation example to my personal github and just link it in the Note. |
|
Can confirm that this fixes the leak! Thanks! |
|
closing this for now - but obviously: don't delete your branch! |
Trying to address #810 -- this is definitely far from merge-ready, but wanted to get this going in case either of you have immediate thoughts @hannorein @dmhernan
Just to recap -- this is just so we can add particles mid-timestep, for example if the user implements a collision routine that can generate fragments. I'd thought that all we'd need to do is to add a flag such that we auto-accept the step. But the reshuffling of the current_Ks array has proven a little trickier than I anticipated.