Add more regularity into particle statistics #2004

gassmoeller · 2017-11-13T18:32:32Z

This adds another option to the statistical particle generation. Instead of randomly choosing cells (according to their probability distribution), we assume every cell gets exactly the number of particles it should get (i.e. the integral of the probability density function over the cell times the number of particles), and only the positions of the particles are chosen randomly. This seems to be the preferred option (it also seems to decrease the likelihood of empty cells later in the computation), so I made it the new default. I am not sure if tests will fail; the number of particles should stay constant, but positions will change. @egpuckett @hlokavarapu you will be interested in this.
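
For illustration, here is a minimal sketch of the two ways of deciding how many particles each cell gets. It is not the code from this PR: the function names are hypothetical, the cell weights stand in for the integral of the probability density over each cell, and the cumulative rounding is just one simple way to make the deterministic counts add up to the requested total.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: deterministic counts. Each cell gets its share of
// n_particles according to its weight. Rounding the cumulative share (rather
// than each cell's share) keeps the sum of all counts equal to n_particles.
std::vector<unsigned int>
particles_per_cell_deterministic(const std::vector<double> &cell_weights,
                                 const unsigned int n_particles)
{
  std::vector<unsigned int> counts(cell_weights.size(), 0);

  double total_weight = 0.0;
  for (const double w : cell_weights)
    total_weight += w;
  if (total_weight <= 0.0)
    return counts;

  double cumulative_weight = 0.0;
  unsigned int assigned = 0;
  for (std::size_t i = 0; i < cell_weights.size(); ++i)
    {
      cumulative_weight += cell_weights[i];
      const unsigned int target = static_cast<unsigned int>(
        std::llround(cumulative_weight / total_weight * n_particles));
      counts[i] = target - assigned;
      assigned = target;
    }
  return counts;
}

// Hypothetical helper: random cell selection. Every particle picks a cell
// with probability proportional to the cell weight, so the counts only
// match the weights on average.
std::vector<unsigned int>
particles_per_cell_random(const std::vector<double> &cell_weights,
                          const unsigned int n_particles,
                          std::mt19937 &rng)
{
  std::vector<unsigned int> counts(cell_weights.size(), 0);

  double total_weight = 0.0;
  for (const double w : cell_weights)
    total_weight += w;
  if (total_weight <= 0.0)
    return counts;

  std::discrete_distribution<std::size_t> pick_cell(cell_weights.begin(),
                                                    cell_weights.end());
  for (unsigned int p = 0; p < n_particles; ++p)
    ++counts[pick_cell(rng)];
  return counts;
}
```

Both variants place the same total number of particles; only the deterministic one guarantees that a cell with nonzero weight receives its expected share (up to rounding).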

gassmoeller · 2017-11-13T20:26:05Z

Since so many tests actually test the particle positions, and we have no definitive proof yet that the new method is better, I reverted the changed default value, and instead added tests for both options of the new parameter. We can decide later on which value this parameter should have.

bangerth · 2017-11-14T00:32:01Z

cmake/generate_reference_output.sh

@@ -13,7 +13,7 @@ echo "Overwriting test output with reference output ..."
 SRC_PATH=`dirname $0`
 SRC_PATH=`cd $SRC_PATH/..;pwd`
 OUT=$PWD/changes.diff
-ASPECT_GENERATE_REFERENCE_OUTPUT=1 ctest -j 4 >/dev/null
+ASPECT_GENERATE_REFERENCE_OUTPUT=1 ctest -j 4 -R particle_generator_random_ >/dev/null


Is this change on purpose?

bangerth · 2017-11-14T00:33:39Z

include/aspect/particle/generator/probability_density_function.h

+           * particle numbers per cell follow exactly the particle density,
+           * only the locations of particles are chosen randomly.
+           */
+          bool random_cell_selection;


I don't think I understand the comment. Can you document this in terms of the algorithm behind it, e.g. "first determine how many particles each cell should have based on the integral of the density over each of the cells, and then once we know how many particles we want on each cell, choose their locations randomly within each cell."
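
The second stage described here (per-cell counts are already known, locations are drawn randomly inside each cell) could look roughly like the sketch below. It assumes axis-aligned rectangular 2D cells; `BoxCell`, `Point`, and `generate_positions_in_cells` are made-up names for illustration, and a real mesh with general cell shapes would need a different way to sample a point inside a cell.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <vector>

using Point = std::array<double, 2>;

// Hypothetical cell type: an axis-aligned rectangle given by two corners.
struct BoxCell
{
  Point lower;
  Point upper;
};

// For each cell, draw counts[c] uniformly distributed points inside it.
// The number of particles per cell is fixed by the caller; only the
// positions are random.
std::vector<Point>
generate_positions_in_cells(const std::vector<BoxCell> &cells,
                            const std::vector<unsigned int> &counts,
                            std::mt19937 &rng)
{
  std::vector<Point> positions;
  for (std::size_t c = 0; c < cells.size() && c < counts.size(); ++c)
    {
      std::uniform_real_distribution<double> x(cells[c].lower[0],
                                               cells[c].upper[0]);
      std::uniform_real_distribution<double> y(cells[c].lower[1],
                                               cells[c].upper[1]);
      for (unsigned int i = 0; i < counts[c]; ++i)
        {
          const Point position = {x(rng), y(rng)};
          positions.push_back(position);
        }
    }
  return positions;
}
```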

bangerth · 2017-11-14T00:34:02Z

source/particle/generator/probability_density_function.cc

+                                   "This means particle numbers per cell can deviate statistically from "
+                                   "the integral of the probability density. If false, "
+                                   "particle numbers per cell follow exactly the particle density, "
+                                   "only the locations of particles are chosen randomly.");


egpuckett · 2017-11-14T03:48:15Z

@bangerth and @gassmoeller I'm not sure whether this is the best place to insert myself into the discussion or not, but here goes.

I would like to make a few general comments. Everything below refers to random particle generation and subsequent use.

When and why would anyone need to start with an initially random distribution of particles? I can imagine various scenarios, but do we currently have a user community that has that need? I think that until we have a specific user with a specific need for random particle generation, it will be difficult to design the 'correct' random particle generation algorithm, in the sense that it will converge as h -> 0 and possibly other parameters become small or large (e.g., number of particles).
1. I'm inclined to have both (a) randomly generated particles over the entire domain, without regard to the number of particles per cell, and (b) ensuring a certain number of particles per cell as a criterion for that particle generation algorithm, as choices in the parameter file.
2. Another possibility is to randomly pick a cell, if that cell doesn't already have the # of PPC specified by the user, randomly pick a position in that cell, place a particle there, and then continue until all cells have n PPC. I am almost certain that this distribution would have a valid pdf (probability density function), but it seems to lack 'independence' in some way. I'm not used to pdfs that are 'dependent', assuming this is how they are known, but I am certain they exist. I'll ask my colleagues at a seminar on Wednesday at 4:00 on random matrices. (UCD is a hotbed of probability and particularly of dependent random variables.)
3. The fact that the seed is hardwired troubles me. Not necessarily now, but eventually, can we make this a choice in the parameter file? Also, is the same seed used on each 'independent' parallel process? That would truly reduce the 'randomness' of the particle distributions in both algorithms 1(a) and 1(b).

Finally, this is fun stuff. I would be delighted to participate in the design of the random parts of the particle algorithms. I have some experience in this area. In my PhD thesis I proved that a stochastic ('random') method converged to the true solution of the Kolmogorov equation.

Anyway, please do include me in the discussion here.

Also, @gassmoeller I would like to hear more about the seeds, especially how you are generating initial particle positions on different processors.

gassmoeller · 2017-11-14T17:00:12Z

Wolfgang, I think I addressed all of your comments.

About Gerry's points:
0. There are probably cases where one needs such a distribution, though admittedly they might be rare. But if you are looking for a user that wants to use such a distribution, I am the first one. Not so much because I prefer it over the regular distribution, but because every now and then I want to compare models to CitcomS, and this is exactly their particle generation scheme (with random cell selection set to on).
1. By (a) you mean the old algorithm? That is still available with this PR. Though it is not distributed "over the whole domain", but rather "inside of the local domain of a process". The global domain thing would be a tricky implementation that might destroy the scaling of the algorithm, so I would really want to avoid it. Do you mean by (b) essentially the algorithm by (a) with an added minimum bound for the particles per cell value? That seems problematic to me, because then you no longer follow the prescribed probability density at all (you shift the average number of particles per cell). With the new option of this pull request at least you are guaranteed that each cell contains as many particles as expected, and it also guarantees that no cell has less particles than the integral of the probability density function over its volume suggests.
2. Why do you need to randomly pick a cell in this scenario? You could as well loop over all cells, and generate particles until you reach the number of PPCs in this cell (assuming the random numbers are truly random, i.e. the order of generation does not play a role). I agree in general that this is a valid alternative to our current approach. It is essentially identical, except that you prescribe the number of PPC, instead of the probability density (which integrated over the cell volume controls the number of PPC in our current approach). Of course the approaches would be the same, if the probability density, and the cell volume is constant between all cells.
3. That is an excellent point that I obviously did not think about when first implementing this algorithm. The seed was identical on every process. I now made the seed an input parameter, and ensured that every parallel process starts with a different seed.
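
As a rough sketch of the seeding idea in the last point: one simple way to give every parallel process its own random stream is to offset a user-supplied seed by the process rank. This is an assumption made for illustration, not necessarily what the PR does; `make_process_rng` and `first_draws` are hypothetical names, and in an MPI program the rank would typically come from `MPI_Comm_rank`.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: build a generator whose seed depends on both the
// user-supplied seed (e.g. read from the parameter file) and the rank of
// the calling process, so that no two processes share a seed.
std::mt19937 make_process_rng(const unsigned int user_seed,
                              const unsigned int process_rank)
{
  return std::mt19937(user_seed + process_rank);
}

// Convenience function for inspecting a stream: returns the first n raw
// values the generator produces. The generator is taken by value, so the
// caller's generator is not advanced.
std::vector<std::mt19937::result_type>
first_draws(std::mt19937 rng, const std::size_t n)
{
  std::vector<std::mt19937::result_type> values;
  values.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    values.push_back(rng());
  return values;
}
```

With this setup a run stays reproducible for a fixed seed and a fixed number of processes, while the streams on rank 0 and rank 1 differ.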

egpuckett · 2017-11-14T18:22:29Z

@gassmoeller

> That is an excellent point that I obviously did not think about when first implementing this algorithm. The seed was identical on every process. I now made the seed an input parameter, and ensured that every parallel process starts with a different seed.

That sounds good ...

> by (a) you mean the old algorithm? That is still available with this PR. Though it is not distributed "over the whole domain", but rather "inside of the local domain of a process". The global domain thing would be a tricky implementation that might destroy the scaling of the algorithm, so I would really want to avoid it.

No, 1(a) was the algorithm as it was implemented last week; i.e., before the new implementation.

> Do you mean by (b) essentially the algorithm by (a) with an added minimum bound for the particles per cell value?

No, by 1(b) I meant what I think you just implemented; namely:

> With the new option of this pull request at least you are guaranteed that each cell contains as many particles as expected, and it also guarantees that no cell has less particles than the integral of the probability density function over its volume suggests.

With regard to

> Why do you need to randomly pick a cell in this scenario? You could as well loop over all cells, and generate particles until you reach the number of PPCs in this cell (assuming the random numbers are truly random, i.e. the order of generation does not play a role). I agree in general that this is a valid alternative to our current approach. It is essentially identical, except that you prescribe the number of PPC, instead of the probability density (which integrated over the cell volume controls the number of PPC in our current approach). Of course the approaches would be the same, if the probability density, and the cell volume is constant between all cells.

This is just a suggestion, not something I think absolutely needs to be done. My idea was to add additional randomness to the procedure. I neglected to specify two things:

(A) Just pick a cell on the current processor at random.

(B) Continue to assign particles to a cell based on its area / volume as you are currently doing. (I was thinking only of cells with the same area / volume.)

> It is essentially identical, except that you prescribe the number of PPC, instead of the probability density (which integrated over the cell volume controls the number of PPC in our current approach). Of course the approaches would be the same, if the probability density, and the cell volume is constant between all cells.

Again, just an idea ...
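
To make the idea in this comment concrete, here is a small sketch of filling cells by repeated random picks, under the simplifying assumption that all cells have the same size and should end up with the same number of particles per cell. The function name is hypothetical. It records the order in which cells receive particles, since the order is the only thing that differs from a plain loop over all cells: the final per-cell counts are identical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: repeatedly pick a cell at random; if it still has
// room, give it one more particle. Picks that land on a full cell are
// simply discarded. Returns the cell index of every placed particle, in
// placement order.
std::vector<std::size_t>
fill_cells_by_random_picks(const std::size_t n_cells,
                           const unsigned int particles_per_cell,
                           std::mt19937 &rng)
{
  std::vector<std::size_t> placement_order;
  if (n_cells == 0 || particles_per_cell == 0)
    return placement_order;

  std::vector<unsigned int> counts(n_cells, 0);
  std::uniform_int_distribution<std::size_t> pick_cell(0, n_cells - 1);
  const std::size_t n_total = n_cells * particles_per_cell;

  while (placement_order.size() < n_total)
    {
      const std::size_t cell = pick_cell(rng);
      if (counts[cell] < particles_per_cell)
        {
          ++counts[cell];
          placement_order.push_back(cell);
        }
    }
  return placement_order;
}
```

Because picks on full cells are rejected, the number of random draws grows beyond the number of particles as the cells fill up, which is one practical argument for looping over cells instead when the positions inside a cell are drawn independently anyway.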

bangerth · 2017-11-15T14:47:46Z

> When and why would anyone *need* to start with an initially random distribution of particles? I can imagine various scenarios, but do we currently have a user community that has that need? I think that until we have a specific user with a specific need for random particle generation, it will be difficult to design the 'correct' random particle generation algorithm, in the sense that it will converge as h -> 0 and possibly other parameters become small or large (e.g., number of particles).

If you think of cases where you want to treat particles passively (for example just to track where material goes), then there is really no reason to associate particles (or the number of particles) with cells. Whether a cell is large or small, or whether a cell already has a large number of particles or not, is not important -- you just want particles *somewhere*. If you want to do statistics, then choosing them randomly located within the domain, regardless of a mesh, is the right choice.

@gassmoeller:

> Though it is not distributed "over the whole domain", but rather "inside of the local domain of a process". The global domain thing would be a tricky implementation that might destroy the scaling of the algorithm, so I would really want to avoid it.

I think this would actually be quite simple to implement, and I suspect that in practice it also won't have much of an effect on the scalability of the algorithm. Come talk to me sometime if you're curious :)

gassmoeller · 2017-11-15T16:31:12Z

Ok, so can we conclude that the new method in this PR is a valid alternative to the old one (both are now selectable in the input file), and that we have some ideas for further alternatives at a later time?
I updated the tests and verified that only tests with either of the following conditions change:

- they run in parallel and use the random uniform generator (now every process uses its own seed)
- they run in parallel and use some form of particle population management (they also use the random number generator, and now have different seeds)

From my side this PR is then ready to merge.

bangerth · 2017-11-16T03:57:11Z

Looks great, thanks for the work on this!

gassmoeller force-pushed the extend_particle_generation branch from 22e6c03 to 77b7e6e Compare November 13, 2017 20:24

bangerth reviewed Nov 14, 2017

gassmoeller force-pushed the extend_particle_generation branch 2 times, most recently from 48a21af to f7fb793 Compare November 14, 2017 16:44

Add more regularity into particle statistics

e83085a

gassmoeller force-pushed the extend_particle_generation branch from f7fb793 to e83085a Compare November 15, 2017 16:25

bangerth merged commit 9f8710c into geodynamics:master Nov 16, 2017

gassmoeller deleted the extend_particle_generation branch January 26, 2018 07:19
