Add more regularity into particle statistics #2004

Merged
merged 1 commit into from Nov 16, 2017

gassmoeller
Member

This adds another option to the statistical particle generation. Instead of randomly choosing cells (according to their probability distribution), we assume every cell gets exactly the number of particles it should get (i.e. the integral of the probability density function over the cell times the number of particles), and only the positions of the particles are chosen randomly. This seems to be the preferred option (it also seems to decrease the likelihood of empty cells later in the computation), so I made it the new default. I am not sure whether tests will fail, but the number of particles should stay constant; only the positions will change. @egpuckett @hlokavarapu you will be interested in this.
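As a rough illustration of the new option, here is a minimal sketch of how per-cell particle counts could be derived from the per-cell integrals of the density. The function name `deterministic_particles_per_cell` and the largest-remainder rounding are assumptions made for this example, not necessarily what the PR implements. Positions inside each cell would still be drawn randomly afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not ASPECT's API): cell_weights[c] is the integral of
// the particle density over cell c. Returns the number of particles each cell
// receives so that the counts add up to n_particles.
std::vector<unsigned int>
deterministic_particles_per_cell(const std::vector<double> &cell_weights,
                                 const unsigned int         n_particles)
{
  const double total_weight =
    std::accumulate(cell_weights.begin(), cell_weights.end(), 0.0);
  assert(total_weight > 0.0);

  std::vector<unsigned int> counts(cell_weights.size());
  std::vector<double>       remainders(cell_weights.size());
  unsigned int              assigned = 0;

  // Every cell first gets the integer part of its expected particle number.
  for (std::size_t c = 0; c < cell_weights.size(); ++c)
    {
      const double expected = n_particles * cell_weights[c] / total_weight;
      counts[c]     = static_cast<unsigned int>(std::floor(expected));
      remainders[c] = expected - counts[c];
      assigned += counts[c];
    }

  // Assumption: leftover particles go to the cells with the largest
  // fractional part (largest-remainder rounding), ties in cell order.
  std::vector<std::size_t> order(cell_weights.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::stable_sort(order.begin(), order.end(),
                   [&remainders](const std::size_t a, const std::size_t b)
                   { return remainders[a] > remainders[b]; });

  for (std::size_t i = 0; assigned < n_particles; ++i, ++assigned)
    ++counts[order[i % order.size()]];

  return counts;
}
```

With this sketch, three cells with weights 1, 1 and 2 and eight particles receive 2, 2 and 4 particles. The counts do not depend on any random number, only the positions would.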

@gassmoeller
Member Author

Since so many tests actually test the particle positions, and we have no definitive proof yet that the new method is better, I reverted the changed default value, and instead added tests for both options of the new parameter. We can decide later on which value this parameter should have.

@@ -13,7 +13,7 @@ echo "Overwriting test output with reference output ..."
 SRC_PATH=`dirname $0`
 SRC_PATH=`cd $SRC_PATH/..;pwd`
 OUT=$PWD/changes.diff
-ASPECT_GENERATE_REFERENCE_OUTPUT=1 ctest -j 4 >/dev/null
+ASPECT_GENERATE_REFERENCE_OUTPUT=1 ctest -j 4 -R particle_generator_random_ >/dev/null
Contributor

Is this change on purpose?

* particle numbers per cell follow exactly the particle density,
* only the locations of particles are chosen randomly.
*/
bool random_cell_selection;
Contributor

I don't think I understand the comment. Can you document this in terms of the algorithm behind it, e.g. "first determine how many particles each cell should have based on the integral of the density over each of the cells, and then once we know how many particles we want on each cell, choose their locations randomly within each cell."

"This means particle numbers per cell can deviate statistically from "
"the integral of the probability density. If false, "
"particle numbers per cell follow exactly the particle density, "
"only the locations of particles are chosen randomly.");
Contributor

same here
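For contrast, the behavior described for random cell selection (each particle independently picks a cell with probability proportional to the integral of the density over that cell) can be sketched as below. The helper name `random_particles_per_cell` is hypothetical and the use of `std::discrete_distribution` is an assumption for illustration; the actual implementation may sample cells differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not ASPECT's API): every particle draws its cell
// independently, weighted by the integral of the density over the cell.
// The resulting counts fluctuate statistically around their expected values.
std::vector<unsigned int>
random_particles_per_cell(const std::vector<double> &cell_weights,
                          const unsigned int         n_particles,
                          std::mt19937              &rng)
{
  assert(!cell_weights.empty());

  std::discrete_distribution<std::size_t> pick_cell(cell_weights.begin(),
                                                    cell_weights.end());

  std::vector<unsigned int> counts(cell_weights.size(), 0);
  for (unsigned int p = 0; p < n_particles; ++p)
    ++counts[pick_cell(rng)];

  return counts;
}
```

The total number of particles is still exact, but a cell with a small weight can end up with no particles at all, which is the deviation the parameter description refers to.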

@egpuckett
Contributor

@bangerth and @gassmoeller I'm not sure whether this is the best place to insert myself into the discussion or not, but here goes.

I would like to make a few general comments. Everything below refers to random particle generation and subsequent use.

  0. When and why would anyone need to start with an initially random distribution of particles? I can imagine various scenarios, but do we currently have a user community that has that need? I think that until we have a specific user with a specific need for random particle generation, it will be difficult to design the 'correct' random particle generation algorithm, in the sense that it will converge as h -> 0 and possibly other parameters become small or large (e.g., number of particles).

  1. I'm inclined to have both (a) randomly generated particles over the entire domain, without regard to the number of particles per cell, and (b) ensuring a certain number of particles per cell as a criterion for that particle generation algorithm, as choices in the parameter file.

  2. Another possibility is to randomly pick a cell, if that cell doesn't already have the # of PPC specified by the user, randomly pick a position in that cell, place a particle there, and then continue until all cells have n PPC. I am almost certain that this distribution would have a valid pdf (probability density function), but it seems to lack 'independence' in some way. I'm not used to pdfs that are 'dependent', assuming this is how they are known, but I am certain they exist. I'll ask my colleagues at a seminar on Wednesday at 4:00 on random matrices. (UCD is a hotbed of probability and particularly of dependent random variables.)

  3. The fact that the seed is hardwired troubles me. Not necessarily now, but eventually, can we make this a choice in the parameter file? Also, is the same seed used on each 'independent' parallel process? That would truly reduce the 'randomness' of the particle distributions in both algorithms 1(a) and 1(b).
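The idea in the list above of picking cells at random until every cell holds n PPC could look roughly like the following sketch. The name `random_fill_order` is made up for this example, and drawing only among cells that are not yet full is an assumed shortcut that is equivalent to picking any cell and retrying when it is already full.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not ASPECT's API): returns the order in which cells
// receive particles. A real generator would also draw a random position
// inside the chosen cell at each step.
std::vector<std::size_t>
random_fill_order(const std::size_t  n_cells,
                  const unsigned int particles_per_cell,
                  std::mt19937      &rng)
{
  std::vector<std::size_t> placement_order;
  if (n_cells == 0 || particles_per_cell == 0)
    return placement_order;

  std::vector<unsigned int> counts(n_cells, 0);

  // Cells that still need particles.
  std::vector<std::size_t> open_cells(n_cells);
  std::iota(open_cells.begin(), open_cells.end(), std::size_t(0));

  while (!open_cells.empty())
    {
      std::uniform_int_distribution<std::size_t> pick(0, open_cells.size() - 1);
      const std::size_t slot = pick(rng);
      const std::size_t cell = open_cells[slot];

      placement_order.push_back(cell);

      // Once a cell is full, remove it from the list of candidates.
      if (++counts[cell] == particles_per_cell)
        {
          open_cells[slot] = open_cells.back();
          open_cells.pop_back();
        }
    }

  assert(placement_order.size() == n_cells * particles_per_cell);
  return placement_order;
}
```

In this sketch the final count per cell is always exactly n; the randomness only affects the order in which cells are visited.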

Finally, this is fun stuff. I would be delighted to participate in the design of the random parts of the particle algorithms. I have some experience in this area. In my PhD thesis I proved that a stochastic ('random') method converged to the true solution of the Kolmogorov equation.

Anyway, please do include me in the discussion here.

Also, @gassmoeller I would like to hear more about the seeds, especially how you are generating initial particle positions on different processors.

@gassmoeller gassmoeller force-pushed the extend_particle_generation branch 2 times, most recently from 48a21af to f7fb793 Compare November 14, 2017 16:44
@gassmoeller
Member Author

Wolfgang I think I addressed all of your comments.

About Gerry's points:
0. There are probably cases where one needs such a distribution, though admittedly they might be rare. But if you are looking for a user that wants to use such a distribution I am the first one. Not so much because I prefer it over the regular distribution, but because I every now and then want to compare models to CitcomS, and this is exactly their particle generation scheme (with random cell selection set to on).

  1. By (a) you mean the old algorithm? That is still available with this PR. Though it is not distributed "over the whole domain", but rather "inside of the local domain of a process". The global domain thing would be a tricky implementation that might destroy the scaling of the algorithm, so I would really want to avoid it. Do you mean by (b) essentially the algorithm by (a) with an added minimum bound for the particles per cell value? That seems problematic to me, because then you no longer follow the prescribed probability density at all (you shift the average number of particles per cell). With the new option of this pull request at least you are guaranteed that each cell contains as many particles as expected, and it also guarantees that no cell has fewer particles than the integral of the probability density function over its volume suggests.

  2. Why do you need to randomly pick a cell in this scenario? You could as well loop over all cells, and generate particles until you reach the number of PPC in this cell (assuming the random numbers are truly random, i.e. the order of generation does not play a role). I agree in general that this is a valid alternative to our current approach. It is essentially identical, except that you prescribe the number of PPC instead of the probability density (which, integrated over the cell volume, controls the number of PPC in our current approach). Of course the approaches would be the same if the probability density and the cell volume are constant across all cells.

  3. That is an excellent point that I obviously did not think about when first implementing this algorithm. The seed was identical on every process. I now made the seed an input parameter, and ensured that every parallel process starts with a different seed.
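One way to give every parallel process its own random stream from a single user-supplied seed is sketched below. The function name `make_process_rng` and the way seed and rank are combined through `std::seed_seq` are assumptions for illustration; the thread does not say how the PR combines them. The rank is passed in as a plain integer so the example does not depend on MPI.

```cpp
#include <random>

// Hypothetical helper (not ASPECT's API): builds a generator whose initial
// state depends on both the seed from the input file and the process rank.
std::mt19937
make_process_rng(const unsigned int user_seed, const unsigned int mpi_rank)
{
  // Assumption: mix seed and rank with std::seed_seq rather than adding
  // them, so that (seed 5, rank 1) and (seed 6, rank 0) do not coincide.
  std::seed_seq sequence{user_seed, mpi_rank};
  return std::mt19937(sequence);
}
```

Runs stay reproducible for a fixed seed and a fixed number of processes, while no two processes repeat the same sequence of random positions.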

@egpuckett
Contributor

@gassmoeller

> That is an excellent point that I obviously did not think about when first implementing this algorithm. The seed was identical on every process. I now made the seed an input parameter, and ensured that every parallel process starts with a different seed.

That sounds good ...

> By (a) you mean the old algorithm? That is still available with this PR. Though it is not distributed "over the whole domain", but rather "inside of the local domain of a process". The global domain thing would be a tricky implementation that might destroy the scaling of the algorithm, so I would really want to avoid it.

No, 1(a) was the algorithm as it was implemented last week; i.e., before the new implementation.

> Do you mean by (b) essentially the algorithm by (a) with an added minimum bound for the particles per cell value?

No, by 1(b) I meant what I think you just implemented; namely:

> With the new option of this pull request at least you are guaranteed that each cell contains as many particles as expected, and it also guarantees that no cell has fewer particles than the integral of the probability density function over its volume suggests.

With regards to

> Why do you need to randomly pick a cell in this scenario? You could as well loop over all cells, and generate particles until you reach the number of PPC in this cell (assuming the random numbers are truly random, i.e. the order of generation does not play a role). I agree in general that this is a valid alternative to our current approach. It is essentially identical, except that you prescribe the number of PPC instead of the probability density (which, integrated over the cell volume, controls the number of PPC in our current approach). Of course the approaches would be the same if the probability density and the cell volume are constant across all cells.

This is just a suggestion, not something I think absolutely needs to be done. My idea was to add additional randomness to the procedure. I neglected to specify two things:

(A) Just pick a cell on the current processor at random.

(B) Continue to assign particles to a cell based on its area / volume as you are currently doing. (I was thinking only of cells with the same area / volume.)

> It is essentially identical, except that you prescribe the number of PPC instead of the probability density (which, integrated over the cell volume, controls the number of PPC in our current approach). Of course the approaches would be the same if the probability density and the cell volume are constant across all cells.

Again, just an idea ...

@bangerth
Contributor

bangerth commented Nov 15, 2017 via email

@gassmoeller
Member Author

Ok, so can we conclude that the new method in this PR is a valid alternative to the old one (both are now selectable in the input file), and that we have some ideas for further alternatives at a later time?
I updated the tests and verified that only tests with either of the following conditions change:

  • they run in parallel and use the random uniform generator (now every process uses its own seed)
  • they run in parallel and use some form of particle population management (they also use the random number generator, and now have different seeds)

From my side this PR is then ready to merge.

@bangerth bangerth merged commit 9f8710c into geodynamics:master Nov 16, 2017
@bangerth
Contributor

Looks great, thanks for the work on this!

@gassmoeller gassmoeller deleted the extend_particle_generation branch January 26, 2018 07:19