New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add capability to store Monte Carlo sample data in parallel #13906
Conversation
f5823db
to
dea3f22
Compare
Job Documentation on d666216 wanted to post the following: View the site here This comment will be updated on new commits. |
@aeslaughter could you take a look at this PR? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't prefer this design as it restricts the distributed version to a very specific case where the number of matrices match the number or processors. It also introduces two new virtual methods to implement.
My goal for the Sampler object is for the number of matrices and rows per matrix to be arbitrary. I don't want to run into a case in the future were are rigid design of the matrix and row structure makes it difficult to implement.
I am open to discuss this further. Hopefully we can get something that is general but allows you to meet your needs.
What if "reinit" accepted size pairs for the number of rows and columns for each matrix and an overloaded version for when they are all the same:
reinit( (10,2), (11,3), ...)
reinit(3, (10, 10))
Also, the distribution should can just break matrices wherever needed and we can provide a smart
iterator: localSampleRowBegin(), localSampleRowEnd() that would allow each row to be accessed for on the local process.
...ols/test/tests/multiapps/batch_commandline_control/gold/master_multiple_out_storage_0002.csv
Show resolved
Hide resolved
Another possibility is that I am trying to be too general, in that case we should just remove the multiple DenseMatrices and just use one. This would allow a single offset to be stored for distributing the data. |
The current implementation should allow the number of rows/matrices to be different from the number of processors? Maybe I misunderstand what said about "specific case where the number of matrices match the number or processors" If you agree to have just one DenseMatrix, it will be easier to implement at least for now. Could you explain to me how you would like the DenseMatirx to be distributed across all processors? Do you think the use of linear partitioning of number of rows is not a good idea? |
@bwspenc since you are also interested in this for Grizzly application, so I am tagging you to join in our discussion. |
dea3f22
to
61f73c9
Compare
61f73c9
to
acdbcca
Compare
@jiangwen84 I misunderstood what you were doing with the two getTotal... methods. Also, the distribution looks fine. I just went through this too quick, sorry about that. The only thing don't like about this is two function overrides. Can we go with setters instead that are called from the constructor of MonteCarloSampler. The setTotalNumberOfMatrices(42);
setTotalNumberOfSamples(10000); |
acdbcca
to
9035ca0
Compare
9035ca0
to
3778376
Compare
@aeslaughter Can you take another look at? I made the sample matrix partitioning consistent with multi-app batch. I also added a new option in MCSampler to reuse the computed sample matrix because recompute might take a long time when there are more than 100million samples. Thanks! |
3778376
to
87612f6
Compare
87612f6
to
d666216
Compare
@aeslaughter any thoughts? Thanks. |
I like where this is going, I am going to add a few things to this rather than try to explain what I am thinking. |
(refs idaholab#13906) The test removed isn't needed, it is executed with threads by the test system
(refs idaholab#13906) The test removed isn't needed, it is executed with threads by the test system
(refs idaholab#13906) The test removed isn't needed, it is executed with threads by the test system
Please do not close ref #13879.