Reduce large floating point accumulation error in high photon simulations #41

Closed · fangq opened this issue Jul 20, 2018 · 1 comment

fangq commented Jul 20, 2018

MCX has used atomic operations for fluence accumulation by default for several years. However, a drop in fluence intensity has been observed in simulations with large photon numbers. For example, running the script below with the current MCX github code produces the plot shown below.

clear cfg
cfg.vol=uint8(ones(60,60,60));       % 60x60x60 homogeneous volume
cfg.srcpos=[30 30 1];                % pencil beam entering at the center of the z=1 face
cfg.srcdir=[0 0 1];                  % pointing along +z
cfg.gpuid=1;
cfg.autopilot=1;
cfg.prop=[0 0 1 1;0.005 1 0 1.37];   % [mua mus g n] for label 0 (ambient) and label 1
cfg.tstart=0;
cfg.tend=5e-9;
cfg.tstep=cfg.tend;                  % a single 5 ns time gate

figure
hold on;
for i=6:9
    cfg.nphoton=10^i;                % 1e6 to 1e9 photons
    flux=mcxlab(cfg);
    plot(log10(abs(squeeze(flux.data(30,30,:)))),'-','color', rand(1,3));
end

[figure: accum_error_old — log10 fluence along the z-axis at (x,y)=(30,30) for 10^6 to 10^9 photons, showing the intensity drop at higher photon counts]

The drop in intensity was not caused by data racing, as was the case when non-atomic operations were used, but by the accumulation of round-off errors. In the region near the source, the deposited energy quickly grows to a large value. When a new energy deposit (a very small value) is added on top of this large value, accuracy becomes a problem.
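
To illustrate the effect with a standalone sketch (plain host code, not MCX; the per-step deposit of 1e-4 is a made-up value): once a single-precision accumulator grows to roughly 2^24 times the increment, every further addition is rounded away entirely.

#include <stdio.h>

int main(void)
{
    const float w = 1e-4f;      /* hypothetical per-step energy deposit */
    const long  n = 100000000;  /* 1e8 deposits into the same voxel */
    float  facc = 0.0f;
    double dacc = 0.0;
    for (long i = 0; i < n; i++) {
        facc += w;              /* stalls near 2048, once w < 0.5 ulp of facc */
        dacc += w;
    }
    printf("float  accumulator: %g\n", facc);   /* ~2048, far below the true sum */
    printf("double accumulator: %g\n", dacc);   /* ~10000, essentially n*w */
    return 0;
}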

This is a serious problem because, as GPU computing capacity increases, most users will choose to run large photon simulations. We must be able to run large photon numbers without losing accuracy.

There are a few possible solutions to this problem.

The easiest solution is to change the energy storage to double. However, consumer GPUs have very poor double-precision throughput, so moving to double-precision addition would likely cause a noticeable drop in speed.
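
For reference, the double-precision atomic add this option would rely on looks like the following sketch. On devices of compute capability 6.0 and newer, CUDA provides atomicAdd(double*, double) natively; the compare-and-swap loop below is the standard fallback for older GPUs (the function name atomicAddDouble is illustrative, not MCX code).

__device__ double atomicAddDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        /* swap in old+val only if no other thread has touched the voxel meanwhile */
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}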

The standard way to sum small values into a large floating-point accumulator is Kahan (compensated) summation, which is what we used in MMC. However, it requires a multi-step update and additional storage for the compensation term. When combined with atomic operations, an atomic Kahan summation is very difficult to implement on the GPU.
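
For reference, the classic (non-atomic) Kahan update is sketched below (the textbook form, not MMC's exact code). Each accumulator needs a second compensation word, and the two must be updated together as one read-modify-write, which is exactly what a single hardware atomic cannot provide.

__device__ void kahan_add(float *sum, float *comp, float val)
{
    float y = val - *comp;    /* re-inject the low-order bits lost previously */
    float t = *sum + y;       /* big + small: low-order bits of y are dropped */
    *comp = (t - *sum) - y;   /* measure what was lost in this addition */
    *sum  = t;                /* sum and comp must be committed together */
}

Note that aggressive compiler options such as fast-math can optimize the compensation term away, so this pattern also requires careful compilation.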

Another idea is to use repetitions (-r) to split a large simulation into smaller chunks and sum the solutions together; for example, for 1e9 photons with 10 respins, we run ten 10^8-photon simulations. This reduces the round-off error, but the repeated kernel launches add a large overhead, sometimes significantly more than the kernel execution itself. In addition, even at 1e8 photons, the plot above shows that the drop in intensity remains noticeable.
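
For illustration, a host-side sketch of the respin idea is shown below; the kernel and function names are hypothetical placeholders, not MCX's actual API. Each partial single-precision fluence volume is folded into a double-precision total on the host, so no float accumulator grows too large, but every repetition repays the launch and copy overhead.

#include <cuda_runtime.h>
#include <stddef.h>

/* placeholder standing in for MCX's photon-transport kernel */
__global__ void mc_partial_kernel(float *fluence, long nphoton, unsigned int seed)
{
    /* ... photon transport omitted in this sketch ... */
}

void run_with_respin(double *h_total, float *h_partial, float *d_fluence,
                     size_t nvoxel, long nphoton, int nrepeat,
                     dim3 grid, dim3 block)
{
    for (int r = 0; r < nrepeat; r++) {
        cudaMemset(d_fluence, 0, nvoxel * sizeof(float));
        mc_partial_kernel<<<grid, block>>>(d_fluence, nphoton / nrepeat, 1000u + r);
        cudaMemcpy(h_partial, d_fluence, nvoxel * sizeof(float),
                   cudaMemcpyDeviceToHost);            /* implicitly syncs each launch */
        for (size_t i = 0; i < nvoxel; i++)
            h_total[i] += (double)h_partial[i];        /* combine partial runs in double */
    }
}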

A robust method is needed to obtain a stable, converging solution, especially at large photon numbers.


fangq commented Jul 21, 2018

This issue is now fixed. After the fix, the result can be seen below:

[figure: accum_error — log10 fluence along the z-axis at (x,y)=(30,30) for 10^6 to 10^9 photons after the fix]

The simulation speed is nearly the same (only a 1-2% drop). With this fix, I can finally release mcx 1.0 without major concerns :-)

fangq added a commit that referenced this issue Jan 15, 2019
fix photon sharing normalization and issue #41 for WP/DCS output
jdtatz pushed a commit to jdtatz/mcx that referenced this issue Jul 15, 2020
fix photon sharing normalization and issue fangq#41 for WP/DCS output
ShijieYan added a commit to ShijieYan/mmc that referenced this issue Sep 21, 2022
fangq added a commit to fangq/mmc that referenced this issue Sep 21, 2022