Improvement to Free-Slip Erosion
Pre-release
Release 0.13.0-Preview.1
Free-slip logic
Previously, the 2D memory copy function was believed to be slower than the linear copy, i.e., cudaMemcpy2D() was assumed to be slower than cudaMemcpy(). However, benchmarking shows that the two perform practically the same, and the 2D copy is even marginally faster than the linear copy; perhaps this is due to some internal optimisation done by the CUDA driver.
Hence, it is sensible to pack the free-slip buffer using a 2D copy rather than the old "linear copy + global-local index table" combo. Therefore:
- Deprecate and remove completely the index table logic from all generation pipeline stages which use free-slip logic, such as single histogram filter and hydraulic erosion.
- Modify STPFreeSlipTextureBuffer to use 2D copy for merging the free-slip texture.
- Remove STPFreeSlipGenerator and the global-local index table generation kernel in STPHeightfieldKernel.
- Remove STPFreeSlipManager.
- Remove the global-local index table entry from STPFreeSlipInformation (renamed from STPFreeSlipData).
- Merge STPFreeSlipLocation into STPFreeSlipTextureBuffer.
- Move the rest of the free-slip classes to the Chunk directory and remove the FreeSlip directory.
This has improved the performance of all free-slip-related heightfield generators, especially erosion, by up to 10 times, which shows that the index lookup table approach was slow.
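To illustrate why a 2D copy can replace the index table, here is a minimal host-side sketch in C++. It is not the project's code: copy2D is a CPU stand-in that mirrors the argument order of cudaMemcpy2D() (destination, destination pitch, source, source pitch, row width in bytes, row count), and packFreeSlip, its parameters and the row-major chunk layout are assumptions made for the example. Each chunk is written as a rectangle into the merged buffer, so neighbouring pixels end up adjacent without any per-pixel index lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU stand-in with the same argument order as cudaMemcpy2D (minus the copy kind):
// copies `height` rows of `widthBytes` bytes; both pitches are row strides in bytes.
inline void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
                   std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}

// Hypothetical helper: merge a chunksX x chunksY grid of row-major chunk maps
// (each mapW x mapH pixels) into one contiguous free-slip buffer.
inline std::vector<float> packFreeSlip(const std::vector<std::vector<float>>& chunks,
                                       std::size_t chunksX, std::size_t chunksY,
                                       std::size_t mapW, std::size_t mapH) {
    assert(chunks.size() == chunksX * chunksY);
    const std::size_t rowPixels = chunksX * mapW;  // width of one merged row
    std::vector<float> buffer(rowPixels * chunksY * mapH);
    for (std::size_t cy = 0; cy < chunksY; ++cy) {
        for (std::size_t cx = 0; cx < chunksX; ++cx) {
            const std::vector<float>& chunk = chunks[cy * chunksX + cx];
            assert(chunk.size() == mapW * mapH);
            // Top-left pixel of this chunk inside the merged buffer.
            float* dst = buffer.data() + (cy * mapH) * rowPixels + cx * mapW;
            copy2D(dst, rowPixels * sizeof(float),   // destination and its pitch
                   chunk.data(), mapW * sizeof(float),  // source and its pitch
                   mapW * sizeof(float), mapH);      // rectangle size
        }
    }
    return buffer;
}
```

Copying back after the generator has run is the same call with source and destination swapped, which is what makes the separate global-local index table unnecessary in this layout.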
General fixes and improvement
- For hydraulic erosion, enable atomic instructions in release builds, which gives both data safety and good performance. In debug builds atomics are too slow and make the program impractical to debug, so there we trade data safety for speed.
- Deprecate and remove multi-channel texture support for STPFreeSlipTextureBuffer because it is useless.
- Fix an error on some compilers during compilation of STPSingleHistogramFilter, which reported that std::max() has no matching overload for size_t and unsigned long long.
- Merge STPSeedMixer with STPLayer.
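The std::max() error above is a common portability issue: std::max is a function template that requires both arguments to have the same type, and on platforms where size_t is unsigned long rather than unsigned long long, template deduction fails. The release notes do not say how the fix was made; the sketch below shows one usual remedy, naming the template argument explicitly. The helper largerCount is hypothetical and exists only for this example.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not taken from STPSingleHistogramFilter.
inline unsigned long long largerCount(std::size_t a, unsigned long long b) {
    // `std::max(a, b)` does not compile where std::size_t is unsigned long,
    // because the two arguments deduce different template types.
    // Naming the type explicitly converts both arguments to unsigned long long.
    return std::max<unsigned long long>(a, b);
}
```

A static_cast on one argument achieves the same result; the explicit template argument simply keeps the call site short.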