Improvement to Free-Slip Erosion

Pre-release

@stephen-hqxu stephen-hqxu released this 10 Apr 12:42

Release 0.13.0-Preview.1

Free-slip logic

Previously, it was believed that the 2D memory copy function is slower than a linear copy, i.e., that cudaMemcpy2D() is slower than cudaMemcpy(). However, benchmarking shows that the two perform essentially the same, and the 2D copy is even marginally faster than the linear copy; perhaps this is due to some internal optimisation done by the CUDA driver.

Hence, it is sensible to pack the free-slip buffer using a 2D copy rather than the old "linear copy + global-local index table" combo. Therefore:

  • Deprecate and remove completely the index table logic from all generation pipeline stages which use free-slip logic, such as single histogram filter and hydraulic erosion.
  • Modify STPFreeSlipTextureBuffer to use 2D copy for merging free-slip texture.
  • Remove STPFreeSlipGenerator and global-local index table generation kernel in STPHeightfieldKernel.
  • Remove STPFreeSlipManager.
  • Remove global-local index table entry from STPFreeSlipInformation (renamed from STPFreeSlipData).
  • Merge STPFreeSlipLocation into STPFreeSlipTextureBuffer.
  • Move all the remaining free-slip classes to the Chunk directory and remove the FreeSlip directory.

This has improved the performance of all free-slip-related heightfield generators, especially erosion, by up to 10 times. It shows that the index lookup table approach was the slow part.
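As an illustration of why a pitched 2D copy can replace the index table, here is a minimal host-side sketch. It is not the actual STPFreeSlipTextureBuffer code: `copy2D` and `packFreeSlip` are hypothetical names, and the layout (a row-major grid of equally sized chunks merged into one contiguous buffer) is an assumption. `copy2D` takes the same pitch, width and height parameters as cudaMemcpy2D() (pitches and width in bytes, height in rows), so each chunk lands in its place in the merged buffer without any per-pixel index lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-side stand-in for a pitched 2D copy. The parameters follow the
// convention of cudaMemcpy2D(): pitches and width in bytes, height in rows.
inline void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
                   std::size_t width, std::size_t height) {
    assert(width <= dpitch && width <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Copy one row, then jump by the pitch of each buffer.
        std::memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

// Hypothetical helper: merge a gridW x gridH grid of chunks (row-major order,
// each chunk chunkW x chunkH floats) into one contiguous free-slip buffer.
inline std::vector<float> packFreeSlip(const std::vector<std::vector<float>>& chunks,
                                       std::size_t gridW, std::size_t gridH,
                                       std::size_t chunkW, std::size_t chunkH) {
    assert(chunks.size() == gridW * gridH);
    const std::size_t rowLen = gridW * chunkW;  // width of the merged buffer in pixels
    std::vector<float> merged(rowLen * gridH * chunkH);
    for (std::size_t gy = 0; gy < gridH; ++gy) {
        for (std::size_t gx = 0; gx < gridW; ++gx) {
            const std::vector<float>& chunk = chunks[gy * gridW + gx];
            assert(chunk.size() == chunkW * chunkH);
            // Top-left pixel of this chunk inside the merged buffer.
            float* dst = merged.data() + (gy * chunkH) * rowLen + gx * chunkW;
            copy2D(dst, rowLen * sizeof(float),
                   chunk.data(), chunkW * sizeof(float),
                   chunkW * sizeof(float), chunkH);
        }
    }
    return merged;
}
```

With the merged buffer laid out this way, a pixel at global coordinate (x, y) sits at `y * rowLen + x`, so a kernel can address neighbours directly. Splitting the buffer back into chunks would be the same call with source and destination swapped.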

General fixes and improvement

  • For hydraulic erosion, enable atomic instructions on release builds, taking both data safety and performance into account. On debug builds they are too slow and make the program non-debuggable, hence we trade data safety for speed there.
  • Deprecate and remove multi-channel texture support for STPFreeSlipTextureBuffer because it is useless.
  • Fix a compilation error on some compilers in STPSingleHistogramFilter, which reported that std::max() has no matching overload for size_t and unsigned long long.
  • Merge STPSeedMixer with STPLayer.
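The std::max() error mentioned above typically appears on platforms where size_t is not the same type as unsigned long long (for example, where it is unsigned long), because std::max() takes both arguments as the same template type and deduction fails when they differ. The exact expression in STPSingleHistogramFilter is not shown here, so the following is only a sketch of one common fix, using a hypothetical helper `largerCount` that names the template argument explicitly.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not part of STPSingleHistogramFilter.
// std::max(a, b) would fail to compile wherever std::size_t and
// unsigned long long are distinct types, since T cannot be deduced.
inline unsigned long long largerCount(std::size_t a, unsigned long long b) {
    // Naming T explicitly converts both arguments to the wider type.
    return std::max<unsigned long long>(a, b);
}
```

Casting one argument with static_cast so that both have the same type works equally well; the explicit template argument just keeps the conversion in one place.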