cltorch-benchmarking

Benchmarks for cltorch, used to evaluate where to focus optimization effort.

This is cltorch-specific for now, but if someone wants to make it more general, I'm happy to change the name, e.g. to torch-benchmarking :-)

The current focus is to find out why char-rnn runs very slowly on OpenCL on certain devices. Examples of things to check:

test_launch: measures kernel launch overhead by adding 1 to a fixed-size array (about 100MB) while varying the number of kernel launches used
test_apply1: varies the vector type (float vs float4) and the operation used, i.e. + vs -, exp, etc.
test_apply1b: varies the operation, as in test_apply1, but adds an extra temporary variable, out
test_applystrided: (in progress) varies the memory access pattern, and/or adds an inner loop over dimensions (TBD)

To build

Prerequisites:

EasyCL, installed with make -j 4 install into ~/git/EasyCL/dist (i.e. build EasyCL with a CMAKE_INSTALL_PREFIX of [your home directory]/git/EasyCL/dist)
cmake and ccmake installed
gcc, g++, etc.

Method

git clone https://github.com/hughperkins/cltorch-benchmarking.git
cd cltorch-benchmarking
mkdir build
cd build
cmake ..
make -j 4
