This repo provides complementary demonstrations for a web article by Christos Argyropoulos:
The Quest for Performance Part II : Perl vs Python
The inspiration came from reading Comparison of various GPU acceleration frameworks using matrix-vector multiplication, by Thomas Germer. If you're trying Mr. Germer's repo, comment out the line ti.loop_config(block_dim=N) for better performance on desktop GPUs. Moreover, set block_size = 32. I ran with m, n = 8192, 8192.
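For readers unfamiliar with the benchmark, the operation being timed is a plain matrix-vector product, y = A x, with A of shape m by n. Below is a minimal pure-Python sketch of that operation. It is not code from this repo or from Mr. Germer's repo; the function names `matvec` and `make_problem` are hypothetical, and the problem size is deliberately much smaller than 8192 by 8192 so that it finishes quickly without a GPU.

```python
import random
import time


def matvec(a, x):
    """Return y = A x for a row-major list-of-lists matrix A and a vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def make_problem(m, n, seed=0):
    """Build a random m x n matrix and a length-n vector (hypothetical helper)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(m)]
    x = [rng.random() for _ in range(n)]
    return a, x


if __name__ == "__main__":
    # Assumption: a small size for illustration; the runs described above
    # used m, n = 8192, 8192 on a GPU.
    m, n = 256, 256
    a, x = make_problem(m, n)
    start = time.perf_counter()
    y = matvec(a, x)
    elapsed = time.perf_counter() - start
    print(f"{m} x {n} matrix-vector product: {elapsed:.4f} s, len(y) = {len(y)}")
```

Each output element is an independent dot product of one matrix row with the vector, which is why the operation parallelizes well and why settings such as the block size matter on a GPU.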