The speedup if considerable. Even when using theta=0 (which means O(n^2) runtime because no aggregate is ever used) the new implementation is easily 100% faster (often around 180%). With 80 points and theta=1 I get 300% speedup, and the speedup scales nicely with the number of points (suggesting the implementation is indeed O(n log n)). Of course theta=1 produces a very different layout (more rounded in practice, with fewer long edges). using theta=0 gives exactly the same output as the old version.