## Simple example -- pairwise distance

Suppose we have `N` vectors in $\mathbb{R}^3$, stored as the columns of a $3 \times N$ matrix and labeled $x_1, x_2, \dots, x_N \in \mathbb{R}^3$.

Suppose we also have a metric function $\rho: \mathbb{R}^3 \times \mathbb{R}^3 \rightarrow \mathbb{R}^+$ that we would like to apply to each pair of vectors to compute an $N \times N$ matrix $D$, where the $i,j$ element is given by $D_{i,j} = \rho(x_i, x_j)$.

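Let's start with the Euclidean (L2) distance $\rho(x, y) = \sqrt{\sum_{k=1}^3(x_k - y_k)^2}$, where the subscript $k$ here indexes the components of a single vector rather than the vectors themselves.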

Here's how we might compute this in Julia:

In [1]:
N = 1500
X = randn(3, N)  # create the matrix of random numbers drawn from N(0, 1)

3×1500 Array{Float64,2}:
  0.715844   1.45206    0.823386  …  -0.969235  -0.118297  -0.877812
 -1.21844    0.147635   1.08595       0.178751  -0.133487   1.32263 
 -0.533013  -1.40513   -1.07828      -3.1798     0.363955   0.205532

In [14]:
function computeD(ρ, X)
    m, N = size(X)
    out = zeros(N, N)
    for j in 1:N, i in 1:N
        # `X[:, i]` is indexing notation; a colon means "everything" along that dimension
        out[i, j] = ρ(X[:, i], X[:, j])
    end
    out # implicit return of last expression in function
end

# short-hand (one-line) function definition
# the `.-` and `.^` apply "elementwise" to their arguments
# The dot also works with other functions `f`, as in `f.(arg1, arg2, arg3, ...)`
ρ_l2(x, y) = sqrt(sum((x .- y) .^ 2))

@time D = computeD(ρ_l2, X)

  0.745252 seconds (6.80 M allocations: 740.609 MiB, 42.90% gc time)


1500×1500 Array{Float64,2}:
 0.0       1.7801   2.37047  3.84236  2.47581   …  3.43469  1.6363    3.08905
 1.7801    0.0      1.1758   4.5763   1.53997      3.00218  2.38216   3.06645
 2.37047   1.1758   0.0      3.95544  1.12557      2.90739  2.11041   2.14436
 3.84236   4.5763   3.95544  0.0      3.80534      4.1988   2.55115   2.0993 
 2.47581   1.53997  1.12557  3.80534  0.0          1.78233  2.44776   2.40231
 1.25205   1.61506  1.68716  3.16938  1.41881   …  2.38535  1.43661   2.26515
 0.770282  1.7654   2.04729  3.26891  1.93211      2.8165   1.3412    2.54403
 1.94038   2.92658  2.66198  2.24058  2.93576      3.9939   0.564735  1.81632
 3.64545   4.36788  3.91051  3.0447   4.54292      5.83129  2.29928   2.66002
 1.54604   2.05497  1.78117  2.55944  1.76181      2.82759  0.831962  1.6466 
 2.16541   3.60794  3.60264  2.73328  3.84439   …  4.73151  1.53389   2.87756
 1.06759   1.08171  1.31617  3.5942   1.55625      2.91847  1.36562   2.3485 
 2.56983   3.14105  2.97576  3.75125
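
The dot ("broadcast") syntax mentioned in the comments of `ρ_l2` can be tried on a tiny example. This is only an illustrative sketch: the vectors `x` and `y` below are made up, chosen so the distance is easy to check by hand.

```julia
x = [1.0, 2.0, 3.0]   # made-up example vectors
y = [4.0, 6.0, 3.0]

d = x .- y            # elementwise difference:  [-3.0, -4.0, 0.0]
sq = d .^ 2           # elementwise square:      [9.0, 16.0, 0.0]
dist = sqrt(sum(sq))  # sqrt(9 + 16 + 0) = 5.0

# the dot works for ordinary functions too: `abs` is applied to each element
abs_d = abs.(d)       # [3.0, 4.0, 0.0]
```

Without the dots, `d ^ 2` would raise an error, because squaring a vector as a whole is not defined.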

Let's test our work by checking that the diagonal is all zeros:


In [4]:
using LinearAlgebra: diag

# notice the `.` notation, which applies `abs` to each element
maximum(abs.(diag(D)))

0.0
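
A zero diagonal is only one property of a distance matrix. As a hedged sketch of a few further sanity checks, the snippet below rebuilds a small matrix with a comprehension (so that it runs on its own) and tests symmetry, nonnegativity, and agreement with `norm` from the standard library. The names `n`, `X_small`, and `D_small` are introduced here purely for illustration.

```julia
using LinearAlgebra: issymmetric, norm

ρ_l2(x, y) = sqrt(sum((x .- y) .^ 2))

n = 50                     # hypothetical small problem size
X_small = randn(3, n)
# same quantity as computeD(ρ_l2, X_small), written as a 2-d comprehension
D_small = [ρ_l2(X_small[:, i], X_small[:, j]) for i in 1:n, j in 1:n]

is_sym = issymmetric(D_small)         # ρ(x, y) should equal ρ(y, x)
is_nonneg = all(D_small .>= 0)        # distances are never negative
# one entry compared against the built-in Euclidean norm
matches_norm = D_small[1, 2] ≈ norm(X_small[:, 1] - X_small[:, 2])
```

All three values should be `true`. The last check uses `≈` rather than `==` because `norm` may sum the terms in a different way internally.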

## Python implementation

So that we can gauge performance, let's write the same routine in Python and call it from Julia using PyCall.

In [6]:
using Pkg

In [7]:
pkg"add PyCall"

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K[?25h

│     — /home/sglyon/.julia/registries/General — failed to fetch from repo
└ @ Pkg.Types /build/julia/src/julia-1.3.1/usr/share/julia/stdlib/v1.3/Pkg/src/Types.jl:1199


[32m[1m Resolving[22m[39m package versions...
[32m[1m Installed[22m[39m PyCall ─ v1.91.4
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Project.toml`
 [90m [438e738f][39m[92m + PyCall v1.91.4[39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Manifest.toml`
 [90m [438e738f][39m[92m + PyCall v1.91.4[39m
[32m[1m  Building[22m[39m PyCall → `~/.julia/packages/PyCall/zqDXB/deps/build.log`


In [9]:
using PyCall

In [10]:
py"""
import numpy as np

def ρ_l2(x, y):
    return np.sqrt(np.sum((x - y) ** 2))


def computeD(X, ρ=ρ_l2):
    m, N = X.shape
    out = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            out[i, j] = ρ(X[:, i], X[:, j])
    return out
"""

py_computeD = py"computeD"

PyObject <function computeD at 0x7f1fe569b598>

In [11]:
@time py_D = py_computeD(X)

 22.350265 seconds (1.57 M allocations: 96.763 MiB, 0.21% gc time)


1500×1500 Array{Float64,2}:
 0.0       1.7801   2.37047  3.84236  2.47581   …  3.43469  1.6363    3.08905
 1.7801    0.0      1.1758   4.5763   1.53997      3.00218  2.38216   3.06645
 2.37047   1.1758   0.0      3.95544  1.12557      2.90739  2.11041   2.14436
 3.84236   4.5763   3.95544  0.0      3.80534      4.1988   2.55115   2.0993 
 2.47581   1.53997  1.12557  3.80534  0.0          1.78233  2.44776   2.40231
 1.25205   1.61506  1.68716  3.16938  1.41881   …  2.38535  1.43661   2.26515
 0.770282  1.7654   2.04729  3.26891  1.93211      2.8165   1.3412    2.54403
 1.94038   2.92658  2.66198  2.24058  2.93576      3.9939   0.564735  1.81632
 3.64545   4.36788  3.91051  3.0447   4.54292      5.83129  2.29928   2.66002
 1.54604   2.05497  1.78117  2.55944  1.76181      2.82759  0.831962  1.6466 
 2.16541   3.60794  3.60264  2.73328  3.84439   …  4.73151  1.53389   2.87756
 1.06759   1.08171  1.31617  3.5942   1.55625      2.91847  1.36562   2.3485 
 2.56983   3.14105  2.97576  3.75125

In [13]:
maximum(abs.(D .- py_D))

0.0

On one run on my machine, the Julia version took 0.745252 seconds whereas the Python version took 22.35 seconds, a difference of roughly 30x.

This was pretty much a "free" speedup, since the code is almost the same in both languages. It is not uncommon to see speedups of 100x or 1000x when moving hand-written Python code to Julia.

Note that neither example was written for optimal performance; both implementations could be improved.

## Faster Julia version



In [15]:
function computeD_l2(X)
    m, N = size(X)
    out = zeros(N, N)
    @inbounds for j in 1:N, i in 1:N
        val = 0.0
        @simd for d in 1:m
            val += (X[d, i] - X[d, j])^2
        end
        out[i, j] = sqrt(val)
    end
    out # implicit return of last expression in function
end

computeD_l2 (generic function with 1 method)

In [37]:
@time D2 = computeD_l2(X);

  0.017431 seconds (6 allocations: 17.166 MiB)


In [38]:
maximum(abs.(D .- D2))

8.881784197001252e-16

On my machine, one run of this version took 0.017620 seconds (timings vary slightly from run to run) -- that's roughly a 40x speedup over the original Julia version and a 1268x speedup over the Python version.
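
Much of the difference shows up in the `@time` output: 6.80 M allocations for `computeD` versus 6 for `computeD_l2`. In Julia, a slice such as `X[:, i]` makes a copy, and `(x .- y) .^ 2` builds a temporary array, so the first version allocates several small arrays per pair. As a hedged sketch of a middle ground that keeps the generic `ρ` argument, the code below uses `@views` and a generator-based metric to avoid those temporaries. The names `ρ_l2_lazy`, `computeD_views`, and `X_demo` are introduced here for illustration, and no timing is claimed for this variant.

```julia
# metric that sums over pairs of entries instead of building `x .- y`
ρ_l2_lazy(x, y) = sqrt(sum(abs2(a - b) for (a, b) in zip(x, y)))

function computeD_views(ρ, X)
    m, N = size(X)
    out = zeros(N, N)
    # `@views` turns every slice such as `X[:, i]` into a non-copying view
    @views for j in 1:N, i in 1:N
        out[i, j] = ρ(X[:, i], X[:, j])
    end
    out
end

X_demo = randn(3, 200)   # hypothetical smaller input so the check runs quickly
D_views = computeD_views(ρ_l2_lazy, X_demo)
```

How close this gets to the hand-written loop is something to measure with `@time` on your own machine.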

With Julia it is possible to write "lower level" code and get drastically better performance.

Sometimes you might hear that "in Julia you can get C-like performance if you write C-like code".

This is typically not possible in other common data analytics languages like Python, MATLAB, or R.
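
As one more illustration of the C-like style, the loop can exploit the fact that the Euclidean distance is symmetric and zero on the diagonal, so only the entries below the diagonal need to be computed. This is a sketch under those assumptions, not a benchmarked implementation; `computeD_l2_sym` and `X_check` are hypothetical names introduced here.

```julia
function computeD_l2_sym(X)
    m, N = size(X)
    out = zeros(N, N)              # the diagonal stays at zero
    @inbounds for j in 1:N
        for i in (j + 1):N         # strictly below the diagonal
            val = 0.0
            for d in 1:m
                val += (X[d, i] - X[d, j])^2
            end
            dist = sqrt(val)
            out[i, j] = dist
            out[j, i] = dist       # mirror into the upper triangle
        end
    end
    out
end

X_check = randn(3, 100)            # hypothetical small input
D_sym = computeD_l2_sym(X_check)
```

This halves the number of square roots, but the mirrored write `out[j, i]` is not contiguous in memory, so whether it is actually faster should be checked with `@time`.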