# CME 257 Class 6 - Language & Library Interfaces

One of the nice things about Julia is that it is relatively easy to use code written in other languages.  Today we'll talk about Julia's built-in [`ccall()`](http://julia.readthedocs.org/en/latest/manual/calling-c-and-fortran-code/) function (for C and Fortran) as well as the [PyCall](https://github.com/stevengj/PyCall.jl) package.  There are also packages to call [R](https://github.com/JuliaStats/RCall.jl), [Matlab](https://github.com/JuliaLang/MATLAB.jl), [Mathematica](https://github.com/one-more-minute/Mathematica.jl), [C++](https://github.com/Keno/Cxx.jl), and [Java](https://github.com/aviks/JavaCall.jl) (maybe more that aren't on the package registry).  We'll focus on ccall and PyCall today because these are probably the most important in the current Julia ecosystem, although you may find one of these other packages useful depending on your needs and interests.

## Why call other languages?

Julia can be nice to work with, but it isn't perfectly suited to all problems, and it hasn't been around long enough to gain extensive package support.  Using language interfaces can let you

* Use Julia for certain tasks, and use a different language for other tasks
* Utilize robust, tried and tested libraries, or industry/community standard libraries
* Use code from your old projects in Julia

## ccall

`ccall()` lets you call libraries written in either C or Fortran from Julia.  Shared object libraries vary a bit between operating systems - typically they have a .so extension on Linux, a .dylib extension on Macs, and a .dll extension on Windows.  Static libraries have a .a extension on Mac/Linux, and a .lib extension on Windows.  The examples here were tested on a Mac.

Shared object libraries are loaded at runtime, and static libraries have code that is copied when a binary is created.  You can only call shared object libraries from Julia.  If you want to learn more, try this [StackOverflow thread](http://stackoverflow.com/questions/2649334/difference-between-static-and-shared-libraries) to start.

In [3]:
#A first example
ccall((:clock, "libc"), Int32, ())

14369025
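Since the file extension differs between platforms, it can help to ask Julia what it expects.  The following is a small sketch using the `Libdl` standard library; `libcme257` is the course library discussed below, and it will only be found once it has been compiled and is on the search path.

```julia
using Libdl

# Shared-library file extension on the current platform: "so", "dylib", or "dll"
ext = Libdl.dlext
println("shared libraries end in .", ext)

# find_library returns the first name it manages to open, or "" if none loads.
# Here we look for the course library in the current directory.
found = Libdl.find_library(["libcme257"], ["."])
println(isempty(found) ? "libcme257 not found (yet)" : "found $found")
```

If `find_library` returns an empty string, `ccall` will not be able to load the library by that name either.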


### Compiling a library

When you install a library using a package manager, it's pretty easy to get started using it.  If you have your own custom code, you need to pass `-fPIC` and `-shared` to your compiler (at least gcc; other compilers may behave slightly differently) to tell the compiler that it is creating a shared object library.

Here we'll use C to compile libraries, but you can also use fortran.

Refer to the [Makefile](Makefile) to see a basic example.
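The Makefile itself is not reproduced in these notes.  As a rough sketch of the kind of command it might run, here is the gcc invocation built with Julia's backtick command syntax (covered in the last section).  The output name and the use of `gcc` are assumptions, and the build is only attempted if both the source file and the compiler are present.

```julia
using Libdl

# Output name uses the platform's shared-library extension (so / dylib / dll)
libfile = "libcme257." * Libdl.dlext

# -fPIC: emit position-independent code; -shared: produce a shared object
cmd = `gcc -fPIC -shared -o $libfile cme257.c`
println(cmd)

# Only attempt the build if the source file and compiler are actually available
if isfile("cme257.c") && Sys.which("gcc") !== nothing
    run(cmd)
end
```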

### Using ccall

Important: Julia needs to be able to find the libraries in order to use them.  This is done using the environment variable `LD_LIBRARY_PATH`, set in the bash shell.  Set this in your terminal before launching Julia/Jupyter in order to use the libcme257.so library.

```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path to your .so files}

```

Using `.` as the path just appends the current directory to the search path for libraries.  If you want to use a library in another directory, use its path.

More information can be found in [Julia's documentation](https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/).

The first argument to `ccall` is a function-library pair, e.g. `(:c_sum, "libcme257.so")`; the second is the return type, e.g. `Int64`; the third is a tuple of input types, e.g. `(Int64, Int64)`; and the remaining arguments are the inputs passed to the function.
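To see the argument layout in isolation, here is a small sketch that calls two C standard library functions.  It assumes these symbols are already visible in the running Julia process, which is why only the symbol is given instead of a function-library pair.

```julia
# Layout: ccall(function, return type, (argument types...), arguments...)
# size_t strlen(const char *) maps to Csize_t and Cstring on the Julia side
n = ccall(:strlen, Csize_t, (Cstring,), "hello")
println(n)

# A one-element argument tuple needs a trailing comma: (Cint,) and not (Cint)
a = ccall(:abs, Cint, (Cint,), -7)
println(a)
```

The aliases `Cint`, `Csize_t`, and `Cstring` are Julia's names for the corresponding C types, which saves you from guessing their sizes on each platform.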

Refer to [cme257.c](cme257.c) to see the function declarations for libcme257.so.



In [11]:
function c_hello()
    ccall((:hello, "libcme257.so"), Nothing, ())
end

function c_sum(a::Int64, b::Int64)
    return ccall((:c_sum, "libcme257.so"), Int64, (Int64, Int64), a, b)
    #return ccall((:c_sum, "libcme257"), Int64, (Float64, Float64), Float64(a), Float64(b))
end

# an example of what you can do wrong: ccall trusts the declared types
# and cannot check them against the compiled C function
function c_sum2(a::Float64, b::Float64)
    return ccall((:c_sum, "libcme257.so"), Float64, (Float64, Float64), a, b)
end
;

In [12]:
@show c_sum(100, 5)
c_hello()

c_sum(100, 5) = 105
hello world!

In [13]:
c_sum2(1.0, 2.0)

1.0
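`c_sum2` runs without complaint, but the answer is not 3: `ccall` has no way to check the types you declare against the compiled function.  What it does do is convert each argument to the type you declared.  Here is a sketch of that behavior using `labs` from the C standard library (an example chosen for these notes, not part of libcme257).

```julia
# The C prototype is `long labs(long)`, so Clong is the matching Julia type
c_labs(x::Integer) = ccall(:labs, Clong, (Clong,), x)

println(c_labs(-42))

# Arguments are converted to the declared type before the call, so a value
# that cannot be represented exactly raises an error on the Julia side.
err = try
    ccall(:labs, Clong, (Clong,), 2.5)
catch e
    e
end
println(typeof(err))
```

The practical lesson is to copy the types from the C declaration, and to restrict the Julia wrapper's argument types so mistakes are caught before the call.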

You can also call libraries installed on your computer, usually without modifying `LD_LIBRARY_PATH`.

In [14]:
# call cosine in libmath
function c_cos(x::Float64)
   return ccall((:cos, "libm"), Float64, (Float64,), x) 
end

c_cos (generic function with 1 method)

In [17]:
x = Float64(pi)
@time y1 = c_cos(x)
@time y2 = cos(x)
@show y1, y2
;

  0.000003 seconds (5 allocations: 176 bytes)
  0.000007 seconds (5 allocations: 176 bytes)
(y1, y2) = (-1.0, -1.0)


For the math library, see [here](http://en.cppreference.com/w/c/numeric/math).  You can find many standard library function headers [here](http://en.cppreference.com/w/c/header).

ccall is used in parts of Julia, and also in some common libraries.

* [Metis.jl](https://github.com/JuliaSparse/Metis.jl) is simply a wrapper for the Metis library (graph partitioning).
* [TensorFlow.jl](https://github.com/malmaud/TensorFlow.jl) wraps the TensorFlow library.

## Broadcasting/vectorization

Whenever you create a function in Julia, you can "broadcast" that function to an array of the types that the function works on.  This is done using the `.`:

In [19]:
x = randn()
y = c_cos(x) # regular function
x = randn(5)
y = c_cos.(x) # broadcasted function

5-element Array{Float64,1}:
 0.9743406502659705
 0.7433038460352788
 0.9794989264587904
 0.9685956739889715
 0.968852534991265 

Note we never defined a function `c_cos.`, just `c_cos` - think of this `.` as automatically creating vectorized functions (if you're used to MATLAB).
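Broadcasting also works with several arguments and with operators.  A short sketch (the function `f` is made up for illustration):

```julia
# A scalar function of two arguments
f(x, y) = x + 2y

v = [1, 2, 3]

# Broadcast over an array and a scalar: the scalar is reused for every element
r1 = f.(v, 10)

# Dotted operators and calls fuse into a single loop with one output array
r2 = sqrt.(abs.(v .- 2)) .+ 1

println(r1)
println(r2)
```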

In [20]:
x = randn(5)
using LinearAlgebra
@time y1 = c_cos.(x)
@time y2 = cos.(x)
norm(y1-y2)

  0.000010 seconds (7 allocations: 320 bytes)
  0.099578 seconds (138.33 k allocations: 7.375 MiB, 26.92% gc time)


1.2412670766236366e-16
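The large time and allocation count reported for `cos.(x)` most likely reflect compilation of the broadcast on its first use, not the cost of the computation itself.  One way to check, sketched below, is to call the code once before timing it (the helper name `bcos` is made up here).

```julia
bcos(x) = cos.(x)

x = randn(5)

bcos(x)                  # first call: includes JIT compilation
t = @elapsed bcos(x)     # second call: measures only the computation
println("warm call took $t seconds")
```

For more careful measurements, the BenchmarkTools package provides `@btime`, which runs the code many times and handles the warm-up for you.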

For more information see Julia's documentation on [array broadcasting](https://docs.julialang.org/en/v1/base/arrays/#Broadcast-and-vectorization-1) and [vectorizing functions](https://docs.julialang.org/en/v1/manual/functions/#man-vectorized-1).

For more complicated element-wise array manipulations, it is also handy to know about the [`map` function](https://docs.julialang.org/en/v1/base/collections/#Base.map).

In [21]:
@time y1 = c_cos.(x)
@time y2 = map(c_cos, x)
norm(y1 - y2)

  0.000012 seconds (7 allocations: 320 bytes)
  0.080124 seconds (103.60 k allocations: 5.390 MiB)


0.0

In [22]:
# something non-trivial
a = [(i,i+1) for i = 1:5]
map(x -> x[2]*x[1], a)

5-element Array{Int64,1}:
  2
  6
 12
 20
 30

`x -> x[2]*x[1]` is an example of an [anonymous function](https://en.wikibooks.org/wiki/Introducing_Julia/Functions#Anonymous_functions).  These are like [lambda functions in Python](https://www.python-course.eu/lambda.php).
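`map` also accepts several collections, and Julia's `do` syntax is a convenient way to write a longer anonymous function.  A small sketch that reproduces the products above:

```julia
# map over two collections at once with a two-argument anonymous function
p = map((a, b) -> a * b, 1:5, 2:6)

# The same computation with do-block syntax; the block becomes the first argument
q = map(1:5, 2:6) do a, b
    a * b
end

println(p)
println(q)
```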

# Exercise 1

* Modify cme257.c to include a multiplication function.  Create a function in Julia that calls the multiplication function in the shared object library.
* Create a function that wraps sine in the C math library (`libm`).

In [2]:
function c_mult(a::Int64, b::Int64)
    return ccall((:c_mult, "libcme257.so"), Int64, (Int64, Int64), a, b)
    #return ccall((:c_sum, "libcme257"), Int64, (Float64, Float64), Float64(a), Float64(b))
end

c_mult(201,2)

402

# BLAS and LAPACK interface

[BLAS](http://www.netlib.org/blas/) and [LAPACK](http://www.netlib.org/lapack/) are commonly used linear algebra libraries.  Today we'll briefly cover the interface in Julia. 

Why use these interfaces?  BLAS and LAPACK are often tuned for your machine architecture, and much effort is put into making them fast and efficient.  One common example is Intel's MKL, which you can use when you compile Julia.

Another advantage of these libraries is that they allow you to operate on arrays in-place.  If you write an optimization routine or PDE solver that requires a matrix-vector multiplication at each step, you can actually improve the speed of your function quite a bit by pre-allocating arrays and doing everything in-place.  This is because memory allocation is expensive.

These libraries also have special routines for special matrix formats (symmetric, triangular, banded), which you can use to further speed up your code.

To read more about Julia's BLAS and LAPACK interfaces, see its [linear algebra documentation](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#BLAS-Functions-1).

## dot, gemv, gemm

BLAS has 3 levels:
* level 1 consists of vector operations
* level 2 consists of matrix-vector operations
* level 3 consists of matrix-matrix operations

There are 4 underlying datatypes that you can use with BLAS: `Float32`, `Float64`, `Complex{Float32}`, and `Complex{Float64}`
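As a quick orientation, here is one routine from each level applied to small hand-checkable inputs.  This is only a sketch; the matrices are made up for illustration.

```julia
using LinearAlgebra

T = Float64
x = T[1, 2, 3]
y = T[4, 5, 6]
A = T[1 0 0; 0 2 0; 0 0 3]
B = ones(T, 3, 3)

d = BLAS.dot(x, y)                       # level 1: vector-vector
v = BLAS.gemv('N', one(T), A, x)         # level 2: matrix-vector, α * A * x
M = BLAS.gemm('N', 'N', one(T), A, B)    # level 3: matrix-matrix, α * A * B

println(d)
println(v)
println(M)
```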

In [7]:
using LinearAlgebra
# dot product between two vectors
# this is an example of a BLAS level-1 operation
T = Float32
n = 5
x = randn(T, n)
y = randn(T, n)
@time d1 = BLAS.dot(x,y)
@time d2 = dot(x,y)
d1 - d2

  0.000010 seconds (5 allocations: 176 bytes)
  0.000005 seconds (5 allocations: 176 bytes)


0.0f0

In Julia, functions that mutate data are typically denoted with a `!` symbol.  In the following, `gemv` returns the result of `α*A*x`, and `gemv!` overwrites a vector `y` that has been pre-allocated and passed in as an input.  There are sometimes valid performance reasons to do this: `gemv!` does not allocate a new output vector, so it (usually) does not cause Julia to call the garbage collector.

In [9]:
# gemv  - general matrix vector multiplication
# gemv  - y = α * A * x
# gemv! - y = α * A * x + β * y
T = Float32
n = 50
m = 100
A = rand(T, m, n)
x = rand(T, n)
y = Array{T}(undef, m) # element type must match A and x to be used with gemv!
α = one(T) # alpha
β = zero(T) # beta
# if we replaced the 'N' with a 'T', we would do A'*x - make sure dimensions are correct!
@time y1 = BLAS.gemv('N', α, A, x)
@time y2 = A*x
@show norm(y2 - y1)
;

  0.000018 seconds (5 allocations: 656 bytes)
  0.000011 seconds (5 allocations: 656 bytes)
norm(y2 - y1) = 0.0f0
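The cell above only exercises `gemv`.  For completeness, here is a sketch of the in-place variant.  The output vector is zero-initialized here to be safe, and `mul!` from `LinearAlgebra` is shown as the generic alternative.

```julia
using LinearAlgebra

T = Float32
m, n = 100, 50
A = rand(T, m, n)
x = rand(T, n)
y = zeros(T, m)      # pre-allocated output; element type must match A and x

# y = α * A * x + β * y with α = 1 and β = 0, written into y
BLAS.gemv!('N', one(T), A, x, zero(T), y)
ok1 = y ≈ A * x

# Generic in-place product; dispatches to BLAS for supported element types
mul!(y, A, x)
ok2 = y ≈ A * x

println((ok1, ok2))
```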


In [16]:
# gemm - general matrix-matrix multiplication
begin
    T = Float64
    n = 1000;
    A = rand(T,n, n);
    B = rand(T,n, n);
    C = rand(T,n, n);
    Corig = copy(C);
    α = T(2.0);
    β = T(3.0);

    # C = α A B' + β C
    @time BLAS.gemm!('N', 'C', α, A, B, β, C);
    @show C ≈ (α * A * B' + β * Corig)
    C = copy(Corig);
    @time C = α * A * B' + β * C;

end

;

  0.065370 seconds (4 allocations: 160 bytes)
C ≈ α * A * B' + β * Corig = true
  0.071084 seconds (13 allocations: 30.518 MiB, 2.41% gc time)


# Exercise 2

* Compare the time it takes to do GEMM calling BLAS versus writing out the corresponding expression in Julia (try n = 10, 100, 1000).  Can you explain the results?
* implement the power method using `gemv!`.  Don't use any more allocations than necessary.
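One possible sketch of the power method is below; try the exercise yourself first.  It assumes a symmetric matrix with a single dominant eigenvalue, and the function name `power_method!`, the fixed iteration count, and the test matrix are choices made for this example.

```julia
using LinearAlgebra

# Overwrites x with an approximate dominant eigenvector, using y as workspace.
# Returns the corresponding eigenvalue estimate.
function power_method!(x, y, A; iters = 100)
    T = eltype(A)
    λ = zero(T)
    x ./= norm(x)
    for _ in 1:iters
        BLAS.gemv!('N', one(T), A, x, zero(T), y)   # y = A * x, no allocation
        λ = BLAS.dot(x, y)                          # Rayleigh quotient (norm(x) == 1)
        x .= y ./ norm(y)                           # normalize in place
    end
    return λ
end

A = [2.0 1.0; 1.0 3.0]
x = [1.0, 1.0]
y = zero(x)
λ = power_method!(x, y, A)
println(λ)
```

Both vectors are allocated once outside the loop, so the iteration itself should not allocate.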

# PyCall

[PyCall](https://github.com/stevengj/PyCall.jl) is a package by [Steven G. Johnson](http://math.mit.edu/~stevenj/), which allows you to call Python libraries using syntax that is essentially the same as Python's import statement.

It is used in several packages in Julia, including PyPlot.

In [17]:
using PyCall

In [18]:
# Demo similar to https://github.com/stevengj/PyCall.jl#usage
math = pyimport("math");
x = 5;
math.sin(math.pi + x) - sin(π + x)

1.1102230246251565e-16

PyCall works by combining Python's C API with Julia's `ccall()` function (so you can't use Jython, unless you want to try [JavaCall](https://github.com/aviks/JavaCall.jl)).  Check out the [source for PyCall](https://github.com/stevengj/PyCall.jl/blob/master/src/PyCall.jl), and see how `ccall` is being used.

Here's an example using Python's popular [scikit-learn](http://scikit-learn.org/stable/) from Julia:

In [19]:
# https://rizalzaf.wordpress.com/2015/05/15/calling-pythons-scikit-learn-machine-learning-library-from-julia/
svm = pyimport_conda("sklearn.svm", "scikit-learn")
X = [[0 0]; [1 1]]
y = [0; 1]
clf = svm.SVC()
(clf.fit)(X, y) 
# note syntax to call a method on an object
# clf.fit(X, y) in Python
# clf.fit(X, y) in current PyCall; older versions used clf[:fit](X, y)

PyObject SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='rbf', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

In [20]:
x_test = [0.01 0.02]
y_test = (clf.predict)(x_test)

1-element Array{Int64,1}:
 0

You can also import your own Python modules.  Refer to [cme257.py](cme257.py) to see the function definition of `fibonacci`.  Note that this is a terrible algorithm for computing Fibonacci numbers.

In [21]:
py"""
import sys
sys.path.insert(0, ".")
"""

In [24]:
cme257py = pyimport("cme257")
cme257py.fibonacci(20)

6765

In [25]:
for i = 0:10
    println("fibonacci($i) = $(cme257py.fibonacci(i))")
end

fibonacci(0) = 0
fibonacci(1) = 1
fibonacci(2) = 1
fibonacci(3) = 2
fibonacci(4) = 3
fibonacci(5) = 5
fibonacci(6) = 8
fibonacci(7) = 13
fibonacci(8) = 21
fibonacci(9) = 34
fibonacci(10) = 55
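For small helpers you don't need a separate module file: the same `py"..."` string macro used above to modify `sys.path` can define Python functions inline.  A minimal sketch, assuming PyCall is installed and configured; the function name `add3` is made up for this example.

```julia
using PyCall

# Define a Python function inline
py"""
def add3(a, b, c):
    return a + b + c
"""

# py"add3" evaluates the Python expression `add3` and returns the function,
# which can then be called with Julia arguments.
total = py"add3"(1, 2, 3)
println(total)
```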


# Running External Programs

You may also be interested in running programs that are typically executed using the bash shell in Julia.  You can read about this in [the documentation](https://docs.julialang.org/en/stable/manual/running-external-programs/#Running-External-Programs-1) - here we'll just give some simple examples.

In [26]:
c = `echo hello` # note backticks to produce command
typeof(c)

Cmd

In [27]:
run(c) # this will run the command as if you did it in the bash shell
# output is piped to stdout

hello


Process(`echo hello`, ProcessExited(0))

In [28]:
# if you want to capture output as a string, use the following
# (readstring was removed in Julia 1.0; read(cmd, String) replaces it)
out = read(c, String)
out

"hello\n"

In [29]:
# example of string interpolation in command
str = "hello()"
fname = "cme257.c"
c = `grep $str $fname`
run(c)

void hello();
void hello() {


Process(`grep 'hello()' cme257.c`, ProcessExited(0))
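Interpolation into commands is not plain string pasting: Julia keeps track of argument boundaries, which is why `hello()` above was passed to grep as a single quoted argument.  Here is a short sketch of this, assuming an `echo` executable is available (as on Linux and macOS).  The `exec` field holds the command's argument list and is used here only for inspection.

```julia
# An interpolated string becomes a single argument, even if it contains spaces
name = "two words"
cmd = `echo $name`
println(cmd.exec)

# An interpolated array expands into one argument per element
parts = ["a", "b", "c"]
cmd2 = `echo $parts`
println(cmd2.exec)

# readchomp runs the command and returns its output without the trailing newline
out = readchomp(cmd2)
println(out)
```

Note that `run` throws an error when a command exits with a nonzero status (for example, grep with no matches); `success(cmd)` returns `true` or `false` instead.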

# Exercise 3

(You may need to install scikit-learn to do this - `conda install scikit-learn` or `pip install scikit-learn` - try this with the `run` command in Julia!)

* compare several methods of computing element-wise `sin` of an array - an explicit for-loop, Julia's built-in `sin.`, a broadcast call to the C math library (`libm`), and a map call with Python's `math.sin`.
* use a [decision tree classifier](http://scikit-learn.org/stable/modules/tree.html#classification) from scikit-learn on the example above.
* modify cme257.py to include a function that adds 3 integers together and call it from Julia
* How long does it take to multiply two 100x100 matrices using numpy?  How long does it take if you call numpy from Julia? How does this compare to doing the same thing in native Julia?