[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/jolin-io/workshop-accelerate-Python-with-Julia/main?filepath=01-introduction-julia.ipynb)

<a href="https://www.jolin.io" target="_blank" rel="noreferrer noopener">
<img src="https://www.jolin.io/assets/Jolin/Jolin-Banner-Website-v1.3-darkmode.webp">
</a>

# Tutorial PyCon 2023: Accelerate Python with Julia

<div style="display: flex">
<div style="width: 20%">
<a href="https://www.jolin.io/en/about-us" target="_blank" rel="noreferrer noopener">
<img src="https://www.jolin.io/assets/stephansahm-extreme-small.webp" style="height: 300px">
</a>
</div>
    
<div style="width: 50%">
    
### Stephan Sahm
    
- Founder of Jolin.io julia consultancy
- Organizer of Julia User Group Munich
- Full stack data science consultant
- Applied stochastics, uncertainty handling
- Big Data, High Performance Computing and Real-time processing
- Making things production ready

</div>
</div>


------

### Outline for today

1. **Introduction to Julia I:** calling Julia from Python
2. **Introduction to Julia II:** Pluto, pure Julia
3. **Simulation example:** Python vs Cython vs C++ vs Julia

<br>

-------

<br>

# **Introduction to Julia I:** Calling Julia from Python

For further study, https://julialang.org/learning/ is the perfect place to start.

While there are multiple options for using Julia from Python, I recommend the Python package `juliacall`:
- 🙂 it does not copy data, but passes mutable references between the languages
- 🙂 good defaults
- 🙂 nice printing in Jupyter

In-depth documentation about `juliacall` (and the corresponding Julia package `PythonCall.jl`) can be found in the [`PythonCall.jl` docs](https://cjdoris.github.io/PythonCall.jl/stable/).

In [None]:
from juliacall import Main as jl
%load_ext juliacall.ipython

This gives us access to the cell magic `%%julia` and the line magic `%julia`.

In [None]:
# JuliaCall comes with its own Julia dependency file juliapkg.json
# however for binder it is much simpler to just reuse binder's installation mechanism
%julia Pkg.activate(Base.current_project())
%julia using PythonCall
%julia set_var(k, v) = @eval $(Symbol(k)) = $v

Above we defined a little helper called `set_var`, which we can use to bind Python objects to global variables in Julia.

Any Julia function or variable defined in the global Julia namespace can be accessed directly on `jl`.

In [None]:
n = 1000
jl.set_var("n", n)

## Arrays

Julia has excellent support for arrays. Unlike NumPy, Julia also delivers excellent performance for custom data types.

In [None]:
import numpy as np
nparray = np.arange(n)
jl.set_var("nparray", nparray)
nparray[:10]

Another goodie: any custom Julia function can be broadcast over arrays at high speed using the dot syntax, e.g. `double.(nparray)`.

In [None]:
%%julia
@show typeof(nparray)

double(x) = 2x
result = double.(nparray)

@show typeof(result)
result

In [None]:
jlarray = _
print(type(jlarray))

In [None]:
back_to_python = np.array(jlarray) / 2
back_to_python[:10]

### Minibenchmarks

In order to call our `double` function on whole arrays from Python, we define an extra helper.

In [None]:
%%julia
array_double(a) = double.(a)

When comparing to python's standard range, going to julia and back to python is still faster for larger `n`

In [None]:
%timeit [x*2 for x in range(n)]

In [None]:
%timeit jl.array_double(range(n))
# for n=1000, julia is faster, while for n=100 python is still faster
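
Outside Jupyter the `%timeit` magic is not available. A minimal sketch of how the pure-Python side of this comparison could be timed with the standard-library `timeit` module follows; the helper names `double_all` and `best_time` are made up for illustration, and the repeat counts are arbitrary choices.

```python
import timeit


def double_all(n):
    """Double every element of range(n) with a plain list comprehension."""
    return [x * 2 for x in range(n)]


def best_time(func, n, number=200, repeat=5):
    """Return the best per-call time in seconds over several repeats."""
    timings = timeit.repeat(lambda: func(n), number=number, repeat=repeat)
    return min(timings) / number


for n in (100, 1000):
    per_call = best_time(double_all, n)
    print(f"n={n}: {per_call * 1e6:.1f} µs per call")
```

With `juliacall` loaded, the same `best_time` helper could presumably be pointed at a wrapper around `jl.array_double(range(n))` to reproduce the comparison from the cells above.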

NumPy, however, is faster than the round trip through Julia.

In [None]:
%timeit nparray * 2

In [None]:
%timeit jl.array_double(nparray)

In order to inspect what the pure julia time would be, we can benchmark directly within julia using `BenchmarkTools.jl`

In [None]:
%%julia
using BenchmarkTools

@btime double.(nparray)

# when converting the Python wrapper to julia's standard type Vector
# we get a little extra boost (not too much actually)
jlarray = pyconvert(Vector, nparray)
@btime double.(jlarray)

Another tiny example: Matrix multiplication

In [None]:
%timeit np.random.rand(1, n) @ np.random.rand(n, 1)

In [None]:
# using Julia's multiplication on Python objects
%julia multiply(a, b) = a * b
%timeit jl.multiply(np.random.rand(1, n), np.random.rand(n, 1))

In [None]:
# using plain julia
%julia @btime rand(1, n) * rand(n, 1)

----------------
#### 💻 your space
- 👉 try a different `n`
- 👉 try mapping some further numpy operations to julia

In [None]:
# your space ...

----------------

## What to do, if you neither have Jupyter nor `%%julia`

If you are not in a Jupyter notebook, you can simply write a Julia file.

In [None]:
%%writefile example.jl

a = 2
myfunc(args...; kwargs...) = (args, kwargs)

`jl.seval(...)` executes arbitrary julia code.

In addition, julia comes with a special function `include`, which loads the given filename.

In [None]:
jl.seval('include("example.jl")')

Alternatively, pass the code to `jl.seval` directly.

⚠️ Multiple statements need to be wrapped in `begin` ... `end`.

In [None]:
jl.seval("""begin
    a = 2
    myfunc(args...; kwargs...) = (args, kwargs)
end""")
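
If you pass multi-statement snippets to `jl.seval` often, the wrapping can be automated. The following is only a sketch: `as_block` is a hypothetical helper of my own, not part of `juliacall`, and it does nothing more than string formatting.

```python
import textwrap


def as_block(julia_code: str) -> str:
    """Wrap several Julia statements into one `begin ... end` block."""
    # Normalize indentation, then indent the body by four spaces.
    body = textwrap.indent(textwrap.dedent(julia_code).strip(), "    ")
    return f"begin\n{body}\nend"


code = as_block("""
    a = 2
    myfunc(args...; kwargs...) = (args, kwargs)
""")
print(code)
# With juliacall loaded, the result can be passed on: jl.seval(code)
```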

In [None]:
jl.myfunc(1,4, [1,2,3], range(10), mykey=list)
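
For readers coming from Python: Julia's `args...; kwargs...` plays the same role as `*args, **kwargs`. As a rough analogy (not a statement about how `juliacall` converts the values), the Julia definition above corresponds to this pure-Python function:

```python
def myfunc(*args, **kwargs):
    """Python counterpart of Julia's myfunc(args...; kwargs...) = (args, kwargs)."""
    return args, kwargs


args, kwargs = myfunc(1, 4, [1, 2, 3], range(10), mykey=list)
print(args)    # positional arguments, collected into a tuple
print(kwargs)  # keyword arguments, collected into a dict
```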

## DataFrames

Julia has excellent support for DataFrames, too, again including custom data types.

See https://dataframes.juliadata.org/stable/man/comparisons/ for a detailed mapping from pandas to DataFrames.jl.

In [None]:
# list repetition: this builds the alternating 'grp' column used below
[1, 2] * 3

In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame({'grp': [1, 2] * 3,
                   'x': range(6, 0, -1),
                   'y': range(4, 10),
                   'z': [3, 4, 5, 6, 7, None]},
                   index = list('abcdef'))
df2 = pd.DataFrame({'grp': [1, 3], 'w': [10, 11]})

df.groupby('grp')['x'].mean()
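
To make explicit what this group-by computes, here is a small sketch of the same aggregation in plain Python, using only the standard library. It only reproduces the `grp` and `x` columns from above and is meant as an illustration, not as a replacement for pandas.

```python
from collections import defaultdict
from statistics import mean

grp = [1, 2] * 3           # alternating group labels: 1, 2, 1, 2, 1, 2
x = list(range(6, 0, -1))  # 6, 5, 4, 3, 2, 1

# Collect the x values belonging to each group label.
groups = defaultdict(list)
for label, value in zip(grp, x):
    groups[label].append(value)

# Average each group: group 1 holds 6, 4, 2 and group 2 holds 5, 3, 1.
group_means = {label: mean(values) for label, values in groups.items()}
print(group_means)
```

Group 1 averages to 4 and group 2 to 3, which is what both the pandas and the DataFrames.jl version should report.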

In [None]:
%%julia
using DataFrames
using Statistics

df = DataFrame(grp=repeat(1:2, 3), x=6:-1:1, y=4:9, z=[3:7; missing], id='a':'f')
df2 = DataFrame(grp=[1, 3], w=[10, 11])

combine(groupby(df, :grp), :x => mean)

Rough timings:

In [None]:
df3 = pd.DataFrame({'grp': [1, 2] * n, 'x': range(2*n, 0, -1)})
%timeit df3.groupby('grp')['x'].mean()
df3.groupby('grp')['x'].mean()

In [None]:
%%julia
df3 = DataFrame(grp = repeat(1:2, n), x = 2n:-1:1)
# benchmark the large df3 (not the small df from above) to mirror the pandas timing
@btime combine(groupby(df3, :grp), :x => mean)

----------------
#### 💻 your space
- 👉 look at the documentation and try mapping some further pandas operations
- 👉 try a different `n`

In [None]:
# your space

----------------

# Further resources

- [learning Julia](https://julialang.org/learning/)
- [juliacall / PythonCall.jl](https://cjdoris.github.io/PythonCall.jl/stable/)
- using `juliacall` you can specify your julia dependencies using a json config file which will automatically be initialized by juliacall (see [these docs](https://cjdoris.github.io/PythonCall.jl/stable/juliacall/#julia-deps))
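
As a rough sketch of what such a config file (`juliapkg.json`) can look like, the snippet below builds one in Python and prints it as JSON. The chosen package (`Example`), the version bounds and the Julia version are assumptions for illustration only; please check the linked docs for the exact schema and for where the file has to live.

```python
import json

# Assumed example content: one Julia package pinned by UUID and version.
juliapkg_config = {
    "julia": "1.9",  # assumed Julia version requirement
    "packages": {
        "Example": {
            "uuid": "7876af07-990d-54b4-ab0e-23690620f79a",
            "version": "0.5",
        },
    },
}

config_text = json.dumps(juliapkg_config, indent=4)
print(config_text)
# To use it, save config_text as juliapkg.json next to your Python package.
```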

Ways to add Julia itself as a dependency:
- `juliacall` uses [`juliapkg`](https://github.com/cjdoris/pyjuliapkg/) internally which also installs julia if it is not available
- alternatively, the python package [`jill`](https://pypi.org/project/jill/) is widely used to install julia from python
- the conda-forge [`julia`](https://github.com/conda-forge/julia-feedstock) package is well maintained

# Next

Next you will...
- learn why Julia is both easy and fast
- get to know Pluto, a reactive alternative to Jupyter

[Next Notebook](https://mybinder.org/v2/gh/jolin-io/workshop-accelerate-Python-with-Julia/main?urlpath=pluto/open?path=/home/jovyan/02-introduction-pluto.jl)
