grasph/JIII2021Talk

Binder

The presentation was written as a Jupyter(lab) notebook. For an interactive presentation, click on the binder badge above or here. The static version follows. The presentation notebook (01-Julia-dream-JIII2021.ipynb) and the accompanying ones can be downloaded from this repository.


13e Journées Informatiques IN2P3/IRFU





Julia, a HEP dream comes true

Julia, un rêve HEP se réalise


Philippe Gras (Université Paris-Saclay, CEA/Irfu)
November 16th, 2021

Introduction

  • Julia is a relatively new programming language

  • Let's start with a few words on the context...

High performance computing is important for HEP

  • Computing plays a central role in the research done at the LHC
    • Theoretical predictions: simulation of the proton-proton collisions
    • Simulation of the detector response
    • Reconstruction of the physics events
    • Analysis of reconstructed events to perform measurements and searches for new physics.

Analysis of the reconstructed events

  • Behind each published LHC experiment result there are tens of thousands of computing jobs that have run on the worldwide computing grid. ATLAS and CMS each reached their 1000th paper in June 2020.
  • Research code: developed by the main authors of the publication in preparation
  • C++ is widely used and the performance it offers is essential
  • It is more and more often combined with Python, which offers easy/fast coding and comes with an extensive library ecosystem, but does not match the performance of C++
  • A single programming language offering the advantages of both C++ and Python would be more convenient

Julia solving the two-language problem


Fast/easy coding | Fast running
Python | C/C++
⇒ Mixing languages and going back and forth between them

Nowadays, Julia is a mature language, with a wide ecosystem

Loops

A python loop

HEP data analysis is a looping game

HEP enjoys loops: we loop over physics events, then over particles/physics objects. We often perform particle matching and clustering, and for this we loop over events, then over objects, then over objects again.

for event in billions_of_lhc_events
    for tens_or_hundreds_of_objects in event
        for tens_or_hundreds_of_objects_to_match in event
            ...
        end
    end
end
  • The outer loop hides several loops: datasets > files
  • This is repeated several times for each analysis.
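For comparison, here is a small Python sketch of the same nested-loop pattern. It is only an illustration: the toy events, the (eta, phi) representation of objects, the ΔR < 0.4 matching criterion, and the names delta_r and count_matches are all assumptions, not taken from the presentation.

```python
import math

def delta_r(a, b):
    """Angular distance between two objects given as (eta, phi) tuples."""
    d_eta = a[0] - b[0]
    # Wrap the azimuthal difference into [-pi, pi)
    d_phi = (a[1] - b[1] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

def count_matches(events, max_dr=0.4):
    """Count object pairs closer than max_dr, event by event."""
    n_matches = 0
    for event in events:                    # billions of events at the LHC
        for i, obj in enumerate(event):     # tens or hundreds of objects
            for other in event[i + 1:]:     # tens or hundreds of objects to match
                if delta_r(obj, other) < max_dr:
                    n_matches += 1
    return n_matches

# Toy input (hypothetical): two events, each a list of (eta, phi) tuples
toy_events = [
    [(0.0, 0.0), (0.1, 0.1), (2.0, 3.0)],
    [(1.0, -3.1), (1.0, 3.1)],
]
print(count_matches(toy_events))  # prints 2
```

The innermost comparison runs once per object pair and per event, which is why its cost dominates the running time of an analysis.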

⇒ Lines of code are executed billions of times, even for throwaway ("Kleenex") code written specifically for one publication.

Python dislikes loops

  • A master rule for high-performance code in Python is to avoid writing loops in Python
    • ⇒ push the loop to underlying compiled libraries. This is the approach of NumPy vectorisation.
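The rule can be sketched with the standard library alone. The function names below are illustrative assumptions. With NumPy, the same idea would read np.sum(1.0 / np.arange(1, n + 1)).

```python
import operator
from itertools import repeat

def loop_sum(n):
    """Explicit Python-level loop: every iteration runs in the interpreter."""
    a = 0.0
    for i in range(1, n + 1):
        a += 1.0 / i
    return a

def pushed_down_sum(n):
    """Same sum, but the iteration happens inside the C-implemented sum() and map()."""
    return sum(map(operator.truediv, repeat(1.0), range(1, n + 1)))

n = 100_000
print(loop_sum(n), pushed_down_sum(n))
```

The two results may differ in the last digits, since the built-in sum() is free to accumulate floats differently from the explicit loop.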

Let's run a simple loop

Simple loop in Python

Python
45 ms
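The Python code behind this timing is not reproduced in this text. A minimal sketch, assuming it mirrors the C++ and Julia loops of this section (the function name harmonic_sum and the use of time.perf_counter are assumptions):

```python
import time

def harmonic_sum(n=1_000_000):
    """Sum 1/i for i = 1..n with an explicit Python-level loop."""
    a = 0.0
    for i in range(1, n + 1):
        a += 1.0 / i
    return a

t0 = time.perf_counter()
result = harmonic_sum()
duration = time.perf_counter() - t0

print(f"Computation result: {result}")
print(f"Duration: {duration:.6f} seconds")
```

The measured duration depends on the machine and the Python version, so it will not necessarily reproduce the 45 ms quoted here.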

In C++. Code in simple-loop.cc:

#include <iostream>
#include <sys/time.h>

int main(){
  struct timeval t0, t1;
  gettimeofday(&t0, 0);  // start of the timed section

  double a = 0.;
  // Sum 1/i for i = 1..1000000 (i = 0 would add 1.0/0 = inf)
  for(unsigned i = 1; i <= 1000000; ++i) a += 1.0/i;
  std::cout << "Computation Result: " << a << "\n";

  gettimeofday(&t1, 0);  // end of the timed section
  std::cerr << "Duration: " << (t1.tv_sec-t0.tv_sec)
    + 1.e-6*(t1.tv_usec-t0.tv_usec)
            << " seconds\n";
  return 0;
}
run(`g++ -Wall -o simple-loop simple-loop.cc`)
run(`./simple-loop`)
;
run(`g++ -O3 -Wall -o simple-loop simple-loop.cc`)
run(`./simple-loop`)
;
Computation Result: 14.3927
Computation Result: 14.3927


Duration: 0.003238 seconds
Duration: 0.001518 seconds
Python | C++
45 ms | 1.0 ms

How is Julia doing?

#
# Julia
#
function f()
    a = 0.0
    for i in 1:1_000_000 # ✨ Note the underscores that improve legibility
       a = a + 1.0/i
    end
    return a
end
f()
@time b = f()
  0.001369 seconds (1 allocation: 16 bytes)





14.392726722864989
Python | C++ | Julia
45 ms | 1.0 ms | 1.0 ms

Ease of programming

The goal is not only running performance. We also want fast and easy coding.

  • You have already seen in the previous example that the code syntax and grammar are similar to Python's. No "std::map<std::string, std::vector>"..., no compilation step.
  • Easy to learn
  • Syntactic sugar similar to Python's for concise code: list comprehension, a < b < c, 1_000_000, support of symbols for variables... and more: e.g. a function call is "vectorized" (à la NumPy) with a simple dot, f.(x)
  • Interactive help, nice tools to debug, to optimize code, for introspection.
  • Multiple dispatch makes it remarkably easy to use and extend third-party libraries, which explains the rapid growth of the Julia ecosystem.
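For readers coming from Python, this short sketch shows the Python counterparts of the syntactic sugar listed above. All variable and function names are illustrative only. Python has no dot-call syntax, so the element-wise call is written as a comprehension here.

```python
squares = [x * x for x in range(5)]   # list comprehension
a, b, c = 1, 2, 3
ordered = a < b < c                   # chained comparison
n = 1_000_000                         # underscores in numeric literals
α = 0.5                               # non-ASCII symbol as a variable name

def f(x):
    return 2 * x + 1

# Element-wise call: the role played by f.(x) in Julia
ys = [f(x) for x in squares]

print(squares, ordered, n, α, ys)
```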

Programming in a community

Googling programming

Internet search engines and Stack Overflow are essential ingredients of today's programming workflow.

Julia is already widespread enough that the information you need can be found on the Internet.

In addition to the usual resources, Julia has dedicated forums on Discourse, Slack, and Zulip, with an active and friendly community.

Go to DuckDuckGo or your preferred search engine and give it a try.

Ecosystem

  • Large set of libraries and active development
    • Julia is primarily used by the scientific community ⇒ oriented to our needs
  • I played the following game during the PyHEP2021 workshop: I looked for a Julia equivalent each time a speaker mentioned a Python library (apart from HEP-specific ones).
    • Caveat: I have not checked that it covers all the features of the Python package
    • The results of this survey show the large activity around Julia

Python package or tool (PyHEP 2021 speaker) | Julia equivalent? | Julia package or notes
cmd (Olivier Mattelaer) | ✓ |
FreeCAD interface (Christophe Delaere) | ❌ | In discussion
Telegram bot (Matias Senge) | ✓ | https://github.com/Arkoniak/Telegram.jl
DataFrames (Vincenzo Eduardo Padulano) | ✓ |
Spark (Vincenzo and Andre F.) | ✓ | https://github.com/dfdx/Spark.jl
Dask (Vincenzo E. P., Graham Markall) | ✓ |
Batch computing (Vincenzo E. P.) | ✓ | https://docs.julialang.org/en/v1/manual/parallel-computing/, https://github.com/JuliaParallel, https://juliagpu.org/
Apache Parquet (Andre Frankenthal) | ✓ |
Jupyter/Binder/SWAN | ✓ |
Bokeh (Bruno Alves) | ✓ | https://github.com/samuelcolvin/Bokeh.jl
CUDA (Graham Markall) | ✓ | https://juliagpu.org/cuda/
Hypothesis (Santam Roy Choudhury, property testing) | ❌ | (besides an unmaintained QuickCheck project)
Virtualenv (Henry Schneider, Packaging talk) | ✓ | built into the std package manager
Unit test tools (Henry Schneider, Packaging talk) | ✓ | std package and more: Coverage, FactCheck
JIT/Numba (Graham Markall, Henry Schneider) | ✓ | Intrinsic to the language
Machine learning | ✓ | Flux, JuliaML
  TensorFlow (Matthew Feickert) | | TensorFlow
  GPyTorch/Gaussian Process ML (Irina Espejo Morales talk) | | GPML

Data format support

  • Non-HEP format
    • HDF5 and Parquet are fully supported (also CSV and Excel, less relevant for our data sizes)
  • ROOT
    • Two packages, developed by users.
      • Written in Julia, fast, and read-only: UnROOT.jl from Tamas Gal and Jerry Ling. It can read KM3NeT data and trees of simple types and/or vectors of simple types, such as CMS NanoAOD.
      • Providing both read and write support: UpROOT.jl from Oliver Schulz, a wrapper around uproot. It supports xroot.

Advanced tools

IDE

  • Emacs and vim support
  • Atom and VS Code support, with many features: code can be run and debugged within the IDE, with support for plots.

Notebooks

  • Jupyter
  • Pluto. A new generation notebook with automatic update of cells.
  • Debugger: Debugger, Rebugger, Juno debugger (for Atom IDE)

Package installation

  • Python made it easy with conda and pip. It's even easier in Julia
    • The package manager is a standard library, part of the Julia installation
    • It gives instructions to the user who tries to import a missing package. Try it:
import Blink
ArgumentError: Package Blink not found in current path:
- Run `import Pkg; Pkg.add("Blink")` to install the Blink package.




Stacktrace:

 [1] require(into::Module, mod::Symbol)

   @ Base ./loading.jl:967

 [2] eval

   @ ./boot.jl:373 [inlined]

 [3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)

   @ Base ./loading.jl:1196

In the REPL (the interactive terminal application, equivalent to IPython), since version 1.7, the user is offered the option to install the missing package and does not even need to type the installation command.

Package installation prompt

💡 Dedicated command mode for package handling in the REPL:

julia> ]
(@v1.6) pkg> add Blink
(@v1.6) pkg> [Backspace]
julia>

Interoperability

"Plug adaptors" by dogwelder is licensed under CC BY-NC 2.0
Credits: Karen V Bryan is licensed under CC BY-ND 2.0
  • Python, C, Fortran code: direct call from Julia and Jupyter Julia kernels
  • C++ code: call via a wrapper. A tool for automatic wrapper generation, like SWIG, is lacking. The project for direct calls (à la cppyy) is on hold and does not work with recent versions of Julia.

The other way around

  • Python code can call Julia as well
  • C/C++ code can call Julia code

Calling Python from Julia

As simple as calling Julia code

# Enable Python call:
using PyCall

# Import a Python module:
math = pyimport("math")

# Use it as a Julia module:
math.sin(math.pi / 4)
0.7071067811865475

Calling C from Julia

path = ccall(:getenv, Cstring, (Cstring,), "SHELL")
unsafe_string(path)
"/bin/bash"

For C, you will typically write a wrapper in Julia to handle errors, like:

function getenv(var::AbstractString)
    val = ccall(:getenv, Cstring, (Cstring,), var)
    if val == C_NULL
        error("getenv: undefined variable: ", var)
    end
    return unsafe_string(val)
end
getenv (generic function with 1 method)
println(getenv("USER"))
println(getenv("SMOKE")) # ⇒ will throw an exception unless you have SMOKE in your environment
pgras



getenv: undefined variable: SMOKE



Stacktrace:

 [1] error(::String, ::String)

   @ Base ./error.jl:42

 [2] getenv(var::String)

   @ Main ./In[7]:4

 [3] top-level scope

   @ In[8]:2

 [4] eval

   @ ./boot.jl:373 [inlined]

 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)

   @ Base ./loading.jl:1196
  • Julia code can also be embedded in C/C++
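As a point of comparison with the Julia getenv wrapper above, the same C call can be sketched in Python with the standard ctypes module. This assumes a POSIX system, where ctypes.CDLL(None) exposes the C library symbols of the running process. The names c_getenv and JIII_DEMO are hypothetical.

```python
import ctypes
import os

# Symbols already loaded in the running process (POSIX only; an assumption)
libc = ctypes.CDLL(None)
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p   # a NULL pointer is returned as None

def c_getenv(var):
    """Call the C getenv function and raise if the variable is undefined."""
    val = libc.getenv(var.encode())
    if val is None:
        raise KeyError(f"getenv: undefined variable: {var}")
    return val.decode()

# Hypothetical variable; assigning to os.environ also updates the C environment
os.environ["JIII_DEMO"] = "hello"
print(c_getenv("JIII_DEMO"))
```

In both languages the argument and return types of the C function must be declared by hand: in the ccall arguments for Julia, and through argtypes and restype here.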

Calling Julia from Python

$ python3 -m pip install julia    # install PyJulia
...                               # you may need `--user` after `install`

$ python3
>>> import julia
>>> julia.install()               # install PyCall.jl etc.
>>> from julia import Base        # short demo
>>> Base.sind(90)
1.0

Embedding Julia code in a Python notebook

Calling julia code from IPython

Let's use Julia for a HEP example

CMS dimuon analysis

⚡ It's an extremely simple analysis, far from a typical LHC analysis

Let's go,
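The analysis notebook itself is not reproduced in this text. As a rough idea of the core computation in a dimuon analysis, here is a Python sketch of the invariant mass of a muon pair. It assumes that each muon is described by its transverse momentum, pseudorapidity and azimuth; the function names and the toy values are illustrative only.

```python
import math

MUON_MASS = 0.105658  # GeV

def four_momentum(pt, eta, phi, mass=MUON_MASS):
    """Return (E, px, py, pz) in GeV from pt, pseudorapidity and azimuth."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def invariant_mass(mu1, mu2):
    """Invariant mass of a muon pair, each muon given as a (pt, eta, phi) tuple."""
    e1, x1, y1, z1 = four_momentum(*mu1)
    e2, x2, y2, z2 = four_momentum(*mu2)
    m2 = (e1 + e2) ** 2 - (x1 + x2) ** 2 - (y1 + y2) ** 2 - (z1 + z2) ** 2
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors

# Toy pair (hypothetical values): two back-to-back muons in the transverse plane
print(invariant_mass((45.6, 0.0, 0.0), (45.6, 0.0, math.pi)))
```

In an event loop, this function would be called once per selected muon pair and the result filled into a histogram.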

Comparison with Python

Running times for different implementations of the dimuon analysis:

Julia | Python event loop | Python RDataFrame, JIT-compiled C++ | Python RDataFrame, JIT-compiled Python (Numba)
35 s | 4 h 5 min | 60 s | 125 s
Similar performance expected for a DataFrame-based Julia implementation
⇒ Julia runs fast out of the box
No need to think about performance when writing the code
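To put the running times quoted above on a common scale, the short sketch below converts them to seconds and expresses each one relative to the Julia implementation. The dictionary keys are just labels for the four implementations.

```python
# Running times quoted above, converted to seconds
timings_s = {
    "Julia": 35,
    "Python event loop": 4 * 3600 + 5 * 60,
    "Python RDataFrame, JIT-compiled C++": 60,
    "Python RDataFrame, JIT-compiled Python (Numba)": 125,
}

# Ratio of each running time to the Julia one
ratios = {name: t / timings_s["Julia"] for name, t in timings_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Julia running time")
```

With these numbers the plain Python event loop takes 420 times longer than Julia, while the two RDataFrame variants take roughly 1.7 and 3.6 times longer.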

Conclusions
