Jim's April 8-10 PICSciE Numpy tutorials.


High-Performance Python and Interoperability with Compiled Code

Abstract

Python is a notoriously slow language, so why is it widely used by scientists and machine learning experts? In a numerically heavy task, an interpreted, dynamically typed environment can be hundreds to thousands of times slower than a compiled, statically typed one, which can make the difference between minutes of waiting and days of waiting, or between coarse models on small datasets and fine-grained models on large datasets. The trick is to drive compiled functions from the interpreted command line, as is done in R, and to frame your problem in array programming primitives, as is done in Matlab, but in a general-purpose language with hundreds of thousands of extensions to glue to every conceivable interface.
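As a rough illustration of that trick (a minimal sketch, not taken from the course notebooks), the same sum of squares can be written as an interpreted loop or as a single call into compiled Numpy code. The function names and the array size are arbitrary choices made for this example, and the measured speedup will depend on your machine.

```python
import time

import numpy as np


def sum_of_squares_loop(values):
    """Interpreted version: every multiply and add goes through Python."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_array(values):
    """Array version: one call that runs in Numpy's compiled code."""
    return float(np.dot(values, values))


# Illustrative input; the size is an arbitrary choice for this sketch.
data = np.arange(100_000, dtype=np.float64)

start = time.perf_counter()
loop_result = sum_of_squares_loop(data)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
array_result = sum_of_squares_array(data)
array_seconds = time.perf_counter() - start

# Both approaches compute the same number; only the time differs.
assert np.isclose(loop_result, array_result)
print(f"loop:  {loop_seconds:.6f} s")
print(f"array: {array_seconds:.6f} s")
```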

In this three-day workshop from April 8th-10th we will examine the numerical processing ecosystem that has grown up around Python. The key library in this ecosystem is Numpy, which enables fast array programming and also provides a common data structure for sharing large, numerical datasets. We will walk through the process of restructuring "for loop" algorithms as "columnar" algorithms based on Numpy, as well as using Numba to speed up "for loop" algorithms by compiling the Python code. We'll do the same on a GPU using CuPy (a Numpy clone written for GPUs) and Numba. We'll also explore methods of mixing Python and C++, both for performance and for compatibility with existing libraries. Finally, I'll introduce Pandas as a convenient front-end to Numpy for data analysis.
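To make the two approaches concrete, here is a small sketch (not from the workshop material) of one calculation written three ways: as a plain "for loop", as the same loop compiled with Numba's `njit`, and as a "columnar" Numpy expression. The function names are hypothetical, and the fallback that leaves the loop uncompiled when Numba is not installed is an assumption added so the example runs anywhere.

```python
import numpy as np

try:
    from numba import njit  # compiles the loop to machine code on first call
except ImportError:
    # Assumption for this sketch: without Numba, keep the plain Python loop.
    def njit(func):
        return func


def magnitude_loop(x, y):
    """'For loop' style: one element at a time."""
    out = np.empty(len(x))
    for i in range(len(x)):
        out[i] = (x[i] ** 2 + y[i] ** 2) ** 0.5
    return out


# Same source code, handed to Numba (or left as-is if Numba is missing).
magnitude_compiled = njit(magnitude_loop)


def magnitude_columnar(x, y):
    """'Columnar' style: whole-array operations, no explicit loop."""
    return np.sqrt(x ** 2 + y ** 2)


rng = np.random.default_rng(12345)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# All three versions agree on the result.
assert np.allclose(magnitude_loop(x, y), magnitude_columnar(x, y))
assert np.allclose(magnitude_compiled(x, y), magnitude_columnar(x, y))
```

The columnar version needs the problem to be expressible in array operations, while the compiled loop keeps the original structure of the algorithm.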

Before the class

Participants are strongly encouraged to bring a laptop to work through exercises. We will use conda and pip-in-conda, so superuser ("sudo") permissions are not required. Participants should have a good general working knowledge of Python and a little C++ (enough to understand the discussion of Python-C++ bindings).

Come with conda (Miniconda or Anaconda) for Python 3 installed on your laptop or on a system you can access. Make sure you can install Numpy, Numba, Pandas, and JupyterLab. You will not be required to install GPU libraries or have access to a GPU. All of the exercises will be conducted in JupyterLab. No prior knowledge of these libraries will be assumed.
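One way to confirm the installation before class is a short check like the following. This is only a sketch: the helper name `check_packages` is hypothetical, and it assumes the import names `numpy`, `numba`, `pandas`, and `jupyterlab` for the four packages listed above.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED = ["numpy", "numba", "pandas", "jupyterlab"]


def check_packages(names=REQUIRED):
    """Return {package name: True if it can be found by the import system}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it with the Python interpreter from the conda environment you plan to use in class; anything reported as missing can be installed with conda.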

At the start of class

Binder

Use the Launch Binder button to run these exercises on the web, or run the following commands to install everything on your own computer.

# if you haven't added conda-forge already
conda config --add channels conda-forge

# definitely: used repeatedly in the course
conda install numpy pandas matplotlib scikit-learn awkward numba

# maybe: only used for one or two things that may be skipped
conda install dask distributed python-graphviz uproot cython pybind11 pillow psutil

# get the lessons and start the notebook
git clone https://github.com/jpivarski/2019-04-08-picscie-numpy.git
cd 2019-04-08-picscie-numpy
jupyter lab

Day 1 homework

See 05-day1-homework.ipynb: converting K-means implementation to Numpy.

Day 2 homework

See 08-day2-homework.ipynb: accelerating decision tree code.

Day 3 homework

Bring a problem related to your research that you think these tools can help with. We may have some time at the end of the third day to work on it.
