# SciPy 2018 (Scientific Computing with Python) conference: a short trip report

[Nicolas Fauchereau](https://github.com/jupyter-widgets/tutorial)


Tuesday 4 September 2018, Auckland

### SciPy: a brief overview

+ Austin, Texas, AT&T Executive Education and Conference Centre at the University of Texas at Austin
+ ~800 participants this year
+ Industry, academia, national labs, etc.
+ 2 days of tutorials
+ 3 days of conference (talks, posters, birds of a feather) 
+ 2 days of `sprints`
+ [https://scipy2018.scipy.org/](https://scipy2018.scipy.org/)

# Tutorials

## day 1

+ Parallelizing Scientific Python with Dask  
[https://github.com/martindurant/dask-tutorial-scipy-2018](https://github.com/martindurant/dask-tutorial-scipy-2018)


+ An Introduction to Julia (see [https://julialang.org/](https://julialang.org/))  
[https://github.com/JuliaComputing/JuliaBoxTutorials](https://github.com/JuliaComputing/JuliaBoxTutorials)

## day 2

+ Introduction to Geospatial Data Analysis with Python  
[https://github.com/geopandas/scipy2018-geospatial-data/](https://github.com/geopandas/scipy2018-geospatial-data/)


+ Machine Learning with scikit-learn (Part 2)  
[https://github.com/amueller/scipy-2018-sklearn](https://github.com/amueller/scipy-2018-sklearn)

# Talks 

**tracks:** 

General, **Machine Learning**, **Earth, Ocean and Geoscience**, Image Processing, Astronomy, Biology and Bioinformatics, **Data Science**, Reproducibility and Software Sustainability, **Data Visualization**, Materials Science. 



# A brief selection of interesting talks

#### Wednesday 11 July 

+ **Keynote**: SciPy 1.0 and Beyond - a Story of Code and Community [*Ralf Gommers, FP Innovations, Scion*]  


+ [SatPy: A Python Library for Weather Satellite Processing](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618382&sessionchoice=1&) 
+ [The Past, Present, and Future of Automated Machine Learning](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618936&sessionchoice=1&)
+ [PANGEO: A Big-data Ecosystem for Scalable Earth System Science](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618394&sessionchoice=1&) 
+ [Beyond Scraping: How to Use Machine Learning When You're not Sure Where to Start](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=22060628&sessionchoice=1&)  
+ [Apache Arrow: A Cross-language Development Platform for In-memory Data](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618887&sessionchoice=1&)   

+ Lightning talks [https://www.youtube.com/watch?v=1HDq7QoOlI4](https://www.youtube.com/watch?v=1HDq7QoOlI4)


#### Thursday 12 July 

+ **Keynote**: Democratizing Data [*Tracy Teal, Data Carpentry*]  


+ [Website for Interacting with Oceanographic Data and Numerical Model Output](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618396&sessionchoice=1&)  
+ [Detecting Anomalies Using Statistical Distances](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618892&sessionchoice=1&)   
+ [EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618398&sessionchoice=1&)  
+ [Bringing ipywidgets Support to plotly.py](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21616512&sessionchoice=1&)  
+ [Data Visualization for Scientific Discovery](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21616170&sessionchoice=1&)  
+ [Spatio-temporal Analysis of Socioeconomic Neighborhoods](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21619044&sessionchoice=1&)  
+ [Open Source Tools for 'Heterogeneous Agent' Modeling](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21619043&sessionchoice=1&)  

#### Friday 13 July 

+ **Keynote**: Black and Gold: The Role of Python in Recent Gravitational Wave Astronomy Breakthroughs with LIGO and Virgo  


+ [Building your own Weather App using NOAA Open Data and Jupyter Notebooks](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618399&sessionchoice=1&)  
+ [Building an Object-Oriented Python Interface for the Generic Mapping Tools](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618400&sessionchoice=1&)  
+ [Development of MetPy’s Declarative Plotting Interface](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21618401&sessionchoice=1&)  
+ [bqplot: Seamless Interactive Visualizations and Dashboards in the Jupyter Notebook](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21615982&sessionchoice=1&)  
+ [Interactive Data Visualization Leveraging Jupyter, Rust and WebAssembly](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21616614&sessionchoice=1&)  
+ [PyViz: Unifying Python Tools for In-Browser Data Visualization](https://scipy2018.scipy.org/ehome/index.php?eventid=299527&tabid=721463&cid=2264594&sessionid=21616294&sessionchoice=1&)  



# interesting projects and new developments (demos !)

# Parallelizing Scientific Python with Dask


## What is Dask?

In a nutshell, **Dask** is a parallel computing framework that scales Python data structures and tools from a single multi-CPU / multi-core machine up to clusters (cloud or HPC environments).

### Main components: 

+ `Dask Delayed`: parallelizes user-defined functions
+ `Dask arrays`: NumPy-like arrays split into chunks that are processed in parallel
+ `Dask Dataframe`: pandas-like DataFrames split into partitions that are processed in parallel

In [None]:
from IPython.display import IFrame
IFrame('http://dask.pydata.org/en/latest/', width="100%", height=500)

## Dask delayed

In [None]:
from dask import delayed
from time import sleep

def inc(x):
    sleep(1)
    return x + 1

def add(x, y):
    sleep(1)
    return x + y

In [None]:
%%time

x = inc(1)
y = inc(2)
z = add(x, y)

In [None]:
z

In [None]:
%%time
# This runs immediately: all it does is build a task graph

x = delayed(inc)(1)
y = delayed(inc)(2)
z = delayed(add)(x, y)

In [None]:
z

In [None]:
%%time
# This actually runs our computation using a local thread pool

z.compute()

In [None]:
z.visualize()
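
The same pattern extends to loops, where many independent calls are collected and then combined lazily. The block below is a minimal sketch rather than part of the original tutorial: it repeats `inc` without the `sleep` so that it is self-contained, and the `data` list and the `total` name are illustrative only.

```python
from dask import delayed

def inc(x):
    return x + 1

# Illustrative input, not taken from the tutorial
data = [1, 2, 3, 4]

# Build one lazy task per element; nothing is executed yet
results = [delayed(inc)(v) for v in data]

# Wrapping sum as well keeps the whole graph lazy
total = delayed(sum)(results)

# compute() triggers the execution of the full graph
value = total.compute()
print(value)
```

Because the `inc` calls do not depend on each other, the scheduler is free to run them concurrently before the final `sum`.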

## Dask arrays

<img src="http://dask.pydata.org/en/latest/_images/dask-array-black-text.svg" width="50%" align="center">

In [None]:
import numpy as np
import dask.array as da

In [None]:
%%time 
x = np.random.normal(10, 0.1, size=(20000, 20000)) 
y = x.mean(axis=0)[::100] 
y

In [None]:
%%time
x = da.random.normal(10, 0.1, size=(20000, 20000), chunks=(1000, 1000))
y = x.mean(axis=0)[::100] 
y.compute() 
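
To make the role of `chunks` more concrete, the sketch below wraps a small NumPy array in a Dask array and inspects how it is split into blocks. The array size and the chunk shape are arbitrary choices for illustration, far smaller than in the cells above.

```python
import numpy as np
import dask.array as da

# A tiny 4x4 array, chosen only so the blocks are easy to inspect
a = np.arange(16, dtype=float).reshape(4, 4)

# Split the array into four 2x2 blocks
x = da.from_array(a, chunks=(2, 2))

# Chunk sizes along each axis, and the number of blocks per axis
print(x.chunks)
print(x.numblocks)

# The reduction is evaluated block by block, then the partial results are combined
col_means = x.mean(axis=0).compute()
print(col_means)
```

The result matches `a.mean(axis=0)` computed directly with NumPy; only the way the work is scheduled differs.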

## Dask dataframes

In [None]:
import os

import dask
import dask.dataframe as dd
import pandas as pd

In [None]:
df = dd.read_csv(os.path.join('./data', 'nycflights', '*.csv'),
                 parse_dates={'Date': [0, 1, 2]},
                 dtype={'TailNum': str,
                        'CRSElapsedTime': float,
                        'Cancelled': bool,
                        'Origin': str})

In [None]:
df.DepDelay.max().compute()

In [None]:
df.DepDelay.max().visualize()

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("mqdglv9GnM8", width=700, height=400)

# PANGEO 

## A community platform for Big Data geoscience

[http://pangeo.io/](http://pangeo.io/)



In [None]:
IFrame('http://pangeo.io/', width="100%", height=700)

+ Enables **Big Data** geoscience (focus so far on ocean and atmospheric science)  

+ Responds to the challenge posed by exponential growth in datasets produced by satellites, models, etc. (**petabyte-scale** datasets)  

+ Leverages familiar Python software stack ([Jupyter](http://jupyter.org/), [Xarray](http://xarray.pydata.org/en/stable/) / [Iris](https://scitools.org.uk/iris/docs/latest/), [Dask](http://dask.pydata.org/en/latest/), etc.) to enable the **interactive** analysis of very large datasets in **cloud or HPC environments**

# demo ! 

<br>

![Kiku](images/crossed-fingers.png)   

[http://pangeo.pydata.org/hub/login](http://pangeo.pydata.org/hub/login)
