![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)

---

# The Future

### GSFC Intermediate/Advanced Python Bootcamp 2017

#### Brent Smith

---

![Back to the Future](http://pbs.twimg.com/media/CR1_f8sWwAA3qjB.jpg)

# 1. Compatibility

---

One of the biggest debates within the Python community is whether to use Python 2.x or Python 3.x. In his 2016 PyCon keynote ([link](http://youtu.be/YgtL4S7Hrwo)), Guido van Rossum stated that Python 2.7 is the last release of the 2.x series.

Even more evidence: [http://pythonclock.org](https://pythonclock.org)

### Got Python 2?

---

If you have code that is still in Python 2.x, the following guides and libraries can help you transition it.

Details:

* [Python-Future](http://python-future.org)
* [Six: Python 2 and 3 Compatibility Library](http://pythonhosted.org/six/)

### Quick Synopsis

---

* `print` is a function, not a statement
  * Python 2.x: ```print 'Hello',```
  * Python 3.x: ```print('Hello', end='')```
* Strings ([unicode link](http://docs.python.org/3/howto/unicode.html)):
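  * Python 2.x: `str` is a byte string with no attached encoding (not utf-8); Unicode text is a separate `unicode` type
  * Python 3.x: `str` is unicode; raw bytes are a separate `bytes` type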
* Division:
  * Python 2.x: 3/2 = 1
  * Python 3.x: 3/2 = 1.5
* Others: exceptions, class definitions, etc.
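The items in the synopsis can be checked directly in an interpreter. The snippet below is a minimal sketch that assumes Python 3 only; the variable names are illustrative and do not come from the original notebook.

```python
# Python 3 behavior for the items in the synopsis.

# Division: `/` is true division, `//` is floor division
# (floor division gives what Python 2 returned for `3/2` on ints).
true_div = 3 / 2
floor_div = 3 // 2

# Strings: `str` holds Unicode text; `bytes` holds encoded data.
text = "caf\u00e9"                 # 4 characters of text
encoded = text.encode("utf-8")     # 5 bytes: the accented e needs 2 bytes
decoded = encoded.decode("utf-8")  # back to the original str

# print is a function, so `end` controls what follows the output.
print("Hello", end="")
print(" world")

print(true_div, floor_div, len(text), len(encoded), decoded == text)
```

Keeping `bytes` and `str` apart like this is usually the part of a 2-to-3 port that needs the most manual attention.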

### Python 2/3 Compatible Code Example

In [None]:
from __future__ import print_function
from __future__ import division

print(
    'A new print function using Python 2.',
    end='...err {result}.'.format(result=3/2)
)

[Formatting Strings Reference](http://pyformat.info)

# 2. New Stuff...The Future!

---

Below is a small showcase of four packages (new to me) for scientific/engineering applications in Python:

* [asyncio](http://docs.python.org/3/library/asyncio.html) - asynchronous/concurrent computing (single-thread)
* [xarray](http://xarray.pydata.org/en/stable/) - pandas, but for N-dimensional arrays
* [bokeh](http://bokeh.pydata.org/en/latest/) - interactive visualizations
* [dask](http://dask.pydata.org/en/latest/) - parallel and distributed computing

### asyncio

---
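As a minimal sketch of what asyncio offers (not part of the original notebook; the coroutine names `fake_io` and `main` are hypothetical), the example below runs three simulated I/O waits concurrently on a single thread. It assumes Python 3.7 or later for `asyncio.run`; inside a Jupyter notebook, where an event loop is already running, `await main()` would be used instead.

```python
import asyncio
import time


async def fake_io(name, delay):
    """Stand-in for an I/O-bound call (e.g., a network request)."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return name + " done"


async def main():
    start = time.perf_counter()
    # gather schedules all three coroutines at once and keeps argument order
    results = await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)
print("elapsed: {:.2f} s".format(elapsed))
```

Because the three waits overlap, the elapsed time should be close to the longest single delay rather than the sum of all three.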



### 2.1 Bokeh

---

Bokeh is a Python visualization package aimed at web browsers and sites, for both interactivity and display. It sends the data to the browser as JSON and then uses JavaScript ([BokehJS](http://bokeh.pydata.org/en/latest/docs/dev_guide/bokehjs.html)) to add interactivity.

_Note:_ Output can be to a web page, the Jupyter notebook (inline), or even a local Bokeh server (think dashboards, visualizing large datasets).

__Links:__

* [Official Gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html)

__2.1.1 Charts__

Bokeh charts are primarily used for statistical plots such as histograms, pie (donut) charts, and box plots. They are useful when you need to visualize your data quickly and dynamically (think monitoring data trends through a dashboard of visualizations).

![Charts Diagram](http://chdoig.github.io/pyladiesatx-bokeh-tutorial/images/charts.png)

__Sample Data:__ We need some data to work with. Bokeh provides an interface for downloading sample datasets that we can use in the examples.

In [None]:
import bokeh.sampledata  # `import bokeh` alone does not load the sampledata submodule
bokeh.sampledata.download()

In [None]:
# example from: http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/tutorial/10%20-%20charts.ipynb
from bokeh.sampledata.iris import flowers
flowers.head()

In [None]:
from bokeh.charts import Scatter, show, output_notebook
output_notebook()
p = Scatter(flowers, x='petal_length', y='petal_width')
show(p)

In [None]:
p = Scatter(flowers, x='petal_length', y='petal_width', color='species', legend='top_left')
show(p)

__2.1.2 Plotting__

---

The most popular part of Bokeh is its plotting interface. It differs from charts in that it is organized around the way you draw the data (the shapes placed on a figure, such as boxes or circles) rather than around a specific type of plot.

![Plotting Diagram](http://chdoig.github.io/pyladiesatx-bokeh-tutorial/images/plotting.png)

In [None]:
# example from: http://bokeh.pydata.org/en/latest/docs/gallery/texas.html
from bokeh.io import show
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LogColorMapper
)
from bokeh.palettes import Viridis6 as palette
from bokeh.plotting import figure, output_notebook

from bokeh.sampledata.us_counties import data as counties
from bokeh.sampledata.unemployment import data as unemployment

palette = palette[::-1]  # reversed copy, so the shared Viridis6 palette is not mutated in place

counties = {
    code: county for code, county in counties.items() if county["state"] == "tx"
}

county_xs = [county["lons"] for county in counties.values()]
county_ys = [county["lats"] for county in counties.values()]

county_names = [county['name'] for county in counties.values()]
county_rates = [unemployment[county_id] for county_id in counties]
color_mapper = LogColorMapper(palette=palette)

source = ColumnDataSource(data=dict(
    x=county_xs,
    y=county_ys,
    name=county_names,
    rate=county_rates,
))

TOOLS = "pan,wheel_zoom,reset,hover,save"

p = figure(
    title="Texas Unemployment, 2009", tools=TOOLS,
    x_axis_location=None, y_axis_location=None
)
p.grid.grid_line_color = None

p.patches('x', 'y', source=source,
          fill_color={'field': 'rate', 'transform': color_mapper},
          fill_alpha=0.7, line_color="white", line_width=0.5)

hover = p.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = [
    ("Name", "@name"),
    ("Unemployment rate", "@rate%"),
    ("(Long, Lat)", "($x, $y)"),
]

show(p)

__2.1.3 Models__

---

Models are the underlying objects from which every Bokeh plot or chart is built (i.e., the object-oriented programming entry point to Bokeh).

![Models Part 1](http://chdoig.github.io/pyladiesatx-bokeh-tutorial/images/models1.png)

![Models Part 2](http://chdoig.github.io/pyladiesatx-bokeh-tutorial/images/models2.png)

### xarray

---

xarray brings the power of pandas to N-dimensional arrays by attaching labels (dimension names and coordinates) to the data.

In [None]:
import numpy as np
import pandas as pd
import xarray as xr

xr.DataArray(np.random.randn(2, 3))

In [None]:
data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))

In [None]:
data

### dask

---

Parallel and distributed computing are hard to grasp because most of us were not taught them at the beginning (i.e., debugging, developing, etc. are all harder on distributed systems).

Talk from Plotcon/SciPy by Matthew Rocklin.

In a nutshell:
* Parallel computing library for Python
* Task scheduler (low-latency ~10ms)
* Uses other packages to aid (PyData ecosystem): numpy, pandas, etc.
* Can be scaled (laptop to supercomputer)

A simple example...

In [None]:
from time import sleep
import random

def inc(x):
    sleep(0.2)
    return x + 1

def double(x):
    sleep(0.2)
    return 2 * x

def add(x, y):
    sleep(0.2)
    return x + y

In [None]:
%%time

data = [1, 2, 3, 4, 5, 6, 7, 8]

out = []
for x in data:
    y = inc(x)
    z = double(y)
    out.append(z)
    
total = 0
for z in out:
    total = add(total, z)

total

Now to parallelize in dask...

In [None]:
import dask

# delayed wraps a function so that calling it records a task instead of running it right away
inc = dask.delayed(inc)
double = dask.delayed(double)
add = dask.delayed(add)

In [None]:
x = inc(1)
y = inc(2)
z = add(x, y)
dask.visualize(z, rankdir='LR')

In [None]:
dask.compute(z)

In [None]:
%%time

data = [1, 2, 3, 4, 5, 6, 7, 8]

out = []
for x in data:
    y = inc(x)
    z = double(y)
    out.append(z)
    
total = 0
for z in out:
    total = add(total, z)

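# the loops above only build the task graph; compute() runs it,
# faster than the sequential version because independent tasks run in parallel
total.compute()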

In [None]:
dask.visualize(total, rankdir='LR') # sequential dependence still evident by visualization

In [None]:
data = [1, 2, 3, 4, 5, 6, 7, 8]

out = []
for x in data:
    y = inc(x)
    z = double(y)
    out.append(z)
    
# tree reduction: add neighbours pairwise (assumes len(out) is a power of two)
while len(out) > 1:
    out = [add(out[i], out[i+1]) for i in range(0, len(out), 2)]

total = out[0]

In [None]:
dask.visualize(total, rankdir='LR')
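The difference between the two task graphs can also be put in numbers. The sketch below uses plain Python (no dask) and carries a depth counter alongside each value, so the longest chain of dependent adds can be read off at the end. The helper `add_with_depth` and the other names are hypothetical, and the tree version assumes, like the loop above, that the number of inputs is a power of two.

```python
def add_with_depth(a, b):
    """Add two (value, depth) pairs; depth is the longest chain of dependent adds."""
    return (a[0] + b[0], max(a[1], b[1]) + 1)


# double(inc(x)) for x = 1..8, each starting with an add-depth of 0
values = [(2 * (x + 1), 0) for x in range(1, 9)]

# Sequential sum: every add has to wait for the previous running total.
seq_total = (0, 0)
for v in values:
    seq_total = add_with_depth(seq_total, v)

# Tree reduction: all pairs on one level are independent of each other.
level = list(values)
while len(level) > 1:
    level = [add_with_depth(level[i], level[i + 1]) for i in range(0, len(level), 2)]
tree_total = level[0]

print(seq_total)   # (sum, depth) for the sequential version
print(tree_total)  # (sum, depth) for the tree version
```

Both versions give the same sum, but the sequential chain is 8 adds deep while the tree is only 3 levels deep. At 0.2 s per call, that is roughly 1.6 s versus 0.6 s spent in the add chain when enough workers are available.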

The two graphs show at a glance that the tree reduction is better suited to parallelization than the sequential sum. But what if we had...

In [None]:
import dask.array as da

# 15x15 array of ones chunked into 5x5 squares (uses NumPy mainly)
x = da.ones((15, 15), chunks=(5,5))

In [None]:
dask.visualize(x)

In [None]:
dask.visualize((x.dot(x.T) - x.mean(axis=0)).std())

But this is bulky. We need a way to visualize how it __performs__ on a system.

In [None]:
from dask.distributed import Client
c = Client('128.154.200.69:8786')

In [None]:
from dask.distributed import Client
from time import sleep
import random
import dask

def inc(x):
    sleep(random.random() / 10)
    return x + 1

def dec(x):
    sleep(random.random() / 10)
    return x - 1

def add(x, y):
    sleep(random.random() / 10)
    return x + y


client = Client('128.154.200.69:8786')

incs = client.map(inc, range(100))
decs = client.map(dec, range(100))
adds = client.map(add, incs, decs)
total = client.submit(sum, adds)

del incs, decs, adds
total.result()
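If the scheduler address above is not reachable, the same map/submit pattern can be tried locally. The sketch below is a stand-in that swaps the dask `Client` for the standard library's `concurrent.futures.ThreadPoolExecutor`. It is not dask: it runs on one machine only, and its `map` returns results rather than futures. The worker count of 8 and the shorter sleep times are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import random


def inc(x):
    sleep(random.random() / 100)
    return x + 1


def dec(x):
    sleep(random.random() / 100)
    return x - 1


def add(x, y):
    sleep(random.random() / 100)
    return x + y


# Threads are enough here because the "work" is just sleeping.
with ThreadPoolExecutor(max_workers=8) as pool:
    incs = list(pool.map(inc, range(100)))
    decs = list(pool.map(dec, range(100)))
    adds = list(pool.map(add, incs, decs))  # pairs up incs[i] with decs[i]
    total = pool.submit(sum, adds).result()

print(total)
```

Since `inc(x) + dec(x)` is `2 * x`, the total is twice the sum of 0 through 99, which gives a quick sanity check on the result.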