In [None]:
%flow deregister_tracer ipyflow.tracing.ipyflow_tracer.DataflowTracer

# Python as a Hackable Language for Interactive Data Science

### Stephen Macke
### PyData Global 2023

# About me

- Engineer at Databricks based in Seattle area

- Passionate about computational notebook technology

- When not working on OSS projects, I help my wife out in our garden, which our cat has recently figured out how to escape from

# Who this presentation is for

- Users of data-adjacent Python tools

- Implementors of Python tools

- Very basic knowledge of Python / Jupyter is assumed

# Why is Python great for data science?

- Ecosystem of data-adjacent libraries

- It has great (OSS) tooling for *interactive programming*
    - **IPython / Jupyter**, Spyder, VSCode notebooks, Marimo...

# Underappreciated aspect: language extensibility

- Simplest example: top-level await in IPython

In [None]:
async def foo():
    return 42

In [None]:
await foo()

In [None]:
# much more convenient than the plain-script equivalent:
import asyncio
asyncio.run(foo())

# Abstract Syntax Tree Transformations

- IPython top-level "await" is an example of an *AST transformation*

- Python exposes an API for AST transformations via *AST visitors*

- What interactive features can we enable if we leverage these to their full extent?
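
As a rough illustration of the idea (a simplified sketch, not IPython's actual implementation), a cell containing a top-level `await` can be rewritten at the AST level into the body of an `async` function and then run on an event loop. The names `run_cell_with_await` and `__cell__` are hypothetical, chosen only for this example:

```python
import ast
import asyncio


def run_cell_with_await(source, namespace):
    """Run `source`, allowing top-level `await`; return the value of a trailing expression."""
    body = ast.parse(source).body
    # Turn a trailing expression statement into `return <expr>` so the caller sees its value.
    if body and isinstance(body[-1], ast.Expr):
        body[-1] = ast.Return(value=body[-1].value)
    # Parse a template wrapper and splice the cell's statements into it.
    module = ast.parse("async def __cell__():\n    pass")
    if body:
        module.body[0].body = body
    ast.fix_missing_locations(module)
    exec(compile(module, "<cell>", "exec"), namespace)
    return asyncio.run(namespace["__cell__"]())


namespace = {}
exec("async def foo():\n    return 42", namespace)
result = run_cell_with_await("await foo()", namespace)
print(result)  # 42
```

One limitation of this sketch: names assigned inside the cell become locals of the wrapper function, so they do not persist in `namespace` the way variables in a real notebook cell do.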

# Let's look at some examples!

- All examples leverage an instrumentation library called pyccolo (https://github.com/smacke/pyccolo)

- They run within the ipyflow kernel (https://github.com/ipyflow/ipyflow), which exposes an API to hook into pyccolo functionality

- This presentation is just a notebook. You can download it at https://github.com/ipyflow/pydata-global-2023 and run the examples in it yourself

# Example: Optional chaining

- "Recent" flavors of JavaScript have an awesome syntax for optional chaining / maybe monads

- E.g. a?.b?.c?.()?.d()?.e

- We can imbue Python with these abilities as well!
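
For reference, here is what such a chain has to do, written in plain Python without any new syntax. This is only a sketch of the intended semantics, assuming a JavaScript-like rule that the chain stops at the first `None`; the helper `optional_chain` is hypothetical and is not how `OptionalChainer` is implemented:

```python
from typing import Any, Callable, Optional


def optional_chain(value: Any, *steps: Callable[[Any], Any]) -> Optional[Any]:
    """Apply each step in turn, stopping with None as soon as a value is None."""
    for step in steps:
        if value is None:
            return None
        value = step(value)
    return value


class Foo:
    foo = 42
    bar: Optional["Foo"] = None


print(optional_chain(Foo, lambda o: o.foo))                          # 42
print(optional_chain(Foo.bar, lambda o: o.foo))                      # None
print(optional_chain(Foo.bar, lambda o: o.baz(), lambda o: o.bam))   # None
```

The lambdas are what make this version clumsy: every step has to be wrapped by hand, which is exactly the noise that a `?.` operator removes.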

In [None]:
%flow register_tracer pyccolo.examples.OptionalChainer

In [None]:
from typing import Optional
class Foo:
    foo = 42
    bar: Optional["Foo"] = None

In [None]:
Foo?.foo

In [None]:
Foo.bar?.foo is None

In [None]:
Foo.bar?.baz()?.bam is None

# Example: make assignment to simple variables non-blocking

- Idea: make all variable assignment "instantaneous"!

- Only block if the variable is used in a load context (i.e. as an rvalue)

- Unless! the load context is itself part of a simple variable assignment :)
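
The same idea can be written out by hand with the standard library's `concurrent.futures`. This is a sketch of the behavior described above, not of how `FutureTracer` works internally; the shortened sleep and the pool size are arbitrary choices for the example:

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def expensive(x):
    time.sleep(0.2)  # stand-in for slow work
    return x


# Each "assignment" returns a future immediately instead of blocking.
x = executor.submit(expensive, 1)
# These depend on earlier futures, but since they are themselves simple
# assignments they are deferred too rather than blocking the caller.
y = executor.submit(lambda: x.result() + 10)
z = executor.submit(lambda: y.result() + expensive(2))

# Blocking happens only here, where the value is actually needed.
z_value = z.result()
print(z_value)  # 13
executor.shutdown()
```

Writing `.submit(...)` and `.result()` everywhere is the manual version; the point of instrumenting assignments and loads is that the user's code can stay as plain `x = expensive(1)`.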

In [None]:
%flow deregister_tracer pyccolo.examples.OptionalChainer
%flow register_tracer pyccolo.examples.FutureTracer

In [None]:
import time
def expensive(x):
    time.sleep(3)
    return x

In [None]:
x = expensive(1)

In [None]:
y = x + 10

In [None]:
z = y + expensive(2)

In [None]:
z

# Example: reactive execution

- The "state problem" (out-of-order execution) is well documented in notebooks

- (Partial) solution: reactively rerun dependent cells!

In [None]:
%flow deregister_tracer pyccolo.examples.FutureTracer
%flow register_tracer ipyflow.tracing.ipyflow_tracer.DataflowTracer

In [None]:
x = 0

In [None]:
y = x + 1

In [None]:
print("hello")

In [None]:
y

- Anything with a dot: has a dataflow relationship with the selected cell

- Anything with an *orange* dot: will also execute when the selected cell executes
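
For the four small cells above, the set of cells to rerun can be approximated with purely static analysis. The sketch below is a simplification that assumes cells only ever run top to bottom and that every dependency is visible as a plain variable name; `reads_and_writes` and `dependents` are hypothetical helpers, not part of ipyflow:

```python
import ast

CELLS = ["x = 0", "y = x + 1", 'print("hello")', "y"]


def reads_and_writes(source):
    """Return (names loaded, names stored) by a cell, found statically."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                reads.add(node.id)
    return reads, writes


def dependents(cells, changed_index):
    """Indices of later cells that transitively read a name written by the changed cell."""
    _, dirty = reads_and_writes(cells[changed_index])
    dirty = set(dirty)
    stale = []
    for index in range(changed_index + 1, len(cells)):
        reads, writes = reads_and_writes(cells[index])
        if reads & dirty:
            stale.append(index)
            dirty |= writes  # anything this cell writes is now out of date too
    return stale


print(dependents(CELLS, 0))  # [1, 3]
```

Rerunning the first cell marks the second and fourth cells as stale, while the `print("hello")` cell is left alone.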

# Reactive execution: great for interactive widgets

In [None]:
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0, 10, 1000)

In [None]:
from ipywidgets import IntSlider
slider = IntSlider(min=1, max=10)

In [None]:
fig = plt.figure(figsize=(5, 2))
plt.plot(xs, np.cos(xs * slider.value))

In [None]:
slider

# Why do we need AST transformation / instrumentation for reactivity?

- For easy cases, static analysis suffices

- For harder cases, there's no avoiding it:

In [None]:
x = y = 0

In [None]:
import random
if random.random() < 0.5:
    x += 1
else:
    y += 1

In [None]:
x

In [None]:
y
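
The gap between the two approaches can be seen directly. In the sketch below, static analysis of the branching cell can only report that both `x` and `y` might be written, whereas observing an actual run shows which one changed. Comparing before and after snapshots is a crude stand-in for real instrumentation, used here only to keep the example short; both helper names are hypothetical:

```python
import ast
import random

CELL = """
if random.random() < 0.5:
    x += 1
else:
    y += 1
"""


def statically_assigned_names(source):
    """Every name the cell could possibly write, according to its syntax alone."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }


def dynamically_changed_names(source, namespace):
    """Names whose values actually changed during one execution of the cell."""
    before = dict(namespace)
    exec(source, namespace)
    return {name for name in before if namespace[name] != before[name]}


static_names = statically_assigned_names(CELL)
changed = dynamically_changed_names(CELL, {"x": 0, "y": 0, "random": random})
print(static_names)  # both x and y
print(changed)       # exactly one of them, depending on the random draw
```

A reactive system that relied on the static answer would rerun the dependents of both variables every time, which is why the harder cases call for runtime information.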

# Also needed for anything requiring runtime introspection

- Example: memoization. Skip code that would give the same result!

In [None]:
import time

In [None]:
class Foo:
    def slow_method(self):
        time.sleep(2)
        return 0

In [None]:
%%memoize
d = {"foo": Foo()}

In [None]:
%%memoize
ans = d["foo"].slow_method() + 7
ans
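
A very rough approximation of cell memoization is a cache keyed on the cell's source and the current values of the names it reads. In this sketch the inputs and outputs are listed by hand and must be hashable, which is an assumption made purely for brevity; as the slide suggests, a real implementation has to discover them by introspecting the running code. `run_memoized` is a hypothetical helper, not ipyflow's `%%memoize`:

```python
def run_memoized(source, namespace, inputs, outputs, cache):
    """Execute `source` unless it was already run with the same input values."""
    key = (source, tuple((name, namespace[name]) for name in inputs))
    if key not in cache:
        exec(source, namespace)
        cache[key] = {name: namespace[name] for name in outputs}
    # On a cache hit, restore the remembered outputs instead of recomputing.
    namespace.update(cache[key])
    return cache[key]


calls = []


def slow(value):
    calls.append(value)  # record each real execution
    return value


namespace = {"slow": slow, "base": 0}
cache = {}
cell = "ans = slow(base) + 7"

run_memoized(cell, namespace, ["base"], ["ans"], cache)  # executes the cell
run_memoized(cell, namespace, ["base"], ["ans"], cache)  # cache hit, skipped
namespace["base"] = 1
run_memoized(cell, namespace, ["base"], ["ans"], cache)  # input changed, executes again

print(len(calls), namespace["ans"])  # 2 8
```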

# How does all this work?

- Traditional AST transformers: hard to register multiple because they do not *compose*

- These examples all use a library called *pyccolo* (https://github.com/smacke/pyccolo)

- pyccolo allows for composable AST transformations
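
One way to see the composition problem with the standard library alone: the hypothetical `ast.NodeTransformer` below adds 1 to the right operand of every binary operation. Applied once it behaves as expected, but a second pass also rewrites the nodes that the first pass inserted. This is a sketch of one failure mode, not a complete account of why transformers interfere with each other:

```python
import ast


class IncrementRightOperand(ast.NodeTransformer):
    """Rewrite `a <op> b` into `a <op> (b + 1)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested operations first
        node.right = ast.BinOp(
            left=node.right, op=ast.Add(), right=ast.Constant(value=1)
        )
        return node


def evaluate(source, passes):
    """Apply the transformer `passes` times, then evaluate the expression."""
    tree = ast.parse(source, mode="eval")
    for _ in range(passes):
        tree = IncrementRightOperand().visit(tree)
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<expr>", "eval"))


print(evaluate("2 + 2", passes=0))  # 4
print(evaluate("2 + 2", passes=1))  # 5, i.e. 2 + (2 + 1)
print(evaluate("2 + 2", passes=2))  # 7, not the 6 that "increment twice" suggests
```

The second pass sees the `2 + 1` node created by the first pass and increments its right operand as well, so stacking two independent transformers does not simply add their effects.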

In [None]:
import ast
import pyccolo as pyc

class IncrementOneTracer(pyc.BaseTracer):
    @pyc.register_handler(pyc.right_binop_arg)
    def right_arg(self, ret, *_, **__):
        return ret + 1

In [None]:
%flow register_tracer IncrementOneTracer

In [None]:
2 + 2

In [None]:
class IncrementTwoTracer(pyc.BaseTracer):
    @pyc.register_handler(pyc.right_binop_arg)
    def right_arg(self, ret, *_, **__):
        return ret + 2

In [None]:
%flow register_tracer IncrementTwoTracer

In [None]:
2 + 2

# Conclusion

- Python is surprisingly hackable

- Its hackability enables all kinds of interesting interactive use cases

- Play around with these examples yourself at https://github.com/ipyflow/pydata-global-2023 or https://github.com/smacke/pyccolo/tree/master/pyccolo/examples

# Q & A

## stephen.macke@gmail.com

## https://smacke.net

## https://github.com/ipyflow/ipyflow