In [None]:
import param
import panel as pn

pn.extension('tabulator')

# HoloViz Dev Deep Dive (.rx)

The addition of `.rx` in Param has made certain operations a lot easier, particularly when working with data pipelines, but to many users reactive expressions are still quite mysterious. In this deep dive we will discover why they were added, how they fit into the HoloViz reactivity model, and then unpack how they work internally.

## What are the benefits of reactivity?

1. It is **data-driven**, i.e. the UI updates automatically in response to changes in the data model. In Excel the data model is represented by the value of a cell, while React uses state hooks. In Panel the data model is defined by "Parameters". Each component and user class defines some parameters, and changes in the parameter values drive changes in the UI. In all these frameworks, automatic updates are achieved by what we will call **data binding**. How precisely this works will be the primary focus of this section.

2. It is **declarative**, which means that the user defines what should be rendered, while the reactive framework figures out how to efficiently make the required updates. In Excel this process is relatively simple; Excel simply has to re-evaluate any formulas that depend on the changed inputs, and then update those cells' output. In React this relies on the virtual DOM: the document model is "diffed" and React then works out the most efficient updates to reflect the latest changes. In Panel the update process works either by defining a function whose output is diffed, or by binding references to a declarative component (if you don't know what that means yet, don't worry, that's what this guide is for!).

This data-driven and declarative nature allows succinct expression of how a change in some input value should be reflected in the output.

## How is Reactivity implemented in Param?

Reactivity in Param is achieved through so-called "references". A reference binds some state uni-directionally to a parameter on another object, e.g. we can bind the `visible` parameter of one widget to the `value` of another widget:

In [None]:
visible = pn.widgets.Checkbox(name='visible', value=True)

pn.Row(visible, pn.widgets.FloatSlider(visible=visible))

This form of reactivity is concise, readable and easy to understand, but it also has a clear limitation. What happens if there isn't a straightforward 1-to-1 mapping between the input parameter and the parameter we are binding to?

In the past we would have recommended bound functions for this, i.e. you would bind the input parameter to a function that transforms the value, and then bind that function to the target parameter:

In [None]:
text = pn.widgets.TextInput(value='world')

def format(text):
    return f'Hello {text}!'

pn.Column(
    text,
    pn.pane.Markdown(object=pn.bind(format, text))
)

This retains most of the benefits of references, but functions have a few drawbacks:

- **Verbosity**: Functions add boilerplate which obscures the purpose of what we are doing.
- **Locality**: The function definition may not be co-located with the code that calls it, making it more difficult to trace what's happening.

### How is it implemented?

Reactive references in Param build on lower level primitives. Internally in Param each `Parameter.__set__` call tries to detect if the provided value is a real value or if it's a reference, but only if `allow_refs=True` is enabled.

In [None]:
class ParamRefExample(param.Parameterized):

    a = param.Integer(default=0, allow_refs=True)

    b = param.Integer(default=0, allow_refs=False)

So we can bind a reference to a parameter:

In [None]:
int_input = pn.widgets.IntInput(value=5)

ref_example = ParamRefExample(a=int_input)

ref_example

and have it reflect the changes in the parameter value:

In [None]:
int_input.value = 10

ref_example

but cannot bind it if it's not allowed:

In [None]:
with param.exceptions_summarized():    
    ParamRefExample(b=int_input)

Internally this is implemented through a number of helper functions and a dictionary of references:

`resolve_ref` resolves the references given some value, i.e. it returns the parameters that the value depends on:

In [None]:
param.parameterized.resolve_ref(int_input)

`resolve_value` resolves the current value of the reference:

In [None]:
param.parameterized.resolve_value(int_input)

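And references are stored in the `refs` dictionary on the `_param__private` namespace object (a private API that may change between Param versions):

In [None]:
print(ref_example._param__private.refs)
ref_example.a = 2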

## Reactive Expressions

Until now we have covered parameter references and function references and highlighted the drawbacks of each. `.rx` is designed as another tool in your toolbox to succinctly express data transformations.

In other languages and frameworks the concept of bound reactive functions and expressions is generally referred to as "derived state" or "computed properties". The main benefits of this approach include:

- Automatically recalculates values only when dependencies change, reducing unnecessary recomputations.
- Clearly defines the relationship between state and derived values, making the code easier to read and understand.
- Guarantees that computed values are always in sync with their dependencies, reducing potential bugs.
- Optimized caching prevents redundant calculations (only applicable for `.rx`).

In Python, unlike in JS-based reactive frameworks, operator overloading makes a reactive expression language like the one `.rx` implements especially appealing. Standard operations such as basic arithmetic, boolean logic, and operations like `__getitem__` can be transparently wrapped to allow a reactive value to behave just like the underlying value that is wrapped.

### Constructing reactive expressions

Reactive expressions in Param can be constructed around a constant value or derived from an existing reference, i.e. we can wrap an existing value like an int:

In [None]:
rx_int = pn.rx(1)

rx_int

and then update it manually:

In [None]:
rx_int.rx.value = 2

Or we can create one from an existing reference:

In [None]:
int_input = pn.widgets.IntInput(value=5)

rx_int_input = pn.rx(int_input)

rx_int_input

Since `rx` expressions are only uni-directional, we cannot modify the underlying value of a derived expression:

In [None]:
with param.exceptions_summarized():    
    rx_int_input.rx.value = 2

#### How do reactive expressions store the input?

Internally reactive expressions use a mutable type to store the current input value:

In [None]:
print(rx_int._shared_obj)

rx_int.rx.value = 3

print(rx_int._shared_obj)

and use a `Wrapper` class to set up the signaling when the value is updated:

In [None]:
rx_int._wrapper

If the expression is derived from another reference however, it is wrapped in a bound function:

In [None]:
print(rx_int_input._fn)

### Chaining

Now we've arrived at the interesting part: how does the chaining of operations work? For operations that work via operator overloading the approach is quite simple:

In [None]:
rx_int_add = rx_int + 1

rx_int_add

Internally this will call the `__add__` dunder method, and then it's just a matter of recording the operation:

In [None]:
rx_int_add._operation

and recording the input expression:

In [None]:
print(rx_int_add._prev)
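
To make the recording idea concrete, here is a toy model in plain Python. It is not Param's implementation: the `ToyExpr` class and its `resolve` method are invented for illustration, and only the attribute names `_operation` and `_prev` mirror the ones inspected above. Each overloaded operator returns a new node that remembers the operation and the expression it was applied to, and resolving a node replays the chain from the root:

```python
import operator


class ToyExpr:
    """Toy stand-in for a reactive expression (not Param's implementation)."""

    def __init__(self, value=None, operation=None, prev=None):
        self._value = value          # only set on the root node
        self._operation = operation  # dict describing the recorded step
        self._prev = prev            # the expression this step applies to

    def _record(self, fn, *args):
        # Return a new node instead of computing anything
        return ToyExpr(operation={"fn": fn, "args": args}, prev=self)

    def __add__(self, other):
        return self._record(operator.add, other)

    def __mul__(self, other):
        return self._record(operator.mul, other)

    def resolve(self):
        # Walk back to the root, then replay each recorded operation
        if self._prev is None:
            return self._value
        op = self._operation
        return op["fn"](self._prev.resolve(), *op["args"])


root = ToyExpr(1)
expr = (root + 1) * 10
print(expr._operation["fn"].__name__)  # -> mul (the last recorded step)
print(expr.resolve())                  # -> 20

root._value = 4
print(expr.resolve())                  # -> 50
```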

In [None]:
text = pn.widgets.TextInput(value='world')

pn.Column(
    text,
    pn.pane.Markdown(object=pn.rx("Hello {}!").format(text))
)

We can now build a graph capturing the full expression and should any of the input references change we can re-run the expression to obtain the result. Before we unpack that further, let's dig into how this works for arbitrary method calls.

In order for tab-completion to work we also implement `__dir__`. This makes it behave like the underlying wrapped object.

Additionally the `__getattr__` implementation will check if a method is present on the underlying object and record it. A method call can be broken down into two steps:

1. Accessing the method records the current method:

In [None]:
method_access = rx_int_add.to_bytes

method_access._method

2. Calling the method records it as an operation:

In [None]:
rx_bytes = method_access()

print(rx_bytes._operation)

rx_bytes
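
The two steps can be mimicked with a small toy class. Again this is only a sketch under our own assumptions, not how Param does it: `ToyMethodExpr` and `resolve` are hypothetical names, and the real implementation handles many more cases. `__getattr__` records which method was accessed, and `__call__` turns that pending access into a recorded operation:

```python
class ToyMethodExpr:
    """Toy model of two-step method recording (not Param's implementation)."""

    def __init__(self, value=None, prev=None, method=None, operation=None):
        self._value = value
        self._prev = prev
        self._method = method        # set after attribute access
        self._operation = operation  # set after the call

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. methods of the
        # wrapped object. Step 1: remember which method was accessed.
        if name.startswith("_"):
            raise AttributeError(name)
        return ToyMethodExpr(prev=self, method=name)

    def __call__(self, *args, **kwargs):
        # Step 2: turn the pending method access into a recorded operation.
        if self._method is None:
            raise TypeError("expression is not a pending method access")
        op = {"fn": self._method, "args": args, "kwargs": kwargs}
        return ToyMethodExpr(prev=self._prev, operation=op)

    def resolve(self):
        # Replay the recorded method call on the resolved input
        if self._prev is None:
            return self._value
        target = self._prev.resolve()
        op = self._operation
        return getattr(target, op["fn"])(*op["args"], **op["kwargs"])


text = ToyMethodExpr("hello world")

access = text.upper
print(access._method)    # -> upper

shout = access()
print(shout._operation)
print(shout.resolve())   # -> HELLO WORLD
```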

## Updates

So how do the updates work? What happens if I now update the input value:

In [None]:
rx_int.rx.value = 7

By default the answer is: not much. However, because we are in an interactive environment, the answer is quite a lot. Let's therefore create a non-interactive `rx` variable:

In [None]:
rx_input = pn.rx(5)

rx_result = rx_input + 5

rx_input.rx.value = 5

Accessing `rx_result` we now expect a value of 10, but in fact the addition hasn't happened yet. Instead, all that setting a new input value did was set a flag on `rx_result` marking it as dirty:

In [None]:
rx_result._dirty

Only when something now tries to resolve the value, e.g. by accessing `rx_result.rx.value`, does the actual computation get triggered:

In [None]:
rx_result.rx.value

As we can see the `rx_result` is now no longer dirty:

In [None]:
rx_result._dirty

In this way the expression is kept up-to-date but also stays lazy, avoiding unnecessary calculations until they are actually needed.
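
The dirty-flag pattern itself is easy to sketch in plain Python. The `LazyNode` class below is a hypothetical toy, not Param's code; it only shares the `_dirty` attribute name with the real thing. Setting the input merely marks dependents as stale, and the computation runs once when the value is next requested:

```python
class LazyNode:
    """Toy lazy node with a dirty flag (not Param's implementation)."""

    def __init__(self, fn=None, parent=None, value=None):
        self._fn = fn
        self._parent = parent
        self._children = []
        self._current = value
        self._dirty = fn is not None  # derived nodes start out uncomputed
        self.evaluations = 0
        if parent is not None:
            parent._children.append(self)

    def set(self, value):
        # Updating the input only marks dependents as stale
        self._current = value
        self._invalidate_children()

    def _invalidate_children(self):
        for child in self._children:
            child._dirty = True
            child._invalidate_children()

    @property
    def value(self):
        # Recompute on demand, and only if something upstream changed
        if self._dirty:
            self._current = self._fn(self._parent.value)
            self.evaluations += 1
            self._dirty = False
        return self._current


source = LazyNode(value=5)
result = LazyNode(fn=lambda x: x + 5, parent=source)

source.set(7)
source.set(8)
print(result._dirty, result.evaluations)  # -> True 0 (nothing computed yet)

print(result.value)                       # -> 13
print(result.value)                       # -> 13 (served from the cache)
print(result._dirty, result.evaluations)  # -> False 1
```

Two updates to the input cost a single evaluation, because nothing asked for the result in between.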

However, as soon as the reference is bound to something it is no longer lazy and it will update immediately:

In [None]:
pn.indicators.Number(value=rx_result)

In [None]:
rx_input.rx.value = 7

## When (not) to use .rx expressions?

Now that we understand a lot of the internals of `rx` it's time to discuss when to use them, and when not to use them.

### When to use them:

- Simple, inline transformations: This is where they shine. You have a boolean value and just need to negate it before binding it as a reference? Perfect and simple.
- Data Transformation: Pandas and other data libraries are arguably best used with method-chaining approaches and `rx` is a natural extension of that. Need to filter some data, multiply some columns and then compute the mean? Again, perfect, simple and clean, and you can reuse the result thanks to the automatic caching.
- Branching workflows: Say you have a dataset, want to transform or filter it dynamically, and then compute a bunch of different statistics from the same subset of the data. Thanks to caching of the intermediate values it is simple, clear and efficient to compute the derived statistics.

### Examples

#### Inline Transforms

In [None]:
slider = pn.widgets.IntSlider(start=0, end=11)

pn.Column(
    slider,
    pn.pane.Alert(
        "Warning: Are you sure you want to turn this up to eleven?",
        alert_type='warning',
        visible=slider.rx()>10
    )
)

#### Data Transforms

In [None]:
import pandas as pd

df = pd.read_parquet("/Users/philippjfr/development/lumen/windturbines.parquet")

rx_df = pn.rx(df)

states = pn.widgets.MultiChoice(options=list(df.t_state.unique()))
filtered = rx_df[rx_df.t_state.isin(states)]
capacity = filtered.t_cap.sum()
count = filtered.rx.len()

pn.Row(
    states,
    pn.indicators.Number(value=count, name='Count'),
    pn.indicators.Number(value=capacity / 1000, name="Capacity (MW)")  # assumes t_cap is given in kW
)

#### Branching

### When not to use them?

- Complex conditionals: One of the biggest drawbacks of `.rx` expressions is that you have to re-learn a lot of the programming patterns. Helpers like `.rx.when` or `.rx.where` can help with simple conditional cases but as soon as you have some complex chain of these it becomes very difficult to reason about.
- Large Data: One of the biggest strengths of expressions is that they automatically perform caching, particularly useful for branching workflows, but for large data this can start working against you. Since intermediate results are cached, and Pandas is not always transparent about when it makes a copy of your data, you can end up with multiple copies of your large dataset in memory.