<a href="https://colab.research.google.com/github/pathwaycom/pathway-examples/blob/main/showcases/live-data-jupyter.ipynb" target="_parent"><img src="https://pathway.com/assets/colab-badge.svg" alt="Run In Colab" class="inline"/></a>

# Installing Pathway with Python 3.10+

In the cell below, we install Pathway into a Python 3.10+ Linux runtime.

> **If you are running in Google Colab, please run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning.
> 
> **The installation and loading time is less than 1 minute**.


In [None]:
%%capture --no-display
!pip install pathway

# Transform live data streams in Jupyter

In this notebook we implement a very simple trading algorithm that uses [Bollinger Bands](https://en.wikipedia.org/wiki/Bollinger_Bands). Concretely, we compute the 1-minute running mean of the Volume Weighted Average Price (`vwap`) and, over 20 minutes, the mean price together with its volatility, the Volume Weighted Standard Deviation (`vwstd`). The two bands sit two standard deviations above and below the 20-minute mean, and most price movements stay between them. Intuitively, when the price rises above the upper band, it is abnormally high and is likely to drop: a good moment to _SELL_. Likewise, when the price falls below the lower band, it is abnormally low and is likely to grow back toward the mean: a good moment to _BUY_. For further reliability, BUY/SELL actions are taken only when there was a significant volume of trades, indicating that the outlying price is not a one-off event.

Let's jump right into the code now!

![image](https://github.com/pathwaycom/pathway-examples/blob/c20cd69a6c9c87fc70a9082de57666c50f2ab3c2/documentation/from_jupyter_to_deploy/jupyter-demo-final-smallest-compressed.gif?raw=true)

## Imports and setup

First we import the necessary libraries:

- [`pathway`](https://pathway.com/developers/user-guide/welcome/) for data processing,
- `datetime` for date manipulation,
- [`bokeh`](https://bokeh.org/) for interactive plotting,
- [`panel`](https://panel.holoviz.org/) for dashboard layout.

In [1]:
import datetime

import bokeh.models
import bokeh.plotting
import panel

import pathway as pw

In [2]:
!wget -nc https://gist.githubusercontent.com/janchorowski/e351af72ecd8d206a34763a428826ab7/raw/ticker.csv

File ‘ticker.csv’ already there; not retrieving.



## Data source setup

Create a streaming data source that replays the data from a `csv` file. The `input_rate` parameter controls how fast the data is replayed, as a number of rows per second.

No data processing happens at this point: we are only building a computational graph, which is executed when `pw.run()` is called at the end of the notebook.
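To make the role of `input_rate` concrete, here is a plain-Python sketch of a throttled CSV replay. It assumes `input_rate` means rows per second and only illustrates the pacing; it is not how `pw.demo.replay_csv` is implemented. The two sample rows are made up and merely reuse the column names (`ticker`, `t`, `volume`, `vwap`) that this notebook's code refers to.

```python
import csv
import io
import time
from typing import Iterator


def replay_rows(csv_text: str, input_rate: float) -> Iterator[dict]:
    """Yield CSV rows one at a time, pausing 1 / input_rate seconds after each row,
    so that roughly `input_rate` rows are emitted per second."""
    interval = 1.0 / input_rate
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield row
        time.sleep(interval)  # throttle before producing the next row


# Hypothetical sample rows using the column names from this notebook
sample = (
    "ticker,t,volume,vwap\n"
    "AAPL,1700000000000,1200,189.9\n"
    "AAPL,1700000060000,800,190.1\n"
)

start = time.monotonic()
rows = list(replay_rows(sample, input_rate=20))  # 20 rows/s, i.e. 0.05 s per row
elapsed = time.monotonic() - start
print(len(rows), rows[0]["ticker"], round(elapsed, 2))
```

Raising `input_rate` shortens the pause between rows, which is why a large value such as the `1000` used below replays the file quickly.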

In [3]:
fname = "ticker.csv"
schema = pw.schema_from_csv(fname)
data = pw.demo.replay_csv(fname, schema=schema, input_rate=1000)

# # For static data exploration use
# data = pw.io.csv.read(fname, schema=schema, mode="static")

# Parse the timestamps
data = data.with_columns(
    t=pw.apply_with_type(
        datetime.datetime.fromtimestamp, pw.DATE_TIME_NAIVE, data.t / 1000.0
    )
)
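The division by `1000.0` above implies that `t` holds Unix timestamps in milliseconds. One subtlety: `datetime.datetime.fromtimestamp` without a `tz` argument returns a naive datetime in the local time zone of the machine running the code, so the parsed times can differ between machines. A minimal standard-library sketch of the difference, using a hypothetical timestamp:

```python
import datetime

ms = 1_700_000_000_000  # hypothetical timestamp in milliseconds

# Naive datetime expressed in the local time zone of whoever runs the code
local_naive = datetime.datetime.fromtimestamp(ms / 1000.0)

# Naive datetime expressed in UTC: convert with an explicit tz, then drop the tzinfo
utc_naive = datetime.datetime.fromtimestamp(
    ms / 1000.0, tz=datetime.timezone.utc
).replace(tzinfo=None)

print(utc_naive)  # 2023-11-14 22:13:20
```

If the results need to be identical on every machine, swapping in the UTC variant is one option.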

## 20-minute rolling statistics

We use a [`sliding window`](https://pathway.com/developers/user-guide/guides/windowby-reduce-manual/#temporal-sliding-windowing) to compute, every minute, the volume-weighted mean and standard deviation of the price over the past 20 minutes of data, separately for each ticker. The `behavior` option tells Pathway to emit a window's statistics only once the window is finished, since we do not want to see incomplete results.
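To see what `hop` and `duration` mean, the following plain-Python sketch lists every sliding window that contains a given event time. It assumes window starts are aligned to multiples of `hop` from a fixed origin; the origin here is a hypothetical choice, and Pathway's own alignment rules may differ. With a 1-minute `hop` and a 20-minute `duration`, each event falls into 20 overlapping windows.

```python
import datetime

HOP = datetime.timedelta(minutes=1)
DURATION = datetime.timedelta(minutes=20)
ORIGIN = datetime.datetime(2023, 11, 14)  # hypothetical alignment point for window starts


def sliding_windows(t: datetime.datetime) -> list[tuple[datetime.datetime, datetime.datetime]]:
    """Return (start, end) of every half-open window [start, start + DURATION)
    containing t, with window starts at ORIGIN + k * HOP for integer k."""
    k = (t - ORIGIN) // HOP  # index of the latest window start <= t
    windows = []
    while ORIGIN + k * HOP + DURATION > t:  # this window still covers t
        start = ORIGIN + k * HOP
        windows.append((start, start + DURATION))
        k -= 1
    return windows[::-1]  # oldest window first


event = datetime.datetime(2023, 11, 14, 10, 30, 15)
windows = sliding_windows(event)
print(len(windows), windows[0], windows[-1])
```

Setting `DURATION = HOP` turns this into a tumbling window, where each event lands in exactly one window. Because a window only has all its data once time has moved past its end, a window's statistics are complete only 20 minutes after it starts, which is consistent with the `delay` in the cell below being set to the full window duration.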

To compute the standard deviation we use the identity

$$
\sigma(X) = \sqrt{\operatorname E\left[(X - \operatorname E[X])^2\right]} = \sqrt{\operatorname E\left[X^2\right] - (\operatorname E[X])^2},
$$

which is easily expressible using [Pathway reducers](https://pathway.com/developers/api-docs/reducers/), with expectations taken as volume-weighted averages: we first compute the total $\mathrm{volume}$ and the volume-weighted sums of $\mathrm{price}$ and $\mathrm{price}^2$. We then postprocess them to obtain the mean ($\mathrm{vwap}$), the standard deviation ($\mathrm{vwstd}$), and the Bollinger Bands placed at $\mathrm{vwap} \pm 2\cdot \mathrm{vwstd}$.
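A quick numeric check of the identity in plain Python, with made-up volumes and prices: the three sums mirror the `volume`, `transact_total`, and `transact_total2` reducers in the cell below, and the one-pass result is compared with the direct volume-weighted definition.

```python
import math

# Hypothetical per-row data: traded volume and price (vwap) of each row
volumes = [100.0, 300.0, 50.0, 550.0]
prices = [189.5, 190.0, 189.8, 190.4]

# The three sums computed by the reducers
volume = sum(volumes)
transact_total = sum(v * p for v, p in zip(volumes, prices))
transact_total2 = sum(v * p**2 for v, p in zip(volumes, prices))

# Post-processing, as in the with_columns(...) steps: sqrt(E[X^2] - E[X]^2)
vwap = transact_total / volume
vwstd = math.sqrt(transact_total2 / volume - vwap**2)

# Direct two-pass definition: sqrt of the weighted mean squared deviation
vwstd_direct = math.sqrt(sum(v * (p - vwap) ** 2 for v, p in zip(volumes, prices)) / volume)

bollinger_upper = vwap + 2 * vwstd
bollinger_lower = vwap - 2 * vwstd
print(vwap, vwstd, vwstd_direct)
```

One caveat of the one-pass formula: when all prices in a window are (nearly) equal, floating-point cancellation can make $\operatorname E[X^2] - (\operatorname E[X])^2$ come out slightly negative, which breaks the square root. Clamping the variance at zero before taking the root is one possible safeguard.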

In [4]:
minute_20_stats = (
    data.windowby(
        pw.this.t,
        window=pw.temporal.sliding(
            hop=datetime.timedelta(minutes=1), duration=datetime.timedelta(minutes=20)
        ),
        behavior=pw.temporal.common_behavior(delay=datetime.timedelta(minutes=20)),
        instance=pw.this.ticker,
    )
    .reduce(
        ticker=pw.this._pw_instance,
        t=pw.this._pw_window_end,
        volume=pw.reducers.sum(pw.this.volume),
        transact_total=pw.reducers.sum(pw.this.volume * pw.this.vwap),
        transact_total2=pw.reducers.sum(pw.this.volume * pw.this.vwap**2),
    )
    .with_columns(vwap=pw.this.transact_total / pw.this.volume)
    .with_columns(
        vwstd=(pw.this.transact_total2 / pw.this.volume - pw.this.vwap**2) ** 0.5
    )
    .with_columns(
        bollinger_upper=pw.this.vwap + 2 * pw.this.vwstd,
        bollinger_lower=pw.this.vwap - 2 * pw.this.vwstd,
    )
)

## 1-minute rolling statistics

We now compute the volume-weighted mean price over the past minute of trades. The code is analogous to the 20-minute statistics, but simpler: we can use a non-overlapping `tumbling window`, and we do not need the standard deviation.
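For comparison, here is what the 1-minute tumbling aggregation computes, written as a plain-Python sketch over hypothetical `(ticker, t, volume, vwap)` tuples. Windows are keyed by their end time, mirroring `_pw_window_end` in the cell below, and are assumed to be aligned to whole minutes.

```python
import datetime
from collections import defaultdict

MINUTE = datetime.timedelta(minutes=1)
EPOCH = datetime.datetime(1970, 1, 1)  # window starts assumed aligned to whole minutes


def tumbling_vwap(rows):
    """rows: iterable of (ticker, t, volume, vwap) tuples.
    Returns {(ticker, window_end): (volume, vwap)} over 1-minute tumbling windows."""
    sums = defaultdict(lambda: [0.0, 0.0])  # key -> [total volume, sum of volume * price]
    for ticker, t, volume, price in rows:
        # each row lands in exactly one window
        window_start = EPOCH + ((t - EPOCH) // MINUTE) * MINUTE
        acc = sums[(ticker, window_start + MINUTE)]
        acc[0] += volume
        acc[1] += volume * price
    return {key: (vol, total / vol) for key, (vol, total) in sums.items()}


rows = [
    ("AAPL", datetime.datetime(2023, 11, 14, 10, 0, 10), 100.0, 190.0),
    ("AAPL", datetime.datetime(2023, 11, 14, 10, 0, 50), 300.0, 190.4),
    ("AAPL", datetime.datetime(2023, 11, 14, 10, 1, 5), 200.0, 189.9),
]
stats = tumbling_vwap(rows)
print(stats)
```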

In [5]:
minute_1_stats = (
    data.windowby(
        pw.this.t,
        window=pw.temporal.tumbling(datetime.timedelta(minutes=1)),
        instance=pw.this.ticker,
    )
    .reduce(
        ticker=pw.this._pw_instance,
        t=pw.this._pw_window_end,
        volume=pw.reducers.sum(pw.this.volume),
        transact_total=pw.reducers.sum(pw.this.volume * pw.this.vwap),
    )
    .with_columns(vwap=pw.this.transact_total / pw.this.volume)
)

## Joining the statistics

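We now join the 20-minute and 1-minute statistics on `ticker` and window end time `t`, gathering all the information needed for alerting in one place. Alert triggering is now a breeze: an alert fires when the 1-minute `volume` exceeds 10,000 and the 1-minute `vwap` lies outside the Bollinger Bands. The action is "sell" above the upper band, "buy" below the lower band, and "hodl" otherwise.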
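The join and alert logic of the next cell can be sketched in plain Python with dictionaries keyed by `(ticker, t)`: only keys present in both inputs survive (an inner join, like the `join` below), and the action rule mirrors the `is_alert` and `action` columns. The function name `decide`, its `min_volume` parameter, and the integer stand-ins for timestamps are illustrative only.

```python
def decide(minute_1, minute_20, min_volume=10_000):
    """minute_1:  {(ticker, t): (volume, vwap)} from the 1-minute windows.
    minute_20: {(ticker, t): (bollinger_lower, bollinger_upper)} from the 20-minute windows.
    Returns {(ticker, t): action} for keys present in both inputs (an inner join)."""
    actions = {}
    for key in minute_1.keys() & minute_20.keys():  # join on (ticker, t)
        volume, vwap = minute_1[key]
        lower, upper = minute_20[key]
        is_alert = volume > min_volume and (vwap > upper or vwap < lower)
        if not is_alert:
            actions[key] = "hodl"
        elif vwap > upper:
            actions[key] = "sell"  # price abnormally high
        else:
            actions[key] = "buy"  # price abnormally low
    return actions


bands = (189.5, 190.5)
minute_1 = {
    ("AAPL", 1): (12_000, 191.0),  # above the upper band, high volume
    ("AAPL", 2): (12_000, 189.0),  # below the lower band, high volume
    ("AAPL", 3): (500, 191.0),     # above the band but low volume
    ("AAPL", 4): (12_000, 190.0),  # inside the bands
}
minute_20 = {("AAPL", t): bands for t in range(1, 6)}  # t = 5 has no 1-minute match
print(sorted(decide(minute_1, minute_20).items()))
```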

In [6]:
joint_stats = (
    minute_1_stats.join(
        minute_20_stats, pw.left.t == pw.right.t, pw.left.ticker == pw.right.ticker
    )
    .select(
        *pw.left,
        bollinger_lower=pw.right.bollinger_lower,
        bollinger_upper=pw.right.bollinger_upper,
    )
    .with_columns(
        is_alert=(pw.this.volume > 10000)
        & (
            (pw.this.vwap > pw.this.bollinger_upper)
            | (pw.this.vwap < pw.this.bollinger_lower)
        )
    )
    .with_columns(
        action=pw.if_else(
            pw.this.is_alert,
            pw.if_else(pw.this.vwap > pw.this.bollinger_upper, "sell", "buy"),
            "hodl",
        )
    )
)
alerts = joint_stats.filter(pw.this.is_alert)

## Dashboard creation

We now create a `Bokeh` plot and a `Panel` table visualization. The plot shows the Bollinger Bands along with the 1-minute running mean of the price, and marks the prices at which buy and sell decisions were made. The table gathers all the decisions in a form convenient for further processing, such as reducing it to a historical evaluation of the strategy's gains.

When the cell is executed, we get placeholder containers for the plot and the table. They will be populated with live data once the computation is started.
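As an illustration of the post-processing the alerts table enables, here is a plain-Python sketch of a naive historical evaluation. Its assumptions are deliberately simple and hypothetical: one unit is traded per alert, short positions are allowed, fees and slippage are ignored, and any open position is valued at the last alert price. The function `evaluate` and the sample alerts are made up for illustration.

```python
def evaluate(alerts):
    """alerts: list of (t, vwap, action) tuples sorted by t, action in {"buy", "sell"}.
    Returns (cash, position, value), where value marks the open position to the last price."""
    cash = 0.0
    position = 0
    for _, price, action in alerts:
        if action == "buy":
            cash -= price  # pay for one unit
            position += 1
        elif action == "sell":
            cash += price  # receive payment for one unit
            position -= 1
    last_price = alerts[-1][1] if alerts else 0.0
    return cash, position, cash + position * last_price


sample_alerts = [(1, 189.0, "buy"), (5, 190.5, "sell"), (9, 189.2, "buy")]
cash, position, value = evaluate(sample_alerts)
print(cash, position, value)
```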

In [7]:
def stats_plotter(src):
    actions = ["buy", "sell", "hodl"]
    color_map = bokeh.models.CategoricalColorMapper(
        factors=actions, palette=("#00ff00", "#ff0000", "#00000000")
    )

    fig = bokeh.plotting.figure(
        height=400,
        width=600,
        title="20 minutes Bollinger bands with last 1 minute average",
        x_axis_type="datetime",
        y_range=(188.5, 191),
    )
    fig.line("t", "vwap", source=src)
    band = bokeh.models.Band(
        base="t",
        lower="bollinger_lower",
        upper="bollinger_upper",
        source=src,
        fill_alpha=0.3,
        fill_color="gray",
        line_color="black",
    )

    fig.scatter(
        "t",
        "vwap",
        color={"field": "action", "transform": color_map},
        size=10,
        marker="circle",
        source=src,
    )

    fig.add_layout(band)
    return fig


viz = panel.Row(
    joint_stats.plot(stats_plotter, sorting_col="t"),
    alerts.select(pw.this.ticker, pw.this.t, pw.this.vwap, pw.this.action).show(
        include_id=False, sorters=[{"field": "t", "dir": "desc"}]
    ),
)
viz

## Running the computation

Finally, we start the Pathway data processing engine. Watch how the dashboard updates in real time! The basic Bollinger Bands trigger seems to work: the green buy markers are frequently followed by red sell markers at a slightly higher price.

While the computation is running, `pathway` prints important statistics such as message processing latency.

A successful run should produce the following animation:
![image](https://github.com/pathwaycom/pathway-examples/blob/c20cd69a6c9c87fc70a9082de57666c50f2ab3c2/documentation/from_jupyter_to_deploy/jupyter-demo-final-smallest-compressed.gif?raw=true)

In [8]:
pw.run()

# What else can you do with Pathway?

* Perform machine learning in real time, e.g. [real-time classification](https://pathway.com/developers/showcases/lsh/lsh_chapter1/) and [real-time fuzzy joins](https://pathway.com/developers/showcases/fuzzy_join/fuzzy_join_chapter2/)

* Transform unstructured data to structured data using [live LLM pipelines](https://github.com/pathwaycom/llm-app)

* Make [joins](https://pathway.com/developers/tutorials/fleet_eta_interval_join/) on time series data simple

And much more: this list only skims the surface of what Pathway can do. Read more in the [developer docs](https://pathway.com/developers/user-guide/welcome/).

We would love for you to try out [Pathway on GitHub](https://github.com/pathwaycom/pathway).