<center>
<a href="https://github.com/kamu-data/kamu-cli">
<img alt="kamu" src="https://raw.githubusercontent.com/kamu-data/kamu-cli/master/docs/readme_files/kamu_logo.png" width=270/>
</a>
</center>

<br/>

<div align="center">
<a href="https://github.com/kamu-data/kamu-cli">Repo</a> | 
<a href="https://docs.kamu.dev/cli/">Docs</a> | 
<a href="https://docs.kamu.dev/cli/learn/learning-materials/">Tutorials</a> | 
<a href="https://docs.kamu.dev/cli/learn/examples/">Examples</a> |
<a href="https://docs.kamu.dev/cli/get-started/faq/">FAQ</a> |
<a href="https://discord.gg/nU6TXRQNXC">Discord</a> |
<a href="https://kamu.dev">Website</a>
</div>

<center>
    
# 1. Working with Web3 data

</center>

# Introduction

No matter if you are Blockchain-cautious, Web3-curious, or already own a mansion in the Metaverse - you'll probably agree that the Distributed Ledgers (Blockchains) are quickly becoming **very large sources of various data**. And we really like data!

Web 2/3 worlds of data, however, are still very fragmented:

* Web2 data sits in thousands of silos, in hundreds of different formats, or hidden behind custom JSON APIs which take weeks to integrate. We are still very far from achieving interoperability of data even in the Web2 world alone.

* Web3 data from public ledgers, despite being freely available, presents a big challenge because of its volume and formats that are unfriendly to data science. Some projects like [The Graph](https://thegraph.com/) and [Dune](https://dune.com/) make it more accessible, but they don't offer much help when you want to combine blockchain data with off-chain sources, and any slightly more advanced use case often requires you to develop your own data ingestion infrastructure.

* Data from off-chain sources is very hard to access from blockchain smart contracts, requiring you to go through expensive and cumbersome oracle networks.

What if we could solve all of these problems with a **single technology**? Make data overall much more easily **accessible** and **interoperable**, and erase the boundary between **off- and on-chain data**.

In this demo we will see how `kamu`'s data pipelines can make this possible, allowing you to:
- Move data from and to blockchains
- Share data via modern decentralized storage systems
- Combine on- and off-chain data within a single query.

<div class="alert alert-block alert-info">

This demo is intended to be standalone, but if at any point you feel lost you might want to revisit the _"Kamu Basics"_ chapter first. You are also very welcome on our [Discord](https://discord.gg/nU6TXRQNXC), or you can create an issue in the [kamu-cli](https://github.com/kamu-data/kamu-cli) GitHub repository to get help.

</div>

### Use Case

But first we need to pick a use case, so why don't we do some **personal finance**?

You like financial planning, don't you?

Neither do we... Being a grown up and having to deal with multiple bank accounts and retirement plans - what can be more boring?

But perhaps you spice things up by holding some cryptocurrency ... except now all your money is **spread over multiple different institutions and wallets** and it's very easy to **lose track** of your overall financial situation.

Most tools that banks and wallet apps offer are already mediocre, but now they are of no use at all as they only show you small parts of the whole picture.

<div class="alert alert-block alert-info">

**Fun fact:** `kamu` started in 2018 as a huge set of scripts that ingested data from multiple bank, retirement, and investment accounts, unified all currencies, and analyzed the performance of investments over time. This pipeline was so painful to maintain that we started to look for a better, fully autonomous solution.

</div>

For this demo let's assume that you **had some Ethereum**. To get more upside while holding it you decided to **"stake" it in the [Rocketpool](https://rocketpool.net/)**.

<div class="alert alert-block alert-warning">
<details>
<summary style="display:list-item"><b>What is Staking?</b></summary>

Staking is when you lock up some of your Ethereum as collateral to become a validator of ledger transactions. Staking pools like _Rocketpool_ allow you to invest any amount of ETH and let other people operate the transaction validator nodes for you while you all share the validation rewards.

</details>
</div>

A few months later you start wondering:
- Was that a good investment?
- How much is it worth now and what is the return?
- How did it perform over time compared to other things you invest in?

These questions are so easy to ask, but **so hard to answer**!

Throughout this demo we will create a personal data pipeline that can not only provide you an answer, but one that can **constantly stay up-to-date**, giving you full awareness of your portfolio's performance.

<div class="alert alert-block alert-danger">

This demo should not be taken as financial advice or a comment on cryptocurrency - we are only interested in the data science aspects of it.

</div>


Our first pipeline will look like this:

![Pipeline diagram](files/pipeline-1.png)

```
┌───────────────────────────────────────┬──────────────┬─────────────────────────────────────────────────┐
│                 Name                  │     Kind     │                  Description                    │
├───────────────────────────────────────┼──────────────┼─────────────────────────────────────────────────┤
│ net.rocketpool.reth.tokens-minted     │     Root     │ rETH contract events (pulled from Ethereum node)│
│ net.rocketpool.reth.tokens-burned     │     Root     │ rETH contract events (pulled from Ethereum node)│
│ net.rocketpool.reth.mint-burn         │  Derivative  │ Combined rETH transactions                      │
│ com.cryptocompare.ohlcv.eth-usd       │     Root     │ ETH to USD exchange rate (pulled from IPFS)     │
│ account.tokens.transfers              │     Root     │ Wallet token transfers (sourced from Etherscan) │
│ account.transactions                  │     Root     │ Wallet transactions (sourced from Etherscan)    │
│ account.tokens.portfolio              │  Derivative  │ Tokens portfolio with book prices & amount held │
│ account.tokens.portfolio.usd          │  Derivative  │ Tokens portfolio with USD book prices           │
│ account.tokens.portfolio.market-value │  Derivative  │ Tokens portfolio market value in ETH and USD    │
└───────────────────────────────────────┴──────────────┴─────────────────────────────────────────────────┘
```

<div class="alert alert-block alert-info">

The final version of the pipeline we're about to build can be found in Kamu Node running alongside this environment:

${KAMU_WEB_UI_URL}kamu/account.tokens.portfolio.usd?tab=lineage

![](files/kamu-node.png)

</div>

# Reading data from Ethereum
When you stake your `ETH` in Rocketpool - the Smart Contract takes your `ETH` and issues you a corresponding amount of `rETH` tokens to represent your stake.

Instead of periodically sending you more `rETH`, your staking rewards are "delivered" by changing the exchange rate between `ETH` and `rETH`, e.g. if you paid `1 ETH` for `1 rETH` in 2021, in 2022 you could sell `1 rETH` for `1.024 ETH` i.e. a 2.4% gain.
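
To make the arithmetic concrete, here is a small sketch using the illustrative figures from the paragraph above. The numbers are examples only, not live rates, and the variable names are ours:

```python
from decimal import Decimal

# Illustrative figures from the text, not live market data
reth_held = Decimal("1")           # rETH received when staking 1 ETH
rate_at_purchase = Decimal("1")    # ETH per rETH at the time of staking
rate_now = Decimal("1.024")        # ETH per rETH a year later

eth_paid = reth_held * rate_at_purchase
eth_redeemable = reth_held * rate_now

# The reward shows up purely as a change in the exchange rate
gain_pct = (eth_redeemable - eth_paid) / eth_paid * 100

print(f"Redeemable: {eth_redeemable} ETH, gain: {gain_pct:.1f}%")
```

Note that the amount of `rETH` held never changes here; only its value in `ETH` does.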

Let's try to visualize these exchange rates.

<div class="alert alert-block alert-success">

First, we initialize our workspace:
    
<p style="background:black">
<code style="background:black;color:white">cd "02 - Web3 Data (Ethereum trading example)"
kamu init
</code>
</p>
</div>

Since Ethereum is an open data source - we can find out the exchange rates by simply looking at all blockchain transactions involving the `rETH` contract and seeing how much people buy and sell it for.

`rETH` follows the [ERC-20 Token Standard](https://ethereum.org/en/developers/docs/standards/tokens/erc-20/), but minting and burning are signalled by events specific to the Rocket Pool contract: when tokens are issued in exchange for `ETH` - the `TokensMinted` event will be present in Ethereum transaction logs, and when `rETH` is exchanged back into `ETH` - we expect the `TokensBurned` event. If we look at the [Rocket Pool smart contract's code](https://etherscan.io/address/0xae78736cd615f374d3085123a210448e74fc6393#code) we will indeed find these events:
```c
event TokensMinted(address indexed to, uint256 amount, uint256 ethAmount, uint256 time)
event TokensBurned(address indexed from, uint256 amount, uint256 ethAmount, uint256 time)
```

Luckily, `kamu` supports multiple data sources, including **blockchain logs**!

To read these events we can define a dataset that looks like this:

```yaml
---
kind: DatasetSnapshot
version: 1
content:
  name: net.rocketpool.reth.tokens-minted
  kind: Root
  metadata:
    - kind: SetPollingSource
      fetch:
        kind: EthereumLogs
        chainId: 1 # Ethereum Mainnet
        signature: |
          TokensMinted(
            address indexed to,
            uint256 amount,
            uint256 ethAmount,
            uint256 time
          )
        # Using contract deployment block to limit scanning
        filter: |
          address = X'ae78736cd615f374d3085123a210448e74fc6393'
          and
          block_number > 13325304
      read:
        kind: Parquet
      preprocess:
        kind: Sql
        engine: datafusion
        query: |
          <some pre-processing with SQL>
      merge:
        kind: Append
```
The `fetch` block above specifies the event signature that `kamu` uses to find and **decode** the event. The optional `filter` block is similar to SQL's `WHERE` and is used to limit the number of blocks a node has to scan to find these events. Combining the two, `kamu` constructs an efficient RPC request and fetches the data from an Ethereum node.
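
As background, Ethereum nodes index logs by the Keccak-256 hash of an event's canonical signature, which keeps only the event name and the parameter types. The sketch below derives that canonical form from the human-readable signature. It is a simplified illustration (it does not handle tuple or nested types) and is not `kamu`'s actual parser; the function name is our own. The hashing step is left out because Keccak-256 is not part of Python's standard library.

```python
import re

def canonical_signature(signature: str) -> str:
    """Reduce a human-readable event signature to `Name(type1,type2,...)`."""
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", signature, re.S)
    if match is None:
        raise ValueError("not an event signature")
    name, params = match.groups()
    # Keep only the type (first word) of each parameter, dropping
    # the `indexed` modifier and the parameter name
    types = [p.split()[0] for p in params.split(",") if p.strip()]
    return f"{name}({','.join(types)})"

sig = """
TokensMinted(
  address indexed to,
  uint256 amount,
  uint256 ethAmount,
  uint256 time
)
"""
print(canonical_signature(sig))
```

For the signature above this prints `TokensMinted(address,uint256,uint256,uint256)`.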

<div class="alert alert-block alert-success">

You can try ingesting this data yourself by adding and pulling our pre-made dataset:
    
<p style="background:black">
<code style="background:black;color:white">kamu add datasets/net.rocketpool.reth.tokens-minted.yaml
kamu pull net.rocketpool.reth.tokens-minted
</code>
</p>
</div>

But there's an even better way...

# Getting datasets from IPFS
The only thing better than doing an easy task is if **someone else did it for you**... and that's the whole point of [Open Data Fabric network](https://docs.kamu.dev/odf/) - doing the data prep work once and sharing results with the whole world.

Pulling data that is already in the network is super easy.

<div class="alert alert-block alert-success">
    
Just run the following command:

<p style="background:black">
<code style="background:black;color:white">kamu pull "ipns://net.rocketpool.reth.tokens-minted.ipns.kamu.dev" --as net.rocketpool.reth.tokens-minted
</code>
</p>
</div>

<div class="alert alert-block alert-warning">
Pulling from IPFS may take a few minutes, so if you'd like to sacrifice the "full decentralized data experience" for speed you can also pull data from Kamu Node:
    
<p style="background:black">
<code style="background:black;color:white">kamu pull ${KAMU_NODE_URL}kamu/net.rocketpool.reth.tokens-minted
</code>
</p>
</div>

Lots of cool things are happening in this one command:
- Beforehand we've [prepared](https://github.com/kamu-data/kamu-cli/blob/master/images/demo/user-home/02%20-%20Web3%20Data%20%28Ethereum%20trading%20example%29/datasets/) the rETH logs datasets for you
- These datasets, [managed by the Kamu Node](https://platform.demo.kamu.dev/kamu/net.rocketpool.reth.mint-burn?tab=lineage), are periodically refreshed to have the latest data
- Datasets are then replicated into [IPFS](https://ipfs.io) - an "Inter-Planetary File System"
- Using DNS everyone can refer to them e.g. as `ipns://net.rocketpool.reth.tokens-minted.ipns.kamu.dev`
- The DNS record resolves into an IPFS hash (`CID`) of the latest version of the dataset
- `kamu` uses the CID to download the entire dataset block-by-block
- Next time you do `kamu pull net.rocketpool.reth.tokens-minted` only the new blocks will be downloaded (a minimal update)
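
The "only new blocks" behavior follows naturally from a chain of content-addressed blocks. Below is a toy model of the idea, assuming each block stores the hash of its predecessor. The block layout, the `make_block` and `sync` helpers, and the in-memory `remote` store are all hypothetical and much simpler than the real dataset format and transfer protocol:

```python
import hashlib
import json

def make_block(prev_hash, payload):
    """Create a block whose hash covers its payload and the previous block's hash."""
    block = {"prev": prev_hash, "payload": payload}
    digest = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return digest, block

# Hypothetical remote store: block hash -> block
remote = {}
head = None
for payload in ["seed", "add data #1", "add data #2"]:
    head, block = make_block(head, payload)
    remote[head] = block

def sync(remote, remote_head, local):
    """Walk back from the remote head, fetching blocks until we reach one we already have."""
    downloaded = []
    cursor = remote_head
    while cursor is not None and cursor not in local:
        local[cursor] = remote[cursor]
        downloaded.append(cursor)
        cursor = local[cursor]["prev"]
    return downloaded

local = {}
first = sync(remote, head, local)     # initial pull fetches the whole chain

head, block = make_block(head, "add data #3")
remote[head] = block
second = sync(remote, head, local)    # next pull fetches only the new block

print(len(first), len(second))
```

The first call downloads three blocks and the second only one, because the walk stops as soon as it reaches a hash that is already stored locally.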

So we are pulling **public ledger** data that is parsed into **analytical data format** by **verifiable code** and stored in a **globally-decentralized file system** as a **near-real-time data stream**.

...Neat!

<div class="alert alert-block alert-info">

Try out the following commands (add `--help` to read what they do):
    
<p style="background:black">
<code style="background:black;color:white">kamu list
kamu tail net.rocketpool.reth.tokens-minted
kamu log net.rocketpool.reth.tokens-minted
kamu inspect schema net.rocketpool.reth.tokens-minted
kamu repo alias list
</code>
</p>
</div>

### Get the rest of data
Now let's quickly get the rest of the `rocketpool` datasets:

<div class="alert alert-block alert-success">
    
Pull the existing datasets from IPFS:
    
<p style="background:black">
<code style="background:black;color:white">kamu pull "ipns://net.rocketpool.reth.tokens-burned.ipns.kamu.dev" --as net.rocketpool.reth.tokens-burned
kamu pull "ipns://net.rocketpool.reth.mint-burn.ipns.kamu.dev" --as net.rocketpool.reth.mint-burn
</code>
</p>
</div>

<div class="alert alert-block alert-warning">
Pulling from IPFS may take a few minutes, so if you'd like to sacrifice the "full decentralized data experience" for speed you can also pull data from Kamu Node:
    
<p style="background:black">
<code style="background:black;color:white">kamu pull ${KAMU_NODE_URL}kamu/net.rocketpool.reth.tokens-burned
kamu pull ${KAMU_NODE_URL}kamu/net.rocketpool.reth.mint-burn
</code>
</p>
</div>

# Visualizing Data
Data is in, now let's visualize it.

If you run `kamu tail net.rocketpool.reth.mint-burn` - you can see that it combines all mint and burn transactions of the rETH token into one dataset.

Using it we can now create the **instantaneous buy/sell exchange rate graph**:

<div class="alert alert-block alert-warning">
<details>
<summary style="display:list-item">Need a quick refresher on using <b>kamu's Jupyter notebooks</b>?</summary>

The Jupyter notebook you're using now runs either on our demo server (https://demo.kamu.dev) or can be launched with the `kamu notebook` command in your own workspace when you have the tool installed.
    
To start working with data:
- Import `kamu` Python library
- Create a connection to the node
- Using `file://` as a URL will start and connect to a local SQL server
- (Optionally) Load Jupyter extension to enable `%%sql` cell magic

The `%%sql` cells will execute queries in `kamu`'s powerful SQL engines and return the results as a Pandas dataframe.
    
</details>
</div>

In [None]:
%load_ext kamu
import kamu

con = kamu.connect("file://")
print("Connected to kamu via", con)

In [None]:
%%sql
select * from 'net.rocketpool.reth.mint-burn' limit 3

In [None]:
%%sql -o reth_pool -q
--## The -o <name> option above downloads the SQL query result into the notebook as a Pandas dataframe
--## The -q flag skips displaying the data

select 
    event_time, 
    case 
        when event_name = 'TokensMinted' then 'Mint'
        when event_name = 'TokensBurned' then 'Burn'
    end as event_name, 
    avg(eth_amount / amount) as rate
from "net.rocketpool.reth.mint-burn"
group by event_time, event_name
order by 1

In [None]:
import pandas as pd
import hvplot.pandas
pd.set_option('max_colwidth', None)

reth_pool.hvplot.step(
    x="event_time", 
    by="event_name", 
    width=900, height=600, 
    legend='top_left', grid=True, 
    title="ETH : rETH Ratio (Minting and Burning)",
)

From this we can tell that Rocketpool so far is fulfilling its promise of steady staking returns.

## Getting ETH to USD exchange rate

While we're at it, let's also use the same mechanism to get the ETH to USD exchange rate:

<div class="alert alert-block alert-success">
    
Pull existing dataset from IPFS:
    
<p style="background:black">
<code style="background:black;color:white">kamu pull "ipns://com.cryptocompare.ohlcv.eth-usd.ipns.kamu.dev" --as com.cryptocompare.ohlcv.eth-usd
</code>
</p>
</div>

<div class="alert alert-block alert-warning">
Pulling from IPFS may take a few minutes, so if you'd like to sacrifice the "full decentralized data experience" for speed you can also pull data from Kamu Node:
    
<p style="background:black">
<code style="background:black;color:white">kamu pull ${KAMU_NODE_URL}kamu/com.cryptocompare.ohlcv.eth-usd
</code>
</p>
</div>

In [None]:
%%sql
select * from "com.cryptocompare.ohlcv.eth-usd"
order by event_time desc 
limit 3

In [None]:
%%sql -o eth2usd -q
select
    event_time,
    open,
    close
from "com.cryptocompare.ohlcv.eth-usd"
order by event_time

In [None]:
eth2usd.hvplot.line(
    x="event_time",
    y="close",
    height=500, 
    width=800,
)

## Ingesting Account Data from Etherscan
Let's get data about our account now.

This dataset will be personalized, so we don't have it prepared. Instead, we will create our own Root datasets using data from the [Etherscan API](https://etherscan.io/) (the free API tier will be enough for our needs).

<div class="alert alert-block alert-success">

Add datasets and pull data:

<p style="background:black">
<code style="background:black;color:white">kamu add datasets/account.tokens.transfers.yaml datasets/account.transactions.yaml
kamu pull account.tokens.transfers account.transactions
</code>
</p>
</div>

The key part of the `account.tokens.transfers` dataset manifest is:
```yaml
kind: DatasetSnapshot
version: 1
content:
  name: account.tokens.transfers
  kind: Root
  metadata:
    - kind: SetPollingSource
      fetch:
        kind: Url
        url: "https://api.etherscan.io/api\
          ?module=account\
          &action=tokentx\
          &address=0xeadb3840596cabf312f2bc88a4bb0b93a4e1ff5f\
          &page=1\
          &offset=1000\
          &startblock=0\
          &endblock=99999999"
      prepare: ...
      read: ...
      preprocess: ...
      merge:
        kind: Ledger
        primaryKey:
          - transaction_hash
```

We are asking Etherscan to return all ERC-20 token transactions involving account `0xeadb3840596cabf312f2bc88a4bb0b93a4e1ff5f` since the beginning of time (`startblock=0`) and merging them with existing data (if any) using the `ledger` [merge strategy](https://docs.kamu.dev/cli/ingest/merge-strategies/).
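
Conceptually, a ledger-style merge deduplicates records by primary key, so polling the same API repeatedly only appends records that have not been seen before. Here is a minimal sketch of that idea; the `ledger_merge` function and the sample records are hypothetical and do not reflect how `kamu` implements the strategy internally:

```python
def ledger_merge(existing, incoming, primary_key):
    """Append only the incoming records whose primary key has not been seen before."""
    seen = {tuple(row[k] for k in primary_key) for row in existing}
    merged = list(existing)
    for row in incoming:
        key = tuple(row[k] for k in primary_key)
        if key not in seen:
            seen.add(key)
            merged.append(row)
    return merged

# Hypothetical records, shaped loosely like token transfer rows
existing = [
    {"transaction_hash": "0xaaa", "value": "100"},
    {"transaction_hash": "0xbbb", "value": "250"},
]
# A later API response repeats an old record and adds a new one
incoming = [
    {"transaction_hash": "0xbbb", "value": "250"},
    {"transaction_hash": "0xccc", "value": "75"},
]

merged = ledger_merge(existing, incoming, ["transaction_hash"])
print([row["transaction_hash"] for row in merged])
```

The repeated `0xbbb` record is skipped, so the result contains `0xaaa`, `0xbbb`, and `0xccc` exactly once each.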

<div class="alert alert-block alert-warning">

Here we are using some **random person's account address** who performed many rETH transactions.
    
We picked it for illustration purposes only, and once you're done with the demo you can get this pipeline and **substitute your own wallet address**!

</div>

In [None]:
%%sql
select * from "account.tokens.transfers"
order by block_number desc
limit 3

In [None]:
%%sql
select
    token_name as 'Token', 
    sum(abs(cast(value as double)) / pow(10, cast(token_decimal as int))) as 'Volume Traded'
from "account.tokens.transfers"
group by 1

<div class="alert alert-block alert-info">

Try switching to **Bar** and **Pie** visualization types above.

</div>

As you can see, the `account.tokens.transfers` dataset gives us the **number of tokens transferred**, and by looking at the `from` / `to` addresses we can tell whether tokens were added to or removed from our account.

... But we don't know **how much** `ETH` the tokens were bought or sold for.

This is why we need the `account.transactions` dataset that contains all account transactions along with their `ETH` value.

In [None]:
%%sql
select *
from "account.transactions"
order by block_number desc
limit 3

In [None]:
%%sql -o transactions -q
select
    *, 
    cast(value as double) / pow(10, 18) as value_eth 
from "account.transactions"
order by block_number desc

In [None]:
transactions.hvplot.scatter(
    x="block_time",
    y="value_eth",
    title="Account Transactions in ETH",
    xlabel="Time",
    ylabel="ETH",
    color="red",
    alpha=0.5,
)
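
The queries above divide raw values by `pow(10, 18)` or by `pow(10, token_decimal)` because on-chain amounts are stored as integers in the smallest unit (wei in the case of ETH, where 1 ETH = 10^18 wei). A small sketch of the same conversion in Python follows; the `to_units` helper and the sample amounts are illustrative assumptions, and `Decimal` is used to avoid floating-point rounding:

```python
from decimal import Decimal

def to_units(raw_value: str, decimals: int) -> Decimal:
    """Convert a raw integer amount (given as a string) into whole-token units."""
    return Decimal(raw_value) / (Decimal(10) ** decimals)

# ETH uses 18 decimals: this raw value is 1.5 ETH expressed in wei
print(to_units("1500000000000000000", 18))

# A hypothetical token with 6 decimals
print(to_units("2500000", 6))
```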

## Tracking token portfolio using derivative datasets

To understand our "portfolio" of tokens we would like to have a dataset that:
- Contains individual token transactions along with book/sell price in ETH
- Tracks cumulative number of tokens held of each type
- Tracks cumulative book price in ETH

<div class="alert alert-block alert-success">

We achieve this using the following derivative dataset:

<p style="background:black">
<code style="background:black;color:white">kamu add datasets/account.tokens.portfolio.yaml
kamu pull account.tokens.portfolio
</code>
</p>
</div>

The key parts of this dataset look like this:

```yaml
---
kind: DatasetSnapshot
version: 1
content:
  name: account.tokens.portfolio
  kind: Derivative
  metadata:
    - kind: SetTransform
      inputs:
        - datasetRef: account.tokens.transfers
        - datasetRef: account.transactions
      transform:
        kind: Sql
        engine: flink
        queries:
          # Convert token transfers into (token_type, +/- delta) form
          - alias: token_transfers
            query: ...
          # Convert ETH transactions into (transaction, +/- delta) form
          - alias: transactions
            query: ...
          # JOIN the `token_transfers` and `transactions` datasets
          - alias: token_transactions
            query: |
              select
                tr.block_time,
                tr.block_number,
                tr.transaction_hash,
                tx.symbol as account_symbol,
                tr.token_symbol,
                tr.token_amount,
                tx.eth_amount
              from token_transfers as tr
              left join transactions as tx
              on 
                tr.transaction_hash = tx.transaction_hash
                and tr.block_time = tx.block_time
          # Use a window function to calculate cumulative balance and book value
          - query: |
              select
                *,
                sum(token_amount) over (partition by token_symbol order by block_time) as token_balance,
                sum(-eth_amount) over (partition by token_symbol order by block_time) as token_book_value_eth
              from token_transactions
```
Remember that this is a **Streaming SQL** - we are not joining tables, but rather two potentially real-time and infinite streams of data. 

This particular type is a [Stream-to-Stream JOIN](https://docs.kamu.dev/cli/transform/joins-s2s/).
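
To build intuition for what the last two queries compute, here is a batch-style sketch in plain Python: it joins transfers to transactions by hash and then keeps running totals per token. The sample records are hypothetical, and the sign convention (ETH spent is negative) is an assumption inferred from the `sum(-eth_amount)` expression above. A real streaming engine processes events incrementally rather than over a finished list:

```python
from collections import defaultdict

# Hypothetical, already-normalized inputs
token_transfers = [
    {"block_time": 1, "transaction_hash": "0xa", "token_symbol": "rETH", "token_amount": 2.0},
    {"block_time": 2, "transaction_hash": "0xb", "token_symbol": "rETH", "token_amount": 1.0},
    {"block_time": 3, "transaction_hash": "0xc", "token_symbol": "rETH", "token_amount": -1.0},
]
# Assumed convention: ETH spent is negative, ETH received is positive
transactions = {
    "0xa": {"eth_amount": -2.0},
    "0xb": {"eth_amount": -1.5},
    "0xc": {"eth_amount": 1.25},
}

balance = defaultdict(float)      # running token balance per symbol
book_value = defaultdict(float)   # running book value in ETH per symbol
portfolio = []
for tr in sorted(token_transfers, key=lambda r: r["block_time"]):
    # Left join: a transfer without a matching transaction contributes no ETH
    tx = transactions.get(tr["transaction_hash"])
    eth_amount = tx["eth_amount"] if tx else 0.0
    symbol = tr["token_symbol"]
    balance[symbol] += tr["token_amount"]
    book_value[symbol] += -eth_amount
    portfolio.append((tr["block_time"], symbol, balance[symbol], book_value[symbol]))

for row in portfolio:
    print(row)
```

After the third event the sketch reports a balance of 2.0 rETH with a book value of 2.25 ETH.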

In the next chapter we will explore why the stream processing model is such a big deal.

In [None]:
%%sql -o portfolio -q
select * from "account.tokens.portfolio"

In [None]:
portfolio[
    portfolio.token_symbol == "rETH"
].hvplot.scatter(
    x="block_time",
    y="token_amount",
    color="orange",
    title="rETH Buy/Sell Transactions",
)

In [None]:
r = portfolio[
    portfolio.token_symbol == "rETH"
]
r.hvplot.step(
    x="block_time",
    xlabel="Time",
    y="token_balance",
    ylabel="rETH",
    title="rETH Amount Held",
) * r.hvplot.scatter(
    x="block_time",
    y="token_balance",
    c="k",
    alpha=0.5,
)

## Portfolio Market Value
The questions we would like to answer next are:
- What is our token portfolio's **market value in ETH**
- What are the approximate **book and market values in USD**

For the last one we will start with an intermediate step (that will help us later) and create a derivative dataset with book values in USD for every portfolio transaction.

<div class="alert alert-block alert-success">

Add and pull the prepared dataset:
    
<p style="background:black">
<code style="background:black;color:white">kamu add datasets/account.tokens.portfolio.usd.yaml
kamu pull account.tokens.portfolio.usd
</code>
</p>
</div>

Here's how this dataset is defined:

```yaml
---
kind: DatasetSnapshot
version: 1
content:
  name: account.tokens.portfolio.usd
  kind: Derivative
  metadata:
    - kind: SetTransform
      inputs:
        - datasetRef: account.tokens.portfolio
        - datasetRef: com.cryptocompare.ohlcv.eth-usd
      transform:
        kind: Sql
        engine: flink
        # Set up temporal table functions that turn our stream of exchange rates
        # into a 3-dimensional (rows + columns + time) lookup table
        temporalTables:
          - name: com.cryptocompare.ohlcv.eth-usd
            primaryKey:
              - from_symbol
        queries:
          - alias: with_usd_amount
            # Use Temporal Table JOIN to convert ETH to USD using exchange rate
            # at the time of each individual transaction
            query: |
              select
                tr.block_time,
                tr.block_number,
                tr.transaction_hash,
                tr.account_symbol,
                tr.token_symbol,
                tr.token_amount,
                tr.eth_amount,
                tr.token_balance,
                tr.token_book_value_eth,
                'usd' as account_anchor_symbol,
                (
                  tr.eth_amount * eth2usd.`close`
                ) as eth_amount_as_usd
              from `account.tokens.portfolio` as tr
              join `com.cryptocompare.ohlcv.eth-usd` for system_time as of tr.block_time as eth2usd
              on tr.account_symbol = eth2usd.from_symbol and eth2usd.to_symbol = 'usd'
          # Cumulative sum to derive the book value in USD
          - query: |
              select
                *,
                sum(-eth_amount_as_usd) over (partition by token_symbol order by block_time) as token_book_value_eth_as_usd
              from with_usd_amount
    - kind: SetVocab
      eventTimeColumn: block_time
```

When converting between two currencies it's common for accountants to use an **average exchange rate** for a month or even a whole year.

<mark>This sacrifices accuracy for the sake of simplicity</mark> and would work poorly for cryptocurrencies that still exhibit a lot of volatility. 

Can we get **both accuracy and simplicity**, so that for every single transaction we used the exchange rate as it was **at the time of that transaction**?

This is where a [Temporal-Table JOIN](https://docs.kamu.dev/cli/learn/examples/stock-trading/#calculating-current-market-value-of-held-positions) can help us. It transforms an exchange rate stream into a kind of a lookup table which can be indexed by time to get the appropriate exchange rate.
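
The lookup idea can be sketched in a few lines of Python: for each transaction we find the latest exchange rate known at or before the transaction time. This is only an illustration of the "as of" semantics with hypothetical data and our own `rate_as_of` helper, not how the streaming engine implements the join:

```python
from bisect import bisect_right

# Hypothetical ETH/USD closing prices: (event_time, close), sorted by time
rates = [(10, 1500.0), (20, 1600.0), (30, 1400.0)]
rate_times = [t for t, _ in rates]

def rate_as_of(time):
    """Return the latest known rate at or before `time`, or None if there is none yet."""
    i = bisect_right(rate_times, time)
    return rates[i - 1][1] if i > 0 else None

# Hypothetical transactions: (block_time, eth_amount)
transactions = [(12, -2.0), (25, -1.0), (30, 0.5)]

for block_time, eth_amount in transactions:
    close = rate_as_of(block_time)
    print(block_time, eth_amount, eth_amount * close)
```

Each transaction is priced with the rate that was in effect at its own time (1500, 1600, and 1400 respectively) rather than with a period average.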

Now we have a very rich dataset containing detailed information for every portfolio transaction, and also the **cumulative balance** of every position in the portfolio, i.e. the "portfolio state".

The **market value** is basically how much money we would get at different points in time if we decided to liquidate our entire portfolio. To produce it we will use the same exact type of JOIN as before, but instead of joining exchange rates onto transactions we flip the direction and join the state of our portfolio onto every exchange rate data point.
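
Flipping the direction means that the exchange rate events drive the output and the portfolio state is what gets looked up. A rough sketch with hypothetical data is shown below; rate points that occur before anything was held have no matching state and are skipped here:

```python
from bisect import bisect_right

# Hypothetical portfolio states: (block_time, token_balance, token_book_value_eth)
states = [(5, 2.0, 2.0), (15, 3.0, 3.5)]
state_times = [s[0] for s in states]

# Hypothetical rETH -> ETH rates observed from mint/burn events: (event_time, rate)
reth_rates = [(3, 1.0), (10, 1.125), (20, 1.25)]

market_value = []
for event_time, rate in reth_rates:
    # Find the latest portfolio state at or before this rate observation
    i = bisect_right(state_times, event_time)
    if i == 0:
        continue  # nothing was held yet at this point in time
    _, balance, book_value_eth = states[i - 1]
    market_value.append((event_time, book_value_eth, rate * balance))

for row in market_value:
    print(row)
```

Every rate observation after the first purchase yields one output row with the book value next to the market value at that moment.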

<div class="alert alert-block alert-success">

Add and pull the prepared dataset:

<p style="background:black">
<code style="background:black;color:white">kamu add datasets/account.tokens.portfolio.market-value.yaml
kamu pull account.tokens.portfolio.market-value
</code>
</p>
</div>

Here's how the market value dataset is defined:


```yaml
---
kind: DatasetSnapshot
version: 1
content:
  name: account.tokens.portfolio.market-value
  kind: Derivative
  metadata:
    - kind: SetTransform
      inputs:
        - datasetRef: account.tokens.portfolio.usd
        - datasetRef: net.rocketpool.reth.mint-burn
        - datasetRef: com.cryptocompare.ohlcv.eth-usd
      transform:
        kind: Sql
        engine: flink
        temporalTables:
          - name: account.tokens.portfolio.usd
            primaryKey:
              - token_symbol
          - name: com.cryptocompare.ohlcv.eth-usd
            primaryKey:
              - from_symbol
        queries:
          - alias: market_value_reth2eth
            query: |
              select
                rp.event_time,
                tr.account_symbol,
                tr.token_symbol,
                tr.token_balance,
                tr.token_book_value_eth,
                (
                  rp.eth_amount / rp.amount * tr.token_balance
                ) as token_market_value_eth,
                tr.token_book_value_eth_as_usd
              from `net.rocketpool.reth.mint-burn` as rp
              join `account.tokens.portfolio.usd` for system_time as of rp.event_time as tr
              on rp.token_symbol = tr.token_symbol
          - query: |
              select
                rp.event_time,
                rp.account_symbol,
                rp.token_symbol,
                rp.token_balance,
                rp.token_book_value_eth,
                rp.token_market_value_eth,
                rp.token_book_value_eth_as_usd,
                (
                  rp.token_market_value_eth * eth2usd.`close`
                ) as token_market_value_usd
              from market_value_reth2eth as rp
              join `com.cryptocompare.ohlcv.eth-usd` for system_time as of rp.event_time as eth2usd
              on eth2usd.from_symbol = rp.account_symbol and eth2usd.to_symbol = 'usd'
```

In [None]:
%%sql -o market_value -q
select * from "account.tokens.portfolio.market-value"

In [None]:
market_value.hvplot.line(
    x="event_time", 
    y=["token_book_value_eth", "token_market_value_eth"],
    xlabel="Time",
    ylabel="ETH",
    legend="bottom_right",
    title="rETH: Book vs Market Value in ETH",
    height=500,
    width=800,
)

In [None]:
market_value.hvplot.line(
    x="event_time",
    y=["token_book_value_eth_as_usd", "token_market_value_usd"],
    xlabel="Time",
    ylabel="USD",
    legend="bottom_right",
    title="rETH: Book vs Market Value in USD",
    height=500,
    width=800,
)

---

## Summary

Phew... We've covered a lot of stuff!

To recap, here's the **outline of the pipeline** we just created:

![Pipeline diagram](files/pipeline-1.png)

```
┌───────────────────────────────────────┬──────────────┬─────────────────────────────────────────────────┐
│                 Name                  │     Kind     │                  Description                    │
├───────────────────────────────────────┼──────────────┼─────────────────────────────────────────────────┤
│ net.rocketpool.reth.tokens-minted     │     Root     │ rETH contract events (pulled from Ethereum node)│
│ net.rocketpool.reth.tokens-burned     │     Root     │ rETH contract events (pulled from Ethereum node)│
│ net.rocketpool.reth.mint-burn         │  Derivative  │ Combined rETH transactions                      │
│ com.cryptocompare.ohlcv.eth-usd       │     Root     │ ETH to USD exchange rate (pulled from IPFS)     │
│ account.tokens.transfers              │     Root     │ Wallet token transfers (sourced from Etherscan) │
│ account.transactions                  │     Root     │ Wallet transactions (sourced from Etherscan)    │
│ account.tokens.portfolio              │  Derivative  │ Tokens portfolio with book prices & amount held │
│ account.tokens.portfolio.usd          │  Derivative  │ Tokens portfolio with USD book prices           │
│ account.tokens.portfolio.market-value │  Derivative  │ Tokens portfolio market value in ETH and USD    │
└───────────────────────────────────────┴──────────────┴─────────────────────────────────────────────────┘
```

Surely putting this pipeline together takes time. Things get much faster as you get more experience with different types of streaming JOINs. They get **much-much faster** if you collaborate and reuse pipelines made by others.

The good thing is, whether you're ingesting external data or building processing pipelines with `kamu`, **you only have to do it once**. While data is flowing, your queries will continue to produce **up-to-date results with minimal maintenance effort**.

We will cover some advanced aspects of why streaming pipelines are much more autonomous than batch in the next chapter, so please follow along!