
dbsp: New FallbackKeyBatch and FallbackValBatch types. #1656

Merged
merged 7 commits into from
Apr 18, 2024


@blp (Member) commented Apr 18, 2024

Each commit in this series is meaningful on its own, so it is worth reviewing
them one at a time.

These commits complete the set of batch types that choose between an in-memory
and a storage implementation at creation time. They also fix the runtime of
the `galen` benchmark, which regressed when storage was introduced because
it uses valbatches.

Is this a user-visible change (yes/no): no

blp added 7 commits April 18, 2024 09:12
`cargo doc` complained about:

```
[`Antichain`](crate::time::Antichain)
```

saying that the explicit link target was redundant because it is the same one rustdoc resolves by default.

Signed-off-by: Ben Pfaff <blp@feldera.com>
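A minimal sketch of this kind of fix, assuming a simplified stand-in `time` module and `Antichain` type (not DBSP's real definitions): once the type is in scope, rustdoc resolves the intra-doc link from its text alone, so the explicit `(crate::time::Antichain)` target can be dropped. The warning is presumably rustdoc's `redundant_explicit_links` lint.

```rust
pub mod time {
    /// Toy antichain (hypothetical stand-in for illustration only).
    #[derive(Debug, PartialEq)]
    pub struct Antichain(pub Vec<u64>);
}

use crate::time::Antichain;

// Before (flagged by `cargo doc`, because the explicit target repeats what
// rustdoc already resolves from the link text):
//
//     /// Returns the lower bound as an [`Antichain`](crate::time::Antichain).
//
// After: the plain intra-doc link below resolves through the `use` above.

/// Returns the lower bound as an [`Antichain`].
pub fn lower_bound() -> Antichain {
    Antichain(vec![0])
}
```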
I had copy-pasted this in several places; this commit consolidates the
implementation.

Signed-off-by: Ben Pfaff <blp@feldera.com>
…ck`.

This code had been copy-pasted, as private items, into two modules.  It
could be useful more broadly (and it will be used more broadly in upcoming
commits), so this moves it into `dbsp::trace::cursor` and makes it public.

Signed-off-by: Ben Pfaff <blp@feldera.com>
`Cursor::map_times` and related functions are usually the right way to
work with time-diff pairs in a `Cursor`.  However, a cursor interface is
sometimes useful.  This commit adds such an interface and implements it for
the batches where it will be needed in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@feldera.com>
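The difference between the two styles can be sketched on a toy container of `(time, diff)` pairs. `TimeDiffs`, `TimeDiffCursor`, `current`, and `step` are hypothetical names, not the interface this commit adds; the `map_times` here only mimics the callback style of `Cursor::map_times`. With a callback the container drives the loop, while with a cursor the caller drives it and can stop early or interleave it with another cursor.

```rust
/// Toy model: one value's (time, diff) pairs in a Vec (hypothetical).
pub struct TimeDiffs<T, R> {
    pairs: Vec<(T, R)>,
}

impl<T, R> TimeDiffs<T, R> {
    pub fn new(pairs: Vec<(T, R)>) -> Self {
        Self { pairs }
    }

    /// Callback style: calls `f` once for every (time, diff) pair.
    pub fn map_times(&self, mut f: impl FnMut(&T, &R)) {
        for (t, r) in &self.pairs {
            f(t, r);
        }
    }

    /// Cursor style: returns a cursor positioned at the first pair.
    pub fn time_diff_cursor(&self) -> TimeDiffCursor<'_, T, R> {
        TimeDiffCursor { pairs: &self.pairs, pos: 0 }
    }
}

/// Hypothetical cursor over (time, diff) pairs.
pub struct TimeDiffCursor<'a, T, R> {
    pairs: &'a [(T, R)],
    pos: usize,
}

impl<'a, T, R> TimeDiffCursor<'a, T, R> {
    /// Returns the current pair, or `None` once the cursor is exhausted.
    pub fn current(&self) -> Option<(&'a T, &'a R)> {
        self.pairs.get(self.pos).map(|(t, r)| (t, r))
    }

    /// Advances to the next pair.
    pub fn step(&mut self) {
        self.pos += 1;
    }
}
```

The cursor's `current` returns references tied to the underlying data rather than to the cursor, so the caller can keep them while calling `step`.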
…mes.

This will be needed in an upcoming commit.

Signed-off-by: Ben Pfaff <blp@feldera.com>
…rger`.

This merger can merge any two batch types into a third type.  This is
useful because the "fallback" implementations may need to merge one
file or vector batch with another to produce a third.

This commit uses the merger in `FallbackIndexedWSet` and `FallbackWSet`.
An upcoming commit will use it in `FallbackKeyBatch` and `FallbackValBatch`
as well.

Signed-off-by: Ben Pfaff <blp@feldera.com>
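Merging two different batch types into a third can be sketched with two toy traits: a read side for each input and a write side for the output. `SortedTuples`, `FromSortedTuples`, `merge_into`, `VecBatch`, and `FileBatch` are hypothetical names for illustration, not the merger this commit adds. The merge itself is a standard two-way merge of key-sorted tuples that sums the weights of equal keys and drops keys whose weight becomes zero.

```rust
use std::cmp::Ordering;
use std::ops::AddAssign;

/// Hypothetical read side: a batch's (key, weight) tuples in key order.
pub trait SortedTuples {
    type Key: Ord + Clone;
    type Weight: AddAssign + Default + PartialEq + Clone;
    fn tuples(&self) -> Vec<(Self::Key, Self::Weight)>;
}

/// Hypothetical write side: builds a batch from tuples already in key order.
pub trait FromSortedTuples<K, R>: Sized {
    fn build(tuples: Vec<(K, R)>) -> Self;
}

/// Merges an `A` and a `B` into a new `O`, which may be a third type.
pub fn merge_into<A, B, O>(a: &A, b: &B) -> O
where
    A: SortedTuples,
    B: SortedTuples<Key = A::Key, Weight = A::Weight>,
    O: FromSortedTuples<A::Key, A::Weight>,
{
    let (xs, ys) = (a.tuples(), b.tuples());
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(xs.len() + ys.len());
    while i < xs.len() && j < ys.len() {
        match xs[i].0.cmp(&ys[j].0) {
            Ordering::Less => {
                out.push(xs[i].clone());
                i += 1;
            }
            Ordering::Greater => {
                out.push(ys[j].clone());
                j += 1;
            }
            Ordering::Equal => {
                // Same key in both inputs: sum the weights, drop if zero.
                let mut w = xs[i].1.clone();
                w += ys[j].1.clone();
                if w != <A::Weight as Default>::default() {
                    out.push((xs[i].0.clone(), w));
                }
                i += 1;
                j += 1;
            }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&xs[i..]);
    out.extend_from_slice(&ys[j..]);
    O::build(out)
}

/// Toy stand-ins for an in-memory and a file-backed batch (both Vecs here).
pub struct VecBatch(pub Vec<(u32, i64)>);
pub struct FileBatch(pub Vec<(u32, i64)>);

impl SortedTuples for VecBatch {
    type Key = u32;
    type Weight = i64;
    fn tuples(&self) -> Vec<(u32, i64)> {
        self.0.clone()
    }
}

impl SortedTuples for FileBatch {
    type Key = u32;
    type Weight = i64;
    fn tuples(&self) -> Vec<(u32, i64)> {
        self.0.clone()
    }
}

impl FromSortedTuples<u32, i64> for VecBatch {
    fn build(tuples: Vec<(u32, i64)>) -> Self {
        VecBatch(tuples)
    }
}

impl FromSortedTuples<u32, i64> for FileBatch {
    fn build(tuples: Vec<(u32, i64)>) -> Self {
        FileBatch(tuples)
    }
}
```

For example, `let out: FileBatch = merge_into(&vec_batch, &file_batch);` merges a vector batch with a file batch into a new file batch. The output type is chosen by the caller, independent of the input types.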
These complete the set of batch types that choose between memory or
storage implementations at creation time.  These fix the runtime for
the `galen` benchmark, which regressed when storage was introduced because
it uses valbatches.

Signed-off-by: Ben Pfaff <blp@feldera.com>
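The "choose at creation time" idea can be sketched as an enum over two implementations, selected once when the batch is built. `VecKeyBatch`, `FileKeyBatch`, `from_sorted_keys`, `is_on_storage`, and the `max_in_memory` threshold are all hypothetical; this only illustrates the dispatch pattern, not how `FallbackKeyBatch` actually decides.

```rust
/// Toy in-memory batch (hypothetical stand-in).
pub struct VecKeyBatch {
    keys: Vec<u64>,
}

/// Toy storage-backed batch; here it is only a marker around a Vec.
pub struct FileKeyBatch {
    keys: Vec<u64>,
}

/// Hypothetical fallback batch: the backing implementation is picked once,
/// when the batch is created, and never changes afterward.
pub enum FallbackKeyBatch {
    Vec(VecKeyBatch),
    File(FileKeyBatch),
}

impl FallbackKeyBatch {
    /// Builds a batch from sorted keys, keeping it in memory when it has at
    /// most `max_in_memory` keys (a hypothetical policy).
    pub fn from_sorted_keys(keys: Vec<u64>, max_in_memory: usize) -> Self {
        if keys.len() <= max_in_memory {
            FallbackKeyBatch::Vec(VecKeyBatch { keys })
        } else {
            FallbackKeyBatch::File(FileKeyBatch { keys })
        }
    }

    /// Every operation dispatches to whichever implementation was chosen.
    pub fn len(&self) -> usize {
        match self {
            FallbackKeyBatch::Vec(b) => b.keys.len(),
            FallbackKeyBatch::File(b) => b.keys.len(),
        }
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    pub fn is_on_storage(&self) -> bool {
        matches!(self, FallbackKeyBatch::File(_))
    }
}
```

Because the choice is a single `match` per operation, callers see one batch type regardless of where the data lives.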
@blp added the DBSP core (Related to the core DBSP library), performance, storage (Persistence for internal state in DBSP operators), and rust (Pull requests that update Rust code) labels Apr 18, 2024
@blp added this to the April 30, 2024 milestone Apr 18, 2024
@blp requested a review from ryzhyk April 18, 2024 17:30
@blp self-assigned this Apr 18, 2024

Benchmark results

Nexmark

  • 3 out of 21 queries have regressed ❗
  • Compared results from 1eb7bb0 (main) with e4812c5 (PR)
| Query | main [kOp/s] | PR [kOp/s] | Throughput change [%] | Assessment | Peak RSS diff |
|---|---|---|---|---|---|
| Q0 | 7236.92 | 7144.24 | -1 | ✔️ | -2.0 MB |
| Q1 | 6010.7 | 6184.04 | 3 | ✔️ | -13.1 MB |
| Q2 | 7322.11 | 7585.99 | 4 | ✔️ | 33.4 MB |
| Q3 | 7282.66 | 7742.82 | 6 | 🌲 | 177.6 MB |
| Q4 | 3227.54 | 3382.96 | 5 | ✔️ | -289.6 MB |
| Q5 | 7198.01 | 7258.27 | 1 | ✔️ | -289.6 MB |
| Q6 | 2930.95 | 2921.6 | 0 | ✔️ | -203.0 MB |
| Q7 | 2782.52 | 1859.55 | -33 | ⁉️ | 4.1 GB |
| Q8 | 7920.15 | 7364.86 | -7 | 🔻 | 4.1 GB |
| Q9 | 480.431 | 482.875 | 1 | ✔️ | 1.3 GB |
| Q12 | 7364.91 | 7242.36 | -2 | ✔️ | 1.3 GB |
| Q13 | 2649.43 | 2262.16 | -15 | 🔻 | 1.3 GB |
| Q14 | 7289.42 | 7471.55 | 2 | ✔️ | 1.3 GB |
| Q15 | 3724.57 | 3781.96 | 2 | ✔️ | 1.3 GB |
| Q16 | 863.62 | 872.742 | 1 | ✔️ | 1.3 GB |
| Q17 | 2062.42 | 1982.59 | -4 | ✔️ | 1.3 GB |
| Q18 | 718.643 | 730.684 | 2 | ✔️ | 6.6 GB |
| Q19 | 628.444 | 608.652 | -3 | ✔️ | 6.6 GB |
| Q20 | 729.398 | 724.158 | -1 | ✔️ | 6.6 GB |
| Q21 | 7076.55 | 7042.93 | 0 | ✔️ | 6.6 GB |
| Q22 | 7398.53 | 7127.77 | -4 | ✔️ | 6.6 GB |

Galen

  • 1 out of 1 benchmarks have regressed ❗
  • Compared results from 1eb7bb0 (main) with e4812c5 (PR)
| Benchmark | main [s] | PR [s] | Runtime change [%] | Assessment |
|---|---|---|---|---|
| galen | 7688.81 | 34.3237 | -100 | ⁉️ |
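The change columns in both tables are consistent with (PR - main) / main × 100, rounded to the nearest integer. The sketch below reproduces a few rows under that assumption; `pct_change` and `speedup` are hypothetical helpers, not taken from the benchmark bot's source.

```rust
/// Percentage change from the `main` measurement to the PR measurement,
/// rounded to the nearest whole percent (assumed formula).
pub fn pct_change(main: f64, pr: f64) -> i64 {
    ((pr - main) / main * 100.0).round() as i64
}

/// Speedup factor for a runtime metric, where lower is better.
pub fn speedup(main_secs: f64, pr_secs: f64) -> f64 {
    main_secs / pr_secs
}
```

For `galen` the column measures runtime, so -100 is an improvement rather than a regression: the PR run took about 0.45% as long as `main`, roughly a 224× speedup.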

@blp (Member, Author) commented Apr 18, 2024

The `galen` benchmark results above show that this worked: the runtime fell from over 2 hours (7688.81 s) to 34 seconds.

@blp merged commit 88b2fb9 into main Apr 18, 2024
6 checks passed
@blp deleted the keybatch branch April 18, 2024 22:44