
Failed to clean HashAgg state based on watermark #8038

Closed · TennyZhuang opened this issue Feb 20, 2023 · 1 comment

TennyZhuang (Collaborator) commented Feb 20, 2023

Describe the bug

No response

To Reproduce

Steps

  1. The Kafka CLI client is required

    brew install kafka
  2. Enable Kafka in ./risedev configure

    ./risedev configure
    [x] [Component] Hummock: MinIO + MinIO-CLI
    [x] [Component] Metrics: Prometheus + Grafana
    [x] [Component] Etcd
    [x] [Component] Kafka
    [x] [Build] Rust components
    
    
  3. Start the cluster

    ./risedev d full
    ✅ tmux: session risedev
    ✅ prepare: all previous services have been stopped
    ✅ minio: api http://127.0.0.1:9301/, console http://127.0.0.1:9400/
    ✅ etcd-2388: waiting for online...
    ✅ meta-node-5690: api grpc://127.0.0.1:5690/, dashboard http://127.0.0.1:5691/
    ✅ compute-node-5688: api grpc://127.0.0.1:5688/
    ✅ frontend-4566: api postgres://127.0.0.1:4566/
    ✅ compactor-6660: compactor 127.0.0.1:6660
    ✅ prometheus: api http://127.0.0.1:9500/
    ✅ grafana: dashboard http://127.0.0.1:3001/
    ✅ zookeeper-2181: zookeeper 127.0.0.1:2181
    ✅ kafka-29092: kafka 127.0.0.1:29092
    
  4. Create a kafka topic

    kafka-topics --bootstrap-server localhost:29092 --create --topic testwm --partitions 10
  5. Create the source and materialized view in RisingWave.

    psql -h localhost -p 4566 -d dev -U root
    create source if not exists testwm (
      v1 int, v2 int, watermark for v1 as v1 - 3
    ) with (
      connector = 'kafka',
      topic = 'testwm',
      properties.bootstrap.server = 'localhost:29092'
    ) row format json;
    create materialized view mv1 as
    select sum(v2) from testwm group by v1;
  6. Ingest some records (the perl one-liner repeats the JSON line 1001 times)

    echo '{"v1": 10, "v2": 20}' |
    perl -ne 'for$i(0..1000){print}' |
    kafka-console-producer --bootstrap-server localhost:29092 --topic testwm
  7. Query the HashAgg internal state table

    SHOW INTERNAL TABLES;
    SELECT * FROM __internal_mv1_10001_hashaggresult_11003;

    We can find one record where v1 = 10.

  8. Ingest more records with v1 increased by at least 4, so that the watermark (v1 - 3) advances past the old group key 10.

    echo '{"v1": 20, "v2": 20}' |
    perl -ne 'for$i(0..1000){print}' |
    kafka-console-producer --bootstrap-server localhost:29092 --topic testwm
  9. Query the HashAgg internal state table again (the query is repeated below for reference): there are now two records, and the old record for v1 = 10 still exists, so state cleaning failed.
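
For reference, the query in step 9 is the same as in step 7; the internal table name comes from SHOW INTERNAL TABLES and may differ between runs:

    SHOW INTERNAL TABLES;
    SELECT * FROM __internal_mv1_10001_hashaggresult_11003;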

Investigation

In state_table.rs, at the end of batch_write_rows, add:

    dbg!(&range_begin_suffix, &range_end_suffix);

The dbg! output shows that delete_range is called with the correct range.
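
Conceptually, the watermark-derived delete_range should remove every group whose key falls below the watermark. Below is a minimal self-contained sketch of that intent, simulating the HashAgg state with a BTreeMap; this is not the actual state-table or storage code, and the values are illustrative only:

    use std::collections::BTreeMap;

    fn main() {
        // Simulated HashAgg state, keyed by the group key v1, holding sum(v2).
        let mut state: BTreeMap<i32, i64> = BTreeMap::new();
        state.insert(10, 20020); // group created in step 6
        state.insert(20, 20020); // group created in step 8

        // The source defines `watermark for v1 as v1 - 3`, so ingesting v1 = 20
        // advances the watermark to 17. Groups keyed below the watermark can
        // never receive further input, so the state table issues a range
        // delete over keys < 17; `retain` stands in for that delete_range here.
        let watermark = 17;
        state.retain(|&k, _| k >= watermark);

        // Expected behavior: only the group with v1 = 20 remains.
        assert!(!state.contains_key(&10));
        assert!(state.contains_key(&20));
    }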

Expected behavior

The old record should be cleaned.

Additional context

Likely a storage bug.

TennyZhuang added the type/bug (Something isn't working) label on Feb 20, 2023
The github-actions bot added this to the release-0.1.18 milestone on Feb 20, 2023
TennyZhuang (Collaborator, Author) commented:

Partially fixed by #8083
