TiCDC can not write cdclog to s3 when table is big. #1259

Closed
dengqee opened this issue Dec 31, 2020 · 3 comments
Labels
area/ticdc (Issues or PRs related to TiCDC), severity/major, type/bug (The issue is confirmed as a bug)

Comments


dengqee commented Dec 31, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
    I created an S3-sink changefeed, then created a 100,000-row table using sysbench.

    Create changefeed command:

        cdc cli changefeed create --sink-uri="s3://${BUCKET_NAME}?endpoint=http://${S3_HOST}&access-Key=${ACCESS_KEY}&secret-Access-Key=${SECRET_KEY}" --changefeed-id=s3-sink

    Create table command:

        sysbench --mysql-host=127.0.0.1 --mysql-user=root --mysql-port=4000 --mysql-db=test oltp_insert --tables=1 --table-size=100000 prepare

  2. What did you expect to see?
    On S3 there are ddls, t_xxx, and log.meta.

  3. What did you see instead?
    On S3 there are only ddls and log.meta, with no t_xxx. The changefeed state is normal, but checkpoint-ts never updates.

  4. Versions of the cluster

    • Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | tidb_version()                                                                                                                                                                                                                                                                                                     |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Release Version: v4.0.9
      Edition: Community
      Git Commit Hash: 69f05ea55e8409152a7721b2dd8822af011355ea
      Git Branch: heads/refs/tags/v4.0.9
      UTC Build Time: 2020-12-21 04:26:49
      GoVersion: go1.13
      Race Enabled: false
      TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
      Check Table Before Drop: false |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      1 row in set (0.00 sec)
      
    • TiCDC version (execute cdc version):

      Release Version: v4.0.9
      Git Commit Hash: 04e028419387871b80ddc95377751092f03f26ae
      Git Branch: heads/refs/tags/v4.0.9
      UTC Build Time: 2020-12-19 04:52:36
      Go Version: go version go1.13 linux/amd64
dengqee added the type/bug label Dec 31, 2020

dengqee commented Dec 31, 2020

I found that the problem was caused by blocking because the l.units[hash].dataChan() was full.
https://github.com/pingcap/ticdc/blob/04e028419387871b80ddc95377751092f03f26ae/cdc/sink/cdclog/utils.go#L164-L170

@amyangfei

@3pointer would you please take a look?


dengqee commented Jan 5, 2021

When l.units[hash].dataChan() is full, logSink.emitRowChangedEvents() blocks, which in turn blocks processor.syncResolved(), so processor.output stops emitting rows. Because no row with OpTypeResolved is emitted, processor.sinkEmittedResolvedTs stops updating and processor.sink.FlushRowChangedEvents no longer runs, so l.units[hash].dataChan() can never be consumed. This is a deadlock.
