a query is hanged when one of subqueries of CreatingSet throw exceptions #4195

Closed
2 tasks
fzhedu opened this issue Mar 8, 2022 · 1 comment · Fixed by #4202

Comments

@fzhedu
Contributor

fzhedu commented Mar 8, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

issue 120 112.sql to tiflash

2. What did you expect to see? (Required)

3. What did you see instead (Required)

The minTSO query is blocked, so all queries are blocked.

4. What is your TiFlash version? (Required)

master

431657087262261385 dot

Task 9, which contains three pipelines, hangs.

                CreateSet
                /       \
          build       probe
             |            \
        CreateSet       recv
          /      \
    build       probe
        |           |
     agg.         recv
       |
     recv
void CreatingSetsBlockInputStream::createAll()
{
    if (!created)
    {
        for (auto & subqueries_for_sets : subqueries_for_sets_list)
        {
            for (auto & elem : subqueries_for_sets)
            {
                if (elem.second.join)
                    elem.second.join->setFinishBuildTable(false);
            }
        }
        Stopwatch watch;
        auto thread_manager = newThreadManager();
        for (auto & subqueries_for_sets : subqueries_for_sets_list)
        {
            for (auto & elem : subqueries_for_sets)
            {
                if (elem.second.source) /// There could be prepared in advance Set/Join - no source is specified for them.
                {
                    if (isCancelledOrThrowIfKilled())
                        return;
                    thread_manager->schedule(true, "CreatingSets", [this, &item = elem.second] { createOne(item); });
                    FAIL_POINT_TRIGGER_EXCEPTION(FailPoints::exception_in_creating_set_input_stream);
                }
            }
        }

        thread_manager->wait();

        LOG_DEBUG(log, "Creat all tasks of " << mpp_task_id.toString() << " take " << watch.elapsedSeconds() << " sec. ");
        .....

We found that the last log line above is missing, so we infer that wait() is blocked. wait() waits for the future of the packaged_task that executes createOne() to become ready. The bug is that when createOne() does not catch all exceptions, the future never becomes ready and wait() blocks.

  • catch all exceptions for createOne()
  • check other places where this bug would occur.
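
To illustrate the first item, here is a minimal sketch of a worker wrapper that catches every exception, stores the first one, and lets the caller rethrow it after all workers have been waited on. `ExceptionCollector`, `runGuarded` and `runAllAndReport` are hypothetical names for this example only. It uses plain `std::thread` instead of TiFlash's thread manager, and it is not the change made in the fixing PR.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs a task body and never lets an exception escape
// the worker thread. The first exception is kept for the caller.
class ExceptionCollector
{
public:
    void runGuarded(const std::function<void()> & body)
    {
        try
        {
            body();
        }
        catch (...)
        {
            std::lock_guard<std::mutex> lock(mutex);
            if (!first_exception)
                first_exception = std::current_exception();
        }
    }

    // Called by the waiting thread once all workers have finished.
    void rethrowIfAny()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (first_exception)
            std::rethrow_exception(first_exception);
    }

private:
    std::mutex mutex;
    std::exception_ptr first_exception;
};

// Runs every task on its own thread, waits for all of them, and returns the
// message of the first failure, or an empty string if nothing failed.
std::string runAllAndReport(const std::vector<std::function<void()>> & tasks)
{
    ExceptionCollector collector;
    std::vector<std::thread> threads;
    for (const auto & task : tasks)
        threads.emplace_back([&collector, &task] { collector.runGuarded(task); });
    for (auto & t : threads)
        t.join(); // always returns, because no worker can exit via an exception

    try
    {
        collector.rethrowIfAny();
    }
    catch (const std::exception & e)
    {
        return e.what();
    }
    return "";
}
```

With this shape the waiting side always gets control back, and the error is reported on the waiting thread rather than lost in a worker.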
@fzhedu fzhedu added the type/bug Issue for bug label Mar 8, 2022
@fzhedu
Contributor Author

fzhedu commented Mar 8, 2022

root cause:

Query pipeline:
CreatingSets
 Union
  HashJoinBuildBlockInputStream × 20
   Expression
    Expression
     Expression
      Expression
       Expression
        SharedQuery
         ParallelAggregating
          Expression × 20
           Squashing
            TiRemoteBlockInputStream(ExchangeReceiver)
 Union
  HashJoinBuildBlockInputStream × 20
   Expression
    Expression
     Expression
      Expression
       Expression
        Squashing
         TiRemoteBlockInputStream(ExchangeReceiver)
 Union
  ExchangeSender × 20
   Expression
    Expression
     SharedQuery
      ParallelAggregating
       Expression × 20
        Expression
         Expression
          Expression
           Expression
            Expression
             Expression
              Expression
               Expression
                Expression
                 Squashing
                  TiRemoteBlockInputStream(ExchangeReceiver)

The second subquery has a join probe, which depends on the first subquery. Whether or not the first subquery finishes with an exception, it should set the FinishBuild CV to true; otherwise the second subquery will be blocked.
The bug is that the first subquery finishes with an exception but does not set the FinishBuild CV to true, so the second subquery is blocked.
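
One way to guarantee that the flag is set on every exit path is an RAII guard whose destructor marks the build as finished and notifies the waiters. The sketch below is a simplified model, not TiFlash's Join class: `BuildState`, `FinishBuildGuard`, `buildSide` and `probeObservesSuccess` are hypothetical names, and the actual fix may be structured differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical miniature of the build/probe handshake.
struct BuildState
{
    std::mutex mutex;
    std::condition_variable cv;
    bool finished = false; // true once the build side is done, with or without error
    bool failed = false;

    void finish(bool has_error)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            finished = true;
            failed = has_error;
        }
        cv.notify_all();
    }

    // Probe side: blocks until the build is finished; returns false if it failed.
    bool waitUntilFinished()
    {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this] { return finished; });
        return !failed;
    }
};

// RAII guard: the destructor runs on normal return and during stack unwinding,
// so the probe side is always woken up.
class FinishBuildGuard
{
public:
    explicit FinishBuildGuard(BuildState & state_) : state(state_) {}
    ~FinishBuildGuard() { state.finish(!succeeded); }
    void markSucceeded() { succeeded = true; }

private:
    BuildState & state;
    bool succeeded = false;
};

void buildSide(BuildState & state, bool should_throw)
{
    FinishBuildGuard guard(state);
    if (should_throw)
        throw std::runtime_error("exception while building the hash table");
    guard.markSucceeded();
}

// Runs one build thread and one probe thread; returns what the probe observed.
bool probeObservesSuccess(bool build_throws)
{
    BuildState state;
    bool ok = false;
    std::thread probe([&] { ok = state.waitUntilFinished(); });
    std::thread build([&] {
        try
        {
            buildSide(state, build_throws);
        }
        catch (const std::exception &)
        {
            // The error would be reported elsewhere; the guard has already woken the probe.
        }
    });
    build.join();
    probe.join();
    return ok;
}
```

In this model the probe thread returns in both cases and can tell a failed build from a successful one, instead of waiting forever.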

We added logs before and after the wait on the FinishBuild CV. Some threads output only the log before the wait and never the one after it.

431684127876710482
["Join: begin waiting finishing build"] [thread_id=10513] has no matching end log
[2022/03/08 20:58:11.078 +08:00] [DEBUG] [Join.cpp:1773] ["Join: begin waiting finishing build"] [thread_id=10647]
[2022/03/08 20:58:11.078 +08:00] [DEBUG] [Join.cpp:1773] ["Join: begin waiting finishing build"] [thread_id=7465]
[2022/03/08 20:58:11.082 +08:00] [DEBUG] [Join.cpp:1773] ["Join: begin waiting finishing build"] [thread_id=9907]
[2022/03/08 20:58:11.087 +08:00] [DEBUG] [Join.cpp:1773] ["Join: begin waiting finishing build"] [thread_id=13717]

@fzhedu fzhedu self-assigned this Mar 8, 2022
JaySon-Huang pushed a commit to JaySon-Huang/tiflash that referenced this issue Mar 17, 2022
fzhedu added a commit to fzhedu/tiflash that referenced this issue Apr 14, 2022
fzhedu pushed a commit to ti-chi-bot/tiflash that referenced this issue Jun 16, 2022