Kinesis source fails with "channel closed" errors after shard resharding #5896

@earlbread

Describe the bug
Kinesis source cannot be created or started on streams that have undergone resharding (split/merge operations). The source repeatedly fails with "channel closed" errors and enters an infinite restart loop.

Based on code investigation, the likely cause is:

  1. No differentiation between closed and active shards: When calling list_shards, Quickwit receives both closed (parent) shards and active (child) shards after resharding, but attempts to create consumers for all of them
  2. Attempting to read from closed shards: The source tries to get shard iterators and read data from shards that have ending_sequence_number set (indicating they are closed)
  3. Missing shard lineage tracking: Quickwit doesn't track parent-child relationships between shards, so it cannot properly transition from closed parent shards to their active children
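To illustrate points 1 and 2, here is a minimal Python sketch of separating closed shards from open ones before creating consumers. It is not Quickwit code (Quickwit is written in Rust), and the function name `split_open_closed` is hypothetical. The shard dictionaries use the flattened shape shown in the Shard List section of this issue, where a closed shard is one whose `EndingSequenceNumber` is set.

```python
from typing import Dict, List, Optional, Tuple

Shard = Dict[str, Optional[str]]

# Shard IDs and ending sequence numbers copied from the Shard List in this issue.
SHARDS: List[Shard] = [
    {"ShardId": "shardId-000000000000",
     "EndingSequenceNumber": "49667106115790976182937369685335758911590473956837031938"},
    {"ShardId": "shardId-000000000001",
     "EndingSequenceNumber": "49667106428737333553917604230506493431664933512880848914"},
    {"ShardId": "shardId-000000000002",
     "EndingSequenceNumber": "49667106428759634299116134853648029149937581874386829346"},
    {"ShardId": "shardId-000000000003", "EndingSequenceNumber": None},
    {"ShardId": "shardId-000000000004", "EndingSequenceNumber": None},
    {"ShardId": "shardId-000000000005", "EndingSequenceNumber": None},
    {"ShardId": "shardId-000000000006", "EndingSequenceNumber": None},
]


def split_open_closed(shards: List[Shard]) -> Tuple[List[str], List[str]]:
    """Return (open_ids, closed_ids); a shard is closed once it has an ending sequence number."""
    open_ids: List[str] = []
    closed_ids: List[str] = []
    for shard in shards:
        if shard.get("EndingSequenceNumber") is None:
            open_ids.append(shard["ShardId"])
        else:
            closed_ids.append(shard["ShardId"])
    return open_ids, closed_ids


open_ids, closed_ids = split_open_closed(SHARDS)
print("open:", open_ids)
print("closed:", closed_ids)
```

For the stream in this report, the split yields four open shards, matching `OpenShardCount: 4` in the stream summary. Closed shards can still hold unread records within the retention period, so a real fix would probably need to drain them rather than simply skip them.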

Steps to reproduce (if applicable)

  1. Have a Kinesis stream that has undergone resharding (split or merge operations)
  2. Attempt to create a new Quickwit source for this stream
  3. Observe that the source starts failing with repeated restarts

Expected behavior
The Kinesis source should handle shard resharding gracefully and continue processing data from the new shard configuration.
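One possible reading of "gracefully" is that a shard is only consumed once its parent shards have been fully read, so records are processed in lineage order. The sketch below shows that scheduling rule in Python. It is an assumption about what a fix could look like, not Quickwit's actual design, and `ready_shards` is a hypothetical helper. The parent links are taken from the Shard List in this issue.

```python
from typing import List, Optional, Set, Tuple

# (ShardId, ParentShardId) pairs copied from the Shard List in this issue.
# AdjacentParentShardId is null for every shard here (splits only, no merges),
# so it is omitted from this sketch.
SHARDS: List[Tuple[str, Optional[str]]] = [
    ("shardId-000000000000", None),
    ("shardId-000000000001", "shardId-000000000000"),
    ("shardId-000000000002", "shardId-000000000000"),
    ("shardId-000000000003", "shardId-000000000001"),
    ("shardId-000000000004", "shardId-000000000001"),
    ("shardId-000000000005", "shardId-000000000002"),
    ("shardId-000000000006", "shardId-000000000002"),
]


def ready_shards(shards: List[Tuple[str, Optional[str]]], finished: Set[str]) -> List[str]:
    """Return unfinished shards whose parent is absent, unknown, or already finished."""
    known = {shard_id for shard_id, _ in shards}
    ready: List[str] = []
    for shard_id, parent_id in shards:
        if shard_id in finished:
            continue
        # A parent that is no longer listed (for example, aged out of retention)
        # cannot block its children.
        parent_blocks = parent_id is not None and parent_id in known and parent_id not in finished
        if not parent_blocks:
            ready.append(shard_id)
    return ready


# Nothing consumed yet: only the root shard is eligible.
print(ready_shards(SHARDS, set()))
# After the root is drained, its two children become eligible.
print(ready_shards(SHARDS, {"shardId-000000000000"}))
```

A consumer for a closed shard would report itself finished when `GetRecords` stops returning a next shard iterator, at which point the scheduler starts consumers for the shards that have become ready.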

Actual Behavior

  • Cannot successfully create a working source on a resharded stream
  • Source enters an infinite failure loop immediately after creation
  • All KinesisShardConsumer actors terminate with Failure(channel closed) errors
  • Source restarts approximately every minute but never succeeds
  • Deleting and recreating the source does not resolve the issue

Configuration:

  • Quickwit Version: qw-airmail-20250522-hotfix
  • Shard configuration:
    • Stream has 7 shards (shardId-000000000000 to shardId-000000000006)
    • Stream previously underwent resharding (1 -> 2 -> 4)

Logs
Pattern repeats every ~60 seconds

2025-09-15T09:00:22.391Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-falling-mwVX exit_status=killed
2025-09-15T09:00:22.328Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-twilight-Q4N4 exit_status=killed
2025-09-15T09:00:22.321Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-ancient-ixPf exit_status=killed
2025-09-15T09:00:22.294Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-bitter-pE4p exit_status=killed
2025-09-15T08:59:22.476Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=KinesisShardConsumer-floral-WDmb exit_status=Failure(channel closed)
2025-09-15T08:59:22.476Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-floral-WDmb exit_status=failure(cause=channel closed)
2025-09-15T08:59:22.278Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=KinesisShardConsumer-little-4Tut exit_status=Failure(channel closed)
2025-09-15T08:59:22.278Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-little-4Tut exit_status=failure(cause=channel closed)
2025-09-15T08:59:22.276Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=KinesisShardConsumer-holy-xeKo exit_status=Failure(channel closed)
2025-09-15T08:59:22.276Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=KinesisShardConsumer-holy-xeKo exit_status=failure(cause=channel closed)
2025-09-15T08:59:22.238Z  INFO quickwit_indexing::source::kinesis::kinesis_source: Starting Kinesis source. stream_name=earlbread-kinesis-test-stream assigned_shards=shardId-000000000000, shardId-000000000001, shardId-000000000002, shardId-000000000003, shardId-000000000004, shardId-000000000005, shardId-000000000006

Stream Info

{
  "StreamDescriptionSummary": {
    "StreamName": "earlbread-kinesis-test-stream",
    "StreamARN": "arn:aws:kinesis:ap-northeast-2:314695318048:stream/earlbread-kinesis-test-stream",
    "StreamStatus": "ACTIVE",
    "StreamModeDetails": {
      "StreamMode": "PROVISIONED"
    },
    "RetentionPeriodHours": 24,
    "StreamCreationTimestamp": "2025-09-15T17:39:46+09:00",
    "EnhancedMonitoring": [
      {
        "ShardLevelMetrics": []
      }
    ],
    "EncryptionType": "NONE",
    "OpenShardCount": 4,
    "ConsumerCount": 0
  }
}

Shard List

{
  "ShardId": "shardId-000000000000",
  "Status": "CLOSED",
  "ParentShardId": null,
  "AdjacentParentShardId": null,
  "StartingHashKey": "0",
  "EndingHashKey": "340282366920938463463374607431768211455",
  "StartingSequenceNumber": "49667106115779825810338104373766199978273704206997127170",
  "EndingSequenceNumber": "49667106115790976182937369685335758911590473956837031938"
}
{
  "ShardId": "shardId-000000000001",
  "Status": "CLOSED",
  "ParentShardId": "shardId-000000000000",
  "AdjacentParentShardId": null,
  "StartingHashKey": "0",
  "EndingHashKey": "170141183460469231731687303715884105727",
  "StartingSequenceNumber": "49667106428726183181318338918936934498348221487401402386",
  "EndingSequenceNumber": "49667106428737333553917604230506493431664933512880848914"
}
{
  "ShardId": "shardId-000000000002",
  "Status": "CLOSED",
  "ParentShardId": "shardId-000000000000",
  "AdjacentParentShardId": null,
  "StartingHashKey": "170141183460469231731687303715884105728",
  "EndingHashKey": "340282366920938463463374607431768211455",
  "StartingSequenceNumber": "49667106428748483926516869542078470216620869848907382818",
  "EndingSequenceNumber": "49667106428759634299116134853648029149937581874386829346"
}
{
  "ShardId": "shardId-000000000003",
  "Status": "OPEN",
  "ParentShardId": "shardId-000000000001",
  "AdjacentParentShardId": null,
  "StartingHashKey": "0",
  "EndingHashKey": "85070591730234615865843651857942052863",
  "StartingSequenceNumber": "49667106441972825829245529065009151152301350764574408754",
  "EndingSequenceNumber": null
}
{
  "ShardId": "shardId-000000000004",
  "Status": "OPEN",
  "ParentShardId": "shardId-000000000001",
  "AdjacentParentShardId": null,
  "StartingHashKey": "85070591730234615865843651857942052864",
  "EndingHashKey": "170141183460469231731687303715884105727",
  "StartingSequenceNumber": "49667106441995126574444059688150686870573999126080389186",
  "EndingSequenceNumber": null
}
{
  "ShardId": "shardId-000000000005",
  "Status": "OPEN",
  "ParentShardId": "shardId-000000000002",
  "AdjacentParentShardId": null,
  "StartingHashKey": "170141183460469231731687303715884105728",
  "EndingHashKey": "255211775190703847597530955573826158591",
  "StartingSequenceNumber": "49667106442017427319642590311292222588846647487586369618",
  "EndingSequenceNumber": null
}
{
  "ShardId": "shardId-000000000006",
  "Status": "OPEN",
  "ParentShardId": "shardId-000000000002",
  "AdjacentParentShardId": null,
  "StartingHashKey": "255211775190703847597530955573826158592",
  "EndingHashKey": "340282366920938463463374607431768211455",
  "StartingSequenceNumber": "49667106442039728064841120934433758307119295849092350050",
  "EndingSequenceNumber": null
}
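As a sanity check on the listing above, the four OPEN shards alone should tile the whole Kinesis hash key space (0 to 2^128 - 1) with no gaps or overlaps, which would mean a consumer reading only those shards sees every newly written record. The Python sketch below verifies this using the hash key ranges copied from the listing; `covers_key_space` is a hypothetical helper name.

```python
from typing import List, Tuple

# Kinesis hash keys are 128-bit unsigned integers.
MAX_HASH_KEY = 2**128 - 1

# (ShardId, StartingHashKey, EndingHashKey) for the four OPEN shards listed above.
OPEN_SHARDS: List[Tuple[str, int, int]] = [
    ("shardId-000000000003", 0,
     85070591730234615865843651857942052863),
    ("shardId-000000000004", 85070591730234615865843651857942052864,
     170141183460469231731687303715884105727),
    ("shardId-000000000005", 170141183460469231731687303715884105728,
     255211775190703847597530955573826158591),
    ("shardId-000000000006", 255211775190703847597530955573826158592,
     340282366920938463463374607431768211455),
]


def covers_key_space(ranges: List[Tuple[str, int, int]]) -> bool:
    """Return True if the ranges are contiguous and span 0..MAX_HASH_KEY exactly."""
    expected_start = 0
    for _, start, end in sorted(ranges, key=lambda r: r[1]):
        # Each range must begin exactly where the previous one ended.
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == MAX_HASH_KEY + 1


print(covers_key_space(OPEN_SHARDS))
```

The same check applied to the shards of each earlier generation (shard 0 alone, or shards 1 and 2 together) also succeeds, which is consistent with the 1 -> 2 -> 4 split history described in the configuration.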

Labels: bug (Something isn't working)