Description
Describe the bug
When attempting to use ingest_v2, I noticed that sometimes the CPU usage spikes, with one core being utilized at 100%. It appears to trigger some sort of infinite loop.
Upon debugging, I found that the issue occurs when a record payload slightly exceeds batch_num_bytes (I used the default configuration DEFAULT_BATCH_NUM_BYTES=1 MiB).
Specifically, this issue is in quickwit/quickwit-ingest/src/ingest_v2/fetch.rs:
quickwit/quickwit/quickwit-ingest/src/ingest_v2/fetch.rs, lines 148 to 151 in 8e6dc17:

    if mrecord_buffer.len() + payload.len() > mrecord_buffer.capacity() {
        has_drained_queue = false;
        break;
    }
Here, since the payload length exceeds batch_num_bytes, the if condition is met, and has_drained_queue is set to false. The program then loops back to the start:
    if has_drained_queue && self.shard_status_rx.changed().await.is_err() {
Since has_drained_queue is false, this condition short-circuits: the task does not wait for a shard status change, and execution falls through to the subsequent code block, which reaches the same capacity check again (lines 148 to 151 of fetch.rs, quoted above).
Because the oversized record never fits, has_drained_queue is set to false on every pass and no record is ever consumed, resulting in an infinite loop.
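To make the stall easier to see, here is a minimal, self-contained model of the capacity check. It is only a sketch: `fill_batch` and `Pass` are hypothetical names that do not exist in Quickwit, and `capacity` stands in for `mrecord_buffer.capacity()`. One pass over the pending records reports whether anything was copied.

```rust
/// Outcome of one pass over the pending records in this simplified model.
#[derive(Debug, PartialEq)]
enum Pass {
    /// Some records were copied, but the queue still holds more.
    Progress(usize),
    /// Nothing was copied and the queue was not drained, so the next
    /// pass starts from exactly the same state.
    Stuck,
    /// Every pending record was copied.
    Drained,
}

/// Hypothetical stand-in for the inner loop of the fetch task.
/// `capacity` plays the role of `mrecord_buffer.capacity()`.
fn fill_batch(pending: &[Vec<u8>], capacity: usize) -> Pass {
    let mut buffered = 0usize; // plays the role of `mrecord_buffer.len()`
    let mut copied = 0usize;
    let mut has_drained_queue = true;

    for payload in pending {
        // Same shape as the check at lines 148 to 151.
        if buffered + payload.len() > capacity {
            has_drained_queue = false;
            break;
        }
        buffered += payload.len();
        copied += 1;
    }

    if has_drained_queue {
        Pass::Drained
    } else if copied == 0 {
        Pass::Stuck
    } else {
        Pass::Progress(copied)
    }
}
```

With a capacity of 1 MiB, a single pending record of 1 MiB + 1 byte yields `Pass::Stuck`. In the real loop nothing changes between passes in that state, which matches the busy loop described above.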
Steps to Reproduce (if applicable)
Steps to reproduce the behavior:
When the record payload is greater than batch_num_bytes, this issue is triggered.
I wrote a test to reproduce this issue:
Stool233@33ea614
Here are the action run results:
https://github.com/Stool233/quickwit/actions/runs/10021273109/job/27699744997#step:10:2061
Expected behavior
When encountering a record payload greater than batch_num_bytes, either temporarily increase the size of batch_num_bytes to handle the record, or reject the record instead of entering an infinite loop.
I wrote a patch that temporarily increases the size of batch_num_bytes to handle such records, which works in my scenario:
- Patch: Stool233@6ddfe56
- Action run results: https://github.com/Stool233/quickwit/actions/runs/10021272530
If the community finds it appropriate, I can submit a related PR.
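For illustration, the first option (letting an oversized record through) could look roughly like the sketch below. This is not the code from the linked patch; `fill_batch_allow_oversized` is a hypothetical helper over an in-memory slice, and the only idea it shows is that the size check is skipped while the batch is still empty, so a single oversized record gets a batch of its own.

```rust
/// Hypothetical helper: decides how many pending records go into the next batch.
/// Returns (number of records copied, whether the queue was drained).
fn fill_batch_allow_oversized(pending: &[Vec<u8>], batch_num_bytes: usize) -> (usize, bool) {
    let mut buffered = 0usize; // bytes already placed in the batch
    let mut copied = 0usize; // records already placed in the batch

    for payload in pending {
        // Stop only if the batch already holds at least one record.
        // An empty batch always accepts the next record, whatever its size,
        // so every pass makes progress.
        if copied > 0 && buffered + payload.len() > batch_num_bytes {
            return (copied, false);
        }
        buffered += payload.len();
        copied += 1;
    }
    (copied, true)
}
```

In this model, a lone record of 1 MiB + 1 byte is emitted as a one-record batch instead of stalling. The trade-off is that a batch can exceed batch_num_bytes by up to the size of one record, so the alternative of rejecting such records at ingest time may be preferable if a hard limit matters.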
Configuration:
- Output of quickwit --version: compiled from the latest main (8e6dc17)
- The index_config.yaml:
  QW_ENABLE_INGEST_V2=true
  indexer: enable_cooperative_indexing: true