IngestV2 plan apply loop is broken. #4174

fulmicoton · 2023-11-21T08:36:28Z

IngestV2 plan apply loop is broken.

IndexingPlanDiff(missing_tasks_by_node_id={"node-1": [IndexingTask { index_uid: "gharchive4:01HFRA32Z532P0P7D6SNG5WPP4", source_id: "_ingest-source", shard_ids: [7] }]}, , unplanned_tasks_by_node_id={"node-1": [IndexingTask { index_uid: "gharchive4:01HFRA32Z532P0P7D6SNG5WPP4", source_id: "_ingest-source", shard_ids: [] }]})

My understanding is that we don't expose the list of shards in chitchat yet, and that's where the diff comes from.

Fix the serialization format of indexing tasks in chitchat to allow shards.
Fix the way indexing tasks are exposed in chitchat so that they include shards (right now we always serialize an empty vec).
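
To make the failure mode concrete, here is a minimal sketch of such a diff. It is not Quickwit's actual implementation: the `IndexingTask` fields mirror the debug output above, while the `String` and `u64` field types and the `diff_tasks` helper are assumptions made for illustration. A task planned with shard 7 never equals a reported task whose shard list is always empty, so the same task shows up as both missing and unplanned on every pass.

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified mirror of the task shown in the debug output.
// The field types are assumptions chosen for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct IndexingTask {
    index_uid: String,
    source_id: String,
    shard_ids: Vec<u64>,
}

/// Returns `(missing, unplanned)`:
/// - `missing`: tasks in the plan that the node does not report as running,
/// - `unplanned`: tasks the node reports that are not in the plan.
///
/// Simplification: tasks are compared as sets, so duplicate tasks
/// (several identical pipelines) are not counted separately.
fn diff_tasks(
    planned: &[IndexingTask],
    running: &[IndexingTask],
) -> (Vec<IndexingTask>, Vec<IndexingTask>) {
    let planned_set: BTreeSet<&IndexingTask> = planned.iter().collect();
    let running_set: BTreeSet<&IndexingTask> = running.iter().collect();
    let missing = planned_set
        .difference(&running_set)
        .map(|task| (*task).clone())
        .collect();
    let unplanned = running_set
        .difference(&planned_set)
        .map(|task| (*task).clone())
        .collect();
    (missing, unplanned)
}
```

Because equality includes `shard_ids`, the diff only becomes empty once the reported tasks carry their real shard lists.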

…4180) Fixed the serialization and deserialization of the indexing state to include shard lists.

Before this, there was no notion of shards, and indexers were just exposing the number of pipelines per source_uid. With ingest v2, the list of shards is part of indexing tasks, and the control plane wants to diff against it.

The serialization format goes `[1,2][3]` to express two pipelines with shards `[1,2]` and `[3]` respectively. For a source that has no notion of shards, like Kafka, two pipelines simply look as follows: `[][]`.

Also took it as an opportunity to clean up the code and increase the test coverage. Related #4174

Co-authored-by: François Massot <francois.massot@gmail.com>
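
The format described in this commit message can be sketched as a small round-trip. This is an illustration only, not the code from the commit: the function names `serialize_shard_lists` and `parse_shard_lists` and the `u64` shard ID type are assumptions.

```rust
/// Serializes the shard lists of the pipelines of one source.
/// Each pipeline becomes one bracketed, comma-separated list,
/// and the lists are concatenated without a separator.
fn serialize_shard_lists(pipelines: &[Vec<u64>]) -> String {
    let mut out = String::new();
    for shard_ids in pipelines {
        let ids: Vec<String> = shard_ids.iter().map(|id| id.to_string()).collect();
        out.push('[');
        out.push_str(&ids.join(","));
        out.push(']');
    }
    out
}

/// Parses the format back into one shard list per pipeline.
/// Returns `None` if the input is malformed.
fn parse_shard_lists(serialized: &str) -> Option<Vec<Vec<u64>>> {
    let mut pipelines = Vec::new();
    let mut rest = serialized;
    while !rest.is_empty() {
        // Every pipeline must start with '[' and end at the next ']'.
        let after_open = rest.strip_prefix('[')?;
        let (inner, after_close) = after_open.split_once(']')?;
        let shard_ids = if inner.is_empty() {
            // A source without shards (or a pipeline with none) is `[]`.
            Vec::new()
        } else {
            inner
                .split(',')
                .map(|id| id.parse::<u64>().ok())
                .collect::<Option<Vec<u64>>>()?
        };
        pipelines.push(shard_ids);
        rest = after_close;
    }
    Some(pipelines)
}
```

One property worth noting in this sketch: the number of bracket pairs still encodes the number of pipelines, so a shardless source keeps the information the old pipeline-count format carried.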

Bugfix: IndexPipeline remembers its list of shards and applies it after respawn.
Bugfix: assign shard logic. If no shard is removed, we just kill the pipeline. If shards are added, we reinitiate the shards. The pipeline supervisor keeps track of the shards and reassigns them if the pipeline is respawned. Closes #4184. Closes #4174.


fulmicoton added the bug Something isn't working label Nov 21, 2023

fulmicoton added this to the Distributed ingestion and indexing milestone Nov 21, 2023

fulmicoton self-assigned this Nov 22, 2023

fulmicoton added a commit that referenced this issue Nov 22, 2023

e62b6cb: Fixed the serialize deserialization of the indexing state to include shard lists. Also took it as an opportunity to clean up the code and increase the test coverage. Closes #4174

fulmicoton added a commit that referenced this issue Nov 22, 2023

7a9f4a6: Fixed the serialize deserialization of the indexing state to include shard lists. Also took it as an opportunity to clean up the code and increase the test coverage. Closes #4174

fulmicoton mentioned this issue Nov 22, 2023

Fixed the serialize deserialization of the indexing state to include #4180

Merged

fulmicoton added the high-priority label Nov 28, 2023

fulmicoton closed this as completed in e8f852c Dec 5, 2023
