Fail to ingest data to index after index created #4256

Closed
mingshun opened this issue Dec 11, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@mingshun

mingshun commented Dec 11, 2023

Describe the bug
I deployed Quickwit in my prod env on k8s, and it has worked normally for months. But now I cannot ingest data into a recently created index via the REST API. POST /api/v1/indexes responds with 200, but POST /api/v1/my-index/ingest responds with 404 and the following body:

{
   "message": "Index `Index `my-index` not found.` not found."
}

I can get the index info with GET /api/v1/indexes/my-index/describe, so it seems the index has been created.
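
For reference, here is roughly what I am calling (a minimal reproduction sketch using the reqwest and tokio crates; the base URL is a placeholder for my searcher service, the endpoints are the ones quoted above):

// Reproduction sketch: the describe call succeeds, the ingest call 404s.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let base = "http://quickwit:7280"; // placeholder host
    let client = reqwest::Client::new();

    // Describe succeeds, so the metastore knows about the index.
    let describe = client
        .get(format!("{base}/api/v1/indexes/my-index/describe"))
        .send()
        .await?;
    println!("describe -> {}", describe.status()); // 200 OK

    // Ingest fails with 404 "Index `my-index` not found."
    let ingest = client
        .post(format!("{base}/api/v1/my-index/ingest"))
        .body(r#"{"trace_id": "0000000000000000"}"#)
        .send()
        .await?;
    println!("ingest -> {}", ingest.status()); // 404 Not Found
    Ok(())
}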

I found that quickwit-metastore reports the following error every time I create a new index:

ERROR quickwit_control_plane: Failed to notify control plane of index change. error=Unavailable("Request not delivered") event="add-source"

I tried restarting quickwit-control-plane. After the restart, it reports the following errors:

2023-12-11T09:03:02.100Z  WARN quickwit_metastore::metastore::retrying_metastore::retry: Request failed attempt_count=5
2023-12-11T09:03:02.100Z ERROR quickwit_control_plane::scheduler: Error when controlling the running plan: `Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.`.
2023-12-11T09:03:05.149Z  WARN quickwit_metastore::metastore::retrying_metastore::retry: Request failed attempt_count=5
2023-12-11T09:03:05.149Z ERROR quickwit_actors::spawn_builder: actor-failure cause=Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``. exit_status=Failure(Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)
2023-12-11T09:03:05.149Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=IndexingScheduler-autumn-PbzF exit_status=Failure(cause=Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)
2023-12-11T09:03:05.149Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=IndexingScheduler-autumn-PbzF exit_status=Failure(Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)

Steps to reproduce (if applicable)
The problem persists in my prod env, but I cannot reproduce it in my local env.

Expected behavior
Ingestion into the newly created index should succeed. How can I fix it?

Configuration:

  1. Output of quickwit --version:
     Quickwit v0.6.3 (5060b6c 2023-07-28T14:08:35Z)
  2. The index_config.yaml:
version: 0.6

index_id: my-index

doc_mapping:
  mode: strict
  field_mappings:
    - name: trace_id
      type: text
      fast: true
    - name: trace_state
      type: text
      indexed: false
    - name: service_name
      type: text
      tokenizer: raw
    - name: resource_attributes
      type: json
      tokenizer: raw
    - name: resource_dropped_attributes_count
      type: u64
      indexed: false
    - name: scope_name
      type: text
      indexed: true
    - name: scope_version
      type: text
      indexed: false
    - name: scope_attributes
      type: json
      indexed: false
    - name: scope_dropped_attributes_count
      type: u64
      indexed: false
    - name: span_id
      type: text
      tokenizer: raw
    - name: parent_span_id
      type: text
      fast: true
      tokenizer: raw
    - name: span_kind
      type: u64
    - name: span_name
      type: text
      tokenizer: raw
    - name: span_start_timestamp_nanos
      type: datetime
      input_formats: [unix_timestamp]
      output_format: unix_timestamp_nanos
      indexed: false
      fast: true
      precision: milliseconds
    - name: span_end_timestamp_nanos
      type: datetime
      input_formats: [unix_timestamp]
      output_format: unix_timestamp_nanos
      indexed: false
      fast: false
    - name: span_duration_millis
      type: u64
      indexed: false
      fast: true
      stored: false
    - name: span_attributes
      type: json
      tokenizer: en_stem
      record: position
    - name: span_dropped_attributes_count
      type: u64
      indexed: false
    - name: span_dropped_events_count
      type: u64
      indexed: false
    - name: span_dropped_links_count
      type: u64
      indexed: false
    - name: span_status
      type: json
      indexed: true
    - name: events
      type: array<json>
      tokenizer: raw
    - name: event_names
      type: array<text>
      tokenizer: default
      record: position
      stored: false
    - name: links
      type: array<json>
      tokenizer: raw

  timestamp_field: span_start_timestamp_nanos

  # partition_key: hash_mod(service_name, 100)
  # tag_fields: [service_name]

indexing_settings:
  commit_timeout_secs: 5

search_settings:
  default_search_fields: []

retention:
  period: 5 days
  schedule: daily
@mingshun added the bug label on Dec 11, 2023
@tuziben
Contributor

tuziben commented Dec 11, 2023

How many split files are in this index? Is it over 10K?

BTW, you can join the Discord channel at https://discord.com/channels/908281611840282624/908281611840282627

@mingshun
Author

mingshun commented Dec 11, 2023

The index was just created. No data has been ingested into it yet.

@fmassot
Contributor

fmassot commented Dec 11, 2023

@mingshun thanks for the report.

So, the control plane retrieves the metadata of all indexes in one request, and if the payload is too big, you get this error. How many indexes do you have?

We can provide a quick fix by increasing the max size of gRPC messages, but first we need to confirm that my assumption is correct.
Are you using a PostgreSQL metastore or a file-backed metastore?

@mingshun
Author

mingshun commented Dec 11, 2023

@fmassot There are 1027 indexes. Using PostgreSQL metastore.

@fmassot
Contributor

fmassot commented Dec 11, 2023

Ok, I think increasing the max size of gRPC messages should solve your issue.
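
As a quick sanity check, the numbers from the logs line up with tonic's default 4 MiB gRPC message cap, with each index contributing roughly 4 KiB of metadata (a back-of-the-envelope sketch, not code from the repo):

fn main() {
    // Figures quoted earlier in this thread.
    let payload_bytes: u64 = 4_270_109;      // response size from the error log
    let limit_bytes: u64 = 4 * 1024 * 1024;  // 4_194_304 bytes: the default gRPC message cap
    let index_count: u64 = 1_027;            // index count reported above

    println!("over the cap by {} bytes", payload_bytes - limit_bytes);        // 75805 (~74 KiB)
    println!("~{} bytes of metadata per index", payload_bytes / index_count); // 4157 (~4 KiB)
}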
Are you deploying with Docker images or binaries?

(I'm preparing the 0.6.5 release here: main...release-0.6)
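
For context, the quick fix is along these lines (a rough sketch, not the actual patch): tonic-generated gRPC clients cap inbound messages at 4 MiB by default and expose builder methods to raise that cap. The proto package, module path, and client name below are illustrative only and assume a metastore service compiled with tonic-build:

pub mod metastore_proto {
    // Assumes a `metastore` proto package compiled by tonic-build in build.rs.
    tonic::include_proto!("metastore");
}

use metastore_proto::metastore_service_client::MetastoreServiceClient;
use tonic::transport::Channel;

pub async fn connect(endpoint: String) -> Result<MetastoreServiceClient<Channel>, tonic::transport::Error> {
    let channel = Channel::from_shared(endpoint)
        .expect("valid endpoint URI")
        .connect()
        .await?;
    Ok(MetastoreServiceClient::new(channel)
        // Raise both limits well above the 4 MiB default so a large
        // list-indexes response fits in a single gRPC message.
        .max_decoding_message_size(64 * 1024 * 1024)
        .max_encoding_message_size(64 * 1024 * 1024))
}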

@fmassot
Contributor

fmassot commented Dec 11, 2023

@mingshun, we just released 0.6.5. Can you try it and see if it solves your issue? You should not see this problem with up to ~5,000 indexes.
We will do a proper fix in the main branch to handle an arbitrary number of indexes.

@fmassot
Contributor

fmassot commented Dec 11, 2023

@dojiong: you might be interested in the 0.6.5 release too.

@mingshun
Author

@fmassot 0.6.5 works.
