Fail to ingest data to index after index created #4256

Closed
mingshun opened this issue Dec 11, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@mingshun

mingshun commented Dec 11, 2023

Describe the bug
I deployed Quickwit in my prod env on k8s, and it has worked normally for months. But now I cannot ingest data into a recently created index via the REST API. POST /api/v1/indexes responds with 200, but POST /api/v1/my-index/ingest responds with 404 and the following body:

{
   "message": "Index `Index `my-index` not found.` not found."
}

I can get the index info with GET /api/v1/indexes/my-index/describe, so it seems the index has been created.
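
For reference, here is roughly what I am calling (a minimal reproduction sketch using the reqwest and tokio crates; the base URL is a placeholder for my searcher service, the endpoints are the ones quoted above):

// Reproduction sketch: the describe call succeeds, the ingest call 404s.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let base = "http://quickwit:7280"; // placeholder host
    let client = reqwest::Client::new();

    // Describe succeeds, so the metastore knows about the index.
    let describe = client
        .get(format!("{base}/api/v1/indexes/my-index/describe"))
        .send()
        .await?;
    println!("describe -> {}", describe.status()); // 200 OK

    // Ingest fails with 404 "Index `my-index` not found."
    let ingest = client
        .post(format!("{base}/api/v1/my-index/ingest"))
        .body(r#"{"trace_id": "0000000000000000"}"#)
        .send()
        .await?;
    println!("ingest -> {}", ingest.status()); // 404 Not Found
    Ok(())
}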

I found that quickwit-metastore reports the following error every time I create a new index:

ERROR quickwit_control_plane: Failed to notify control plane of index change. error=Unavailable("Request not delivered") event="add-source"

I tried restarting quickwit-control-plane. After the restart, it reports the following errors:

2023-12-11T09:03:02.100Z  WARN quickwit_metastore::metastore::retrying_metastore::retry: Request failed attempt_count=5
2023-12-11T09:03:02.100Z ERROR quickwit_control_plane::scheduler: Error when controlling the running plan: `Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.`.
2023-12-11T09:03:05.149Z  WARN quickwit_metastore::metastore::retrying_metastore::retry: Request failed attempt_count=5
2023-12-11T09:03:05.149Z ERROR quickwit_actors::spawn_builder: actor-failure cause=Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``. exit_status=Failure(Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)
2023-12-11T09:03:05.149Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=IndexingScheduler-autumn-PbzF exit_status=Failure(cause=Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)
2023-12-11T09:03:05.149Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=IndexingScheduler-autumn-PbzF exit_status=Failure(Error when scheduling indexing plan

Caused by:
    Internal error: `Error, message length too large: found 4270109 bytes, the limit is: 4194304 bytes` Cause: ``.)

Steps to reproduce (if applicable)
The problem persists in my prod env, but I cannot reproduce it in my local env.

Expected behavior
Ingestion into the newly created index should succeed. How can I fix it?

Configuration:

  1. Output of quickwit --version:
     Quickwit v0.6.3 (5060b6c 2023-07-28T14:08:35Z)
  2. The index_config.yaml:
version: 0.6

index_id: my-index

doc_mapping:
  mode: strict
  field_mappings:
    - name: trace_id
      type: text
      fast: true
    - name: trace_state
      type: text
      indexed: false
    - name: service_name
      type: text
      tokenizer: raw
    - name: resource_attributes
      type: json
      tokenizer: raw
    - name: resource_dropped_attributes_count
      type: u64
      indexed: false
    - name: scope_name
      type: text
      indexed: true
    - name: scope_version
      type: text
      indexed: false
    - name: scope_attributes
      type: json
      indexed: false
    - name: scope_dropped_attributes_count
      type: u64
      indexed: false
    - name: span_id
      type: text
      tokenizer: raw
    - name: parent_span_id
      type: text
      fast: true
      tokenizer: raw
    - name: span_kind
      type: u64
    - name: span_name
      type: text
      tokenizer: raw
    - name: span_start_timestamp_nanos
      type: datetime
      input_formats: [unix_timestamp]
      output_format: unix_timestamp_nanos
      indexed: false
      fast: true
      precision: milliseconds
    - name: span_end_timestamp_nanos
      type: datetime
      input_formats: [unix_timestamp]
      output_format: unix_timestamp_nanos
      indexed: false
      fast: false
    - name: span_duration_millis
      type: u64
      indexed: false
      fast: true
      stored: false
    - name: span_attributes
      type: json
      tokenizer: en_stem
      record: position
    - name: span_dropped_attributes_count
      type: u64
      indexed: false
    - name: span_dropped_events_count
      type: u64
      indexed: false
    - name: span_dropped_links_count
      type: u64
      indexed: false
    - name: span_status
      type: json
      indexed: true
    - name: events
      type: array<json>
      tokenizer: raw
    - name: event_names
      type: array<text>
      tokenizer: default
      record: position
      stored: false
    - name: links
      type: array<json>
      tokenizer: raw

  timestamp_field: span_start_timestamp_nanos

  # partition_key: hash_mod(service_name, 100)
  # tag_fields: [service_name]

indexing_settings:
  commit_timeout_secs: 5

search_settings:
  default_search_fields: []

retention:
  period: 5 days
  schedule: daily
@mingshun added the bug label on Dec 11, 2023
@tuziben
Contributor

tuziben commented Dec 11, 2023

How many split files are in this index? Is it over 10K?

BTW, you can join the Discord channel at https://discord.com/channels/908281611840282624/908281611840282627

@mingshun
Author

mingshun commented Dec 11, 2023

The index was just created. No data has been ingested into it yet.

@fmassot
Contributor

fmassot commented Dec 11, 2023

@mingshun thanks for the report.

So, the control plane retrieves the metadata of all indexes in one request, and if the payload is too big, you get this error. How many indexes do you have?

We can provide a quick fix by increasing the max size of gRPC messages, but first we need to confirm that my assumption is correct.
Are you using a PostgreSQL metastore or a file-backed metastore?

@mingshun
Author

mingshun commented Dec 11, 2023

@fmassot There are 1027 indexes. Using PostgreSQL metastore.

@fmassot
Contributor

fmassot commented Dec 11, 2023

Ok, I think increasing the max size of gRPC messages should solve your issue.
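
As a quick sanity check, the numbers from the logs line up with tonic's default 4 MiB gRPC message cap, with each index contributing roughly 4 KiB of metadata (a back-of-the-envelope sketch, not code from the repo):

fn main() {
    // Figures quoted earlier in this thread.
    let payload_bytes: u64 = 4_270_109;      // response size from the error log
    let limit_bytes: u64 = 4 * 1024 * 1024;  // 4_194_304 bytes: the default gRPC message cap
    let index_count: u64 = 1_027;            // index count reported above

    println!("over the cap by {} bytes", payload_bytes - limit_bytes);        // 75805 (~74 KiB)
    println!("~{} bytes of metadata per index", payload_bytes / index_count); // 4157 (~4 KiB)
}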
Are you deploying with Docker images or binaries?

(I'm preparing the 0.6.5 release here: main...release-0.6)
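
For context, the quick fix is along these lines (a rough sketch, not the actual patch): tonic-generated gRPC clients cap inbound messages at 4 MiB by default and expose builder methods to raise that cap. The proto package, module path, and client name below are illustrative only and assume a metastore service compiled with tonic-build:

pub mod metastore_proto {
    // Assumes a `metastore` proto package compiled by tonic-build in build.rs.
    tonic::include_proto!("metastore");
}

use metastore_proto::metastore_service_client::MetastoreServiceClient;
use tonic::transport::Channel;

pub async fn connect(endpoint: String) -> Result<MetastoreServiceClient<Channel>, tonic::transport::Error> {
    let channel = Channel::from_shared(endpoint)
        .expect("valid endpoint URI")
        .connect()
        .await?;
    Ok(MetastoreServiceClient::new(channel)
        // Raise both limits well above the 4 MiB default so a large
        // list-indexes response fits in a single gRPC message.
        .max_decoding_message_size(64 * 1024 * 1024)
        .max_encoding_message_size(64 * 1024 * 1024))
}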

@fmassot
Contributor

fmassot commented Dec 11, 2023

@mingshun, we just released 0.6.5. Can you try it and see if it solves your issue? You should not see this problem with up to ~5,000 indexes.
We will do a proper fix in the main branch to handle an arbitrary number of indexes.

@fmassot
Contributor

fmassot commented Dec 11, 2023

@dojiong: you might be interested in the 0.6.5 release too.

@mingshun
Author

@fmassot 0.6.5 works.
