
janitor supports incremental execution #2094

Closed
guidao opened this issue Oct 12, 2022 · 19 comments

@guidao

guidao commented Oct 12, 2022

While running Quickwit, we found that Postgres accumulates a lot of transactions in the Lock state. This caused the indexing pipeline to restart because fetching a connection from the pool timed out.
When I stopped the janitor, I noticed there were far fewer waiting transactions. This may be related to the janitor fetching a very large number of splits at once, so perhaps it could run incrementally, processing a fixed number of splits per batch.


@guidao guidao added the enhancement New feature or request label Oct 12, 2022
@guidao
Author

guidao commented Oct 12, 2022

indexer command: quickwit --service indexer --service metastore
There are 50 indexer pods.

index config:

version: 0

index_id: logs

indexing_settings:
  timestamp_field: unix_ms

search_settings:
  default_search_fields: [log]

sources:
  - source_id: kafka
    source_type: kafka
    num_pipelines: 3
    params:
      topic: log.log-container.stdout
      client_log_level: debug
      client_params:
        bootstrap.servers: '${ip}'
        group.id: quickwit
        debug: all
        log_level: 7
        auto.offset.reset: latest

doc_mapping:
  partition_key: app_name
  max_num_partitions: 50
  tag_fields: [app_name]
  field_mappings:
    - name: app_name
      type: text
      tokenizer: raw
      fast: true
    - name: unit_name
      type: text
      tokenizer: raw
      fast: true
    - name: version
      type: text
      tokenizer: raw
      fast: true
    - name: idc
      type: text
      tokenizer: raw
      fast: true
    - name: container_name
      type: text
      tokenizer: raw
      fast: true
    - name: pod
      type: text
      tokenizer: raw
      fast: true
    - name: log
      type: text
    - name: unix_ms
      type: i64
      fast: true

@boraarslan
Member

Interesting, @evanxg852000 can you take a look at this?

@evanxg852000
Contributor

evanxg852000 commented Oct 12, 2022

@guidao I will try to reproduce this. At first sight: the janitor runs GC, the retention policy, and delete tasks. Since you have a single index without a retention policy configured, I think the retention policy can be ruled out.

@guidao
Author

guidao commented Oct 12, 2022

Yes, and we don't have any delete tasks, so I suspect it's the GC task. We have a lot of splits to be deleted, which could also be the cause of the long transactions.

 metastore=> select split_state, count(1) from splits group by split_state;
    split_state    |  count
-------------------+---------
 MarkedForDeletion | 1731338
 Published         |   95184
 Staged            |     400

@evanxg852000
Contributor

evanxg852000 commented Oct 12, 2022

@guilload @fmassot I think using the system correctly (running all the necessary services) should not yield this huge number of deletable splits. However, we should guard against loading a huge number of SplitMetadata objects into memory by limiting how many we load at a time. Assuming a SplitMetadata is about 1 KB, that is a lot to hold in memory and potentially a lot of data for Postgres to transfer.
I am not entirely sure why Postgres holds locks for so long, but the place where we issue a huge write transaction against Postgres is at the end of the garbage collection, when deleting splits by their ids (String).
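
For illustration, a minimal sketch of how that final delete could run in fixed-size batches so that each transaction stays short. The split_id column name and the batch size of 1000 are assumptions, and in practice the ids would come from the GC pass once the split files have been removed from storage:

-- Sketch only: delete marked splits in small batches instead of one huge statement.
-- Repeat until no rows are deleted; each statement commits quickly and holds row locks briefly.
DELETE FROM splits
WHERE split_id IN (
    SELECT split_id
    FROM splits
    WHERE split_state = 'MarkedForDeletion'
    LIMIT 1000
);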

@fmassot
Contributor

fmassot commented Oct 12, 2022

It's not normal at all. An explanation for this high number of MarkedForDeletion splits could be a repeatedly failing garbage collection, or simply no janitor service running for a long time. @guidao, did you see any error logs on the janitor instance? If yes, can you share the errors?

@guidao
Author

guidao commented Oct 12, 2022

It's not normal at all. An explanation for this high number of MarkedForDeletion splits could be a repeatedly failing garbage collection, or simply no janitor service running for a long time. @guidao, did you see any error logs on the janitor instance? If yes, can you share the errors?

What I have observed is that the janitor works fine at first, but after a certain point it starts timing out on database operations and then never recovers. The error logs were lost for testing reasons, but I can re-run it tomorrow after clearing the data.

Our consumption has not been very stable, and I suspect that a large number of splits in the Staged state, or a large number of merges, may be causing this problem.

@guilload guilload assigned guilload and unassigned evanxg852000 Oct 12, 2022
@guilload
Member

A few things:

  • How many indexer nodes run in your cluster? You have set the parameter num_pipelines to 3, so I expect your cluster to stage and publish at least num_indexers * 3 splits per commit_timeout_secs (in your case 60s, the default value). With a merge_factor of 10, merges will stage and publish an additional ~10% of splits. (A rough estimate for this cluster is worked out at the end of this comment.)

  • The next time the janitor is stalled, can you run these queries, BEFORE AND AFTER you kill the janitor:
    Locks (by table and lock mode)

SELECT mode, pg_class.relname, count(*) FROM pg_locks JOIN pg_class ON (pg_locks.relation = pg_class.oid) WHERE pg_locks.mode IS NOT NULL AND pg_class.relname NOT LIKE 'pg_%%' GROUP BY pg_class.relname, mode;

Deadlocks (per database) (9.2+)

SELECT datname, deadlocks FROM pg_stat_database;

Dead rows

SELECT relname, n_dead_tup FROM pg_stat_user_tables;

In the meantime, I'm going to add some log statements and metrics in the GC and metastore.
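
For reference, a rough estimate based on the figures in this thread (50 indexers, num_pipelines: 3, commit_timeout_secs: 60): 50 * 3 = 150 splits staged and published per minute, plus roughly 10% more from merges, i.e. about 165 splits per minute, or on the order of 240,000 per day. A backlog of ~1.7M MarkedForDeletion splits therefore corresponds to roughly a week of output left uncollected.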

@guidao
Author

guidao commented Oct 12, 2022

@guilload We have 50 Indexers in our cluster. I will try it tomorrow.

@guidao
Author

guidao commented Oct 13, 2022

partition_key: app_name
max_num_partitions: 50

Does the number of splits in the "deleted" state have anything to do with this configuration?

@guidao
Author

guidao commented Oct 13, 2022

Janitor logs (postgres statement_timeout: 300000 ms):

2022-10-13T10:44:19.200Z ERROR quickwit_janitor::actors::garbage_collector: Failed to run garbage collection on index. index_id=logs error=Failed to delete splits from metastore: 'DbError { message: "error returned from database: canceling statement due to statement timeout" }'.
2022-10-13T10:44:19.449Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="DeleteTaskPlanner-empty-EE06"
2022-10-13T10:44:19.449Z ERROR quickwit_janitor::actors::delete_task_pipeline: Delete task pipeline failure. index_id="logs" healthy_actors=None failed_or_unhealthy_actors=Some(["DeleteTaskPlanner-empty-EE06"]) success_actors=Some(["MergeSplitDownloader-small-pq0R", "MergeExecutor-young-Fujx"])
2022-10-13T10:44:22.451Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="DeleteTaskPlanner-empty-EE06"
2022-10-13T10:44:22.451Z ERROR quickwit_janitor::actors::delete_task_pipeline: Delete task pipeline failure. index_id="logs" healthy_actors=None failed_or_unhealthy_actors=Some(["DeleteTaskPlanner-empty-EE06"]) success_actors=Some(["MergeSplitDownloader-small-pq0R", "MergeExecutor-young-Fujx"])
2022-10-13T10:44:25.453Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="DeleteTaskPlanner-empty-EE06"
2022-10-13T10:44:25.453Z ERROR quickwit_janitor::actors::delete_task_pipeline: Delete task pipeline failure. index_id="logs" healthy_actors=None failed_or_unhealthy_actors=Some(["DeleteTaskPlanner-empty-EE06"]) success_actors=Some(["MergeSplitDownloader-small-pq0R", "MergeExecutor-young-Fujx"])

metastore=> SELECT mode, pg_class.relname, count(*) FROM pg_locks JOIN pg_class ON (pg_locks.relation = pg_class.oid) WHERE pg_locks.mode IS NOT NULL AND pg_class.relname NOT LIKE 'pg_%%' GROUP BY pg_class.relname, mode;
       mode       |   relname    | count
------------------+--------------+-------
 RowExclusiveLock | splits       |   143
 RowShareLock     | indexes      |   137
 ExclusiveLock    | indexes      |     5
 RowExclusiveLock | indexes_pkey |   143
 AccessShareLock  | indexes_pkey |   137
 RowExclusiveLock | indexes      |   143
 RowExclusiveLock | splits_pkey  |   143

metastore=> SELECT datname, deadlocks FROM pg_stat_database;
  datname  | deadlocks
-----------+-----------
 postgres  |         0
 metastore |         0
 template1 |         0
 template0 |         0
(4 rows)

metastore=> SELECT relname, n_dead_tup FROM pg_stat_user_tables;
     relname      | n_dead_tup
------------------+------------
 spatial_ref_sys  |          0
 splits           |      33461
 delete_tasks     |          0
 indexes          |        120
 _sqlx_migrations |          5
(5 rows)

metastore=> select count(1), split_state from splits group by split_state;
 count  |    split_state
--------+-------------------
 283923 | MarkedForDeletion
  68598 | Published
 164837 | Staged
(3 rows)
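
If it helps to see which statement is hitting the 300 s statement_timeout, a standard Postgres query (nothing Quickwit-specific) to list the longest-running active statements:

SELECT pid, now() - query_start AS runtime, state, left(query, 100) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;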

@guidao
Author

guidao commented Oct 14, 2022

@fmassot @guilload I've posted some logs with information that I hope will help with this issue.

@ChillFish8
Contributor

Linking this with #2126 so it's easier to follow the paper trail.

@ChillFish8
Contributor

@guidao can you give a general overview of your Postgres server?

  • How many active connections to the database (server sessions) are there while indexing?
  • Is there any monitoring on the server running Postgres for things like iowait, disk speed, etc.?

@guidao
Author

guidao commented Oct 19, 2022

@guidao can you give a general overview of your Postgres server?

  • How many active connections to the database (server sessions) are there while indexing?
  • Is there any monitoring on the server running Postgres for things like iowait, disk speed, etc.?
  1. About 300-400 connections.
  2. We can't see machine- or disk-level monitoring for the database because it is hosted by the cloud service we are using.

@fulmicoton
Contributor

Can you detail where the 300 connections are coming from? I assume we have 50 indexers? The extra 250 are searchers?

@guidao
Author

guidao commented Oct 19, 2022

Can you detail where the 300 connections are coming from? I assume we have 50 indexers? The extra 250 are searchers?

const CONNECTION_POOL_MAX_SIZE: u32 = 10;

It should be related to this configuration.
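
That would be consistent with the numbers above: roughly 50 indexer pods with a pool of up to 10 connections each gives an upper bound of about 500 metastore connections, so 300-400 active connections is plausible from the indexers alone, before counting any searcher or janitor nodes.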

@fulmicoton
Contributor

Ah yes, we have a connection pool! Understood.

@guilload
Member

guilload commented Nov 26, 2022

We pushed a bunch of fixes that have increased the performance of the janitor. I'm going to close this issue. Feel free to reopen if necessary.
