
Misconfigured query_ingesters_within may cause queries to miss logs #6043

Open
honganan opened this issue Apr 28, 2022 · 4 comments

Comments

@honganan
Contributor

honganan commented Apr 28, 2022

Describe the bug
I have a log line with several labels, e.g. application=myapp, source=error.log, and it contains some keywords:

2022-04-28 06:11:31.239, TID:221a40a1baa748cbbeeea45b2af4cbb2.393.16511262905704693, ERROR, com.xxx.xxx.xxx.dao.BaseDAO, BaseDAO.java, 223, add id clash id=6131027,trynum=3,e=java.sql.SQLIntegrityConstraintViolationException,....

When I query with only the label matchers and no pipeline, the line is returned. But when I add a pipeline stage filtering on a keyword contained in the log, the result is empty.

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki (2.4.2)
  2. Configured the write component with:
     chunk_idle_period: 3h
     chunk_retain_period: 30s
     max_chunk_age: 2h
  3. Configured the querier and query-frontend components with (see the consolidated config sketch after this list):
     query_store_only: false
     query_ingesters_within: 15m
  4. Executed two queries:
     1st: {application="myapp",source="error.log"}
     2nd: {application="myapp",source="error.log"} |= "SQLIntegrityConstraintViolationException"
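
For reference, a minimal sketch of how the settings above would sit together in one Loki config file (assuming the documented 2.4.x ingester/querier section layout; only the fields mentioned in the steps are shown):

  # Sketch of the configuration described in the steps above
  ingester:
    chunk_idle_period: 3h
    chunk_retain_period: 30s
    max_chunk_age: 2h

  querier:
    query_store_only: false
    query_ingesters_within: 15m  # far shorter than max_chunk_age (2h)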

Expected behavior
Expected both queries to return the log line, but only the 1st did. After about 2 hours, both of them returned it.

Environment:

  • Infrastructure: VM
  • Deployment tool: Docker
  • Deployment mode: micro services
@honganan
Contributor Author

I guess it's because the stream produces logs slowly and the chunk had not been flushed yet when I queried, while query_ingesters_within configures queriers to ask ingesters only about the most recent 15 minutes. So logs between now-2h and now-15m cannot be queried at all. I want to confirm: am I right?

If that's the problem, should we add some checks on the config values and give a warning?
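
To make the suspected gap concrete, an illustrative timeline with the values above (my arithmetic, assuming the chunk has neither hit max_chunk_age nor been idle-flushed):

  now-2h  ... oldest data that may still exist only in an unflushed ingester chunk (max_chunk_age)
  now-15m ... oldest data the querier will ask ingesters about (query_ingesters_within)

  => log lines with timestamps in (now-2h, now-15m] can sit only in ingester
     memory, yet the querier never asks the ingesters about that window, so
     they stay invisible until the chunk is flushed to the store.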

@cyriltovena
Contributor

cc @sandeepsukhani

@sandeepsukhani
Contributor

> I guess it's because the stream produces logs slowly and the chunk had not been flushed yet when I queried, while query_ingesters_within configures queriers to ask ingesters only about the most recent 15 minutes. So logs between now-2h and now-15m cannot be queried at all. I want to confirm: am I right?
>
> If that's the problem, should we add some checks on the config values and give a warning?

Yes, you got it right. query_ingesters_within should be at least max_chunk_age, which defaults to 2h. We should add validation for this and for similar configs where Loki operators can make mistakes that could be avoided by checks.
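
Following that rule of thumb, a corrected sketch (mine, not from the thread) keeps the ingester lookback at or above the maximum chunk age:

  ingester:
    max_chunk_age: 2h
  querier:
    query_ingesters_within: 2h  # >= max_chunk_age; larger (e.g. 3h) gives headroom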

@duj4

duj4 commented Mar 6, 2024

Hi @sandeepsukhani,

It seems I have hit a similar issue. I am running Loki on K8s in SSD mode; max_chunk_age is at its default of 2h and query_ingesters_within is set to 3h. I ingested a log snippet that was generated on March 1st:

2024-03-01 02:52:27,703+0800 jira-stats-0 INFO [c.a.j.util.stats.JiraStats] [JIRA-STATS] [NODE-START] snapshot stats: duration=PT1H0.007S, statsOverhead=n/a, data={"_statsName":"NODE-START","_statsType":"snapshot","_time":"2024-02-29T18:52:27.702Z","_timestamp":1709232747702,"_duration":"PT1H0.007S","_invocations":0,"_statsOverhead":"n/a","configuration":{},"checkIndexOnStart":{"result":"NOT_RUN","timeInSeconds":0},"getIndexBy":{"rebuildLocalIndex":{"result":"NOT_RUN","timeInSeconds":0},"pickIndexSnapshotFromSharedHome":{"result":"NOT_RUN","timeInSeconds":0},"requestIndexSnapshotFromAnotherNode":{"result":"NOT_RUN","timeInSeconds":0},"performFullForegroundReindex":{"result":"NOT_RUN","timeInSeconds":0}},"ensureFreshIndexSnapshot":{"result":"NOT_RUN","snapshotExisted":false,"snapshotCreated":false,"timeInSeconds":0}}
2024-03-01 02:52:27,704+0800 jira-stats-0 INFO [c.a.j.util.stats.JiraStats] [JIRA-STATS] [NODE-START] total stats: duration=PT205H6M19.18S, statsOverhead=n/a, data={"_statsName":"NODE-START","_statsType":"total","_time":"2024-02-29T18:52:27.702Z","_timestamp":1709232747702,"_duration":"PT205H6M19.18S","_invocations":0,"_statsOverhead":"n/a","configuration":{},"checkIndexOnStart":{"result":"NOT_RUN","timeInSeconds":0},"getIndexBy":{"rebuildLocalIndex":{"result":"NOT_RUN","timeInSeconds":0},"pickIndexSnapshotFromSharedHome":{"result":"NOT_RUN","timeInSeconds":0},"requestIndexSnapshotFromAnotherNode":{"result":"NOT_RUN","timeInSeconds":0},"performFullForegroundReindex":{"result":"NOT_RUN","timeInSeconds":0}},"ensureFreshIndexSnapshot":{"result":"NOT_RUN","snapshotExisted":false,"snapshotCreated":false,"timeInSeconds":0}}
2024-03-01 02:52:29,578+0800 plugin-transaction-0 INFO [c.a.jira.plugin.PluginTransactionListener] [JIRA-STATS] [plugin-transaction] numberStartEvents:627, numberEndEvents:627, numberSendEvents:278, numberEventsInTransactions:12196, numberOfPluginEnableEvents:276
2024-03-01 02:52:43,445+0800 Caesium-1-2 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-01 02:52:43,446+0800 Caesium-1-2 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 0 ms
2024-03-01 02:52:43,446+0800 Caesium-1-2 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 1.343 ms

When I tried to query it from Grafana, it gave me an error:
[screenshot: Grafana query error]

But when I ingested a log snippet generated just now, everything looked normal:

2024-03-06 16:32:14,374+0800 Caesium-1-1 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-06 16:32:14,380+0800 Caesium-1-1 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 2 ms
2024-03-06 16:32:14,380+0800 Caesium-1-1 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 5.981 ms
2024-03-06 16:33:14,375+0800 Caesium-1-4 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Start to check for index consistency.
2024-03-06 16:33:14,379+0800 Caesium-1-4 INFO ServiceRunner [c.a.j.issue.index.DefaultIndexManager] Index consistency check finished. indexConsistent: true; timeToCheckIndexConsistency: 0 ms
2024-03-06 16:33:14,380+0800 Caesium-1-4 INFO ServiceRunner [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 3.970 ms

Per the doc:

query_ingesters_within: Maximum lookback beyond which queries are not sent to ingester

So it means that if the query time range (7 days) falls outside the scope of query_ingesters_within (3h), Loki will not search the ingesters at all, only the backend storage. However, since the data is still within the max_chunk_age (2h) / chunk_idle_period (30min) window, the logs in memory have not been flushed to storage yet, so I could find them in neither memory nor storage.

Once I set query_ingesters_within to 0, the older logs became searchable again.

Please correct me if there is any misunderstanding.

Thanks.
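
For reference, the workaround described above would look like this in the querier section (a sketch; a zero lookback makes queriers always consult ingesters, at the cost of sending every query to them):

  querier:
    query_ingesters_within: 0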
