
failed to store statistics: timeout 1.2.0 #8036

Closed
cbarzu opened this issue Feb 21, 2017 · 25 comments

@cbarzu

cbarzu commented Feb 21, 2017

Hi guys,
I saw there are some tickets with this error but without a solution. Since this still happens on InfluxDB 1.2.0, I'm creating another one.

System info:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
$:/var/lib/influxdb/data$ uname -a
Linux influx 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

$ influxd version
InfluxDB v1.2.0 (git: master b7bb7e8359642b6e071735b50ae41f5eb343fd42)

32GB ram
50GB ssd for /var/lib/influxdb
4 cores

Steps to reproduce:

  1. systemctl start influxd

  2. Wait some minutes

Actual behavior:
2017-02-21T16:37:50Z failed to store statistics: timeout service=monitor

Queries and writes work correctly, although some writes time out.

Thank you,
Claudiu

@cbarzu
Author

cbarzu commented Feb 21, 2017

If there is any information you need, feel free to ask.

@jwilder
Contributor

jwilder commented Feb 21, 2017

We need more information in order to diagnose what is going on. Can you update the issue description with the instructions listed in our issue template? Profile data when the timeouts occur would be useful.
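
Profile data of that kind can be pulled from a running 1.x instance over its HTTP API while the timeouts are happening. A rough sketch, assuming the default bind address localhost:8086 and pprof left enabled (the default):

$ curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1"   # all goroutine stacks
$ curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1"           # blocking profile
$ curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1"             # heap profile
$ curl -o vars.json "http://localhost:8086/debug/vars"                          # runtime statistics (expvar)
$ iostat -xd 1 30 > iostat.txt                                                  # disk utilization while the errors occur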

@cbarzu
Author

cbarzu commented Feb 22, 2017

OK, I have just installed InfluxDB on a new VM, and this is the info:

System info:

$ uname -a
Linux influx 3.10.0-327.10.1.el7.x86_64 #1 SMP Sat Jan 23 04:54:55 EST 2016 x86_64 x86_64 x86_64 GNU/Linu

$ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.2 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.2"
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Red Hat Enterprise Linux Server release 7.2 (Maipo)

$ influxd version
InfluxDB v1.2.0 (git: master b7bb7e8359642b6e071735b50ae41f5eb343fd42)

$ df -h  #this is a ssd disk
/dev/mapper/vg_influx-influx_data   49G   63M   46G   1% /var/lib/influxdb

Steps to reproduce:

  1. wget https://dl.influxdata.com/influxdb/releases/influxdb-1.2.0.x86_64.rpm
  2. sudo yum localinstall influxdb-1.2.0.x86_64.rpm
  3. sudo systemctl start influxdb

Expected behavior:
service=monitor shouldn't time out

Actual behavior:

 2017-02-21T19:58:48Z retention policy shard deletion check commencing service=retention
feb 21 21:03:10 influx[1078]: [I] 2017-02-21T20:03:10Z failed to store statistics: timeout service=monitor
feb 21 21:03:20 influx[1078]: [I] 2017-02-21T20:03:20Z failed to store statistics: timeout service=monitor

Additional info:
There is no custom configuration, just install and run. I'm not writing or reading anything from influxdb.
logs: https://gist.github.com/claubrz/02b973e8d4c6ab198d5689a09ff8943d
block: https://gist.github.com/claubrz/1411ca3c371f6e24cb2a64d6cf05a691
goroutine: https://gist.github.com/claubrz/a737d399596c161ef6356bbe188cc538
heap: https://gist.github.com/claubrz/338153a34a9288336c3c9484ce8607e4
vars: https://gist.github.com/claubrz/5a5f0420c3d9e4bdc442a1bb7fa1b283
iostat: https://gist.github.com/claubrz/9668562d7fdc1a277d159c4d9962599c
shards:
name: _internal
id database retention_policy shard_group start_time end_time expiry_time owners
1 _internal monitor 1 2017-02-21T00:00:00Z 2017-02-22T00:00:00Z 2017-03-01T00:00:00Z
2 _internal monitor 2 2017-02-22T00:00:00Z 2017-02-23T00:00:00Z 2017-03-02T00:00:00Z

stats: https://gist.github.com/claubrz/6cb969b66a32955614a68ab1091fce62
diagnostics: https://gist.github.com/claubrz/eded534953d006754dd989c4052a76c0

I hope this helps.

Regards,
Claudiu

@tnachen

tnachen commented Apr 19, 2017

+1 keep getting this as well

@derbeneviv

derbeneviv commented Jun 15, 2017

+1 same for 1.2.2

@andremiller

andremiller commented Jul 6, 2017

I'm also getting this error occasionally, and after a few days, no more data is written to InfluxDB. If I restart InfluxDB, it works again for a few days and then stops receiving data until I restart it again.
Edit: I'm using InfluxDB 1.3.0. I downgraded to 1.2.4 to see if it still happens; will report back in a few days.

@derbeneviv

+1 same for 1.2.4

@lnicola

lnicola commented Aug 21, 2017

I'm also seeing this since 1.3.1, I think:

Aug 21 18:07:00 ubik influxd[2669]: [I] 2017-08-21T15:07:00Z failed to store statistics: timeout service=monitor
Aug 21 18:07:10 ubik influxd[2669]: [I] 2017-08-21T15:07:10Z failed to store statistics: timeout service=monitor
Aug 21 18:07:20 ubik influxd[2669]: [I] 2017-08-21T15:07:20Z failed to store statistics: timeout service=monitor
Aug 21 18:07:30 ubik influxd[2669]: [I] 2017-08-21T15:07:30Z failed to store statistics: timeout service=monitor
Aug 21 18:07:40 ubik influxd[2669]: [I] 2017-08-21T15:07:40Z failed to store statistics: timeout service=monitor
Aug 21 18:07:50 ubik influxd[2669]: [I] 2017-08-21T15:07:50Z failed to store statistics: timeout service=monitor
Aug 21 18:08:00 ubik influxd[2669]: [I] 2017-08-21T15:08:00Z failed to store statistics: timeout service=monitor
Aug 21 18:08:10 ubik influxd[2669]: [I] 2017-08-21T15:08:10Z failed to store statistics: timeout service=monitor
Aug 21 18:08:20 ubik influxd[2669]: [I] 2017-08-21T15:08:20Z failed to store statistics: timeout service=monitor
Aug 21 18:08:30 ubik influxd[2669]: [I] 2017-08-21T15:08:30Z failed to store statistics: timeout service=monitor
Aug 21 18:08:40 ubik influxd[2669]: [I] 2017-08-21T15:08:40Z failed to store statistics: timeout service=monitor
Aug 21 18:08:50 ubik influxd[2669]: [I] 2017-08-21T15:08:50Z failed to store statistics: timeout service=monitor
Aug 21 18:09:00 ubik influxd[2669]: [I] 2017-08-21T15:09:00Z failed to store statistics: timeout service=monitor
Aug 21 18:09:10 ubik influxd[2669]: [I] 2017-08-21T15:09:10Z failed to store statistics: timeout service=monitor
Aug 21 18:09:20 ubik influxd[2669]: [I] 2017-08-21T15:09:20Z failed to store statistics: timeout service=monitor
Aug 21 18:09:30 ubik influxd[2669]: [I] 2017-08-21T15:09:30Z failed to store statistics: timeout service=monitor
Aug 21 18:09:50 ubik influxd[2669]: [I] 2017-08-21T15:09:50Z failed to store statistics: timeout service=monitor
Aug 21 18:10:00 ubik influxd[2669]: [I] 2017-08-21T15:10:00Z failed to store statistics: timeout service=monitor
Aug 21 18:10:10 ubik influxd[2669]: [I] 2017-08-21T15:10:10Z failed to store statistics: timeout service=monitor

Disabling monitor.store-enabled makes the issue go away.
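
For reference, that workaround is a one-line change in /etc/influxdb/influxdb.conf (a minimal sketch; note that it stops internal runtime stats from being written to the _internal database, so those measurements are no longer available for monitoring):

[monitor]
  # stop persisting runtime statistics to the _internal database every 10s
  store-enabled = false

Restart influxd afterwards for the change to take effect.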

@Thinkscape

Same here. All writes end up in a timeout after a few hours of operation. No workaround seems to help. Restarting helps but then it gets stuck again.

> use some_database
> insert some_metric,whatever=foo,uuid=d1bffb9339bf value=20.5 1503395025755619588
ERR: {"error":"timeout"}
  • A single host
  • Using tsm1
  • InfluxDB version: 1.3.0
  • Linux 4.12.4-1-ARCH
  • ~ 20 writes / minute, 1-3 connections
  • Memory usage seems really high for the use case, although it doesn't hit OOM.
  • CPU usage is flat
  • Example load average: 0.73, 1.45, 0.90
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.6G       1.0G       2.1M       1.6G       3.0G
-/+ buffers/cache:       2.0G       5.6G
Swap:         5.8G         0B       5.8G

SHOW STATS

> show diagnostics
name: build
Branch Build Time Commit                                   Version
------ ---------- ------                                   -------
master            76124df5c121e411e99807b9473a03eb785cd43b 1.3.0

name: config
bind-address   reporting-disabled
------------   ------------------
127.0.0.1:8088 false

name: config-coordinator
log-queries-after max-concurrent-queries max-select-buckets max-select-point max-select-series query-timeout write-timeout
----------------- ---------------------- ------------------ ---------------- ----------------- ------------- -------------
0s                0                      0                  0                0                 0s            10s

name: config-cqs
enabled run-interval
------- ------------
true    1s

name: config-data
cache-max-memory-size cache-snapshot-memory-size cache-snapshot-write-cold-duration compact-full-write-cold-duration dir                    max-concurrent-compactions max-series-per-database max-values-per-tag wal-dir               wal-fsync-delay
--------------------- -------------------------- ---------------------------------- -------------------------------- ---                    -------------------------- ----------------------- ------------------ -------               ---------------
1073741824            26214400                   10m0s                              4h0m0s                           /var/lib/influxdb/data 0                          1000000                 100000             /var/lib/influxdb/wal 0s

name: config-graphite
enabled bind-address protocol database retention-policy batch-size batch-pending batch-timeout
------- ------------ -------- -------- ---------------- ---------- ------------- -------------
true    :2003        tcp      graphite                  5000       10            1s

name: config-httpd
bind-address enabled https-enabled max-connection-limit max-row-limit
------------ ------- ------------- -------------------- -------------
:8086        true    false         0                    0

name: config-meta
dir
---
/var/lib/influxdb/meta

name: config-monitor
store-database store-enabled store-interval
-------------- ------------- --------------
_internal      true          10s

name: config-precreator
advance-period check-interval enabled
-------------- -------------- -------
30m0s          10m0s          true

name: config-retention
check-interval enabled
-------------- -------
30m0s          true

name: config-subscriber
enabled http-timeout write-buffer-size write-concurrency
------- ------------ ----------------- -----------------
true    30s          1000              40

name: graphite:tcp::2003
local remote connect time
----- ------ ------------

name: network
hostname
--------
87e69fc37624

name: runtime
GOARCH GOMAXPROCS GOOS  version
------ ---------- ----  -------
amd64  4          linux go1.8.3

name: system
PID currentTime                    started                        uptime
--- -----------                    -------                        ------
1   2017-08-22T21:51:21.200698369Z 2017-08-22T11:46:38.093657177Z 10h4m43.107041749s
# cat /etc/influxdb/influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"

# env | grep INFLUX
INFLUXDB_GRAPHITE_ENABLED=true
INFLUXDB_ADMIN_ENABLED=true
INFLUXDB_VERSION=1.3.0

@hpbieker
Contributor

I get this for Influx 1.3.7 as well.

November 15th 2017, 22:32:18.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 22:06:12.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 21:44:15.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 21:24:17.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 20:56:17.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 18:35:17.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log
November 15th 2017, 17:23:13.000	ERROR	 - 	failed to store statistics: timeout service=monitor	 - 	influx-log

@krambox

krambox commented Dec 6, 2017

I also get this sometimes (every 2-3 days) at midnight:

[I] 2017-12-06T00:00:10Z failed to store statistics: timeout service=monitor 
[httpd] 172.17.0.1 - root [06/Dec/2017:00:00:57 +0000] "POST /write?db=smarthome&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" 893c8ee6-da18-11e7-b64b-000000000000 10014690
[E] 2017-12-06T00:01:07Z [500] - "timeout" service=httpd | stderr
...

@ghost

ghost commented Apr 30, 2018

I am also getting this, but very infrequently (and without a pattern).
The last logs, from April 26th, were with InfluxDB version 1.5.2-1:

Mar 31 10:51:33 hostname influxd[1271]: ts=2018-03-31T09:51:33.073975Z lvl=info msg="failed to store statistics" log_id=077M7CBl000 service=monitor error=timeout
Mar 31 20:51:40 hostname influxd[1271]: ts=2018-03-31T19:51:40.213325Z lvl=info msg="failed to store statistics" log_id=077M7CBl000 service=monitor error=timeout
Apr 07 07:02:40 hostname influxd[3197]: ts=2018-04-07T06:02:40.101796Z lvl=info msg="failed to store statistics" log_id=07Egike0000 service=monitor error=timeout
Apr 26 14:42:10 hostname influxd[1258]: ts=2018-04-26T13:42:10.467667Z lvl=info msg="failed to store statistics" log_id=07hJ4RHG000 service=monitor error=timeout
Apr 26 17:30:10 hostname influxd[1258]: ts=2018-04-26T16:30:10.575370Z lvl=info msg="failed to store statistics" log_id=07hJ4RHG000 service=monitor error=timeout

@cheesedosa

+1.
I happened to get this on version 1.6.0.
One thing I noticed was that I was running another process that involved heavy disk writes (Postgres) when InfluxDB started throwing these errors. Not sure if that has any relation to this, but I'm throwing it out there in case someone else sees a similar correlation.

@lobocobra

lobocobra commented Dec 26, 2018

I have the same problem with Influx. I use a Raspberry Pi 3 with the latest InfluxDB, v1.7.2.
=> The problem started after the update to the latest version in December; the previous update I did around July (maybe a bit before).
=> I also had a power outage in November, but the system worked fine after it.

When the issues started, I saw that some of the files under /var/lib/influxdb/ suddenly belonged to root, so influx could not write to them. As a consequence, influx started to make the system unresponsive and used >90% of CPU (usually it is at 0.3%).
=> I did a chown of the /var/lib/influxdb dir, but had an unresponsive system again within 1h.
=> I have now rebooted the system and hope that the error is gone (hope dies last).

This is the last message before influx crashed:

Dez 26 07:00:22 OpenHabCat influxd[606]: ts=2018-12-26T06:00:20.087443Z lvl=error msg="[500] - \"timeout\"" log_id=0Caz0HYW000 service=httpd
Dez 26 07:00:32 OpenHabCat influxd[606]: ts=2018-12-26T06:00:30.151848Z lvl=info msg="failed to store statistics" log_id=0Caz0HYW000 service=monitor error=timeout

@rdslw

rdslw commented Mar 12, 2019

@cbarzu what is your settings for wal-fsync-delay in influxdb.conf ?

@dracula92107

dracula92107 commented Apr 4, 2019

+1
I have the same issue on version 1.7.5 (docker latest version)

2019-04-04T03:24:00.958588Z error [500] - "timeout" {"log_id": "0Ea_2oP0000", "service": "httpd"}
[httpd] 172.17.0.1 - admin [04/Apr/2019:03:24:00 +0000] "POST /write?db=waf_log&rp=autogen&precision=n&consistency=one HTTP/1.1" 500 20 "-" "okhttp/3.11.0" 171a2d77-5689-11e9-83dd-0242ac110002 10000297
2019-04-04T03:24:10.961376Z error [500] - "timeout" {"log_id": "0Ea_2oP0000", "service": "httpd"}
[httpd] 172.17.0.1 - admin [04/Apr/2019:03:24:10 +0000] "POST /write?db=waf_log&rp=autogen&precision=n&consistency=one HTTP/1.1" 500 20 "-" "okhttp/3.11.0" 1d10694b-5689-11e9-83de-0242ac110002 10000663

@conet

conet commented Apr 9, 2019

@dracula92107 1.7.5 is broken, either use 1.7.4 or wait for 1.7.6, see #13010.

@dracula92107

Thanks @conet,
Let me try it.

@timhallinflux
Contributor

The fix for #13010 is in the 1.7 branch if you are building from source, and our plan is to have 1.7.6 tagged and built next week.

It is also useful to review the best practices related to monitoring Influx itself: http://docs.influxdata.com/platform/monitoring/influxdata-platform

It was noted earlier in the thread that turning off monitor.store-enabled in the config addressed some of the pre-1.7 cases where timeout errors were being thrown. Turning this off removes some resource contention, but it also removes the ability to gather stats within the database itself. Still, if you are working in a constrained environment to begin with, turning this off will help.

@stale

stale bot commented Jul 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 23, 2019
@stale

stale bot commented Jul 30, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

@stale stale bot closed this as completed Jul 30, 2019
@KlDec

KlDec commented Feb 8, 2021

I'm running v1.8.3 on a virtual server and ran into the error message. I'm using InfluxDB to store temperature data (one set per 5 min, so a small database with low load) and visualize the data via Grafana, installed on the same server. The server runs Ubuntu 18.04. influxd is running, and telegraf is running to ship influxd data to a separate server that is used for monitoring the main server.

During Grafana queries, the "failed to store statistics" messages appeared repeatedly; in the end, influxd was killed by the kernel. influxd started again and went down a second time during startup. After another restart, influxd survived and is still running and serving queries as I write this.

During the problem yesterday, Grafana couldn't get the data for 30+ minutes.

Complete logfiles are available if needed. It seems that the fixes done so far didn't completely solve the problem.

Any advice/fix to avoid such an incident in the future is highly appreciated! Thanks in advance.

Logfile excerpts:

Main server: influxd.log:
...
Feb 07 09:59:35 hostname influxd[745643]: ts=2021-02-07T08:59:35.792809Z lvl=info msg="Executing query" log_id=0SA3OkpG000 service=query query="SELECT mean(ESE_KI_EMP_AMT1_tempore_gradC) FROM bmonlive.bmonrp.ESEKI_Controller_v WHERE time >= 438479h AND time <= 1614553199s GROUP BY time(10m)"
Feb 07 10:00:28 hostname influxd[745643]: ts=2021-02-07T09:00:28.332067Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 07 10:01:04 hostname influxd[745643]: ts=2021-02-07T09:01:03.438886Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 07 10:01:19 hostname influxd[745643]: ts=2021-02-07T09:01:18.847438Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 07 10:01:58 hostname influxd[745643]: ts=2021-02-07T09:01:57.551540Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 07 10:02:43 hostname influxd[745643]: ts=2021-02-07T09:02:43.271768Z lvl=info msg="Executing query" log_id=0SA3OkpG000 service=query query="SELECT mean(ESE_KI_EMP_AMT1_tempore_gradC) FROM bmonlive.bmonrp.ESEKI_Controller_v WHERE time >= 438479h AND time <= 1614553199s GROUP BY time(1d)"
...

Main server: syslog:
...
Feb 7 09:59:35 hostname influxd[745643]: ts=2021-02-07T08:59:35.792809Z lvl=info msg="Executing query" log_id=0SA3OkpG000 service=query query="SELECT mean(ESE_KI_EMP_AMT1_tempore_gradC) FROM bmonlive.bmonrp.ESEKI_Controller_v WHERE time >= 438479h AND time <= 1614553199s GROUP BY time(10m)"
Feb 7 10:00:28 hostname influxd[745643]: ts=2021-02-07T09:00:28.332067Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 7 10:00:59 hostname telegraf[522843]: 2021-02-07T09:00:59Z E! [outputs.influxdb] When writing to [https://XXX:8086]: Post "https://XXX:8086/write?db=telegraf": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Feb 7 10:00:59 hostname telegraf[522843]: 2021-02-07T09:00:59Z E! [agent] Error writing to outputs.influxdb: could not write any address
Feb 7 10:01:04 hostname influxd[745643]: ts=2021-02-07T09:01:03.438886Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 7 10:01:19 hostname influxd[745643]: ts=2021-02-07T09:01:18.847438Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 7 10:01:58 hostname influxd[745643]: ts=2021-02-07T09:01:57.551540Z lvl=info msg="failed to store statistics" log_id=0SA3OkpG000 service=monitor error=timeout
Feb 7 10:02:43 hostname influxd[745643]: ts=2021-02-07T09:02:43.271768Z lvl=info msg="Executing query" log_id=0SA3OkpG000 service=query query="SELECT mean(ESE_KI_EMP_AMT1_tempore_gradC) FROM bmonlive.bmonrp.ESEKI_Controller_v WHERE time >= 438479h AND time <= 1614553199s GROUP BY time(1d)"
Feb 7 10:02:46 hostname influxd[745643]: [httpd] XXX, XXX,127.0.0.1 - bmon [07/Feb/2021:09:59:22 +0100] "POST /query?db=bmonlive&epoch=ms HTTP/1.1" 200 3295 "-" "Grafana/7.3.2" c54d2383-6922-11eb-8050-9600007b6ed0 203525793
Feb 7 10:02:46 hostname influxd[745643]: ts=2021-02-07T09:02:46.687033Z lvl=info msg="Executing query" log_id=0SA3OkpG000 service=query query="SELECT mean(ESE_KI_ASB_AMF1_fa_proz) FROM bmonlive.bmonrp.ESEKI_Controller_v WHERE time >= 438479h AND time <= 1614553199s GROUP BY time(1h)"
Feb 7 10:03:03 hostname telegraf[522843]: 2021-02-07T09:03:03Z E! [outputs.influxdb] When writing to [https://XXX:8086]: Post "https://XXX:8086/write?db=telegraf": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
...
Feb 7 10:18:51 hostname kernel: [6976969.348619] influxd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Feb 7 10:18:51 hostname kernel: [6976969.348626] CPU: 0 PID: 745648 Comm: influxd Not tainted 5.4.0-52-generic #57-Ubuntu
Feb 7 10:18:51 hostname kernel: [6976969.348626] Hardware name: Hetzner vServer, BIOS 20171111 11/11/2017
Feb 7 10:18:51 hostname kernel: [6976969.348627] Call Trace:
Feb 7 10:18:51 hostname kernel: [6976969.348653] dump_stack+0x6d/0x9a
Feb 7 10:18:51 hostname kernel: [6976969.348656] dump_header+0x4f/0x1eb
Feb 7 10:18:51 hostname kernel: [6976969.348657] oom_kill_process.cold+0xb/0x10
Feb 7 10:18:51 hostname kernel: [6976969.348658] out_of_memory.part.0+0x1df/0x3d0
Feb 7 10:18:51 hostname kernel: [6976969.348659] out_of_memory+0x6d/0xd0
Feb 7 10:18:51 hostname kernel: [6976969.348663] __alloc_pages_slowpath+0xd5e/0xe50
Feb 7 10:18:51 hostname kernel: [6976969.348665] __alloc_pages_nodemask+0x2d0/0x320
Feb 7 10:18:51 hostname kernel: [6976969.348667] alloc_pages_current+0x87/0xe0
Feb 7 10:18:51 hostname kernel: [6976969.348670] __page_cache_alloc+0x72/0x90
Feb 7 10:18:51 hostname kernel: [6976969.348672] pagecache_get_page+0xbf/0x300
Feb 7 10:18:51 hostname kernel: [6976969.348673] filemap_fault+0x6b2/0xa50
Feb 7 10:18:51 hostname kernel: [6976969.348676] ? unlock_page_memcg+0x12/0x20
Feb 7 10:18:51 hostname kernel: [6976969.348678] ? page_add_file_rmap+0xff/0x1a0
Feb 7 10:18:51 hostname kernel: [6976969.348680] ? xas_load+0xd/0x80
Feb 7 10:18:51 hostname kernel: [6976969.348681] ? xas_find+0x17f/0x1c0
Feb 7 10:18:51 hostname kernel: [6976969.348682] ? filemap_map_pages+0x24c/0x380
Feb 7 10:18:51 hostname kernel: [6976969.348684] ext4_filemap_fault+0x32/0x46
Feb 7 10:18:51 hostname kernel: [6976969.348686] __do_fault+0x3c/0x130
Feb 7 10:18:51 hostname kernel: [6976969.348687] do_fault+0x24b/0x640
Feb 7 10:18:51 hostname kernel: [6976969.348690] ? __switch_to_asm+0x40/0x70
Feb 7 10:18:51 hostname kernel: [6976969.348691] __handle_mm_fault+0x4c5/0x7a0
Feb 7 10:18:51 hostname kernel: [6976969.348692] handle_mm_fault+0xca/0x200
Feb 7 10:18:51 hostname kernel: [6976969.348696] do_user_addr_fault+0x1f9/0x450
Feb 7 10:18:51 hostname kernel: [6976969.348698] __do_page_fault+0x58/0x90
Feb 7 10:18:51 hostname kernel: [6976969.348701] ? exit_to_usermode_loop+0x8f/0x160
Feb 7 10:18:51 hostname kernel: [6976969.348703] do_page_fault+0x2c/0xe0
Feb 7 10:18:51 hostname kernel: [6976969.348705] do_async_page_fault+0x39/0x70
Feb 7 10:18:51 hostname kernel: [6976969.348707] async_page_fault+0x34/0x40
Feb 7 10:18:51 hostname kernel: [6976969.348714] RIP: 0033:0x12bcce8
Feb 7 10:18:51 hostname kernel: [6976969.348717] Code: cc cc 64 48 8b 0c 25 f8 ff ff ff 48 3b 61 10 76 69 48 83 ec 40 48 89 6c 24 38 48 8d 6c 24 38 48 8b 42 08 48 8b 08 48 8b 40 08 <48> 8b 49 28 48 89 04 24 48 8b 44 24 48 48 89 44 24 08 48 8b 44 24
Feb 7 10:18:51 hostname kernel: [6976969.348718] RSP: 002b:000000c006013b38 EFLAGS: 00010202
Feb 7 10:18:51 hostname kernel: [6976969.348719] RAX: 000000c00f6f7790 RBX: 000000c05f3f2fd8 RCX: 000000000258cd60
Feb 7 10:18:51 hostname kernel: [6976969.348720] RDX: 000000c007b50490 RSI: 0000000000000001 RDI: 000000c00a361ce0
Feb 7 10:18:51 hostname kernel: [6976969.348720] RBP: 000000c006013b70 R08: 0000000000000001 R09: 0000000000000001
Feb 7 10:18:51 hostname kernel: [6976969.348721] R10: 0000000000000001 R11: 000000c00813a6c0 R12: 0000000000000002
Feb 7 10:18:51 hostname kernel: [6976969.348722] R13: 00000000000001fc R14: 00000000000001fb R15: 0000000000000400
Feb 7 10:18:51 hostname kernel: [6976969.348723] Mem-Info:
Feb 7 10:18:51 hostname kernel: [6976969.348731] active_anon:459581 inactive_anon:83 isolated_anon:0
Feb 7 10:18:51 hostname kernel: [6976969.348731] active_file:145 inactive_file:261 isolated_file:0
Feb 7 10:18:51 hostname kernel: [6976969.348731] unevictable:0 dirty:0 writeback:0 unstable:0
Feb 7 10:18:51 hostname kernel: [6976969.348731] slab_reclaimable:6247 slab_unreclaimable:10431
Feb 7 10:18:51 hostname kernel: [6976969.348731] mapped:292 shmem:190 pagetables:1950 bounce:0
Feb 7 10:18:51 hostname kernel: [6976969.348731] free:13131 free_pcp:137 free_cma:0
Feb 7 10:18:51 hostname kernel: [6976969.348733] Node 0 active_anon:1838324kB inactive_anon:332kB active_file:580kB inactive_file:1044kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1168kB dirty:0kB writeback:0kB shmem:760kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 18432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Feb 7 10:18:51 hostname kernel: [6976969.348740] Node 0 DMA free:7916kB min:364kB low:452kB high:540kB active_anon:7760kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:12kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb 7 10:18:51 hostname kernel: [6976969.348743] lowmem_reserve[]: 0 1890 1890 1890 1890
Feb 7 10:18:51 hostname kernel: [6976969.348744] Node 0 DMA32 free:44608kB min:44688kB low:55860kB high:67032kB active_anon:1830564kB inactive_anon:332kB active_file:580kB inactive_file:1044kB unevictable:0kB writepending:0kB present:2031472kB managed:1970364kB mlocked:0kB kernel_stack:2176kB pagetables:7788kB bounce:0kB free_pcp:548kB local_pcp:548kB free_cma:0kB
Feb 7 10:18:51 hostname kernel: [6976969.348747] lowmem_reserve[]: 0 0 0 0 0
Feb 7 10:18:51 hostname kernel: [6976969.348748] Node 0 DMA: 3*4kB (E) 8*8kB (UE) 6*16kB (UME) 60*32kB (UME) 43*64kB (UME) 4*128kB (E) 4*256kB (UE) 3*512kB (UME) 0*1024kB 0*2048kB 0*4096kB = 7916kB
Feb 7 10:18:51 hostname kernel: [6976969.348755] Node 0 DMA32: 708*4kB (UMEH) 634*8kB (UEH) 296*16kB (UEH) 219*32kB (UME) 126*64kB (UME) 56*128kB (UME) 18*256kB (UE) 6*512kB (UM) 2*1024kB (UM) 0*2048kB 0*4096kB = 44608kB
Feb 7 10:18:51 hostname kernel: [6976969.348763] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 7 10:18:51 hostname kernel: [6976969.348764] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 7 10:18:51 hostname kernel: [6976969.348764] 607 total pagecache pages
Feb 7 10:18:51 hostname kernel: [6976969.348771] 0 pages in swap cache
Feb 7 10:18:51 hostname kernel: [6976969.348772] Swap cache stats: add 0, delete 0, find 0/0
Feb 7 10:18:51 hostname kernel: [6976969.348772] Free swap = 0kB
Feb 7 10:18:51 hostname kernel: [6976969.348773] Total swap = 0kB
Feb 7 10:18:51 hostname kernel: [6976969.348773] 511866 pages RAM
Feb 7 10:18:51 hostname kernel: [6976969.348774] 0 pages HighMem/MovableOnly
Feb 7 10:18:51 hostname kernel: [6976969.348774] 15298 pages reserved
Feb 7 10:18:51 hostname kernel: [6976969.348774] 0 pages cma reserved
Feb 7 10:18:51 hostname kernel: [6976969.348775] 0 pages hwpoisoned
Feb 7 10:18:51 hostname kernel: [6976969.348775] Tasks state (memory values in pages):
Feb 7 10:18:51 hostname kernel: [6976969.348776] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 7 10:18:51 hostname kernel: [6976969.348779] [ 255] 0 255 29416 523 225280 0 -250 systemd-journal
Feb 7 10:18:51 hostname kernel: [6976969.348781] [ 279] 0 279 622 26 45056 0 0 none
Feb 7 10:18:51 hostname kernel: [6976969.348782] [ 346] 0 346 4665 278 57344 0 -1000 systemd-udevd
Feb 7 10:18:51 hostname kernel: [6976969.348784] [ 392] 100 392 6718 265 73728 0 0 systemd-network
Feb 7 10:18:51 hostname kernel: [6976969.348785] [ 437] 101 437 6080 1105 81920 0 0 systemd-resolve
Feb 7 10:18:51 hostname kernel: [6976969.348786] [ 440] 102 440 22599 235 77824 0 0 systemd-timesyn
Feb 7 10:18:51 hostname kernel: [6976969.348788] [ 491] 0 491 59073 449 90112 0 0 accounts-daemon
Feb 7 10:18:51 hostname kernel: [6976969.348789] [ 496] 0 496 1703 67 53248 0 0 cron
Feb 7 10:18:51 hostname kernel: [6976969.348790] [ 497] 103 497 1891 210 49152 0 -900 dbus-daemon
Feb 7 10:18:51 hostname kernel: [6976969.348792] [ 514] 0 514 7252 1915 90112 0 0 networkd-dispat
Feb 7 10:18:51 hostname kernel: [6976969.348792] [ 515] 0 515 1610 55 53248 0 0 qemu-ga
Feb 7 10:18:51 hostname kernel: [6976969.348794] [ 516] 104 516 56083 501 81920 0 0 rsyslogd
Feb 7 10:18:51 hostname kernel: [6976969.348795] [ 518] 0 518 4255 274 73728 0 0 systemd-logind
Feb 7 10:18:51 hostname kernel: [6976969.348796] [ 521] 0 521 948 46 49152 0 0 atd
Feb 7 10:18:51 hostname kernel: [6976969.348797] [ 540] 0 540 24974 334 81920 0 0 dhclient
Feb 7 10:18:51 hostname kernel: [6976969.348799] [ 553] 0 553 1400 29 53248 0 0 agetty
Feb 7 10:18:51 hostname kernel: [6976969.348800] [ 565] 0 565 1457 28 49152 0 0 agetty
Feb 7 10:18:51 hostname kernel: [6976969.348801] [ 572] 0 572 26966 1902 110592 0 0 unattended-upgr
Feb 7 10:18:51 hostname kernel: [6976969.348803] [ 578] 0 578 59126 413 94208 0 0 polkitd
Feb 7 10:18:51 hostname kernel: [6976969.348804] [ 370731] 0 370731 3042 231 65536 0 -1000 sshd
Feb 7 10:18:51 hostname kernel: [6976969.348805] [ 373565] 0 373565 100156 4352 139264 0 0 f2b/server
Feb 7 10:18:51 hostname kernel: [6976969.348807] [ 423150] 0 423150 17463 1351 151552 0 0 td-agent-bit
Feb 7 10:18:51 hostname kernel: [6976969.348808] [ 522843] 997 522843 1391196 4064 372736 0 0 telegraf
Feb 7 10:18:51 hostname kernel: [6976969.348809] [ 727705] 110 727705 341630 8140 360448 0 0 grafana-server
Feb 7 10:18:51 hostname kernel: [6976969.348814] [ 727728] 110 727728 465797 7231 1654784 0 0 plugin_start_li
Feb 7 10:18:51 hostname kernel: [6976969.348816] [ 745643] 998 745643 764051 422758 5009408 0 0 influxd
Feb 7 10:18:51 hostname kernel: [6976969.348817] [ 749131] 0 749131 3443 363 65536 0 0 sshd
Feb 7 10:18:51 hostname kernel: [6976969.348821] [ 749139] 1003 749139 4632 370 77824 0 0 systemd
Feb 7 10:18:51 hostname kernel: [6976969.348822] [ 749143] 1003 749143 42924 1130 90112 0 0 (sd-pam)
Feb 7 10:18:51 hostname kernel: [6976969.348823] [ 749166] 1003 749166 3512 473 65536 0 0 sshd
Feb 7 10:18:51 hostname kernel: [6976969.348825] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/influxdb.service,task=influxd,pid=745643,uid=998
Feb 7 10:18:51 hostname kernel: [6976969.348860] Out of memory: Killed process 745643 (influxd) total-vm:3056204kB, anon-rss:1691032kB, file-rss:0kB, shmem-rss:0kB, UID:998 pgtables:4892kB oom_score_adj:0
Feb 7 10:18:51 hostname telegraf[522843]: 2021-02-07T09:18:51Z W! [inputs.netstat] Collection took longer than expected; not complete after interval of 10s
Feb 7 10:18:51 hostname kernel: [6976969.565451] oom_reaper: reaped process 745643 (influxd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 7 10:18:51 hostname systemd[1]: influxdb.service: Main process exited, code=killed, status=9/KILL
Feb 7 10:18:51 hostname systemd[1]: influxdb.service: Failed with result 'signal'.
...

@jbubik

jbubik commented Jul 7, 2021

I have version 1.8.6 running on RPi4. Got the same timeouts:
Jul 7 12:37:40 srv-rpi4 influxd[1607]: ts=2021-07-07T10:37:40.085034Z lvl=info msg="failed to store statistics" log_id=0VBNlMQ0000 service=monitor error=timeout

It helped me to reconfigure the default values in /etc/influxdb/influxdb.conf:

[data]
  wal-fsync-delay = "10s"
[coordinator]
  write-timeout = "10s"

The timeouts disappear when write-timeout is longer than wal-fsync-delay, e.g.:

[data]
  wal-fsync-delay = "10s"
[coordinator]
  write-timeout = "30s"

Hope this helps someone in the future.
It would be beneficial to change the default values in influxdb.conf to something more reasonable.
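
To check which values a running server is actually using, the show diagnostics output quoted earlier in this thread includes both settings; a quick sketch, assuming the stock influx CLI on the same host:

$ influx -execute 'show diagnostics' | grep -A 3 -E 'config-(data|coordinator)'
# wal-fsync-delay appears under config-data, write-timeout under config-coordinator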

@BenAhrdt

BenAhrdt commented Jul 7, 2023

I only have this mounted to the Docker container:
[image]

@BenAhrdt

I have this config:

[image]

The error is still there. The last time was 2 weeks ago, but today the error came back.
