does not start up after corrupted meta.json file #4058

Closed
haraldschilly opened this issue Apr 7, 2018 · 37 comments · Fixed by prometheus-junkyard/tsdb#573 · May be fixed by prometheus-junkyard/tsdb#320

@haraldschilly

haraldschilly commented Apr 7, 2018

This ticket is a follow-up of #2805 (there are similar comments at the bottom, added after it was closed).

What did you do?

Ran Prometheus in a Kubernetes cluster, on a GCE PD disk.

What did you see instead? Under which circumstances?

It crashed on startup; log file:

level=error ts=2018-04-07T04:28:53.784390578Z caller=main.go:582
err="Opening storage failed unexpected end of JSON input"
level=info ts=2018-04-07T04:28:53.784418708Z caller=main.go:584
msg="See you next time!"

The point here is that the meta.json file has a size of zero:

> ls -l /data/*/meta.json
[...]
-rw-rw-r--    1 1000     2000           283 Apr  7 03:05 01CAF10VV5FNDJ6PG84E6RSEV3/meta.json
-rw-rw-r--    1 1000     2000             0 Apr  7 03:15 01CAF1K5SQZT4HBQE9P6W7J56E/meta.json
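
For context, that "unexpected end of JSON input" error is what Go's encoding/json returns when asked to decode empty input, which is exactly what a zero-byte meta.json yields. A minimal, standalone illustration (standard library behaviour only, not Prometheus code):

package main

import (
        "encoding/json"
        "fmt"
)

func main() {
        // Decoding the contents of an empty file is the same as decoding an
        // empty byte slice: encoding/json reports "unexpected end of JSON input".
        var meta struct {
                Version int `json:"version"`
        }
        err := json.Unmarshal([]byte{}, &meta)
        fmt.Println(err) // prints: unexpected end of JSON input
}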

Manual resolution

I've deleted that directory 01CAF1K5SQZT4HBQE9P6W7J56E with the problematic meta.json file in it, and now it starts up fine again.

Environment

  • System information:

    Linux 4.10.0-40-generic x86_64

  • Prometheus version:

prometheus, version 2.2.1 (branch: HEAD, revision: bc6058c81272a8d938c05e75607371284236aadc)
  build user:       root@149e5b3f0829
  build date:       20180314-14:15:45
  go version:       go1.10

("official" docker build)

  • Logs:
level=error ts=2018-04-07T08:51:28.789897822Z caller=main.go:582 err="Opening storage failed unexpected end of JSON input"

Expected behavior

What I would wish for is that Prometheus starts up and doesn't CrashLoop. It should either:

  • ignore that directory, saying in the log that meta.json is faulty
  • maybe move it to [directoryname].broken/? (see the sketch after this list)
  • reconstruct the meta.json file from the data
  • delete the problematic directory (a bit harsh; ignoring might be better)
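
A minimal sketch of the "move it aside" idea, assuming the block layout shown above (one ULID directory per block, each containing a meta.json); the repairDataDir helper and the .broken suffix are purely illustrative, not existing Prometheus code:

package main

import (
        "fmt"
        "os"
        "path/filepath"
)

// repairDataDir is a hypothetical startup check: every block directory whose
// meta.json exists but is zero bytes gets renamed to <dir>.broken, so startup
// could log the problem and continue instead of crash-looping.
func repairDataDir(dataDir string) error {
        blockDirs, err := filepath.Glob(filepath.Join(dataDir, "*"))
        if err != nil {
                return err
        }
        for _, dir := range blockDirs {
                fi, err := os.Stat(filepath.Join(dir, "meta.json"))
                if err != nil || fi.Size() > 0 {
                        continue // no meta.json at all, or a healthy one
                }
                fmt.Printf("block %s has an empty meta.json, moving it aside\n", dir)
                if err := os.Rename(dir, dir+".broken"); err != nil {
                        return err
                }
        }
        return nil
}

func main() {
        if err := repairDataDir("/data"); err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
        }
}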
@krasi-georgiev
Contributor

krasi-georgiev commented Apr 7, 2018

I opened a PR that should address this issue; @fabxc and @gouthamve will advise soon.
prometheus-junkyard/tsdb#283

@TimSimmons

I ran into this today on a single Prometheus server, with the exact same behavior as above.

$ ls -l 01CB7X7ZX2E1C6R1SQHVTVVX9J
total 180008
drwxr-xr-x 2 prometheus prometheus      4096 Apr 16 19:00 chunks
-rw-r--r-- 1 prometheus prometheus 184323907 Apr 16 19:00 index
-rw-r--r-- 1 prometheus prometheus         0 Apr 16 19:00 meta.json
-rw-r--r-- 1 prometheus prometheus         0 Apr 16 19:00 tombstones

I moved the dir out of the data directory and Prometheus restarted happily.

@bmihaescu

I also ran into this problem today.
Finding and deleting the folder with the empty meta.json solved the issue.

root@ip-xxx-xxx-xxx-xxx:/var/lib/kubelet/pods/ce0fba7f-c649-11e8-128370c697fc/volumes/kubernetes.io~aws-ebs/pvc-a7cf0ff9-11e8-acf2-020ae50968fe/prometheus-db# find ./ -name meta.json -size 0
./01CRRG0QYZS8GJQGS96QZYR5TJ/meta.json

@krasi-georgiev
Contributor

krasi-georgiev commented Oct 2, 2018

do you by any chance use NFS?

There was some discussion about this in the past: NFS sometimes behaves strangely, and I don't think there was anything we could do to prevent it, so NFS is considered unsupported.

@Vlaaaaaaad

@krasi-georgiev: working with @bmihaescu so I can comment on this. The error is from a kops-deployed cluster running on AWS using EBS.

@krasi-georgiev
Contributor

krasi-georgiev commented Oct 8, 2018

That would be hard to troubleshoot. I would need some specific steps on how to replicate it.

How often does it happen, and can you replicate it with the latest release?

@krasi-georgiev
Contributor

@Vlaaaaaaad, @bmihaescu are you sure you have enough free space? (Suggested by Brian on IRC, so worth checking.)
During compaction it needs some extra space to hold temporary data while joining the blocks.

@Vlaaaaaaad

@krasi-georgiev oh, that is likely to be the issue. We do have some free space, but not much. Is there some documentation on how much space should be free (a certain value, a percentage)?
@bmihaescu will have more details, as he was the one who did the investigation and fix.

This is happening on two older clusters (k8s 1.10.3 and 1.9.6), with prometheus-operator v0.17.0 and prometheus v2.2.1, so the issue might be fixed in newer versions.

Tagging @markmunozoz too.

@krasi-georgiev
Contributor

I am not 100% sure, but logically I would say at least 5 times your biggest block.
Maybe a bit less will do, but storage is not expensive these days, so better to be on the safe side.

By the way, there are plans to add storage-based retention, which should help use cases where storage is limited. prometheus-junkyard/tsdb#343

@krasi-georgiev
Contributor

Does anyone want to add anything else before we mark this as resolved?

@haraldschilly did you find the cause for your case?

delete the problematic directory (a bit harsh; ignoring might be better)

This is implemented as part of the tsdb CLI scan tool, which is still in review.

@slomo

slomo commented Dec 10, 2018

I am running into the same issue on Prometheus 2.4.3 with Vagrant. When I suspend my machine, VirtualBox seems to crash; after the crash I reboot the machine, and usually one, but sometimes up to 90%, of my meta.json files are empty.

[vagrant@prometheus ~]$ sudo find /var/lib/prometheus  -iname meta.json -size 0b | wc
    345     345   19665
[vagrant@prometheus ~]$ sudo find /var/lib/prometheus  -iname meta.json | wc
    357     357   20349

I am not seeing this in production yet, I guess simply because my machines rarely ever crash.
I will try to upgrade to the latest version, but I am a bit scared now that I might also lose data in production on a power outage.

@krasi-georgiev
Contributor

I double-checked the code again, and the only way I could see this happening is when using NFS or another non-POSIX filesystem.

@slomo can you replicate this every time?

@krasi-georgiev
Contributor

are you maybe mounting a dir from the host to use as a data dir?

@haraldschilly
Author

haraldschilly commented Dec 10, 2018

@krasi-georgiev I'm the one who originally reported this. In case it helps, this happened on a GCE PD disk, mounted via /dev/sda1 on /prometheus type ext4 (rw,relatime,data=ordered) and /dev/sdd on /data type ext4 (rw,relatime,discard,data=ordered), in a Docker container.

@krasi-georgiev
Contributor

krasi-georgiev commented Dec 10, 2018

Yeah, a GCE PD disk is OK.
It would help to check the logs for any errors or warnings before the crash.

@haraldschilly
Author

Well, I don't remember seeing any logs specific to that with useful info. It usually happens when there is an OOM event and the kernel kills the Prometheus process, or the whole VM is shut down. I think the main underlying reason is that ext4 isn't 100% atomic. This makes me think I should try using ZFS or Btrfs.

@slomo

slomo commented Dec 10, 2018

It is ext4 inside a VirtualBox VM. I would say it happens on every VirtualBox crash; I'll try to reproduce it.

@krasi-georgiev
Contributor

Steps to reproduce would really help, so that I can try to replicate it as well. Thanks!

@krasi-georgiev
Contributor

@slomo any luck with steps to replicate this?
If possible please ping me on IRC (@krasi-georgiev #prometheus-dev) to speed up the troubleshooting.

@slomo

slomo commented Dec 12, 2018

Well, in my setup (which contains a lot of Consul SD hosts) I can reproduce it by resetting the VirtualBox VM. I tried to create a smaller setup with just a few static node_exporters that are queried, and I can't trigger the corruption anymore.

@krasi-georgiev
Contributor

so you think it is related to the SD being used?

@slomo

slomo commented Dec 13, 2018

@krasi-georgiev I think it would be jumping to conclusions a bit fast to say that SD is at fault, but it definitely requires a certain complexity to occur. I have 6 jobs with a total of ca. 400 targets; all targets are added using service discovery with Consul.

@haraldschilly Could you roughly describe your setup? Do you use service discovery, and how many hosts/applications do you monitor?

@krasi-georgiev
Contributor

@slomo thanks for the update, any chance you could ping me on IRC to speed this up?

@krasi-georgiev
Contributor

@slomo maybe the main difference is the load. Try it with a higher load, e.g. 400 static targets.

@mmencner

We are hitting the same issue on a single Prometheus instance (version 2.6.0 with local storage), running inside a Docker container. So far it has happened twice, out of ~50 deployed instances:

  • First time, only the last block had an empty meta.json
  • Second time, it was last two blocks with an empty meta.json

It's not directly connected to container restarts, as in the majority of cases it starts without any issues. It's also not a matter of insufficient disk space.

As discussed with @gouthamve, we are planning to mitigate this by introducing a check for an empty meta.json and deleting directories where this condition is true, but it's definitely not a perfect solution.

@brian-brazil
Contributor

Are we not creating the meta.json atomically?

@krasi-georgiev
Contributor

@mmencner would you mind trying it with the latest release, as there have been a lot of changes to fix such issues since 2.6? We would also need the full logs to start some useful troubleshooting.

@brian-brazil I just had a quick look, and it does indeed happen atomically. My guess is that something happens during compaction when creating a new block.

@brian-brazil
Contributor

brian-brazil commented Feb 28, 2019

I can see no way that code can produce a zero-length file. It'd have to be the kernel saying it's successfully written and closed, but then not having space for it.

@krasi-georgiev
Contributor

I can see no way that code can produce a zero-length file. It'd have to be the kernel saying it's successfully written and closed, but then not having space for it.

Yes, I suspect something similar, especially in the case of resetting the VM.

@pborzenkov
Contributor

@brian-brazil @krasi-georgiev We are facing the same issue. Sometimes lots of meta.json files are zero-sized. We run Prometheus on a local ext4 FS.

Looking at writeMetaFile, shouldn't the file contents themselves be synced before close?

@pborzenkov
Contributor

Strangely enough, all the meta.json files have the same modification time and zero size:

[root@xxx]# ls -la /var/lib/prometheus/data/*/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5NZP8MVJXC9P2TQ9QJB1FEV/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5PM9ED3MV7HJSVG5ES6W6KR/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5Q8WM6TR1AH1STJXNRWA55R/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5QXFSYARB5E8M7T4K8MWX8E/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5RJ2ZMZZENCT48KD2KJ0B85/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5S6P5D1WV8G6FKE5RHHEETA/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5SV9B4H5A5BPZ6AYMVD2KX3/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5TFWGXREGZJ4EJ51EREXH9D/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5V4FPQCHJ2VTMZECGNBCM0T/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5VS2WEDDKTHV2WEBG1BT5MB/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5WDP25JN7T2QX2AM3WPK3V3/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5X29FTXJ6C0Z9P1R2GN9PY3/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5XPWDRF57JT697BZN5KDY14/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5YBF6R8F6ZFG17TT2JQ91H9/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5Z02CF9RG1WYPAHMARNHJWG/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D5ZMNJ7CG1XZTT295HH68NCB/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D6098R1BQQ0N04N9FARC4ZWJ/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D60XW6GTZCDAYWDC2PXYBMXP/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D61JF921X3HHNESWTM1A5HF7/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D6272H50VCEDRHJ53NBXXTAQ/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D62VNYAA6TA5MX11GE3NQPWJ/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D63G8XJ907DQK9MC5PAE8CJP/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D644W0QY5AYHW9Z943Y4PFEQ/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D64SF6PQ5WAPMGN4V7BRV4BQ/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D65E2CP1N2C6WTPBPCXW0N70/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D662NSQ94HJZ05XHHZ75E683/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D66Q8T3C8YV3PSKZWW6572B4/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D67BVVRJXG5KJY01RBFBZ1W2/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D67BVWYTJRQ4KVNB5GSECZ43/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D67JQK0EEPFJ50EAF9900P1Z/meta.json
-rw-r--r-- 1 prometheus prometheus 0 Mar 18 08:54 /var/lib/prometheus/data/01D67SKA8EGHH9FAHVRWQVP1Y4/meta.json

There are no errors in the Prometheus log...

@krasi-georgiev
Contributor

Hm, I would expect that the kernel should handle the file sync, so I don't think this is the culprit.

How long does it take to replicate?

Can you ping me on #prometheus-dev (@krasi-georgiev)? I will try to replicate and find the culprit, as this has been a pending issue for a while now.

@pborzenkov
Contributor

Hm, I would expect that the kernel should handle the file sync, so I don't think this is the culprit.

Close definitely doesn't guarantee sync. Also, if a node crashes before the kernel flushes its write-back cache, then we can end up with a file with no contents, yet a successful write/close/rename.

How long does it take to replicate?

Not sure. Happens sporadically. What I can say is that we've seen it only after a node crashed. Everything is fine during normal operation.

@krasi-georgiev
Contributor

@pborzenkov yeah, maybe you are right; I just checked the code and Fsync is called for the other block write operations.
https://github.com/prometheus/tsdb/blob/7757fe6f21d4d9f21bf7a1da92fe573b7d4318c3/compact.go#L629

I will open a PR with the fsync for writeMetaFile and will attach a binary to it if you want to test it.
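
For reference, a rough sketch of the write pattern being discussed: write to a temporary file, fsync it before closing, rename it into place, then fsync the parent directory so the rename itself is durable. This is only an illustration of the technique (the writeMetaAtomically helper is hypothetical), not the actual tsdb patch:

package main

import (
        "encoding/json"
        "os"
        "path/filepath"
)

// writeMetaAtomically sketches the crash-safe sequence: create meta.json.tmp,
// write the JSON, fsync the file (the step under discussion), close it,
// rename it over meta.json, then fsync the directory so the rename survives
// a crash.
func writeMetaAtomically(dir string, meta interface{}) error {
        path := filepath.Join(dir, "meta.json")
        tmp := path + ".tmp"

        f, err := os.Create(tmp)
        if err != nil {
                return err
        }
        enc := json.NewEncoder(f)
        enc.SetIndent("", "\t")
        if err := enc.Encode(meta); err != nil {
                f.Close()
                return err
        }
        if err := f.Sync(); err != nil { // flush file contents to stable storage
                f.Close()
                return err
        }
        if err := f.Close(); err != nil {
                return err
        }
        if err := os.Rename(tmp, path); err != nil {
                return err
        }
        d, err := os.Open(dir)
        if err != nil {
                return err
        }
        defer d.Close()
        return d.Sync() // persist the rename in the directory entry
}

func main() {
        meta := map[string]interface{}{"version": 1}
        if err := writeMetaAtomically(".", meta); err != nil {
                os.Exit(1)
        }
}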

@pborzenkov
Contributor

pborzenkov commented Mar 26, 2019

@krasi-georgiev I'll be happy to test (though definitely not in production :)), but crash-related bugs are notoriously hard to reproduce. I tried to check the bug using ALICE (http://research.cs.wisc.edu/adsl/Software/alice/doc/adsl-doc.html), which has greatly helped me in the past, and this is what I got:

Here is the write part of the test (tsdb.WriteMetaFile just calls tsdb.writeMetaFile):

package main

import (
        "math/rand"
        "os"
        "time"

        "github.com/oklog/ulid"
        "github.com/prometheus/tsdb"
)

func main() {
        now := time.Now()

        err := tsdb.WriteMetaFile(".", &tsdb.BlockMeta{
                ULID:    ulid.MustNew(ulid.Now(), rand.New(rand.NewSource(123))),
                MinTime: now.Add(-2*time.Hour).Unix() * 1000,
                MaxTime: now.Unix() * 1000,
                Version: 1,
        })
        if err != nil {
                os.Exit(1)
        }
        os.Exit(0)
}

And here the checker:

package main

import (
        "os"

        "github.com/prometheus/tsdb"
)

func main() {
        _, err := tsdb.ReadMetaFile(os.Args[1])
        if err != nil && !os.IsNotExist(err) {
                os.Exit(1)
        }
        os.Exit(0)
}

This is what I got with unmodified tsdb:

❯ alice-check --traces_dir=../traces_dir --checker=../../read-test/read-test --debug_level 1
-------------------------------------------------------------------------------
ALICE tool version 0.0.1. Please go through the documentation, particularly the
listed caveats and limitations, before deriving any inferences from this tool.
-------------------------------------------------------------------------------
Parsing traces to determine logical operations ...
Logical operations:
0       creat("meta.json.tmp", parent=107377493, mode='0666', inode=107377690)
1       append("meta.json.tmp", offset=0, count=159, inode=107377690)
2       rename(dest="meta.json", source="meta.json.tmp", source_hardlinks=1, source_parent=107377493, dest_size=0, dest_hardlinks=0, source_size=159, dest_parent=107377493, source_inode=107377690, dest_inode=False)
3       fsync(".", size=60, inode=107377493)
-------------------------------------
Finding vulnerabilities...
(Dynamic vulnerability) Ordering: Operation 1 needs to be persisted before 2
(Static vulnerability) Ordering: Operation /usr/lib/golang/src/syscall/asm_linux_amd64.s:27[syscall.Syscall] needed before /usr/lib/golang/src/syscall/asm_linux_amd64.s:53[syscall.Syscall6]
Done finding vulnerabilities.

And this is what I got after adding an f.Sync() call before f.Close in writeMetaFile:

❯ alice-check --traces_dir=../traces_dir --checker=../../read-test/read-test --debug_level 1
-------------------------------------------------------------------------------
ALICE tool version 0.0.1. Please go through the documentation, particularly the
listed caveats and limitations, before deriving any inferences from this tool.
-------------------------------------------------------------------------------
Parsing traces to determine logical operations ...
Logical operations:
0       creat("meta.json.tmp", parent=107379907, mode='0666', inode=107378562)
1       append("meta.json.tmp", offset=0, count=159, inode=107378562)
2       fsync("meta.json.tmp", size=159, inode=107378562)
3       rename(dest="meta.json", source="meta.json.tmp", source_hardlinks=1, source_parent=107379907, dest_size=0, dest_hardlinks=0, source_size=159, dest_parent=107379907, source_inode=107378562, dest_inode=False)
4       fsync(".", size=60, inode=107379907)
-------------------------------------
Finding vulnerabilities...
Done finding vulnerabilities.

While this is definitely not proof that the bug is indeed fixed, the tool has a great track record and usually finds real problems.

@krasi-georgiev
Contributor

Wow, that is amazing. Thanks for spending the time on this. I will open a PR soon.

@krasi-georgiev
Contributor

Just opened a PR that should close this issue; it will go into the next release. Feel free to reopen if you still experience the same issue after that.
