Opening storage failed read meta information ... invalid character #3924

Closed
mlanner opened this Issue Mar 7, 2018 · 4 comments

mlanner commented Mar 7, 2018

What did you do?

The disk filled up and Prometheus crashed. I cleaned up the disk and restarted the machine.

What did you expect to see?

Prometheus starting up cleanly again.

What did you see instead? Under which circumstances?

Prometheus refuses to start with:

# systemctl status prometheus
● prometheus.service - Prometheus Server
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2018-03-07 21:19:49 UTC; 3s ago
  Process: 372 ExecStart=/opt/prometheus/prometheus --config.file /opt/prometheus/prometheus.yml (code=exited, status=0/SUCCESS)
 Main PID: 372 (code=exited, status=0/SUCCESS)

Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.084790523Z caller=main.go:382 msg="Scrape discovery manager stopped"
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085141368Z caller=main.go:424 msg="Stopping scrape manager..."
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085162083Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085177378Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085185311Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085194827Z caller=main.go:570 msg="Notifier manager stopped"
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085204304Z caller=main.go:396 msg="Notify discovery manager stopped"
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085216998Z caller=main.go:418 msg="Scrape manager stopped"
Mar 07 21:19:49 prometheus prometheus[372]: level=error ts=2018-03-07T21:19:49.08527161Z caller=main.go:579 err="Opening storage failed read meta information data/01C7SKWJHTCJVKQ3F7SCWBTYGD: invalid character '\\x19' looking for beginning of value"
Mar 07 21:19:49 prometheus prometheus[372]: level=info ts=2018-03-07T21:19:49.085298148Z caller=main.go:581 msg="See you next time!"

Environment

  • System information:
Linux 4.13.4-1-pve x86_64
  • Prometheus version:
prometheus, version 2.1.0 (branch: HEAD, revision: 85f23d82a045d103ea7f3c89a91fba4a93e6367a)
  build user:       root@6e784304d3ff
  build date:       20180119-12:01:23
  go version:       go1.9.2
  • Alertmanager version:

N/A

  • Prometheus configuration file:

N/A

  • Alertmanager configuration file:

N/A

  • Logs:

N/A

mlanner commented Mar 7, 2018

After selectively moving the corrupted block directories (data/01C7SKWJHTCJVKQ3F7SCWBTYGD and others) out of the data directory, Prometheus finally started again. However, there is obvious data loss: the historical metrics stored in those blocks no longer exist. It would be nice to be able to run some kind of "rescue" tool to get the metrics back.
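The manual "move directories until Prometheus stops complaining" process can be sketched as a small script. This is a minimal Python sketch, assuming the layout visible in the log above (`<data dir>/<26-character block ULID>/meta.json`); the `find_corrupt_blocks` and `quarantine` helpers are hypothetical and not part of Prometheus or the tsdb CLI. It only checks whether each block's meta.json is readable, valid JSON, which is the failure reported in the log.

```python
import json
import shutil
from pathlib import Path


def find_corrupt_blocks(data_dir):
    """Return block directories whose meta.json is missing or not valid JSON.

    Assumes block directories are named by a 26-character ULID, as in
    data/01C7SKWJHTCJVKQ3F7SCWBTYGD; other entries (wal/, lock files) are skipped.
    """
    corrupt = []
    for block in sorted(Path(data_dir).iterdir()):
        if not block.is_dir() or len(block.name) != 26:
            continue  # not a block directory
        meta = block / "meta.json"
        try:
            with meta.open("rb") as f:
                json.load(f)
        except (OSError, ValueError):
            # OSError: meta.json missing or unreadable.
            # ValueError: covers json.JSONDecodeError (e.g. a stray '\x19')
            # and UnicodeDecodeError for non-UTF-8 bytes.
            corrupt.append(block)
    return corrupt


def quarantine(blocks, quarantine_dir):
    """Move block directories out of the data dir without deleting them."""
    dest_root = Path(quarantine_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    for block in blocks:
        shutil.move(str(block), str(dest_root / block.name))
```

Run it with Prometheus stopped. Moving a block aside makes its samples unqueryable, which is the data loss described in the comment above; keeping the moved directories instead of deleting them leaves the option of repairing meta.json later.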

gouthamve commented Mar 8, 2018

Would you be able to share the meta.json file? Did you happen to edit/open any of the files in an editor before the error?

While it should be possible to recover the meta file from the block data, this is a curious corruption, because we do an mv rather than writing to meta.json directly.
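The general "write elsewhere, then mv into place" pattern is sketched below in Python. It is an illustration of the technique only, not the tsdb implementation (which is written in Go); the `write_meta_atomically` helper and the `meta.json.tmp` file name are hypothetical. The rename gives readers either the old file or the complete new one, and the fsync calls are what make the new contents and the rename durable across a crash.

```python
import json
import os
from pathlib import Path


def write_meta_atomically(block_dir, meta):
    """Write meta.json by writing a temporary file and renaming it into place."""
    block_dir = Path(block_dir)
    tmp_path = block_dir / "meta.json.tmp"
    final_path = block_dir / "meta.json"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(meta, f, indent="\t")  # tab indentation, like the files in this thread
        f.flush()
        os.fsync(f.fileno())  # persist file contents before the rename
    # os.replace is an atomic rename on POSIX when both paths are on one filesystem.
    os.replace(tmp_path, final_path)
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(block_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the temporary file were renamed without being flushed to disk first, a crash could in principle leave meta.json with incomplete contents even though it was never written to directly.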

mlanner commented Mar 9, 2018

Hi @gouthamve,

No, we didn't edit or open any of the files. I discovered the instance had run out of space and fixed that problem. However, even after freeing up space, Prometheus wouldn't start, as outlined above. Moving one directory after another until Prometheus stopped complaining about corrupted files was the only thing we did to get it working again.

Here are the contents of two of the meta.json files from directories that were corrupted:

$ cat /tmp/01C7WTWGEETK3SSX5D1W0AKVW3/meta.json
{
	"ulid": "01C7WTWGEETK3SSX5D1W0AKVW3",
	"minTime": 1520301600000,
	"maxTime": 1520308800000,
	"stats": {
		"numSamples": 5765502,
		"numSeries": 10442,
		"numChunks": 48101
	},
	"compaction": {
		"level": 1,
		"sources": [
			"01C7WTWGEETK3SSX5D1W0AKVW3"
		]
	},
	"version": 2
}
$ cat /tmp/01C7T8FR9YWZC507EW9TR36ECV/meta.json
{
	"ulid": "01C7T8FR9YWZC507EW9TR36ECV",
	"minTime": 1520215200000,
	"maxTime": 1520222400000,
	"stats": {
		"numSamples": 5761404,
		"numSeries": 10434,
		"numChunks": 48063
	},
	"compaction": {
		"level": 1,
		"sources": [
			"01C7T8FR9YWZC507EW9TR36ECV"
		],
		"failed": true
	},
	"version": 2
}
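Both files above parse as valid JSON. Their timestamps are milliseconds since the Unix epoch, so each block covers 1520308800000 - 1520301600000 = 7200000 ms, i.e. two hours. A hypothetical helper such as `describe_block` below can summarize a block's meta.json and flag a few obvious inconsistencies; the checks are illustrative assumptions, not Prometheus's own validation. The last check simply reports the `"failed": true` flag under `compaction`, as it appears in the second file.

```python
import json
from pathlib import Path


def describe_block(block_dir):
    """Summarize a block's meta.json and list simple, illustrative sanity problems."""
    block_dir = Path(block_dir)
    meta = json.loads((block_dir / "meta.json").read_text(encoding="utf-8"))
    problems = []
    if meta.get("ulid") != block_dir.name:
        problems.append("ulid does not match directory name")
    if meta["minTime"] >= meta["maxTime"]:
        problems.append("minTime is not before maxTime")
    if meta.get("compaction", {}).get("failed"):
        problems.append("compaction marked as failed")
    # minTime/maxTime are milliseconds since the epoch; 3,600,000 ms per hour.
    span_hours = (meta["maxTime"] - meta["minTime"]) / 3_600_000
    return {"ulid": meta.get("ulid"), "span_hours": span_hours, "problems": problems}
```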
krasi-georgiev commented Nov 9, 2018

@mlanner, after some discussion in prometheus/tsdb#283, it was decided to implement this as part of the tsdb CLI tool.

Closing for now, but feel free to reopen if you need to add more details to the issue.
