Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): implement backup and restore #16504

Merged
merged 15 commits into from
Jan 21, 2020
Merged

Conversation

jacobmarble
Copy link
Member

@jacobmarble jacobmarble commented Jan 10, 2020

Closes #15603
Closes #15604
Closes #15605
Closes #15606

Implements online backup and offline restore.

To test functionality it was necessary to combine #15558 and #16352 into one PR based on recent master/HEAD. Then:

  • wrote some data
  • read it back
  • backed it up with influx backup
  • terminated influxd
  • restored it with influxd restore to existing database directory
  • started influxd, read it back, terminated influxd
  • restored it with influxd restore to empty database directory
  • started influxd, read it back, terminated influxd

Check before merging:

  • "restored it with influxd restore to empty database directory" fails
  • CHANGELOG.md updated with a link to the PR (not the Issue)
  • Well-formatted commit messages
  • Rebased/mergeable
  • Tests pass
  • Documentation updated or issue created (provide link to issue/pr)

cmd/influx/backup.go Show resolved Hide resolved
@jacobmarble
Copy link
Member Author

@tmgordeeva this is weird. Restore to existing directory works properly, but restore to empty dir looks like this:

$ influx query -o my-org 'from(bucket: "my-bucket") |> range(start: -3h)'
See 'influx query -h' for help
Error: Unauthorized access.
See 'influx query -h' for help

I think you can ignore the extra "See 'influx query -h' for help", it's the access that's the issue. I checked the bolt file MD5 sum before and after restore, and it looks identical in both scenarios. I've reproduced this error twice. Please take a look Monday.

@tmgordeeva
Copy link

@jacobmarble I think the problem is the credentials aren't part of this backup/restore process. I got the same, but restored it to working by copying the credentials back in.

@tmgordeeva
Copy link

@jacobmarble building idpe fails against this with:

etcd/kvservice.go:34:21: cannot use cache (type *KVCache) as type "github.com/influxdata/influxdb/kv".Store in argument to "github.com/influxdata/idpe/kv".NewClient:
	*KVCache does not implement "github.com/influxdata/influxdb/kv".Store (missing Backup method)
etcd/kvservice.go:56:21: cannot use store (type *KVStore) as type "github.com/influxdata/influxdb/kv".Store in argument to "github.com/influxdata/idpe/kv".NewClient:
	*KVStore does not implement "github.com/influxdata/influxdb/kv".Store (missing Backup method)

Created https://github.com/influxdata/idpe/pull/5776 to hopefully fix that.

@jacobmarble jacobmarble force-pushed the jgm-tg-backup-restore branch 2 times, most recently from 96752f8 to 880570c Compare January 16, 2020 23:07
@tmgordeeva
Copy link

@jacobmarble latest commit should fix the bolt being copied to credentials issue. Problem with my testing was I was still using the token explicitly in queries which I didn't realize overrides whatever's in credentials, retested without it to verify and verified credentials files match before and after.

Copy link
Contributor

@sebito91 sebito91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is great, a couple of small nits included but one major question:

Why do we perform os.MkdirAll with 0777 permissions? That seems way too open for my liking, could we make do with 0755 instead? Do we truly need these files to be read/write/execute enable for other?

authorizer/backup.go Outdated Show resolved Hide resolved
cmd/influxd/restore/command.go Outdated Show resolved Hide resolved
@jacobmarble
Copy link
Member Author

I think this is great, a couple of small nits included but one major question:

Why do we perform os.MkdirAll with 0777 permissions? That seems way too open for my liking, could we make do with 0755 instead? Do we truly need these files to be read/write/execute enable for other?

I scanned the code to figure out what we currently do. In the storage engine, 0777 and 0666 is very common, and in #5171 @benbjohnson argues that the user of InfluxDB can umask if the defaults are 0777 and 0666. Therefore, I made the change in that direction.

@sebito91
Copy link
Contributor

Thanks for that @jacobmarble, appreciate the history on this. I'm fine with the changes as long as you are OK with that return err comment.

Copy link
Contributor

@sebito91 sebito91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants