
Update restore functionality to run in online mode, consume Enterprise backup files. #9207

Merged

@aanthony1243 merged 16 commits from aa_8880_liverestore into master on Jan 10, 2018

Conversation

@aanthony1243 (Contributor) commented Dec 7, 2017

Added a flag -online that will run a database restore on a live DB instance. A side effect of this change is that the existing DBs in the destination system are unaffected, while the imported database must not already exist in the system.

Also added an -enterprise flag that enables restore to consume enterprise data files. This mode always runs as an online update.

Existing restore functionality is preserved.

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated
Required only if applicable

You can erase any checkboxes below this note if they are not applicable to your Pull Request.

  • Update man page when modifying a command
  • InfluxData Documentation: issue filed or pull request submitted <link to issue or pull request>

@ghost assigned aanthony1243 Dec 7, 2017
@ghost added the review label Dec 7, 2017
@@ -0,0 +1,20 @@
// +build !windows
Contributor Author

copied from plutonium.

@@ -0,0 +1,19 @@
package tar
Contributor Author

copied from plutonium

"time"
)

func StreamFunc(w io.Writer, dir, relativePath string, writeFunc func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error) error {
Contributor Author

copied and adapted from plutonium to be more flexible for the OSS use case, where we filter files before they go to the archive. The original API is below and is functionally equivalent.

Contributor

Does this need to be exported? Also, needs docs.

Contributor

StreamFunc is an odd name for a function to call. Maybe have Stream take an optional func arg that defaults to TarFile if nil?

Contributor Author

👍 needs to be exported, esp since it now replaces Stream(...)
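
A minimal sketch of the reviewer's suggestion, reusing the StreamFunc and TarFile signatures shown in the diff (illustrative only, not the code that was merged):

// Stream archives dir to w. If writeFunc is nil it falls back to TarFile,
// which writes every file to the archive unchanged. (Sketch; assumes it
// lives alongside StreamFunc and TarFile in pkg/tar.)
func Stream(w io.Writer, dir, relativePath string, writeFunc func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error) error {
    if writeFunc == nil {
        writeFunc = TarFile
    }
    return StreamFunc(w, dir, relativePath, writeFunc)
}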

@aanthony1243 changed the title from "Aa 8880 liverestore" to "Update restore functionality to run in online mode, consume Enterprise backup files." Dec 7, 2017

message EnterpriseData {
required bytes Data = 1;
}
Contributor

missing newline

Contributor Author

👍

if cmd.enterprise || cmd.online {
if cmd.enterprise {
if filepath.Ext(cmd.backupFilesPath) != ".manifest" {
return fmt.Errorf("when using -enterprise, must provide path to a manifest file")
Contributor

This is not right. Enterprise backups can be taken incrementally, in which case a directory containing the manifests should be used.

Contributor Author

I had my reasons for this. Maybe we can discuss in person.

if cmd.metadir != "" {
if err := cmd.unpackMeta(); err != nil {
return err
if cmd.enterprise || cmd.online {
Contributor

This should be broken up into two method calls instead of a single if statement with two large blocks.

Contributor Author

👍

if cmd.enterprise || cmd.online {
// validate the arguments
if cmd.sourceDatabase == "" {
return fmt.Errorf("-db is a required parameter")
Contributor

We should also support a complete restore, no? Why is this limited to restoring a single database at a time?

Contributor Author

👍

func (cmd *Command) updateMetaLive() error {

var metaBytes []byte
if cmd.enterprise {
Contributor

I'm seeing if cmd.enterprise scattered around far too much for my liking. Can you abstract away the behavior into a type/interface, and only make that decision once?

Contributor Author

I'm not sure I agree with this point. I did tidy up the logic a bit so it's not "scattered around" so much.

Contributor

It still adds large blocks of code that are split by a single else. This needs to either be abstracted away behind a common interface, or the codepaths should simply be split (call different methods).

Contributor Author

fixed
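
For illustration, one way to make the enterprise-vs-OSS decision exactly once and hide it behind an interface, as the reviewer suggests. All of the type and method names below are hypothetical; only the enterprise flag and backup files path come from the PR.

package restore

// backupFormat hides the layout differences between the legacy OSS backup
// files and the enterprise-style manifest files.
type backupFormat interface {
    MetaBytes() ([]byte, error)    // metadata payload to send to the live server
    ShardFiles() ([]string, error) // shard archives to upload
}

type ossBackup struct{ dir string }

func (b ossBackup) MetaBytes() ([]byte, error)    { return nil, nil } // reading elided
func (b ossBackup) ShardFiles() ([]string, error) { return nil, nil } // reading elided

type enterpriseBackup struct{ manifestPath string }

func (b enterpriseBackup) MetaBytes() ([]byte, error)    { return nil, nil } // reading elided
func (b enterpriseBackup) ShardFiles() ([]string, error) { return nil, nil } // reading elided

// newBackupSource is the only place that looks at the enterprise flag; the
// rest of the restore path works against the interface.
func newBackupSource(enterprise bool, backupFilesPath string) backupFormat {
    if enterprise {
        return enterpriseBackup{manifestPath: backupFilesPath}
    }
    return ossBackup{dir: backupFilesPath}
}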

return shardIDMap, errors.New("database already exists")
}

for _, rpImport := range dbImport.RetentionPolicies {
Contributor

I would strongly recommend updating this to create new info objects throughout this code, rather than mutating what's already here. The metadata structs are full of slices, so you may be inadvertently modifying shared data.

Contributor Author

solved using dbInfo.clone() which does a deep copy.
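
As background on the slice-sharing concern, a standalone example (dbInfo here is a stand-in, not the real metadata struct): copying a struct copies only the slice headers, so mutating the copy's slice also mutates the original, while a clone that allocates new backing arrays does not.

package main

import "fmt"

type dbInfo struct {
    Name              string
    RetentionPolicies []string
}

func main() {
    orig := dbInfo{Name: "db0", RetentionPolicies: []string{"autogen"}}

    shallow := orig                        // copies the slice header, not the backing array
    shallow.RetentionPolicies[0] = "oops"  // visible through orig as well
    fmt.Println(orig.RetentionPolicies[0]) // prints "oops"

    deep := orig
    deep.RetentionPolicies = append([]string(nil), orig.RetentionPolicies...) // fresh backing array
    deep.RetentionPolicies[0] = "renamed"
    fmt.Println(orig.RetentionPolicies[0]) // still "oops": the deep copy is independent
}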

d := json.NewDecoder(conn)

if err := d.Decode(&r); err != nil {
return r, []byte{}, err
Contributor

nil should be used for an empty slice

Contributor Author

👍


remainder := d.Buffered()

//// no time to investigate. It seems the JSON encoder puts one extra byte on the stream,
Contributor

What is going on here?

Contributor

OK, I just read through... and it looks like at one point the protocol is [byte].[byte].[JSON-object].[binary-data]... That seems far too fragile to be useful or maintainable.

if err != nil && err != io.EOF {
return r, bits, err
}
return r, bits[1:], nil
Contributor

Why is one byte being stripped off?

Contributor Author

the JSON encoder seems to put an extra byte on the stream on the client side. I'll add a comment.
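
For reference, encoding/json's Encoder.Encode terminates every value with a newline character, which is a plausible source of the extra byte (an inference; the thread does not confirm it):

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
)

func main() {
    var buf bytes.Buffer
    // Encode writes the JSON value followed by a newline.
    if err := json.NewEncoder(&buf).Encode(map[string]uint64{"UploadSize": 42}); err != nil {
        panic(err)
    }
    fmt.Printf("%q\n", buf.String()) // prints "{\"UploadSize\":42}\n" (note the trailing \n)
}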

// it is a bit random but sometimes the Json decoder will consume all the bytes and sometimes
// it will leave a few behind.
if err != io.EOF && n < int(r.UploadSize+1) {
n, err = conn.Read(bits[n:])
Contributor

In general, prefer ioutil.ReadAll or io.ReadFull to ensure draining of readers or reading sufficient data.
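
A sketch of that advice applied here: decode the JSON header, then use io.ReadFull over the decoder's buffered remainder plus the connection so that exactly UploadSize payload bytes are read. The uploadHeader type and its field are stand-ins inferred from the diff, not the PR's actual types.

package restore

import (
    "encoding/json"
    "io"
    "net"
)

// uploadHeader is a stand-in for the JSON header the peer sends before the payload.
type uploadHeader struct {
    UploadSize int64
}

func readUpload(conn net.Conn) (uploadHeader, []byte, error) {
    var r uploadHeader
    d := json.NewDecoder(conn)
    if err := d.Decode(&r); err != nil {
        return r, nil, err
    }
    // Drain whatever the decoder buffered past the JSON value first, then keep
    // reading from the connection until the payload is complete.
    // NOTE: if the sender used json.Encoder, its trailing '\n' delimiter arrives
    // first and must be consumed (or dropped from the protocol) before this read.
    payload := make([]byte, r.UploadSize)
    if _, err := io.ReadFull(io.MultiReader(d.Buffered(), conn), payload); err != nil {
        return r, nil, err
    }
    return r, payload, nil
}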

"time"
)

func StreamFunc(w io.Writer, dir, relativePath string, writeFunc func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported? Also, needs docs.

return StreamFunc(w, dir, relativePath, TarFile)
}

func SinceFilterTarFile(since time.Time) func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error {
Contributor

Needs docs.

Contributor Author

👍

}
}

func TarFile(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error {
Contributor

Does this need to be exported?

Contributor

Also, it has a stutter: tar.TarFile.

Contributor Author

Will doc the new Stream function, but yes, this should be exported. The design is that you can write a custom filter for Stream. For example, see timeStampFilterTarFile in engine.go, which is too specific to be in pkg/tar. A custom function may, and in many cases should, use TarFile internally. Will rename it to remove the stutter.
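
As a sketch of that pattern (the filter name is made up; it mirrors the shape of SinceFilterTarFile above and delegates the actual archiving to the exported TarFile):

// ModifiedSinceFilter returns a write function for Stream that skips files whose
// modification time is before the cutoff and hands everything else to TarFile.
// (Sketch only; assumes it sits alongside TarFile in pkg/tar.)
func ModifiedSinceFilter(since time.Time) func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error {
    return func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error {
        if f.ModTime().Before(since) {
            return nil // filtered out: nothing is written to the archive
        }
        return TarFile(f, shardRelativePath, fullPath, tw)
    }
}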

"time"
)

func StreamFunc(w io.Writer, dir, relativePath string, writeFunc func(f os.FileInfo, shardRelativePath, fullPath string, tw *tar.Writer) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamFunc is an odd name for a function to call. Maybe have Stream take an optional func arg that defaults to the TarFile if nil?

Copy link
Contributor

@joelegasse joelegasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have some structural/organizational issues in the restore command that need to be cleaned up, along with a confusing naming scheme.

The tar package needs to be standardized around forward slash paths so that backups will be compatible across platforms.

Missing godoc comments, and a couple other quality issues.

Some potential issues with the importing of metadata. We need to be explicit in the restore, and not blindly copy everything and replace a few fields.

I still really don't like the "write a JSON object then an arbitrary stream" format that was added here. I think we need to do some actual planning. Either the request should be a single JSON object, or we should use protocol buffers as we do in other places to better handle the binary data.

@@ -0,0 +1,151 @@
package tar
Contributor

Needs godoc comments on all exported functions.

Contributor Author

not sure what commit this is. Latest HEAD has docs on every function in this file.

@@ -783,6 +784,91 @@ func (data *Data) hasAdminUser() bool {
return false
}

// ImportData imports selected data into the current metadata.
// opts provides options to possibly rename the Database or Retention Policy, or change the Replication Factor.
Contributor

Comment doesn't line up with the args. There is no opts parameter, and the list of parameters doesn't mention anything about replication factor. The comment should also describe what the returned map is.

Contributor Author

👍

return fmt.Errorf("imported metadata does not have datbase named %s", backupDBName)
}

dbImport := dbPtr.clone()
Contributor

I'm actually going to insist that a new DatabaseInfo object get created and populated manually. There are other fields (continuous queries, default RP) that are being copied without concern, and any new fields would get copied without consideration of what that means for an import.

Contributor Author

👍

return nil, fmt.Errorf("updating metadata on influxd service failed: err=%v, n=%d", err, n)
}

resp := b.Bytes()
Contributor

In this scenario, prefer ioutil.ReadAll rather than creating a bytes.Buffer, using ReadFrom, and then only calling Bytes().

Contributor Author

👍
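
A small standalone example of the suggested simplification (a strings.Reader stands in for the connection; this is not the PR's code):

package main

import (
    "fmt"
    "io/ioutil"
    "strings"
)

func main() {
    conn := strings.NewReader("metadata response bytes") // stands in for the net.Conn
    // One call replaces the bytes.Buffer + ReadFrom + Bytes() sequence.
    resp, err := ioutil.ReadAll(conn)
    if err != nil {
        fmt.Println("read failed:", err)
        return
    }
    fmt.Printf("read %d bytes\n", len(resp))
}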


func (c *Client) decodeUintPair(bits []byte) (uint64, uint64, error) {
if len(bits) != 16 {
return 0, 0, fmt.Errorf("array must have exactly 16 bytes")
Contributor

errors.New, there is no formatting occurring here. Also, the error message says "array", when it is talking about a slice.
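
A sketch of the corrected construction, using errors.New and "slice" in the message (the big-endian byte order is an assumption for illustration; the PR's actual encoding is not shown in this hunk). Imports: encoding/binary and errors.

// decodeUintPair splits a 16-byte slice into two uint64 values.
func decodeUintPair(bits []byte) (uint64, uint64, error) {
    if len(bits) != 16 {
        // No formatting verbs, so errors.New is sufficient.
        return 0, 0, errors.New("slice must have exactly 16 bytes")
    }
    // Assumed big-endian; use binary.LittleEndian if the format says otherwise.
    return binary.BigEndian.Uint64(bits[:8]), binary.BigEndian.Uint64(bits[8:]), nil
}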

if cmd.backupRetention == "" || cmd.backupRetention == file.Policy {
if cmd.shard == "" || cmd.shard == strconv.FormatUint(file.ShardID, 10) {
cmd.StdoutLogger.Printf("Restoring shard %d live from backup %s\n", file.ShardID, file.FileName)
f, err := os.Open(filepath.Join(cmd.backupFilesPath, file.FileName))
Contributor

👎 It is not closed in the case of an error
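
A sketch of the close-on-every-path pattern being asked for (the helper name is made up; the upload itself is elided):

// restoreShardFile streams one backup file to the live restore endpoint and
// guarantees the handle is released whether or not an error occurs.
func restoreShardFile(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close() // runs on every return path below, including early errors

    // ... upload f to the running influxd instance ...
    return nil
}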

if err != nil {
return err
}
f, err := os.Open(fn)
Contributor

👎 handle the error case as well

}
}
}
} else {
Contributor

Split this off somehow. Two giant blocks of code split by a single else makes it difficult to maintain.

if err != nil {
return err
}
if cmd.manifest.Platform == "OSS" {
Contributor

If that's the case, this needs to be called something else, then. "enterprise" is not a state of mind, it is a different version of the software. The open-source database does not produce "enterprise" backups; it's just an updated backup format on disk. Calling it "enterprise" while allowing it to come from the open-source database is very confusing.

// should get us ["db","rp", "00001", "00"]
pathParts := strings.Split(filepath.Base(tarFile), ".")
if len(pathParts) != 4 {
return fmt.Errorf("backup tarfile name incorrect length")
Contributor

It's not the length that is incorrect, it's the format.
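
For example, the message could describe the expected shape instead of the length (the db.rp.<shardID>.<nn> pattern follows the comment in the diff; the exact wording is illustrative):

if len(pathParts) != 4 {
    return fmt.Errorf("backup tarfile name %q does not match the expected db.rp.<shardID>.<nn> format", filepath.Base(tarFile))
}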

@aanthony1243 (Contributor Author)

@joelegasse should have covered all your change requests. I rolled in a small feature, #3019, to backup that collects a file listing. Initially I was using it to drive some tests, though your change requests designed that out. Support for incremental manifest file collections is also added.

@aanthony1243 force-pushed the aa_8880_liverestore branch 6 times, most recently from ba5254e to a059547 on January 5, 2018 04:47
@joelegasse (Contributor) left a comment

I'll remove myself as a reviewer at this point. Previous issues have not been addressed, and more changes are being added in.

@@ -57,6 +57,8 @@ type Command struct {
enterprise bool
manifest backup_util.Manifest
enterpriseFileBase string

BackupFiles []string
Contributor

Needs a comment. What does this do, why is it exported, etc.

@@ -7,7 +7,9 @@
### Features

- [#8495](https://github.com/influxdata/influxdb/pull/8495): Improve CLI connection warnings
- [#9146](https://github.com/influxdata/influxdb/issues/9146): Backup can produce data in the same format as the enterprise backup/restore tool.
- [#3019](https://github.com/influxdata/influxdb/issues/3019): Backup utility prints a list of backup files.
Contributor

This needs to stop in the future... We can't just keep adding more and more issues to a PR. If there's a dependency, get it merged first. Adding extra issues and extra code to the same PR makes review take longer and makes it harder to tell which changes are related to which issue.

@@ -237,6 +244,9 @@ func (cmd *Command) backupShard(db, rp, sid string) error {

// TODO: verify shard backup data
Contributor

Is this still a TODO?

@rbetts self-requested a review January 5, 2018 16:59

@rbetts (Contributor) commented Jan 5, 2018

This PR has been open for a long time and has been read and iterated on by reviewers and by Adam. I understand that there are some lingering differences, but at this point I've decided to allow this code into master.

The most recent addition to the PR is largely my fault - I asked Adam to simultaneously hold this work back from master until TSI landed and to go ahead and implement the remaining functionality to support restore of incremental backups to OSS.

Adam is going to put together docs for the supported changes. I'll arrange a review of those with PM.

Thank you everyone for working together on this - I know it has been a challenging experience.

@aanthony1243 dismissed joelegasse's stale review January 8, 2018 15:41

final approval by ryan

@aanthony1243 force-pushed the aa_8880_liverestore branch 2 times, most recently from 49e99ea to 478ceab on January 8, 2018 20:26
@aanthony1243 merged commit 938db68 into master Jan 10, 2018
@ghost removed the review label Jan 10, 2018
@aanthony1243 deleted the aa_8880_liverestore branch January 10, 2018 19:10