New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hard links #763

Closed
wants to merge 37 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@jgfrm
Copy link

jgfrm commented Jan 30, 2017

Made struct for HardlinkIndex
Reworked global variable and added creation of HardlinkIndex in restorer
Bind functions to type

Jaap Gordijn
Changed Hardlink() in Link()
Made struct for HardlinkIndex
Reworked global variable and added creation of HardlinkIndex in restorer
Bind functions to type
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 1, 2017

Does this PR supersede #752?

@fd0
Copy link
Member

fd0 left a comment

I've added my review comments.

"sync"
)

// HardlinkKey is a composed key for finding inodes on a specific device

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

Make all comments which describe something full sentences, see https://github.com/golang/go/wiki/CodeReviewComments#comment-sentences

}

// ExistsLink checks wether the link already exist in the index
func (idx *HardlinkIndex) ExistsLink(inode uint64, device uint64) bool {

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

I'd prefer this function to be named Has() instead of ExistsLink(), that's the same style as the index.

}

// AddLink adds a link to the index
func (idx *HardlinkIndex) AddLink(inode uint64, device uint64, name string) {

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

This function can just be named Add().

}

// GetLinkName obtains the filename from the index
func (idx *HardlinkIndex) GetLinkName(inode uint64, device uint64) string {

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

Hm, GetFilename would be more appropriate as a function name I think.

This function also needs to return an error, what if there isn't a file in the index with this inode and device?

}

// RemoveLink removes a link from the index
func (idx *HardlinkIndex) RemoveLink(inode uint64, device uint64) {

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

Remove the Link from the function name, Remove() is clear enough I think.


var sresult string
sresult = idx.GetLinkName(1, 2)
Assert(t, sresult == "inode1-file1-on-device2",

This comment has been minimized.

@fd0

"golang.org/x/crypto/ssh"

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

This does not belong into the PR, did you accidentally run gofmt on the vendor directory?

@@ -309,14 +309,14 @@ func TestInvalidEntry(t *testing.T) {
}

var knownHostsParseTests = []struct {
input string

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

Same here, do not change the vendored libraries.

func (node Node) createFileAt(path string, repo Repository, idx *HardlinkIndex) error {
if node.Links > 1 {
if !idx.ExistsLink(node.Inode, node.Device) {
idx.AddLink(node.Inode, node.Device, path)

This comment has been minimized.

@fd0

fd0 Feb 1, 2017

Member

I think the node should only be added to the hardlink index when the restore was successful, so you need to move this to the end of the function. Otherwise restoring a file may fail, and with the current code all hardlinked versions of that file will also not be restored because the hardlink index contains a filename that may not exist (or even point to different content).

jgfrm
shortened function names
add file to link index only if the file is successfully restored
changed comments
@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 1, 2017

Can someone explain why the CI failed (and how I can see that)?

jgfrm
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 1, 2017

Click on the red x, then on Details, e.g. for Travis. Then you'll get to the build's page, e.g. here:
https://travis-ci.org/restic/restic/builds/197350087

The tests for Travis fail at the moment because the infrastructure for running tests on Darwin/OS X is broken: https://www.traviscistatus.com/incidents/k79mjcv403c4

Let's just wait a bit.

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 1, 2017

should I do something about the gofmt on the vendor libs?

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 2, 2017

Yes: Remove your changes of the files in vendor/. Do you need help doing that?

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 2, 2017

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 3, 2017

First, find out which commit(s) touch the files in the vendor directory (this assumes you are in the branch issue25):

$ git log --stat master..HEAD -- vendor/
commit 369f9804114d45876078cb26593bc5552e1ccf54
Author: Jaap Gordijn <jaap@intra.gordijn.org>
Date:   Tue Jan 31 00:14:20 2017 +0100

    Changed Hardlink() in Link()
    Made struct for HardlinkIndex
    Reworked global variable and added creation of HardlinkIndex in restorer
    Bind functions to type

 vendor/src/golang.org/x/crypto/ssh/agent/example_test.go | 14 +++++++-------
 vendor/src/golang.org/x/crypto/ssh/keys_test.go          | 20 ++++++++++----------
 2 files changed, 17 insertions(+), 17 deletions(-)

Start an interactive rebase:

$ git rebase --interactive master

This opens an editor with a few lines at the top, one for each commit. Replace the word pick in the line for the commit 369f980 with the word edit, then save and quit the editor.

It then looks like this:

$ git rebase --interactive master
Stopped at 369f980... Changed Hardlink() in Link() Made struct for HardlinkIndex Reworked global variable and added creation of HardlinkIndex in restorer Bind functions to type
You can amend the commit now, with

	git commit --amend 

Once you are satisfied with your changes, run

	git rebase --continue

Then discard the previous commit (so we can create a new one):

$ git reset HEAD~
Unstaged changes after reset:
M	src/restic/fs/file.go
M	src/restic/node.go
M	src/restic/node_test.go
M	src/restic/restorer.go
M	vendor/src/golang.org/x/crypto/ssh/agent/example_test.go
M	vendor/src/golang.org/x/crypto/ssh/keys_test.go

Discard all changes to files in the vendor/ directory:

$ git checkout vendor/
$ git status
[...]

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   src/restic/fs/file.go
	modified:   src/restic/node.go
	modified:   src/restic/node_test.go
	modified:   src/restic/restorer.go

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	src/restic/hardlinks_index.go
	src/restic/hardlinks_index_test.go

Then add the files and make a commit as usual. This time, please adhere to the commit message format (short summary in first line, then an empty line, then more explanations):

$ git add src
$ git commit

Finish the rebase:

$ git rebase --continue
Successfully rebased and updated refs/heads/issue25.

Then push the changes to github, overriding the old ones (hence the --force):

$ git push --force

That should do it :)

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 3, 2017

I have tried what you suggested.
However, the step
it reset HEAD~
results in
M src/restic/hardlinks_index_test.go
and does not show the checked in vendor files.

Also: what am I supposed to do at the
git commit --amend
step?

thanks

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 3, 2017

Hm, interesting. Thanks for trying, though. Just leave it in, I'll fix that myself later. You should be able to abort the rebase by running git reset --hard and then git rebase --abort.

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 3, 2017

Ok. Thanks.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 5, 2017

Ok, your changes look good, thanks! The last thing that I'd like to discuss with you is: I think we need an option for restore that allows users not to restore the hard linked files. What do you think?

@fd0

fd0 approved these changes Feb 5, 2017

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 6, 2017

Well, rsync has the option --hard-links. Then you have to opt in to have hard link support.

@pvgoran

This comment has been minimized.

Copy link

pvgoran commented Feb 6, 2017

I believe that by default a backup application should capture and restore as much details as possible. rsync is a copy tool, it's expected to have a different set of defaults.

P.S. I'm glad that hardlinks and xattrs finally get support in Restic, without this I couldn't consider it a viable solution for my backup needs.

@zcalusic

This comment has been minimized.

Copy link
Member

zcalusic commented Feb 6, 2017

I share the sentiment with @pvgoran, xattr support is greatly appreciated @jgfrm. 👍

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 6, 2017

Above commit fixes some issues with hardlinks and the repo. The repo did not store hardlink info for dev and symlink.
Hardlinks for fuse work also, but is not included in this commit.
Once this commit in included in master, I will upload a new PR for fuse support

@@ -495,13 +495,17 @@ func (node *Node) fillExtra(path string, fi os.FileInfo) error {
node.Size = uint64(stat.size())
node.Links = uint64(stat.nlink())
case "dir":
node.Links = uint64(stat.nlink())

This comment has been minimized.

@fd0

fd0 Feb 6, 2017

Member

Directories cannot be hard-linked, so this line is not needed.

This comment has been minimized.

@jgfrm
@fd0
Copy link
Member

fd0 left a comment

Ok, I think this is good so far. Thanks for your comments, I agree that restoring hardlinks should be the default. We can add an option when users request that.

So from my point of view the only thing missing is an integration test which archives hardlinked files, restores them again and verifies that the restored files are indeed hard links.

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 6, 2017

Can you give me a clue where to add the integration test?

jgfrm added some commits Feb 6, 2017

jgfrm
jgfrm
jgfrm
jgfrm
@@ -24,7 +24,7 @@ func (node Node) restoreSymlinkTimestamps(path string, utimes [2]syscall.Timespe

type statWin syscall.Win32FileAttributeData

func toStatT(i interface{}) (statT, bool) {
func ToStatT(i interface{}) (statT, bool) {

This comment has been minimized.

@houndci-bot

houndci-bot Feb 6, 2017

exported function ToStatT should have comment or be unexported

jgfrm added some commits Feb 6, 2017

jgfrm
jgfrm
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 7, 2017

Ah, it seems you found a place for the integration test. :)

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 7, 2017

So, just format the remaining files with gofmt (that's why Travis failed), then we're good to merge I think :)

jgfrm added some commits Feb 7, 2017

jgfrm
jgfrm
jgfrm
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Feb 7, 2017

Codecov Report

Merging #763 into master will not impact coverage by -0.1%.

@@            Coverage Diff            @@
##           master     #763     +/-   ##
=========================================
- Coverage   54.28%   54.18%   -0.1%     
=========================================
  Files          95       96      +1     
  Lines        7277     7343     +66     
=========================================
+ Hits         3950     3979     +29     
- Misses       2750     2783     +33     
- Partials      577      581      +4
Impacted Files Coverage Δ
src/restic/restorer.go 0% <ø> (ø)
src/restic/hardlinks_index.go 100% <100%> (ø)
src/restic/node.go 37.82% <30.76%> (-0.61%)
src/cmds/restic/main.go 0% <ø> (-15.39%)
src/restic/checker/checker.go 66.96% <ø> (-2.22%)
src/restic/repository/repack.go 50% <ø> (-2.18%)
src/restic/blob.go 27.65% <ø> (-1.89%)
src/cmds/restic/cmd_backup.go 28.68% <ø> (-1.1%)
src/restic/archiver/archive_reader.go 76.27% <ø> (+3.38%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7d6027...0ff027a. Read the comment docs.

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 7, 2017

Thanks.
A question: is it possible to run the integration test locally?

@jgfrm

This comment has been minimized.

Copy link

jgfrm commented Feb 8, 2017

I supposed that the code was already merged into the master branch, but apparently, it is not. Am I supposed to do anything?

@fd0 fd0 changed the title Changed Hardlink() in Link() Support hard links Feb 8, 2017

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 8, 2017

Thanks for asking, I didn't find the time yet to review the latest commits you pushed. I'll add a comment when something needs to be changed. Once your code is merged, I'll close this PR.

In order to cleanup the commit history, I will squash all your commits into one, and remove the changes gofmt did on the files in the vendor folder.

@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 8, 2017

Did you find the file CONTRIBUTING.md, and especially the section about the development environment?

Most integration tests can be run locally by running gb test, Travis runs a few extra commands to e.g. check for proper gofmt, you can check that locally by running:

$ find src -name '*.go' -exec gofmt -d '{}' ';'

@fd0 fd0 referenced this pull request Feb 8, 2017

Closed

Support hard links #152

@fd0
Copy link
Member

fd0 left a comment

So, next round of comments is done. Just a few minor things, mainly the tests needs an additional check :)

@@ -37,5 +37,15 @@ func (e *dirEntry) equals(other *dirEntry) bool {
return false
}

if stat.Nlink != stat2.Nlink {
fmt.Fprintf(os.Stderr, "%v: Number of links doe not match (%v != %v)\n", e.path, stat.Nlink, stat2.Nlink)

This comment has been minimized.

@fd0

fd0 Feb 8, 2017

Member

Typo: "doe not match" -> "do not match"

This comment has been minimized.

@jgfrm
restoredir := filepath.Join(env.base, fmt.Sprintf("restore%d", i))
t.Logf("restoring snapshot %v to %v", snapshotID.Str(), restoredir)
testRunRestore(t, gopts, restoredir, snapshotIDs[0])
Assert(t, directoriesEqualContents(env.testdata, filepath.Join(restoredir, "testdata")),

This comment has been minimized.

@fd0

fd0 Feb 8, 2017

Member

This test is not enough: It only tests that the link count for each item in the the restored directory is the same as in the archived directory. This only verifies that files which were hardlinks before, are hardlinks afterwards, but does not test whether the same files are hardlinks of each other.

Suppose we have the files A and B, and the files a, which is a hard link of A and b, which is a hard link of B: A <-> a and B <-> b.

The test will succeed for A <-> a and B <-> b and also for A <-> b and B <-> a, because only the link count is verified.

I think we need to make a list before and after backup and check that the right files are linked together.

Do you agree?

This comment has been minimized.

@jgfrm
func (node Node) createFileAt(path string, repo Repository) error {
func (node Node) createFileAt(path string, repo Repository, idx *HardlinkIndex) error {
if node.Links > 1 {
if idx.Has(node.Inode, node.Device) {

This comment has been minimized.

@fd0

fd0 Feb 8, 2017

Member

You can combine the two if.

This comment has been minimized.

@jgfrm

jgfrm added some commits Feb 9, 2017

jgfrm
jgfrm
jgfrm
jgfrm
jgfrm
jgfrm
jgfrm
jgfrm
jgfrm
@fd0

fd0 approved these changes Feb 10, 2017

Copy link
Member

fd0 left a comment

Looks good, thanks a lot for your work!

fd0 added a commit that referenced this pull request Feb 10, 2017

Merge pull request #763 from jgfrm/issue25
Support hard links
@fd0

This comment has been minimized.

Copy link
Member

fd0 commented Feb 10, 2017

Merged in 6300c8d, congrats :)

@fd0 fd0 closed this Feb 10, 2017

func (node Node) createFileAt(path string, repo Repository) error {
func (node Node) createFileAt(path string, repo Repository, idx *HardlinkIndex) error {
if node.Links > 1 && idx.Has(node.Inode, node.Device) {
err := fs.Link(idx.GetFilename(node.Inode, node.Device), path)

This comment has been minimized.

@ThomasWaldmann

ThomasWaldmann Mar 21, 2017

Contributor

if you do a partial extract and the "hardlink master" was not extracted, you can not link to it.

This comment has been minimized.

@fd0

fd0 Mar 21, 2017

Member

Indeed, but then the file is extracted normally, because idx does not contain the other file in this case.

This comment has been minimized.

@ThomasWaldmann

ThomasWaldmann Mar 21, 2017

Contributor

ah, ok. and the current item not only has the reference back to the "hardlink master", but also the list of chunks, so it can be normally extracted?

This comment has been minimized.

@fd0

fd0 Mar 21, 2017

Member

Exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment