Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix path parser in add command #1933

Merged
merged 1 commit into from
Dec 22, 2015
Merged

Fix path parser in add command #1933

merged 1 commit into from
Dec 22, 2015

Conversation

djdv
Copy link
Contributor

@djdv djdv commented Nov 2, 2015

Parse paths with filepath instead of just path
#1922

@jbenet jbenet added the backlog label Nov 2, 2015
@djdv djdv mentioned this pull request Nov 2, 2015
@jbenet
Copy link
Member

jbenet commented Nov 2, 2015

this LGTM -- cc @rht this should be changed in dev0.4.0 too.

@jbenet
Copy link
Member

jbenet commented Nov 2, 2015

can someone else with windows confirm this works? (we dont yet have windows CI)

@jbenet
Copy link
Member

jbenet commented Nov 2, 2015

we need a windows CI -- ipfs/infra#110

@rht
Copy link
Contributor

rht commented Nov 3, 2015

@djdv what about the case for ipfs add X:\path\to\file1 X:\path\to\file2 ...? This case is not in test yet. If you can change https://github.com/ipfs/go-ipfs/blob/master/commands/cli/parse.go#L387 to filepath, that would be great.

Also, optionally, if you can possibly refactor filepath.Split -> filepath.Base, if you grep the code with '_, name :=.*split'.

How many test in windows fail before/after the PR?

@djdv
Copy link
Contributor Author

djdv commented Nov 3, 2015

ipfs add X:\path\to\file1 X:\path\to\file2
Adds both/n files and then creates a directory hash for them. The files contain the full path as their name though which does not seem to be the intended behavior. I.e. "/ipfs/

/X:\path\to\file1" instead of "/ipfs//file1".
It seems like only files underneath a directory are being formed correctly with the change. I'll have to look into this and compare with the behavior on other platforms, I'm not sure if that directory hash is intended to exist on multiple file adds without the -w argument, need clarification on that.

That's the only instance of .Split in add.go so I've changed it to be .Base.

There appears to be one less failed test with this change however I don't think I'm getting the count right, I'm running go test -tags nofuse "./..." | tee test.log in go-ipfs and then doing a grep -o FAIL test.log | wc -l
With that I'm getting a count of 35 (before) and 34(after).

@rht
Copy link
Contributor

rht commented Nov 3, 2015

@djdv

It seems like only files underneath a directory are being formed correctly with the change.

You're describing the #1878 case, which should be fixed by path -> filepath https://github.com/ipfs/go-ipfs/blob/master/commands/cli/parse.go#L387. Notice that Base only returns one argument, and should fix the ci test as well.

That's the only instance of .Split in add.go so I've changed it to be .Base.

Right, though I was referring to the entire codebase. There are two more in
https://github.com/ipfs/go-ipfs/blob/master/core/corehttp/gateway_handler.go#L164 and https://github.com/ipfs/go-ipfs/blob/master/core/coreunix/add.go#L139. But only modify the corehttp/gateway_handler.go one, because coreunix/add.go has been refactored in dev0.4.0. I will finish this part of your PR in dev0.4.0 branch.

I'm running go test -tags nofuse "./..." | tee test.log in go-ipfs and then doing a grep -o FAIL test.log | wc -l
With that I'm getting a count of 35 (before) and 34(after).

There is no windows option in travis yet, so we need a real and breathing windows machine for now.
This test.log for before/after will be super useful if you post it on an ipfsbin.
Also if you don't mind to include the output to the sharness test as well? Just run TEST_VERBOSE=1 make in test/sharness dir.

@rht
Copy link
Contributor

rht commented Nov 3, 2015

It seems like only files underneath a directory are being formed correctly with the change.

Looks like this statement referred to the test you did before the Split -> Base refactor? Which means the filepath doesn't fix the relative path issue?

@rht
Copy link
Contributor

rht commented Nov 3, 2015

If this is fixed, will close all #1676, #1730, #1878, #1922 at once.

@djdv
Copy link
Contributor Author

djdv commented Nov 3, 2015

@rht
I wouldn't yet, there seems to still be issues with some of the filenames, particularly ones not under a directory. Gathering test logs and writing up examples now.

@djdv
Copy link
Contributor Author

djdv commented Nov 3, 2015

@rht

Notice that Base only returns one argument, and should fix the ci test as well.

Woops, I'm being a bit hasty.

Here are the logs:
Origin (f36ada8)
http://ipfsbin.xyz/#QmRRk8djHj7sftpqnJn8i9McMJ2wzXJFhqVj86KhjT1Qxa
Fork (e849262)
http://ipfsbin.xyz/#QmRCdMWtR8HFcVxh5EVLGH7eGa6EgPz2zt1RVeWvbZbigE
Error count went down to 33. If any other tests are needed let me know.

Also if you don't mind to include the output to the sharness test as well?

Due to the way Windows cmd handles "*.go" I can't run those make files "natively" but I can inside of mingw bash, however I don't know how to pass the "nofuse" build constraint to the go tool unless I edit the makefiles.

Looks like this statement referred to the test you did before the Split -> Base refactor? Which means the filepath doesn't fix the relative path issue?

That seems to be the case, 4c300d9 - e849262 only fixed filenames for files under a directory. Explore the hashes below and compare with the tree in #1922 to see what I mean.

ipfs revision, command, output as ipfs hash

f36ada8 :
ipfs add -r "windows directory"
QmYtNSri8H4wstyaEwHe9gWRM1cSADCNNJBNcw3mtQuB2T
ipfs add -r "J:\demo\ipfs\windows directory"
QmfPYceFpW2JNJcN8NpyncM7vT8tkasnd7pG5A6xQs7RUd

e849262 :
ipfs add -r "windows directory"
QmVu8MgHzuF7asFYx8VxsXXt1QxPS3GYu3QBx17pWFg9Mr
ipfs add -r "J:\demo\ipfs\windows directory"
QmVu8MgHzuF7asFYx8VxsXXt1QxPS3GYu3QBx17pWFg9Mr

Notice the level 3 files appearing in the root of the structure and the filenames of those files, the ones inside of the actual directory seem to be fixed though compared to before. I'm not sure what's causing that to happen.

-Edit-
More examples:
These are the same on both revisions
J:\demo\ipfs>ipfs add "Listing 1.PNG" "Listing 1B.PNG" "Listing 2.PNG" Listings.png
QmS4djeiuH1HDRFfnu4Lo3ebvd8382LSL4DUr9qyJgy8Gs
J:\>ipfs add "J:\demo\ipfs\Listing 1.PNG" "J:\demo\ipfs\Listing 1B.PNG" "J:\demo\ipfs\Listing 2.PNG" J:\demo\ipfs\Listings.png
QmbJkHK4WGGDUcQpJ9hTP76sFwdHd6TEEtX2i6GLvANMkr

-Edit 2015-11-07-
The previous hash for revision e849262 (QmYnEVf8mS3d1yQFMXrgopvaimyQm53tV6x9YEeJCoJXh7) is NOT correct. The hash matches the other command, the above has been edited to reflect this.

@rht
Copy link
Contributor

rht commented Nov 4, 2015

I checked the log, the only path error is
\tassets_test.go:40: asset ..\\vendor\\dir-index-html-v1.0.0\\dir-index.html: could not read vcs file: open ..\\vendor\\dir-index-html-v1.0.0\\dir-index.html: The system cannot find the path specified.
It appears that ioutil.ReadFile doesn't parse '..' in windows.
There are merkledag key errs as well.
The rest (blockservice, bitswap, repo lock, routing ...) was reported in #959.

To summarize the filename cases:

  1. ipfs add -r pth/to/file1 ... when the files are within $PWD and the arg doesn't contain abs path.
    symptom: duplicates are created
    fixed by using filepath (this PR for now)
  2. ipfs add -r /abs/pth/to/file1 ... when the files are specified with absolute path
    symptom: filenames contain absolute path
  3. ipfs add -r ../file
    symptom: filename still contains '../' prefix (Path sanitization #1878)

@djdv https://github.com/ipfs/go-ipfs/blob/master/commands/files/serialfile.go#L72 is where each file's filename is created during the recursive add. I wonder if applying fp.Base to the second arg of fp.Join in fileName := fp.Join(f.name, stat.Name()) could fix the problem in case 2 and 3.

Edit: s/either or both of the/the second/

@djdv
Copy link
Contributor Author

djdv commented Nov 4, 2015

@rht
Unfortunately that change doesn't seem to have fixed the issue, I'll have to step through the process at some point to see what's going on where. My knowledge of the project and its structures is low though so continued assistance is appreciated!

In addition I'm getting an inconsistent error count on my tests, it seems I am getting between 33-38 failures on the same revision(both in the origin and this fork) , I'll have to log several runs and find the discrepancies, unfortunately the tests take some time on my machine.

@rht
Copy link
Contributor

rht commented Nov 6, 2015

@djdv I went through the ipfs add process and couldn't find where else the absolute path could have remained.

I found a small bug, that https://github.com/ipfs/go-ipfs/blob/master/commands/cli/parse.go#L360 needs to be wrapped with filepath.Clean, i.e. fpath := filepath.Clean(inputs[0]). If you can add this.
See http://plan9.bell-labs.com/sys/doc/lexnames.html.
Also, filepath.Clean parses ipfs add -r "" as ipfs add -r ..
It likely doesn't fix the '../' case.

If only there are further cases to characterize the err. Whether it is caused by the cli parse, or the recursive add.

Yes, the test failures in #959 were reported to vary as well.

@djdv
Copy link
Contributor Author

djdv commented Nov 7, 2015

Latest commit is just a squashed version of all previous commits with the addition of cleaning the input along with a change related to gateway_handler.go. A previous commit used filepath instead of regular path in that file, this was reverted because it caused the gateway to return incorrect ipfs paths.

serialfile.go is unaltered for now since the suggested change seems to have no effect, further testing required.

Worth noting is the edit to a comment of mine above, edited 2015-11-07 the previous (as well as this) commit resulted in the same hash for both the relative and absolute command which is a step in the right direction.

As it stands right now for the directory structure in #1922 we get the following hash on Windows:
QmVu8MgHzuF7asFYx8VxsXXt1QxPS3GYu3QBx17pWFg9Mr
The intended hash and structure is (taken from ipfs running on FreeBSD) :
QmPbRRZjtJgfX25tJzYUPceQXxUX9tWb5eRqFSMChLUKKy

Issue is still unresolved, this is just a status update mainly to show the target result/hash for a recursive add.

@rht

If only there are further cases to characterize the err. Whether it is caused by the cli parse, or the recursive add.

I intend to run the binary through GDB sometime soon to see what values are at various times as well as see the code paths that are traversed. I worry I may have issues with this approach on Windows but we'll see, if there is a better approach than this I'd like to know.

@djdv
Copy link
Contributor Author

djdv commented Nov 8, 2015

No luck with GDB with Go files on Windows (segfaults on everything).

It looks like there may be a problem with the creation of the file object itself and the names being passed in, if not here it is somewhere else further down the chain. Issue is still unresolved and this commit breaks recursive adding on other platforms, it's looking closer to target here so more testing required.
Current hash on Windows:
QmRuhkiQ6DNwqLKmv4freHdvGet1Xg5aaHhNRgJLUEJdJv

The accidental extra line will be squashed in the next commit.

Worth noting is that the same call is made in the unix adder as well.
https://github.com/ipfs/go-ipfs/blob/master/core/coreunix/add.go#L51

-Edit-
serialfile.go should be unchanged actually, only the arguments in calls to NewSerialFile should be cleaned, not cleaned inside the method itself. That commit accidentally calls Base on the same path.

@rht
Copy link
Contributor

rht commented Nov 8, 2015

@djdv in this case you may just fmt.Printf the info you need, e.g. in https://github.com/ipfs/go-ipfs/pull/1933/files#diff-bd0ca9f1e2d5c7ba128af01ec4c1131cL441, fmt.Printf("addDir: %s", filepath.Base(file.FileName())) and see where things went wrong.

Yes, there seems to be duplication of usage of filepath.Base in the PR.

From the latest hash, it appears that the fullpath has now been trimmed into name?

The remaining problem is duplication of files being added? This is as if https://github.com/ipfs/go-ipfs/blob/master/commands/files/serialfile.go#L34 lists the entire content of a directory recursively.

@djdv
Copy link
Contributor Author

djdv commented Nov 8, 2015

This commit is a squashed version of the previous commits with any experiments removed.
This should solve #1922 as well as all 3 cases mentioned by @rht above on Windows.
Prior to this commit ipfs add always acted as if -w was invoked, that is not to say that the actual wrap flag was set to true but that ipfs had been adding the directory contents then wrapping everything into its own node as well (regardless of if -w was specified or not).

While this commit fixes that issue as well, I am unsure as to why this behavior was happening to begin with or why this solves it. Outputting the paths and filenames at every stage during add on the same directory on 2 different platforms resulted in the same output excluding an additional erroneous directory hash on Windows which held the correct contents as well as some malformed duplicates (for example "

/level 2 A\level 3 file A.txt") .

Since I cannot step through the code on this platform currently, I made the assumption that the filenames need to use forward slashes to be parsed correctly while the filepaths needed to use the platform specific delimiters for reading etc. but I couldn't actually confirm that this was the reason.

Given the uncertainty I'll need someone who knows better than me to scrutinize the change as well as the ci. It builds and works on my machine as well as inside of a VM with the test directory and real data. After merging with the latest origin it builds and runs the same on FreeBSD as before the merge. The circleci failures seem to all be flagged with "# TODO known breakage".
pinging @jbenet @rht

@jbenet
Copy link
Member

jbenet commented Nov 8, 2015

@whyrusleeping want to CR this one?

@@ -69,7 +69,7 @@ func (f *serialFile) NextFile() (File, error) {
f.files = f.files[1:]

// open the next file
fileName := fp.Join(f.name, stat.Name())
fileName := fp.ToSlash(fp.Join(f.name, stat.Name()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of scope for this PR (probably) but I dont like this being fp here and filepath everywhere else

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a likely cause of why forward slash is required here.
Because it is used in addNode in core/commands/add.go, which uses file.FileName() as an argument to InsertNodeAtPath.

https://github.com/ipfs/go-ipfs/blob/master/merkledag/utils/utils.go#L57 assumes a forward-slash-separated path.

@whyrusleeping
Copy link
Member

LGTM

@djdv
Copy link
Contributor Author

djdv commented Nov 9, 2015

@whyrusleeping
If there are additional changes to be made I'll make sure to use the full name for filepath rather than "fp" in the next commit, there doesn't seem to be any name conflict to warrant the shortname except maybe one variable "filePath" on 2 lines.

@rht
I see, are file paths expected to be standardized and stored as forward slash paths, or could/should this be changed to use filepath for parsing rather than the strings package? strings is used again for the path on line 101 as well for removing a link.

@rht
Copy link
Contributor

rht commented Nov 9, 2015

If the fp naming is to be purged, then there are 2 places where it is used: thirdparty/tar/extractor.go, commands/file/serialfile.go.
(@whyrusleeping on code cleanup in general (out of scope of this PR), there are several inconsistencies in package import naming as well, e.g. mdag/merkledag/dag for github.com/ipfs/go-ipfs/merkledag)

I initially thought standardizing toward forward slash is preferable (initial rationale: windows path appears only in ipfs add, and so can be standardized locally), but:

  1. '/' is strictly required only in the http handler part of the code.
  2. I recalled this was the initial choice in golang until path/filepath was written (e.g. there was a redundant function in os.Stat's (basename) where filepath.Base could have been used).
  3. strings.Split with hard-coded '/' is less 'semantic' than https://golang.org/pkg/path/filepath/#SplitList.
  4. the lib needs to be modular, e.g. dagutils are expected to be used various situations and so shouldn't assume the path separator.

+1 for filepath.SplitList instead of strings.Split.*/.

@rht
Copy link
Contributor

rht commented Nov 9, 2015

There are too many places in the code where hardcoded '/' is assumed, however.

I think a sufficient fix just for ipfs add is to standardize commands/cli/parse.go, leaving aside the full cross-platform path (e.g. in ipfs resolve, parts of the lib, etc) issue for now.

@djdv
Copy link
Contributor Author

djdv commented Nov 9, 2015

@whyrusleeping
I can cleanup the fp names in the suggested files as a separate commit inside this PR if that's desired or leave it be if there's an ongoing cleanup initiative.

@rht
In relation to the slash issue, translating the path from back slashes to forward slashes in the call to NewSerialFile everywhere except in serialfile.go results in the "implied and incorrect -w" issue mentioned before where it's inserting duplicates in the root (the root that was not asked for), I'm still actually unsure as to why that particular issue is happening, without a way to step through it's a little hard to diagnose so I could be missing something obvious. I tried taking the input in parse.go and converting it to forward slashes before it hits anything else after it gets Clean()'d. It may have something to do with names returned by an os.FileInfo somewhere else but I'm not actually sure.

The attached image may better explain the -w thing I'm talking about.
ipfs add w issue

@rht
Copy link
Contributor

rht commented Nov 9, 2015

Understood the -w case.

Just to make sure:

  1. I tried taking the input in parse.go and converting it to forward slashes before it hits anything else after it gets Clean()'d.

  2. Inserting path = filepath.ToSlash(path) before https://github.com/ipfs/go-ipfs/blob/master/core/commands/add.go#L365

Both fix the fullpath, '../', file duplication bugs, but not the -w bug?

@djdv
Copy link
Contributor Author

djdv commented Nov 9, 2015

Partially. When those fixes are applied we generate and get the correct hash but we also get an additional unrequested hash, the files and names inside the target hash are fine but not in the unwanted hash. In addition when -w is passed we get the same outcome.

To put that another way when those 2 things are applied we get QmPbRRZjtJgfX25tJzYUPceQXxUX9tWb5eRqFSMChLUKKy which is the target as well as QmVu8MgHzuF7asFYx8VxsXXt1QxPS3GYu3QBx17pWFg9Mr which is unwanted AND contain malformed filenames such as "/level 2 A\level 3 file A.txt".
When -w is supplied the target hashes include the previous target plus Qme1Pd6PjA6Gd7278LrXwiDL9NUmNNSQ7KHnAebttkHL5d which is the intended -w target.

The current PR commit will get the intended result in all cases (full, relative, ../relative, -w or not), it keeps the native path all the way up to here where it gets translated: https://github.com/ipfs/go-ipfs/pull/1933/files#diff-030475adc730fdd79d7d8c459a8a7ccfR72
The ToSlash is the thing that resolves the -w bug particularly.

Sorry for the confusion, it's hard to explain when I don't know the root cause of it myself. There may be another method that solves this in other places that may or maynot be preferable. My proposal seems like a hacky fix at the end of the code path rather than a proper solution in the middle.

@rht
Copy link
Contributor

rht commented Nov 9, 2015

Confirming, so at least '/'-standardizing the commands/cli/parse.go does fix all the naming issues.
Worth checking is the !wrap condition to get the RootNode in https://github.com/ipfs/go-ipfs/blob/master/core/commands/add.go#L342.

It would better to investigate the underlying cause so that the code can be further cleaned/clarified, but since the PR already works, LGTM.

@djdv
Copy link
Contributor Author

djdv commented Nov 9, 2015

Agreed, it seems like the best course of action would be to implement this now so long as it appears fine to everyone and does not cause conflicts on other platforms/tests. This would be good for Windows users in the short term, in the future someone else or I can reassess this and look for the underlying issue if it doesn't get resolved naturally through refactoring other parts of the project.

As for code cleanup I'll leave that alone since it's not introduced here and would be better as a separate broader request on its own.

@jbenet
Ready to be merged if no objections.

@rht
Copy link
Contributor

rht commented Nov 22, 2015

Yes, rebase would be better in any situation regardless whether the osx-sharness will fail or not.

ping @djdv

@djdv
Copy link
Contributor Author

djdv commented Nov 22, 2015

Rebased and pushed. Still getting failures on OS X, paging @jbenet @rht for direction. I'm unsure how to proceed with this one at the moment.

@rht
Copy link
Contributor

rht commented Nov 22, 2015

@djdv currently all PRs are thwarted by the travis-osx failures. // not your fault

@rht
Copy link
Contributor

rht commented Nov 23, 2015

Actually,

  • someone runs the sharness test on an osx to check if the test passes.

would do, too.

@djdv
Copy link
Contributor Author

djdv commented Nov 23, 2015

I'm out of town at the moment and won't be able to run tests on OS X or Windows until I return, if someone else wants to beat me to it I'd appreciate it.

@rht
Copy link
Contributor

rht commented Nov 24, 2015

osx sharness just passed https://travis-ci.org/ipfs/go-ipfs/builds/92901951 in #1975

@rht rht added the RFM label Nov 24, 2015
@jbenet
Copy link
Member

jbenet commented Nov 30, 2015

@rht im reluctant to merge this until it passes every time. the problems -- we think -- are not coming from this PR, but we should not be merging more things while we have random failures on osx -- we should be fixing the random failures first-- then rebasing and proving these indeed do not introduce more random failures.

Sorry @djdv -- we need to fix master first before i'd be comfortable merging this in. We could also use this time to setup a windows CI system -- @harlantwood any recs?

@rht
Copy link
Contributor

rht commented Nov 30, 2015

appveyor: #1982

@harlantwood
Copy link
Contributor

No idea on windows, sorry. I know @whyrusleeping dug into that a bit wrt Travis.

@djdv
Copy link
Contributor Author

djdv commented Nov 30, 2015

@jbenet
No worries, I'm all for reliable builds and certainly don't want to induce failures for other platforms. The reason I'm so eager to merge is so that I can point other people to a working official Windows binary that allows them to add content, some people don't seem to trust random binaries from an individual for some reason. ;)

@rht
Copy link
Contributor

rht commented Dec 1, 2015

The windows test has plenty goroutine errs. It is not sensible at all to merge this after all those goroutine errs have been eliminated. #1933 (comment). I thought this is pretty clear?

Travis doesn't have windows. The appveyor windows CI is added in #1982. To get this rolling (if want soon), it needs someone with go-ipfs admin access to enable the webhook.

@rht
Copy link
Contributor

rht commented Dec 1, 2015

(...not sure of what kind of bureaucracy is going on here)

License: MIT
Signed-off-by: Dominic Della Valle <ddvpublic@gmail.com>
@djdv
Copy link
Contributor Author

djdv commented Dec 22, 2015

@jbenet
Checking in for a status update.

@ghost
Copy link

ghost commented Dec 22, 2015

@djdv we're all half on holiday at the moment (especially jbenet), we'll get to it right when we're back in january -- thanks for your patience!

@djdv
Copy link
Contributor Author

djdv commented Dec 22, 2015

@lgierth
No rush, just checking.
Happy holidays all. ⛄

@whyrusleeping
Copy link
Member

The OSX failures here are unrelated, this has been waiting around for too long, merging.

whyrusleeping added a commit that referenced this pull request Dec 22, 2015
Fix path parser in add command
@whyrusleeping whyrusleeping merged commit e007d1e into ipfs:master Dec 22, 2015
@jbenet jbenet removed the backlog label Dec 22, 2015
@djdv
Copy link
Contributor Author

djdv commented Dec 22, 2015

@whyrusleeping
Much appreciated, I look forward to seeing more Windows users adding content now.

For the sake of completeness I just want to reference @rht's comment: #1933 (comment)
These people may be interested in this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFM topic/commands Topic commands
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants