Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow nested paths for multipart upload #2951

Merged
merged 1 commit into from
May 18, 2022
Merged

Conversation

aloknerurkar
Copy link
Contributor

@aloknerurkar aloknerurkar commented May 12, 2022

Checklist

  • I have read the coding guide
  • My change requires a documentation update and I have done it
  • I have added tests to cover my changes.

Description

Motivation and context (Optional)

Related Issue (Optional)

Screenshots (if appropriate):


This change is Reviewable

@AuHau
Copy link
Contributor

AuHau commented May 12, 2022

This LGTM, but if I correctly recall, @acud was concerned about making sure this works correctly, so maybe add some tests that the nested paths are correctly handled?

@aloknerurkar
Copy link
Contributor Author

This LGTM, but if I correctly recall, @acud was concerned about making sure this works correctly, so maybe add some tests that the nested paths are correctly handled?

So the tests are enabled in this PR. The tests upload using both tar and multipart once the doMultipart is set.

The problem was w.r.t tar, by virtue of the tar format being correct we know that the directory structure is valid. In the case of multipart, users can specify incorrect paths and there is no way to actually verify this. With this PR, if used correctly the feature will be enabled. If this assumption is okay we dont need any more changes I believe.

@AuHau
Copy link
Contributor

AuHau commented May 12, 2022

Ok sounds good to me.

Regarding bad paths. To my understanding, we support UNIX-like paths in Bee, right? If so, then I think there are a few checks that can be done here, which in the tar upload use-case are most probably handled by the tar implementation itself, but with multipart are not treated. Things like:

  • calling path.Clean on the path
  • verification if it is a relative or absolute path - what should be the required format?
  • verification if the path does not contain invalid characters - see for example here
  • verification if the folder/file names are not longer than 255 characters

And maybe more, these are just from top of my head.

@aloknerurkar
Copy link
Contributor Author

Ok sounds good to me.

Regarding bad paths. To my understanding, we support UNIX-like paths in Bee, right? If so, then I think there are a few checks that can be done here, which in the tar upload use-case are most probably handled by the tar implementation itself, but with multipart are not treated. Things like:

  • calling path.Clean on the path
  • verification if it is a relative or absolute path - what should be the required format?
  • verification if the path does not contain invalid characters - see for example here
  • verification if the folder/file names are not longer than 255 characters

And maybe more, these are just from top of my head.

So this was the problem why we disallowed the multipart upload in the first place. Handling all the corner cases would be too many checks in my opinion. The manifest is a key-value store and should work fine for any unique path names. This way if there are any developer-induced errors for some reason, this would manifest when they try to access the files. If the feature is used correctly everything should work.

If we are okay with this we can use this change as is. I am not really sure what are all the cases covered by the tar format. But since it needs a filesystem to create, we can rest assured it is some valid structure. Unlike this case.

@acud
Copy link
Member

acud commented May 13, 2022

So I agree with @aloknerurkar that it can be a bit more tricky than it might seem in the first place.

Don't forget that stuff like /somefolder/././././././somesubfolder/.././././somesubfolder/somefile.html is a valid *nix path too. It would be good to follow something more concrete than a linux Q&A website.

Not sure what is the question/input you need from me now though. Is it about how to go about creating further test coverage?

@AuHau
Copy link
Contributor

AuHau commented May 16, 2022

Sooo, this is up to you guys if you want to merge it in the current form, but here are my 5 cents.
While yes Swarm can be used to store generic data, it is still very much related to the file system's concepts like files, folders, paths and it support these and uses these in its terminology. I think it is generally expected to follow *NIX style paths in project like this. I think it is quite important to stay consistent in the rules that applies for those, because as @acud mentioned in the PR you can have for example some/path and /some/./path which suddenly points to the same file and this should be handled. And dismissing these problems with saying that "we use manifests which are key-value store" is IMHO shortsighted, because the user will expect the behavior of paths and folders as we are using this terminology. If we would like to be free of these restrictions different names should had been chosen for these concepts.
While I agree that there might be some weird edge-cases that we might not cover, I still think it is fairly simple to cover like 80% of those with the few rules that I have wrote down in the PR and they are easy to implement.
My prediction is that if we won't do input sanitation (which is a basic think to do) we can expect fairly soon bug reports like "file after uploading disappeared" or something similar.
Btw. I would like to work at some point on hack project writing FUSE mount for Swarm and then that is another part where proper path handling is essential and I have no idea how I would handle manifests with invalid paths

@AuHau
Copy link
Contributor

AuHau commented May 18, 2022

So the outcome of our API Guild call yesterday is that bzz endpoint does not follow any conventions and as such does not require any path validation or similar. Therefore it is good to merge as-is.

@aloknerurkar aloknerurkar merged commit 54eb69e into master May 18, 2022
@aloknerurkar aloknerurkar deleted the dirupload.0 branch May 18, 2022 11:50
@istae istae added this to the 1.6.1 milestone Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants